Skip to content

fix: do not fail builds when build-step span purge cannot find span#1241

Open
Flamki wants to merge 3 commits into
jenkinsci:mainfrom
Flamki:fix-994-nonfatal-missing-buildstep-span
Open

fix: do not fail builds when build-step span purge cannot find span#1241
Flamki wants to merge 3 commits into
jenkinsci:mainfrom
Flamki:fix-994-nonfatal-missing-buildstep-span

Conversation

@Flamki
Copy link
Copy Markdown

@Flamki Flamki commented Feb 23, 2026

What problem does this solve?

Fixes build failures reported in #994 when OpenTelemetry cannot find a build-step span to purge (for example with promoted-builds/freestyle execution paths).

Before this change, OtelTraceService.removeBuildStepSpan(...) threw IllegalStateException unconditionally when the span action was missing. That exception propagates from MonitoringBuildStepListener.finished(...) and can mark the build as failed.

What changed?

  • Updated OtelTraceService.removeBuildStepSpan(...) to:
    • treat invalid spans as recoverable,
    • log a warning instead of throwing when STRICT_MODE is disabled (default runtime behavior),
    • preserve strict fail-fast behavior when STRICT_MODE is enabled (test/debug mode).
  • Reused the same invalid-span rule for pipeline span purge path for consistency.
  • Added regression tests in OtelTraceServiceTest for:
    • missing valid span + strict mode OFF => no throw,
    • missing valid span + strict mode ON => throws,
    • missing invalid span + strict mode ON => no throw.

How to test

./mvnw spotless:check
./mvnw -Dtest=OtelTraceServiceTest test

Both commands pass locally.

Breaking changes

None.

Notes

Wrap queue item collection in SYSTEM ACL for metrics callback so restricted anonymous contexts do not force jenkins.queue.count statuses to 0.

Add a regression test for issue jenkinsci#1174 that verifies anonymous queue visibility is empty while metrics collection sees queued items.
@ArpanC6
Copy link
Copy Markdown

ArpanC6 commented Mar 18, 2026

Great fix @Flamki! This addresses a real pain point for users running
promoted-builds or freestyle jobs with OpenTelemetry.

A few observations after reviewing the changes:

  1. STRICT_MODE approach — The dual behavior (warn vs throw) based on
    STRICT_MODE is a clean pattern. It preserves backward compatibility for
    production while keeping strict validation available for testing/debugging.

  2. Test coverage — The 3 regression tests cover the key scenarios well:

    • Missing span + STRICT_MODE OFF → no throw
    • Missing span + STRICT_MODE ON → throws
    • Invalid span + STRICT_MODE ON → no throw
  3. Code Coverage improvement — +8.10% line coverage and +6.48% branch
    coverage is a significant improvement alongside the fix.

  4. One suggestion — Would it be worth adding a log message at FINE
    level (in addition to the WARNING) when a span is not found in non-strict
    mode? This could help with debugging in production without being too noisy.

All CI checks passing. This looks ready for review!

@Flamki
Copy link
Copy Markdown
Author

Flamki commented Mar 19, 2026

Thanks for the suggestion and review.

I’ve implemented it in commit f1bc1dc: in non-strict mode, we now log at FINE in addition to the existing WARNING when a span cannot be found during purge. I applied this in both relevant purge paths in OtelTraceService, with no behavioral changes beyond logging.

I also re-ran ./mvnw "-Dtest=OtelTraceServiceTest" test and it passed.

@ArpanC6
Copy link
Copy Markdown

ArpanC6 commented Mar 19, 2026

Great update @Flamki! Adding the FINE-level log alongside the existing
WARNING is exactly the right approach — it gives operators
debug-level visibility without adding noise in production.
Tests passing too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants