
Adopt OTel cicd.* semantic conventions incl cicd.* metrics #1096

Draft
cyrille-leclerc wants to merge 15 commits into main from add-otel-cicd-metrics

@cyrille-leclerc
Contributor

@cyrille-leclerc cyrille-leclerc commented Apr 21, 2025

Add OTel cicd.* semantic conventions including cicd.* metrics.

https://opentelemetry.io/docs/specs/semconv/cicd/cicd-metrics/

TODO

  • Boolean toggle to disable generation of older Jenkins metrics that are replaced by the cicd.* metrics.

Testing done

Submitter checklist

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests that demonstrate the feature works or the issue is fixed

Lessons learned adopting the OTel CI/CD specs

cicd.pipeline.run.duration

  • What are the histogram buckets? We picked powers of two from 1 to 8192 seconds: 1D, 2D, 4D, 8D, 16D, 32D, 64D, 128D, 256D, 512D, 1024D, 2048D, 4096D, 8192D
  • metric attributes
    • cicd.pipeline.name:
      • To protect against metric cardinality explosion, we have allow and deny lists of the pipeline names for which the duration is reported. Other pipelines are grouped under cicd.pipeline.name=#other#
      • In Jenkins, a job has an identifier called "name" and a "display name"; we use the "name" for this attribute.
    • cicd.pipeline.run.state: as we record the metric at the end of the pipeline run's execution, we always report cicd.pipeline.run.state=finalizing, whereas the real value should be finalized
    • error.type: not yet implemented
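A minimal sketch of the bucket layout described above, assuming the boundaries are the 14 powers of two from 1 to 8192 seconds and that buckets follow the OTel explicit-bucket rule (each bucket is upper-inclusive, plus one overflow bucket). The class name `PipelineDurationBuckets` is hypothetical and not taken from the plugin. In the OTel Java API, a list like this can be passed as advice to a histogram builder (e.g. `setExplicitBucketBoundariesAdvice`), but how the plugin wires it may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: bucket boundaries (in seconds) for cicd.pipeline.run.duration. */
final class PipelineDurationBuckets {

    /** Returns [1.0, 2.0, 4.0, ..., 8192.0], i.e. 2^0 to 2^13: 14 boundaries. */
    static List<Double> boundaries() {
        List<Double> result = new ArrayList<>();
        for (int exponent = 0; exponent <= 13; exponent++) {
            result.add(Math.pow(2, exponent));
        }
        return Collections.unmodifiableList(result);
    }

    /**
     * Index of the bucket a duration falls into, with upper-inclusive buckets:
     * index 0 is (-inf, 1], index i is (b[i-1], b[i]], and the last index
     * (boundaries.size()) is the overflow bucket (8192, +inf).
     */
    static int bucketIndex(List<Double> boundaries, double durationSeconds) {
        for (int i = 0; i < boundaries.size(); i++) {
            if (durationSeconds <= boundaries.get(i)) {
                return i;
            }
        }
        return boundaries.size();
    }
}
```

With these boundaries, a 3-second run lands in the (2, 4] bucket and any run longer than 8192 seconds (about 2h16m) lands in the overflow bucket.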

cicd.pipeline.run.active

  • Metric attributes
    • cicd.pipeline.name:
      • Cardinality protection with allow & deny lists defaulting to cicd.pipeline.name=#other#
      • Is it justified to capture the granularity of this metric per pipeline name?
    • cicd.pipeline.run.state: nothing special to say
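The allow and deny lists used for cardinality protection on cicd.pipeline.name can be sketched as follows. This is a hypothetical illustration, not the plugin's code: it assumes both lists are regular expressions matched against the whole pipeline name and that the deny list takes precedence over the allow list. The class name `PipelineNameFilter` is made up.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of cardinality protection for the cicd.pipeline.name attribute. */
final class PipelineNameFilter {
    static final String OTHER = "#other#";

    private final Pattern allow;
    private final Pattern deny; // null when no deny list is configured

    PipelineNameFilter(String allowRegex, String denyRegex) {
        this.allow = Pattern.compile(allowRegex);
        this.deny = (denyRegex == null || denyRegex.isEmpty()) ? null : Pattern.compile(denyRegex);
    }

    /** Returns the value reported as cicd.pipeline.name: the real name, or #other#. */
    String attributeValue(String pipelineName) {
        boolean allowed = allow.matcher(pipelineName).matches();
        // Assumption: a deny match wins over an allow match.
        boolean denied = deny != null && deny.matcher(pipelineName).matches();
        return allowed && !denied ? pipelineName : OTHER;
    }
}
```

An allow list that matches no name reproduces the default behavior, where every pipeline is reported as #other#.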

cicd.worker.count

  • The OTel specs define it as an UpDownCounter, but the Jenkins implementation is closer to a Gauge: a cumulative value captured by a daemon thread.
  • Jenkins also lets you tag build agents with labels (e.g. macos, gpu, or a label for a special kind of build agent), and we also report executor status per label. This is not covered by the OTel CI/CD specs.
  • Metric attributes
    • cicd.worker.state
      • The specs give down as an example value, but the generated Java code uses offline
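The polling model behind cicd.worker.count can be sketched as a function that counts a snapshot of executors per cicd.worker.state, rather than incrementing and decrementing on each state change. In the OTel Java API, such a function would typically run inside the callback of an asynchronous counter (e.g. `LongUpDownCounterBuilder.buildWithCallback`). This sketch assumes the state values available, busy and offline, and the `Executor` type is a hypothetical minimal view, not the Jenkins `hudson.model.Executor` class.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: count executors per cicd.worker.state from a snapshot. */
final class WorkerCountSnapshot {

    /** Assumed cicd.worker.state values. */
    enum WorkerState {
        AVAILABLE("available"), BUSY("busy"), OFFLINE("offline");

        final String attributeValue;

        WorkerState(String attributeValue) {
            this.attributeValue = attributeValue;
        }
    }

    /** Hypothetical minimal view of an executor at collection time. */
    static final class Executor {
        final boolean online;
        final boolean busy;

        Executor(boolean online, boolean busy) {
            this.online = online;
            this.busy = busy;
        }
    }

    static WorkerState stateOf(Executor executor) {
        if (!executor.online) {
            return WorkerState.OFFLINE;
        }
        return executor.busy ? WorkerState.BUSY : WorkerState.AVAILABLE;
    }

    /** Counts per state; every state is present, with 0 when no executor is in it. */
    static Map<WorkerState, Long> count(List<Executor> executors) {
        Map<WorkerState, Long> counts = new EnumMap<>(WorkerState.class);
        for (WorkerState state : WorkerState.values()) {
            counts.put(state, 0L);
        }
        for (Executor executor : executors) {
            counts.merge(stateOf(executor), 1L, Long::sum);
        }
        return counts;
    }
}
```

Reporting every state, including zeros, keeps each time series continuous when, for example, all executors become busy.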

cicd.pipeline.run.errors

  • Metric attributes
    • cicd.pipeline.name
      • Cardinality protection with allow & deny lists defaulting to cicd.pipeline.name=#other#
      • Is it justified to capture the granularity of this metric per pipeline name?
    • error.type: basic instrumentation, limited to the Jenkins statuses unstable and failure
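The basic error.type instrumentation can be sketched as a mapping from the Jenkins run result to an optional error type. The enum mirrors the constant names of Jenkins' `hudson.model.Result` so the sketch compiles without Jenkins on the classpath. The lowercase attribute values, and the choice to not count NOT_BUILT and ABORTED as errors, are assumptions.

```java
import java.util.Optional;

/** Hypothetical sketch of error.type for cicd.pipeline.run.errors. */
final class PipelineRunErrorType {

    /** Mirrors the names of hudson.model.Result constants. */
    enum JenkinsResult { SUCCESS, UNSTABLE, FAILURE, NOT_BUILT, ABORTED }

    /** Returns the error.type value, or empty when the run is not counted as an error. */
    static Optional<String> errorType(JenkinsResult result) {
        switch (result) {
            case UNSTABLE:
                return Optional.of("unstable");
            case FAILURE:
                return Optional.of("failure");
            default:
                // Assumption: SUCCESS, NOT_BUILT and ABORTED do not increment the error metric.
                return Optional.empty();
        }
    }
}
```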

cicd.system.errors

  • Not implemented yet: we don't yet differentiate well between pipeline errors and CI/CD system errors.

Other metrics

We need other metrics to report on Jenkins health, particularly on the build nodes that host the build workers.

Span attributes

TODO

@cyrille-leclerc cyrille-leclerc changed the title Add OTel cicd.* metrics Adopt OTel cicd.* semantic conventions incl cicd.* metrics Apr 22, 2025
@kuisathaverat kuisathaverat added the enhancement New feature or request label Apr 22, 2025
@kuisathaverat
Contributor

kuisathaverat commented Apr 24, 2025

About the cardinality protection with allow & deny lists defaulting to cicd.pipeline.name=#other#: would it be optional? I mean, if I want to capture all my jobs because I do not think cardinality will be a problem in my case, but I do not want to add my hundred jobs to that allow list, will I have an option so that all job names are reported?
Another option that may make more sense is to create groups of jobs, so that, for example, jobs are grouped by team, project, or another category I choose.
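The grouping suggestion could look like the following sketch, where user-defined rules map pipeline names to a group (team, project, ...) that would be reported instead of the pipeline name. This is a hypothetical illustration of the proposal, not an existing plugin feature: the class name, the "first matching rule wins" policy, and the #other# fallback are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: map pipeline names to user-chosen groups. */
final class PipelineGrouping {

    private static final class Rule {
        final Pattern pattern;
        final String group;

        Rule(Pattern pattern, String group) {
            this.pattern = pattern;
            this.group = group;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Adds a rule; rules are evaluated in the order they were added. */
    PipelineGrouping addRule(String regex, String group) {
        rules.add(new Rule(Pattern.compile(regex), group));
        return this;
    }

    /** Returns the group of the first matching rule, or #other# when none matches. */
    String groupOf(String pipelineName) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(pipelineName).matches()) {
                return rule.group;
            }
        }
        return "#other#";
    }
}
```

With grouping, cardinality is bounded by the number of groups rather than by the number of jobs, which addresses the "hundred jobs" case without listing each job.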

@cyrille-leclerc
Contributor Author

cyrille-leclerc commented Apr 24, 2025

About the cardinality protection with allow & deny lists defaulting to cicd.pipeline.name=#other#, Would it be optional?

We have implemented it with a regular expression. It would be easy to provide a GUI option to allow all pipeline names. Current config to create metrics for all pipelines:

otel.instrumentation.jenkins.run.metric.duration.allow_list=.*
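To show why `.*` opts every pipeline in, here is a sketch that reads the property from properties-style text with `java.util.Properties` and turns it into a name predicate. How the plugin actually loads its configuration is not shown here; the class name and the "no property means nothing is allowed" default are assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.function.Predicate;
import java.util.regex.Pattern;

/** Hypothetical sketch: load the duration allow list from properties text. */
final class AllowListConfig {
    static final String ALLOW_LIST_KEY = "otel.instrumentation.jenkins.run.metric.duration.allow_list";

    static Predicate<String> loadAllowList(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String regex = props.getProperty(ALLOW_LIST_KEY);
        if (regex == null) {
            // Assumption: without an allow list, no name is allowed (all become #other#).
            return name -> false;
        }
        Pattern pattern = Pattern.compile(regex);
        return name -> pattern.matcher(name).matches();
    }
}
```

Note that in a .properties file a backslash is an escape character, so regular expressions containing `\` must double it (e.g. `\\d` for `\d`).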

@kamphaus

kamphaus commented Jan 6, 2026

Anything I can help with?

@cyrille-leclerc
Contributor Author

With pleasure.
From what I remember, what's remaining is:

  • Integrate a feature toggle, probably as part of the OTEL_SEMCONV_STABILITY_OPT_IN env var, to enable the old metrics for backward compatibility, the new ones to comply with semconv, or both for migration. A challenge is that our logic should probably be slightly different from existing implementations, as we want to default to the old metrics or to the mixed mode.
  • Documentation
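A sketch of the feature toggle, assuming it follows the pattern other OTel instrumentations use for OTEL_SEMCONV_STABILITY_OPT_IN (a comma-separated list where, for HTTP, `http` selects the new conventions and `http/dup` emits both). The `cicd` and `cicd/dup` tokens are hypothetical, not an existing setting of this plugin, and the sketch defaults to the old metrics, which is only one of the two defaults discussed above.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch of the cicd.* semconv opt-in toggle. */
final class CicdSemconvOptIn {

    enum Mode { LEGACY, SEMCONV, BOTH }

    /** Parses the env var value; "cicd/dup" wins over "cicd", as "http/dup" does over "http". */
    static Mode parse(String envValue) {
        if (envValue == null || envValue.isBlank()) {
            return Mode.LEGACY; // assumption: old metrics by default
        }
        Set<String> tokens = Arrays.stream(envValue.split(","))
            .map(String::trim)
            .collect(Collectors.toSet());
        if (tokens.contains("cicd/dup")) {
            return Mode.BOTH;
        }
        if (tokens.contains("cicd")) {
            return Mode.SEMCONV;
        }
        return Mode.LEGACY;
    }

    static boolean emitLegacyMetrics(Mode mode) {
        return mode != Mode.SEMCONV;
    }

    static boolean emitCicdMetrics(Mode mode) {
        return mode != Mode.LEGACY;
    }
}
```

At startup, the mode would be computed once, e.g. `CicdSemconvOptIn.parse(System.getenv("OTEL_SEMCONV_STABILITY_OPT_IN"))`. Switching the default to the mixed mode would only change the two `Mode.LEGACY` fallbacks to `Mode.BOTH`.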

I'll be happy to do a knowledge transfer on a video call to help

@ArpanC6

ArpanC6 commented Mar 19, 2026

Hi @cyrille-leclerc, thank you for the detailed notes on adopting
the OTel CI/CD semantic conventions.

I noticed this PR is out-of-date with main and @kamphaus has built
on top of it in PR #1251. A few observations after reviewing both:

  1. Cardinality protection — The regex-based allow/deny list
    approach with otel.instrumentation.jenkins.run.metric.duration.allow_list=.*
    as an opt-out is elegant. kuisathaverat's suggestion of grouping
    pipelines by category (team/project) could be a useful enhancement
    on top of this.

  2. cicd.worker.count as UpDownCounter vs Gauge — This is an
    interesting tension. Since Jenkins reports executor state via a
    daemon thread polling model rather than event-driven increments/decrements,
    using a Gauge-style ObservableUpDownCounter seems like the right
    pragmatic choice even if the spec says UpDownCounter.

  3. cicd.pipeline.run.state — The observation that we always
    record finalizing at metric emission time rather than finalized
    is worth documenting clearly for users.

I would be happy to help with documentation or testing once
the feature toggle implementation is complete.

@kamphaus

cicd.worker.count as UpDownCounter vs Gauge ... ObservableUpDownCounter seems like the right
pragmatic choice even if the spec says UpDownCounter

The semantic conventions do not make a distinction between UpDownCounter and ObservableUpDownCounter so cicd.worker.count implemented as ObservableUpDownCounter is compliant.


Labels

enhancement New feature or request


4 participants