Add more useful metrics to otel collector #1546
    enabled: false
  system.cpu.utilization:
    enabled: false
  # default, only need to be disabled
Can you keep it explicit? Or is there a reason not to?
I can do that; it just seemed arbitrary which series we defined redundantly and which ones we omitted. Mind if I add all of them explicitly?
I agree it can be harder to maintain, but I don’t think the host metrics change often.
Having them explicitly defined makes reviews simpler, and it also helps anyone browsing the repo understand exactly which metrics we expose without needing to check the OTEL docs or deploy the observability stack.
It also makes our intent clear (what we’ve chosen to enable or disable), so if new metrics appear later, it’s obvious they weren’t enabled intentionally and are safe to turn off.
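As a sketch of the explicit style being discussed, assuming the standard hostmetrics receiver layout (scraper, then metrics, then a per-metric enabled flag), every series is listed with its state even where that state matches the receiver default. The on/off choices and the collection_interval value are illustrative only, not the PR's actual configuration.

```yaml
receivers:
  hostmetrics:
    collection_interval: 30s  # hypothetical interval, for illustration only
    scrapers:
      cpu:
        metrics:
          # Every series is listed, even when the value matches the default,
          # so reviewers can see the full set without consulting the OTel docs.
          system.cpu.time:
            enabled: true
          system.cpu.utilization:
            enabled: false  # illustrative choice
      memory:
        metrics:
          system.memory.usage:
            enabled: true
          system.memory.utilization:
            enabled: true  # illustrative choice
      load:
        metrics:
          system.cpu.load_average.1m:
            enabled: true
          system.cpu.load_average.5m:
            enabled: false  # illustrative choice
          system.cpu.load_average.15m:
            enabled: false  # illustrative choice
```

A metric added by a future receiver release would not appear in this list, which is what makes it easy to spot as unintentionally enabled.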
# Conflicts:
#	iac/provider-gcp/.terraform.lock.hcl
This roughly triples the number of series we collect from hostmetrics, which amounts to a ~30% increase in total series across my whole dev cluster. It provides a lot of useful information that's worth tracking, though.
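The two figures together imply how large a share hostmetrics had before the change. This is a back-of-the-envelope sketch, assuming "roughly triples" means exactly 3x and that the ~30% is measured against the pre-change total. Here H is the number of hostmetrics series before the change and T is the total number of series in the cluster before the change (both symbols introduced here for illustration).

```latex
\begin{aligned}
\Delta T &= 3H - H = 2H \\
\frac{\Delta T}{T} &\approx 0.30 \;\Rightarrow\; \frac{H}{T} \approx 0.15 \\
\frac{3H}{T + 2H} &\approx \frac{0.45}{1.30} \approx 0.35
\end{aligned}
```

Under these assumptions, hostmetrics went from roughly 15% of all series to roughly 35%.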
Note
Expands otel-collector hostmetrics (CPU/load/filesystem/memory/network/processes), adds CPU time aggregation, refines filters/labels, and removes an unused feature gate.
- iac/provider-gcp/nomad/configs/otel-collector.yaml:
  - Add metricstransform/single_cpu to aggregate system.cpu.time by node.id and state; include it in the metrics/host pipeline.
  - Refine filter/drop_by_device to regex-match all system.network.* metrics on veth/docker/lo devices and drop all loop-device metrics; strip filesystem labels via an explicit list.
  - Remove otelcol.* from the OTLP include set.
- iac/provider-gcp/nomad/jobs/otel-collector.hcl:
  - Remove the --feature-gates=pkg.translator.prometheus.NormalizeName arg.
- Add .editorconfig for YAML (2-space indentation, spaces).

Written by Cursor Bugbot for commit c93a45a.
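A minimal sketch of how the two processors described in this summary could be written, assuming the contrib metricstransform processor (aggregate_labels operation) and the filter processor with OTTL conditions. It also assumes node.id is present as a data point attribute on system.cpu.time. The exact regexes, the exporter name otlp, and the processor order are assumptions; the PR's actual YAML may differ.

```yaml
processors:
  metricstransform/single_cpu:
    transforms:
      - include: system.cpu.time
        match_type: strict
        action: update
        operations:
          # Keep only node.id and state; sum away every other label
          # (e.g. the per-core "cpu" attribute).
          - action: aggregate_labels
            label_set: [node.id, state]
            aggregation_type: sum

  filter/drop_by_device:
    error_mode: ignore
    metrics:
      datapoint:
        # Drop network series for virtual/loopback interfaces (regex is an assumption).
        - 'IsMatch(metric.name, "^system[.]network[.]") and IsMatch(attributes["device"], "^(veth.*|docker.*|lo)$")'
        # Drop any series whose device looks like a loop device (e.g. loop0, /dev/loop3).
        - 'IsMatch(attributes["device"], "loop[0-9]+")'

service:
  pipelines:
    metrics/host:
      receivers: [hostmetrics]
      # Filter first so dropped series never reach the aggregation step.
      processors: [filter/drop_by_device, metricstransform/single_cpu]
      exporters: [otlp]  # hypothetical exporter name
```

Summing system.cpu.time across cores is valid because it is a cumulative sum, and collapsing the per-core dimension is what keeps the added CPU detail from multiplying series by the core count.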