fix: reduce metrics aggregator cardinality #4183

Closed
MasterPtato wants to merge 1 commit into main from 02-11-fix_reduce_metrics_aggregator_cardinality

@MasterPtato
Contributor

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4183 February 12, 2026 01:16 Destroyed
@railway-app

railway-app bot commented Feb 12, 2026

🚅 Deployed to the rivet-pr-4183 environment in rivet-frontend

| Service | Status | Web | Updated (UTC) |
| --- | --- | --- | --- |
| ladle | ❌ Build Failed (View Logs) | Web | Feb 19, 2026 at 10:54 pm |
| website | 😴 Sleeping (View Logs) | Web | Feb 17, 2026 at 6:36 am |
| frontend-cloud | ❌ Build Failed (View Logs) | Web | Feb 12, 2026 at 1:18 am |
| frontend-inspector | ❌ Build Failed (View Logs) | Web | Feb 12, 2026 at 1:18 am |
| mcp-hub | ✅ Success (View Logs) | Web | Feb 12, 2026 at 1:17 am |

Contributor Author

MasterPtato commented Feb 12, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude bot commented Feb 12, 2026

PR Review: fix: reduce metrics aggregator cardinality

Summary

This PR reduces Prometheus metrics cardinality through three areas: fixing Grafana dashboard label queries, adding tracing instrumentation, and skipping zero-value metric label registrations. The changes are well-targeted and correct.


Grafana label_values() Queries (7 dashboards)

Change: label_values(rivet_project) → label_values(up, rivet_project)

This is the right fix. Calling label_values() without a metric selector asks Prometheus for that label's values across every time series, so the lookup gets more expensive as overall cardinality grows. Anchoring to up restricts the lookup to a single well-known metric.

Potential concern: This assumes up carries the rivet_project and rivet_datacenter labels. This is valid if these labels are set at the scrape level (e.g., via Prometheus relabeling rules or static_configs), but if they're added at the application level (per-metric label registration), up won't have them and the dropdowns will be empty. Worth confirming the label source if not already verified.
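The difference between the two query forms, and the empty-dropdown risk, can be illustrated with a toy model. This is only a sketch: `Series`, `series`, and `label_values` below are hypothetical helpers written for this example, not Prometheus or Grafana APIs, and the metric and label values are made up.

```rust
use std::collections::{BTreeSet, HashMap};

/// A toy time series: a metric name plus its label set.
struct Series {
    metric: &'static str,
    labels: HashMap<&'static str, &'static str>,
}

/// Hypothetical constructor used only to keep the example short.
fn series(metric: &'static str, labels: &[(&'static str, &'static str)]) -> Series {
    Series { metric, labels: labels.iter().copied().collect() }
}

/// Hypothetical stand-in for a label-values lookup: collects the distinct
/// values of `label`, optionally restricted to series of a single metric.
fn label_values(all: &[Series], metric: Option<&str>, label: &str) -> BTreeSet<String> {
    all.iter()
        .filter(|s| metric.map_or(true, |m| s.metric == m))
        .filter_map(|s| s.labels.get(label).map(|v| v.to_string()))
        .collect()
}

fn main() {
    // Case 1: the label is attached at scrape level, so `up` carries it too.
    let scrape_level = vec![
        series("up", &[("rivet_project", "alpha")]),
        series("up", &[("rivet_project", "beta")]),
        series("http_requests_total", &[("rivet_project", "alpha")]),
        series("http_requests_total", &[("rivet_project", "beta")]),
    ];
    // Anchoring to `up` yields the same dropdown values while visiting fewer series.
    assert_eq!(
        label_values(&scrape_level, None, "rivet_project"),
        label_values(&scrape_level, Some("up"), "rivet_project"),
    );

    // Case 2: the label is only set per metric by the application, so `up` lacks it.
    let app_level = vec![
        series("up", &[]),
        series("http_requests_total", &[("rivet_project", "alpha")]),
    ];
    // The anchored lookup now returns nothing, which would leave the dropdown empty.
    assert!(label_values(&app_level, Some("up"), "rivet_project").is_empty());
    assert_eq!(label_values(&app_level, None, "rivet_project").len(), 1);
}
```

The second case is the scenario the review asks to rule out before relying on the anchored query.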


Metrics Aggregator (metrics_aggregator.rs)

Change: Added if desired_slots != 0 guard before SERVERLESS_DESIRED_SLOTS.with_label_values(...).add(...)

This is correct. Since SERVERLESS_DESIRED_SLOTS.reset() is called before the scan loop (line 212), all previously registered label combinations are cleared. Without the guard, iterating over every namespace/runner combination in the DB would create a new time series for each one, even those with zero slots. The guard prevents that cardinality inflation.

One minor note: the guard only prevents cardinality for the active aggregation cycle. If a namespace/runner had non-zero desired slots in a previous cycle, was reset, and now has zero, the metric will simply not appear in this cycle, which is the correct and expected behavior.
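A minimal sketch of the reset-then-guard behavior described above, using a plain `HashMap` as a stand-in for the labelled gauge. The `GaugeVec` type, the `aggregate` function, and the sample rows are assumptions made for illustration; they are not the code in metrics_aggregator.rs or the real Prometheus client types.

```rust
use std::collections::HashMap;

/// Minimal stand-in for a labelled gauge family (not the real client type).
#[derive(Default)]
struct GaugeVec {
    series: HashMap<(String, String), i64>,
}

impl GaugeVec {
    /// Drops every registered label combination, like the reset before the scan loop.
    fn reset(&mut self) {
        self.series.clear();
    }

    /// Touching a label combination registers a series, even when the delta is zero.
    fn add(&mut self, namespace: &str, runner: &str, delta: i64) {
        *self
            .series
            .entry((namespace.to_string(), runner.to_string()))
            .or_insert(0) += delta;
    }
}

/// One aggregation cycle over hypothetical (namespace, runner, desired_slots) rows.
/// Returns the number of series registered after the cycle.
fn aggregate(gauge: &mut GaugeVec, rows: &[(&str, &str, i64)], skip_zero: bool) -> usize {
    gauge.reset();
    for &(namespace, runner, desired_slots) in rows {
        // The guard: zero-valued rows never register a label combination.
        if skip_zero && desired_slots == 0 {
            continue;
        }
        gauge.add(namespace, runner, desired_slots);
    }
    gauge.series.len()
}

fn main() {
    let rows = [("ns-a", "runner-1", 3), ("ns-a", "runner-2", 0), ("ns-b", "runner-1", 0)];
    let mut gauge = GaugeVec::default();

    // Without the guard, every row becomes a series, including the zero-valued ones.
    assert_eq!(aggregate(&mut gauge, &rows, false), 3);
    // With the guard, only the non-zero row is registered.
    assert_eq!(aggregate(&mut gauge, &rows, true), 1);

    // A later cycle where the only row has dropped to zero: the series is gone.
    let later = [("ns-a", "runner-1", 0)];
    assert_eq!(aggregate(&mut gauge, &later, true), 0);
}
```

The last assertion mirrors the minor note: a combination that was non-zero in an earlier cycle simply stops being exported once it reaches zero.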


Tracing Annotations

Change: #[tracing::instrument(skip_all)] added to handle_request in api_public.rs and both handle_request/handle_websocket in pegboard-runner/src/lib.rs

These are reasonable observability improvements. skip_all is the right choice here to avoid accidentally logging large request bodies or sensitive headers. Note that these changes fall somewhat outside the stated scope of "reduce metrics aggregator cardinality", so consider whether they belong in a separate commit for cleaner history.


Code Formatting in Commented-Out Code

The formatting changes to the commented-out aggregate_active_actors function are cosmetic only. The logic change (adding if active_actors != 0 guard) is consistent with the desired_slots != 0 fix applied to the active code path.


No Issues Found

  • No security concerns
  • No bugs introduced
  • No backwards-compatibility issues
  • Behavior is correct: zero-value metrics are correctly suppressed post-reset

Overall this is a clean, well-scoped fix. The only follow-up to verify is whether up has the custom labels in the Prometheus scrape configuration.

@MasterPtato MasterPtato force-pushed the 02-11-fix_reduce_metrics_aggregator_cardinality branch from 293011a to aa559af Compare February 18, 2026 02:13
@MasterPtato MasterPtato force-pushed the 02-11-fix_docker_cache_issues_dashboards branch from 3dbb6ec to 69d4538 Compare February 18, 2026 02:13
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4183 February 18, 2026 02:13 Destroyed
@pkg-pr-new

pkg-pr-new bot commented Feb 18, 2026

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@4183

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@4183

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@4183

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@4183

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@4183

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@4183

@rivetkit/sqlite-vfs

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sqlite-vfs@4183

@rivetkit/traces

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/traces@4183

@rivetkit/workflow-engine

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/workflow-engine@4183

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@4183

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@4183

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@4183

commit: 6f3d2b9

@graphite-app graphite-app bot changed the base branch from 02-11-fix_docker_cache_issues_dashboards to graphite-base/4183 February 18, 2026 02:27
@graphite-app graphite-app bot force-pushed the graphite-base/4183 branch from 69d4538 to 3d5b7c7 Compare February 18, 2026 02:27
@graphite-app graphite-app bot force-pushed the 02-11-fix_reduce_metrics_aggregator_cardinality branch from aa559af to 4bf0ef8 Compare February 18, 2026 02:27
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4183 February 18, 2026 02:27 Destroyed
@graphite-app graphite-app bot changed the base branch from graphite-base/4183 to main February 18, 2026 02:28
@graphite-app graphite-app bot force-pushed the 02-11-fix_reduce_metrics_aggregator_cardinality branch from 4bf0ef8 to 81cc8b3 Compare February 18, 2026 02:28
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4183 February 18, 2026 02:28 Destroyed
@MasterPtato MasterPtato force-pushed the 02-11-fix_reduce_metrics_aggregator_cardinality branch from 81cc8b3 to 6f3d2b9 Compare February 19, 2026 20:46
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4183 February 19, 2026 20:46 Destroyed
@MasterPtato MasterPtato force-pushed the 02-11-fix_reduce_metrics_aggregator_cardinality branch from 6f3d2b9 to d6286fd Compare February 19, 2026 22:53
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4183 February 19, 2026 22:53 Destroyed
@graphite-app
Contributor

graphite-app bot commented Feb 19, 2026

Merge activity

  • Feb 19, 10:54 PM UTC: MasterPtato added this pull request to the Graphite merge queue.
  • Feb 19, 10:54 PM UTC: CI is running for this pull request on a draft pull request (#4241) due to your merge queue CI optimization settings.
  • Feb 19, 10:55 PM UTC: Merged by the Graphite merge queue via draft PR: #4241.

graphite-app bot pushed a commit that referenced this pull request Feb 19, 2026