@teskje teskje commented Sep 23, 2025

This PR extends the usage metrics reporting in clusterd to also report the current memory usage and heap limit. environmentd then makes use of the new information to populate two new columns in mz_cluster_replica_metrics_history: heap_bytes and heap_limit. The new columns will be useful in implementing the new "Memory Utilization" UI in the Console, especially in self-managed environments.

Motivation

  • This PR adds a known-desirable feature.

Closes https://github.com/MaterializeInc/database-issues/issues/9692

Tips for reviewer

Thanks to #33720, we won't be losing any data already in mz_cluster_replica_metrics_history by making this change. The new columns will have NULL for old measurements instead.
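Because rows recorded before this change carry NULL in both new columns, a consumer such as the planned "Memory Utilization" view has to treat the values as optional. A minimal sketch of that handling, assuming utilization is defined as heap_bytes divided by heap_limit; the helper `memory_utilization` is hypothetical and not part of this PR:

```rust
/// Hypothetical helper (not from this PR): fraction of the heap limit in use.
/// Both inputs are optional because old measurements have NULL in
/// `heap_bytes` and `heap_limit`.
fn memory_utilization(heap_bytes: Option<u64>, heap_limit: Option<u64>) -> Option<f64> {
    match (heap_bytes, heap_limit) {
        // Only report a ratio when both values exist and the limit is non-zero.
        (Some(used), Some(limit)) if limit > 0 => Some(used as f64 / limit as f64),
        // Old rows (NULL) or a zero limit yield no utilization value.
        _ => None,
    }
}
```

Returning `None` rather than `0.0` for old rows keeps "no data" distinguishable from "no memory in use".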

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@teskje teskje force-pushed the heap-metrics branch 12 times, most recently from 8850a63 to 1519e89 Compare September 25, 2025 14:26
@teskje teskje force-pushed the heap-metrics branch 4 times, most recently from 65e8c30 to 0ddc09e Compare September 30, 2025 21:05
@teskje teskje marked this pull request as ready for review October 1, 2025 16:56
@teskje teskje requested review from a team as code owners October 1, 2025 16:56
@teskje teskje requested review from SangJunBak and antiguru October 1, 2025 16:56

let mut memory = None;
let mut swap = None;
for line in meminfo.lines() {
Contributor Author

Do something like ProcStatus for parsing here.
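One way the suggested refactor could look, as a rough sketch only: the struct `ProcMeminfo`, its fields, and the helper `parse_kib` are hypothetical names, and reading `MemTotal` and `SwapTotal` is an assumption about which `/proc/meminfo` keys the loop above extracts. The actual `ProcStatus` type is not shown in this conversation, so this does not claim to mirror it.

```rust
/// Hypothetical struct holding the two values the loop above collects.
#[derive(Debug, Default, PartialEq)]
struct ProcMeminfo {
    /// Total memory in bytes, if the `MemTotal` line was present and valid.
    mem_total: Option<u64>,
    /// Total swap in bytes, if the `SwapTotal` line was present and valid.
    swap_total: Option<u64>,
}

impl ProcMeminfo {
    /// Parses the text of `/proc/meminfo`, ignoring unknown or malformed lines.
    fn parse(meminfo: &str) -> Self {
        let mut info = Self::default();
        for line in meminfo.lines() {
            // Lines look like `MemTotal:       16384256 kB`.
            let Some((key, rest)) = line.split_once(':') else {
                continue;
            };
            let target = match key.trim() {
                "MemTotal" => &mut info.mem_total,
                "SwapTotal" => &mut info.swap_total,
                _ => continue,
            };
            *target = parse_kib(rest);
        }
        info
    }

    /// Effective heap limit (memory + swap), or `None` if either part is missing.
    fn heap_limit(&self) -> Option<u64> {
        self.mem_total?.checked_add(self.swap_total?)
    }
}

/// Converts a value such as `16384256 kB` into bytes.
fn parse_kib(value: &str) -> Option<u64> {
    let mut parts = value.split_whitespace();
    let n: u64 = parts.next()?.parse().ok()?;
    match parts.next() {
        Some("kB") => n.checked_mul(1024),
        None => Some(n),
        Some(_) => None,
    }
}
```

Keeping the parsing in one type separates text handling from the metric computation, so the memory and swap values no longer need to be tracked as loose `let mut` locals.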

SqlScalarType::TimestampTz { precision: None }.nullable(false),
)
.with_column("heap_bytes", SqlScalarType::UInt64.nullable(true))
.with_column("heap_limit", SqlScalarType::UInt64.nullable(true))
Contributor Author

Add a comment here: new fields may only be added at the end, and they must be nullable.

Member

@antiguru antiguru left a comment

Reviewed in person, LGTM!

teskje added 3 commits October 2, 2025 07:59
This commit extends the usage metrics provided by clusterd's
`/api/usage-metrics` endpoint with the memory usage, as well as the
effective heap (memory+swap) limit.

As with the existing usage collection, we mainly focus on Linux support.
For macOS we only provide the stubs required to make sure the code
compiles and runs.

This change makes the kubernetes orchestrator collect the new heap
metrics reported by clusterd processes and report them to the
controller, where they are added to the
`mz_cluster_replica_metrics_history` relation.
@teskje teskje merged commit df4f15b into MaterializeInc:main Oct 2, 2025
133 checks passed
teskje added a commit to teskje/materialize that referenced this pull request Oct 6, 2025
…ics"

This reverts commit df4f15b, reversing
changes made to 56611d3.