gpu: update metric metadata #20788

Open: wants to merge 3 commits into base: master

Contributor

@gjulianm gjulianm commented Jul 18, 2025

What does this PR do?

Updates the GPU monitoring metric metadata: adds the metrics introduced with the GPM collector and updates the descriptions of the core/memory usage metrics. Related agent change: https://github.com/DataDog/datadog-agent/pull/38254/files

Motivation

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

@gjulianm gjulianm self-assigned this Jul 18, 2025
@gjulianm gjulianm marked this pull request as ready for review July 18, 2025 11:00
@gjulianm gjulianm requested a review from a team as a code owner July 18, 2025 11:00
@gjulianm gjulianm added the qa/skip-qa Automatically skip this PR for the next QA label Jul 18, 2025
val06 previously approved these changes Jul 18, 2025
gpu/metadata.csv Outdated
Comment on lines 24 to 26
gpu.fp16_active,gauge,,percent,,Percentage of the time that the 16-bit floating point calculation engine was active,0,gpu,fp16_active,,
gpu.fp32_active,gauge,,percent,,Percentage of the time that the 32-bit floating point calculation engine was active,0,gpu,fp32_active,,
gpu.fp64_active,gauge,,percent,,Percentage of the time that the 64-bit floating point calculation engine was active,0,gpu,fp64_active,,
Contributor

Should we add the device architecture limitation here?

Contributor Author

Makes sense, added!
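
For reference, a minimal sketch of how the three rows under review could be sanity-checked locally. The column names are an assumption (the header row of gpu/metadata.csv is not shown in this thread), and `parse_rows` and `find_problems` are hypothetical helpers for illustration, not part of any Datadog tooling. The checks are illustrative, not the project's validation rules.

```python
import csv
import io

# ASSUMPTION: column order of metadata.csv; the header is not shown in this thread.
ASSUMED_COLUMNS = [
    "metric_name", "metric_type", "interval", "unit_name", "per_unit_name",
    "description", "orientation", "integration", "short_name",
    "curated_metric", "sample_tags",
]

# The three rows quoted in the review comment (lines 24 to 26 of the diff).
ROWS = """\
gpu.fp16_active,gauge,,percent,,Percentage of the time that the 16-bit floating point calculation engine was active,0,gpu,fp16_active,,
gpu.fp32_active,gauge,,percent,,Percentage of the time that the 32-bit floating point calculation engine was active,0,gpu,fp32_active,,
gpu.fp64_active,gauge,,percent,,Percentage of the time that the 64-bit floating point calculation engine was active,0,gpu,fp64_active,,
"""


def parse_rows(text):
    """Parse CSV text into dicts keyed by the assumed column names."""
    records = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(ASSUMED_COLUMNS):
            raise ValueError(f"expected {len(ASSUMED_COLUMNS)} fields, got {len(fields)}")
        records.append(dict(zip(ASSUMED_COLUMNS, fields)))
    return records


def find_problems(record):
    """Return a list of human-readable problems for one record (illustrative checks only)."""
    problems = []
    if not record["metric_name"].startswith(record["integration"] + "."):
        problems.append("metric_name does not start with the integration prefix")
    if not record["description"].strip():
        problems.append("description is empty")
    return problems


if __name__ == "__main__":
    for record in parse_rows(ROWS):
        print(record["metric_name"], record["unit_name"], find_problems(record))
```

Under the assumed column order, each row has 11 fields, with `percent` in the `unit_name` position and empty trailing `curated_metric` and `sample_tags` fields.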

@temporal-github-worker-1 temporal-github-worker-1 bot dismissed val06’s stale review July 18, 2025 11:24

Review from val06 is dismissed. Related teams and files:

  • ebpf-platform
    • gpu/metadata.csv
2 participants