
feat: Add machine metadata to health OTLP log exports #2876

Description

@jayzhudev

Is this a new feature, an enhancement, or a change to existing functionality?

Enhancement

How would you describe the priority of this feature request

Medium

Please provide a clear description of problem this feature solves

An XID log body does not always contain:

  • machine serial
  • GPU driver version
  • component type

Without these fields, the log analyzer cannot reliably produce NVL domain health reports that can be emitted to the NICo API and associated with the corresponding machines.
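
To make the gap concrete, here is a minimal analyzer-side sketch. It assumes each log record arrives with a flat string-to-string attribute map, and it uses the hypothetical attribute keys `machine.serial`, `gpu.driver.version`, and `component.type`. The issue does not name the actual keys. The functions `missing_metadata` and `can_attribute_to_machine` are illustrative and are not part of any existing codebase.

```python
# Hypothetical attribute keys; the issue does not specify the real names.
REQUIRED_KEYS = ("machine.serial", "gpu.driver.version", "component.type")


def missing_metadata(attributes: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty in a record's attributes."""
    return [key for key in REQUIRED_KEYS if not attributes.get(key)]


def can_attribute_to_machine(attributes: dict[str, str]) -> bool:
    """A record is usable for a per-machine health report only if no key is missing."""
    return not missing_metadata(attributes)


if __name__ == "__main__":
    # A hypothetical record that carries only the raw XID body, as described above.
    xid_only = {"log.body": "NVRM: Xid ..."}
    print(missing_metadata(xid_only))
```

A record like `xid_only` fails every check, so the analyzer has no reliable way to decide which machine's report it belongs to.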

Feature Description

Add missing metadata to health OTLP exports so downstream consumers can associate XID/NVRM log records with the correct machine, driver, NVLink domain, and component type.
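
One possible shape for the export is sketched below under stated assumptions. Machine-wide metadata (serial, driver version, NVLink domain) is attached once as OTLP resource attributes, and component type is attached per log record because it may differ between records from the same machine. The output follows the OTLP/JSON layout (`resourceLogs` → `resource.attributes` → `scopeLogs` → `logRecords`, with `stringValue` attribute values). The attribute keys, the scope name `health`, and the helper names `kv` and `build_resource_logs` are hypothetical and do not come from the issue.

```python
import json


def kv(key: str, value: str) -> dict:
    """Encode one string attribute in the OTLP/JSON KeyValue form."""
    return {"key": key, "value": {"stringValue": value}}


def build_resource_logs(machine: dict[str, str], records: list[tuple[str, str]]) -> dict:
    """Wrap raw XID/NVRM lines in an OTLP/JSON-shaped logs export payload.

    machine: hypothetical machine-level metadata, attached once as resource attributes.
    records: (component_type, body) pairs; the raw log body is kept unchanged.
    """
    resource_attrs = [kv(k, v) for k, v in sorted(machine.items())]
    log_records = [
        {
            "body": {"stringValue": body},
            # Per-record attribute, since one machine can report on several component types.
            "attributes": [kv("component.type", component)],
        }
        for component, body in records
    ]
    return {
        "resourceLogs": [
            {
                "resource": {"attributes": resource_attrs},
                "scopeLogs": [{"scope": {"name": "health"}, "logRecords": log_records}],
            }
        ]
    }


if __name__ == "__main__":
    payload = build_resource_logs(
        {"machine.serial": "SN-EXAMPLE", "gpu.driver.version": "0.0.0", "nvlink.domain.id": "domain-0"},
        [("gpu", "NVRM: Xid ...")],
    )
    print(json.dumps(payload, indent=2))
```

Putting the machine fields on the resource rather than on every record avoids repeating them and matches how OTLP groups records that share a source. This is a design choice, not something the issue prescribes.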

Describe your ideal solution

No response

Describe any alternatives you have considered

No response

Additional context

No response

Code of Conduct

  • I agree to follow NVIDIA Infra Controller's Code of Conduct
  • I have searched the open feature requests and have found no duplicates for this feature request

Metadata

Assignees

Labels

  • feature: Feature (deprecated - use issue type, but it's needed for reporting now)
  • rack health

Projects

Status
In Progress

Milestone

Relationships

None yet

Development

No branches or pull requests