Add new CTS test to validate if unique telemetry is reported by sysman #327
+526
−1
on multi-card systems for each device (GPU)
Related-To: VLCLJ-2646
Note: The debug prints are left in for testing purposes and will be removed in the final code.
Brief Function Logic Explanations
Data Collection Functions
collectMemoryData()
Purpose: Collects memory telemetry from a device
Enumerates memory modules using zesDeviceEnumMemoryModules()
For each module: gets bandwidth counters (read/write) and memory state (free/used)
Stores in deviceData.memoryBandwidth and deviceData.memoryStates
Returns gracefully if no memory modules found
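The count-then-fill enumeration described above can be sketched as follows. This is a minimal illustration, not the code in this PR: FakeDevice, FakeModule, FakeBandwidth, FakeState and fakeEnumMemoryModules() are hypothetical stand-ins so the snippet compiles without the Level Zero headers, and the DeviceData layout is assumed from the field names in the description.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the Sysman memory types (assumption: the real
// test uses the zes_* types from <level_zero/zes_api.h> instead).
struct FakeBandwidth { uint64_t readCounter; uint64_t writeCounter; };
struct FakeState { uint64_t free; uint64_t size; };
struct FakeModule { FakeBandwidth bandwidth; FakeState state; };
struct FakeDevice { std::vector<FakeModule> modules; };

// Mimics the two-call shape of zesDeviceEnumMemoryModules(): with a null
// handle array only the count is written; otherwise up to *count handles
// are filled in.
void fakeEnumMemoryModules(const FakeDevice& dev, uint32_t* count,
                           const FakeModule** handles) {
    const uint32_t available = static_cast<uint32_t>(dev.modules.size());
    if (handles == nullptr) {
        *count = available;
        return;
    }
    *count = std::min(*count, available);
    for (uint32_t i = 0; i < *count; ++i) {
        handles[i] = &dev.modules[i];
    }
}

// Assumed layout of the per-device container named in the description.
struct DeviceData {
    std::vector<FakeBandwidth> memoryBandwidth;
    std::vector<FakeState> memoryStates;
};

void collectMemoryData(const FakeDevice& dev, DeviceData& deviceData) {
    uint32_t count = 0;
    fakeEnumMemoryModules(dev, &count, nullptr);  // first call: query count
    if (count == 0) {
        return;  // no memory modules: nothing to collect, not an error
    }
    std::vector<const FakeModule*> handles(count);
    fakeEnumMemoryModules(dev, &count, handles.data());  // second call: fill
    for (uint32_t i = 0; i < count; ++i) {
        deviceData.memoryBandwidth.push_back(handles[i]->bandwidth);
        deviceData.memoryStates.push_back(handles[i]->state);
    }
}
```

The same query-count, allocate, fill sequence would apply to the power-domain and temperature-sensor collectors, with the corresponding enumeration call swapped in.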
collectPowerData()
Purpose: Collects power consumption telemetry from a device
Enumerates power domains using zesDeviceEnumPowerDomains()
For each domain: gets energy counters using zesPowerGetEnergyCounter()
Stores in deviceData.powerEnergy
Returns gracefully if no power domains found
collectTemperatureData()
Purpose: Collects temperature readings from a device
Enumerates temperature sensors using zesDeviceEnumTemperatureSensors()
For each sensor: gets temperature value using zesTemperatureGetState()
Stores in deviceData.temperatures
Returns gracefully if no temperature sensors found
collectPciData()
Purpose: CRITICAL - Collects PCI info and creates unique device ID
Gets PCI properties using zesDevicePciGetProperties()
Creates BDF string: "bus:device:function" (e.g., "3:0:0")
Gets PCI traffic stats using zesDevicePciGetStats()
Returns false if the PCI properties query fails (test-critical failure)
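For illustration, the BDF string could be built roughly like this. PciAddress is a hypothetical stand-in for the address reported in the PCI properties, and makeBdf() is an illustrative name; the decimal, unpadded format is inferred from the "3:0:0" example above.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for the PCI address returned with the PCI
// properties (assumption: the real structure also carries a domain field).
struct PciAddress {
    uint32_t domain;
    uint32_t bus;
    uint32_t device;
    uint32_t function;
};

// Builds the "bus:device:function" identifier used to tell devices apart.
std::string makeBdf(const PciAddress& address) {
    return std::to_string(address.bus) + ":" +
           std::to_string(address.device) + ":" +
           std::to_string(address.function);
}
```

Note that this identifier ignores the PCI domain, so two devices in different domains with the same bus, device and function would produce the same string.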
Validation Functions
validateUniquePciBdf()
Purpose: CORE PMT VALIDATION - Ensures no duplicate PCI addresses
Uses std::set to detect duplicate BDF identifiers
Returns false if duplicate found → PMT mapping error detected
Most critical validation - proves each device has unique address
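A minimal sketch of the std::set duplicate check, assuming the BDF strings have already been collected into a vector (allBdfsUnique() is an illustrative name, not necessarily the helper in this PR):

```cpp
#include <set>
#include <string>
#include <vector>

// Returns false as soon as a BDF identifier is seen for the second time.
bool allBdfsUnique(const std::vector<std::string>& bdfs) {
    std::set<std::string> seen;
    for (const std::string& bdf : bdfs) {
        // insert().second is false when the element was already present.
        if (!seen.insert(bdf).second) {
            return false;
        }
    }
    return true;
}
```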
validateMemoryDataIsolation()
Purpose: Ensures memory counters differ between all device pairs
Double loop: Compares every device pair (i vs j where j > i)
Checks memory bandwidth: read/write counters must differ
Checks memory state: free memory should differ between devices
EXPECT_FALSE on identical data → detects PMT cross-contamination
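The pairwise comparison can be sketched as below. MemorySample and findIdenticalPairs() are hypothetical names; the sketch assumes one bandwidth sample per device and returns the offending index pairs instead of calling EXPECT_FALSE, so it runs without GoogleTest.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical per-device sample (assumption: one memory module per device).
struct MemorySample {
    uint64_t readCounter;
    uint64_t writeCounter;
};

// Compares every device pair (i, j) with j > i and reports the pairs whose
// read and write counters are both identical.
std::vector<std::pair<std::size_t, std::size_t>>
findIdenticalPairs(const std::vector<MemorySample>& devices) {
    std::vector<std::pair<std::size_t, std::size_t>> identical;
    for (std::size_t i = 0; i < devices.size(); ++i) {
        for (std::size_t j = i + 1; j < devices.size(); ++j) {
            if (devices[i].readCounter == devices[j].readCounter &&
                devices[i].writeCounter == devices[j].writeCounter) {
                identical.emplace_back(i, j);
            }
        }
    }
    return identical;
}
```

In the test itself, a non-empty result would correspond to a failed EXPECT_FALSE. The same loop shape fits validatePowerDataIsolation() with the energy counter in place of the bandwidth counters.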
validatePowerDataIsolation()
Purpose: Ensures power readings differ between all device pairs
Double loop: Compares every device pair
Checks energy counters: power consumption values must differ
EXPECT_FALSE on identical energy → detects shared power data
validateTemperatureDataIsolation()
Purpose: Validates temperature readings are realistic per device
Double loop: Validates each device's temperature range
Range check: 0°C < temperature < 150°C per device
No uniqueness requirement (idle GPUs may have similar temps)
Ensures PMT thermal interface is accessible
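A small sketch of the range check, assuming temperatures are stored per device as degrees Celsius in a vector of doubles (the function names are illustrative):

```cpp
#include <vector>

// True when a reading lies strictly between 0 and 150 degrees Celsius.
bool isPlausibleTemperature(double celsius) {
    return celsius > 0.0 && celsius < 150.0;
}

// Checks every sensor reading of one device; an empty list passes because
// a device without temperature sensors is not treated as a failure.
bool allTemperaturesPlausible(const std::vector<double>& readings) {
    for (double celsius : readings) {
        if (!isPlausibleTemperature(celsius)) {
            return false;
        }
    }
    return true;
}
```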
validatePciDataIsolation()
Purpose: CRITICAL - Validates PCI bus uniqueness and traffic isolation
EXPECT_NE on PCI bus numbers → different devices must be on different buses
Compares PCI traffic stats: RX/TX/packet counters must differ
EXPECT_FALSE on identical stats → detects PMT interface sharing
Core PMT mapping validation - validates the commit's fix
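The two PCI checks for a single device pair might look like the following sketch. PciSample is a hypothetical struct holding the bus number and the traffic counters named above; the real test applies EXPECT_NE and EXPECT_FALSE rather than returning a bool.

```cpp
#include <cstdint>

// Hypothetical per-device PCI sample (bus number plus traffic counters).
struct PciSample {
    uint32_t bus;
    uint64_t rxCounter;
    uint64_t txCounter;
    uint64_t packetCounter;
};

// True when all three traffic counters match exactly.
bool identicalTraffic(const PciSample& a, const PciSample& b) {
    return a.rxCounter == b.rxCounter &&
           a.txCounter == b.txCounter &&
           a.packetCounter == b.packetCounter;
}

// A pair passes when the devices sit on different buses and do not report
// identical traffic statistics.
bool pciDataIsolated(const PciSample& a, const PciSample& b) {
    return a.bus != b.bus && !identicalTraffic(a, b);
}
```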