feat: add power estimation #153

kaim-eng · 2025-12-04T16:03:17Z

This is a fresh start of the previous PR, #104.

I spent several days trying to rebase the #104 dev branch onto main ToT without success.
This PR essentially redoes the #104 changes on top of the 12/3 ToT.

copy-pr-bot · 2025-12-04T16:03:21Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

- Add PerformanceResult class: float-like object that carries both latency and power data
- Add power_w column to all output DataFrames (ColumnsStatic, ColumnsAgg, ColumnsDisagg)
- Extend PerfDatabase with query_*_with_power methods for power-aware queries
- Update all backend classes to support power estimation queries
- Modify operations to return PerformanceResult with power data
- Update inference session/summary to track and aggregate power consumption
- Extend CLI report tables to display power_w column in results
- Maintain backward compatibility with legacy database formats (defaults to 0.0W)

This feature enables power consumption estimation alongside performance metrics, allowing users to optimize for both throughput and power efficiency.

Signed-off-by: Kai Ma <[email protected]>

- Fix line length violations by splitting long strings
- Remove useless if-else condition in perf_database
- Clean up whitespace and formatting issues
- All ruff checks now pass

Signed-off-by: Kai Ma <[email protected]>

Update test_data_loaders.py to check for dictionary format with 'latency' and 'power' keys instead of plain float values. This aligns tests with the power estimation feature changes to data loader functions.

Signed-off-by: Kai Ma <[email protected]>

Update test_interpolation.py to handle both dictionary format (with 'latency' and 'power' keys) and legacy float format. This ensures backward compatibility while supporting the new power-aware data structure.

Signed-off-by: Kai Ma <[email protected]>

Signed-off-by: Kai Ma <[email protected]>
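
The commit messages above describe two entry formats: a dictionary with 'latency' and 'power' keys, and a legacy bare float, with power defaulting to 0.0 W for legacy data. A minimal sketch of how one might normalize both formats is shown here. The helper name `split_entry`, its signature, and the sample values are assumptions for illustration and are not taken from the PR's code.

```python
from typing import Mapping, Tuple, Union

# Hypothetical type alias: an entry is either a bare number (legacy format)
# or a mapping with 'latency' and 'power' keys (power-aware format).
Entry = Union[float, int, Mapping[str, float]]


def split_entry(entry: Entry, default_power_w: float = 0.0) -> Tuple[float, float]:
    """Return (latency, power) for either storage format.

    Hypothetical helper, not part of the PR.
    """
    if isinstance(entry, Mapping):
        # Power-aware format; fall back to the default if 'power' is absent.
        return float(entry["latency"]), float(entry.get("power", default_power_w))
    # Legacy format: a bare number holding latency only.
    return float(entry), default_power_w


if __name__ == "__main__":
    print(split_entry({"latency": 1.5, "power": 320.0}))  # power-aware entry
    print(split_entry(1.5))                               # legacy entry
```

Funnelling both formats through one function keeps the 0.0 W default in a single place, so callers do not need to branch on the entry type themselves.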

tianhaox · 2025-12-08T16:17:21Z

src/aiconfigurator/sdk/operations.py

        self._scale_factor = scale_factor

-    def query(self, database: PerfDatabase, **kwargs):
+    def query(self, database: PerfDatabase, **kwargs) -> float:


For the return type, should we indicate PerformanceResult, or just float?

kaim-eng · 2025-12-08

Yes, I thought about this for quite some time. We currently return PerformanceResult (src/aiconfigurator/sdk/performance_result.py) to minimize the impact on existing code.
By default it returns a float unless we tell it to return both perf and power.
I will change the annotation to PerformanceResult to be accurate.
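
To make the trade-off in this thread concrete, here is a minimal sketch of a float subclass that carries a power reading. It is not the PR's actual `PerformanceResult`: the class name `PerformanceResultSketch`, the attribute `power_w`, the constructor signature, and the numbers are all assumptions for illustration.

```python
class PerformanceResultSketch(float):
    """Float-like latency value that also carries a power reading.

    Illustrative only; the real class in the PR may differ.
    """

    def __new__(cls, latency: float, power_w: float = 0.0):
        # float is immutable, so the numeric value is fixed in __new__.
        obj = super().__new__(cls, latency)
        obj.power_w = float(power_w)
        return obj


def query_sketch() -> PerformanceResultSketch:
    # Stand-in for an operation's query(); the values are made up.
    return PerformanceResultSketch(2.5, power_w=300.0)


if __name__ == "__main__":
    r = query_sketch()
    print(isinstance(r, float))   # existing float-based callers keep working
    print(r + 1.0)                # ordinary float arithmetic
    print(r.power_w)              # extra data is available when asked for
    print(type(r * 2).__name__)   # arithmetic yields a plain float
```

Because the object is a `float`, an annotation of `-> float` is not wrong, but it hides the extra attribute from readers and type checkers. Note also that arithmetic on a float subclass produces a plain `float`, so the power value does not survive `r * 2` unless the operators are overridden.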

Signed-off-by: Kai Ma <[email protected]>

kaim-eng requested review from AichenF, Arsene12358, Ethan-ES, YijiaZhao, davilu-nvidia, ilyasher, jasonqinzhou, simone-chen, tianhaox and xutizhou as code owners December 4, 2025 16:03

kaim-eng changed the title ~~New 104~~ feat: add power estimation Dec 4, 2025

github-actions bot added the feat label Dec 4, 2025

kaim-eng force-pushed the new-104 branch 2 times, most recently from 4e3bc8f to ae43171 Compare December 4, 2025 16:38

kaim-eng requested a review from saturley-hall as a code owner December 4, 2025 16:55

kaim-eng added 7 commits December 4, 2025 19:00

Fix linting issues in power estimation code

575e2e8

- Fix line length violations by splitting long strings
- Remove useless if-else condition in perf_database
- Clean up whitespace and formatting issues
- All ruff checks now pass

Signed-off-by: Kai Ma <[email protected]>

collect power data

a4f2d69

Signed-off-by: Kai Ma <[email protected]>

fix ruff check

ae11269

Signed-off-by: Kai Ma <[email protected]>

fix ruff check --format

5eb5a6b

Signed-off-by: Kai Ma <[email protected]>

kaim-eng force-pushed the new-104 branch from 3e5858b to 5eb5a6b Compare December 5, 2025 03:01

kaim-eng added 2 commits December 5, 2025 09:26

fix moe collector no power output bug

7a1beb6

Signed-off-by: Kai Ma <[email protected]>

ruff reformat

230f4fc

Signed-off-by: Kai Ma <[email protected]>

tianhaox reviewed Dec 8, 2025

kaim-eng added 3 commits December 8, 2025 12:08

limit runtime for short run kernel

b41348c

Signed-off-by: Kai Ma <[email protected]>

move internal data from power to energy

f5dbec9

Signed-off-by: Kai Ma <[email protected]>

Merge branch 'main' into new-104 and resolve conflicts

ca05b54

kaim-eng added 4 commits December 8, 2025 16:06

fix testcase, expect energy

0e2fee5

Signed-off-by: Kai Ma <[email protected]>

add --measure_power to comm kernels

9beca27

Signed-off-by: Kai Ma <[email protected]>

add comm related query_*_with_energy

a943fc0

Signed-off-by: Kai Ma <[email protected]>

fix ruff format

ffc0aef

Signed-off-by: Kai Ma <[email protected]>
