Description
Is this a duplicate?
- I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct
Area
General cuda-python
Is your feature request related to a problem? Please describe.
I would like to be able to gather ncu metrics, such as shared-memory bank conflicts, from within cuda-python. It is now really easy to do a grid search, e.g. over CTA sizes in different dimensions, JIT compile each variant, and benchmark it. It would be amazing if it were also possible to launch each kernel and gather bank conflict metrics without leaving the script.
Describe the solution you'd like
Maybe something configured similarly to cuda.core.experimental.Program and cuda.core.experimental.ProgramOptions, where you could specify metrics and regex patterns for launches within a context, perhaps using `with ...` syntax in Python or explicit start() / stop() calls.
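A hypothetical sketch of what that could look like; `Profiler`, `ProfilerOptions`, `results()`, and the metric name are assumptions for illustration, not existing cuda-python APIs:

```python
# Hypothetical API sketch -- Profiler / ProfilerOptions do not exist in cuda-python today.
# The metric name is one example; the exact ncu metric name can vary by architecture.
options = ProfilerOptions(
    metrics=["l1tex__data_bank_conflicts_pipe_lsu_mem_shared_op_ld.sum"],
    kernel_regex=".*gemm.*",  # only collect metrics for launches matching this pattern
)

# Context-manager flavour: every launch inside the block is profiled.
with Profiler(options) as prof:
    launch(stream, config, kernel, *kernel_args)
results = prof.results()  # e.g. {kernel_name: {metric_name: value}}

# Or an explicit start()/stop() flavour.
prof = Profiler(options)
prof.start()
launch(stream, config, kernel, *kernel_args)
prof.stop()
```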
Describe alternatives you've considered
For the time being I'll probably just write a script with a bench mode: it JIT compiles the Cartesian product of a bunch of params, benchmarks them, and then launches itself as a subprocess under ncu, with iteration counts and problem sizes small enough to be compatible with ncu (see the sketch below). I'll write the ncu output to CSV and parse it back in the original script.
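A minimal sketch of that workaround, assuming the script accepts a hypothetical --bench flag and using one example bank-conflict metric name (the exact name may differ by GPU architecture):

```python
import csv
import subprocess
import sys

# Example metric name; check `ncu --query-metrics` for your architecture.
METRIC = "l1tex__data_bank_conflicts_pipe_lsu_mem_shared_op_ld.sum"

def run_under_ncu(out_csv="ncu_results.csv"):
    """Re-launch this script in bench mode under ncu and return the parsed CSV rows."""
    cmd = [
        "ncu", "--csv", "--metrics", METRIC, "--log-file", out_csv,
        sys.executable, __file__, "--bench",
    ]
    subprocess.run(cmd, check=True)
    with open(out_csv, newline="") as f:
        # ncu emits one CSV row per kernel launch per metric; skip any non-CSV
        # preamble lines that may precede the quoted header row.
        lines = [ln for ln in f if ln.startswith('"')]
    return list(csv.DictReader(lines))
```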
Additional context
I've been working with CUTLASS/CuTe, using a lightweight Python database to cache JIT-compiled kernels with cuda-python, and it's been a really nice dev process. I'd prefer this ecosystem even if I were developing the kernels for native C/C++. I can shmoo over CTA sizes, watch for register spills, and benchmark; if ncu could be integrated it would be just as easy to shmoo over MMA tiling permutations, swizzle patterns, etc.