Backlog: CUDA follow-up work #2

## Context
Follow-up backlog for the CUDA target/runtime work introduced by PR #1.

PR #1 adds the CUDA target, native CUDA runtime module support, CUDA-aware NTT runtime primitives, CUDA kernel tests, and a dedicated Linux CUDA CI job. The items below are intentionally tracked outside that PR so they do not block the initial CUDA support merge.

## Backlog
- [ ] Register and maintain a dedicated self-hosted CUDA runner pool so the CUDA job can run as a required PR check.
- [ ] Expand CUDA CI from the focused kernel coverage toward the full CUDA kernel test class once runner capacity and runtime stability are sufficient.
- [ ] Add matrix coverage for additional CUDA architectures or toolkit versions if compatibility across GPU generations becomes a release requirement.
- [ ] Continue reducing duplicated CPU/CUDA codegen and runtime paths as the NTT target abstraction stabilizes.

## Notes
- Full `UnitTestCUDAKernels` execution requires a Linux runner with an NVIDIA GPU, the CUDA toolkit (including `nvcc`), `clang`/`clang++`, and the runner labels `self-hosted`, `linux`, `x64`, and `cuda`.
- The initial CUDA CI job uses CUDA architecture `120`, matching the local validation environment used for PR #1.
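
The runner requirements above could be checked with a small preflight script before the CUDA job starts. The following is a minimal sketch, not part of PR #1: the function names `missing_tools` and `gencode_args` are hypothetical, and it assumes the toolchain is discovered via `PATH` and that architecture `120` is passed to `nvcc` as a `-gencode` pair (the actual build may set the architecture differently, for example through its build system).

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Tools named in the Notes section; the list itself is taken from this issue.
REQUIRED_TOOLS = ["nvcc", "clang", "clang++"]


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH, in input order."""
    return [tool for tool in tools if which(tool) is None]


def gencode_args(arch: str) -> List[str]:
    """Build nvcc -gencode arguments for one architecture, e.g. '120'.

    Assumption: the build passes the architecture straight to nvcc.
    """
    return ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
    print("nvcc arguments:", " ".join(gencode_args("120")))
```

The `which` parameter is injectable so the check can be exercised on machines without a CUDA toolchain. This sketch does not verify that a GPU is present or that the runner labels are set; those would need separate checks.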
