
Backlog: CUDA follow-up work #2

Description

@sunnycase

Context

Follow-up backlog for the CUDA target/runtime work introduced by PR #1.

PR #1 adds the CUDA target, native CUDA runtime module support, CUDA-aware NTT runtime primitives, CUDA kernel tests, and a dedicated Linux CUDA CI job. The items below are intentionally tracked outside that PR so they do not block the initial CUDA support merge.

Backlog

  • Register and maintain a dedicated self-hosted CUDA runner pool so the CUDA job can run as a required PR check.
  • Expand CUDA CI from the current focused kernel coverage to the full CUDA kernel test class (UnitTestCUDAKernels) once runner capacity and runtime stability are sufficient.
  • Add matrix coverage for additional CUDA architectures or toolkit versions if compatibility across GPU generations becomes a release requirement.
  • Continue reducing duplicated CPU/CUDA codegen and runtime paths as the NTT target abstraction stabilizes.
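As a rough illustration of the matrix-coverage item, the sketch below expands a list of CUDA architectures and toolkit versions into per-job entries carrying nvcc `-gencode` flags. The function names (`gencode_flags`, `build_matrix`) and any toolkit values are hypothetical; only architecture 120 comes from this issue, and the project's build may pass architectures through its own configuration instead of raw nvcc flags.

```python
# Hypothetical sketch: expand a CUDA CI matrix into per-job nvcc flags.
# Architecture and toolkit values passed in are placeholders, not values
# taken from the project's CI configuration.
from itertools import product


def gencode_flags(arch: int) -> list[str]:
    """Return nvcc flags that emit SASS for `arch` and also embed its PTX."""
    return [
        "-gencode", f"arch=compute_{arch},code=sm_{arch}",
        "-gencode", f"arch=compute_{arch},code=compute_{arch}",
    ]


def build_matrix(archs: list[int], toolkits: list[str]) -> list[dict]:
    """Cross every architecture with every toolkit version, one job per pair."""
    return [
        {"arch": arch, "toolkit": toolkit, "nvcc_flags": gencode_flags(arch)}
        for arch, toolkit in product(archs, toolkits)
    ]
```

For example, `build_matrix([120], ["toolkit-a", "toolkit-b"])` yields two jobs that share the same `sm_120` flags and differ only in the toolkit label. Embedding PTX next to the SASS is one common way to keep a build loadable on newer GPUs; whether the project wants that is a separate decision.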

Notes

  • Full UnitTestCUDAKernels execution requires a Linux runner with an NVIDIA GPU, the CUDA toolkit, nvcc, clang/clang++, and the runner labels self-hosted, linux, x64, and cuda.
  • The initial CUDA CI job uses CUDA architecture 120, matching the local validation environment used for PR #1 (Add CUDA Target, Runtime, and Kernel CI Support).
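A minimal preflight sketch for a candidate runner, assuming the tools listed above must be reachable on PATH. `missing_tools` and `REQUIRED_TOOLS` are hypothetical names, and `nvidia-smi` is used only as a proxy for "an NVIDIA GPU with a working driver". The check does not validate tool versions, the GPU architecture, or the runner labels, which are assigned to the runner itself.

```python
# Hypothetical preflight sketch for a self-hosted CUDA runner. It only checks
# that the required executables are on PATH; versions, GPU architecture, and
# runner labels are not verified here.
import shutil
from typing import Callable, Optional

# nvidia-smi is an assumed stand-in for "an NVIDIA GPU is present".
REQUIRED_TOOLS = ("nvcc", "clang", "clang++", "nvidia-smi")


def missing_tools(
    tools=REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the required tools that cannot be found on PATH, in order."""
    return [tool for tool in tools if which(tool) is None]
```

Calling `missing_tools()` on the runner returns an empty list when every tool is found. The injectable `which` parameter lets the check be exercised on machines without a GPU.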
