Skip to content

Add Python build/pytest CI workflow on colossus runner#167

Open
scal444 wants to merge 4 commits into
NVIDIA-Digital-Bio:mainfrom
scal444:ci-python-build-test
Open

Add Python build/pytest CI workflow on colossus runner#167
scal444 wants to merge 4 commits into
NVIDIA-Digital-Bio:mainfrom
scal444:ci-python-build-test

Conversation

@scal444
Copy link
Copy Markdown
Collaborator

@scal444 scal444 commented May 7, 2026

Builds the Python package and runs pytest on a single matrix entry (py3.13 / rdkit 2025.9.2). Source build for now; in principle the conda artifact from conda-build.yml could be reused, but cross-workflow artifact passing is awkward.

Mirrors the cpp-build-test workflow's container + runner setup, then
builds the Python package via pip + scikit-build and runs the pytest
suite (excluding the `long` substructure integration tests, matching
pip-build.yml's smoke job). Single matrix entry pinned to py3.13 /
rdkit 2025.9.2, the latest CUDA 12.x pair tracked in conda-build.yml.
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 7, 2026

Greptile Summary

This PR adds a new GitHub Actions CI workflow that builds nvmolkit from source and runs the pytest suite on the colossus self-hosted runner (V100, sm_70) inside a CUDA 12.6 Ubuntu 22.04 container. Note that the PR description references py3.13/rdkit 2025.9.2, but the implemented matrix entry is py3.12/rdkit 2025.3.6 — the code comments explain this clearly (torch ≥2.5 dropped sm_70, and py3.13 wheels start at 2.5).

  • Installs Miniforge + conda deps via the shared setup_dependencies.sh script, then pip-installs torch pinned to <2.5 and other test dependencies before building the package with --no-build-isolation.
  • Runs pytest from /tmp (preventing accidental source-tree imports) with -k 'not long' to skip slow tests, mirroring the structure of the existing C++ CI workflow.

Confidence Score: 5/5

This workflow addition is safe to merge — it only adds a new CI job and does not touch any production code.

The workflow is well-structured with clear comments explaining all non-obvious constraints. The only findings are minor shell-flag inconsistencies with no realistic runtime impact.

No files require special attention.

Important Files Changed

Filename Overview
.github/workflows/python-build-test.yml New CI workflow that source-builds nvmolkit and runs pytest on a colossus V100 runner; structure mirrors the existing C++ workflow with minor inconsistencies in shell flags.

Reviews (3): Last reviewed commit: "Drop CI matrix back to py3.12 for V100 r..." | Re-trigger Greptile

Comment on lines +69 to +76
python -m pip install \
"scikit-build>=0.18" \
"numpy>=1.23" \
"torch>=2.1" \
triton \
pandas \
psutil \
optuna
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 triton is specified without a version pin, but Triton releases are tightly coupled to specific PyTorch versions. With torch>=2.1 as a very wide lower bound, pip could resolve a Triton version that is incompatible with whichever PyTorch version is actually installed, leading to import failures at test time. Pinning Triton alongside torch (or at least bounding it with something like triton>=2.1,<4) mirrors how the PyTorch project distributes matching Triton builds.

Suggested change
python -m pip install \
"scikit-build>=0.18" \
"numpy>=1.23" \
"torch>=2.1" \
triton \
pandas \
psutil \
optuna
python -m pip install \
"scikit-build>=0.18" \
"numpy>=1.23" \
"torch>=2.1" \
"triton>=2.1" \
pandas \
psutil \
optuna

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@greptileai isn't that not true because whatever torch we bring in will pin triton?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. PyTorch declares triton as a pinned dependency in its package metadata (e.g., torch==2.6.0 requires triton==3.2.0), so pip's dependency resolver will install the exact triton version that the resolved torch requires. The explicit triton entry in your requirements just resolves to whatever version satisfies torch's constraint — there's no incompatibility risk. My concern was unfounded.

Tip: You can give me more context with [.greptile/rules.md](https://www.greptile.com/docs/code-review/custom-standards) so I get this right next time.

scal444 added 3 commits May 7, 2026 12:50
PyPI torch wheels from 2.5+ dropped sm_70 from their default arch list,
which produces cudaErrorNoKernelImageForDevice on the colossus V100.
py3.13 torch wheels start at 2.5, which dropped sm_70 from default
arches. Use py3.12 + rdkit 2025.3.6 instead.
The default PyPI torch wheel ships an older bundled libcudart that
predates cudaGraphAddNode (added in CUDA 12.4), and the cu128/cu129
channels dropped sm_70.
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 7, 2026

Greptile encountered an error while reviewing this PR. Please reach out to support@greptile.com for assistance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant