Add downloadable agent skill and unit test for snippets by scal444 · Pull Request #175 · NVIDIA-Digital-Bio/nvMolKit

scal444 · 2026-05-18T15:57:07Z

No description provided.

…-Bio#140) Introduce the data structures needed to return on-device coordinates from ETKDG/MMFF/UFF without packing them back into RDKit conformers. C++: - New header src/device_coord_result.h defining `CoordinateOutput` enum and the flat CSR-style `DeviceCoordResult` struct (positions, atom_starts, mol/conf indices, optional energies/converged, gpu_id). - nvmolkit/array_helpers.h: add `makePyArrayBorrowed` and an `owned` flag on `PyArray` for non-owning views over persistent device buffers. Python: - nvmolkit/types.py: add `CoordinateOutput` enum and `DeviceCoordResult` class with `per_molecule()` (lazy nested torch views) and `dense()` (padded materialization). `AsyncGpuResult` accepts an optional `gpu_id` so buffers on a non-default device are wrapped onto the correct torch device. Signed-off-by: Kevin Boyd <kboyd@nvidia.com>

Introduces a small `coord_collect` library used by upcoming ETKDG/MMFF/UFF device-output paths to consolidate per-GPU result buffers onto a single target GPU. - `copyDeviceToDeviceAsync`: peer-to-peer (or system fallback) async copy with explicit cross-stream ordering via a CUDA event, plus a same-GPU fast path that degenerates to `cudaMemcpyAsync`. - `enablePeerAccess`: best-effort bidirectional peer access enable, used once during driver setup so subsequent `cudaMemcpyPeerAsync` calls take the fast path. - gtest coverage for same-GPU copy, zero-byte no-op, self peer enable, and a multi-GPU copy that skips on single-GPU hosts. Signed-off-by: Kevin Boyd <kboyd@nvidia.com>

nMols is the authoritative outer-list size for per-molecule views; it includes molecules that produced zero conformers, which the old max()+1 approach silently dropped. This fixes per_molecule() silently truncating the outer list when high-indexed molecules produce zero conformers (e.g. partial ETKDG success). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds CoordinateOutput::DEVICE to embedMolecules. In DEVICE mode, optimized ETKDG coordinates stay on the GPU and are returned as a flat-CSR DeviceCoordResult collected onto a single target GPU. RDKit conformer lists are not modified in this mode. Conformer pruning is rejected when DEVICE is selected (follow-up). Implementation: - New `etkdg_device_collect` library: thread-local DeviceCoordCollector, shared cap state for per-molecule conformer count limits, a 4D->3D packing kernel (`appendActive`), an ETKDGCollectDeviceCoordsStage, and `finalizeOnTarget` that uses copyDeviceToDeviceAsync to consolidate per-thread partials onto the target GPU. - `embedMolecules` now returns `std::optional<DeviceCoordResult>`, accepts `output` and `targetGpu` parameters, swaps in the collect stage in DEVICE mode, and skips the host conformer-pruning post-pass. - Added a sortedMols->original-input index map so user-visible molIndices reflect the caller's input order rather than the internal sorted order. Tests (test_etkdg_device_output): shape, multi-molecule indexing, multi- conformer indexing, and pruning-rejection. Bit-exact comparison with the host path is intentionally not asserted because ETKDG is non-deterministic across calls due to random initialization of multiple parallel attempts. Signed-off-by: Kevin Boyd <kboyd@nvidia.com>

Add nMols parameter to ETKDG finalizeOnTarget so the returned DeviceCoordResult.nMols reflects the full input batch size, including molecules that produced zero conformers. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds CoordinateOutput::DEVICE to MMFFMinimizeMoleculesConfs and UFFMinimizeMoleculesConfs. In DEVICE mode, optimized coordinates, energies, and convergence flags stay on the GPU and are returned via the new `device` field of {MMFF,UFF}MinimizeResult, collected onto a single target GPU. RDKit conformer lists are not modified in this mode. Implementation: - New `ff_device_collect` library (src/minimizer/): per-thread FFDeviceCoordCollector accumulating positions, energies, and converged flags on the executing GPU; `appendBatch` issues async D2D copies into the collector and downloads only the small statuses buffer to derive int8 convergence flags; `finalizeOnTarget` consolidates per-thread partials onto the target GPU using copyDeviceToDeviceAsync, builds CSR atom_starts on the host, and uploads the index arrays. - `bfgs_mmff.cpp` / `bfgs_uff.cpp`: in DEVICE mode, skip the host position/energy copy + writeBackResults and instead append directly into a thread-local FFDeviceCoordCollector; finalize after the OMP loop. - `MMFFMinimizeResult` / `UFFMinimizeResult`: add an optional `device` field to carry the on-device result. Tests (test_ff_device_output): MMFF / UFF shape on ethanol, multi-mol correct CSR indexing for MMFF, empty-input handling. All four pass on single-GPU; existing MMFF/UFF tests remain untouched. Signed-off-by: Kevin Boyd <kboyd@nvidia.com>

Adds an optional `deviceInput` parameter to MMFFMinimizeMoleculesConfs and UFFMinimizeMoleculesConfs. When supplied, starting coordinates are taken from the on-GPU DeviceCoordResult instead of from each conformer's host positions, broadcast to every executing GPU before BFGS runs. Each input mol still needs at least one RDKit conformer for force-field construction (consulted only for hybridization/VDW special-case checks; the actual positions used for minimization come from device_input). Implementation: - ff_device_collect: DeviceInputIndex builder validates the device-input conformer set matches the host-flattened (mol, conf) labels exactly and precomputes per-conformer source offsets. broadcastDeviceInputBatch then issues one cross-GPU async copy per batch slot using copyDeviceToDeviceAsync. - bfgs_mmff / bfgs_uff: build the index up front, share it with workers, precompute per-batch (srcIdx, atomCount) lists, and overwrite positions on each executing GPU after the host->device upload. Constraints + device_input is rejected with a clear error message (handling that combination is a follow-up). Tests (test_ff_device_output): MMFF round-trip with device_input matches no-input within BFGS tolerance, mismatched conformer-count and constraints combinations are rejected. Signed-off-by: Kevin Boyd <kboyd@nvidia.com>

Adds DEVICE-mode output to MMFFBatchedForcefield and UFFBatchedForcefield so chained Python workflows can read on-GPU energies / gradients / positions / minimize results without the existing device->host->device triple-pump. C++ (NativeMMFFBatchedForcefield / NativeUFFBatchedForcefield): - Track the wrapper's home GPU at construction time. - New `computeEnergyDevice` / `computeGradientsDevice` return borrowed PyArrays over the wrapper's persistent energy/grad buffers. - New `positionsDevice` returns a borrowed (n_atoms, 3) PyArray over the wrapper's persistent positions buffer. - New `indexBuffers` exposes (atom_starts, mol_indices, conf_indices) as AsyncGpuResults. - New `minimizeDevice` plumbs through to MMFF/UFF MinimizeMoleculesConfs with output=DEVICE, refreshes the wrapper's positions buffer in-place via a single D2D memcpy on the same stream (no host roundtrip), and returns a Python DeviceCoordResult. Python (`MMFFBatchedForcefield`, `UFFBatchedForcefield`): - `compute_energy(output=DEVICE)` returns AsyncGpuResult of total energies. - `compute_gradients(output=DEVICE)` returns flat AsyncGpuResult of gradients in CSR layout. - New `positions()` and `indices()` accessors for the wrapper's persistent device state. - `minimize(output=DEVICE, target_gpu=None)` returns DeviceCoordResult. - Imports `_arrayHelpers` so PyArray returns are usable. CMake: - Add `${CMAKE_SOURCE_DIR}/src` to bfgs_mmff/bfgs_uff PUBLIC includes so consumers can reach `device_coord_result.h`. - Link `_batchedForcefield` against `coord_collect` and `device`. Tests (test_batched_forcefield_device.py): compute_energy / gradients / indices / positions accessor / minimize device-mode equivalence with host mode for both MMFF and UFF. 10 new tests pass; 30 existing batched-FF tests continue to pass. Signed-off-by: Kevin Boyd <kboyd@nvidia.com>

Add nMols parameter to FF finalizeOnTarget so the returned DeviceCoordResult.nMols reflects the full input batch size. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Ship a SKILL.md file under agent-skills/nvmolkit-usage/ that downstream developers can copy into their own .cursor/skills/ to help an agent write correct nvMolKit-using code. The skill covers the public Python entry points, the AsyncGpuResult contract, runtime requirements, runnable recipes (Morgan + Tanimoto, ETKDG embed, MMFF minimize, single- GPU HardwareOptions tuning), and reference tables for HardwareOptions and SubstructSearchConfig. Advertise the skill from README.md and add a short page to the docs site (docs/agent_skill.rst, wired into the Guides toctree) so it's discoverable from the rendered docs.

…s toctree The skill PR had been carrying device-output additions to nvmolkit/batchedForcefield.{cpp,py} via a merge from the device-output branch; those belong in their own PR and are reverted here to match main. docs/agent_skill.rst was an orphan file - added to the Guides toctree in docs/index.rst, dropped the stale per-feature enumeration, and replaced the dangling "see the install section above" with a proper cross-reference to a new :ref:`installation` label.

greptile-apps · 2026-05-18T16:00:06Z

Greptile Summary

This PR ships a Cursor/agent skill (SKILL.md) that teaches AI coding agents the nvMolKit public API, and adds a test that extracts every Python code block from the skill file and runs each one as a subprocess against the installed package.

agent-skills/nvmolkit-usage/SKILL.md: New skill file with entry-point table, result-type documentation, and six self-contained runnable recipes covering fingerprints, similarity, ETKDG embedding, MMFF/UFF minimization, and BatchedForcefield constraints.
nvmolkit/tests/test_skill.py: New parametrized test that parses Python code blocks from SKILL.md with a regex and runs each via subprocess.run, asserting zero exit code; the parametrize call eagerly evaluates _extract_python_blocks at collection time (flagged in a prior review thread).
docs/: New agent_skill.rst page and minor index.rst wiring (.. _installation: label, toctree entry).

Confidence Score: 5/5

Safe to merge — all changes are purely additive with no modifications to existing production code paths.

All six SKILL.md recipes are self-contained and verified by the new test. Documentation and README changes are purely additive. The only open concern (collection-time failure if SKILL.md moves) was already raised in the existing review thread.

nvmolkit/tests/test_skill.py — the eager parametrize evaluation at collection time.

Important Files Changed

Filename	Overview
nvmolkit/tests/test_skill.py	New test that extracts and runs every Python code block from SKILL.md via subprocess; parametrize call happens eagerly at collection time (see existing thread).
agent-skills/nvmolkit-usage/SKILL.md	New agent skill file with entry-point table, runtime requirements, result type docs, and six self-contained runnable recipes.
docs/agent_skill.rst	New RST page documenting how to install and use the Cursor agent skill; correctly cross-references the installation section.
docs/index.rst	Adds `.. _installation:` label and wires `agent_skill` into the toctree; straightforward docs wiring.
README.md	Adds a short section pointing users to the agent skill directory; no issues.

_{Reviews (3): Last reviewed commit: "Merge branch 'main' into agent/nvmolkit-..." | Re-trigger Greptile}

evasnow1992

Thank you for adding this skill. Changes look good to me!

scal444 and others added 18 commits April 30, 2026 12:07

Propagate nMols through ETKDG finalizeOnTarget

9c80624

Add nMols parameter to ETKDG finalizeOnTarget so the returned DeviceCoordResult.nMols reflects the full input batch size, including molecules that produced zero conformers. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Propagate nMols through FF finalizeOnTarget

a9392d1

Add nMols parameter to FF finalizeOnTarget so the returned DeviceCoordResult.nMols reflects the full input batch size. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merge branch 'main' into deviceout-pr3-ff-output-input

73c2fb4

Some consolidations and clarifications

91450d4

Fix compilation error from rebsae

fe0a8f2

Add unit test for skill

2672468

Merge branch 'main' into agent/nvmolkit-usage-skill

eb733ef

Merge branch 'main' into agent/nvmolkit-usage-skill

797c6e1

Update skill to match new features and bionemo spec draft

85ed9b8

scal444 requested a review from evasnow1992 May 18, 2026 15:57

greptile-apps Bot reviewed May 18, 2026

View reviewed changes

Comment thread nvmolkit/tests/test_skill.py

scal444 added 2 commits May 18, 2026 12:11

Address greptile feedback

a59a85c

Merge branch 'main' into agent/nvmolkit-usage-skill

555ba79

evasnow1992 approved these changes May 18, 2026

View reviewed changes

scal444 merged commit 434440f into NVIDIA-Digital-Bio:main May 18, 2026
8 of 9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add downloadable agent skill and unit test for snippets#175

Add downloadable agent skill and unit test for snippets#175
scal444 merged 20 commits into
NVIDIA-Digital-Bio:mainfrom
scal444:agent/nvmolkit-usage-skill

scal444 commented May 18, 2026

Uh oh!

greptile-apps Bot commented May 18, 2026 •

edited

Loading

Greptile Summary

Uh oh!

Uh oh!

evasnow1992 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

scal444 commented May 18, 2026

Uh oh!

greptile-apps Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Uh oh!

Uh oh!

evasnow1992 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

greptile-apps Bot commented May 18, 2026 •

edited

Loading