diff --git a/README.md b/README.md index 74f25f3..1110a92 100644 --- a/README.md +++ b/README.md @@ -3,6 +3,10 @@ ## Documentation Please see the official [NVIDIA nvMolKit Documentation](https://nvidia-digital-bio.github.io/nvMolKit/) for an overview of features, examples, and a detailed API reference. +## Cursor / agent skill + +If you use [Cursor](https://cursor.com/) (or another agent that supports the `SKILL.md` format) to write code that calls nvMolKit, you can copy [`agent-skills/nvmolkit-usage/`](agent-skills/nvmolkit-usage/) into your project's `.cursor/skills/` (or `~/.cursor/skills/` for personal use). It gives the agent the public Python entry-point map, runtime requirements, and runnable recipes. + ## Installation Guide **IMPORTANT**: nvMolKit requires an NVIDIA GPU with compute capability 7.0 (V100) or higher to run. Check your GPU's compute capability [here](https://developer.nvidia.com/cuda-gpus). diff --git a/agent-skills/nvmolkit-usage/SKILL.md b/agent-skills/nvmolkit-usage/SKILL.md new file mode 100644 index 0000000..84b6ff6 --- /dev/null +++ b/agent-skills/nvmolkit-usage/SKILL.md @@ -0,0 +1,269 @@ +--- +name: nvmolkit-usage +description: Write code that calls the installed nvMolKit Python API for GPU-accelerated, batched RDKit-style operations - Morgan fingerprints, Tanimoto/cosine similarity, ETKDG conformer embedding, MMFF/UFF optimization, TFD, conformer RMSD, Butina clustering, and substructure search. Use when the user is importing `nvmolkit.*`, debugging an `nvmolkit` call, choosing between nvMolKit and RDKit for a batched cheminformatics workflow, or wiring nvMolKit results into a torch/numpy pipeline. Out of scope: building nvMolKit from source. +license: Apache-2.0 +metadata: + owner: Kevin Boyd (@scal444) + risk_tier: skill +--- + +# nvMolKit usage + +## What nvMolKit is + +GPU-accelerated, batched implementations of common RDKit operations. APIs mirror RDKit where possible but are batch-oriented: they take lists of `rdkit.Chem.Mol` (or lists of fingerprints) and process them in parallel on one or more GPUs. nvMolKit links against RDKit at build time; inputs and outputs are real RDKit `Mol` objects. + +## Where nvMolKit does well + +Reach for nvMolKit when: + +- The workload is **a large batch of molecules** processed together (typically thousands or more). +- The metric is **throughput / total wall time across the batch**, not per-molecule latency. +- The same operation is **repeated identically** across the batch (fingerprinting a library, embedding/minimizing many conformers, bulk pairwise similarity), so the GPU stays saturated. + +Plain RDKit is usually the better choice for single-molecule one-offs or workflows that can't be expressed as a batch. nvMolKit is not meant to replace RDKit for those cases. + +## Runtime requirements + +- An NVIDIA GPU with compute capability 7.0 (V100) or higher +- A CUDA driver compatible with CUDA 12.6+. +- A working `torch` install with CUDA support (nvMolKit returns GPU tensors via `torch`'s CUDA array interface). + +If CUDA is unavailable, nvMolKit calls raise. There is no CPU fallback - if the user needs one, use RDKit directly for that path. + +## Verify the install before writing real code + +Run this once to confirm nvMolKit is importable and a GPU op works end to end: + +```python +import nvmolkit +import torch +from rdkit import Chem +from nvmolkit.fingerprints import MorganFingerprintGenerator + +print("nvmolkit:", nvmolkit.__version__) +print("cuda available:", torch.cuda.is_available()) +print("device count:", torch.cuda.device_count()) + +mols = [Chem.MolFromSmiles(smi) for smi in ["CCO", "c1ccccc1", "CC(=O)O"]] +fpgen = MorganFingerprintGenerator(radius=2, fpSize=1024) +result = fpgen.GetFingerprints(mols) +torch.cuda.synchronize() +fps = result.torch() +print("fps shape:", tuple(fps.shape), "dtype:", fps.dtype) +# Expected: shape (3, 32), dtype torch.int32 (1024 bits packed into 32 int32s per row) +``` + +If this fails, point the user at the install guide on the docs site rather than guessing - see "Going deeper" below. + +## Entry points + +| Task | Module | Primary entry point | +|---|---|---| +| Morgan fingerprints | `nvmolkit.fingerprints` | `MorganFingerprintGenerator(radius, fpSize).GetFingerprints(mols)` | +| Bulk Tanimoto / cosine similarity | `nvmolkit.similarity` | `crossTanimotoSimilarity(...)`, `crossCosineSimilarity(...)`, plus `*MemoryConstrained` variants for results too large to fit in GPU memory | +| ETKDG conformer embedding | `nvmolkit.embedMolecules` | `EmbedMolecules(molecules, params, confsPerMolecule, ...)` | +| MMFF94 optimization (one-shot) | `nvmolkit.mmffOptimization` | `MMFFOptimizeMoleculesConfs(molecules, ...)` | +| UFF optimization (one-shot) | `nvmolkit.uffOptimization` | `UFFOptimizeMoleculesConfs(molecules, ...)` | +| Forcefield with custom options + constraints | `nvmolkit.batchedForcefield` | `MMFFBatchedForcefield(mols, properties=..., nonBondedThreshold=..., ignoreInterfragInteractions=..., hardwareOptions=...)`, `UFFBatchedForcefield(mols, vdwThreshold=..., ...)`. Per-molecule view `ff[i]` exposes `add_distance_constraint`, `add_position_constraint`, `add_angle_constraint`, `add_torsion_constraint`. Methods: `.compute_energy()`, `.compute_gradients()`, `.minimize(maxIters, forceTol)` | +| Pairwise conformer RMSD | `nvmolkit.conformerRmsd` | `GetConformerRMSMatrix(mol)`, `GetConformerRMSMatrixBatch(mols)` | +| Torsion Fingerprint Deviation (TFD) | `nvmolkit.tfd` | `GetTFDMatrix(mol)`, `GetTFDMatrices(mols)` | +| Butina clustering | `nvmolkit.clustering` | `butina(distance_matrix, cutoff)` (precomputed matrix), `fused_butina(fingerprints, cutoff)` (memory-efficient, on-the-fly) | +| Substructure search | `nvmolkit.substructure` | `hasSubstructMatch`, `countSubstructMatches`, `getSubstructMatches` | +| Hardware tuning (batch size, GPU IDs) | `nvmolkit.types` | `HardwareOptions(...)` passed to ETKDG / MMFF / UFF | +| Optional autotuning of `HardwareOptions` | `nvmolkit.autotune` | `tune_embed_molecules`, `tune_mmff_optimize`, `tune_uff_optimize`, `tune_batched_forcefield`. Requires the `optuna` package | + +## Result types and execution model + +Two return shapes carry GPU-resident output, depending on what the operation produces. + +### `AsyncGpuResult` + +Used by operations that return a single flat tensor (fingerprints, similarity matrices, RMSD/TFD vectors, Butina inputs). Key behaviors: + +- Asynchronous. The kernel may not have completed when the call returns. +- `result.torch()` returns a zero-copy `torch.Tensor` on the GPU. Caller is responsible for synchronizing before reading values on the host. +- `result.numpy()` synchronizes and returns a CPU numpy array. +- Exposes `__cuda_array_interface__`, so it can be passed directly into other nvMolKit functions (e.g. fingerprints → similarity) with no host round-trip. + +#### CUDA stream control + +A subset of the `AsyncGpuResult`-returning APIs accept an optional `stream: torch.cuda.Stream | None = None` argument so callers can submit nvMolKit work to a non-default stream and overlap it with their own kernels. When omitted, the call uses the current torch stream. + +APIs that take a `stream` argument: + +- `MorganFingerprintGenerator.GetFingerprints` +- `crossTanimotoSimilarity`, `crossCosineSimilarity`, and their `*MemoryConstrained` variants +- `butina`, `fused_butina` +- `GetConformerRMSMatrix`, `GetConformerRMSMatrixBatch` + +Other APIs (ETKDG, MMFF/UFF optimization, TFD, substructure search) are synchronous to the caller — no stream plumbing needed. + +Typical pattern: + +```python +import torch +from rdkit import Chem +from nvmolkit.fingerprints import MorganFingerprintGenerator +from nvmolkit.similarity import crossTanimotoSimilarity + +stream = torch.cuda.Stream() +fpgen = MorganFingerprintGenerator(radius=2, fpSize=1024) +mols = [Chem.MolFromSmiles(smi) for smi in ["CCO", "c1ccccc1", "CC(=O)O"]] + +with torch.cuda.stream(stream): + fps = fpgen.GetFingerprints(mols, stream=stream) + sim = crossTanimotoSimilarity(fps, stream=stream) +stream.synchronize() +print(sim.torch()) +``` + +### `Device3DResult` + +Used by ETKDG embedding and MMFF/UFF optimization (one-shot and `BatchedForcefield`) when called with `output=CoordinateOutput.DEVICE`. The GPU-resident equivalent of writing conformers back to `Mol` objects. Fields: + +- `values`: `AsyncGpuResult` of shape `(total_atoms, 3)` float64. Concatenated conformer coordinates in CSR-style layout. +- `atom_starts`, `mol_indices`, `conf_indices`: `AsyncGpuResult` int32 buffers describing the layout (`values[atom_starts[i]:atom_starts[i+1]]` is conformer `i`'s atoms). +- `energies`, `converged`: `AsyncGpuResult` buffers populated only for MMFF/UFF minimization (not for plain ETKDG). +- `gpu_id`: device the buffers live on. The `targetGpu` argument on each API picks this; `targetGpu=-1` uses the default consolidation device. +- `.per_molecule()` returns nested `list[list[torch.Tensor]]` of per-conformer views; `.dense(pad_value=nan)` materializes a padded `(n_mols, max_confs, max_atoms, 3)` tensor. + +The default mode (`CoordinateOutput.RDKIT_CONFORMERS`) still writes optimized coordinates back into each `Mol` and returns Python lists of energies/convergence flags. Reach for `CoordinateOutput.DEVICE` when chaining downstream GPU work (e.g. ETKDG → MMFF → similarity scoring) without host round-trips. + +## Configuration + +Two configuration objects expose the GPU/CPU knobs. + +### `HardwareOptions` (ETKDG, MMFF, UFF) + +`from nvmolkit.types import HardwareOptions`. Passed via `hardwareOptions=` to `EmbedMolecules`, `MMFFOptimizeMoleculesConfs`, `UFFOptimizeMoleculesConfs`, and the `BatchedForcefield` constructors. Every field has an "auto" sentinel; the defaults are usually fine. + +| Field | Type | Default | Meaning | +|---|---|---|---| +| `preprocessingThreads` | int | `-1` (all visible CPUs) | CPU threads for preprocessing | +| `batchSize` | int | `-1` (auto-tuned) | Number of conformers per GPU batch | +| `batchesPerGpu` | int | `-1` (auto) | Concurrent batches per GPU; must be `>0` or `-1` | +| `gpuIds` | `list[int]` | `[]` (all visible GPUs) | Specific device ordinals to target | + +Passing a `gpuIds` entry for a device that isn't visible raises `RuntimeError: invalid device ordinal`. For finding good values automatically across a representative sample, see `nvmolkit.autotune` (requires the `optuna` extra); each `tune_*` function returns a `TuneResult` whose `best_config` is a fully-populated `HardwareOptions` ready to pass back into the real call. + +`HardwareOptions` round-trips through `to_dict()` / `from_dict()` for persisting tuned configs to disk. + +### `SubstructSearchConfig` (substructure search) + +`from nvmolkit.substructure import SubstructSearchConfig`. Passed via `config=` to `hasSubstructMatch`, `countSubstructMatches`, and `getSubstructMatches`. + +| Field | Type | Default | Meaning | +|---|---|---|---| +| `batchSize` | int | `1024` | (target, query) pairs per GPU batch | +| `workerThreads` | int | `-1` (auto) | GPU runner threads per GPU | +| `preprocessingThreads` | int | `-1` (auto) | CPU threads for preprocessing | +| `maxMatches` | int | `0` (unlimited) | Max matches returned per (target, query) pair | +| `uniquify` | bool | `False` | Drop duplicate matches that differ only in atom enumeration order | +| `gpuIds` | `list[int] \| None` | `None` (current device only) | Specific device ordinals to target | + +Substructure search currently does not support chirality-aware matching, enhanced stereochemistry, or other advanced RDKit `SubstructMatchParameters` options. + +## Recipes + +### Morgan fingerprints + bulk Tanimoto similarity + +```python +import torch +from rdkit import Chem +from nvmolkit.fingerprints import MorganFingerprintGenerator +from nvmolkit.similarity import crossTanimotoSimilarity + +smiles = ["CCO", "CCN", "c1ccccc1", "CC(=O)O", "CCOCC"] +mols = [Chem.MolFromSmiles(smi) for smi in smiles] + +fpgen = MorganFingerprintGenerator(radius=2, fpSize=1024) +fps = fpgen.GetFingerprints(mols) + +sim = crossTanimotoSimilarity(fps) +torch.cuda.synchronize() +print(sim.torch()) +``` + +Inputs are `list[Mol]`. Output of `GetFingerprints` is an `AsyncGpuResult` wrapping an `(n_mols, fpSize / 32)` int32 tensor of packed bits. Pass it straight into `crossTanimotoSimilarity` for an `(n, n)` similarity matrix; pass two fingerprint sets for an `(n, m)` cross-matrix. For sets too large to materialize on the GPU, use `crossTanimotoSimilarityMemoryConstrained` (chunked compute, returns numpy on CPU). + +### ETKDG conformer embedding + +```python +from rdkit.Chem import AddHs, MolFromSmiles +from rdkit.Chem.rdDistGeom import ETKDGv3 +from nvmolkit.embedMolecules import EmbedMolecules + +mols = [AddHs(MolFromSmiles(smi)) for smi in ["C1CCCCC1", "C1CCCCC2CCCCC12", "COO"]] +params = ETKDGv3() +params.useRandomCoords = True + +EmbedMolecules(mols, params, confsPerMolecule=10, maxIterations=-1) + +for mol in mols: + print(mol.GetNumConformers()) +``` + +Inputs are `list[Mol]`, sanitized and with hydrogens added (`AddHs`). Conformers are added in-place. `params.useRandomCoords` must be `True` - nvMolKit's ETKDG only supports random-coord initialization. A handful of niche `EmbedParameters` options are not supported (bounds matrices, custom CPCI, coord maps, separate-fragment embedding); the Features section of the docs site lists the full restrictions. + +### MMFF94 minimization of a batch of conformers + +```python +from rdkit.Chem import AddHs, MolFromSmiles +from rdkit.Chem.rdDistGeom import ETKDGv3 +from nvmolkit.embedMolecules import EmbedMolecules +from nvmolkit.mmffOptimization import MMFFOptimizeMoleculesConfs + +mols = [AddHs(MolFromSmiles(smi)) for smi in ["CCO", "CCN", "c1ccccc1"]] +params = ETKDGv3(); params.useRandomCoords = True +EmbedMolecules(mols, params, confsPerMolecule=5) + +energies = MMFFOptimizeMoleculesConfs(mols, maxIters=500) +for mol, mol_energies in zip(mols, energies): + print(mol.GetNumConformers(), mol_energies) +``` + +Inputs are `list[Mol]` with conformers already populated (typically by ETKDG, RDKit's `EmbedMultipleConfs`, or a prior nvMolKit call). Coordinates are updated in place; the return is `list[list[float]]` of optimized energies aligned with the input molecule order and conformer index. UFF is identical in shape: swap in `from nvmolkit.uffOptimization import UFFOptimizeMoleculesConfs`. + +If any input molecule is `None` or lacks MMFF/UFF atom types, the call raises `ValueError`. The exception's `args[1]` is a dict with keys `"none"` and `"no_params"` listing the offending indices - useful for filtering a noisy input set. + +### Custom forcefield options + constraints (`BatchedForcefield`) + +Reach for `MMFFBatchedForcefield` / `UFFBatchedForcefield` instead of the one-shot `MMFFOptimizeMoleculesConfs` / `UFFOptimizeMoleculesConfs` when you need any of: + +- Custom `maxIters` / `forceTol` per call +- Per-molecule `nonBondedThreshold` (MMFF) or `vdwThreshold` (UFF), or per-molecule `ignoreInterfragInteractions` +- Per-molecule `MMFFMolProperties` objects (e.g. MMFF94s vs MMFF94) +- Distance, position, angle, or torsion constraints +- Standalone `compute_energy()` / `compute_gradients()` without minimization + +```python +from rdkit.Chem import AddHs, MolFromSmiles +from rdkit.Chem.rdDistGeom import EmbedMultipleConfs +from nvmolkit.batchedForcefield import MMFFBatchedForcefield + +mols = [AddHs(MolFromSmiles(smi)) for smi in ["CCO", "CCCCCC"]] +for mol in mols: + EmbedMultipleConfs(mol, numConfs=5) + +ff = MMFFBatchedForcefield( + mols, + nonBondedThreshold=[100.0, 20.0], + ignoreInterfragInteractions=True, +) + +ff[0].add_position_constraint(0, max_displ=0.1, force_constant=50.0) +ff[1].add_distance_constraint(0, 4, relative=False, min_len=1.8, max_len=2.2, force_constant=25.0) + +energies, converged = ff.minimize(maxIters=500, forceTol=1e-4) +for mol, mol_energies, mol_converged in zip(mols, energies, converged): + print(mol.GetNumConformers(), mol_energies, mol_converged) +``` + +All conformers of each input molecule are minimized in one batch. Constraints attached via `ff[i].add_*_constraint(...)` apply to every conformer of molecule `i`; constraint setters mark the wrapper dirty and the native forcefield rebuilds on the next call. Pass `output=CoordinateOutput.DEVICE` to `.minimize(...)` to keep optimized coordinates on the GPU (`Device3DResult`) instead of writing them back into RDKit conformers. UFF is the same shape: `UFFBatchedForcefield(mols, vdwThreshold=..., ...)`. + +## Going deeper + +- Full feature list, API reference, and guides: +- What changed in each release: +- Worked examples (Jupyter notebooks): the `examples/` directory in the GitHub repo diff --git a/docs/agent_skill.rst b/docs/agent_skill.rst new file mode 100644 index 0000000..9ed015b --- /dev/null +++ b/docs/agent_skill.rst @@ -0,0 +1,33 @@ +.. SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + +Cursor / agent skill +==================== + +If you use `Cursor `_ (or another agent that supports the +``SKILL.md`` format) to write code that calls nvMolKit, the repository ships an +agent skill you can copy into your own project. The skill teaches the agent the +public Python API surface with runnable recipes for common workflows. + +Get the skill +------------- + +The skill lives at +`agent-skills/nvmolkit-usage/ `_ +in the nvMolKit GitHub repo. + +Copy that directory into one of: + +* your project's ``.cursor/skills/`` to make it available in that project, or +* ``~/.cursor/skills/`` to make it available in every project on your machine. + +The skill is loaded by the agent when it sees a request that matches its +trigger terms (``nvmolkit``, ``Morgan fingerprint``, ``ETKDG``, ``MMFF``, +``UFF``, ``Tanimoto``, ``RDKit GPU``, etc.). It does not need any further +setup. + +Scope +----- + +The skill is aimed at developers calling the installed nvMolKit Python API. +Building nvMolKit from source is out of scope; for that, see the +:ref:`Installation ` section of the overview page. diff --git a/docs/index.rst b/docs/index.rst index 74301d3..32f1a0a 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -65,6 +65,8 @@ Source Code ----------- nvMolKit is open source under the Apache License, and is available on `GitHub `_. +.. _installation: + Installation ------------ @@ -211,6 +213,7 @@ Guides similarity forcefield autotune + agent_skill API Reference diff --git a/nvmolkit/tests/test_skill.py b/nvmolkit/tests/test_skill.py new file mode 100644 index 0000000..4d4ac75 --- /dev/null +++ b/nvmolkit/tests/test_skill.py @@ -0,0 +1,62 @@ +# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Verify every Python code block in the nvmolkit-usage agent skill executes against the installed nvMolKit.""" + +import re +import subprocess +import sys +from pathlib import Path + +import pytest + +REPO_ROOT = Path(__file__).resolve().parents[2] +SKILL_PATH = REPO_ROOT / "agent-skills" / "nvmolkit-usage" / "SKILL.md" + +_PY_BLOCK_RE = re.compile(r"```python\n(.*?)```", re.DOTALL) + + +def _extract_python_blocks(skill_path: Path) -> list[str]: + if not skill_path.is_file(): + raise FileNotFoundError( + f"nvmolkit-usage agent skill not found at {skill_path}. " + f"test_skill expects the skill to live at " + f"agent-skills/nvmolkit-usage/SKILL.md relative to the repo root; " + f"update SKILL_PATH if the skill has moved." + ) + text = skill_path.read_text() + return [match.group(1) for match in _PY_BLOCK_RE.finditer(text)] + + +@pytest.mark.parametrize( + "snippet_idx, snippet", + list(enumerate(_extract_python_blocks(SKILL_PATH))), +) +def test_skill_snippet_runs(snippet_idx: int, snippet: str, tmp_path: Path) -> None: + script_path = tmp_path / f"snippet_{snippet_idx}.py" + script_path.write_text(snippet) + result = subprocess.run( + [sys.executable, str(script_path)], + cwd=tmp_path, + capture_output=True, + text=True, + timeout=300, + ) + assert result.returncode == 0, ( + f"Skill snippet {snippet_idx} failed:\n" + f"--- snippet ---\n{snippet}\n" + f"--- stdout ---\n{result.stdout}\n" + f"--- stderr ---\n{result.stderr}\n" + )