fingerprints differ between batched and single molecule generation #195

@vfscalfani

Hello,

When trying to learn how to use nvMolKit, I noticed differences in similarity values when comparing RDKit-generated fingerprints with nvMolKit-generated fingerprints using the same fingerprint settings.

I then tried generating fingerprints with nvMolKit in a single batch and then one molecule at a time. The two approaches produce different fingerprints for some molecules, even with the same molecules, the same generator settings, and num_threads=1. Notably, the number of threads did not seem to make a difference.

I would have expected the fingerprints to be identical regardless of whether they are generated in a batch or one at a time.

RDKit (2026.03.1) and nvMolKit (0.5.0) were installed via conda-forge. A minimal example is below. Thank you for providing this library and for any clarification you can offer.

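import torch
from rdkit import Chem
from nvmolkit.fingerprints import MorganFingerprintGenerator  # import path assumed

# load the first 1,000 parseable molecules
mols = []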
suppl = Chem.SDMolSupplier('chembl_36.sdf')
for mol in suppl:
    if mol is None:
        continue
    mols.append(mol)
    if len(mols) >= 1_000:
        break

# create nvmolkit fp generator
fpgen = MorganFingerprintGenerator(radius=3, fpSize=2048)

# batched
fps_batched = fpgen.GetFingerprints(mols, num_threads=1)
torch.cuda.synchronize()

# single
fps_single = [fpgen.GetFingerprints([mol], num_threads=1) for mol in mols]
torch.cuda.synchronize()

# convert to tensors
batched_tensor = fps_batched.torch()
single_tensors = [fp.torch() for fp in fps_single]
    
matches = [
    torch.equal(batched_tensor[i].cpu(), single_tensors[i][0].cpu())
    for i in range(len(mols))
]

print(f"Number of molecules: {len(mols)}")
print(f"Mismatched fingerprints: {matches.count(False)} / {len(mols)}")

Output:

Number of molecules: 1000
Mismatched fingerprints: 101 / 1000
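
To narrow this down further, it may help to count how many bits differ in each mismatched fingerprint instead of only checking for equality. The following is a minimal sketch, not part of nvMolKit: `hamming_distance` and `summarize_mismatches` are hypothetical helper names, and the code assumes each fingerprint row is a sequence of 32-bit integer words (bits packed into ints), which may not match the actual tensor layout.

```python
from typing import Dict, Sequence

MASK32 = 0xFFFFFFFF  # assumption: fingerprints are packed into 32-bit words


def hamming_distance(row_a: Sequence[int], row_b: Sequence[int]) -> int:
    """Count differing bits between two fingerprints stored as packed words."""
    if len(row_a) != len(row_b):
        raise ValueError("fingerprints must have the same number of words")
    # Mask each XOR so signed (negative) words are treated as 32-bit patterns.
    return sum(bin((a ^ b) & MASK32).count("1") for a, b in zip(row_a, row_b))


def summarize_mismatches(
    batched: Sequence[Sequence[int]],
    singles: Sequence[Sequence[int]],
) -> Dict[int, int]:
    """Map molecule index -> number of differing bits, for rows that differ."""
    report = {}
    for i, (row_b, row_s) in enumerate(zip(batched, singles)):
        dist = hamming_distance(row_b, row_s)
        if dist:
            report[i] = dist
    return report
```

With the variables from the example above, this could be called as `summarize_mismatches(batched_tensor.cpu().tolist(), [t[0].cpu().tolist() for t in single_tensors])`. A handful of differing bits per molecule would point in a different direction than rows that differ almost entirely.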
