Add role-aware RDKit molecule export#59
Conversation
…it-role-aware-export
Added link to GitHub issue for better context on thiophene exports.
| return mol | ||
|
|
||
|
|
||
| PDB_CHAIN_IDS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" |
There was a problem hiding this comment.
Note: this block is added so that exported systems have valid PDB Chain identifiers. Problems arise when one creates a topology from a file that has unspecified chain information. By having this "wrap around" logic, we ensure each residue has a unique chain/residue index. One chain per "segment" only allows for 26 unique segments in the topology. This messes with downstream analysis, since there are potentially multiple matches for something like "Chain A Residue 1". The wraparound gives us 26 x 9999 ~= 260,000 unique residues in the system, where a residue is the same concept as a repeat_unit. In the future, we may consider switching to PDBx format exclusively, but this works fine for now.
| linker_refs: list[tuple[int, Primitive, ConnectorReference]] = field(default_factory=list) | ||
|
|
||
|
|
||
| class RDKitExportStrategy(ABC): |
There was a problem hiding this comment.
Note, added for completeness in case we want to export non-AA systems to RDKit. I don't know if that is needed, and this may be overkill. Happy to refactor to remove the ABC if needed.
joelaforet
left a comment
There was a problem hiding this comment.
Added comments for reviewers.
Summary
primitive_to_rdkit_mols(...), yielding one RDKitMolper SEGMENT.primitive_to_rdkit(...).Issue Coverage
Addresses #31 for the new role-aware SAAMR export path.
Issue #31 exposed a mismatch between RDKit's aromaticity model and MuPT's legacy
flattenedexport path. When a heteroaromatic repeat unit such as thiophene is read from SMILES, MuPT first builds/sanitizes an RDKit molecule inprimitive_from_smiles(). During RDKit import, MuPT stores each RDKit bond type directly on aConnectorinconnector_between_rdatoms(). For thiophene sulfur, that means twoAROMATICconnectors.The legacy
primitive_to_rdkit(...)exporter then flattens the hierarchy and callschild_prim.check_valence()before constructing the output RDKit molecule (legacy flatten/check path).check_valence()compares allowed element valence againstPrimitive.valence(implementation), andPrimitive.valenceis computed by summing numeric connector bond orders. Those numeric orders come from RDKit'sBondTypeAsDouble()table (BOND_ORDER), so two aromatic sulfur bonds are summed as1.5 + 1.5 = 3. MuPT therefore rejected a chemically valid thiophene sulfur before RDKit could receive and validate the exported graph.This PR does not remove
check_valence()or claim that valence checks are unimportant.check_valence()remains useful as a simple MuPT-side consistency check for obvious invalid atomic primitives. The problem is that this local scalar check is not expressive enough to be the final chemistry validator for RDKit export, especially for aromatic systems where RDKit's sanitization and aromaticity model determine valid valence. For the role-aware RDKit exporter, MuPT should validate the role/topology structure, preserve the original connector bond types, and then let RDKit validate the constructed chemistry.This PR avoids the failure mode for SAAMR export by bypassing the legacy flatten-and-check path. The new exporter validates and indexes the SAAMR role hierarchy with
build_saamr_role_topology_index()throughAllAtomRDKitExportStrategy, then constructs an RDKitMoldirectly from the collected atom and connector references. Bonds are added back to RDKit with their original connector bond types in_add_rdkit_bonds(), and the resulting RDKit molecule is validated with RDKit's property cache (_mol_from_rdkit_data()).Regression tests now cover thiophene, pyrrole, and pyromellitimide repeat units, plus an explicit RDKit sanitization check that exported thiophene retains aromatic bonds and a valid sulfur valence (tests).
Stack Position
This is PR 2 of 3 in the role-aware RDKit/SDF export stack.
Blocked by #58.
Resolves #31
Stack:
mupt.muptio.sdfwriter.After #58 Merges
After #58 merges into
main:issue-51/shared-saamr-topology-indexto basemain.maininto this branch, resolve, and push.After This Merges
issue-51/rdkit-role-aware-exportto basemain.Verification
python -m pytest mupt/tests/interfaces/test_shared_topology.py mupt/tests/interfaces/rdkit/test_exporters.py -q