Add residue index awareness to SerialAtomLabeller #40

@timbernat

Description

The current default implementation of SerialAtomLabeller assigns the "atom name" field (as defined in the PDB specification) for each atom as its element symbol followed by a zero-padded integer taken from a running tally kept for each element. This guarantees unique atom names (up to the 4 characters the PDB format permits for the name) and is adequate for temporary structure builds of polymers before exporting to a more robust format, e.g. SDF.
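The scheme described above can be sketched roughly as follows. This is an illustration only, not the actual SerialAtomLabeller code: the class name `SerialLabellerSketch`, the `atom_label_width` parameter, and the choice to start each tally at 0 are all assumptions made for the example.

```python
from collections import defaultdict


class SerialLabellerSketch:
    """Hypothetical stand-in for the per-element serial labelling idea."""

    def __init__(self, atom_label_width: int = 4) -> None:
        # The PDB atom name field is 4 characters wide
        self.atom_label_width = atom_label_width
        # One running tally per element symbol (assumed here to start at 0)
        self._counts = defaultdict(int)

    def get_atom_label(self, elem_symbol: str) -> str:
        idx = self._counts[elem_symbol]
        self._counts[elem_symbol] += 1
        # Pad the index so that symbol + digits fills the name field;
        # once the tally outgrows the padding, the label exceeds the field width
        num_width = max(self.atom_label_width - len(elem_symbol), 1)
        return f"{elem_symbol}{idx:0{num_width}d}"


labeller = SerialLabellerSketch()
labels = [labeller.get_atom_label(sym) for sym in ("C", "C", "H", "Cl")]
print(labels)
```

Because the tally is global per element, two chemically equivalent atoms in different repeat units receive different names under this scheme.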

However, it has been requested that PDB atom labelling support consistent atom names across residues (when that division of atoms makes sense), as is common for proteins. This issue mainly serves to keep track of that request should it be implemented in the future. In particular, this refactor would require:

  • Changes to how utilities (for example, polymers.building.mbconvert) call SerialAtomLabeller
  • Some indicator of the residue to be passed to SerialAtomLabeller.get_atom_label(); currently, this only expects the element symbol as input
  • A standardized notion, format, and location for storing "residue" info on molecule-related objects of many types (e.g. RDKit Mols, mbuild Compounds, OpenFF Molecules, etc.), plus fallbacks in case residue info cannot be found in the decided-upon format. This is the main reason the change is tricky and is not planned for the short term
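As a rough sketch of what the second requirement could look like, the labeller might keep one tally per (residue, element) pair instead of one per element. Everything here is hypothetical: the class name `ResidueAwareLabellerSketch`, the `residue_key` argument, and the 0-based tally are assumptions for illustration, not the existing API. How the residue key would be obtained from an RDKit Mol, mbuild Compound, or OpenFF Molecule is exactly the open question in the third point and is not addressed.

```python
from collections import defaultdict
from typing import Dict, Hashable, Optional, Tuple


class ResidueAwareLabellerSketch:
    """Hypothetical sketch; not the existing SerialAtomLabeller API."""

    def __init__(self, atom_label_width: int = 4) -> None:
        self.atom_label_width = atom_label_width
        # Tallies are keyed by (residue key, element symbol)
        self._counts: Dict[Tuple[Optional[Hashable], str], int] = defaultdict(int)

    def get_atom_label(self, elem_symbol: str, residue_key: Optional[Hashable] = None) -> str:
        # residue_key=None is the fallback when no residue info is available:
        # all such atoms share one tally per element, as in the serial scheme
        tally_key = (residue_key, elem_symbol)
        idx = self._counts[tally_key]
        self._counts[tally_key] += 1
        num_width = max(self.atom_label_width - len(elem_symbol), 1)
        return f"{elem_symbol}{idx:0{num_width}d}"


# (element symbol, residue index) pairs for two identical 3-atom residues
atoms = [("C", 1), ("C", 1), ("O", 1), ("C", 2), ("C", 2), ("O", 2)]

labeller = ResidueAwareLabellerSketch()
labels = [labeller.get_atom_label(sym, residue_key=res_idx) for sym, res_idx in atoms]

fallback = ResidueAwareLabellerSketch()
fallback_labels = [fallback.get_atom_label(sym) for sym, _ in atoms]

print(labels)
print(fallback_labels)
```

With this keying, names repeat across residues only if atoms are visited in the same order within each residue, so a real implementation would likely also need a consistent atom ordering per residue type.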

Labels

  • enhancement: New feature or request
  • help wanted: Extra attention is needed
  • planned-feature: Changes or additions which are planned by the developer(s)
  • priority:medium: Moderate urgency, issue has major impacts on some but not most parts of the codebase
