RNA training omits O2' atom — predicted 2'-OH positions are non-physical #4

@Buddha7771

Summary

rnapro/data/rna_dataset_allatom.py:75-103 defines ATOM_NAMES with 26 RNA atoms, but the docstring on line 140 says "Supports 27 standard RNA atoms". The missing atom is O2' (the ribose 2'-OH oxygen). Because this list is used to extract ground-truth coordinates from the training CSV (lines 401-417), the model receives no supervision for O2'. As a result, predicted O2' positions in inference output are non-physical and far from their parent ribose.

This affects the all-atom geometric quality of every RNAPro RNA prediction.

Visualization

The image below shows the R1107 (PDB ID: 7QR3) prediction: backbone as gray sticks (cartoon overlay), all O2' atoms as red spheres. Many O2' atoms float far from any ribose:

Image

Root cause

rnapro/data/rna_dataset_allatom.py:75-103:

ATOM_NAMES = [
    "P", "OP1", "OP2", "O5'", "O3'",
    "C1'", "C2'", "C3'", "C4'", "O4'", "C5'",   # ← O2' missing here
    "N1", "C2", "O2", "N3", "C4", "N4", "C5", "C6", "O4",
    "N9", "N7", "C8", "N6", "N2", "O6"
]
# len(ATOM_NAMES) == 26 but the docstring on line 140 says "Supports 27 standard RNA atoms"
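
One possible fix, sketched under assumptions: add O2' to the list and guard the length with an assertion so the list and the docstring cannot drift apart again. The name ATOM_NAMES_FIXED, the constant EXPECTED_N_ATOMS, and the position of O2' (right after C2') are illustrative choices, not taken from the repository. If existing checkpoints or CSV column order depend on atom indices, the correct position may be different.

```python
# Hypothetical corrected list. O2' is placed after C2' purely for illustration;
# the real position must match whatever ordering the rest of the code expects.
ATOM_NAMES_FIXED = [
    "P", "OP1", "OP2", "O5'", "O3'",
    "C1'", "C2'", "O2'", "C3'", "C4'", "O4'", "C5'",
    "N1", "C2", "O2", "N3", "C4", "N4", "C5", "C6", "O4",
    "N9", "N7", "C8", "N6", "N2", "O6",
]

# The count stated in the docstring ("Supports 27 standard RNA atoms").
EXPECTED_N_ATOMS = 27

# Fail at import time if the list and the documented count disagree.
assert len(ATOM_NAMES_FIXED) == EXPECTED_N_ATOMS
# Guard against accidental duplicates (note that "O2" and "O2'" are distinct atoms).
assert len(set(ATOM_NAMES_FIXED)) == len(ATOM_NAMES_FIXED)
assert "O2'" in ATOM_NAMES_FIXED
```

A check like this would have caught the 26 vs 27 mismatch as soon as the module was imported.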

This list is used to construct the per-residue GT coordinate tensor (lines 401-417):

coord_cols = [f"{atom_name}_x_1" for atom_name in ATOM_NAMES] + ...
coords_matrix = group[coord_cols].values.astype(np.float32)
coords_reshaped = coords_matrix.reshape(n_residues, n_atoms, 3)   # n_atoms = 26

Atoms not in ATOM_NAMES are never loaded as GT, so the loss never penalises wrong O2' positions.
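
The effect can be reproduced on a toy table. The sketch below is not the repository's loader: the helper load_coords, the two-residue DataFrame, the `_y_1`/`_z_1` column suffixes, and the interleaved x/y/z column order are all assumptions made for illustration. It only shows that a coordinate column present in the CSV is silently ignored when its atom is absent from the name list.

```python
import numpy as np
import pandas as pd

# Toy subset of the current list (O2' absent) and of the atoms in the toy CSV.
ATOM_NAMES = ["C1'", "C2'"]
ALL_ATOMS = ["C1'", "C2'", "O2'"]

# Build a two-residue table; column naming beyond "_x_1" is assumed.
n_residues = 2
data = {}
for i, atom in enumerate(ALL_ATOMS):
    for j, axis in enumerate("xyz"):
        data[f"{atom}_{axis}_1"] = np.arange(n_residues, dtype=np.float32) + 10 * i + j
group = pd.DataFrame(data)


def load_coords(group, atom_names):
    """Select x/y/z columns for the given atoms and reshape to (residues, atoms, 3)."""
    cols = [f"{a}_{axis}_1" for a in atom_names for axis in "xyz"]
    matrix = group[cols].values.astype(np.float32)
    return matrix.reshape(len(group), len(atom_names), 3)


coords_current = load_coords(group, ATOM_NAMES)  # O2' columns are never read
coords_fixed = load_coords(group, ALL_ATOMS)     # O2' becomes part of the GT tensor

print(coords_current.shape)  # (2, 2, 3)
print(coords_fixed.shape)    # (2, 3, 3)
```

No error is raised in the first case, which is why the omission is easy to miss: the O2' columns exist in the table but never reach the loss.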

Impact

  • All-atom lDDT, all-atom RMSD, and any downstream task sensitive to 2'-OH positioning (ribozyme catalysis modelling, ligand docking, sugar pucker analysis) are unreliable.
  • Visual inspection of any prediction shows clearly unphysical O2' positions.
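
Beyond visual inspection, the problem can be quantified with a simple bond-length check. This is a minimal sketch, assuming the C2' and O2' coordinates of each residue have already been extracted from a prediction into arrays. The function flag_bad_o2prime, the reference length of roughly 1.41 Å for the C2'-O2' bond, and the 0.3 Å tolerance are illustrative assumptions, not values from RNAPro.

```python
import numpy as np

IDEAL_C2_O2_BOND = 1.41  # Å, approximate C2'-O2' bond length (assumed reference value)
TOLERANCE = 0.3          # Å, arbitrary illustrative threshold


def flag_bad_o2prime(c2_xyz, o2_xyz, ideal=IDEAL_C2_O2_BOND, tol=TOLERANCE):
    """Return per-residue C2'-O2' distances and a mask of implausible ones."""
    c2 = np.asarray(c2_xyz, dtype=float)
    o2 = np.asarray(o2_xyz, dtype=float)
    dist = np.linalg.norm(o2 - c2, axis=-1)
    return dist, np.abs(dist - ideal) > tol


# Two made-up residues: one with a plausible O2', one with O2' 8 Å away.
c2 = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]]
o2 = [[1.41, 0.0, 0.0], [5.0, 8.0, 0.0]]
dist, bad = flag_bad_o2prime(c2, o2)
print(dist)
print(bad)
```

The fraction of flagged residues gives a single number that can be compared before and after a retrain with O2' supervision.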
