Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RuntimeError: Only H allowed to be in SMILES but not in PDBQT #305

Closed
kaiweiii opened this issue Jan 8, 2025 · 7 comments
Closed

RuntimeError: Only H allowed to be in SMILES but not in PDBQT #305

kaiweiii opened this issue Jan 8, 2025 · 7 comments

Comments

@kaiweiii
Copy link

kaiweiii commented Jan 8, 2025

I'm trying to export the poses from autodock vina to sdf for the further analysis in ProLIF.

I first read the pdbqt file using
pdbqt_mol = PDBQTMolecule.from_file("377-38-8_1erea_wh_out.pdbqt", skip_typing=True)
It worked and the variable pdbqt_mol showed
<Molecule named 377-38-8_1erea_wh_out containing 20 poses of 12 atoms>

I did not prepare the ligand using Meeko so I added the smiles using
pdbqt_mol._pose_data['smiles'] = {0: 'O=C(O)C(F)(F)C(F)(F)C(=O)O'}

When I ran
RDKitMolCreate.from_pdbqt_mol(pdbqt_mol)
Error came up as
`RuntimeError Traceback (most recent call last)
Cell In[42], line 1
----> 1 RDKitMolCreate.from_pdbqt_mol(pdbqt_mol)

File c:\Users\kaiwe\AppData\Local\Programs\Python\Python312\Lib\venv\meekoenv\Lib\site-packages\meeko\rdkit_mol_create.py:232, in RDKitMolCreate.from_pdbqt_mol(cls, pdbqt_mol, only_cluster_leads, keep_flexres)
230 pdbqt_mol._current_pose = i
231 coordinates = pdbqt_mol.positions(atom_idx)
--> 232 mol = cls.add_pose_to_mol(mol, coordinates, index_map)
233 coordinates_all_poses.append(coordinates)
235 # add Hs only after all poses are added as conformers
236 # because Chem.AddHs() will affect all conformers at once

File c:\Users\kaiwe\AppData\Local\Programs\Python\Python312\Lib\venv\meekoenv\Lib\site-packages\meeko\rdkit_mol_create.py:330, in RDKitMolCreate.add_pose_to_mol(cls, mol, ligand_coordinates, index_map)
328 atom = mol.GetAtomWithIdx(i)
329 if atom.GetAtomicNum() != 1:
--> 330 raise RuntimeError("Only H allowed to be in SMILES but not in PDBQT")
331 neigh = atom.GetNeighbors()
332 if len(neigh) != 1:

RuntimeError: Only H allowed to be in SMILES but not in PDBQT`

I don't understand this error because this molecule does not have any H. Below is the input pdbqt file of the chemical:

REMARK 3 active torsions: REMARK status: ('A' for Active; 'I' for Inactive) REMARK 1 A between atoms: C_1 and C_2 REMARK 2 A between atoms: C_2 and C_3 REMARK 3 A between atoms: C_3 and C_4 ROOT ATOM 1 C UNL d 1 -0.898 -0.151 -0.662 0.00 0.00 0.415 C ATOM 2 F UNL d 1 -1.332 -0.867 -1.721 0.00 0.00 -0.186 F ATOM 3 F UNL d 1 -1.397 1.090 -0.856 0.00 0.00 -0.186 F ENDROOT BRANCH 1 4 ATOM 4 C UNL d 1 -1.448 -0.767 0.625 0.00 0.00 0.382 C ATOM 5 O UNL d 1 -1.863 0.098 1.517 0.00 0.00 -0.476 OA ATOM 6 O UNL d 1 -1.552 -1.964 0.805 0.00 0.00 -0.244 OA ENDBRANCH 1 4 BRANCH 1 7 ATOM 7 C UNL d 1 0.661 -0.117 -0.658 0.00 0.00 0.415 C ATOM 8 F UNL d 1 1.150 -1.376 -0.622 0.00 0.00 -0.186 F ATOM 9 F UNL d 1 1.111 0.403 -1.820 0.00 0.00 -0.186 F BRANCH 7 10 ATOM 10 C UNL d 1 1.244 0.691 0.501 0.00 0.00 0.382 C ATOM 11 O UNL d 1 0.650 1.562 1.105 0.00 0.00 -0.244 OA ATOM 12 O UNL d 1 2.535 0.535 0.641 0.00 0.00 -0.476 OA ENDBRANCH 7 10 ENDBRANCH 1 7 TORSDOF 3

Is it because the way I added the smiles back to the variable pdbqt_mol does not work? If so, is there any suggestion to make this work with the ligand pdbqt not prepared by Meeko? Thank you so much!

@rwxayheee
Copy link
Contributor

rwxayheee commented Jan 8, 2025

Hi @kaiweiii
This error message means it's unable to retrieve one or more heavy atom coordinates from PDBQT file. In addition to the Smiles string, another minimum required information for pdbqt_mol._pose_data to have is ['smiles_index_map']. Normally, it's parsed from the REMARK SMILES IDX section if ligand was prepared with Meeko. See an example:

https://github.com/forlilab/Meeko/blob/release/example/tutorial1/imatinib_protomer-1.pdbqt

To assign it manually:

from meeko import PDBQTMolecule
from meeko import RDKitMolCreate
from rdkit import Chem

pdbqt_fn = "old_format.pdbqt"
pdbqt_mol = PDBQTMolecule.from_file(pdbqt_fn, skip_typing=True)

pdbqt_mol._pose_data['smiles'] = {0: 'O=C([O-])C(F)(F)C(F)(F)C(=O)[O-]'}
pdbqt_mol._pose_data['smiles_index_map'] = {0: [1,5,2,4,3,6,4,1,5,2,6,3,7,7,8,8,9,9,10,10,11,11,12,12]}

rdkit_mol = RDKitMolCreate.from_pdbqt_mol(pdbqt_mol)

sdf_fn = "from_old_format.sdf"
with Chem.SDWriter(sdf_fn) as writer:
    for mol in rdkit_mol:
        writer.write(mol)

ps. In PR #296 there's a change of this error message. I made it more generic because there are a few different reasons for failing to set atom coordinates from pdbqt mol, but at that point it's hard to tell what's the cause

@kaiweiii
Copy link
Author

kaiweiii commented Jan 8, 2025

Hi @rwxayheee Thank you so much for the prompt response.
Just one more question, can you explain or give me directions for creating the 'smiles_index_map'? I found the function remark_index_map in write.py. Is this function to generate the map list directly? What are those inputs?

@rwxayheee
Copy link
Contributor

rwxayheee commented Jan 8, 2025

Hi @kaiweiii
I'm not sure if the function in write.py can do this. Here's my solution, to reconstruct the index mapping by re-using another functionality match in the meeko universe that belongs to class ResidueTemplate. This is partly how Meeko perceives polymer (protein, nucleic acids) residues from PDB file and to ensure a reliable conversion to an RDKit molecule per residue:

Please see #305 (comment) for a full example that includes polar Hs.

Let me know what you think!

@rwxayheee
Copy link
Contributor

rwxayheee commented Jan 8, 2025

Hi, Just another note -
If your PDBQT has specific positions of polar hydrogens, their positions need to be updated in a separate step (add_hydrogens) that depends on pdbqt_mol._pose_data['smiles_h_parent']. There should be a way to reconstruct it, but I forgot about it since this ligand molecule happens to have no hydrogens.

Can you please let us know where you obtained the PDBQT files, if from an open database? Do you have many of these old-format PDBQT files that need to be processed into rdkit molecules? If so, I can provide a better procedure to handle this.

@rwxayheee
Copy link
Contributor

rwxayheee commented Jan 9, 2025

Here, I added the correction of the element column and the generation of h_parent mapping. The example input:
imatinib_protomer-1_old_format.pdbqt.txt

from rdkit import Chem
from meeko import ResidueTemplate

# read pdbqt file
pdbqt_fn = "imatinib_protomer-1_old_format.pdbqt"
with open(pdbqt_fn, "r") as f:
    pdbqt_lines = f.readlines()

# replace atom type A by C
col_id = 77
pdbqt_block = ""
for line in pdbqt_lines: 
    if len(line) > col_id and line[col_id] == "A":
        # Replace A by C
        line = line[:col_id] + "C" + line[col_id + 1:]
    pdbqt_block += line

# parse as PDB
mol_from_pdb = Chem.MolFromPDBBlock(pdbqt_block, removeHs = False)

# create ResidueTemplate
lig_smi = "Cc1ccc(NC(=O)c2ccc(C[N@H+]3CC[N@@H+](C)CC3)cc2)cc1Nc1nccc(-c2cccnc2)n1"
lig_rt = ResidueTemplate(lig_smi)

# recompute smiles index mapping
mapping = lig_rt.match(input_mol = mol_from_pdb)
zero_based = mapping[1]
index_map = [val for k, v in zero_based.items() for val in (k + 1, v + 1)]
print(index_map)

# recompute h parent mapping
h_parent = []
zero_based_inv = {v: k for k,v in zero_based.items()}
for atom in mol_from_pdb.GetAtoms():
    if atom.GetAtomicNum()==1:
        parent_atom = atom.GetNeighbors()[0]
        parent_index_in_smi = zero_based_inv[parent_atom.GetIdx()]
        h_parent.extend([parent_index_in_smi+1, atom.GetIdx()+1])
print(h_parent)

# create PDBQTMolecule
from meeko import PDBQTMolecule
from meeko import RDKitMolCreate
pdbqt_mol = PDBQTMolecule.from_file(pdbqt_fn, skip_typing=True)

# put in attributes
pdbqt_mol._pose_data['smiles'] = {0: lig_smi}
pdbqt_mol._pose_data['smiles_index_map'] = {0: index_map}
pdbqt_mol._pose_data['smiles_h_parent'] = {0: h_parent}

# create rdkit mol
rdkit_mol = RDKitMolCreate.from_pdbqt_mol(pdbqt_mol)

# output as SDF for inspection
sdf_fn = "mine_from_pdb.sdf"
with Chem.SDWriter(sdf_fn) as writer:
    for mol in rdkit_mol:
        writer.write(mol)

@kaiweiii Could you help us understand the scenarios where this function would be useful? We can add it as a utility function for backward support, but would like to learn more about you prepared the PDBQT file or source of the PDBQT files without REMARKS

@kaiweiii
Copy link
Author

kaiweiii commented Jan 9, 2025

@rwxayheee Thank you soooooo much for everything! Your method works and after reading the code carefully, I can understand what the index map means now. (It is the atom index+1 in the mol (from smiles), followed by the index+1 of the same atom in pdb(qt), then the next atom index in the mol, ...) :)

I did not know about Meeko when I started to use docking as a beginner last July. My methods might be silly. I usually have two ways to prepare the PDBQT from smiles:

  1. I first used obabel to convert smiles to mol2. In obabel, it allows me to choose the pH and to minimize the 3d structure with a selected force field (though the deprotonation does not always work). Then I converted the mol2 to PDBQT using the prepare_ligand4.py in AutoDock. I used this Python script because it has an option to keep the hydrogen atoms. We want to keep the hydrogen to somehow be consistent with the previous work.
  2. the second method is to convert the sdf file downloaded from PubChem to mol2, then to PDBQT. This is used when obabel generates a structure with super wrong atom connection or bond type.

I hope that can help! All of my result files are without REMARK so your suggestion here is super helpful. I guess this can also be useful when dealing with pdb file without the conect section. I think this is a very good way to convert PDBQT to sdf for analysis with any packages.

@kaiweiii kaiweiii closed this as completed Jan 9, 2025
@rwxayheee
Copy link
Contributor

Hi @kaiweiii
Thank you for the useful feedback. Happy to learn that it helps with handling PDBQT files without the remark section.

I first used obabel to convert smiles to mol2. In obabel, it allows me to choose the pH and to minimize the 3d structure with a selected force field (though the deprotonation does not always work). Then I converted the mol2 to PDBQT using the prepare_ligand4.py in AutoDock. I used this Python script because it has an option to keep the hydrogen atoms. We want to keep the hydrogen to somehow be consistent with the previous work.

We have another project, https://github.com/forlilab/scrubber, that tries to find reasonable protonation states prior to ligand file conversion. Currently, it works in a similar manner to obabel but we're developing better methods. If you're interested in ligand preparation with Meeko, Scrubber has a good interface to work seamlessly with Meeko. Check out some basic examples here we prepared, to start docking with Scrubber & Meeko:

https://meeko.readthedocs.io/en/release/colab_examples.html

When ligands are prepared with Meeko, they are converted to RDKit molecules first then port to a PDBQT file with Smiles and the mapping information, to prevent information loss. Later, this information will be re-used to re-construct the RDKit molecule. To me, writing this information in advance feels (slightly?) more efficient than reconstructing it by matching (max common substructure match).

Thank you again for bringing this to our attention. Your issue is helpful for people who have performed docking with PDBQT files without the remark section.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants