Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

None type Issue When Running mk_prepare_ligand.py #268

Closed
hezhizhang-269 opened this issue Dec 8, 2024 · 8 comments
Closed

None type Issue When Running mk_prepare_ligand.py #268

hezhizhang-269 opened this issue Dec 8, 2024 · 8 comments

Comments

@hezhizhang-269
Copy link

Hello,

I encountered an error when running mk_prepare_ligand.py to process an .sdf file. While attempting to generate a .pdbqt file, several atoms in the molecule were reported as having None type, preventing the script from completing the .pdbqt file generation.

This is the largest ligand I have ever processed using mk_prepare_ligand.py. I am wondering if the issue might be related to the ligand's size exceeding the maximum number of atoms that Meeko can handle. Could this limitation be causing the atoms to be flagged as None during processing?

The input .sdf file is attached here 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH_h_added.sdf.txt.

The command I used to run the script and the outputs were as follows:

caomq@cscigpu06:~/1u6p$ mk_prepare_ligand.py -i 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH_h_added.sdf -o 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH_h_added.pdbqt
/home/caomq/.local/lib/python3.8/site-packages/scipy/fft/__init__.py:97: DeprecationWarning: The module numpy.dual is deprecated.  Instead of using dual, use the functions directly from numpy or scipy.
  from numpy.dual import register_func
/home/caomq/.local/lib/python3.8/site-packages/scipy/special/orthogonal.py:81: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  from numpy import (exp, inf, pi, sqrt, floor, sin, cos, around, int,
/home/caomq/.local/lib/python3.8/site-packages/pandas/core/computation/expressions.py:20: UserWarning: Pandas requires version '2.7.3' or newer of 'numexpr' (version '2.7.1' currently installed).
  from pandas.core.computation.check import NUMEXPR_INSTALLED
atom number 2994 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 2995 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 2997 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 2999 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3021 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3022 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3023 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3024 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3025 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3026 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3028 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3030 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3054 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3055 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3056 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3057 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3058 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3059 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3060 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3063 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3089 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3090 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3091 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3093 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3095 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3096 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3097 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3123 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3124 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3125 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3127 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3129 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3130 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3131 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3157 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3158 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3159 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3161 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3163 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3164 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3165 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3188 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3189 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3190 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3191 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3192 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3193 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3194 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3195 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3219 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3220 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3221 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3222 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3223 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3224 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3225 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH
atom number 3226 has None type, mol name: 1u6p_aptamer_Model_1_with_heterogeneous_atoms_removedH

I would greatly appreciate any guidance or recommendations you could provide.

@diogomart
Copy link
Contributor

Atom types are assigned with SMARTS patters and RDKit's default maximum of substructure matches is 1000. Your molecule is very large at over 3000 atoms and some did not get typed. Fixed in #270.

The molecule has two fragments: newer versions of meeko will raise an error for that (you should update to 0.6.1).

I'm curious: why are treating that molecule as a ligand instead of as a receptor?

@hezhizhang-269
Copy link
Author

Hi @diogomart,

Thanks for the information and the fix!

I’m preparing a set of aptamers, which are usually the smaller ones, so I should definitely reconsider treating them as the ligand.

Regarding the multiple fragments, how should I handle that? Should I just keep one of them in the SDF? Also, how does the current version of Meeko handle this? Does it silently prepare only the first or largest fragment?

@rwxayheee
Copy link
Contributor

rwxayheee commented Dec 9, 2024

Hi @hezhizhang-269
Not an expert in aptamer modeling, but this input structure appears to be two fragments because they are two molecules. Based on the original structure of PDB 1U6P, there is a missing residue named AP7, which is an alternate protonation form of Adenosine and might be considered a nonstandard residue in the modeling program you have used to generate the structure. You can very easily find the gap after alignment with the original PDB structure.

Meeko doesn't silently prepare only the first fragment. It raises an error like this if the input ligand contains two fragments:

Traceback (most recent call last):
  File "/Users/amyhe/micromamba/envs/has_espaloma/bin/mk_prepare_ligand.py", line 33, in <module>
    sys.exit(load_entry_point('meeko', 'console_scripts', 'mk_prepare_ligand.py')())
  File "/Users/amyhe/Desktop/0_forks/Meeko/meeko/cli/mk_prepare_ligand.py", line 641, in main
    molsetups = preparator.prepare(mol, rename_atoms=args.rename_atoms)
  File "/Users/amyhe/Desktop/0_forks/Meeko/meeko/preparation.py", line 509, in prepare
    setup = setup_class.from_mol(
  File "/Users/amyhe/Desktop/0_forks/Meeko/meeko/molsetup.py", line 1611, in from_mol
    raise ValueError(f"RDKit molecule has {len(Chem.GetMolFrags(mol))} fragments. Must have 1.")
ValueError: RDKit molecule has 2 fragments. Must have 1.

This is because the current flexibility model for ligand allows only one ROOT per ligand.

@hezhizhang-269
Copy link
Author

Hi @rwxayheee,

Thank you so much for pointing this out! It looks like I carelessly removed the AP7 residue while attempting to remove the ZN HETATM. I appreciate you catching that oversight on my part.

Also, I’m sorry for any confusion I caused earlier. Could you clarify what Meeko 0.5.0 specifically does in this context? I'm trying to understand its role better.

Thanks again for your help!

@rwxayheee
Copy link
Contributor

rwxayheee commented Dec 10, 2024

Hi @hezhizhang-269

Could you clarify what Meeko 0.5.0 specifically does in this context? I'm trying to understand its role better.

I'm not sure if I fully understand the question, so please feel free to comment if this doesn't address your question:

  • Role of Meeko
    You have used Meeko for ligand preparation for docking with AutoDock programs. This process typically includes processing of structures (nonpolar hydrogens are merged, and glue atoms could be added for macrocycles, etc.), and determination of ad4 atom types and flexibility model of the processed structure. During the process, Meeko reads the input structure and checks for integrity to make sure the output PDBQT file has the correct information.

  • Issues encountered
    The original issue you encountered (possibly with an older version) was that part of Meeko uses RDKit's substructure matches function during atom typing. It has a default upper limit of how many matches the search of substructure could go through. This limit is rarely exceeded in small organic molecules, but since your input system is large, after hitting the upper limit the atom typing was incomplete and that gave the atom type is None error.

  • Version difference and recommendations
    There shouldn't be a major difference between v0.5.0 and v0.6.1 with the output files generated - with both version, the ligand cannot have multiple fragments.
    I was also watching openbabel and based on your comment (Issue with Molecule Partitioning During PDB to PDBQT Conversion in Open Babel openbabel/openbabel#2736 (comment)), it seems like you wanted to perform rigid (ligand) docking. If that's the case, you could use also just use mk_prepare_receptor.py instead. Just make sure to adjust the format (add the ROOT/ENDROOT keywords). If the "ligand" was prepared with mk_prepare_receptor.py, it could possibly have multiple fragments ^^

Let us know if you need any further assistance!

@hezhizhang-269
Copy link
Author

Hi @rwxayheee,

Thank you so much for your detailed explanation! I’ve been reflecting on past circumstances where I observed my PDBQT files being significantly smaller than the original PDB or SDF files. At the time, I thought this might have been caused by multiple fragments in the input file for Meeko. This led me to wonder if Meeko 0.5.0 might silently prepare only one fragment from the input file.

I also really appreciate you addressing my question about Open Babel. My goal is to perform docking as efficiently and accurately as possible. While flexible docking using mk_prepare_ligand.py achieves the best performance (even though I’ve made some mistakes in certain cases as you can tell), it can take too long for larger PDB complexes.

Could you elaborate a bit more on how to perform rigid docking using Meeko? I’ve noticed that the "flexible" ligand PDBQT files I generate with Meeko usually include a pair of ROOT and ENDROOT encapsulating the first few atoms in the file, along with multiple BRANCH and ENDBRANCH entries. How should I modify these files to make the ligand rigid?

Thank you again for your insights and support!

@rwxayheee
Copy link
Contributor

Hi @hezhizhang-269
I just looked at the commit history. Based on this I feel that you might be right that the previous version didn't check for number of fragments!

I think mk_prepare_receptor.py could be used, in your case, to prepare a rigid ligand. Especially if by any chance you wish to have multiple fragments (having some gaps in the aptamers), the only possible way right now is to put both fragments into the same ROOT. A rigid ligand has no flexible BRANCHES, which means all atoms will be in ROOT, and the atom section can come from a receptor PDBQT file, a rigid macromolecule with Q (charge) and T (atom type) but no flexibility model.

Here's how I would prepare the RNA part of 1U6P into a rigid ligand PDBQT:

  1. Separate the RNA from the original structure from Protein Data Bank:
    1u6p_rna.pdb.txt

  2. Run mk_prepare_receptor.py, which will add hydrogens to the RNA, including AP7

Command: mk_prepare_receptor.py -i 1u6p_rna.pdb -o 1u6p_rna -p
Output:
1u6p_rna.pdbqt.txt

  1. Manipulate the PDBQT file - simply add the keywords ROOT, ENDROOT and TORSDOF 0 (and optionally some REMARK at the beginning). See my edits in:
    1u6p_rna.pdbqt.txt
    This kind of ligand PDBQT file is dockable (if the docking is not killed by other limit, like atom count or memory), although lacking the REMARK section for Smiles and index mapping information.

@hezhizhang-269
Copy link
Author

Hi @rwxayheee,

Thank you for the clarification and detailed explanation! I really appreciate your insights, and I’ll definitely try your suggested approach with mk_prepare_receptor.py after upgrading.

I’ll go ahead and close the issue now. Thanks again for your support!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants