Poses ordering in a sdf mk_exported from a dlg output. #311

Closed
xavgit opened this issue Jan 18, 2025 · 8 comments

@xavgit

xavgit commented Jan 18, 2025

Hi,
while inspecting an SDF file exported with mk_export.py from a DLG output, I noticed that the poses
are not ordered by free energy, starting from the lowest value.

In my case:
$ less ../docking_results_sdf/DB03966_docking_res_ad4_sf.sdf | grep free_energy
{"is_sidechain": [false], "free_energy": -6.22, "intermolecular_energy": -10.4, "internal_energy": -5.62, "cluster_size": 1, "cluster_id": 10, "rank_in_cluster": 1}
{"is_sidechain": [false], "free_energy": -6.67, "intermolecular_energy": -10.85, "internal_energy": -5.2, "cluster_size": 2, "cluster_id": 8, "rank_in_cluster": 1}
{"is_sidechain": [false], "free_energy": -7.76, "intermolecular_energy": -11.94, "internal_energy": -5.06, "cluster_size": 1, "cluster_id": 2, "rank_in_cluster": 1}
{"is_sidechain": [false], "free_energy": -7.26, "intermolecular_energy": -11.44, "internal_energy": -5.63, "cluster_size": 3, "cluster_id": 4, "rank_in_cluster": 2}
{"is_sidechain": [false], "free_energy": -6.98, "intermolecular_energy": -11.16, "internal_energy": -5.2, "cluster_size": 1, "cluster_id": 5, "rank_in_cluster": 1}
{"is_sidechain": [false], "free_energy": -7.55, "intermolecular_energy": -11.73, "internal_energy": -5.03, "cluster_size": 1, "cluster_id": 3, "rank_in_cluster": 1}
{"is_sidechain": [false], "free_energy": -8.03, "intermolecular_energy": -12.2, "internal_energy": -4.2, "cluster_size": 1, "cluster_id": 1, "rank_in_cluster": 1}
{"is_sidechain": [false], "free_energy": -7.27, "intermolecular_energy": -11.44, "internal_energy": -5.74, "cluster_size": 3, "cluster_id": 4, "rank_in_cluster": 1}
{"is_sidechain": [false], "free_energy": -6.94, "intermolecular_energy": -11.12, "internal_energy": -4.87, "cluster_size": 1, "cluster_id": 6, "rank_in_cluster": 1}
{"is_sidechain": [false], "free_energy": -6.71, "intermolecular_energy": -10.89, "internal_energy": -5.41, "cluster_size": 1, "cluster_id": 7, "rank_in_cluster": 1}
{"is_sidechain": [false], "free_energy": -6.35, "intermolecular_energy": -10.53, "internal_energy": -5.46, "cluster_size": 2, "cluster_id": 8, "rank_in_cluster": 2}
{"is_sidechain": [false], "free_energy": -6.03, "intermolecular_energy": -10.21, "internal_energy": -5.51, "cluster_size": 3, "cluster_id": 11, "rank_in_cluster": 1}
{"is_sidechain": [false], "free_energy": -5.81, "intermolecular_energy": -9.99, "internal_energy": -5.59, "cluster_size": 3, "cluster_id": 11, "rank_in_cluster": 3}
{"is_sidechain": [false], "free_energy": -6.02, "intermolecular_energy": -10.19, "internal_energy": -5.52, "cluster_size": 3, "cluster_id": 11, "rank_in_cluster": 2}
{"is_sidechain": [false], "free_energy": -6.42, "intermolecular_energy": -10.6, "internal_energy": -5.65, "cluster_size": 1, "cluster_id": 9, "rank_in_cluster": 1}
{"is_sidechain": [false], "free_energy": -7.12, "intermolecular_energy": -11.3, "internal_energy": -5.57, "cluster_size": 3, "cluster_id": 4, "rank_in_cluster": 3}

Would it be possible to add an option to mk_export.py so that the poses in the SDF are
ordered starting from the lowest free energy?
For example, leaving aside the sort -k4,4r, the result would look something like the following:

$ less ../docking_results_sdf/DB03966_docking_res_ad4_sf.sdf | grep free_energy | sort -k4,4r
{"is_sidechain": [false], "free_energy": -8.03, "intermolecular_energy": -12.2, "internal_energy": -4.2, "cluster_size": 1, "cluster_id": 1, "rank_in_cluster": 1}
{"is_sidechain": [false], "free_energy": -7.76, "intermolecular_energy": -11.94, "internal_energy": -5.06, "cluster_size": 1, "cluster_id": 2, "rank_in_cluster": 1}
{"is_sidechain": [false], "free_energy": -7.55, "intermolecular_energy": -11.73, "internal_energy": -5.03, "cluster_size": 1, "cluster_id": 3, "rank_in_cluster": 1}
{"is_sidechain": [false], "free_energy": -7.27, "intermolecular_energy": -11.44, "internal_energy": -5.74, "cluster_size": 3, "cluster_id": 4, "rank_in_cluster": 1}
{"is_sidechain": [false], "free_energy": -7.26, "intermolecular_energy": -11.44, "internal_energy": -5.63, "cluster_size": 3, "cluster_id": 4, "rank_in_cluster": 2}
{"is_sidechain": [false], "free_energy": -7.12, "intermolecular_energy": -11.3, "internal_energy": -5.57, "cluster_size": 3, "cluster_id": 4, "rank_in_cluster": 3}
{"is_sidechain": [false], "free_energy": -6.98, "intermolecular_energy": -11.16, "internal_energy": -5.2, "cluster_size": 1, "cluster_id": 5, "rank_in_cluster": 1}
{"is_sidechain": [false], "free_energy": -6.94, "intermolecular_energy": -11.12, "internal_energy": -4.87, "cluster_size": 1, "cluster_id": 6, "rank_in_cluster": 1}
{"is_sidechain": [false], "free_energy": -6.71, "intermolecular_energy": -10.89, "internal_energy": -5.41, "cluster_size": 1, "cluster_id": 7, "rank_in_cluster": 1}
{"is_sidechain": [false], "free_energy": -6.67, "intermolecular_energy": -10.85, "internal_energy": -5.2, "cluster_size": 2, "cluster_id": 8, "rank_in_cluster": 1}
{"is_sidechain": [false], "free_energy": -6.42, "intermolecular_energy": -10.6, "internal_energy": -5.65, "cluster_size": 1, "cluster_id": 9, "rank_in_cluster": 1}
{"is_sidechain": [false], "free_energy": -6.35, "intermolecular_energy": -10.53, "internal_energy": -5.46, "cluster_size": 2, "cluster_id": 8, "rank_in_cluster": 2}
{"is_sidechain": [false], "free_energy": -6.22, "intermolecular_energy": -10.4, "internal_energy": -5.62, "cluster_size": 1, "cluster_id": 10, "rank_in_cluster": 1}
{"is_sidechain": [false], "free_energy": -6.03, "intermolecular_energy": -10.21, "internal_energy": -5.51, "cluster_size": 3, "cluster_id": 11, "rank_in_cluster": 1}
{"is_sidechain": [false], "free_energy": -6.02, "intermolecular_energy": -10.19, "internal_energy": -5.52, "cluster_size": 3, "cluster_id": 11, "rank_in_cluster": 2}
{"is_sidechain": [false], "free_energy": -5.81, "intermolecular_energy": -9.99, "internal_energy": -5.59, "cluster_size": 3, "cluster_id": 11, "rank_in_cluster": 3}

I'm used to Vina, where the first pose is the "best".

Thanks.

Saverio

@diogomart
Contributor

Hi Saverio,

In the latest version (v0.6.1), poses are sorted by score by default. In earlier versions, you can pass the option -c or --only_cluster_leads, and that will sort them by score.

@xavgit
Author

xavgit commented Jan 20, 2025

Hi,
thanks for the suggestion.

Saverio

@xavgit
Author

xavgit commented Jan 23, 2025

Hi,
I've used mk_export.py from the develop version of Meeko 0.6.1 to avoid the NaN problem previously
described and fixed by your team.
Of the 10042 .dlg files exported to SDF with mk_export.py, 9995 are not ordered.

Here is what I did for one .dlg file to reproduce the problem:

$ cd sources/Meeko-develop/
xxxx@xxxx-MS-7E16:~/sources/Meeko-develop$ pip3 install -e .
Defaulting to user installation because normal site-packages is not writeable
Obtaining file:///home/xxxx/sources/Meeko-develop
Preparing metadata (setup.py) ... done
Installing collected packages: meeko
Attempting uninstall: meeko
Found existing installation: meeko 0.6.1
Uninstalling meeko-0.6.1:
Successfully uninstalled meeko-0.6.1
Running setup.py develop for meeko
xxxx@xxxx-MS-7E16:~/sources/Meeko-develop$ cd ~/Desktop/mk_export_problem/
xxxx@xxxx-MS-7E16:~/Desktop/mk_export_problem$ mk_export.py DB14859_docking_res_ad4_sf.dlg -s DB14859_docking_res_ad4_sf.sdf --all_dlg_poses
xxxx@xxxx-MS-7E16:~/Desktop/mk_export_problem$ less DB14859_docking_res_ad4_sf.sdf | grep free_energy | awk '{ print $4 }'
-7.97,
-7.25,
-5.68,
-8.23,
-6.24,
-5.27,
-6.6,
-8.5,
-6.77,
-5.79,
-8.13,
-8.6,
-7.96,
-9.04,
-6.07,
-7.59,

Where am I going wrong?

Thanks.

Saverio Lemme

DB14859_docking_res_ad4_sf.dlg.txt

@diogomart
Contributor

With --all_dlg_poses they won't be ordered.

By default, without passing --all_dlg_poses, mk_export.py exports the cluster leads, which are sorted by AutoDock-GPU. If you don't pass this option, poses will be sorted.
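
For readers following along, here is a small sketch of what "cluster leads" means in terms of the metadata shown earlier in this thread. It assumes a cluster lead is the pose with `rank_in_cluster` equal to 1, which is an interpretation of the exported fields rather than Meeko's actual implementation; the function name is hypothetical.

```python
import json


def cluster_leads_sorted(meeko_json_lines):
    """Keep poses with rank_in_cluster == 1 and order them by free_energy."""
    records = [json.loads(line) for line in meeko_json_lines if line.strip()]
    # Assumption: the lead of each cluster is the pose ranked first within it.
    leads = [rec for rec in records if rec["rank_in_cluster"] == 1]
    return sorted(leads, key=lambda rec: rec["free_energy"])


if __name__ == "__main__":
    # Four records abbreviated from the first listing in this thread.
    lines = [
        '{"free_energy": -7.26, "cluster_id": 4, "rank_in_cluster": 2}',
        '{"free_energy": -8.03, "cluster_id": 1, "rank_in_cluster": 1}',
        '{"free_energy": -7.27, "cluster_id": 4, "rank_in_cluster": 1}',
        '{"free_energy": -7.12, "cluster_id": 4, "rank_in_cluster": 3}',
    ]
    for rec in cluster_leads_sorted(lines):
        print(rec["cluster_id"], rec["free_energy"])
```

Under this reading, an export limited to cluster leads contains one pose per cluster, so it can hold fewer poses than the DLG file.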

@xavgit
Author

xavgit commented Jan 23, 2025

Hi,
I used --all_dlg_poses because without it the number of poses
in the resulting SDF was lower than the number present in the
DLG.
This can be shown with the following:

$ less DB14859_docking_res_ad4_sf.sdf | grep free_energy | awk '{ print $4 }' | wc -l
16
$ less DB14859_docking_res_ad4_sf_no_all_poses.sdf | grep free_energy | awk '{ print $4 }' | wc -l
10
where DB14859_docking_res_ad4_sf_no_all_poses.sdf is the file obtained without the option --all_dlg_poses.
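
The same count can be taken without relying on the `free_energy` text, since each record in an SDF file ends with a line containing only `$$$$`. A minimal sketch, with a hypothetical function name and a toy input standing in for a real file:

```python
def count_sdf_records(sdf_text):
    """Count records in SDF text; each record ends with a '$$$$' line."""
    return sum(1 for line in sdf_text.splitlines() if line.strip() == "$$$$")


if __name__ == "__main__":
    # Toy stand-in for file contents; read a real file with open(path).read().
    toy_sdf = "pose_a\n...\n$$$$\npose_b\n...\n$$$$\n"
    print(count_sdf_records(toy_sdf))
```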

Is it possible to have an SDF file that contains all the poses from the DLG file, in sorted order?

Thanks.

Saverio

@rwxayheee
Contributor

rwxayheee commented Jan 23, 2025

Hi @xavgit
Yes, the molecules in the SDF can be re-ordered by the free_energy values stored in the meeko molecule properties.

Please see a minimal Python script that uses RDKit to do this:

from rdkit import Chem
import json

input_sdf = "DB14859_docking_res_ad4_sf.sdf"
output_sdf = "DB14859_docking_res_ad4_sf_sorted.2.sdf"

def extract_free_energy(mol):
    # The 'meeko' property holds a JSON string; parse it to read "free_energy"
    meeko_data = json.loads(mol.GetProp('meeko'))
    # Poses without a free_energy value sort last
    return float(meeko_data.get('free_energy', float('inf')))

# removeHs=False keeps any explicit hydrogens present in the file;
# SDMolSupplier yields None for records it cannot parse, so skip those
supplier = Chem.SDMolSupplier(input_sdf, removeHs=False)
input_mols = [mol for mol in supplier if mol is not None]
sorted_mols = sorted(input_mols, key=extract_free_energy)

writer = Chem.SDWriter(output_sdf)
for mol in sorted_mols:
    writer.write(mol)
writer.close()
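
For cases where RDKit is not available, a text-only variant of the script above might look like the sketch below. It reorders the raw SDF records without parsing the molecules. It assumes the `meeko` property value is a single JSON line directly after its `<meeko>` header line, which matches the grep output earlier in this thread but is not guaranteed by the SDF format in general; all function names are hypothetical.

```python
import json


def split_sdf_records(sdf_text):
    """Split SDF text into records, each ending with its '$$$$' line."""
    records, current = [], []
    for line in sdf_text.splitlines():
        current.append(line)
        if line.strip() == "$$$$":
            records.append("\n".join(current) + "\n")
            current = []
    return records


def free_energy_of(record):
    """Read free_energy from the JSON line that follows the <meeko> header."""
    lines = record.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(">") and "<meeko>" in line:
            # Assumption: the whole property value sits on the next line.
            return float(json.loads(lines[i + 1])["free_energy"])
    return float("inf")  # records without the property sort last


def sort_sdf_text(sdf_text):
    """Return the SDF text with records ordered by ascending free_energy."""
    return "".join(sorted(split_sdf_records(sdf_text), key=free_energy_of))


if __name__ == "__main__":
    # Toy records; the molecule blocks are elided because only the
    # <meeko> property and the '$$$$' terminator matter for this sketch.
    toy = (
        'pose_a\n>  <meeko>\n{"free_energy": -5.0}\n\n$$$$\n'
        'pose_b\n>  <meeko>\n{"free_energy": -9.5}\n\n$$$$\n'
    )
    print(sort_sdf_text(toy))
```

Because the records are moved as plain text, coordinates and hydrogens are written back exactly as they were exported.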

@xavgit
Author

xavgit commented Jan 23, 2025

Hi,
thanks for the Python script.

In the meantime I wrote a very basic script to solve the ordering problem,
but it is not as elegant as yours.

Thanks.

Saverio

@rwxayheee
Contributor

Hi, thank you for your kind words.

There are many ways to do the sorting (the bash text-manipulation tools work too).
The meeko molecule property is stored as a JSON string, and the design intends for it to be read with json.loads.
There’s a related issue: #138

Closing this issue as resolved, but please feel free to re-open it if you have any questions, thoughts, or suggestions.
