Osprey uses a custom database to cache the energies of conformations, so that the energy of a given conformation never has to be computed more than once. This database is known in the code as the confdb. For background, the database class is here. Several algorithms use the confdb, including K*, as can be seen here. In the K* algorithm the confdb typically uses three files: complex.confdb, protein.confdb, and ligand.confdb. You may have noticed these files being created in the directory in which you run K* in osprey.
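
To make the caching idea concrete, here is a minimal sketch in Python. It is not osprey's confdb: the class name `EnergyCache`, the tuple representation of a conformation, and the in-memory dictionary are all assumptions made for illustration, whereas the real confdb is file-backed and implemented in Java.

```python
from typing import Callable, Dict, Tuple

# Hypothetical representation: one rotamer index per residue position.
Conformation = Tuple[int, ...]


class EnergyCache:
    """Toy in-memory stand-in for the confdb (the real one is stored in files)."""

    def __init__(self, compute_energy: Callable[[Conformation], float]) -> None:
        self._compute_energy = compute_energy
        self._energies: Dict[Conformation, float] = {}
        self.computations = 0  # how many real energy calculations were run

    def energy(self, conf: Conformation) -> float:
        # Only run the expensive calculation the first time a conformation is seen.
        if conf not in self._energies:
            self.computations += 1
            self._energies[conf] = self._compute_energy(conf)
        return self._energies[conf]
```

Asking for the energy of the same conformation twice runs `compute_energy` only once; the second call is answered from the cache.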

These files stick around after the K* computation is done. This has some benefits: it allows resuming a K* run that crashed after it had already computed the energies of many conformations. On the other hand, if you are not resuming a crashed run and have changed the conformation space (the input structure, forcefield, rotamers, choice of mutable and flexible residues, etc.), having K* use the confdb from a previous run is invalid and will cause the algorithm to crash or return incorrect results. Reusing invalid confdbs has caused subtle problems in the past, so a bit over a year ago we added code that, upon creating or opening a confdb, deletes the confdb file if it already exists. The logic that does this can be seen here.
If you look in the KStar.java file, you will see that the confdb is opened in many places, and each time it is opened the sequences and conformations in the previously computed confdb are deleted. This wouldn't be a problem on its own, but the primary way many osprey designers print out low-energy conformational ensembles and statistics is with a SequenceAnalyzer. Here is the Java implementation, and here and here are two examples of the SequenceAnalyzer being used in a Python design file. The SequenceAnalyzer extracts coordinates and energies from the confdb. Because the confdb is currently deleted and recreated after the computation of each K* score in the KStar class, the SequenceAnalyzer cannot find conformations for any sequence except the final one.
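
The effect of deleting on every open can be reproduced with a toy model. The sketch below is Python, not osprey code: the function names, the one-line-per-sequence text file, and the example sequence labels are hypothetical, and the real confdb stores conformations and energies rather than sequence names.

```python
import os
import tempfile


def open_confdb_deleting(path):
    """Mimics the behavior described above: delete any existing file, then open."""
    if os.path.exists(path):
        os.remove(path)
    return open(path, "a")


def open_confdb_keeping(path):
    """Opens the file without deleting it, so earlier entries survive."""
    return open(path, "a")


def run_design(path, sequences, open_confdb):
    """Opens the toy confdb once per sequence and records that sequence."""
    for sequence in sequences:
        with open_confdb(path) as db:
            db.write(sequence + "\n")


def stored_sequences(path):
    """Returns the sequences that are still present in the toy confdb."""
    with open(path) as db:
        return [line.strip() for line in db]


def demo():
    sequences = ["wild-type", "A5G", "A5L"]  # made-up sequence labels
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "complex.confdb")
        run_design(path, sequences, open_confdb_deleting)
        lost = stored_sequences(path)
        os.remove(path)
        run_design(path, sequences, open_confdb_keeping)
        kept = stored_sequences(path)
    return lost, kept
```

In this toy model `demo()` returns `(["A5L"], ["wild-type", "A5G", "A5L"])`: with delete-on-open only the last sequence survives, which matches what the SequenceAnalyzer runs into.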
We would like to keep deleting existing confdbs automatically at the start of the K* computation, but we also want the confdbs to retain all of the sequences and conformations they store throughout the entire K* prediction process. To do so, we should make the following changes:
When `kstar.run()` is called, if the existing `resume` field is `false`, delete the three files. If the `resume` field is `true`, do not delete the files (assume that the user is trying to resume a crashed run). Importantly, remove the code that deletes the confdb in KStar's `openConfDB()` method.
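
The intended control flow can be sketched as follows. The actual change belongs in the Java KStar class; this Python version only illustrates the logic. The helper name `prepare_confdbs`, its `directory` parameter, and the assumption that the three files always carry the typical names above are hypothetical.

```python
import os

# Assumed file names, taken from the typical K* setup described above.
CONFDB_FILES = ("complex.confdb", "protein.confdb", "ligand.confdb")


def prepare_confdbs(directory, resume):
    """Deletes stale confdb files once, before the run starts, unless resuming.

    Returns the list of paths that were deleted.
    """
    deleted = []
    if resume:
        # The user is resuming a crashed run: keep everything computed so far.
        return deleted
    for name in CONFDB_FILES:
        path = os.path.join(directory, name)
        if os.path.exists(path):
            os.remove(path)
            deleted.append(path)
    return deleted
```

Calling a helper like this once at the top of the run, and never deleting when a confdb is opened later, keeps the protection against stale confdbs while letting every sequence's conformations accumulate in the same files.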
To test that your changes work as intended, first run the kstar.py script in the examples directory. Then make your code changes, recompile, and run the same script again. The number of sequences and low-energy molecular ensembles printed out should increase, in many cases from 0 to the number you specify.
When your changes are complete and tested, push them to a branch in this repository, open a pull request into the main branch, and let one of the osprey developers know. After reviewing your change, we will merge it into the main branch.
(An introduction to contributing to osprey can be read here.)