Evaluation scripts should get the locations of maps or input pdb files used from the results metadata, not from a config file or command line arguments #213

@marcuscollins

Right now, all of our evaluation scripts require two things: a path to an input data directory, which must contain the maps and input CIF files used for ensemble generation, and a configuration file which, among other things, gives the relative paths within that directory to the specific map and input files for each protein. Rows of this configuration file look like
23,5MHX,chain A and resi 158-167,5MHX_single_001_density_input.cif,5MHX_uniform_1.00A.ccp4,processed/5MHX,1.0
(In this example there is only one atom selection string; usually there are several semicolon-separated selections.) This line defines a ProteinConfig object (https://github.com/diff-use/sampleworks/blob/main/src/sampleworks/eval/grid_search_eval_utils.py#L224). Assuming the input data directory is "/data/inputs", the evaluation scripts look for the input CIF file at /data/inputs/5MHX_single_001_density_input.cif and for the input maps used for guidance in /data/inputs/processed/5MHX/ (an additional pattern is used to locate the exact map; see https://github.com/diff-use/sampleworks/blob/main/scripts/eval/rscc_grid_search_script.py#L98).
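
To make the current behaviour concrete, here is a minimal sketch of how a row like the one above turns into reconstructed paths. `ConfigRow`, `parse_row`, and `reconstructed_paths` are hypothetical names for illustration and are not the actual `ProteinConfig` API. The field names are guesses based on the example row, and the meaning of the first and last columns is not stated in this issue.

```python
import csv
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Tuple


@dataclass(frozen=True)
class ConfigRow:
    """Hypothetical stand-in for ProteinConfig; field names are assumptions."""

    first_field: str  # meaning not stated in the issue
    protein: str
    selections: Tuple[str, ...]
    input_cif: str
    map_name: str
    map_subdir: str
    last_field: str  # meaning not stated in the issue


def parse_row(line: str) -> ConfigRow:
    """Split one comma-separated config row into its seven fields."""
    fields = next(csv.reader([line]))
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    first, protein, selections, input_cif, map_name, map_subdir, last = fields
    return ConfigRow(
        first_field=first,
        protein=protein,
        # Multiple atom selections are semicolon-separated.
        selections=tuple(s.strip() for s in selections.split(";") if s.strip()),
        input_cif=input_cif,
        map_name=map_name,
        map_subdir=map_subdir,
        last_field=last,
    )


def reconstructed_paths(input_dir: str, row: ConfigRow):
    """Rebuild the input CIF path and the map directory from the config row."""
    base = PurePosixPath(input_dir)
    return base / row.input_cif, base / row.map_subdir
```

Every path here is derived from the config file plus a command-line directory, so nothing ties it to the files a given trial actually used.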

Rather than constructing these paths after the fact, we should obtain them directly from the ensemble generation trial metadata. This data is stored in a job_metadata.json file in each output directory and will soon be incorporated into our output CIF files (#209). The evaluation scripts should read the required paths from those locations instead of reconstructing them ad hoc.
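
A rough sketch of what the proposed lookup could look like. The file name job_metadata.json comes from this issue, but the metadata schema is not described here, so the key names `structure_path` and `density_map_path` are placeholders and `load_input_paths` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Dict, Iterable


def load_input_paths(
    output_dir,
    keys: Iterable[str] = ("structure_path", "density_map_path"),  # assumed key names
) -> Dict[str, Path]:
    """Read the input file locations recorded for one trial.

    Fails loudly if a key is absent, so the caller never silently falls
    back to a reconstructed path.
    """
    metadata_file = Path(output_dir) / "job_metadata.json"
    with metadata_file.open() as fh:
        metadata = json.load(fh)

    keys = tuple(keys)
    missing = [key for key in keys if key not in metadata]
    if missing:
        raise KeyError(f"{metadata_file} is missing keys: {missing}")
    return {key: Path(metadata[key]) for key in keys}
```

Raising on a missing key, rather than falling back to the config file, keeps an evaluation from quietly scoring against the wrong map.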

Note that this depends on the paths in the metadata being valid paths on the working filesystem. Since our jobs are usually run inside Docker containers, the paths stored in the metadata today are ephemeral container paths, not the final locations of the files, which depend on which external volumes are mounted into the container. See #210.
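
One possible way to bridge that gap, sketched under the assumption that the container-to-host mount mapping is known when the evaluation runs. This is only an illustration, not the approach decided in #210, and `to_host_path` and the `mounts` mapping are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Mapping


def to_host_path(container_path: str, mounts: Mapping[str, str]) -> PurePosixPath:
    """Translate a container path to a host path using a mount mapping.

    `mounts` maps container mount points to host directories, for example
    {"/data/inputs": "/mnt/shared/inputs"}.
    """
    path = PurePosixPath(container_path)
    # Try the longest container prefix first so nested mounts resolve correctly.
    for container_root in sorted(mounts, key=len, reverse=True):
        try:
            relative = path.relative_to(container_root)
        except ValueError:
            continue  # path is not under this mount point
        return PurePosixPath(mounts[container_root]) / relative
    raise ValueError(f"{container_path} is not under any known mount point")
```

Storing host paths in the metadata at generation time would make this translation unnecessary.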

Labels: CIF issues (all issues related to the writing, reading, or parsing of CIF files or objects); engineering (task that is best suited to software engineers, not research scientists)