Evaluation scripts should get the locations of maps or input pdb files used from the results metadata, not from a config file or command line arguments #213

Right now, all of our evaluation scripts require two things: a path to an input data directory, which must contain the maps and input CIF files used for ensemble generation, and a configuration file. Among other things, the configuration file gives, for each protein, the paths (relative to the input data directory) of its map and input files. Rows of this configuration file look like
`23,5MHX,chain A and resi 158-167,5MHX_single_001_density_input.cif,5MHX_uniform_1.00A.ccp4,processed/5MHX,1.0` 
(this example has only one atom selection string; usually there are several, separated by semicolons). This line defines a `ProteinConfig` object (https://github.com/diff-use/sampleworks/blob/main/src/sampleworks/eval/grid_search_eval_utils.py#L224). Assuming the input data directory is `/data/inputs`, the evaluation scripts look for the input CIF file at `/data/inputs/5MHX_single_001_density_input.cif` and for the input maps used for guidance in `/data/inputs/processed/5MHX/` (an additional pattern is used to locate the exact map, see https://github.com/diff-use/sampleworks/blob/main/scripts/eval/rscc_grid_search_script.py#L98).
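
To make the current behavior concrete, here is a minimal sketch of the path reconstruction described above. It is not the actual `ProteinConfig` code: the `ProteinRow` class, its field names, and the column order are assumptions inferred from the example row, and the additional map-matching pattern is left out.

```python
import csv
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ProteinRow:
    """Hypothetical stand-in for ProteinConfig; field names are guesses."""

    protein: str
    selections: list[str]
    input_cif: str
    map_subdir: str


def parse_row(line: str) -> ProteinRow:
    # Column order is inferred from the example row in this issue and
    # may not match the real configuration file exactly.
    fields = next(csv.reader([line.strip()]))
    _row_id, protein, selections, input_cif, _map_name, map_subdir, _resolution = fields
    return ProteinRow(
        protein=protein,
        # Several atom selections may be packed into one semicolon-separated field.
        selections=[s.strip() for s in selections.split(";") if s.strip()],
        input_cif=input_cif,
        map_subdir=map_subdir,
    )


def resolve_paths(row: ProteinRow, input_dir: Path) -> tuple[Path, Path]:
    # Both paths are rebuilt from the config row, independently of what
    # the generation job actually used.
    return input_dir / row.input_cif, input_dir / row.map_subdir
```

The weakness is visible in `resolve_paths`: nothing ties the returned paths to the files the trial really read, so a stale config row or a different input directory silently yields the wrong inputs.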

Rather than reconstructing these paths ad hoc after the fact, we should read them directly from the ensemble generation trial metadata. This data is stored in a `job_metadata.json` file in each output directory, and will soon be incorporated directly into our output CIF files (https://github.com/diff-use/sampleworks/pull/209). The evaluation scripts should extract the required paths from those locations and use them as-is.
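
As a rough illustration of the proposal, the sketch below reads the input paths from a trial's `job_metadata.json`. The key names `input_structure` and `density_map` are placeholders invented for this example; the real schema should be checked against the metadata our jobs actually write.

```python
import json
from pathlib import Path

# Hypothetical key names; the real schema of job_metadata.json may differ.
CIF_KEY = "input_structure"
MAP_KEY = "density_map"


def load_input_paths(
    trial_dir: Path, cif_key: str = CIF_KEY, map_key: str = MAP_KEY
) -> dict[str, Path]:
    """Return the input CIF and map paths recorded for one trial."""
    metadata_file = Path(trial_dir) / "job_metadata.json"
    with metadata_file.open() as fh:
        metadata = json.load(fh)

    # Fail loudly rather than falling back to a reconstructed path.
    missing = [key for key in (cif_key, map_key) if key not in metadata]
    if missing:
        raise KeyError(f"{metadata_file} is missing keys: {missing}")

    return {"cif": Path(metadata[cif_key]), "map": Path(metadata[map_key])}
```

An evaluation script would then call `load_input_paths(trial_dir)` for each output directory instead of taking an input directory and a configuration file on the command line.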

Note that this depends on the paths in the metadata being actual paths on the working filesystem. Since our jobs are usually run inside Docker containers, the paths stored in the metadata today are ephemeral container paths, not the final locations of files, which depend on what external volumes are mounted to the container. See https://github.com/diff-use/sampleworks/issues/210
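
One possible stopgap, sketched here under the assumption that the container-to-host mount mapping is known at evaluation time, is to translate the recorded container paths back to host paths. The function name and the example mount points are hypothetical, and this is not necessarily the approach that issue 210 will settle on.

```python
from pathlib import PurePosixPath


def remap_container_path(path: str, mounts: dict[str, str]) -> PurePosixPath:
    """Translate a container path to a host path.

    `mounts` maps container mount points to host directories, mirroring
    the volume mounts passed to Docker (assumed to be known to the caller).
    """
    container_path = PurePosixPath(path)
    # Try the longest container prefix first so nested mounts resolve correctly.
    for container_root in sorted(mounts, key=len, reverse=True):
        try:
            relative = container_path.relative_to(container_root)
        except ValueError:
            continue  # path is not under this mount point
        return PurePosixPath(mounts[container_root]) / relative
    raise ValueError(f"{path} is not under any known mount point")
```

Recording host paths (or paths relative to a known root) in the metadata at generation time would make this translation unnecessary.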
