Create CLIs for full OpenFold pipeline #67
AlphaFold Feature Dictionary CLI
Added a new CLI tool that runs the data pipeline converting protein sequences to AlphaFold feature dictionaries (MSA + template search + feature generation). Addresses Issue #38.
Added a new CLI tool to cluster protein sequences using mmseqs2 with PDB-style parameters.
Added a new CLI tool to generate FASTA files from alignment directories or alignment-DB index files. Addresses Issue #38.
Added details about the additional utility CLIs for FASTA clustering and alignment extraction, including usage examples and output descriptions.
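As a rough illustration of the FASTA-from-alignments idea, here is a minimal sketch and not the code in this PR. It assumes each subdirectory of the alignment directory is named after a chain and holds at least one `.a3m` file whose first record is the query sequence. The function names (`first_a3m_sequence`, `iter_queries`, `write_fasta`) are hypothetical, and the alignment-DB index path is not covered.

```python
from pathlib import Path
from typing import Iterator, Optional, Tuple


def first_a3m_sequence(a3m_path: Path) -> Optional[str]:
    """Return the first record of an A3M file (assumed to be the query), gaps removed."""
    chunks = []
    seen_header = False
    with a3m_path.open() as handle:
        for raw in handle:
            line = raw.strip()
            if line.startswith(">"):
                if seen_header:
                    break  # a second record begins, so the query is complete
                seen_header = True
            elif seen_header and line:
                chunks.append(line)
    sequence = "".join(chunks).replace("-", "")
    return sequence or None


def iter_queries(alignment_dir: Path) -> Iterator[Tuple[str, str]]:
    """Yield (chain name, query sequence) for each subdirectory with a usable .a3m file."""
    for chain_dir in sorted(p for p in alignment_dir.iterdir() if p.is_dir()):
        for a3m_path in sorted(chain_dir.glob("*.a3m")):
            sequence = first_a3m_sequence(a3m_path)
            if sequence:
                yield chain_dir.name, sequence
                break  # one query per chain directory is enough


def write_fasta(alignment_dir: Path, out_path: Path) -> int:
    """Write one FASTA record per chain directory and return the record count."""
    count = 0
    with out_path.open("w") as out:
        for name, sequence in iter_queries(alignment_dir):
            out.write(f">{name}\n{sequence}\n")
            count += 1
    return count
```

Directories without a readable `.a3m` file are skipped silently in this sketch; a real tool would probably want to report them.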
I think this PR does a good job of making the pipeline more usable from the command line instead of relying on individual scripts being run manually. One thing I liked is that it covers multiple stages of the workflow, including alignment, inference, visualization, and feature generation, so it feels more like a complete interface rather than just one extra utility. I also thought the stronger argument validation and run-manifest idea were good additions, since those make debugging and reproducibility a lot easier for longer HPC workflows. One thing I would still suggest checking is whether the growing number of CLI scripts could create overlap or duplicated logic over time, especially since the PR already mentions removing duplicate argparsers. It might help to think about whether some shared validation or common argument handling should be centralized early so the CLI layer stays easier to maintain as more commands get added.
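To make the centralization suggestion concrete, one possible shape is a shared parent parser that every CLI script builds on, using the `parents=` mechanism of `argparse`. This is only a sketch: the flag names (`--output-dir`, `--dry-run`) and helper names (`existing_dir`, `common_parser`, `build_parser`) are assumptions for illustration and are not taken from this PR.

```python
import argparse
from pathlib import Path


def existing_dir(value: str) -> Path:
    """Shared validator: reject paths that are not existing directories."""
    path = Path(value)
    if not path.is_dir():
        raise argparse.ArgumentTypeError(f"not a directory: {value}")
    return path


def common_parser() -> argparse.ArgumentParser:
    """Arguments that every (hypothetical) pipeline CLI would accept."""
    # add_help=False avoids a duplicate -h/--help when used as a parent.
    parent = argparse.ArgumentParser(add_help=False)
    parent.add_argument("--output-dir", type=existing_dir, required=True)
    parent.add_argument("--dry-run", action="store_true")
    return parent


def build_parser(prog: str, description: str) -> argparse.ArgumentParser:
    """Create a per-command parser that inherits the shared arguments."""
    return argparse.ArgumentParser(
        prog=prog, description=description, parents=[common_parser()]
    )
```

Each script would then call `build_parser(...)` and add only its own stage-specific arguments, so validation such as `existing_dir` lives in one place.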
Summary
Details
Created a new cli/ directory under the project root for organizing CLI scripts
Introduced three new CLI scripts that let users run all stages of the OpenFold pipeline: precomputing alignments, running inference, and generating visualizations
The CLIs are HPC-compatible, and the inference CLI supports a dry run to verify commands before consuming GPU hours
Added vizfold_cli_precompute_align.py as a standardized CLI entry point
Introduced a CLI to run the full data pipeline (MSA + template search + feature generation)
Added a new CLI tool to cluster protein sequences using mmseqs2 with PDB-style parameters
Added a new CLI tool to generate FASTA files from alignment directories or alignment-DB index files
Added extensive documentation under cli/USAGE.md
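The dry-run behaviour described in the list above, combined with a run manifest, could look roughly like the following. This is a minimal sketch under assumptions and not the implementation in this PR: the function name `run_or_preview`, the manifest fields, and the choice to write the manifest even in dry-run mode are all hypothetical.

```python
import json
import shlex
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional


def run_or_preview(cmd: List[str], manifest_path: Path, dry_run: bool) -> Optional[int]:
    """Record the command in a JSON manifest, then either print it or execute it."""
    manifest = {
        "command": cmd,
        "dry_run": dry_run,
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    manifest_path.write_text(json.dumps(manifest, indent=2))

    if dry_run:
        # Show a shell-quoted version of the command without launching anything.
        print("[dry-run] " + shlex.join(cmd))
        return None

    # check=False: hand the exit code back to the caller instead of raising.
    return subprocess.run(cmd, check=False).returncode
```

Writing the manifest before the dry-run check means a preview leaves the same record as a real run, which can help when comparing what was planned with what was executed on a cluster.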
Example Usage
Full usage examples can be found in the docs
Some examples:
Purpose