Create CLIs for full OpenFold pipeline by nipun-khanna · Pull Request #67 · AI2Science/vizfold-foundation

nipun-khanna · 2026-04-23T20:13:55Z

Summary

Adds new CLI scripts covering the full OpenFold pipeline (alignment, inference, visualization, and data/feature generation)
Adds detailed documentation for the CLIs (to be extended as new CLIs are added)

Details

Created a new directory under the project root, cli/, for organizing CLI scripts
- For reviewers: if you think the cli/ directory is unnecessary or the scripts belong elsewhere, please comment
Introduces three new CLI scripts that let users run every stage of the OpenFold pipeline: precomputing alignments, running inference, and generating visualizations
The CLIs are HPC-compatible, and the inference CLI supports a dry run to verify commands before consuming GPU hours
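As an illustration of the dry-run idea, here is a minimal sketch and not the PR's actual implementation. The helper name run_stage, the dry_run parameter, and the example command (script name and flags) are all hypothetical placeholders.

```python
import shlex
import subprocess
from typing import List, Optional


def run_stage(cmd: List[str], dry_run: bool = False) -> Optional[int]:
    """Print the command; execute it only when dry_run is False."""
    # shlex.join quotes each argument, so the printed line can be pasted into a shell.
    print("[dry-run]" if dry_run else "[run]", shlex.join(cmd))
    if dry_run:
        return None  # nothing was executed
    return subprocess.run(cmd, check=True).returncode


# Hypothetical inference command; the script name and flags are placeholders.
cmd = ["python", "run_inference.py", "--fasta", "my seqs.fasta", "--device", "cuda:0"]
result = run_stage(cmd, dry_run=True)
```

Printing the exact command before launching makes it possible to review paths and flags in a login shell before submitting the job to a GPU queue.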
Added vizfold_cli_precompute_align.py as a standardized CLI entry point for alignment precomputation
- Added validate_args(), which checks all inputs before any computation starts and reports every issue at once.
- Added a run manifest, written to the output directory, that records all arguments used.
- Added capture of intermediate outputs, so that state is saved and it is clear where the pipeline failed.
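The validation and manifest behavior described above could look roughly like the sketch below. Only the name validate_args() comes from the PR; its signature, the argument names (fasta_dir, jackhmmer_binary, cpus), the helper write_manifest, and the file name run_manifest.json are assumptions made for illustration.

```python
import argparse
import json
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import List


def validate_args(args: argparse.Namespace) -> List[str]:
    """Return every problem found, instead of stopping at the first one."""
    errors = []
    if not Path(args.fasta_dir).is_dir():
        errors.append(f"FASTA directory not found: {args.fasta_dir}")
    if shutil.which(args.jackhmmer_binary) is None:
        errors.append(f"jackhmmer binary not found: {args.jackhmmer_binary}")
    if args.cpus < 1:
        errors.append(f"--cpus must be >= 1, got {args.cpus}")
    return errors


def write_manifest(args: argparse.Namespace, output_dir: str) -> Path:
    """Record all arguments plus a UTC timestamp as JSON in the output directory."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    manifest = {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "arguments": vars(args),
    }
    path = out / "run_manifest.json"  # hypothetical file name
    path.write_text(json.dumps(manifest, indent=2, default=str))
    return path


# Deliberately bad inputs, so that several problems are reported together.
args = argparse.Namespace(
    fasta_dir="/no/such/dir", jackhmmer_binary="no-such-jackhmmer", cpus=0
)
problems = validate_args(args)
for problem in problems:
    print("error:", problem)

manifest_path = write_manifest(args, tempfile.mkdtemp())
```

Collecting errors into a list means a user fixes all bad paths in one pass rather than resubmitting a job once per mistake.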
Introduces CLI to run the full data pipeline (MSA + template search + feature generation)
- Supports both monomer and multimer pipelines via --multimer flag
- Integration with:
  - JackHMMer for MSA generation
  - HHSearch (monomer) / HMMSearch (multimer) for template search
  - AlphaFold/OpenFold DataPipeline and DataPipelineMultimer
- Outputs a serialized feature_dict.pickle for downstream inference
- Strong input validation for required databases and binaries
- Clear logging for each pipeline stage
Added new CLI tool to cluster protein sequences using mmseqs2 with PDB‑style parameters
- Produces standardized cluster files for downstream VizFold workflows
- Integration with:
  - mmseqs2 easy-cluster pipeline
  - PDB‑style identity thresholds and coverage settings
- Outputs a reformatted text file where each line lists all {PDB_ID}_{CHAIN_ID} entries in a cluster
- Includes strong input validation for FASTA paths, mmseqs2 binary, and output directories
- Clear logging for each clustering stage
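The reformatting step can be sketched as follows. mmseqs2 easy-cluster writes a two-column, tab-separated cluster file (representative, member); the sketch groups members by representative and emits one line per cluster. The function names and the space separator between entries are assumptions, not the PR's confirmed output format.

```python
from typing import Dict, Iterable, List


def group_clusters(tsv_lines: Iterable[str]) -> List[List[str]]:
    """Group 'representative<TAB>member' rows into one member list per cluster."""
    clusters: Dict[str, List[str]] = {}
    for line in tsv_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        representative, member = line.split("\t")
        clusters.setdefault(representative, []).append(member)
    # dicts preserve insertion order, so clusters appear in first-seen order
    return list(clusters.values())


def format_cluster_file(clusters: List[List[str]]) -> str:
    """One cluster per line; entries separated by a space (assumed separator)."""
    return "".join(" ".join(members) + "\n" for members in clusters)


# Toy rows using {PDB_ID}_{CHAIN_ID} identifiers.
tsv = ["1abc_A\t1abc_A", "1abc_A\t2xyz_B", "3def_A\t3def_A"]
clusters = group_clusters(tsv)
text = format_cluster_file(clusters)
print(text, end="")
```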
Added new CLI tool to generate FASTA files from alignment directories or alignment‑DB index files
- Supports both directory‑based alignments and compressed alignment‑DB formats
- Integration with:
  - Mgnify, UniRef90, and BFD/Uniclust alignment file formats
  - Multi‑threaded extraction for large alignment sets
- Outputs a consolidated FASTA containing one entry per chain
- Includes strong validation for alignment sources and output paths
- Clear logging for each extraction stage
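For the directory-based case, the extraction can be sketched like this. It is a simplified sketch that assumes one subdirectory per chain, A3M files only, and that the first record of each alignment is the query sequence; the function names and file names are hypothetical, and the alignment-DB and multi-threaded paths are not covered.

```python
import tempfile
from pathlib import Path
from typing import Optional


def first_sequence(a3m_text: str) -> Optional[str]:
    """Return the sequence of the first record in A3M/FASTA-style text."""
    seq_lines = []
    seen_header = False
    for line in a3m_text.splitlines():
        if line.startswith(">"):
            if seen_header:
                break  # second record reached; the query is complete
            seen_header = True
        elif seen_header and line.strip():
            seq_lines.append(line.strip())
    return "".join(seq_lines) or None


def alignments_to_fasta(alignment_dir: Path) -> str:
    """Build FASTA text with one entry per chain subdirectory."""
    entries = []
    for chain_dir in sorted(p for p in alignment_dir.iterdir() if p.is_dir()):
        for aln_file in sorted(chain_dir.glob("*.a3m")):
            seq = first_sequence(aln_file.read_text())
            if seq:
                entries.append(f">{chain_dir.name}\n{seq}\n")
                break  # one entry per chain is enough
    return "".join(entries)


# Build a tiny throwaway alignment directory to exercise the sketch.
root = Path(tempfile.mkdtemp())
(root / "1abc_A").mkdir()
(root / "1abc_A" / "bfd_uniclust_hits.a3m").write_text(">query\nMKT\nAYI\n>hit1\nMKTAYL\n")
(root / "2xyz_B").mkdir()
(root / "2xyz_B" / "mgnify_hits.a3m").write_text(">query\nGSHM\n>hit1\nGSHL\n")
fasta_text = alignments_to_fasta(root)
print(fasta_text, end="")
```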
Added extensive documentation under cli/USAGE.md

Example Usage
Full example usage can be found in the docs

Some examples:

python vizfold_cli_feature_dict.py \
    sequences.fasta templates_dir/ output_dir/ \
    --uniref90_database_path /data/uniref90.fas

python vizfold_cli_fasta_to_clusterfile.py \
    sequences.fasta clusters.txt /path/to/mmseqs \
    --seq-id 0.4

python vizfold_cli_align_fasta.py output.fasta \
    --alignment-dir alignments/

Purpose

Addresses Issue #38 (Expose CLI Interfaces for Pre-trained OpenFold Inference) by adding CLI support for a crucial project script

AlphaFold Feature Dictionary CLI

MukilSundaravadivel · 2026-04-26T04:47:27Z

Added a new CLI tool that runs the data pipeline converting protein sequences to AlphaFold feature dictionaries (MSA + template search + feature generation)
- Supports both monomer and multimer pipelines via --multimer flag
- Integration with:
  - JackHMMer for MSA generation
  - HHSearch (monomer) / HMMSearch (multimer) for template search
  - AlphaFold/OpenFold DataPipeline and DataPipelineMultimer
- Outputs a serialized feature_dict.pickle for downstream inference
- Strong input validation for required databases and binaries
- Clear logging for each pipeline stage
Usage example
python vizfold_cli_feature_dict.py \
    sequences.fasta templates_dir/ output_dir/ \
    --uniref90_database_path /data/uniref90.fas

Addresses Issue #38

Bailsnob · 2026-04-28T01:36:34Z

Added new CLI tool to cluster protein sequences using mmseqs2 with PDB‑style parameters
- Produces standardized cluster files for downstream VizFold workflows
- Integration with:
  - mmseqs2 easy-cluster pipeline
  - PDB‑style identity thresholds and coverage settings
- Outputs a reformatted text file where each line lists all {PDB_ID}_{CHAIN_ID} entries in a cluster
- Includes strong input validation for FASTA paths, mmseqs2 binary, and output directories
- Clear logging for each clustering stage
Usage example
python vizfold_cli_fasta_to_clusterfile.py \
    sequences.fasta clusters.txt /path/to/mmseqs \
    --seq-id 0.4

Added new CLI tool to generate FASTA files from alignment directories or alignment‑DB index files
- Supports both directory‑based alignments and compressed alignment‑DB formats
- Integration with:
  - Mgnify, UniRef90, and BFD/Uniclust alignment file formats
  - Multi‑threaded extraction for large alignment sets
- Outputs a consolidated FASTA containing one entry per chain
- Includes strong validation for alignment sources and output paths
- Clear logging for each extraction stage
Usage example
python vizfold_cli_align_fasta.py output.fasta \
    --alignment-dir alignments/

Addresses Issue #38

PranavNarala1 · 2026-04-29T22:02:04Z

I think this PR does a good job of making the pipeline more usable from the command line instead of relying on individual scripts being run manually. One thing I liked is that it covers multiple stages of the workflow, including alignment, inference, visualization, and feature generation, so it feels more like a complete interface rather than just one extra utility. I also thought the stronger argument validation and run-manifest idea were good additions, since those make debugging and reproducibility a lot easier for longer HPC workflows. One thing I would still suggest checking is whether the growing number of CLI scripts could create overlap or duplicated logic over time, especially since the PR already mentions removing duplicate argparsers. It might help to think about whether some shared validation or common argument handling should be centralized early so the CLI layer stays easier to maintain as more commands get added.
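One way to centralize common argument handling, sketched here as a suggestion rather than as code from this PR, is argparse's parents mechanism. The option names --output-dir and --log-level, the prog names, and the 0.4 default for --seq-id are assumptions for illustration.

```python
import argparse


def common_parser() -> argparse.ArgumentParser:
    """Options every CLI shares; add_help=False avoids a duplicate -h in children."""
    parent = argparse.ArgumentParser(add_help=False)
    parent.add_argument("--output-dir", required=True)
    parent.add_argument(
        "--log-level", default="INFO", choices=["DEBUG", "INFO", "WARNING", "ERROR"]
    )
    return parent


def build_align_parser() -> argparse.ArgumentParser:
    """Alignment CLI: shared options plus its own positional argument."""
    parser = argparse.ArgumentParser(prog="align", parents=[common_parser()])
    parser.add_argument("fasta_dir")
    return parser


def build_cluster_parser() -> argparse.ArgumentParser:
    """Clustering CLI: shared options plus a sequence-identity threshold."""
    parser = argparse.ArgumentParser(prog="cluster", parents=[common_parser()])
    parser.add_argument("--seq-id", type=float, default=0.4)
    return parser


# Explicit argument lists keep the example independent of sys.argv.
align_args = build_align_parser().parse_args(["seqs/", "--output-dir", "out/"])
cluster_args = build_cluster_parser().parse_args(
    ["--output-dir", "out/", "--seq-id", "0.3"]
)
```

With this layout, a new shared flag is added once in common_parser() and every command picks it up, which addresses the duplication concern without merging the scripts into one.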

Dhruv Reddy Kota and others added 6 commits March 7, 2026 14:53

updated cli script

49bc6ec

Add CLI scripts for alignment precomputation and visualization pipeline

b790243

Impl inference cli with full pipeline

00071fe

add generate_alphafold_feature_dict.py cli

af07e85

remove duplicate argparsers

426dede

Merge pull request #1 from bengal-tech365/feature-dict-cli

e75cdfe

AlphaFold Feature Dictionary CLI

nipun-khanna changed the title ~~Create CLI for alignment, inference, and visualization~~ Create CLI for alignment, inference, visualization, and feature gen Apr 27, 2026

nipun-khanna changed the title ~~Create CLI for alignment, inference, visualization, and feature gen~~ Create CLIs for full OpenFold pipeline Apr 27, 2026

adds CLIs for clustering and FASTA generation

54f8044

Document additional utility CLIs for dataset preparation

635b468

Added details about additional utility CLIs for FASTA clustering and alignment extraction, including usage examples and output descriptions.

nipun-khanna wants to merge 8 commits into AI2Science:main from bengal-tech365:main

4 participants
