mikhaeelatefrizk/bindsight

bindsight

Expression → Binder. The first open-source pipeline that takes RNA-seq counts and outputs ranked de novo protein binder candidates, with full provenance back to the patient cohort.

HF Space · Open in Streamlit · DOI · License: MIT · Python: 3.11+ · Tests: 175+ · CI · Workflow: Snakemake

👉 Try it live

Primary (16 GB CPU, no auto-sleep): huggingface.co/spaces/Mikhaeelatefrizk/bindsight
Mirror (1 GB CPU, may briefly wake from sleep): bindsight.streamlit.app

Zero install: it runs in your browser. Click the Demo tab and watch the full pipeline rediscover HER2 + EGFR from synthetic RNA-seq counts in ~60 seconds (the result is cached, so revisits load in ~0.1 s).

🚀 v0.1.0 — discovery half end-to-end on CPU; design + validation wired for free Colab; web UI deployed on Streamlit Cloud.

New here? What is bindsight? (5-min read) · How to use it · Use cases · Designing on Colab · Hugging Face Space backup


Three ways to try it

1. Web app — Hugging Face Space (zero install) · Streamlit mirror

Anyone visiting either URL above gets:

  • The Home page with what bindsight is
  • A Demo button that runs the full pipeline live and renders a report
  • A Run on my data page (upload counts.tsv + design.tsv → get results)
  • A Browse a run page to inspect any output directory

The Hugging Face Space is the primary deployment (16 GB CPU, no auto-sleep). The Streamlit Cloud deploy at bindsight.streamlit.app is the same app on free-tier infrastructure (1 GB CPU); it may take ~60–120 s to wake from sleep on its first visit after a quiet stretch.

2. Local web app (one command)

pip install -e ".[discover,report]"
bindsight ui
# → opens http://localhost:8501 with the same multi-page interface

3. CLI (60 seconds)

bindsight demo

Runs the full discovery half on a shipped 10-gene tumor-vs-normal cohort and produces a real HTML report you can open in a browser. The pipeline rediscovers HER2 (ERBB2) and EGFR as the top antibody-tractable surface antigens, entirely from RNA-seq counts. ~30 seconds, no internet, no GPU.

$ bindsight demo
╭───────────── Demo run ──────────────╮
│ The pipeline should rediscover      │
│ ERBB2 (HER2) and EGFR as top        │
│ antibody-tractable surface antigens.│
╰─────────────────────────────────────╯
INFO  DEGs: 10 total, 5 significant
INFO  surfaceome filter: 5 → 2
INFO  wrote runs/demo/report.html
╭───────── bindsight demo ─────────────╮
│ Demo complete!                      │
│ Report HTML: runs/demo/report.html  │
╰─────────────────────────────────────╯
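The two INFO lines above describe a funnel: differential expression, a significance cut, then a surfaceome filter. The sketch below reproduces the same 10 → 5 → 2 shape on made-up numbers. Everything in it is an assumption for illustration: the gene list other than ERBB2 and EGFR, the fold changes and adjusted p-values, the thresholds (padj < 0.05, log2FC > 1), and the toy surface set. It is not bindsight's implementation or its shipped demo data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deg:
    gene: str
    log2fc: float  # tumor vs normal
    padj: float    # adjusted p-value


# Illustrative values only; not the shipped demo cohort.
TOY_DEGS = [
    Deg("ERBB2", 4.2, 1e-12),
    Deg("EGFR", 3.1, 1e-9),
    Deg("MKI67", 2.5, 1e-6),
    Deg("TOP2A", 2.2, 1e-5),
    Deg("MYC", 1.8, 1e-3),
    Deg("GAPDH", 0.1, 0.90),
    Deg("ACTB", -0.2, 0.80),
    Deg("TP53", 0.4, 0.30),
    Deg("ALB", -0.5, 0.20),
    Deg("PTPRC", 0.3, 0.60),
]

# Stand-in for a surfaceome catalog lookup.
TOY_SURFACE = frozenset({"ERBB2", "EGFR", "PTPRC"})


def significant(degs, max_padj=0.05, min_log2fc=1.0):
    """Keep genes that are both statistically significant and up in tumor."""
    return [d for d in degs if d.padj < max_padj and d.log2fc > min_log2fc]


def surface_only(degs, surface):
    """Keep genes whose product is annotated as surface-exposed."""
    return [d for d in degs if d.gene in surface]


sig = significant(TOY_DEGS)
hits = surface_only(sig, TOY_SURFACE)
print(f"DEGs: {len(TOY_DEGS)} total, {len(sig)} significant")
print(f"surfaceome filter: {len(sig)} -> {len(hits)}")
print("targets:", [d.gene for d in hits])
```

Note that PTPRC is in the toy surface set but drops out at the significance cut, which is the point of applying the filters in sequence.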

Why this exists

Two ecosystems in computational biology operate side-by-side and barely talk to each other:

  • Genomics (DESeq2, edgeR, Seurat, scanpy, TCGA, recount3) stops at "here are the interesting genes."
  • Protein design (RFdiffusion, ProteinMPNN, BindCraft, BoltzGen, AlphaFold, Boltz-2) starts from "given a target..."

The bridge between them — "this gene is up in disease, low in healthy tissue, surface-exposed, has a known targetable site, here is a docked binder seed and a designed binder ranked by predicted affinity, with the receipts back to the patient cohort" — is missing. People build it ad-hoc, per project, never reproducibly. bindsight ships that bridge as one tool.

What it does

  RNA-seq counts (bulk or sc)                       Designed protein binders
              │                                              ▲
              │                                              │
              ▼                                              │
   Differential expression  ──►  Surface-exposed  ──►  De novo backbone
   (pydeseq2 or DESeq2)         (SURFY)              (RFdiffusion / BindCraft / BoltzGen)
                                     │                       │
                                     ▼                       ▼
                              Targetable sites          Sequence design
                              (SURFACE-Bind)            (ProteinMPNN)
                                     │                       │
                                     ▼                       ▼
                              AlphaFoldDB structure     Affinity + structure
                                                        validation
                                                        (Boltz-2 / Chai-1r)
                                                              │
                                                              ▼
                                                  Multi-objective ranking
                                                              │
                                                              ▼
                                       Quarto report + RO-Crate (Zenodo)
                                       with full PROV-O provenance
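The "Multi-objective ranking" box in the diagram can be pictured as a weighted composite of normalized metrics. The sketch below is one common way to do that (min-max scaling followed by a weighted sum), shown under stated assumptions: the metric names, weights, and candidate values are hypothetical, and the actual scoring used by bindsight rank is not described in this README.

```python
def min_max(values):
    """Scale values to [0, 1]; constant columns map to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_rank(candidates, weights, lower_is_better=()):
    """Return (name, score) pairs sorted from best to worst."""
    names = list(candidates)
    totals = dict.fromkeys(names, 0.0)
    for metric, weight in weights.items():
        scaled = min_max([candidates[n][metric] for n in names])
        if metric in lower_is_better:
            scaled = [1.0 - s for s in scaled]  # flip so higher is always better
        for name, s in zip(names, scaled):
            totals[name] += weight * s
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical validated binders with made-up metrics.
candidates = {
    "binder_a": {"affinity": 0.9, "confidence": 0.8, "clashes": 2},
    "binder_b": {"affinity": 0.6, "confidence": 0.9, "clashes": 0},
    "binder_c": {"affinity": 0.3, "confidence": 0.5, "clashes": 5},
}
weights = {"affinity": 0.5, "confidence": 0.3, "clashes": 0.2}

ranking = composite_rank(candidates, weights, lower_is_better={"clashes"})
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A weighted sum is easy to audit but hides trade-offs between objectives; a Pareto-front selection is a reasonable alternative when no single weighting is defensible.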

Who it's for

  • Translational researchers who want a free, reproducible "data → designed binder" pipeline.
  • Clinical biologists who need an audit trail back from a binder to the patient cohort.
  • Method developers who want a held-out evaluation harness (rediscovery of known antigens) to benchmark new designers/validators.
  • Pharma early-discovery teams who want an open comparator they can extend with proprietary designers via the plugin interface.

What's distinctive

| | Existing protein-design tools | bindsight |
|---|---|---|
| Input | Target structure | RNA-seq counts |
| Provenance | PDB + maybe a log | PROV-O JSON-LD + RO-Crate, audit trail to patient cohort |
| Hardware | HPC assumed | CPU laptop + offload to free Colab / Modal / Kaggle |
| Cost-awareness | None | --dry-run estimates GPU $ before running |
| Negative results | Discarded | Catalogued (failure_taxonomy.parquet) |
| Citability | Code dump | DOI per release, JSON-Schema-validated outputs, JOSS-style |

For the full landscape comparison, see ARCHITECTURE.md.
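To make the Provenance row concrete, here is a minimal sketch of what an audit trail from a binder back to its cohort could look like in PROV-O JSON-LD. The `prov:` terms (`prov:Entity`, `prov:wasDerivedFrom`) and the namespace URL come from the W3C PROV-O vocabulary; the identifiers (`cohort:demo`, `target:ERBB2`, `binder:0001`) and both function names are hypothetical, and the schema bindsight actually emits may differ.

```python
import json

PROV = "http://www.w3.org/ns/prov#"


def provenance_record(cohort_id, target_id, binder_id):
    """Build a three-node derivation chain: cohort -> target -> binder."""
    return {
        "@context": {"prov": PROV},
        "@graph": [
            {"@id": cohort_id, "@type": "prov:Entity"},
            {
                "@id": target_id,
                "@type": "prov:Entity",
                "prov:wasDerivedFrom": {"@id": cohort_id},
            },
            {
                "@id": binder_id,
                "@type": "prov:Entity",
                "prov:wasDerivedFrom": {"@id": target_id},
            },
        ],
    }


def trace_to_cohort(record, start_id):
    """Follow prov:wasDerivedFrom links until a node has no parent."""
    by_id = {node["@id"]: node for node in record["@graph"]}
    chain = [start_id]
    while "prov:wasDerivedFrom" in by_id[chain[-1]]:
        chain.append(by_id[chain[-1]]["prov:wasDerivedFrom"]["@id"])
    return chain


record = provenance_record("cohort:demo", "target:ERBB2", "binder:0001")
print(json.dumps(record, indent=2))
print(" <- ".join(trace_to_cohort(record, "binder:0001")))
```

A fuller record would also model the steps themselves as `prov:Activity` nodes linked with `prov:used` and `prov:wasGeneratedBy`; the entity-only chain above is the smallest form that still answers "which cohort did this binder come from?".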

What works today (v0.1.0)

| Capability | Status | How to try |
|---|---|---|
| Web UI: multi-page Streamlit app (Home / Demo / Run on my data / Browse / About) | ✅ ready | bindsight ui or Streamlit Cloud |
| bindsight demo: full discovery on shipped example + paper-style report | ✅ ready | bindsight demo |
| bindsight discover: your own RNA-seq cohort → ranked targets | ✅ ready | bindsight discover my.yaml --out runs/x |
| bindsight rank: multi-objective composite scoring of validated binders | ✅ ready | bindsight rank runs/x |
| bindsight report --format html: paper-style HTML, embedded volcano + tables + provenance | ✅ ready | bindsight report runs/x |
| bindsight report --format streamlit: interactive dashboard for one run | ✅ ready | bindsight report runs/x --format streamlit |
| bindsight run: full pipeline orchestrator (discover → design → validate → rank → report → export) | ✅ ready | bindsight run my.yaml --out runs/x |
| bindsight export: RO-Crate zip for Zenodo deposit | ✅ ready | bindsight export runs/x --out runs/x.crate.zip |
| bindsight design --dry-run: GPU cost estimate for any backend | ✅ ready | bindsight design runs/x --backend modal --dry-run |
| bindsight doctor: diagnose deps, caches, vendored data | ✅ ready | bindsight doctor |
| bindsight verify-licenses: per-component license inventory | ✅ ready | bindsight verify-licenses |
| GPU design notebook: RFdiffusion + ProteinMPNN + Boltz-2 wired in templated Colab notebook | ✅ ready | bindsight design runs/x --backend colab opens a notebook with real install + inference cells |
| Manual Colab recipe: step-by-step for the GPU half | ✅ ready | docs/colab-design-howto.md |
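The idea behind a --dry-run cost estimate is simple arithmetic: total GPU time multiplied by an hourly price. The sketch below shows that calculation under stated assumptions. The per-trajectory runtime (3 minutes) and the hourly rate ($1.10) are placeholders, not measured or quoted figures, and `estimate_gpu_cost` is a hypothetical helper rather than bindsight's own estimator.

```python
def estimate_gpu_cost(targets, trajectories_per_target,
                      minutes_per_trajectory, usd_per_gpu_hour):
    """Return (gpu_hours, usd) for a design run, assuming no parallel discount."""
    total_minutes = targets * trajectories_per_target * minutes_per_trajectory
    gpu_hours = total_minutes / 60
    return gpu_hours, gpu_hours * usd_per_gpu_hour


# 5 targets x 50 trajectories, with ASSUMED runtime and price.
hours, usd = estimate_gpu_cost(
    targets=5,
    trajectories_per_target=50,
    minutes_per_trajectory=3.0,   # placeholder
    usd_per_gpu_hour=1.10,        # placeholder
)
print(f"{hours:.1f} GPU-hours, about ${usd:.2f}")
```

With these placeholder inputs the run comes to 12.5 GPU-hours. The useful part of such an estimate is seeing how the cost scales linearly with the trajectory count before committing to a paid backend.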

Status & roadmap

  • v0.1.0 (current) — discovery + rank + report + export + web UI + real Colab notebook patterns
  • 🔬 v0.1.x — first-user GPU validation (someone with a real GPU runs the Colab notebook end-to-end and reports any install/inference issues; we patch fast)
  • v0.2.0 — live Modal/Colab job submission via API, BindCraft + BoltzGen plugins fully wired, scRNA-seq input
  • v1.0.0 — JOSS submission + validation paper (rediscovery of HER2/EGFR/MSLN/CLDN6 from blinded TCGA cohorts)

See ARCHITECTURE.md § Phased Roadmap for details.

Install

bindsight is not yet on PyPI. Install from source (Windows / macOS / Linux, Python 3.11+):

git clone <repo-url> bindsight
cd bindsight
python -m venv .venv

# Windows
.venv\Scripts\activate

# macOS / Linux
source .venv/bin/activate

pip install -e ".[dev,discover,report]"
bindsight --version
bindsight doctor                # confirm install is clean
bindsight demo                  # run the 60-second demo

For Conda users, envs/discover.yaml provides the same set of dependencies:

mamba env create -f envs/discover.yaml
mamba activate bindsight-discover
pip install -e ".[dev,report]"

Quickstart (target: v0.0.x)

# 1. Discover targets from a TCGA cohort (CPU only, ~10 minutes on a laptop)
bindsight discover examples/tcga_luad.yaml --out runs/luad_v01

# 2. Inspect the discovered targets
bindsight report runs/luad_v01 --format html
open runs/luad_v01/report.html   # macOS; use xdg-open on Linux or start on Windows

# 3. (v0.1+) Design binders for the top 5 targets via Colab GPU
bindsight design runs/luad_v01 --backend colab --trajectories 50

# 4. (v0.1+) Validate with Boltz-2
bindsight validate runs/luad_v01 --backend colab --validator boltz2

# 5. (v0.1+) Rank, report, export as RO-Crate
bindsight rank runs/luad_v01
bindsight report runs/luad_v01 --format html --include-binders
bindsight export runs/luad_v01 --format ro-crate --out runs/luad_v01.crate.zip
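For scripted runs, the same five steps can be assembled as argument lists and handed to a process runner. The sketch below only builds and prints the commands, using exactly the subcommands and flags shown in the Quickstart above; the helper name `quickstart_commands` is hypothetical. Executing them (for example with `subprocess.run(argv, check=True)`) is left to the caller and assumes bindsight is installed and on PATH.

```python
import shlex


def quickstart_commands(config, run_dir, trajectories=50):
    """Return the Quickstart steps as argv lists, in execution order."""
    return [
        ["bindsight", "discover", config, "--out", run_dir],
        ["bindsight", "report", run_dir, "--format", "html"],
        ["bindsight", "design", run_dir, "--backend", "colab",
         "--trajectories", str(trajectories)],
        ["bindsight", "validate", run_dir, "--backend", "colab",
         "--validator", "boltz2"],
        ["bindsight", "rank", run_dir],
        ["bindsight", "report", run_dir, "--format", "html",
         "--include-binders"],
        ["bindsight", "export", run_dir, "--format", "ro-crate",
         "--out", f"{run_dir}.crate.zip"],
    ]


commands = quickstart_commands("examples/tcga_luad.yaml", "runs/luad_v01")
for argv in commands:
    # shlex.join quotes arguments safely for display in a POSIX shell.
    print(shlex.join(argv))
```

Keeping the commands as lists rather than strings avoids shell-quoting problems when a run directory contains spaces.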

Repository layout

bindsight/                 # Python package
├── io/                   # Parquet, FASTA, PDB, mmCIF, manifest readers
├── deg/                  # pydeseq2 wrapper (+ optional R bridge)
├── targets/              # Open Targets, HPA, GTEx, recount3 clients
├── surfaceome/           # SURFY filter + SURFACE-Bind client
├── structures/           # AlphaFoldDB + RCSB/PDBe fetch
├── epitopes/             # SURFACE-Bind site lookup; fpocket fallback (v0.2)
├── design/               # Designer plugin interface
├── runners/              # Colab / Modal / Kaggle / local-Docker adapters
├── validate/             # Boltz-2 default; Chai-1r, AF2-IG opt-in
├── rank/                 # Multi-objective scoring
├── provenance/           # PROV-O JSON-LD schema + RO-Crate emitter
├── report/               # Quarto template + Streamlit app
└── cli.py                # Click entrypoint

envs/                     # Conda environment files (one per stage)
examples/                 # Example pipeline configs (TCGA-LUAD, etc.)
tests/                    # Pytest smoke + integration tests + fixtures
docs/                     # mkdocs-material site source
.github/workflows/        # CI + Zenodo deposit on tag

ARCHITECTURE.md           # Architectural source of truth
LICENSING.md              # Per-dependency license inventory
CONTRIBUTING.md           # How to contribute
CHANGELOG.md              # Per-version changes
CITATION.cff              # Zenodo / GitHub citation metadata
Snakefile                 # Snakemake DAG
pyproject.toml            # Python packaging
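The layout lists design/ as a "Designer plugin interface". As a rough picture of what such an interface can look like, the sketch below uses a `typing.Protocol` plus a small registry. All names here (`Designer`, `BinderCandidate`, `register`, `PolyAlanineDesigner`) are hypothetical and chosen for illustration; the real contract is defined in ARCHITECTURE.md and the design/ module, and may look quite different.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class BinderCandidate:
    target: str
    sequence: str
    designer: str


class Designer(Protocol):
    """Anything with a name and a design() method counts as a designer."""

    name: str

    def design(self, target: str, n: int) -> list[BinderCandidate]: ...


class PolyAlanineDesigner:
    """Placeholder backend: emits poly-alanine strings, not real designs."""

    name = "poly_alanine"

    def design(self, target: str, n: int) -> list[BinderCandidate]:
        return [
            BinderCandidate(target, "A" * (10 + i), self.name)
            for i in range(n)
        ]


REGISTRY: dict[str, Designer] = {}


def register(designer: Designer) -> None:
    """Add a designer to the registry, refusing duplicate names."""
    if designer.name in REGISTRY:
        raise ValueError(f"designer already registered: {designer.name}")
    REGISTRY[designer.name] = designer


register(PolyAlanineDesigner())
candidates = REGISTRY["poly_alanine"].design("ERBB2", n=3)
for c in candidates:
    print(c.designer, c.target, len(c.sequence))
```

Structural typing via `Protocol` means a proprietary designer does not need to inherit from anything in the package; it only needs to match the method signature.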

Documentation

  • ARCHITECTURE.md — system design, module contracts, design rationale
  • LICENSING.md — per-dependency license inventory and commercial-use guidance
  • CONTRIBUTING.md — dev setup, testing, commit conventions
  • CHANGELOG.md — per-version changes
  • docs/ — long-form docs (built with mkdocs build)

Acknowledgments

bindsight is an opinionated wrapper. Real intellectual credit belongs to the upstream tool authors. See LICENSING.md for the full inventory; the work this builds on most directly:

  • SURFACE-Bind (Khakzad et al., PNAS 2025) — the targetable-sites catalog that makes the bridge tractable
  • pydeseq2 (Muzellec et al., Bioinformatics 2023) — Python DESeq2 implementation
  • RFdiffusion (Watson et al., Nature 2023) — backbone generation
  • ProteinMPNN (Dauparas et al., Science 2022) — sequence design
  • Boltz-2 (Wohlwend et al., 2025) — structure + affinity prediction
  • BindCraft (Pacesa et al., Nature 2025) — one-shot binder design
  • Snakemake (Mölder et al., F1000Research 2021) — workflow orchestration

Citation

If you use bindsight in your work, please cite it via the Zenodo DOI:

Atef Rizk, M. (2026). bindsight: a reproducible bridge from RNA-seq to de novo protein binder design (v0.1.0). Zenodo. https://doi.org/10.5281/zenodo.20121496

BibTeX:

@software{atefrizk_bindsight_2026,
  author       = {Atef Rizk, Mikhaeel},
  title        = {bindsight: a reproducible bridge from RNA-seq to de novo protein binder design},
  year         = {2026},
  publisher    = {Zenodo},
  version      = {v0.1.0},
  doi          = {10.5281/zenodo.20121496},
  url          = {https://doi.org/10.5281/zenodo.20121496},
  orcid        = {https://orcid.org/0009-0006-1069-9558}
}

GitHub also exposes a "Cite this repository" button on the right sidebar of the repo page that auto-generates citations in BibTeX, APA, and other formats from CITATION.cff. Please also cite the upstream tools you used (the per-run manifest emits a software.bib to make this easy).

About the author

bindsight is built and maintained by Mikhaeel Atef Rizk — PharmD graduate of the German University in Cairo (GUC), currently finishing the Egyptian post-PharmD applied-pharmacy term (Imtiyaz). Earlier in 2026 he had a research rotation at the German International University in Berlin (GIU Berlin) where he picked up R / RStudio.

Sister projects on GitHub

bindsight sits at the deep end of an ongoing bioinformatics portfolio:

  • bioinformatics-portfolio — an end-to-end bioinformatics portfolio with three subprojects, each fully reproducible from raw data to figures:
    • 01-rnaseq-fox-domestication — RNA-seq differential expression on GEO GSE76517, replicating the Kukekova et al. PNAS 2018 silver-fox domestication study
    • 02-tcga-survival-kidney-cancer — TCGA-KIRC clinical survival analysis identifying EPAS1 / HIF-2α as a prognostic biomarker (target of FDA-approved belzutifan)
    • 03-scrnaseq-pbmc-seurat — Seurat v5 single-cell RNA-seq workflow on the 10x PBMC 3k dataset, recovering 8 immune populations
  • affect-labeling-review — a pre-registered systematic review + meta-analysis of affect labeling (Lieberman et al. 2007 paradigm). Real random-effects meta-analysis (k=9), PRISMA 2020, RoB 2 / ROBINS-I, ~14,000-word manuscript, open data + open code, .zenodo.json for citable archival
  • awesome-protein-design-software — curated list of protein-design / structure-prediction software (RFdiffusion, ProteinMPNN, Boltz, AlphaFold, ESMFold, etc.)
  • Awesome-Bioinformatics — curated list of bioinformatics libraries and tools

License

MIT. See LICENSE and LICENSING.md for component-level details.