Expression → Binder. The first open-source pipeline that takes RNA-seq counts and outputs ranked de novo protein binder candidates, with full provenance back to the patient cohort.
Primary (16 GB CPU, no auto-sleep): huggingface.co/spaces/Mikhaeelatefrizk/bindsight
Mirror (1 GB CPU, may take a moment to wake from sleep): bindsight.streamlit.app
Zero install — runs in your browser. Click the Demo tab and watch the full pipeline rediscover HER2 + EGFR from synthetic RNA-seq counts in ~60 seconds (cached for ~0.1 s on every revisit).
🚀 v0.1.0 — discovery half end-to-end on CPU; design + validation wired for free Colab; web UI deployed on Streamlit Cloud.
New here? → What is bindsight? (5-min read) · How to use it · Use cases · Designing on Colab · Hugging Face Space backup
1. Web app — Hugging Face Space (zero install) · Streamlit mirror
Anyone visiting either URL above gets:
- A Home page explaining what bindsight is
- A Demo button that runs the full pipeline live and renders a report
- A Run on my data page (upload counts.tsv + design.tsv → get results)
- A Browse a run page to inspect any output directory
The Hugging Face Space is the primary deployment (16 GB CPU, no auto-sleep). The Streamlit Cloud deploy at bindsight.streamlit.app is the same app on free-tier infrastructure (1 GB CPU); it may take ~60–120 s to wake from sleep on its first visit after a quiet stretch.
pip install -e ".[discover,report]"
bindsight ui
# → opens http://localhost:8501 with the same multi-page interface

bindsight demo

Runs the full discovery half on a shipped 10-gene tumor-vs-normal cohort and produces a real HTML report you can open in a browser. The pipeline rediscovers HER2 (ERBB2) and EGFR as the top antibody-tractable surface antigens, entirely from RNA-seq counts. ~30 seconds, no internet, no GPU.
$ bindsight demo
╭───────────── Demo run ──────────────╮
│ The pipeline should rediscover │
│ ERBB2 (HER2) and EGFR as top │
│ antibody-tractable surface antigens.│
╰─────────────────────────────────────╯
INFO DEGs: 10 total, 5 significant
INFO surfaceome filter: 5 → 2
INFO wrote runs/demo/report.html
╭───────── bindsight demo ─────────────╮
│ Demo complete! │
│ Report HTML: runs/demo/report.html │
╰─────────────────────────────────────╯
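The two INFO lines in the log above correspond to two filtering steps: keep the significantly up-regulated genes, then keep only those annotated as surface proteins. The Python sketch below illustrates that logic under stated assumptions. The thresholds (log2 fold change ≥ 1, adjusted p ≤ 0.05), the gene values, and the `SURFACE_GENES` set are all hypothetical placeholders, not bindsight's actual defaults or the shipped demo data.

```python
# Toy differential-expression table: gene -> (log2 fold change, adjusted p-value).
# All values are invented for illustration; they are not bindsight output.
deg_table = {
    "ERBB2": (3.1, 1e-8),
    "EGFR": (2.4, 1e-6),
    "MKI67": (2.0, 1e-5),
    "TOP2A": (1.8, 1e-4),
    "MYC": (1.5, 1e-3),
    "GAPDH": (0.1, 0.90),
    "ACTB": (-0.2, 0.80),
    "RPLP0": (0.4, 0.30),
    "TUBB": (-0.3, 0.60),
    "HPRT1": (0.2, 0.70),
}

# Hypothetical stand-in for a surfaceome annotation such as SURFY.
SURFACE_GENES = {"ERBB2", "EGFR"}


def significant_up(table, min_lfc=1.0, max_padj=0.05):
    """Return genes that are up-regulated and pass the adjusted p-value cutoff."""
    return [
        gene
        for gene, (lfc, padj) in table.items()
        if lfc >= min_lfc and padj <= max_padj
    ]


def surface_only(genes, surface_set):
    """Keep only genes present in the surfaceome annotation, preserving order."""
    return [gene for gene in genes if gene in surface_set]


sig = significant_up(deg_table)
targets = surface_only(sig, SURFACE_GENES)
print(f"DEGs: {len(deg_table)} total, {len(sig)} significant")
print(f"surfaceome filter: {len(sig)} -> {len(targets)}")
```

With this toy table the counts happen to mirror the demo log (10 total, 5 significant, 2 after the surfaceome filter), but only because the values were chosen that way.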
Two ecosystems in computational biology operate side-by-side and barely talk to each other:
- Genomics (DESeq2, edgeR, Seurat, scanpy, TCGA, recount3) stops at "here are the interesting genes."
- Protein design (RFdiffusion, ProteinMPNN, BindCraft, BoltzGen, AlphaFold, Boltz-2) starts from "given a target..."
The bridge between them — "this gene is up in disease, low in healthy tissue, surface-exposed, has a known targetable site, here is a docked binder seed and a designed binder ranked by predicted affinity, with the receipts back to the patient cohort" — is missing. People build it ad-hoc, per project, never reproducibly. bindsight ships that bridge as one tool.
RNA-seq counts (bulk or sc)                         Designed protein binders
           │                                                ▲
           │                                                │
           ▼                                                │
Differential expression ──► Surface-exposed ──────► De novo backbone
(pydeseq2 or DESeq2)        (SURFY)                 (RFdiffusion / BindCraft / BoltzGen)
                                   │                        │
                                   ▼                        ▼
                            Targetable sites        Sequence design
                            (SURFACE-Bind)          (ProteinMPNN)
                                   │                        │
                                   ▼                        ▼
                            AlphaFoldDB structure   Affinity + structure
                                                    validation
                                                    (Boltz-2 / Chai-1r)
                                                            │
                                                            ▼
                                                    Multi-objective ranking
                                                            │
                                                            ▼
                                                    Quarto report + RO-Crate (Zenodo)
                                                    with full PROV-O provenance
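The "multi-objective ranking" stage in the diagram combines several per-binder metrics into one ordering. One common way to do this is a weighted sum of min-max-normalized metrics, sketched below. This is an assumption about how such a composite score could work, not a description of bindsight's scoring; the metric names (`affinity`, `confidence`, `clashes`), the weights, and the candidate values are hypothetical.

```python
def min_max(values):
    """Scale values to [0, 1]; return 0.5 for every item if they are all equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_rank(candidates, weights, lower_is_better=()):
    """Rank candidates by a weighted sum of normalized metrics.

    candidates: name -> {metric: value}
    weights: metric -> weight
    lower_is_better: metrics whose normalized value is inverted before weighting
    """
    names = list(candidates)
    totals = {name: 0.0 for name in names}
    for metric, weight in weights.items():
        scaled = min_max([candidates[name][metric] for name in names])
        if metric in lower_is_better:
            scaled = [1.0 - s for s in scaled]
        for name, s in zip(names, scaled):
            totals[name] += weight * s
    ranking = sorted(names, key=lambda name: totals[name], reverse=True)
    return ranking, totals


# Hypothetical candidates and weights, for illustration only.
candidates = {
    "binder_a": {"affinity": 0.9, "confidence": 0.8, "clashes": 2},
    "binder_b": {"affinity": 0.5, "confidence": 0.9, "clashes": 0},
    "binder_c": {"affinity": 0.1, "confidence": 0.4, "clashes": 5},
}
weights = {"affinity": 0.5, "confidence": 0.3, "clashes": 0.2}

ranking, totals = composite_rank(candidates, weights, lower_is_better={"clashes"})
print(ranking)
```

Normalizing before weighting keeps a metric with a large numeric range from dominating the score; the weights then express relative priority directly.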
- Translational researchers who want a free, reproducible "data → designed binder" pipeline.
- Clinical biologists who need an audit trail back from a binder to the patient cohort.
- Method developers who want a held-out evaluation harness (rediscovery of known antigens) to benchmark new designers/validators.
- Pharma early-discovery teams who want an open comparator they can extend with proprietary designers via the plugin interface.
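For method developers, "rediscovery of known antigens" can be quantified with simple ranking metrics. The sketch below shows two such metrics, recall at k and the rank of the first hit, on a made-up ranking. The function names and the example ranking are hypothetical and are not part of bindsight's evaluation harness.

```python
def recall_at_k(ranked_genes, known_antigens, k):
    """Fraction of known antigens that appear in the top-k ranked genes."""
    known = set(known_antigens)
    if not known:
        raise ValueError("known_antigens must not be empty")
    top_k = set(ranked_genes[:k])
    return len(top_k & known) / len(known)


def hit_rank(ranked_genes, gene):
    """1-based rank of a gene in the ranking, or None if it is absent."""
    try:
        return ranked_genes.index(gene) + 1
    except ValueError:
        return None


# Hypothetical pipeline output, best target first.
ranked = ["ERBB2", "EGFR", "MKI67", "TOP2A", "MYC"]

print(recall_at_k(ranked, ["ERBB2", "EGFR"], k=2))
print(recall_at_k(ranked, ["ERBB2", "MSLN"], k=2))
print(hit_rank(ranked, "EGFR"))
print(hit_rank(ranked, "MSLN"))
```

Comparing these numbers across designers or validators on the same held-out cohort gives a like-for-like benchmark.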
| | Existing protein-design tools | bindsight |
|---|---|---|
| Input | Target structure | RNA-seq counts |
| Provenance | PDB + maybe a log | PROV-O JSON-LD + RO-Crate, audit trail to patient cohort |
| Hardware | HPC assumed | CPU laptop + offload to free Colab / Modal / Kaggle |
| Cost-awareness | None | --dry-run estimates GPU $ before running |
| Negative results | Discarded | Catalogued (failure_taxonomy.parquet) |
| Citability | Code dump | DOI per release, JSON-Schema-validated outputs, JOSS-style |
For the full landscape comparison, see ARCHITECTURE.md.
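To make the "audit trail to patient cohort" row concrete, the sketch below builds a small PROV-O graph as JSON-LD and walks it from a binder back to the cohort. The `prov:` terms (`Entity`, `Activity`, `used`, `wasGeneratedBy`, `wasDerivedFrom`) come from the W3C PROV-O vocabulary; the identifiers, the graph shape, and the helper names are hypothetical and do not reflect bindsight's actual schema.

```python
import json


def prov_record(binder_id, target_id, cohort_id, activity_id):
    """Build a minimal PROV-O JSON-LD graph linking binder -> target -> cohort."""
    return {
        "@context": {"prov": "http://www.w3.org/ns/prov#"},
        "@graph": [
            {"@id": cohort_id, "@type": "prov:Entity"},
            {
                "@id": activity_id,
                "@type": "prov:Activity",
                "prov:used": {"@id": cohort_id},
            },
            {
                "@id": target_id,
                "@type": "prov:Entity",
                "prov:wasGeneratedBy": {"@id": activity_id},
                "prov:wasDerivedFrom": {"@id": cohort_id},
            },
            {
                "@id": binder_id,
                "@type": "prov:Entity",
                "prov:wasDerivedFrom": {"@id": target_id},
            },
        ],
    }


def trace_to_source(record, start_id):
    """Follow prov:wasDerivedFrom links until an entity has no further source."""
    by_id = {node["@id"]: node for node in record["@graph"]}
    chain = [start_id]
    while "prov:wasDerivedFrom" in by_id[chain[-1]]:
        chain.append(by_id[chain[-1]]["prov:wasDerivedFrom"]["@id"])
    return chain


# Hypothetical identifiers, for illustration only.
record = prov_record(
    binder_id="urn:example:binder/001",
    target_id="urn:example:target/ERBB2",
    cohort_id="urn:example:cohort/demo",
    activity_id="urn:example:activity/discover",
)
print(json.dumps(record, indent=2))
print(trace_to_source(record, "urn:example:binder/001"))
```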
| Capability | Status | How to try |
|---|---|---|
| Web UI — multi-page Streamlit app (Home / Demo / Run on my data / Browse / About) | ✅ ready | bindsight ui or Streamlit Cloud |
| bindsight demo — full discovery on shipped example + paper-style report | ✅ ready | bindsight demo |
| bindsight discover — your own RNA-seq cohort → ranked targets | ✅ ready | bindsight discover my.yaml --out runs/x |
| bindsight rank — multi-objective composite scoring of validated binders | ✅ ready | bindsight rank runs/x |
| bindsight report --format html — paper-style HTML, embedded volcano + tables + provenance | ✅ ready | bindsight report runs/x |
| bindsight report --format streamlit — interactive dashboard for one run | ✅ ready | bindsight report runs/x --format streamlit |
| bindsight run — full pipeline orchestrator (discover → design → validate → rank → report → export) | ✅ ready | bindsight run my.yaml --out runs/x |
| bindsight export — RO-Crate zip for Zenodo deposit | ✅ ready | bindsight export runs/x --out runs/x.crate.zip |
| bindsight design --dry-run — GPU cost estimate for any backend | ✅ ready | bindsight design runs/x --backend modal --dry-run |
| bindsight doctor — diagnose deps, caches, vendored data | ✅ ready | bindsight doctor |
| bindsight verify-licenses — per-component license inventory | ✅ ready | bindsight verify-licenses |
| GPU design notebook — RFdiffusion + ProteinMPNN + Boltz-2 wired in templated Colab notebook | ✅ ready | bindsight design runs/x --backend colab opens a notebook with real install + inference cells |
| Manual Colab recipe — step-by-step for the GPU half | ✅ ready | docs/colab-design-howto.md |
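A GPU cost estimate of the kind `--dry-run` reports can be reasoned about as total GPU time multiplied by an hourly rate. The sketch below shows that arithmetic only; it is an assumption, not bindsight's estimator, and the per-trajectory runtime and hourly price are placeholder numbers rather than measured values or real provider prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostEstimate:
    gpu_hours: float
    usd: float


def estimate_cost(n_targets, trajectories, seconds_per_trajectory, usd_per_gpu_hour):
    """Estimate GPU hours and cost for a design run.

    Assumes every target gets the same number of trajectories and every
    trajectory takes the same wall-clock time on one GPU.
    """
    total_seconds = n_targets * trajectories * seconds_per_trajectory
    gpu_hours = total_seconds / 3600
    return CostEstimate(gpu_hours=gpu_hours, usd=gpu_hours * usd_per_gpu_hour)


# Placeholder inputs: 5 targets x 50 trajectories, 72 s each, $2.00 per GPU-hour.
est = estimate_cost(
    n_targets=5,
    trajectories=50,
    seconds_per_trajectory=72,
    usd_per_gpu_hour=2.0,
)
print(f"{est.gpu_hours:.2f} GPU-hours, about ${est.usd:.2f}")
```

In practice the per-trajectory time depends on the designer, target size, and GPU model, so a real estimate would need those as inputs.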
- ✅ v0.1.0 (current) — discovery + rank + report + export + web UI + real Colab notebook patterns
- 🔬 v0.1.x — first-user GPU validation (someone with a real GPU runs the Colab notebook end-to-end and reports any install/inference issues; we patch fast)
- ⏳ v0.2.0 — live Modal/Colab job submission via API, BindCraft + BoltzGen plugins fully wired, scRNA-seq input
- ⏳ v1.0.0 — JOSS submission + validation paper (rediscovery of HER2/EGFR/MSLN/CLDN6 from blinded TCGA cohorts)
See ARCHITECTURE.md § Phased Roadmap for details.
bindsight is not yet on PyPI. Install from source (Windows / macOS / Linux,
Python 3.11+):
git clone <repo-url> bindsight
cd bindsight
python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS / Linux
source .venv/bin/activate
pip install -e ".[dev,discover,report]"
bindsight --version
bindsight doctor # confirm install is clean
bindsight demo # run the 60-second demo

For Conda users, envs/discover.yaml provides the same set of dependencies:
mamba env create -f envs/discover.yaml
mamba activate bindsight-discover
pip install -e ".[dev,report]"

# 1. Discover targets from a TCGA cohort (CPU only, ~10 minutes on a laptop)
bindsight discover examples/tcga_luad.yaml --out runs/luad_v01
# 2. Inspect the discovered targets
bindsight report runs/luad_v01 --format html
open runs/luad_v01/report.html # macOS; use xdg-open on Linux or start on Windows
# 3. (v0.1+) Design binders for the top 5 targets via Colab GPU
bindsight design runs/luad_v01 --backend colab --trajectories 50
# 4. (v0.1+) Validate with Boltz-2
bindsight validate runs/luad_v01 --backend colab --validator boltz2
# 5. (v0.1+) Rank, report, export as RO-Crate
bindsight rank runs/luad_v01
bindsight report runs/luad_v01 --format html --include-binders
bindsight export runs/luad_v01 --format ro-crate --out runs/luad_v01.crate.zip

bindsight/ # Python package
├── io/ # Parquet, FASTA, PDB, mmCIF, manifest readers
├── deg/ # pydeseq2 wrapper (+ optional R bridge)
├── targets/ # Open Targets, HPA, GTEx, recount3 clients
├── surfaceome/ # SURFY filter + SURFACE-Bind client
├── structures/ # AlphaFoldDB + RCSB/PDBe fetch
├── epitopes/ # SURFACE-Bind site lookup; fpocket fallback (v0.2)
├── design/ # Designer plugin interface
├── runners/ # Colab / Modal / Kaggle / local-Docker adapters
├── validate/ # Boltz-2 default; Chai-1r, AF2-IG opt-in
├── rank/ # Multi-objective scoring
├── provenance/ # PROV-O JSON-LD schema + RO-Crate emitter
├── report/ # Quarto template + Streamlit app
└── cli.py # Click entrypoint
envs/ # Conda environment files (one per stage)
examples/ # Example pipeline configs (TCGA-LUAD, etc.)
tests/ # Pytest smoke + integration tests + fixtures
docs/ # mkdocs-material site source
.github/workflows/ # CI + Zenodo deposit on tag
ARCHITECTURE.md # Architectural source of truth
LICENSING.md # Per-dependency license inventory
CONTRIBUTING.md # How to contribute
CHANGELOG.md # Per-version changes
CITATION.cff # Zenodo / GitHub citation metadata
Snakefile # Snakemake DAG
pyproject.toml # Python packaging
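The layout lists `design/` as a designer plugin interface. One way such an interface could look in Python is a structural `Protocol` plus a registry, sketched below. Every name here (`Designer`, `BinderCandidate`, `REGISTRY`, `register`, `PolyAlanineDesigner`) is hypothetical and chosen for illustration; the real contract is whatever ARCHITECTURE.md and the package define.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class BinderCandidate:
    target: str
    sequence: str
    designer: str


@runtime_checkable
class Designer(Protocol):
    """Anything with a name and a design() method counts as a designer."""

    name: str

    def design(self, target: str, n_candidates: int) -> list[BinderCandidate]: ...


REGISTRY: dict[str, Designer] = {}


def register(designer: Designer) -> None:
    """Add a designer to the registry after a structural type check."""
    if not isinstance(designer, Designer):
        raise TypeError("designer must provide 'name' and 'design()'")
    REGISTRY[designer.name] = designer


class PolyAlanineDesigner:
    """Toy designer returning poly-alanine placeholders; not a real method."""

    name = "poly_ala"

    def design(self, target, n_candidates):
        return [
            BinderCandidate(target, "A" * (10 + i), self.name)
            for i in range(n_candidates)
        ]


register(PolyAlanineDesigner())
designs = REGISTRY["poly_ala"].design("ERBB2", n_candidates=3)
print([d.sequence for d in designs])
```

A structural protocol lets a proprietary designer plug in without inheriting from any base class in the package; it only has to expose the expected attributes.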
- ARCHITECTURE.md — system design, module contracts, design rationale
- LICENSING.md — per-dependency license inventory and commercial-use guidance
- CONTRIBUTING.md — dev setup, testing, commit conventions
- CHANGELOG.md — per-version changes
- docs/ — long-form docs (built with mkdocs build)
bindsight is an opinionated wrapper. Real intellectual credit belongs to the upstream tool authors. See LICENSING.md for the full inventory; the work this builds on most directly:
- SURFACE-Bind (Khakzad et al., PNAS 2025) — the targetable-sites catalog that makes the bridge tractable
- pydeseq2 (Muzellec et al., Bioinformatics 2023) — Python DESeq2 implementation
- RFdiffusion (Watson et al., Nature 2023) — backbone generation
- ProteinMPNN (Dauparas et al., Science 2022) — sequence design
- Boltz-2 (Passaro et al., 2025) — structure + affinity prediction
- BindCraft (Pacesa et al., Nature 2025) — one-shot binder design
- Snakemake (Mölder et al., F1000Research 2021) — workflow orchestration
If you use bindsight in your work, please cite it via the Zenodo DOI:
Atef Rizk, M. (2026). bindsight: a reproducible bridge from RNA-seq to de novo protein binder design (v0.1.0). Zenodo. https://doi.org/10.5281/zenodo.20121496
BibTeX:
@software{atefrizk_bindsight_2026,
author = {Atef Rizk, Mikhaeel},
title = {bindsight: a reproducible bridge from RNA-seq to de novo protein binder design},
year = {2026},
publisher = {Zenodo},
version = {v0.1.0},
doi = {10.5281/zenodo.20121496},
url = {https://doi.org/10.5281/zenodo.20121496},
orcid = {https://orcid.org/0009-0006-1069-9558}
}

GitHub also exposes a "Cite this repository" button on the right sidebar of the repo page that auto-generates citations in BibTeX, APA, and other formats from CITATION.cff. Please also cite the upstream tools you used (the per-run manifest emits a software.bib to make this easy).
bindsight is built and maintained by Mikhaeel Atef Rizk — PharmD graduate of the German University in Cairo (GUC), currently finishing the Egyptian post-PharmD applied-pharmacy term (Imtiyaz). Earlier in 2026 he had a research rotation at the German International University in Berlin (GIU Berlin) where he picked up R / RStudio.
- ORCID: 0009-0006-1069-9558
- GitHub: @mikhaeelatefrizk
- Email: [email protected]
- Languages: Arabic (native), English (full professional), German (professional working ≈ B2), French, Russian
bindsight sits at the deep end of an ongoing bioinformatics portfolio:
- bioinformatics-portfolio — an end-to-end bioinformatics portfolio with three subprojects, each fully reproducible from raw data to figures:
  - 01-rnaseq-fox-domestication — RNA-seq differential expression on GEO GSE76517, replicating the Kukekova et al. PNAS 2018 silver-fox domestication study
  - 02-tcga-survival-kidney-cancer — TCGA-KIRC clinical survival analysis identifying EPAS1 / HIF-2α as a prognostic biomarker (target of FDA-approved belzutifan)
  - 03-scrnaseq-pbmc-seurat — Seurat v5 single-cell RNA-seq workflow on the 10x PBMC 3k dataset, recovering 8 immune populations
- affect-labeling-review — a pre-registered systematic review + meta-analysis of affect labeling (Lieberman et al. 2007 paradigm). Real random-effects meta-analysis (k=9), PRISMA 2020, RoB 2 / ROBINS-I, ~14,000-word manuscript, open data + open code, .zenodo.json for citable archival
- awesome-protein-design-software — curated list of protein-design / structure-prediction software (RFdiffusion, ProteinMPNN, Boltz, AlphaFold, ESMFold, etc.)
- Awesome-Bioinformatics — curated list of bioinformatics libraries and tools
MIT. See LICENSE and LICENSING.md for component-level details.