bindsight

Expression → Binder. The first open-source pipeline that takes RNA-seq counts and outputs ranked de novo protein binder candidates, with full provenance back to the patient cohort.

👉 Try it live

Primary (16 GB CPU, no auto-sleep): huggingface.co/spaces/Mikhaeelatefrizk/bindsight
Mirror (1 GB CPU, may briefly wake from sleep): bindsight.streamlit.app

Zero install — runs in your browser. Click the Demo tab and watch the full pipeline rediscover HER2 + EGFR from synthetic RNA-seq counts in ~60 seconds (cached for ~0.1 s on every revisit).

🚀 v0.1.0 — discovery half end-to-end on CPU; design + validation wired for free Colab; web UI deployed on Streamlit Cloud.

New here? → What is bindsight? (5-min read) · How to use it · Use cases · Designing on Colab · Hugging Face Space backup

Three ways to try it

1. Web app — Hugging Face Space (zero install) · Streamlit mirror

Anyone visiting either URL above gets:

The Home page with what bindsight is
A Demo button that runs the full pipeline live and renders a report
A Run on my data page (upload counts.tsv + design.tsv → get results)
A Browse a run page to inspect any output directory

The Hugging Face Space is the primary deployment (16 GB CPU, no auto-sleep). The Streamlit Cloud deploy at bindsight.streamlit.app is the same app on free-tier infrastructure (1 GB CPU); after a quiet stretch, the first visit may take ~60–120 s while the app wakes from sleep.
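The Run on my data page takes a counts table and a design table. The exact column layout bindsight expects is not specified in this README, so the sketch below assumes a common convention: genes as rows and samples as columns in counts.tsv, and one row per sample with a `condition` column in design.tsv. The counts are invented toy values, and `write_inputs` and `check_consistency` are hypothetical helpers, not part of bindsight.

```python
import csv
import tempfile
from pathlib import Path

# ASSUMPTION: genes as rows, samples as columns. Check the bindsight docs
# for the real schema before uploading anything.
SAMPLES = ["tumor_1", "tumor_2", "normal_1", "normal_2"]
COUNTS = {  # made-up raw counts, one value per sample
    "ERBB2": [950, 1020, 110, 95],
    "EGFR": [800, 760, 150, 140],
    "GAPDH": [5000, 5100, 4900, 5050],
}
CONDITIONS = {
    "tumor_1": "tumor",
    "tumor_2": "tumor",
    "normal_1": "normal",
    "normal_2": "normal",
}


def write_inputs(out_dir: Path) -> tuple[Path, Path]:
    """Write counts.tsv and design.tsv as tab-separated files."""
    counts_path = out_dir / "counts.tsv"
    design_path = out_dir / "design.tsv"
    with counts_path.open("w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["gene", *SAMPLES])
        for gene, values in COUNTS.items():
            writer.writerow([gene, *values])
    with design_path.open("w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["sample", "condition"])
        for sample in SAMPLES:
            writer.writerow([sample, CONDITIONS[sample]])
    return counts_path, design_path


def check_consistency(counts_path: Path, design_path: Path) -> bool:
    """True if the sample columns in counts match the sample rows in design."""
    with counts_path.open(newline="") as fh:
        header = next(csv.reader(fh, delimiter="\t"))
    with design_path.open(newline="") as fh:
        rows = list(csv.DictReader(fh, delimiter="\t"))
    return set(header[1:]) == {row["sample"] for row in rows}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        counts, design = write_inputs(Path(tmp))
        print("consistent:", check_consistency(counts, design))
```

A mismatch between the sample names in the two files is a common cause of confusing downstream errors, so a check like this is cheap insurance before an upload.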

2. Local web app (one command)

pip install -e ".[discover,report]"
bindsight ui
# → opens http://localhost:8501 with the same multi-page interface

3. CLI (60 seconds)

bindsight demo

Runs the full discovery half on a shipped 10-gene tumor-vs-normal cohort and produces an HTML report you can open in a browser. The pipeline rediscovers HER2 (ERBB2) and EGFR as the top antibody-tractable surface antigens, entirely from RNA-seq counts. It takes ~30 seconds and needs no internet and no GPU.

$ bindsight demo
╭───────────── Demo run ──────────────╮
│ The pipeline should rediscover      │
│ ERBB2 (HER2) and EGFR as top        │
│ antibody-tractable surface antigens.│
╰─────────────────────────────────────╯
INFO  DEGs: 10 total, 5 significant
INFO  surfaceome filter: 5 → 2
INFO  wrote runs/demo/report.html
╭───────── bindsight demo ─────────────╮
│ Demo complete!                      │
│ Report HTML: runs/demo/report.html  │
╰─────────────────────────────────────╯
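The two INFO lines in the log above reflect two successive filters: a significance cut on the differential-expression table, then an intersection with a surfaceome list. A minimal sketch of that logic follows. The statistics are invented toy values (not the demo cohort's real numbers), the thresholds are assumptions, and the surfaceome set is a hand-written stand-in for a catalog such as SURFY.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DegRow:
    gene: str
    log2_fold_change: float  # tumor vs normal
    padj: float              # multiple-testing-adjusted p-value


# Invented toy statistics, NOT the shipped demo cohort's real values.
TOY_DEGS = [
    DegRow("ERBB2", 3.1, 1e-8),
    DegRow("EGFR", 2.4, 1e-6),
    DegRow("MKI67", 2.0, 1e-4),
    DegRow("TOP2A", 1.8, 1e-3),
    DegRow("CCNB1", 1.5, 0.01),
    DegRow("GAPDH", 0.1, 0.9),
    DegRow("ACTB", -0.2, 0.8),
    DegRow("TP53", 0.4, 0.3),
    DegRow("ALB", -0.1, 0.95),
    DegRow("MYC", 0.9, 0.07),
]

# Hand-written stand-in for a surfaceome catalog.
TOY_SURFACEOME = frozenset({"ERBB2", "EGFR", "MSLN", "CLDN6"})


def significant_up(rows, min_lfc=1.0, max_padj=0.05):
    """Keep genes significantly up in tumor (both thresholds are assumptions)."""
    return [r for r in rows if r.log2_fold_change >= min_lfc and r.padj <= max_padj]


def surface_only(rows, surfaceome=TOY_SURFACEOME):
    """Keep genes whose product is in the surfaceome set."""
    return [r for r in rows if r.gene in surfaceome]


if __name__ == "__main__":
    sig = significant_up(TOY_DEGS)
    surface = surface_only(sig)
    print(f"DEGs: {len(TOY_DEGS)} total, {len(sig)} significant")
    print(f"surfaceome filter: {len(sig)} -> {len(surface)}")
    print([r.gene for r in surface])
```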

Why this exists

Two ecosystems in computational biology operate side-by-side and barely talk to each other:

Genomics (DESeq2, edgeR, Seurat, scanpy, TCGA, recount3) stops at "here are the interesting genes."
Protein design (RFdiffusion, ProteinMPNN, BindCraft, BoltzGen, AlphaFold, Boltz-2) starts from "given a target..."

The bridge between them is missing: "this gene is up in disease, low in healthy tissue, surface-exposed, has a known targetable site, here is a docked binder seed and a designed binder ranked by predicted affinity, with the receipts back to the patient cohort." People build it ad hoc, per project, and rarely reproducibly. bindsight ships that bridge as one tool.

What it does

  RNA-seq counts (bulk or sc)                       Designed protein binders
              │                                              ▲
              │                                              │
              ▼                                              │
   Differential expression  ──►  Surface-exposed  ──►  De novo backbone
   (pydeseq2 or DESeq2)         (SURFY)              (RFdiffusion / BindCraft / BoltzGen)
                                     │                       │
                                     ▼                       ▼
                              Targetable sites          Sequence design
                              (SURFACE-Bind)            (ProteinMPNN)
                                     │                       │
                                     ▼                       ▼
                              AlphaFoldDB structure     Affinity + structure
                                                        validation
                                                        (Boltz-2 / Chai-1r)
                                                              │
                                                              ▼
                                                  Multi-objective ranking
                                                              │
                                                              ▼
                                       Quarto report + RO-Crate (Zenodo)
                                       with full PROV-O provenance
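The multi-objective ranking step at the bottom of the diagram can be pictured as a weighted composite of normalized metrics. This README does not state bindsight's actual scoring formula, so the sketch below is one common approach under assumptions: the metric names (`affinity`, `confidence`, `length`), the weights, and the min-max normalization are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    affinity: float    # hypothetical predicted-affinity score, higher is better
    confidence: float  # hypothetical structure-confidence score, higher is better
    length: int        # binder length in residues, shorter is better


# Hypothetical weights; they sum to 1 so the composite stays in [0, 1].
WEIGHTS = {"affinity": 0.5, "confidence": 0.4, "length": 0.1}


def min_max(values, higher_is_better=True):
    """Scale values to [0, 1]; flip the scale when lower raw values are better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)  # no spread, so no candidate is favored
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def rank(candidates):
    """Return (name, composite_score) pairs, best first."""
    aff = min_max([c.affinity for c in candidates])
    conf = min_max([c.confidence for c in candidates])
    short = min_max([c.length for c in candidates], higher_is_better=False)
    scored = [
        (c.name,
         WEIGHTS["affinity"] * a + WEIGHTS["confidence"] * k + WEIGHTS["length"] * s)
        for c, a, k, s in zip(candidates, aff, conf, short)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    toy = [
        Candidate("binder_A", 0.9, 0.8, 80),
        Candidate("binder_B", 0.5, 0.9, 60),
        Candidate("binder_C", 0.1, 0.2, 120),
    ]
    for name, score in rank(toy):
        print(f"{name}: {score:.3f}")
```

A weighted sum is easy to explain and audit but hides trade-offs between objectives; a Pareto-front view is a common complement when no single weighting is defensible.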

Who it's for

Translational researchers who want a free, reproducible "data → designed binder" pipeline.
Clinical biologists who need an audit trail back from a binder to the patient cohort.
Method developers who want a held-out evaluation harness (rediscovery of known antigens) to benchmark new designers/validators.
Pharma early-discovery teams who want an open comparator they can extend with proprietary designers via the plugin interface.

What's distinctive

| | Existing protein-design tools | bindsight |
| --- | --- | --- |
| Input | Target structure | RNA-seq counts |
| Provenance | PDB + maybe a log | PROV-O JSON-LD + RO-Crate, audit trail to patient cohort |
| Hardware | HPC assumed | CPU laptop + offload to free Colab / Modal / Kaggle |
| Cost-awareness | None | `--dry-run` estimates GPU $ before running |
| Negative results | Discarded | Catalogued (`failure_taxonomy.parquet`) |
| Citability | Code dump | DOI per release, JSON-Schema-validated outputs, JOSS-style |

For the full landscape comparison, see ARCHITECTURE.md.
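To make the "audit trail to patient cohort" idea concrete, here is a minimal sketch of a PROV-O style JSON-LD graph and a walk back from a binder to the cohort. It uses the standard W3C `prov:` vocabulary (`prov:Entity`, `prov:wasDerivedFrom`), but the node identifiers, the `ex:` namespace, and the `trace` helper are hypothetical; bindsight's real schema is defined in its provenance module and may differ.

```python
import json

CONTEXT = {
    "prov": "http://www.w3.org/ns/prov#",
    "ex": "https://example.org/run/",  # placeholder namespace
}


def entity(entity_id, derived_from=None):
    """Build one prov:Entity node, optionally linked to its source entity."""
    node = {"@id": entity_id, "@type": "prov:Entity"}
    if derived_from is not None:
        node["prov:wasDerivedFrom"] = {"@id": derived_from}
    return node


# Hypothetical chain: cohort counts -> DEG table -> target -> designed binder.
DOC = {
    "@context": CONTEXT,
    "@graph": [
        entity("ex:cohort_counts"),
        entity("ex:deg_table", "ex:cohort_counts"),
        entity("ex:target_ERBB2", "ex:deg_table"),
        entity("ex:binder_001", "ex:target_ERBB2"),
    ],
}


def trace(doc, start_id):
    """Follow prov:wasDerivedFrom links from start_id back to the root entity."""
    by_id = {node["@id"]: node for node in doc["@graph"]}
    chain = [start_id]
    while "prov:wasDerivedFrom" in by_id[chain[-1]]:
        chain.append(by_id[chain[-1]]["prov:wasDerivedFrom"]["@id"])
    return chain


if __name__ == "__main__":
    print(json.dumps(DOC, indent=2))
    print(" <- ".join(trace(DOC, "ex:binder_001")))
```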

What works today (v0.1.0)

| Capability | Status | How to try |
| --- | --- | --- |
| Web UI: multi-page Streamlit app (Home / Demo / Run on my data / Browse / About) | ✅ ready | `bindsight ui` or Streamlit Cloud |
| `bindsight demo`: full discovery on shipped example + paper-style report | ✅ ready | `bindsight demo` |
| `bindsight discover`: your own RNA-seq cohort → ranked targets | ✅ ready | `bindsight discover my.yaml --out runs/x` |
| `bindsight rank`: multi-objective composite scoring of validated binders | ✅ ready | `bindsight rank runs/x` |
| `bindsight report --format html`: paper-style HTML, embedded volcano + tables + provenance | ✅ ready | `bindsight report runs/x` |
| `bindsight report --format streamlit`: interactive dashboard for one run | ✅ ready | `bindsight report runs/x --format streamlit` |
| `bindsight run`: full pipeline orchestrator (discover → design → validate → rank → report → export) | ✅ ready | `bindsight run my.yaml --out runs/x` |
| `bindsight export`: RO-Crate zip for Zenodo deposit | ✅ ready | `bindsight export runs/x --out runs/x.crate.zip` |
| `bindsight design --dry-run`: GPU cost estimate for any backend | ✅ ready | `bindsight design runs/x --backend modal --dry-run` |
| `bindsight doctor`: diagnose deps, caches, vendored data | ✅ ready | `bindsight doctor` |
| `bindsight verify-licenses`: per-component license inventory | ✅ ready | `bindsight verify-licenses` |
| GPU design notebook: RFdiffusion + ProteinMPNN + Boltz-2 wired in templated Colab notebook | ✅ ready | `bindsight design runs/x --backend colab` opens a notebook with real install + inference cells |
| Manual Colab recipe: step-by-step for the GPU half | ✅ ready | docs/colab-design-howto.md |
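The `--dry-run` row above promises a GPU cost estimate before anything runs. How bindsight computes that figure is not described here; the arithmetic below is a rough sketch under stated assumptions. The seconds-per-trajectory figure and the hourly rate are made-up placeholders, and `estimate_cost` is a hypothetical helper, not part of the bindsight API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuRate:
    name: str
    usd_per_hour: float  # placeholder price, not a quote from any provider


def estimate_cost(n_targets, trajectories_per_target, seconds_per_trajectory, rate):
    """Return (gpu_hours, usd) for a design run, assuming strictly serial trajectories."""
    total_seconds = n_targets * trajectories_per_target * seconds_per_trajectory
    gpu_hours = total_seconds / 3600
    return gpu_hours, gpu_hours * rate.usd_per_hour


if __name__ == "__main__":
    # 5 targets x 50 trajectories; 72 s per trajectory is an invented figure.
    rate = GpuRate("hypothetical-gpu", usd_per_hour=2.0)
    hours, usd = estimate_cost(5, 50, 72, rate)
    print(f"{hours:.1f} GPU-hours, about ${usd:.2f} on {rate.name}")
```

With these placeholder inputs the run comes to 18,000 GPU-seconds, which is 5.0 GPU-hours and $10.00 at the assumed rate. Real numbers depend on target size, backend, and GPU type.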

Status & roadmap

✅ v0.1.0 (current) — discovery + rank + report + export + web UI + real Colab notebook patterns
🔬 v0.1.x — first-user GPU validation (someone with a real GPU runs the Colab notebook end-to-end and reports any install/inference issues; we patch fast)
⏳ v0.2.0 — live Modal/Colab job submission via API, BindCraft + BoltzGen plugins fully wired, scRNA-seq input
⏳ v1.0.0 — JOSS submission + validation paper (rediscovery of HER2/EGFR/MSLN/CLDN6 from blinded TCGA cohorts)

See ARCHITECTURE.md § Phased Roadmap for details.

Install

bindsight is not yet on PyPI. Install from source (Windows / macOS / Linux, Python 3.11+):

git clone <repo-url> bindsight
cd bindsight
python -m venv .venv

# Windows
.venv\Scripts\activate

# macOS / Linux
source .venv/bin/activate

pip install -e ".[dev,discover,report]"
bindsight --version
bindsight doctor                # confirm install is clean
bindsight demo                  # run the 60-second demo

For Conda users, envs/discover.yaml provides the same set of dependencies:

mamba env create -f envs/discover.yaml
mamba activate bindsight-discover
pip install -e ".[dev,report]"
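If an install misbehaves, `bindsight doctor` is the supported diagnostic. As a rough illustration of the kind of check such a tool might perform (this is not its actual implementation), the sketch below verifies the Python version and probes for importable modules. The module names probed are assumptions; the authoritative dependency list lives in pyproject.toml.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 11)  # from the install note above
# ASSUMPTION: top-level module names to probe; adjust to match pyproject.toml.
MODULES = ("pydeseq2", "streamlit")


def check_environment(modules=MODULES, min_python=MIN_PYTHON):
    """Return a dict mapping each check name to True (ok) or False (missing)."""
    report = {"python": sys.version_info[:2] >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for check, ok in check_environment().items():
        print(f"{'ok     ' if ok else 'MISSING'}  {check}")
```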

Quickstart (target: v0.0.x)

# 1. Discover targets from a TCGA cohort (CPU only, ~10 minutes on a laptop)
bindsight discover examples/tcga_luad.yaml --out runs/luad_v01

# 2. Inspect the discovered targets
bindsight report runs/luad_v01 --format html
open runs/luad_v01/report.html    # macOS; use xdg-open on Linux or start on Windows

# 3. (v0.1+) Design binders for the top 5 targets via Colab GPU
bindsight design runs/luad_v01 --backend colab --trajectories 50

# 4. (v0.1+) Validate with Boltz-2
bindsight validate runs/luad_v01 --backend colab --validator boltz2

# 5. (v0.1+) Rank, report, export as RO-Crate
bindsight rank runs/luad_v01
bindsight report runs/luad_v01 --format html --include-binders
bindsight export runs/luad_v01 --format ro-crate --out runs/luad_v01.crate.zip
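Step 5 ends with an RO-Crate export. For readers unfamiliar with the format, the sketch below packs a run directory into a zip with a minimal `ro-crate-metadata.json` following the RO-Crate 1.1 layout (metadata descriptor, root dataset, one `File` entity per file). It illustrates the format only; what `bindsight export` actually writes is not shown in this README, and `build_metadata` and `write_crate` are hypothetical helpers.

```python
import json
import tempfile
import zipfile
from pathlib import Path


def build_metadata(file_names, name, description, date_published, license_url):
    """Return a minimal RO-Crate 1.1 metadata document as a dict."""
    graph = [
        {   # metadata descriptor: points at the root dataset
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # root dataset: describes the crate as a whole
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date_published,
            "license": {"@id": license_url},
            "hasPart": [{"@id": f} for f in file_names],
        },
    ]
    graph += [{"@id": f, "@type": "File"} for f in file_names]
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


def write_crate(run_dir: Path, zip_path: Path, **meta) -> list[str]:
    """Zip every file under run_dir plus the generated metadata file."""
    files = sorted(
        p.relative_to(run_dir).as_posix() for p in run_dir.rglob("*") if p.is_file()
    )
    metadata = build_metadata(files, **meta)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("ro-crate-metadata.json", json.dumps(metadata, indent=2))
        for rel in files:
            zf.write(run_dir / rel, arcname=rel)
    return files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        run = Path(tmp) / "run"
        run.mkdir()
        (run / "report.html").write_text("<html></html>")  # stand-in report
        packed = write_crate(
            run, Path(tmp) / "run.crate.zip",
            name="toy run", description="illustrative crate",
            date_published="2026-01-01",
            license_url="https://spdx.org/licenses/MIT",
        )
        print("packed:", packed)
```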

Repository layout

bindsight/                 # Python package
├── io/                   # Parquet, FASTA, PDB, mmCIF, manifest readers
├── deg/                  # pydeseq2 wrapper (+ optional R bridge)
├── targets/              # Open Targets, HPA, GTEx, recount3 clients
├── surfaceome/           # SURFY filter + SURFACE-Bind client
├── structures/           # AlphaFoldDB + RCSB/PDBe fetch
├── epitopes/             # SURFACE-Bind site lookup; fpocket fallback (v0.2)
├── design/               # Designer plugin interface
├── runners/              # Colab / Modal / Kaggle / local-Docker adapters
├── validate/             # Boltz-2 default; Chai-1r, AF2-IG opt-in
├── rank/                 # Multi-objective scoring
├── provenance/           # PROV-O JSON-LD schema + RO-Crate emitter
├── report/               # Quarto template + Streamlit app
└── cli.py                # Click entrypoint

envs/                     # Conda environment files (one per stage)
examples/                 # Example pipeline configs (TCGA-LUAD, etc.)
tests/                    # Pytest smoke + integration tests + fixtures
docs/                     # mkdocs-material site source
.github/workflows/        # CI + Zenodo deposit on tag

ARCHITECTURE.md           # Architectural source of truth
LICENSING.md              # Per-dependency license inventory
CONTRIBUTING.md           # How to contribute
CHANGELOG.md              # Per-version changes
CITATION.cff              # Zenodo / GitHub citation metadata
Snakefile                 # Snakemake DAG
pyproject.toml            # Python packaging
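The `design/` entry above mentions a designer plugin interface. Its real contract is documented in ARCHITECTURE.md; as a rough illustration of how such an interface is often shaped in Python, the sketch below uses a `typing.Protocol` and a small registry. Every name here (`Designer`, `DesignRequest`, `BinderDesign`, `register`, `PolyAlanineDesigner`) is hypothetical and not taken from the bindsight codebase.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class DesignRequest:
    target_id: str
    n_designs: int


@dataclass(frozen=True)
class BinderDesign:
    target_id: str
    sequence: str
    designer: str


@runtime_checkable
class Designer(Protocol):
    """Anything with a name and a design() method counts as a designer."""
    name: str

    def design(self, request: DesignRequest) -> list[BinderDesign]: ...


class PolyAlanineDesigner:
    """Stand-in backend that emits placeholder sequences, useful only for tests."""
    name = "poly_ala"

    def design(self, request: DesignRequest) -> list[BinderDesign]:
        return [
            BinderDesign(request.target_id, "A" * (10 + i), self.name)
            for i in range(request.n_designs)
        ]


REGISTRY: dict[str, Designer] = {}


def register(designer: Designer) -> None:
    """Add a designer to the registry after a structural type check."""
    if not isinstance(designer, Designer):
        raise TypeError("object does not implement the Designer protocol")
    REGISTRY[designer.name] = designer


if __name__ == "__main__":
    register(PolyAlanineDesigner())
    for d in REGISTRY["poly_ala"].design(DesignRequest("ERBB2", 3)):
        print(d.designer, d.target_id, d.sequence)
```

A structural protocol lets a proprietary designer plug in without inheriting from any base class in the host package, which suits the "extend with proprietary designers" use case described earlier.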

Documentation

ARCHITECTURE.md — system design, module contracts, design rationale
LICENSING.md — per-dependency license inventory and commercial-use guidance
CONTRIBUTING.md — dev setup, testing, commit conventions
CHANGELOG.md — per-version changes
docs/ — long-form docs (built with mkdocs build)

Acknowledgments

bindsight is an opinionated wrapper. Real intellectual credit belongs to the upstream tool authors. See LICENSING.md for the full inventory; the work this builds on most directly:

SURFACE-Bind (Khakzad et al., PNAS 2025) — the targetable-sites catalog that makes the bridge tractable
pydeseq2 (Muzellec et al., Bioinformatics 2023) — Python DESeq2 implementation
RFdiffusion (Watson et al., Nature 2023) — backbone generation
ProteinMPNN (Dauparas et al., Science 2022) — sequence design
Boltz-2 (Wohlwend et al., 2025) — structure + affinity prediction
BindCraft (Pacesa et al., Nature 2025) — one-shot binder design
Snakemake (Mölder et al., F1000Research 2021) — workflow orchestration

Citation

If you use bindsight in your work, please cite it via the Zenodo DOI:

Atef Rizk, M. (2026). bindsight: a reproducible bridge from RNA-seq to de novo protein binder design (v0.1.0). Zenodo. https://doi.org/10.5281/zenodo.20121496

BibTeX:

@software{atefrizk_bindsight_2026,
  author       = {Atef Rizk, Mikhaeel},
  title        = {bindsight: a reproducible bridge from RNA-seq to de novo protein binder design},
  year         = {2026},
  publisher    = {Zenodo},
  version      = {v0.1.0},
  doi          = {10.5281/zenodo.20121496},
  url          = {https://doi.org/10.5281/zenodo.20121496},
  orcid        = {https://orcid.org/0009-0006-1069-9558}
}

GitHub also exposes a "Cite this repository" button on the right sidebar of the repo page that auto-generates citations in BibTeX, APA, and other formats from CITATION.cff. Please also cite the upstream tools you used (the per-run manifest emits a software.bib to make this easy).

About the author

bindsight is built and maintained by Mikhaeel Atef Rizk — PharmD graduate of the German University in Cairo (GUC), currently finishing the Egyptian post-PharmD applied-pharmacy term (Imtiyaz). Earlier in 2026 he had a research rotation at the German International University in Berlin (GIU Berlin) where he picked up R / RStudio.

ORCID: 0009-0006-1069-9558
GitHub: @mikhaeelatefrizk
Languages: Arabic (native), English (full professional), German (professional working ≈ B2), French, Russian

Sister projects on GitHub

bindsight sits at the deep end of an ongoing bioinformatics portfolio:

bioinformatics-portfolio — an end-to-end bioinformatics portfolio with three subprojects, each fully reproducible from raw data to figures:
- 01-rnaseq-fox-domestication — RNA-seq differential expression on GEO GSE76517, replicating the Kukekova et al. PNAS 2018 silver-fox domestication study
- 02-tcga-survival-kidney-cancer — TCGA-KIRC clinical survival analysis identifying EPAS1 / HIF-2α as a prognostic biomarker (target of FDA-approved belzutifan)
- 03-scrnaseq-pbmc-seurat — Seurat v5 single-cell RNA-seq workflow on the 10x PBMC 3k dataset, recovering 8 immune populations
affect-labeling-review — a pre-registered systematic review + meta-analysis of affect labeling (Lieberman et al. 2007 paradigm). Real random-effects meta-analysis (k=9), PRISMA 2020, RoB 2 / ROBINS-I, ~14,000-word manuscript, open data + open code, .zenodo.json for citable archival
awesome-protein-design-software — curated list of protein-design / structure-prediction software (RFdiffusion, ProteinMPNN, Boltz, AlphaFold, ESMFold, etc.)
Awesome-Bioinformatics — curated list of bioinformatics libraries and tools

License

MIT. See LICENSE and LICENSING.md for component-level details.
