{"id":49758007,"url":"https://github.com/mikhaeelatefrizk/bindsight","last_synced_at":"2026-06-26T03:00:21.723Z","repository":{"id":357012091,"uuid":"1234980906","full_name":"mikhaeelatefrizk/bindsight","owner":"mikhaeelatefrizk","description":"RNA-seq counts to ranked de novo protein binder candidates, with full provenance back to the patient cohort.","archived":false,"fork":false,"pushed_at":"2026-06-24T21:42:39.000Z","size":1840,"stargazers_count":1,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-24T23:07:17.121Z","etag":null,"topics":["alphafold","bioinformatics","boltz","computational-biology","de-novo-binder-design","protein-design","proteinmpnn","prov-o","reproducibility","rfdiffusion","rna-seq","ro-crate","streamlit"],"latest_commit_sha":null,"homepage":"https://mikhaeelatefrizk.github.io/bindsight/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mikhaeelatefrizk.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-10T22:13:24.000Z","updated_at":"2026-06-24T21:31:12.000Z","dependencies_parsed_at":null,"dependency_job_id":"51f76c7f-a99a-4f3f-8a9a-27caffde44e5","html_url":"https://github.com/mikhaeelatefrizk/bindsight","commit_stats":null,"previous_names":["mikhaeelatefrizk/bindsight"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/mikhaeelatefrizk/bindsight","repository
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mikhaeelatefrizk%2Fbindsight","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mikhaeelatefrizk%2Fbindsight/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mikhaeelatefrizk%2Fbindsight/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mikhaeelatefrizk%2Fbindsight/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mikhaeelatefrizk","download_url":"https://codeload.github.com/mikhaeelatefrizk/bindsight/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mikhaeelatefrizk%2Fbindsight/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34801014,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-26T02:00:06.560Z","response_time":106,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alphafold","bioinformatics","boltz","computational-biology","de-novo-binder-design","protein-design","proteinmpnn","prov-o","reproducibility","rfdiffusion","rna-seq","ro-crate","streamlit"],"created_at":"2026-05-11T00:02:21.873Z","updated_at":"2026-06-26T03:00:21.699Z","avatar_url":"https://github.com/mikhaeelatefrizk.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 
bindsight\n\n\u003e **Expression → Binder.** The first open-source pipeline that takes RNA-seq counts and outputs ranked de novo protein binder candidates, with full provenance back to the patient cohort.\n\n[![HF Space](https://img.shields.io/badge/%F0%9F%A4%97%20HF%20Space-bindsight-yellow.svg)](https://huggingface.co/spaces/Mikhaeelatefrizk/bindsight)\n[![Open in Streamlit](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://bindsight.streamlit.app/)\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20121496.svg)](https://doi.org/10.5281/zenodo.20121496)\n[![License: AGPL v3](https://img.shields.io/badge/License-AGPL_v3-blue.svg)](LICENSE)\n[![Python: 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)\n[![CI](https://github.com/mikhaeelatefrizk/bindsight/actions/workflows/ci.yml/badge.svg)](https://github.com/mikhaeelatefrizk/bindsight/actions/workflows/ci.yml)\n[![Workflow: Snakemake](https://img.shields.io/badge/workflow-Snakemake-brightgreen.svg)](https://snakemake.github.io/)\n\n## 👉 Try it live\n\n- **Primary** (Hugging Face Space, CPU with 16 GB RAM): **[huggingface.co/spaces/Mikhaeelatefrizk/bindsight](https://huggingface.co/spaces/Mikhaeelatefrizk/bindsight)**\n- **Mirror** (Streamlit Community Cloud, CPU with 1 GB RAM): [bindsight.streamlit.app](https://bindsight.streamlit.app/)\n\nZero install — runs in your browser. Click the **Demo** tab and watch the **discovery half** surface antibody-tractable cell-surface antigens from a **real TCGA breast-cancer cohort** (NIH/GDC), with full provenance. (Binder *design* and *validation* need a GPU, so they don't execute in the browser; you run them yourself on a GPU backend such as Modal, local Docker, Kaggle, or Colab.)\n\n\u003e Both hosts are free-tier and will sleep after several days without traffic; a GitHub Actions cron pings both URLs every 6 hours so the next visitor lands on a warm container. 
If you hit either link after a long quiet stretch, give the wake-up screen 30–60 s and reload once.\n\n\u003e 🚀 **v0.2.0** — discovery half end-to-end on CPU (real TCGA data); design + validation now **proven** end-to-end on a **free GPU** — bindsight's first real de novo binder designs (20 ERBB2 designs, best ipTM 0.84, 50% success@0.65, with the real Boltz-2-predicted complexes) ship in the [designer benchmark](benchmarks/designer_benchmark/RESULTS.md); web UI deployed on Streamlit Cloud.\n\n**New here?** → [What is bindsight?](docs/what-is-bindsight.md) (5-min read) · [How to use it](docs/how-to-use.md) · [Use cases](docs/use-cases.md) · [Designing on Colab](docs/colab-design-howto.md)\n\n---\n\n## Three ways to try it\n\n### 1. Web app — [Hugging Face Space](https://huggingface.co/spaces/Mikhaeelatefrizk/bindsight) (zero install) · [Streamlit mirror](https://bindsight.streamlit.app/)\n\nAnyone visiting either URL above gets:\n- A Home page explaining what bindsight is\n- A **Demo** button that runs the discovery half live and renders a report\n- A **Run on my data** page (upload counts.tsv + design.tsv → get results)\n- A **Browse a run** page to inspect any output directory\n\nThe Hugging Face Space is the primary host (CPU with 16 GB RAM). The Streamlit Cloud deploy at `bindsight.streamlit.app` is the same app on smaller free-tier infrastructure (CPU with 1 GB RAM). Both hosts sleep after several days of inactivity; a 6-hourly GitHub Actions ping keeps them warm, but the very first visit after a long quiet period can still take ~30–120 s to wake.\n\n### 2. Local web app (one command to launch)\n\n```bash\npip install -e \".[discover,report]\"\nbindsight ui\n# → opens http://localhost:8501 with the same multi-page interface\n```\n\n### 3. CLI\n\n```bash\nbindsight demo\n```\n\nRuns the full discovery half on a **real TCGA-BRCA tumor-vs-adjacent-normal cohort** auto-downloaded from NIH/GDC, and produces a real HTML report you can open in a browser. 
The pipeline discovers antibody-tractable cell-surface antigens over-expressed in tumor — entirely from RNA-seq counts, with full provenance (well-known targets such as ERBB2/HER2 surface among the candidates when their signal is present). First run needs internet (cohort + SURFY downloaded, then cached) and takes a few minutes of real DESeq2 + enrichment; CPU-only, no GPU.\n\n```\n$ bindsight demo\n╭──────────────── Demo run ────────────────╮\n│ Real TCGA-BRCA tumor-vs-adjacent-normal  │\n│ cohort (NIH/GDC). Discovers antibody-    │\n│ tractable cell-surface antigens, with    │\n│ full provenance.                         │\n╰──────────────────────────────────────────╯\nINFO  GDC: downloading TCGA-BRCA cohort (20 tumor + 20 normal)…\nINFO  SURFY cache empty; populating the full surfaceome list (2886)\nINFO  DEGs: 17019 total, 4011 significant; enriching top 300 up-regulated\nINFO  surfaceome filter: 300 → 42\nINFO  wrote runs/demo/report.html\n╭───────────── bindsight demo ─────────────╮\n│ Demo complete!                           │\n│ Report HTML: runs/demo/report.html       │\n╰──────────────────────────────────────────╯\n```\n\n---\n\n## Why this exists\n\nTwo ecosystems in computational biology operate side-by-side and barely talk to each other:\n\n- **Genomics** (DESeq2, edgeR, Seurat, scanpy, TCGA, recount3) stops at *\"here are the interesting genes.\"*\n- **Protein design** (RFdiffusion, ProteinMPNN, BindCraft, BoltzGen, AlphaFold, Boltz-2) starts from *\"given a target...\"*\n\nThe bridge between them — *\"this gene is up in disease, low in healthy tissue, surface-exposed, has a known targetable site, here is a docked binder seed and a designed binder ranked by predicted affinity, with the receipts back to the patient cohort\"* — is missing. People build it ad-hoc, per project, never reproducibly. 
**bindsight ships that bridge as one tool.**\n\n## What it does\n\n```\n  RNA-seq counts (bulk or sc)                       Designed protein binders\n              │                                              ▲\n              │                                              │\n              ▼                                              │\n   Differential expression  ──►  Surface-exposed  ──►  De novo backbone\n   (pydeseq2 or DESeq2)         (SURFY)              (RFdiffusion / BindCraft / BoltzGen)\n                                     │                       │\n                                     ▼                       ▼\n                              Targetable sites          Sequence design\n                              (SURFACE-Bind, v0.2)      (ProteinMPNN)\n                                     │                       │\n                                     ▼                       ▼\n                              AlphaFoldDB structure     Affinity + structure\n                                                        validation\n                                                        (Boltz-2 / Chai-1r)\n                                                              │\n                                                              ▼\n                                                  Multi-objective ranking\n                                                              │\n                                                              ▼\n                                       HTML report + RO-Crate (Zenodo)\n                                       with full PROV-O provenance\n```\n\n## Who it's for\n\n- **Translational researchers** who want a free, reproducible \"data → designed binder\" pipeline.\n- **Clinical biologists** who need an audit trail back from a binder to the patient cohort.\n- **Method developers** who want a held-out evaluation harness (rediscovery of known antigens) to benchmark new designers/validators.\n- **Pharma early-discovery teams** who 
want an open comparator they can extend with proprietary designers via the plugin interface.\n\n## What's distinctive\n\n| | Existing protein-design tools | bindsight |\n|---|---|---|\n| Input | Target structure | RNA-seq counts |\n| Provenance | PDB + maybe a log | PROV-O JSON-LD + RO-Crate, audit trail to patient cohort |\n| Hardware | HPC assumed | CPU laptop + offload to free Colab / Modal / Kaggle |\n| Cost-awareness | None | `--dry-run` estimates GPU cost (USD) before running |\n| Negative results | Discarded | Catalogued (`failure_taxonomy.parquet`) |\n| Citability | Code dump | DOI per release, JSON-Schema-validated outputs, JOSS-style |\n\nFor the full landscape comparison, see [ARCHITECTURE.md](ARCHITECTURE.md#8-comparison-vs-existing-tools).\n\n## What works today (v0.2.0)\n\n| Capability | Status | How to try |\n|---|---|---|\n| **Web UI** — multi-page Streamlit app (Home / Demo / Run on my data / Browse / About) | ✅ ready | `bindsight ui` *or* Hugging Face Space / Streamlit Cloud |\n| **`bindsight demo`** — full discovery on the auto-downloaded TCGA-BRCA demo cohort + paper-style report | ✅ ready | `bindsight demo` |\n| **`bindsight discover`** — your own RNA-seq cohort → ranked targets | ✅ ready | `bindsight discover my.yaml --out runs/x` |\n| **`bindsight rank`** — multi-objective composite scoring of validated binders | ✅ ready | `bindsight rank runs/x` |\n| **`bindsight report --format html`** — paper-style HTML, embedded volcano + tables + provenance | ✅ ready | `bindsight report runs/x` |\n| **`bindsight report --format streamlit`** — interactive dashboard for one run | ✅ ready | `bindsight report runs/x --format streamlit` |\n| **`bindsight run`** — full pipeline orchestrator (discover → design → validate → rank → report → export) | ✅ ready | `bindsight run my.yaml --out runs/x` |\n| **`bindsight export`** — RO-Crate zip for Zenodo deposit | ✅ ready | `bindsight export runs/x --out runs/x.crate.zip` |\n| **`bindsight design`** — RFdiffusion + ProteinMPNN + Boltz-2 (and BindCraft / BoltzGen / 
Chai-1r / AF2-IG) run end-to-end on a GPU backend | ✅ ready | `bindsight design runs/x --backend modal` (or `local_docker` / `kaggle` / `colab`) |\n| **`bindsight design --dry-run`** — GPU cost estimate for any backend | ✅ ready | `bindsight design runs/x --backend modal --dry-run` |\n| **`bindsight validate`** — materialise structure/affinity metrics → `validated.parquet` | ✅ ready | `bindsight validate runs/x` |\n| **`bindsight benchmark`** — score rediscovery of the held-out known antigens (recall@k) | ✅ ready | `bindsight benchmark runs/x --known-antigens benchmarks/known.tsv` |\n| **Snakemake front-end** — same pipeline as the CLI, end-to-end | ✅ ready | `snakemake --configfile my.yaml --cores 4` (`pip install -e \".[workflow]\"`) |\n| **`bindsight doctor`** — diagnose deps, caches, vendored data | ✅ ready | `bindsight doctor` |\n| **`bindsight verify-licenses`** — per-component license inventory | ✅ ready | `bindsight verify-licenses` |\n\n\u003e **Note on GPU stages.** The design/validation models require CUDA, so they\n\u003e run on the GPU backend you choose (Modal / local Docker / Kaggle, or a\n\u003e generated Colab notebook), not on the CPU host. The held-out evaluation set\n\u003e lives in [`benchmarks/`](benchmarks/) with full provenance.\n\n\u003e **Discovery quality filters (opt-in).** Beyond the core\n\u003e DE → surfaceome → structure path, discovery can apply real-data refinements via\n\u003e `target_discovery` config flags: an AlphaFold-pLDDT disorder gate\n\u003e (`min_mean_plddt`), UniProt extracellular-domain / topology restriction\n\u003e (`use_uniprot_topology`, `require_extracellular_domain`), and GTEx normal-tissue\n\u003e safety (`use_gtex_safety`) — each adds a negative-result disposition and a\n\u003e per-candidate column. 
Binder developability scoring (Biopython ProtParam) is a\n\u003e ranking component; an ESM-2 → PCA embedding visualizer (`pip install -e \".[embed]\"`)\n\u003e shows the designed-binder sequence space before any GPU spend; and the report\n\u003e carries a Limitations section (mRNA ≠ surface protein, bulk-purity confounding).\n\u003e All are documented in the [CHANGELOG](CHANGELOG.md).\n\n## Status \u0026 roadmap\n\n- ✅ **v0.2.0** (current) — everything in v0.1.0 (discovery on real TCGA data; full design half — RFdiffusion + ProteinMPNN + Boltz-2, plus BindCraft / BoltzGen / Chai-1r / AF2-IG — on Modal / local Docker / Kaggle / Colab; rank + report + export; benchmark + held-out eval set; CLI **and** Snakemake front-ends; web UI) **plus** the first real de novo binders, the free Kaggle split-environment backend, the negative-result taxonomy, SURFACE-Bind targetable-site lookup, opt-in discovery-quality filters (AlphaFold-pLDDT disorder gate, UniProt extracellular-domain/topology restriction, GTEx normal-tissue safety), binder developability scoring, an ESM-2 pre-GPU embedding visualizer, and surfaced discovery caveats (mRNA ≠ surface protein, bulk-purity confounding).\n- ✅ **Rediscovery validation** — the discovery half, run on six real indication-matched TCGA cohorts, resurfaces **ERBB2 at rank 4** in HER2-enriched breast cancer (via PAM50 subtype stratification — versus rank 25 in the unsplit BRCA cohort, where averaging across subtypes dilutes the HER2 signal) and is specific (non-over-expressed antigens such as EGFR/CEA are correctly not surfaced). 
Reproducible artifacts in [`benchmarks/validation/`](benchmarks/validation/RESULTS.md); write-up in [`paper/validation/`](paper/validation/manuscript.md).\n- ✅ **De novo binder design validated in silico** — the design half (RFdiffusion → ProteinMPNN → Boltz-2) run on a **free Kaggle Tesla P100** produced **20 real binder designs** against the ERBB2 extracellular **domain IV** (the clinically validated trastuzumab epitope): mean **ipTM 0.59**, best **0.84**, **50%** of designs pass the ipTM ≥ 0.65 success bar (mean PAE-interaction 13.7 Å) — at **$0**, no local GPU. The real Boltz-2-predicted **complexes** (CIF) + FASTAs + per-design metrics are in [`benchmarks/designer_benchmark/RESULTS.md`](benchmarks/designer_benchmark/RESULTS.md); reproduce on a free GPU via [`RUN_FREE_GPU.md`](benchmarks/designer_benchmark/RUN_FREE_GPU.md).\n- ⏳ **v0.3.0** — single-cell RNA-seq input, async (non-blocking) Modal job submission, and extending the [designer benchmark](benchmarks/designer_benchmark/DESIGNER_BENCHMARK.md) from the committed `rfdiff_mpnn` arm to the full three-way comparison (BindCraft / BoltzGen need GPUs with ≥24–32 GB of memory, so those arms run on paid backends).\n- ⏳ **v1.0.0** — JOSS submission; multi-modal tumor-selectivity scoring (single-cell + co-expression + immunopeptidomics) to extend discovery beyond bulk differential expression.\n\nSee [ARCHITECTURE.md § Phased Roadmap](ARCHITECTURE.md#11-phased-roadmap) for details.\n\n## Install\n\n`bindsight` is not yet on PyPI. 
Install from source (Windows / macOS / Linux,\nPython 3.11+):\n\n```bash\ngit clone https://github.com/mikhaeelatefrizk/bindsight.git bindsight\ncd bindsight\npython -m venv .venv\n\n# Windows\n.venv\\Scripts\\activate\n\n# macOS / Linux\nsource .venv/bin/activate\n\npip install -e \".[dev,discover,report]\"\nbindsight --version\nbindsight doctor                # confirm install is clean\nbindsight demo                  # run the discovery demo (a few minutes on first run)\n```\n\nFor Conda users, `envs/discover.yaml` provides the discovery dependencies; install the package on top of it:\n\n```bash\nmamba env create -f envs/discover.yaml\nmamba activate bindsight-discover\npip install -e \".[dev,report]\"\n```\n\n## Quickstart\n\n```bash\n# 1. Discover targets from a TCGA cohort (CPU only, ~10 minutes on a laptop)\nbindsight discover examples/tcga_luad.yaml --out runs/luad_v01\n\n# 2. Inspect the discovered targets\nbindsight report runs/luad_v01 --format html\nopen runs/luad_v01/report.html   # macOS; use xdg-open on Linux or start on Windows\n\n# 3. (v0.1+) Design binders for the top 5 targets via Colab GPU\nbindsight design runs/luad_v01 --backend colab --trajectories 50\n\n# 4. (v0.1+) Validate with Boltz-2\nbindsight validate runs/luad_v01 --backend colab --validator boltz2\n\n# 5. 
(v0.1+) Rank, report, export as RO-Crate\nbindsight rank runs/luad_v01\nbindsight report runs/luad_v01 --format html --include-binders\nbindsight export runs/luad_v01 --format ro-crate --out runs/luad_v01.crate.zip\n```\n\n## Repository layout\n\n```\nbindsight/                 # Python package\n├── io/                   # Parquet, FASTA, PDB, mmCIF, manifest readers\n├── deg/                  # pydeseq2 wrapper (+ optional R bridge)\n├── targets/              # Open Targets client + ENSG→UniProt fallback + GTEx safety\n├── surfaceome/           # SURFY filter + SURFACE-Bind client\n├── structures/           # AlphaFoldDB + RCSB/PDBe fetch; pLDDT + UniProt topology\n├── epitopes/             # SURFACE-Bind site lookup; fpocket fallback (v0.2)\n├── design/               # Designer plugin interface; developability + ESM-2 embeddings\n├── runners/              # Colab / Modal / Kaggle / local-Docker adapters\n├── validate/             # Boltz-2 default; Chai-1r, AF2-IG opt-in\n├── rank/                 # Multi-objective scoring\n├── benchmark/            # Rediscovery + designer-benchmark scoring harness\n├── pipelines/            # Discovery orchestrator (discover.py) + honesty caveats\n├── provenance/           # PROV-O JSON-LD schema + RO-Crate emitter\n├── report/               # HTML report template + Streamlit app\n├── config.py             # Pydantic run-configuration models\n└── cli.py                # Click entrypoint\n\nenvs/                     # Conda environment files (one per stage)\nexamples/                 # Example pipeline configs (TCGA-LUAD, etc.)\nbenchmarks/               # Held-out known-antigen eval set + validation \u0026 designer-benchmark harnesses\npaper/                    # JOSS + bioRxiv manuscripts and the validation write-up\ndata/                     # Local cache for auto-downloaded TCGA cohorts (gitignored)\ntests/                    # Pytest smoke + integration tests + fixtures\ndocs/                     # mkdocs-material site 
source\n.github/workflows/        # CI + Zenodo deposit on tag\n\nARCHITECTURE.md           # Architectural source of truth\nLICENSING.md              # Per-dependency license inventory\nCONTRIBUTING.md           # How to contribute\nCHANGELOG.md              # Per-version changes\nCITATION.cff              # Zenodo / GitHub citation metadata\nSnakefile                 # Snakemake DAG\npyproject.toml            # Python packaging\n```\n\n## Documentation\n\n- [ARCHITECTURE.md](ARCHITECTURE.md) — system design, module contracts, design rationale\n- [LICENSING.md](LICENSING.md) — per-dependency license inventory and commercial-use guidance\n- [CONTRIBUTING.md](CONTRIBUTING.md) — dev setup, testing, commit conventions\n- [CHANGELOG.md](CHANGELOG.md) — per-version changes\n- `docs/` — long-form docs (built with `mkdocs build`)\n\n## Acknowledgments\n\n`bindsight` is an opinionated wrapper. Real intellectual credit belongs to the upstream tool authors. See [LICENSING.md](LICENSING.md) for the full inventory; the work this builds on most directly:\n\n- [SURFACE-Bind](https://github.com/hamedkhakzad/SURFACE-Bind) (Khakzad et al., PNAS 2025) — the targetable-sites catalog that makes the bridge tractable\n- [pydeseq2](https://github.com/owkin/PyDESeq2) (Muzellec et al., Bioinformatics 2023) — Python DESeq2 implementation\n- [RFdiffusion](https://github.com/RosettaCommons/RFdiffusion) (Watson et al., Nature 2023) — backbone generation\n- [ProteinMPNN](https://github.com/dauparas/ProteinMPNN) (Dauparas et al., Science 2022) — sequence design\n- [Boltz-2](https://github.com/jwohlwend/boltz) (Passaro et al., bioRxiv 2025) — structure + affinity prediction\n- [BindCraft](https://github.com/martinpacesa/BindCraft) (Pacesa et al., Nature 2025) — one-shot binder design\n- [Snakemake](https://github.com/snakemake/snakemake) (Mölder et al., F1000Research 2021) — workflow orchestration\n\n## Citation\n\nIf you use `bindsight` in your work, please cite it via the Zenodo DOI:\n\n\u003e Wahba, 
M. A. R. (2026). *bindsight: a reproducible bridge from RNA-seq to de novo protein binder design* (v0.2.0). Zenodo. https://doi.org/10.5281/zenodo.20121496\n\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20121496.svg)](https://doi.org/10.5281/zenodo.20121496)\n\nBibTeX:\n\n```bibtex\n@software{wahba_bindsight_2026,\n  author       = {Wahba, Mikhaeel Atef Rizk},\n  title        = {bindsight: a reproducible bridge from RNA-seq to de novo protein binder design},\n  year         = {2026},\n  publisher    = {Zenodo},\n  version      = {v0.2.0},\n  doi          = {10.5281/zenodo.20121496},\n  url          = {https://doi.org/10.5281/zenodo.20121496},\n  orcid        = {https://orcid.org/0009-0006-1069-9558}\n}\n```\n\nGitHub also exposes a \"Cite this repository\" button on the right sidebar of the [repo page](https://github.com/mikhaeelatefrizk/bindsight) that auto-generates citations in BibTeX, APA, and other formats from [CITATION.cff](CITATION.cff). Please also cite the upstream tools you used (the per-run manifest emits a `software.bib` to make this easy).\n\n## About the author\n\n`bindsight` is built and maintained by **Mikhaeel Atef Rizk Wahba** — PharmD graduate of the German University in Cairo (GUC), currently finishing the Egyptian post-PharmD applied-pharmacy term (Imtiyaz). 
Earlier in 2026 he had a research rotation at the German International University in Berlin (GIU Berlin), where he picked up R and RStudio.\n\n- ORCID: [0009-0006-1069-9558](https://orcid.org/0009-0006-1069-9558)\n- GitHub: [@mikhaeelatefrizk](https://github.com/mikhaeelatefrizk)\n- Email: `mikhaeelatefrizk@proton.me`\n- Languages: Arabic (native), English (full professional), German (professional working ≈ B2), French, Russian\n\n### Sister projects on GitHub\n\n`bindsight` sits at the deep end of an ongoing bioinformatics portfolio:\n\n- **[bioinformatics-portfolio](https://github.com/mikhaeelatefrizk/bioinformatics-portfolio)** — an end-to-end bioinformatics portfolio with three subprojects, each fully reproducible from raw data to figures:\n  - [`01-rnaseq-fox-domestication`](https://github.com/mikhaeelatefrizk/bioinformatics-portfolio/tree/main/01-rnaseq-fox-domestication) — RNA-seq differential expression on GEO GSE76517, replicating the Kukekova et al. *PNAS* 2018 silver-fox domestication study\n  - [`02-tcga-survival-kidney-cancer`](https://github.com/mikhaeelatefrizk/bioinformatics-portfolio/tree/main/02-tcga-survival-kidney-cancer) — TCGA-KIRC clinical survival analysis identifying EPAS1 / HIF-2α as a prognostic biomarker (target of FDA-approved belzutifan)\n  - [`03-scrnaseq-pbmc-seurat`](https://github.com/mikhaeelatefrizk/bioinformatics-portfolio/tree/main/03-scrnaseq-pbmc-seurat) — Seurat v5 single-cell RNA-seq workflow on the 10x PBMC 3k dataset, recovering 8 immune populations\n- **[affect-labeling-review](https://github.com/mikhaeelatefrizk/affect-labeling-review)** — a pre-registered systematic review + meta-analysis of affect labeling (Lieberman et al. 2007 paradigm). 
Real random-effects meta-analysis (k=9), PRISMA 2020, RoB 2 / ROBINS-I, ~14,000-word manuscript, open data + open code, `.zenodo.json` for citable archival\n- **[awesome-protein-design-software](https://github.com/mikhaeelatefrizk/awesome-protein-design-software)** — curated list of protein-design / structure-prediction software (RFdiffusion, ProteinMPNN, Boltz, AlphaFold, ESMFold, etc.)\n- **[Awesome-Bioinformatics](https://github.com/mikhaeelatefrizk/Awesome-Bioinformatics)** — curated list of bioinformatics libraries and tools\n\n## License\n\n- **Code:** [GNU AGPL-3.0-or-later](LICENSE). You may use, study, modify, and\n  redistribute bindsight freely; if you distribute a modified version **or run it\n  as a network service**, you must make your source available under the same\n  license, with attribution preserved. See [LICENSING.md](LICENSING.md) for\n  component-level details (bindsight orchestrates external tools that keep their\n  own licenses).\n- **Documentation, manuscripts, figures, and generated results** (e.g. `paper/`):\n  [CC BY 4.0](paper/LICENSE) — reuse freely with attribution.\n\n© 2026 Mikhaeel Atef Rizk Wahba. Commercial licensing on other terms is available\nfrom the author on request.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmikhaeelatefrizk%2Fbindsight","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmikhaeelatefrizk%2Fbindsight","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmikhaeelatefrizk%2Fbindsight/lists"}