{"id":48640645,"url":"https://github.com/msk-access/kreview","last_synced_at":"2026-06-05T20:00:54.973Z","repository":{"id":350050771,"uuid":"1205041832","full_name":"msk-access/kreview","owner":"msk-access","description":"Advanced cfDNA Fragmentomics Core Evaluation Engine","archived":false,"fork":false,"pushed_at":"2026-06-01T21:53:26.000Z","size":4397,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-01T22:21:25.046Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://msk-access.github.io/kreview/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/msk-access.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-08T15:21:03.000Z","updated_at":"2026-06-01T21:51:32.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/msk-access/kreview","commit_stats":null,"previous_names":["msk-access/kreview"],"tags_count":15,"template":false,"template_full_name":null,"purl":"pkg:github/msk-access/kreview","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msk-access%2Fkreview","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msk-access%2Fkreview/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msk-access%2Fkreview/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ositories/msk-access%2Fkreview/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/msk-access","download_url":"https://codeload.github.com/msk-access/kreview/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msk-access%2Fkreview/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33957499,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-05T02:00:06.157Z","response_time":120,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-04-09T19:00:43.160Z","updated_at":"2026-06-05T20:00:54.949Z","avatar_url":"https://github.com/msk-access.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://img.shields.io/github/v/tag/msk-access/kreview?label=Release\u0026color=FF9B42\" alt=\"Release Badge\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/nbdev-Enabled-blue.svg\" alt=\"nbdev Badge\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Powered_by-DuckDB-yellow.svg\" alt=\"DuckDB Badge\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Reports-Quarto-blueviolet.svg\" alt=\"Quarto Badge\"\u003e\n  \u003ca href=\"https://deepwiki.com/msk-access/kreview\"\u003e\u003cimg src=\"https://deepwiki.com/badge.svg\" alt=\"Ask 
DeepWiki\"\u003e\u003c/a\u003e\n  \n  \u003ch1\u003ekreview\u003c/h1\u003e\n  \u003cp\u003e\u003cb\u003eAdvanced cfDNA Fragmentomics Core Evaluation Engine\u003c/b\u003e\u003c/p\u003e\n\u003c/div\u003e\n\n---\n\n## 🧬 Overview\n\n`kreview` is a production-grade, notebook-first (`nbdev`) evaluation engine designed for high-throughput cancer liquid biopsy fragmentomics feature analysis. Developed at Memorial Sloan Kettering (MSKCC), it processes cohorts containing tens of thousands of samples using an embedded DuckDB query engine with chunked I/O and automatic retry logic.\n\n📖 **[Full Documentation](https://msk-access.github.io/kreview/)**\n\n## 🚀 Features\n\n- **5-Tier ctDNA Taxonomy**: MSK-IMPACT paired-inference to label `True ctDNA+`, `Possible ctDNA+`, `Possible ctDNA−`, `Healthy Normal`, and `Insufficient Data`. Optional CH hotspot demotion via `--ch-hotspot-maf`.\n- **DuckDB Dynamic Data Lake**: In-memory `read_parquet` bindings with chunked I/O and exponential backoff retry. Builds a merged SQL-queryable `kreview_lake.duckdb` on demand.\n- **Multi-Model Evaluation**: Logistic Regression, Random Forest, and XGBoost (CPU) plus TabPFN and TabICL (GPU) with Stratified K-Fold CV, SHAP explainability, and subgroup analysis.\n- **Feature Selection**: [mRMR](https://github.com/smazzanti/mrmr) (Minimum Redundancy Maximum Relevance) as default strategy — iteratively selects features maximizing target relevance while minimizing inter-feature redundancy. 
Legacy `hybrid_union` (AUC ∪ MI) also available.\n- **Multimodal Stacking**: Cross-evaluator fusion via super-matrix with Mutual Information or [Boruta-SHAP](https://github.com/Ekeany/Boruta-Shap) selection, followed by stacking ensemble + ablation analysis.\n- **Interactive Dashboards**: Plotly-native HTML reports with ROC curves, violin plots, SHAP beeswarm/waterfall, mRMR scatter plots, per-cancer-type sensitivity tables, and Decision Curve Analysis.\n- **Nextflow HPC Integration**: Decomposed multistage DAG for SLURM-based HPC execution with per-evaluator parallelism, GPU scheduling, and automatic retry logic.\n- **26 Built-In Evaluators**: Modular extractors covering fragment sizes (FSC, FSD, FSR), nucleosome protection (WPS, TFBS), cleavage motifs (EndMotif, BreakPointMotif), chromatin accessibility (ATAC), motif divergence (MDS), and orientation (OCF).\n\n## 🏗️ Pipeline Architecture\n\n```mermaid\ngraph LR\n    A[Label] --\u003e B[\"Extract ×N\"]\n    B --\u003e C[Select]\n    C --\u003e D[\"Eval CPU\"]\n    C --\u003e E[\"Eval GPU\"]\n    C --\u003e F[Fuse]\n    D --\u003e G[Scoreboard]\n    E --\u003e G\n    D --\u003e I[\"Eval Multimodal\"]\n    E --\u003e I\n    F --\u003e I\n    G --\u003e H[Report]\n    I --\u003e J[\"Report Multimodal\"]\n```\n\nThe pipeline supports two modes:\n\n| Mode | Command | Use Case |\n|------|---------|----------|\n| **Monolithic** | `kreview run` | Single-machine, sequential execution |\n| **Multistage** | `nextflow run ... -profile iris` | HPC parallelism, per-evaluator scatter |\n\n## ⚙️ Quick Start\n\n### Installation\n\n\u003e [!IMPORTANT]\n\u003e **Quarto is strictly required** for programmatic dashboard generation. 
Because `quarto-cli` wrapper packages are unreliable across Python environments, `kreview` assumes the Quarto executable is installed dynamically on your OS or container.\n\n#### Option 1: Docker (Recommended \"Batteries-Included\" Method)\nThe easiest way to run `kreview` without managing external dependencies is to use our pre-built Docker containers (hosted on GHCR). They ship with `Python 3.12`, all ML libraries, and `quarto`:\n```bash\n# CPU image (~1.5 GB) — for all standard pipeline processes\ndocker pull ghcr.io/msk-access/kreview:latest\n\n# GPU image (~8-10 GB) — adds PyTorch, TabPFN, TabICL (requires NVIDIA drivers)\ndocker pull ghcr.io/msk-access/kreview:latest-gpu\n\n# Run\ndocker run -v /your/data:/data ghcr.io/msk-access/kreview:latest \\\n  kreview run --cancer-samplesheet /data/cancer.csv ...\n```\n\n#### Option 2: Local Install (Pip)\nIf you install via pip, you **must separately install Quarto** via your OS manager:\n1. **Install Quarto:** Follow the [official Quarto Installation Guide](https://quarto.org/docs/get-started/) (e.g. `brew install quarto` on macOS).\n2. **Install kreview:**\n```bash\ngit clone https://github.com/msk-access/kreview.git\ncd kreview\npip install -e .          
# CPU models only\npip install -e \".[gpu]\"   # + TabPFN, TabICL (requires CUDA)\n```\n\n### Running the Pipeline\n\n#### Local (Single Machine)\n\n```bash\nkreview run \\\n  --cancer-samplesheet \"/path/to/cancer/samplesheet.csv\" \\\n  --healthy-xs1-samplesheet \"/path/to/healthy/xs1/samplesheet.csv\" \\\n  --healthy-xs2-samplesheet \"/path/to/healthy/xs2/samplesheet.csv\" \\\n  --cbioportal-dir \"/path/to/cBioPortal_MAF_CNA_SV/\" \\\n  --krewlyzer-dir \"/path/to/unified_krewlyzer_results\" \\\n  --output output/ \\\n  --strategy mrmr \\\n  --top-percentile 10 \\\n  --compute-univariate-auc \\\n  --ch-hotspot-maf \"/path/to/ch_hotspots.maf\" \\\n  --export-duckdb\n```\n\n#### HPC (Nextflow + SLURM)\n\n```bash\nnextflow run /path/to/kreview/nextflow/main.nf \\\n  --cancer_samplesheet /path/to/cancer.csv \\\n  --healthy_xs1_samplesheet /path/to/healthy_xs1.csv \\\n  --healthy_xs2_samplesheet /path/to/healthy_xs2.csv \\\n  --cbioportal_dir /path/to/cbioportal/ \\\n  --krewlyzer_dir /path/to/manifest.txt \\\n  --outdir /path/to/output/ \\\n  --pipeline_mode multistage \\\n  --run_gpu_eval true \\\n  --gpu_models \"tabpfn,tabicl\" \\\n  --run_multimodal_eval true \\\n  -profile iris\n```\n\n### Dashboard Access\n\nOnce finished, open the generated HTML reports:\n```bash\nopen output/reports/ATAC_dashboard.html\n```\n\n## 🧪 Feature Selection\n\n| Strategy | Scope | Method | Default |\n|----------|-------|--------|---------|\n| `mrmr` | Single-evaluator | F-statistic relevance + Pearson redundancy penalty | ✅ |\n| `hybrid_union` | Single-evaluator | Top-X% AUC ∪ Top-X% MI | Legacy |\n| `mi` | Multimodal | Mutual Information top-K ranking | ✅ |\n| `boruta_shap` | Multimodal | SHAP importance vs shadow variables (50 trials) | Optional |\n\nSee [Statistical Evaluation](https://msk-access.github.io/kreview/machine-learning/statistical-tests/) for full documentation.\n\n## 📓 nbdev Architecture\n\nThis project operates as an `nbdev` repo. 
Do **not** edit the generated `.py` files in `kreview/` by hand. Make changes in the Jupyter notebooks under `nbs/`, then regenerate the modules with:\n```bash\nnbdev_export\n```\n\n## 📚 Resources\n\n- **[Documentation](https://msk-access.github.io/kreview/)** — Full user and developer guide\n- **[Contributing](CONTRIBUTING.md)** — How to contribute\n- **[Changelog](https://msk-access.github.io/kreview/changelog/)** — Version history\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmsk-access%2Fkreview","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmsk-access%2Fkreview","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmsk-access%2Fkreview/lists"}