{"id":49074642,"url":"https://github.com/jaydu1/crispyx","last_synced_at":"2026-04-20T09:31:14.339Z","repository":{"id":348017335,"uuid":"1073666619","full_name":"jaydu1/crispyx","owner":"jaydu1","description":"Streamlining CRISPR Screen Analysis","archived":false,"fork":false,"pushed_at":"2026-03-30T12:51:58.000Z","size":36371,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-30T14:22:55.971Z","etag":null,"topics":["crispr","differntial-gene-expression","on-disk","single-cell"],"latest_commit_sha":null,"homepage":"https://crispyx.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jaydu1.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"docs/contributing.rst","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-10T12:51:03.000Z","updated_at":"2026-03-30T12:55:17.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/jaydu1/crispyx","commit_stats":null,"previous_names":["jaydu1/crispyx"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/jaydu1/crispyx","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaydu1%2Fcrispyx","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaydu1%2Fcrispyx/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaydu1%2Fcrispyx/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaydu1%2Fcrispyx/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jaydu1","download_url":"https://codeload.github.com/jaydu1/crispyx/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaydu1%2Fcrispyx/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32041165,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-20T00:18:06.643Z","status":"online","status_checked_at":"2026-04-20T02:00:06.527Z","response_time":94,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crispr","differntial-gene-expression","on-disk","single-cell"],"created_at":"2026-04-20T09:31:13.626Z","updated_at":"2026-04-20T09:31:14.332Z","avatar_url":"https://github.com/jaydu1.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# crispyx\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)\n[![PyPI](https://img.shields.io/pypi/v/crispyx?label=pypi\u0026color=orange)](https://pypi.org/project/crispyx)\n[![PyPI Downloads](https://static.pepy.tech/personalized-badge/crispyx?period=total\u0026units=INTERNATIONAL_SYSTEM\u0026left_color=BLACK\u0026right_color=BRIGHTGREEN\u0026left_text=downloads)](https://pepy.tech/projects/crispyx)\n[![Tests](https://github.com/jaydu1/crispyx/actions/workflows/tests.yml/badge.svg)](https://github.com/jaydu1/crispyx/actions/workflows/tests.yml)\n\n## Motivation\n\nGenome-wide CRISPR screens routinely produce datasets with hundreds of thousands of cells and tens of thousands of genes. Standard single-cell analysis toolkits (Scanpy, Pertpy) load the entire count matrix into memory, which can require 30–100+ GB of RAM and makes many screens impractical to analyse on commodity hardware or shared HPC nodes with per-job memory limits.\n\n**crispyx** solves this by streaming data directly from on-disk AnnData (`.h5ad`) files. Quality control, normalisation, pseudo-bulk aggregation, and differential expression all operate without materialising the full matrix in memory, so even the largest screens can be processed with modest resources.\n\n## Features\n\n- **Streaming QC \u0026 preprocessing** – Filter cells, perturbations, and genes; normalise and log-transform; all without loading the full matrix into memory\n- **Pseudo-bulk aggregation** – Average log expression and pseudo-bulk count matrices for effect size estimation\n- **Differential expression** – t-test, Wilcoxon rank-sum, and negative binomial GLM with apeGLM LFC shrinkage; multi-core support and adaptive memory management\n- **Dimension reduction** – Memory-efficient PCA and KNN graph construction on backed data\n- **Scanpy-compatible API \u0026 plotting** – Familiar `cx.pp`, `cx.pb`, `cx.tl`, and `cx.pl` namespaces; Scanpy-style rank genes plots, volcano, MA, PCA, UMAP, QC summaries, and overlap heatmaps\n- **Data preparation utilities** – Edit backed metadata without loading X; standardise gene names; normalise perturbation labels; auto-detect metadata columns\n- **HPC-ready** – Resume/checkpoint for long-running jobs; configurable `memory_limit_gb`; Docker and Singularity support\n\n## Quick Start\n\n```python\nimport crispyx as cx\n\n# Open dataset without loading into memory\nadata = cx.read_h5ad_ondisk(\"data/demo_benchmark.h5ad\")\n\n# Quality control with adaptive thresholds\nadata = cx.pp.qc_summary(\n    adata,\n    perturbation_column=\"perturbation\",\n    min_genes=5,\n    min_cells_per_perturbation=5,\n)\n\n# Differential expression\nadata = cx.tl.rank_genes_groups(\n    adata,\n    perturbation_column=\"perturbation\",\n    method=\"wilcoxon\",  # or \"t-test\", \"nb_glm\"\n)\n\n# Access results\nprint(adata.uns[\"rank_genes_groups\"])\nde_results = adata.uns[\"rank_genes_groups\"].load()\n```\n\nFor the full workflow (normalisation, PCA, pseudo-bulk, NB-GLM, LFC shrinkage, plotting, data preparation utilities), see the [Usage Guide](docs/usage.rst) and the [tutorial notebook](docs/crispyx_tutorial.ipynb).\n\n## Performance\n\nBenchmarked across 12 CRISPR screen datasets (21k–1.97M cells), crispyx consistently outperforms Scanpy, Pertpy/PyDESeq2, and edgeR in both speed and memory:\n\n| Metric | crispyx vs Scanpy | crispyx vs Pertpy/PyDESeq2 |\n|---|---|---|\n| **t-test** | **2–11× faster** | — |\n| **Wilcoxon** | **2–43× faster** | — |\n| **NB-GLM** | — | **2× faster**, completes where Pertpy OOMs |\n| **Peak memory** | **2–6× lower** | Runs within 64 GB where Pertpy exceeds 120 GB |\n| **Accuracy** | Pearson *r* \u003e 0.999 vs Scanpy | Pearson *r* \u003e 0.97 vs PyDESeq2 |\n\ncrispyx succeeds on **all 12 datasets**, while Scanpy times out or OOMs on the largest screens and Pertpy/edgeR fail on most genome-wide datasets.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"benchmarking/figures/benchmark_figure.png\" width=\"800\" alt=\"Benchmark results: crispyx vs reference methods\"\u003e\n\u003c/p\u003e\n\nSee [benchmarking/](benchmarking/) for full results and reproduction scripts.\n\n## Installation\n\n```bash\npip install -e .\n```\n\n## Benchmarking\n\n```bash\ncd benchmarking\n./run_benchmark.sh config/Adamson.yaml       # single dataset\n./run_benchmark.sh config/*.yaml             # all datasets\n```\n\nSee [benchmarking/README.md](benchmarking/README.md) for configuration options and output structure.\n\n## Testing\n\n```bash\npytest\n```\n\n## Documentation\n\n```bash\nsphinx-build docs docs/_build\n```\n\n## Acknowledgements\n\ncrispyx builds on the foundational work of [Scanpy](https://scanpy.readthedocs.io/) (Wolf *et al.*, 2018), [Pertpy](https://pertpy.readthedocs.io/), [PyDESeq2](https://pydeseq2.readthedocs.io/) (Muzellec *et al.*, 2023), and [AnnData](https://anndata.readthedocs.io/) (Virshup *et al.*, 2024). We gratefully acknowledge these projects for establishing the single-cell analysis ecosystem in Python; crispyx extends their APIs and algorithmic designs to enable memory-efficient, streaming computation for large-scale CRISPR screen datasets.\n\n## Contributing\n\nSuggestions, bug reports, and contributions are welcome! Please open an [issue](https://github.com/jaydu1/crispyx/issues) or submit a pull request.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjaydu1%2Fcrispyx","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjaydu1%2Fcrispyx","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjaydu1%2Fcrispyx/lists"}