{"id":34883020,"url":"https://github.com/cafferychen777/flashdeconv","last_synced_at":"2026-01-13T20:50:27.899Z","repository":{"id":330499917,"uuid":"1114934837","full_name":"cafferychen777/flashdeconv","owner":"cafferychen777","description":"Fast Linear Algebra for Scalable Hybrid Deconvolution of Spatial Transcriptomics","archived":false,"fork":false,"pushed_at":"2026-01-04T13:01:57.000Z","size":6627,"stargazers_count":3,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-04T23:15:59.863Z","etag":null,"topics":["bioinformatics","cell-type-deconvolution","computational-biology","deconvolution","python","scanpy","single-cell","spatial-transcriptomics","visium"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cafferychen777.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-12T05:20:58.000Z","updated_at":"2026-01-04T13:02:01.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/cafferychen777/flashdeconv","commit_stats":null,"previous_names":["cafferychen777/flashdeconv"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/cafferychen777/flashdeconv","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cafferychen777%2Fflashdeconv","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cafferychen777%2Fflashdeconv/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cafferychen777%2Fflashdeconv/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cafferychen777%2Fflashdeconv/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cafferychen777","download_url":"https://codeload.github.com/cafferychen777/flashdeconv/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cafferychen777%2Fflashdeconv/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28400304,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-13T14:36:09.778Z","status":"ssl_error","status_checked_at":"2026-01-13T14:35:19.697Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","cell-type-deconvolution","computational-biology","deconvolution","python","scanpy","single-cell","spatial-transcriptomics","visium"],"created_at":"2025-12-26T02:58:13.396Z","updated_at":"2026-01-13T20:50:27.894Z","avatar_url":"https://github.com/cafferychen777.png","language":"Python","funding_links":[],"categories":["Image processing and segmentation","Preprocess","Software packages and methods","Software packages","Spatial Omics Methods \u0026 Tools"],"sub_categories":["Clinical Trial","Single cell multi-omics","Spatial transcriptomics","Spatial Transcriptomics Methods \u0026 Tools"],"readme":"# FlashDeconv\n\n**Fast Linear Algebra for Scalable Hybrid Deconvolution**\n\n[![PyPI version](https://img.shields.io/pypi/v/flashdeconv.svg)](https://pypi.org/project/flashdeconv/)\n[![License](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)\n[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)\n[![DOI](https://zenodo.org/badge/1114934837.svg)](https://doi.org/10.5281/zenodo.18109003)\n\n*Unlocking atlas-scale spatial biology with randomized numerical linear algebra.*\n\nFlashDeconv is a high-performance spatial transcriptomics deconvolution method designed for **atlas-scale** and **subcellular-resolution** platforms (Visium HD, Stereo-seq, Xenium). It leverages structure-preserving randomized sketching to estimate cell type proportions with linear scalability—processing **1 million spots in ~3 minutes** on commodity hardware.\n\n\u003e **Paper:** Chen Yang, Jun Chen, Xianyang Zhang. *FlashDeconv enables atlas-scale, multi-resolution spatial deconvolution via structure-preserving sketching*. bioRxiv, 2025. DOI: [10.64898/2025.12.22.696108](https://doi.org/10.64898/2025.12.22.696108)\n\u003e\n\u003e **Reproducibility:** To reproduce figures and benchmarks from the paper, visit the [flashdeconv-reproducibility](https://github.com/cafferychen777/flashdeconv-reproducibility) repository.\n\n---\n\n## Key Features\n\n- **Ultra-fast \u0026 Scalable:** Deconvolve **1 million spots in ~3 minutes**. Time and memory scale linearly O(N) with dataset size.\n- **Hardware Friendly:** No GPU required. Runs efficiently on laptops (e.g., 32GB RAM handles 1M spots).\n- **Rare Cell Detection:** Uses **leverage-score sampling** to preserve transcriptomically distinct but low-abundance cell types (e.g., Tuft cells, endothelial cells) that variance-based methods systematically miss.\n- **Spatially Aware:** Sparse graph Laplacian regularization ensures spatial coherence without the O(N²) cost of dense kernel methods.\n- **Visium HD Ready:** Specifically optimized for the extreme sparsity and scale of subcellular resolution technologies (2µm–16µm bin sizes).\n- **Statistically Rigorous:** Log-CPM normalization with leverage-weighted gene selection preserves both common and rare cell populations.\n\n---\n\n## Installation\n\n```bash\n# From PyPI (recommended)\npip install flashdeconv\n\n# With scanpy/anndata integration\npip install flashdeconv[io]\n```\n\n**For development:**\n\n```bash\n# From source\ngit clone https://github.com/cafferychen777/flashdeconv.git\ncd flashdeconv\npip install -e \".[dev]\"\n```\n\n**Requirements:** Python ≥ 3.9, numpy, scipy, numba. Optional: scanpy, anndata for AnnData workflow.\n\n---\n\n## Quick Start\n\n### With Scanpy/AnnData\n\n```python\nimport scanpy as sc\nimport flashdeconv as fd\n\nadata_st = sc.read_h5ad(\"visium_hd.h5ad\")\nadata_ref = sc.read_h5ad(\"reference.h5ad\")\n\nfd.tl.deconvolve(adata_st, adata_ref, cell_type_key=\"cell_type\")\n\nadata_st.obsm[\"flashdeconv\"]          # Cell type proportions\nsc.pl.spatial(adata_st, color=\"flashdeconv_Hepatocyte\")\n```\n\n### With NumPy\n\n```python\nfrom flashdeconv import FlashDeconv\n\nmodel = FlashDeconv(lambda_spatial=5000)\nproportions = model.fit_transform(Y, X, coords)  # (n_spots, n_cell_types)\n```\n\n---\n\n## Best Practices: Tuning `lambda_spatial`\n\nWhile FlashDeconv works well with defaults, **adjusting `lambda_spatial`** (spatial regularization strength) based on your platform's **spot size** and **counts-per-spot** significantly improves results.\n\n| Platform | Spot Size | Typical UMI/Spot | Recommended `lambda_spatial` | Rationale |\n|:---------|:----------|:-----------------|:----------------------------|:----------|\n| **Standard Visium** | 55µm | 10,000–30,000 | `1000–10000` (default: 5000) | Strong signal; minimal smoothing needed |\n| **Visium HD (16µm)** | 16µm | 200–2,000 | `5000–20000` | Moderate sparsity; leverage neighbors |\n| **Visium HD (8µm)** | 8µm | 50–500 | `10000–50000` | Very sparse; rely on spatial priors |\n| **Visium HD (2µm)** | 2µm | 1–10 | `50000–100000` | Extreme sparsity; heavy smoothing |\n| **Stereo-seq / Seq-Scope** | 0.5–1µm | 5–50 | `50000–200000` | Single-cell/subcellular resolution; extreme sparsity |\n\n\u003e **Note:**\n\u003e - If cell type maps look **\"salt-and-pepper\" noisy**, increase `lambda_spatial`\n\u003e - If maps look **overly blurred**, decrease `lambda_spatial`\n\u003e - Use `lambda_spatial=\"auto\"` for automatic tuning (may underestimate for real data; best for initial exploration)\n\u003e - For **non-grid layouts** (e.g., Xenium, MERFISH), set `spatial_method=\"knn\"` (default)\n\n---\n\n## Algorithm Under the Hood\n\nFlashDeconv reformulates spatial deconvolution as **Graph-Regularized Non-Negative Least Squares**, solved in a compressed \"sketch\" space via randomized numerical linear algebra (RandNLA):\n\n![FlashDeconv Framework](https://raw.githubusercontent.com/cafferychen777/flashdeconv/main/figures/figure1.jpeg)\n**Figure 1. Overview of the FlashDeconv framework.** (A) Input data preprocessing with Log-CPM normalization and gene selection. (B) Structure-preserving randomized sketching using leverage-score weighting to compress gene space while preserving rare cell signals. (C) Spatial graph construction and regularized optimization via Block Coordinate Descent. (D) Final cell type proportion estimates for each spatial location.\n\n### Three-Stage Framework\n\n1. **Preprocessing \u0026 Gene Selection**\n   - **Log-CPM normalization**: Stabilizes variance and prevents high-expression genes from dominating the sketch\n   - **Leverage-weighted gene selection**: Combines highly variable genes (HVGs) with cell-type-specific markers, weighted by statistical leverage scores. Unlike variance (which conflates abundance with informativeness), leverage scores identify genes that define **transcriptomically distinct directions**, preserving rare cell type markers.\n\n2. **Structure-Preserving Sketching**\n   - **Randomized projection**: Compress gene space (~20,000 genes → 512 dimensions) using CountSketch with **leverage-score importance sampling**\n   - **Johnson-Lindenstrauss guarantee**: Preserves Euclidean distances between cell type signatures with high probability\n   - **Key innovation**: Leverage-weighted sampling amplifies rare cell type markers relative to housekeeping genes, preventing signal loss during hash collisions\n\n3. **Spatial Graph Regularization**\n   - **Sparse graph Laplacian**: Constructs k-NN spatial graph (O(N) memory vs. O(N²) for dense kernels like CARD)\n   - **Numba-accelerated Block Coordinate Descent (BCD)**: Fast closed-form updates with non-negativity constraints\n   - **Linear scalability**: Spatial term complexity O(N·k) enables million-spot analysis\n\n### Why This Works\n\n- **Log-CPM** bounds dynamic range while preserving sparsity (log1p(0) = 0)\n- **Leverage scores** decouple biological identity from population abundance—markers of rare cell types (0.1% frequency) receive equal weight to abundant types (30% frequency)\n- **Sparse graph Laplacian** encodes spatial autocorrelation as a Gaussian Markov Random Field (GMRF) without dense matrix operations\n\n---\n\n## Benchmarks\n\nFlashDeconv exhibits **linear O(N) scaling** for both time and memory:\n\n| Dataset Size | Runtime | Memory | Hardware |\n|:-------------|:--------|:-------|:---------|\n| 10K spots | \u003c 1 sec | \u003c 1 GB | MacBook Pro M2 Max |\n| 100K spots | ~4 sec | ~2 GB | (32GB unified memory) |\n| 1M spots | ~3 min | ~21 GB | No GPU required |\n\n**Accuracy on Synthetic Benchmarks (Spotless suite)**:\n- **Pearson correlation**: 0.944 (mean across 56 datasets spanning 6 tissues)\n- **RMSE**: 0.065 (median)\n- **Rare cell detection (AUPR)**: 0.960 ± 0.036 (standard deviation)\n\n**Real-world validation**:\n- Mouse liver (Visium): JSD = 0.056, ranking 3rd among 13 methods\n- Melanoma tumor (Visium): JSD = 0.027, ranking 5th among 13 methods\n- Reference stability: Ranked 1st for robustness to different scRNA-seq protocols\n\nFlashDeconv matches top-tier Bayesian methods (Cell2Location, RCTD) on accuracy while accelerating inference by **orders of magnitude**.\n\n---\n\n## API Reference\n\n### fd.tl.deconvolve\n\n```python\nfd.tl.deconvolve(\n    adata_st,                        # Spatial AnnData\n    adata_ref,                       # Reference AnnData\n    cell_type_key=\"cell_type\",       # Column in adata_ref.obs\n    sketch_dim=512,\n    lambda_spatial=5000.0,\n    key_added=\"flashdeconv\",         # Key for results in adata_st\n    random_state=0,                  # Random seed for reproducibility\n    copy=False,                      # If True, return copy instead of inplace\n)\n```\n\n**Results stored in `adata_st`:**\n- `.obsm[\"flashdeconv\"]` — Cell type proportions (DataFrame)\n- `.obs[\"flashdeconv_dominant\"]` — Dominant cell type per spot\n- `.uns[\"flashdeconv_params\"]` — Parameters used\n\n### FlashDeconv Class\n\n```python\nclass FlashDeconv:\n    def __init__(\n        self,\n        sketch_dim=512,              # Sketch space dimension\n        lambda_spatial=5000.0,       # Spatial regularization (or \"auto\")\n        rho_sparsity=0.01,           # L1 sparsity penalty\n        n_hvg=2000,                  # Number of highly variable genes\n        n_markers_per_type=50,       # Marker genes per cell type\n        spatial_method=\"knn\",        # \"knn\", \"radius\", or \"grid\"\n        k_neighbors=6,               # k for k-NN graph\n        max_iter=100,                # BCD max iterations\n        tol=1e-4,                    # Convergence tolerance\n        preprocess=\"log_cpm\",        # \"log_cpm\", \"pearson\", or \"raw\"\n        random_state=0,              # Random seed for reproducibility\n        verbose=False,\n    ): ...\n\n    def fit(self, Y, X, coords, gene_names=None, cell_type_names=None) -\u003e self\n    def fit_transform(self, Y, X, coords, **kwargs) -\u003e np.ndarray\n    def get_cell_type_proportions(self) -\u003e np.ndarray\n    def get_abundances(self) -\u003e np.ndarray\n    def get_dominant_cell_type(self) -\u003e np.ndarray\n    def summary(self) -\u003e dict\n```\n\n### Parameters\n\n| Parameter | Type | Default | Description |\n|:----------|:-----|:--------|:------------|\n| `sketch_dim` | int | 512 | Dimension of sketch space (higher = more info, slower) |\n| `lambda_spatial` | float or \"auto\" | 5000.0 | Spatial regularization strength (see Best Practices) |\n| `rho_sparsity` | float | 0.01 | L1 sparsity penalty |\n| `n_hvg` | int | 2000 | Number of highly variable genes to select |\n| `n_markers_per_type` | int | 50 | Top markers per cell type |\n| `k_neighbors` | int | 6 | Neighbors for spatial graph |\n| `max_iter` | int | 100 | Maximum BCD iterations |\n| `tol` | float | 1e-4 | Convergence tolerance |\n| `preprocess` | str | \"log_cpm\" | Preprocessing: \"log_cpm\" (recommended), \"pearson\", or \"raw\" |\n| `random_state` | int | 0 | Random seed for reproducibility (scanpy convention) |\n\n### Attributes (After Fitting)\n\n| Attribute | Shape | Description |\n|:----------|:------|:------------|\n| `beta_` | (n_spots, n_cell_types) | Raw cell type abundances |\n| `proportions_` | (n_spots, n_cell_types) | Normalized proportions (sum to 1) |\n| `gene_idx_` | (n_selected,) | Indices of genes used |\n| `lambda_used_` | float | Actual λ value used |\n| `info_` | dict | Optimization info (converged, n_iterations, final_objective) |\n| `cell_type_names_` | array | Cell type names (if provided) |\n\n---\n\n## Input Data Formats\n\nFlashDeconv accepts multiple input formats:\n\n### Spatial Data (Y)\n- **NumPy array**: Dense (n_spots, n_genes)\n- **SciPy sparse matrix**: CSR/CSC format (recommended for Visium HD to reduce memory usage)\n- **AnnData**: `.X` or specified layer (e.g., `adata.layers[\"counts\"]`)\n\n### Reference (X)\n- **NumPy array**: Dense (n_cell_types, n_genes) signature matrix\n- **AnnData**: Automatically aggregated from single-cell data via `prepare_data()` using mean expression per cell type\n\n### Coordinates\n- **NumPy array**: (n_spots, 2) for 2D spatial coordinates, or (n_spots, 3) for 3D (e.g., z-stacked sections)\n- **From AnnData**: Automatically extracted from `.obsm[\"spatial\"]`, `.obsm[\"X_spatial\"]`, or `.obs[[\"x\", \"y\"]]`\n\n---\n\n## Citation\n\nIf you use FlashDeconv in your research, please cite:\n\n**Plain text:**\n\u003e Yang, C., Chen, J. \u0026 Zhang, X. FlashDeconv enables atlas-scale, multi-resolution spatial deconvolution via structure-preserving sketching. *bioRxiv* (2025). https://doi.org/10.64898/2025.12.22.696108\n\n**BibTeX:**\n```bibtex\n@article{yang2025flashdeconv,\n  title={FlashDeconv enables atlas-scale, multi-resolution spatial deconvolution via structure-preserving sketching},\n  author={Yang, Chen and Chen, Jun and Zhang, Xianyang},\n  journal={bioRxiv},\n  year={2025},\n  doi={10.64898/2025.12.22.696108},\n  url={https://doi.org/10.64898/2025.12.22.696108}\n}\n```\n\n---\n\n## Contributing\n\nContributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.\n\n---\n\n## License\n\nThis project is licensed under the [BSD-3-Clause License](LICENSE).\n\n---\n\n## Related Resources\n\n- **Paper Reproducibility:** [flashdeconv-reproducibility](https://github.com/cafferychen777/flashdeconv-reproducibility) — Complete code to reproduce all figures and benchmarks\n- **Documentation:** [ReadTheDocs](https://flashdeconv.readthedocs.io) *(coming soon)*\n- **Issues \u0026 Support:** [GitHub Issues](https://github.com/cafferychen777/flashdeconv/issues)\n\n---\n\n## Acknowledgments\n\nWe thank the developers of [Spotless](https://github.com/OmicsML/Spotless-Benchmark), [Cell2Location](https://github.com/BayraktarLab/cell2location), and [RCTD](https://github.com/dmcable/spacexr) for their benchmarking frameworks and methodological contributions to the spatial transcriptomics field.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcafferychen777%2Fflashdeconv","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcafferychen777%2Fflashdeconv","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcafferychen777%2Fflashdeconv/lists"}