https://github.com/mdozmorov/scrna-seq_notes
A list of scRNA-seq analysis tools
- Host: GitHub
- URL: https://github.com/mdozmorov/scrna-seq_notes
- Owner: mdozmorov
- License: mit
- Created: 2018-08-30T01:59:19.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2025-03-05T19:18:24.000Z (over 1 year ago)
- Last Synced: 2025-04-07T03:14:44.313Z (about 1 year ago)
- Topics: scrna-seq, single-cell
- Language: R
- Homepage:
- Size: 1.9 MB
- Stars: 686
- Watchers: 30
- Forks: 154
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# scRNA-seq data analysis tools and papers
[License: MIT](https://opensource.org/licenses/MIT) [PRs Welcome](http://makeapullrequest.com)
Single-cell RNA-seq related tools and genomics data analysis resources. Tools are sorted by publication date, reviews and most recent publications on top. Unpublished tools are listed at the end of each section. Please, [contribute and get in touch](CONTRIBUTING.md)! See [MDmisc notes](https://github.com/mdozmorov/MDmisc_notes) for other programming and genomics-related notes. See [scATAC-seq_notes](https://github.com/mdozmorov/scATAC-seq_notes) for scATAC-seq related resources.
# Table of contents
- [Awesome](#awesome)
- [Courses](#courses)
- [Tutorials](#tutorials)
- [Preprocessing pipelines](#preprocessing-pipelines)
- [Format conversion](#format-conversion)
- [Visualization pipelines](#visualization-pipelines)
- [Quality control](#quality-control)
- [Doublet, multiplet detection](#doublet-multiplet-detection)
- [Normalization](#normalization)
- [Integration, Batch correction](#integration-batch-correction)
- [Imputation](#imputation)
- [Dimensionality reduction](#dimensionality-reduction)
- [Clustering](#clustering)
- [Time, trajectory inference](#time-trajectory-inference)
- [Networks](#networks)
- [RNA velocity](#rna-velocity)
- [Differential expression](#differential-expression)
- [Differential abundance](#differential-abundance)
- [Downstream analysis](#downstream-analysis)
- [CNV](#cnv)
- [Splicing](#splicing)
- [Annotation, subpopulation identification](#annotation-subpopulation-identification)
- [Cell markers](#cell-markers)
- [Immune markers](#immune-markers)
- [Brain markers](#brain-markers)
- [Immuno-analysis](#immuno-analysis)
- [Cell-cell interactions](#cell-cell-interactions)
- [Phylogenetic inference](#phylogenetic-inference)
- [Simulation](#simulation)
- [Power](#power)
- [Benchmarking](#benchmarking)
- [Deep learning](#deep-learning)
- [Spatial transcriptomics](#spatial-transcriptomics)
- [Technology](#technology)
- [10X Genomics](#10x-genomics)
  - [10X QC](#10x-qc)
- [Data](#data)
- [Human](#human)
  - [Cancer](#cancer)
- [Mouse](#mouse)
- [Brain](#brain)
- [Links](#links)
- [Papers](#papers)
## Awesome
- Review of scRNA-seq cell type annotation methods. Table 1 - tools grouped by methods, each described in the text. Table 2 - annotation databases, CellMarker and PanglaoDB are the largest. Table 3 - tools and cell types used for benchmarking.
Paper
Pasquini, Giovanni. “Automated Methods for Cell Type Annotation on scRNA-Seq Data.” Computational and Structural Biotechnology Journal, 2021, 9. https://doi.org/10.1016/j.csbj.2021.01.015
- Overview of various steps and tools for data analysis of single cell transcriptomics (scRNA-seq), chromatin accessibility (scATAC-seq), surface protein (CITE-seq), antigen immune receptor repertoire (AIRR, TCR and BCR profiling), and spatial transcriptomics. QC, doublet removal ([scDblFinder](https://bioconductor.org/packages/scDblFinder/)), normalization ([Scran](https://bioconductor.org/packages/scran/)), batch removal ([Harmony](https://portals.broadinstitute.org/harmony/articles/quickstart.html) and others), cell cycle removal ([Tricycle](https://bioconductor.org/packages/tricycle/)), cell annotation, trajectory analysis ([dynguidelines](https://zouter.shinyapps.io/server/)), differential expression, gene set enrichment analysis, cell-cell communication. Data integration methods. [Glossary](https://www.nature.com/articles/s41576-023-00586-w#glossary). Extended version in the [Single-cell best practices](https://www.sc-best-practices.org/preamble.html) book.
Paper
Heumos, Lukas, Anna C. Schaar, Christopher Lance, Anastasia Litinetskaya, Felix Drost, Luke Zappia, Malte D. Lücken, et al. “Best Practices for Single-Cell Analysis across Modalities.” Nature Reviews Genetics, March 31, 2023. https://doi.org/10.1038/s41576-023-00586-w.
- [Community-curated list of software packages and data resources for single-cell, including RNA-seq, ATAC-seq, etc.](https://github.com/seandavi/awesome-single-cell) by Sean Davis
- [Notes on scATAC-seq analysis](https://github.com/crazyhottommy/scATACseq-analysis-notes) by Ming Tang
- [Dave Tang's blog, single cell posts](https://davetang.org/muse/category/single-cell-2/)
- [www.scrna-tools.org](https://www.scrna-tools.org/) - The scRNA-tools database
Paper
Zappia, Luke, and Fabian J. Theis. “Over 1000 Tools Reveal Trends in the Single-Cell RNA-Seq Analysis Landscape.” Genome Biology 22, no. 1 (December 2021): 301. https://doi.org/10.1186/s13059-021-02519-4.
Zappia, Luke, Belinda Phipson, and Alicia Oshlack. "Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database" https://doi.org/10.1371/journal.pcbi.1006245 PLoS computational biology, June 25, 2018
- [awesome-10x-genomics](https://github.com/johandahlberg/awesome-10x-genomics) - List of tools and resources related to the 10x Genomics GEMCode/Chromium system
- [awesome-deep-learning-single-cell-papers](https://github.com/OmicsML/awesome-deep-learning-single-cell-papers) - categorized list of latest scRNA-seq papers using deep learning
- [awesome-vdj](https://github.com/slowkow/awesome-vdj) - Tools and databases for analyzing HLA and VDJ genes, by [slowkow](https://github.com/slowkow)
- [SingleCell Omics](https://docs.google.com/spreadsheets/d/1IPe2ozb1Mny8sLvJaSE57RJr3oruiBoSudAVhSH-O8M/edit#gid=11468010) - A Google Doc with a structured collection of scRNA-seq methods, software, and many other scRNA-seq information, by @albertvilella
- [R_packages_for_scRNA-seq.pdf](R_packages_for_scRNA-seq.pdf) - Bioconductor software packages for single-cell analysis.
Paper
Amezquita, Robert A., Aaron T. L. Lun, Etienne Becht, Vince J. Carey, Lindsay N. Carpp, Ludwig Geistlinger, Federico Marini, et al. "Orchestrating Single-Cell Analysis with Bioconductor" https://doi.org/10.1038/s41592-019-0654-x Nature Methods, December 2, 2019.
- [single-cell-pseudotime](https://github.com/agitter/single-cell-pseudotime) - an overview of single-cell RNA-seq pseudotime estimation algorithms, comprehensive collection of links to software and accompanying papers, by Anthony Gitter
## Courses
- [Single-cell best practices](https://www.sc-best-practices.org/) - scRNA-seq analysis with Python/scanpy. From preprocessing to trajectory inference, network analysis, immune receptor profiling, and more. [GitHub](https://github.com/theislab/single-cell-best-practices) with Jupyter notebooks/code.
- Review of single-cell transcriptomics technologies and analysis steps and software. Sample preparation, scRNA-seq preprocessing, QC, normalization, batch correction, dimensionality reduction. Downstream analysis on cell level (clustering, trajectory inference), gene level (differential expression, functional enrichment, network analysis). Table 1 - preprocessing pipelines and tools, brief description. Table 2 - clustering algorithms.
Paper
Nayak, Richa, and Yasha Hasija. “A Hitchhiker’s Guide to Single-Cell Transcriptomics and Data Analysis Pipelines.” Genomics 113, no. 2 (March 2021): 606–19. https://doi.org/10.1016/j.ygeno.2021.01.007.
- [Orchestrating Single-Cell Analysis with Bioconductor](http://bioconductor.org/books/release/OSCA/) - scRNA-seq analysis overview within Bioconductor ecosystem, bookdown. SingleCellExperiment, scran and scater examples. Table S1 - summary of packages for data input, infrastructure, QC, integration, dimensionality reduction, clustering, pseudotime, differential expression, functional enrichment, simulation, benchmarking data, and data packages. Types of feature selection. Associated GitHub repos: [OrchestratingSingleCellAnalysis](https://github.com/Bioconductor/OrchestratingSingleCellAnalysis) [OSCABase](https://github.com/Bioconductor/OSCABase), [OrchestratingSingleCellAnalysis](https://github.com/seandavi/OrchestratingSingleCellAnalysis). [simpleSingleCell](https://bioconductor.org/packages/simpleSingleCell/) R package, a step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor, by Aaron Lun et al., [rendered version](http://bioinformatics.age.mpg.de/presentations-tutorials/presentations/modules/single-cell//bioconductor_tutorial.html).
Paper
Amezquita RA, Lun AT, Becht E, Carey VJ, Carpp LN, Geistlinger L, Marini F, Rue-Albrecht K, Risso D, Soneson C, Waldron L. "Orchestrating single-cell analysis with Bioconductor" https://doi.org/10.1038/s41592-019-0654-x Nature methods. 2020 Feb;17(2):137-45.
- [Analysis of single cell RNA-seq data, www.singlecellcourse.org](https://scrnaseq-course.cog.sanger.ac.uk/website/index.html) - step-by-step scRNA-seq analysis course. R-based, with code examples, explanations, exercises. From alignment (STAR) and QC (FASTQC) to introduction to R, SingleCellExperiment class, `scater` object, data exploration (reads, UMI), filtering, normalization (`scran`), batch effect removal (`RUV`, `ComBat`, `mnnCorrect`, GLM, `Harmony`), clustering and marker gene identification (`SINCERA`, `SC3`, tSNE, `Seurat`), feature selection (`M3Drop::M3DropConvertData`, `BrenneckeGetVariableGenes`), pseudotime analysis (`TSCAN`, `Monocle`, diffusion maps, `SLICER`, `Ouija`, `destiny`), imputation (`scImpute`, `DrImpute`, `MAGIC`), differential expression (Kolmogorov-Smirnov, Wilcoxon, `edgeR`, `Monocle`, `MAST`), data integration (`scmap`, cell-to-cell mapping, `Metaneighbour`, `mnnCorrect`, `Seurat`'s canonical correlation analysis). Search for scRNA-seq data ([scfind](https://github.com/hemberg-lab/scfind) R package), as well as [Hemberg group’s public datasets](https://hemberg-lab.github.io/scRNA.seq.datasets/). [Seurat chapter](https://scrnaseq-course.cog.sanger.ac.uk/website/seurat-chapter.html). ["Ideal" scRNA-seq pipeline](https://scrnaseq-course.cog.sanger.ac.uk/website/ideal-scrnaseq-pipeline-as-of-oct-2017.html). [Video lectures](https://www.youtube.com/watch?list=PLEyKDyF1qdOYAhwU71qlrOXYsYHtyIu8n&v=56n77bpjiKo).
Paper
Andrews, Tallulah S., Vladimir Yu Kiselev, Davis McCarthy, and Martin Hemberg. "Tutorial: Guidelines for the Computational Analysis of Single-Cell RNA Sequencing Data." https://doi.org/10.1038/s41596-020-00409-w Nature Protocols, December 7, 2020.
- [ANALYSIS OF SINGLE CELL RNA-SEQ DATA](https://broadinstitute.github.io/2019_scWorkshop/index.html) course by Orr Ashenberg, Dana Silverbush, Kirk Gosik
### Tutorials
- [Single-cell RNA-seq data analysis workshop](https://github.com/hbctraining/scRNA-seq_online) by the Harvard Chan Bioinformatics Core. [Lessons](https://github.com/hbctraining/scRNA-seq_online/tree/master/lessons) - hands-on Introduction to single-cell RNA-seq analysis using Seurat/RStudio, starting from count matrices.
- [Rockefeller University scRNAseq tutorial](https://rockefelleruniversity.github.io/scRNA-seq/)
- [ASAP: Full pipeline on a project imported from the Human Cell Atlas](https://asap.epfl.ch/home/tutorial?t=full_pipeline)
- [Introduction to single-cell RNA-seq technologies](https://figshare.com/articles/Introduction_to_single-cell_RNA-seq_technologies/7704659/1), presentation by Lior Pachter. Key figures, references, statistics. [Slides](https://figshare.com/articles/Introduction_to_single-cell_RNA-seq_technologies/7704659/1), and [notes](https://liorpachter.wordpress.com/2019/02/19/introduction-to-single-cell-rna-seq-technologies/)
- [Machine learning for single cell analysis workshop](https://www.krishnaswamylab.org/workshop). Presentations (Google Slides) and Jupyter notebooks run on Google Colab
- [Pbmc dataset Roche analysis](https://almutlue.github.io/batch_dataset/pbmc_roche.html#)
- [Preprocessing and normalization of single-cell RNA-seq droplet data](https://kkorthauer.org/fungeno2019/singlecell/vignettes/1.2-preprocess-droplet.html)
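The courses and tutorials above repeatedly cover per-cell QC (count depth, number of detected genes, mitochondrial fraction) followed by normalization. The following is a minimal, standard-library Python sketch of those two steps on a toy matrix, not the method of any listed course. The function names, the `MT-` gene-name prefix for human mitochondrial genes, the thresholds, and the 10,000 scaling factor are all assumptions made for illustration.

```python
import math

def qc_metrics(counts, gene_names, mito_prefix="MT-"):
    """Per-cell total counts, detected genes, and mitochondrial percentage.

    counts: list of per-cell lists of raw counts, one value per gene.
    mito_prefix: assumed naming convention for mitochondrial genes.
    """
    is_mito = [name.startswith(mito_prefix) for name in gene_names]
    metrics = []
    for cell in counts:
        total = sum(cell)
        detected = sum(1 for c in cell if c > 0)
        mito_total = sum(c for c, m in zip(cell, is_mito) if m)
        metrics.append({
            "total_counts": total,
            "n_genes": detected,
            "pct_mito": 100.0 * mito_total / total if total else 0.0,
        })
    return metrics

def log_normalize(cell, scale=10_000):
    """Library-size normalization to `scale` counts per cell, then log1p."""
    total = sum(cell)
    return [math.log1p(c / total * scale) if total else 0.0 for c in cell]

# Toy data: two cells, four genes (two of them mitochondrial).
genes = ["ACTB", "CD3E", "MT-CO1", "MT-ND1"]
counts = [
    [12, 6, 1, 1],  # 2 of 20 counts are mitochondrial
    [1, 0, 6, 3],   # 9 of 10 counts are mitochondrial
]

metrics = qc_metrics(counts, genes)
# Hard thresholds chosen only for this toy example.
keep = [m["pct_mito"] < 20 and m["n_genes"] >= 3 for m in metrics]
normalized = [log_normalize(cell) for cell, k in zip(counts, keep) if k]
print(metrics, keep)
```

Real analyses would use the sparse-matrix tooling from the listed pipelines (for example `scater`/`scran` or scanpy) and data-driven thresholds rather than the fixed cutoffs used here.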
## Preprocessing pipelines
- Assessment of 9 preprocessing pipelines (Cell Ranger, Optimus, salmon alevin, kallisto bustools, dropSeqPipe, scPipe, zUMIs, celseq2 and scruff) on 10X and CEL-Seq2 datasets ([scmixology](https://github.com/LuyiTian/sc_mixology) and others, 9 datasets total). All pipelines coupled with performant post-processing (normalization, filtering, etc.) produce comparable data quality in terms of clustering/agreement with known cell types. Low-expressed genes are discordant. Details and specific results of each pipeline. [GitHub with pre-/postprocessing scripts](https://github.com/YOU-k/preprocess)
Paper
You, Yue, Luyi Tian, Shian Su, Xueyi Dong, Jafar S Jabbari, Peter F Hickey, and Matthew E Ritchie. "Benchmarking UMI-Based Single Cell RNA-Sequencing Preprocessing Workflows" https://doi.org/10.1186/s13059-021-02552-3 Genome Biology. 14 December 2021
- [Single cell current best practices tutorial, GitHub](https://github.com/theislab/single-cell-tutorial). QC (count depth, number of genes, % mitochondrial), normalization (global, downsampling, nonlinear), data correction (batch, denoising, imputation), feature selection, dimensionality reduction (PCA, diffusion maps, tSNE, UMAP), visualization, clustering (k-means, graph/community detection), annotation, trajectory inference (PAGA, Monocle), differential analysis (DESeq2, edgeR, MAST), gene regulatory networks. Description of the bigger picture at each step, latest tools, their brief description, references. R-based Scater as the full pipeline for QC and preprocessing, Seurat for downstream analysis, scanpy Python pipeline. Links and refs to other tutorials.
Paper
Luecken, Malte D., and Fabian J. Theis. "Current Best Practices in Single-Cell RNA-Seq Analysis: A Tutorial" https://doi.org/10.15252/msb.20188746 Molecular Systems Biology 15, no. 6 (June 19, 2019)
- [Alevin](https://github.com/COMBINE-lab/salmon) - end-to-end droplet-based scRNA-seq (10X Genomics) processing pipeline performing cell barcode detection (two-step whitelisting procedure), read mapping, UMI deduplication (parsimonious UMI graphs, PUGs), resolving multimapped reads (EM method to resolve UMI collisions), gene count estimation. Intelligently handles UMI deduplication and multimapped reads, resulting in more accurate gene abundance estimation. Input - sample-demultiplexed FASTQ, output - gene-level UMI counts. Compared against the Cell Ranger, dropEst, STAR and featureCount-based pipelines, UMI-tools, alevin is more accurate and quantifies a greater proportion of sequenced data, especially on combined genomes. Approx. 21X faster than Cell Ranger, low memory requirements, 10-12 threads optimal. C++ implementation, part of [Salmon](https://github.com/COMBINE-lab/salmon). [Alevin documentation](https://salmon.readthedocs.io/en/latest/alevin.html), [Tutorials](https://combine-lab.github.io/alevin-tutorial/#blog) that include visualization options.
Paper
Srivastava, Avi, Laraib Malik, Tom Smith, Ian Sudbery, and Rob Patro. "Alevin Efficiently Estimates Accurate Gene Abundances from DscRNA-Seq Data" https://doi.org/10.1186/s13059-019-1670-y Genome Biology, (December 2019)
- [bigSCale](https://github.com/iaconogi/bigSCale) - scalable analytical framework to analyze large scRNA-seq datasets, UMIs or counts. Pre-clustering, convolution into iCells, final clustering, differential expression, biomarkers. Correlation metric for scRNA-seq data based on converting expression to Z-scores of differential expression. Robust to dropouts. Matlab implementation. [Data, 1847 human neuronal progenitor cells](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE102934)
Paper
Iacono, Giovanni, Elisabetta Mereu, Amy Guillaumet-Adkins, Roser Corominas, Ivon Cuscó, Gustavo Rodríguez-Esteban, Marta Gut, Luis Alberto Pérez-Jurado, Ivo Gut, and Holger Heyn. "BigSCale: An Analytical Framework for Big-Scale Single-Cell Data." https://doi.org/10.1101/gr.230771.117 Genome Research 28, no. 6 (June 2018): 878–90.
- [CALISTA](https://github.com/CABSEL/CALISTA) - clustering, lineage reconstruction, transition gene identification, and cell pseudotime single cell transcriptional analysis. Analyses can be all or separate. Uses a likelihood-based approach based on probabilistic models of stochastic gene transcriptional bursts and random technical dropout events, so all analyses are compatible with each other. Input - a matrix of normalized, batch-removed log(RPKM) or log(TPM) or scaled UMIs. Methods detail statistical methodology. Matlab and R version
Paper
Papili Gao N, Hartmann T, Fang T, Gunawan R. "CALISTA: Clustering and LINEAGE Inference in Single-Cell Transcriptional Analysis" https://doi.org/10.3389/fbioe.2020.00018 Frontiers in bioengineering and biotechnology. 2020 Feb 4;8:18.
- [demuxlet](https://github.com/statgen/demuxlet) - Introduces the ‘demuxlet’ algorithm, which enables genetic demultiplexing, doublet detection, and super-loading for droplet-based scRNA-seq. Recommended approach when samples have distinct genotypes
Paper
Kang, Hyun Min, Meena Subramaniam, Sasha Targ, Michelle Nguyen, Lenka Maliskova, Elizabeth McCarthy, Eunice Wan, et al. "Multiplexed Droplet Single-Cell RNA-Sequencing Using Natural Genetic Variation." https://doi.org/10.1038/nbt.4042 Nature Biotechnology 36, no. 1 (January 2018): 89–94.
- [dropEst](https://github.com/hms-dbmi/dropEst) - pipeline for pre-processing, mapping, QCing, filtering, and quantifying droplet-based scRNA-seq datasets. Input - FASTQ or BAM, output - an R-readable molecular count matrix. Written in C++
Paper
Petukhov, Viktor, Jimin Guo, Ninib Baryawno, Nicolas Severe, David T. Scadden, Maria G. Samsonova, and Peter V. Kharchenko. "DropEst: Pipeline for Accurate Estimation of Molecular Counts in Droplet-Based Single-Cell RNA-Seq Experiments." https://doi.org/10.1186/s13059-018-1449-6 Genome Biology 19, no. 1 (December 2018): 78.
- [kallistobus](https://www.kallistobus.tools/) - fast pipeline for scRNA-seq processing. New BUS (Barcode, UMI, Set) format for storing and manipulating pseudoalignment results. Includes RNA velocity analysis. Python-based
Paper
Melsted, Páll, A. Sina Booeshaghi, Fan Gao, Eduardo da Veiga Beltrame, Lambda Lu, Kristján Eldjárn Hjorleifsson, Jase Gehring, and Lior Pachter. "Modular and Efficient Pre-Processing of Single-Cell RNA-Seq." https://doi.org/10.1101/673285 Preprint. Bioinformatics, June 17, 2019.
- [PyMINEr](https://www.sciencescott.com/pyminer) - Python-based scRNA-seq processing pipeline. Cell type identification, detection of cell type-enriched genes, pathway analysis, co-expression networks and graph theory approaches to interpreting gene expression. Notes on methods: modified K++ clustering, automatic detection of the number of cell types, co-expression and PPI networks. Input: .txt or .hdf5 files. Detailed analysis of several pancreatic datasets
Paper
Tyler, Scott R., Pavana G. Rotti, Xingshen Sun, Yaling Yi, Weiliang Xie, Michael C. Winter, Miles J. Flamme-Wiese, et al. "PyMINEr Finds Gene and Autocrine-Paracrine Networks from Human Islet ScRNA-Seq." https://doi.org/10.1016/j.celrep.2019.01.063 Cell Reports 26, no. 7 (February 2019): 1951-1964.e8.
- [Scanpy](https://github.com/theislab/scanpy) - Python-based pipeline for preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression and network simulation
Paper
Wolf, F. Alexander, Philipp Angerer, and Fabian J. Theis. "SCANPY: Large-Scale Single-Cell Gene Expression Data Analysis." https://doi.org/10.1186/s13059-017-1382-0 Genome Biology 19, no. 1 (06 2018): 15.
- [STQ](https://github.com/TheJacksonLaboratory/STQ) - PDX spatial transcriptomics Nextflow pipeline. 10x Genomics Visium with matched H&E images. Xenome read classification, spatial gene expression with Space Ranger, extraction of B-allele frequencies (can be used for CNV inference), splicing quantification with Velocyto, image segmentation with Inception, StarDist, or HoVer-Net CNN. Can be applied to one-species samples. Optimized for SLURM HPC.
Paper
Domanskyi, Sergii, Anuj Srivastava, Jessica Kaster, Haiyin Li, Meenhard Herlyn, Jill C. Rubinstein, and Jeffrey H. Chuang. “Nextflow Pipeline for Visium and H&E Data from Patient-Derived Xenograft Samples.” Preprint. Bioinformatics, July 30, 2023. https://doi.org/10.1101/2023.07.27.550727.
- [RAPIDS & Scanpy Single-Cell RNA-seq Workflow](https://github.com/clara-parabricks/rapids-single-cell-examples/blob/master/notebooks/hlca_lung_gpu_analysis.ipynb) - real-time analysis of scRNA-seq data on GPU. [Tweet](https://twitter.com/johnny_israeli/status/1265762506993135618?s=20)
- [SCRAT](https://github.com/zji90/SCRAT) - a Single-Cell Regulome Analysis Toolbox R package and a [Shiny web service](https://zhiji.shinyapps.io/scrat). scRNA-seq and scATAC-seq analyses. Input - BAM files. Summarizes regulatory activities on gene or transcription factor binding sites (by ENCODE clusters, motifs, DHSs, genes, MSigDb gene sets, or custom genomic features), clustering, cell annotation, differential gene/TF activity analysis. [Supplementary Table S1](https://academic.oup.com/bioinformatics/article/33/18/2930/3823309#supplementary-data) compares with other tools. Supplementary results demonstrate application to human and mouse ESC data.
Paper
Ji, Zhicheng, Weiqiang Zhou, and Hongkai Ji. “Single-Cell Regulome Data Analysis by SCRAT.” Edited by Inanc Birol. Bioinformatics 33, no. 18 (September 15, 2017): 2930–32. https://doi.org/10.1093/bioinformatics/btx315.
- [scPipe](https://bioconductor.org/packages/release/bioc/html/scPipe.html) - A preprocessing pipeline for single cell RNA-seq data that starts from the fastq files and produces a gene count matrix with associated quality control information. It can process fastq data generated by CEL-seq, MARS-seq, Drop-seq, Chromium 10x and SMART-seq protocols. Modular: individual tools can be swapped, e.g., a different aligner can be used.
Paper
Tian et al. "scPipe: A flexible R/Bioconductor preprocessing pipeline for single-cell RNA-sequencing data" https://doi.org/10.1371/journal.pcbi.1006361 PLOS Computational Biology, 2018.
- [SEQC](https://github.com/ambrosejcarr/seqc) - Single-Cell Sequencing Quality Control and Processing Software, a general purpose method to build a count matrix from single cell sequencing reads, able to process data from inDrop, drop-seq, 10X, and Mars-Seq2 technologies.
Paper
Azizi, Elham, Ambrose J. Carr, George Plitas, Andrew E. Cornish, Catherine Konopacki, Sandhya Prabhakaran, Juozas Nainys, et al. "Single-Cell Map of Diverse Immune Phenotypes in the Breast Tumor Microenvironment." https://doi.org/10.1016/j.cell.2018.05.060 Cell, June 2018.
- [zUMIs](https://github.com/sdparekh/zUMIs) - scRNA-seq processing pipeline that handles barcodes and summarizes UMIs using exonic or exonic + intronic mapped reads (improves clustering, DE detection). Adaptive downsampling of oversequenced libraries. STAR aligner, Rsubread::featureCounts counting UMIs in exons and introns.
Paper
Parekh, Swati, Christoph Ziegenhain, Beate Vieth, Wolfgang Enard, and Ines Hellmann. "ZUMIs - A Fast and Flexible Pipeline to Process RNA Sequencing Data with UMIs." https://doi.org/10.1093/gigascience/giy059 GigaScience 7, no. 6 (01 2018).
- [ramdaq](https://github.com/rikenbit/ramdaq) - pipeline to analyze data from full-length single-cell RNA sequencing (scRNA-seq) methods. [Documentation](https://github.com/rikenbit/ramdaq/tree/master/docs)
- STAR alignment parameters: `--outFilterType BySJout, --outFilterMultimapNmax 100, --limitOutSJcollapsed 2000000, --alignSJDBoverhangMin 8, --outFilterMismatchNoverLmax 0.04, --alignIntronMin 20, --alignIntronMax 1000000, --readFilesIn fastqrecords, --outSAMprimaryFlag AllBestScore, --outSAMtype BAM Unsorted`. From Azizi et al., “Single-Cell Map of Diverse Immune Phenotypes in the Breast Tumor Microenvironment.”
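The STAR parameters listed in the last item above can be assembled into a command line programmatically. The sketch below only builds and prints the argument list, it does not run STAR. The helper name `build_star_command`, the index directory, and the FASTQ file name are hypothetical; `--genomeDir` is a standard STAR option that is not part of the quoted parameter list and is added here as an assumption so the command is complete.

```python
import shlex

# Flags and values as quoted from Azizi et al. in the list above,
# written with the double hyphens that STAR expects.
STAR_PARAMS = [
    ("--outFilterType", "BySJout"),
    ("--outFilterMultimapNmax", "100"),
    ("--limitOutSJcollapsed", "2000000"),
    ("--alignSJDBoverhangMin", "8"),
    ("--outFilterMismatchNoverLmax", "0.04"),
    ("--alignIntronMin", "20"),
    ("--alignIntronMax", "1000000"),
    ("--outSAMprimaryFlag", "AllBestScore"),
    ("--outSAMtype", "BAM", "Unsorted"),
]

def build_star_command(genome_dir, fastq_files, star_bin="STAR"):
    """Return the STAR argument list (hypothetical helper; STAR is not executed)."""
    cmd = [star_bin, "--genomeDir", genome_dir, "--readFilesIn", *fastq_files]
    for flag, *values in STAR_PARAMS:
        cmd.extend([flag, *values])
    return cmd

# Hypothetical paths, for illustration only.
cmd = build_star_command("star_index/", ["fastqrecords.fastq"])
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where STAR and a matching genome index are available.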
### Format conversion
- [sceasy](https://github.com/cellgeni/sceasy) - R package to convert different single-cell data formats to each other, supports Seurat, SingleCellExperiment, AnnData, Loom
- [scKirby](https://github.com/neurogenomics/scKirby) - R package for automated ingestion and conversion of various single-cell data formats (SingleCellExperiment, SummarizedExperiment, HDF5SummarizedExperiment, Seurat, H5Seurat, anndata, loom, loomR, CellDataSet/monocle, ExpressionSet, and more).
- [zellkonverter](https://bioconductor.org/packages/zellkonverter/) - R package for conversion between scRNA-seq objects (the Bioconductor SingleCellExperiment data structure and the Python AnnData-based single-cell analysis environment). [Tweet](https://twitter.com/tangming2005/status/1466865990667542536?s=20)
### Visualization pipelines
- [ShinyCell](https://github.com/SGDDNB/ShinyCell) - R package to convert single-cell RNA-seq data into Shiny-based apps to share and visually explore the data. Input - h5ad, loom, SCE, Seurat object. Output - processed files and Shiny scripts. [Example 1](http://shinycell1.ddnetbio.com/), [Example 2](http://shinycell2.ddnetbio.com/).
Paper
Ouyang, John F., Uma S. Kamaraj, Elaine Y. Cao, and Owen JL Rackham. "ShinyCell: simple and sharable visualization of single-cell gene expression data." Bioinformatics 37, no. 19 (1 October 2021): https://doi.org/10.1093/bioinformatics/btab209
- [Kana](https://www.jkanche.com/kana/) - single-cell analysis in the browser, by Jayaram Kancherla ([@jkanche](https://github.com/jkanche)), Aaron Lun ([@LTLA](https://github.com/LTLA)). Client-side computations using the WebAssembly framework. Input - 10X genomics CellRanger's output (Matrix Market format), csv matrix or .h5 files. Preprocessing (removal of low-quality cells, Normalization and log-transformation, Modelling of the mean-variance trend across genes), PCA, Clustering (t-SNE/UMAP), Marker detection, custom cluster definition and marker analysis. Works with scATAC-seq data. [GitHub](https://github.com/jkanche/kana), [Tweet](https://twitter.com/jayaram/status/1480599647039016962?s=20).
Paper
Lun, Aaron, and Jayaram Kancherla. “Powering Single-Cell Analyses in the Browser with WebAssembly.” Preprint. Bioinformatics, March 4, 2022. https://doi.org/10.1101/2022.03.02.482701.
- [cellxgene](https://github.com/chanzuckerberg/cellxgene) - An interactive exploratory visualization tool for single-cell transcriptomics data, web and desktop versions. Input - matrix-form datasets, metadata, pre-computed embeddings/clustering. Compatible with Seurat, Scanpy, Bioconductor, scVI. [GitHub](https://github.com/chanzuckerberg/cellxgene)
Paper
Megill, Colin, Bruce Martin, Charlotte Weaver, Sidney Bell, Lia Prins, Seve Badajoz, Brian McCandless, et al. "Cellxgene: A Performant, Scalable Exploration Platform for High Dimensional Sparse Matrices" https://doi.org/10.1101/2021.04.05.438318 Preprint. Systems Biology, April 6, 2021.
- [iCellR](https://github.com/rezakj/iCellR) - Single (i) Cell R package (iCellR) is an interactive R package to work with high-throughput single cell sequencing technologies (i.e., scRNA-seq, scVDJ-seq and CITE-seq).
Paper
Khodadadi-Jamayran, Alireza, Joseph Pucella, Hua Zhou, Nicole Doudican, John Carucci, Adriana Heguy, Boris Reizis, and Aristotelis Tsirigos. "ICellR: Combined Coverage Correction and Principal Component Alignment for Batch Alignment in Single-Cell Sequencing Analysis" https://www.biorxiv.org/content/10.1101/2020.03.31.019109v1.full BioRxiv, April 1, 2020
- [Cerebro](https://github.com/romanhaa/Cerebro) - interactive scRNA-seq visualization from a Seurat object (v2 or 3), dimensionality reduction, clustering, identification and visualization of marker genes, enriched pathways (EnrichR), signatures (MSigDb), expression of individual genes. [cerebroPrepare](https://github.com/romanhaa/cerebroPrepare) R package saves the Seurat object, to be visualized with [cerebroApp Shiny app](https://github.com/romanhaa/cerebroApp). Standalone and Docker versions are available. [GitHub](https://github.com/romanhaa/Cerebro).
Paper
Hillje, Roman, Pier Giuseppe Pelicci, and Lucilla Luzi. "Cerebro: Interactive Visualization of ScRNA-Seq Data" https://doi.org/10.1093/bioinformatics/btz877 Bioinformatics, 1 April 2020
- [iS-CellR](https://github.com/immcore/iS-CellR) - a Shiny app for scRNA-seq analysis. Can be installed locally, run from GitHub, or run with Docker. Input - count matrix. Filtering, normalization, dimensionality reduction, clustering, differential expression, co-expression, reports.
Paper
Patel, Mitulkumar V. "IS-CellR: A User-Friendly Tool for Analyzing and Visualizing Single-Cell RNA Sequencing Data" https://doi.org/10.1093/bioinformatics/bty517 Bioinformatics 34, no. 24 (December 15, 2018)
- [iSEE](https://github.com/kevinrue/iSEEWorkshop2019) - Shiny app for interactive visualization of SummarizedExperiment scRNA-seq objects. [GitHub](https://github.com/csoneson/iSEE), [RNA-seq blog post](https://www.rna-seqblog.com/isee-an-interactive-shiny-based-graphical-user-interface-for-exploring-data-stored-in-summarizedexperiment-objects/), [Workshop](https://github.com/kevinrue/iSEEWorkshop2019).
Paper
Rue-Albrecht, Kevin, Federico Marini, Charlotte Soneson, and Aaron T.L. Lun. "ISEE: Interactive SummarizedExperiment Explorer" https://doi.org/10.12688/f1000research.14966.1 F1000Research 7 (June 14, 2018)
- [SPRING](https://github.com/AllonKleinLab/SPRING_dev) - a pipeline for data filtering, normalization and visualization using force-directed layout of k-nearest-neighbor graph. [Web-based](https://kleintools.hms.harvard.edu/tools/spring.html) (10,000 cells max) and [GitHub](https://github.com/AllonKleinLab/SPRING_dev).
Paper
Weinreb, Caleb, Samuel Wolock, and Allon M. Klein. "SPRING: A Kinetic Interface for Visualizing High Dimensional Single-Cell Expression Data" https://doi.org/10.1093/bioinformatics/btx792 Bioinformatics (Oxford, England) 34, no. 7 (April 1, 2018)
- [Granatum](http://garmiregroup.org/granatum/app) - web-based scRNA-seq analysis. List of modules, including plate merging and batch-effect removal, outlier-sample removal, gene-expression normalization, imputation, gene filtering, cell clustering, differential gene expression analysis, pathway/ontology enrichment analysis, protein network interaction visualization, and pseudo-time cell series reconstruction. [Twitter](https://twitter.com/GarmireGroup/status/1185269818015940609).
Paper
Zhu, Xun, Thomas K. Wolfgruber, Austin Tasato, Cédric Arisdakessian, David G. Garmire, and Lana X. Garmire. "Granatum: A Graphical Single-Cell RNA-Seq Analysis Pipeline for Genomics Scientists" https://doi.org/10.1186/s13073-017-0492-3 Genome Medicine 9, no. 1 (December 2017).
- [SCope](https://github.com/aertslab/SCope) - Fast visualization tool for large-scale and high dimensional single-cell data in `.loom` format. R and Python scripts for converting scRNA-seq data to `.loom` format.
- [singleCellTK](https://bioconductor.org/packages/singleCellTK/) - R/Shiny package for an interactive scRNA-Seq analysis. Input, raw counts in SingleCellExperiment. Analysis: filtering raw results, clustering, batch correction, differential expression, pathway enrichment, and scRNA-Seq study design.
- [scDataviz](https://github.com/kevinblighe/scDataviz) - single cell data visualization and downstream analyses, by Kevin Blighe
- [scOrange](https://singlecell.biolab.si/) - visual pipeline builder for an in-depth analysis and visualization of scRNA-seq data. Works with 10X data, tab-delimited. Filtering, preprocessing, differential gene expression, marker analysis, enrichment analysis, batch removal, clustering, tSNE. [Screenshots](https://singlecell.biolab.si/screenshots/), [Short video tutorials](https://www.youtube.com/playlist?list=PLmNPvQr9Tf-a4MrEG5thq3qzlkrF5NFbC). Python-based, Conda-installable. [GitHub](https://github.com/biolab/orange3-single-cell)
- [scCustomize](https://github.com/samuel-marsh/scCustomize) - an R package, a collection of functions created and/or curated to aid in the visualization and analysis of single-cell data. Extends Seurat, Liger visualization, helper functions to enhance analysis of Seurat objects.
- [UCSC Single Cell Browser](https://github.com/maximilianh/cellBrowser) - Python pipeline and Javascript scatter plot library for single-cell datasets. Pre-process an expression matrix by filtering, PCA, nearest-neighbors, clustering, t-SNE and UMAP and formats them for cbBuild. [Demo that includes several landmark datasets](https://cells.ucsc.edu/)
## Quality control
- [QClus](https://github.com/linnalab/qclus) - snRNA-seq quality filtering after CellRanger default filtering. Uses cell-type-specific marker gene expression (nucleus localized markers, non-cardiomyocyte (CM) markers, cytoplasm/nucleus-localized CM markers) and other metrics (splicing, negative correlation with contamination, mitochondrial fraction, positive correlation) to cluster nuclei (k-means, 4 clusters) and filter empty and highly contaminated droplets. Tested on cardiomyocytes that have specific properties making QC challenging. Doublet removal with Scrublet. Outperforms several alternative methods (DIEM, DecontX, EmptyNN, SampleQC, DropletQC, CellBender) across six datasets. Tested on brain dataset. Flexible to include dataset-relevant metrics. Python, Conda, Jupyter notebook.
Paper
Eloi, Schmauch, Ojanen Johannes, Galani Kyriakitsa, Jalkanen Juho, Harju Kristiina, Hollmén Maija, Kokki Hannu, et al. “QClus: A Droplet Filtering Algorithm for Enhanced snRNA-Seq Data Quality in Challenging Samples.” Nucleic Acids Research, 2024.
- [miQC](https://bioconductor.org/packages/miQC/) - data-driven identification of cells with high mitochondrial content (likely, dead cells) from scRNA-seq data. Jointly models the proportion of reads mapping to mtDNA genes and the number of detected genes, EM for parameter estimation (flexmix). Tested on various datasets processed with CellRanger and salmon alevin - improves removal of compromised cells as compared with a hard threshold. Bioconductor R package, integrates with scater.
Paper
Hippen, Ariel A., Matias M. Falco, Lukas M. Weber, Erdogan Pekcan Erkan, Kaiyang Zhang, Jennifer Anne Doherty, Anna Vähärautio, Casey S. Greene, and Stephanie C. Hicks. "MiQC: An Adaptive Probabilistic Framework for Quality Control of Single-Cell RNA-Sequencing Data" https://doi.org/10.1371/journal.pcbi.1009290 PLOS Computational Biology, (August 24, 2021)
- [DropletQC](https://github.com/powellgenomicslab/DropletQC) - empty droplet identification. A novel metric - nuclear fraction. Damaged cells due to the depletion of cytoplasmic RNA will have a higher nuclear fraction compared to intact cells. Compared with 10X Cell Ranger, CellBender, EmptyNN, EmptyDrops. [Scripts](https://github.com/powellgenomicslab/dropletQC_paper).
Paper
Muskovic, Walter. “DropletQC: Improved Identification of Empty Droplets and Damaged Cells in Single-Cell RNA-Seq Data,” 2021, 9.
- [DropletUtils](https://bioconductor.org/packages/DropletUtils/) - Provides a number of utility functions for handling single-cell (RNA-seq) data from droplet technologies such as 10X Genomics. This includes data loading, identification of cells from empty droplets, removal of barcode-swapped pseudo-cells, and downsampling of the count matrix.
Paper
Lun ATL, Riesenfeld S, Andrews T, Dao T, Gomes T, participants in the 1st Human Cell Atlas Jamboree, Marioni JC (2019). "EmptyDrops: distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data" https://doi.org/10.1186/s13059-019-1662-y Genome Biol.
- [scater](https://bioconductor.org/packages/scater/) - A collection of tools for doing various analyses of single-cell RNA-seq gene expression data, with a focus on quality control.
Paper
McCarthy et al. "Scater: pre-processing, quality control, normalisation and visualisation of single-cell RNA-seq data in R" https://doi.org/10.1093/bioinformatics/btw777 Bioinformatics, 2017.
- [celloline](https://github.com/Teichlab/celloline) - A pipeline to remove low quality single cell files. Figure 2 - 20 biological and technical features used for filtering. High mitochondrial genes = broken cells.
Paper
Ilicic, Tomislav, Jong Kyoung Kim, Aleksandra A. Kolodziejczyk, Frederik Otzen Bagger, Davis James McCarthy, John C. Marioni, and Sarah A. Teichmann. "Classification of Low Quality Cells from Single-Cell RNA-Seq Data" https://doi.org/10.1186/s13059-016-0888-1 Genome Biology 17 (February 17, 2016)
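
Several tools in this section refer to per-cell metrics such as the mitochondrial fraction and the number of detected genes, and to the hard-threshold filtering that data-driven methods aim to improve on. A minimal sketch of that baseline is shown below; the `MT-` prefix, the cutoffs, and the toy counts are illustrative assumptions and are not taken from any of the listed packages.

```python
def qc_metrics(counts, gene_names, mito_prefix="MT-"):
    """Return (number of detected genes, mitochondrial fraction) for one cell."""
    total = sum(counts)
    detected = sum(1 for c in counts if c > 0)
    mito = sum(c for c, g in zip(counts, gene_names) if g.startswith(mito_prefix))
    mito_fraction = mito / total if total > 0 else 0.0
    return detected, mito_fraction


def passes_hard_threshold(counts, gene_names, min_genes=2, max_mito=0.2):
    """Hard-threshold filter; both cutoffs are hypothetical defaults."""
    detected, mito_fraction = qc_metrics(counts, gene_names)
    return detected >= min_genes and mito_fraction <= max_mito


# Toy data: two cells, four genes (two of them mitochondrial).
genes = ["ACTB", "GAPDH", "MT-CO1", "MT-ND1"]
cells = {
    "intact": [50, 30, 5, 5],        # 10 of 90 counts are mitochondrial
    "compromised": [5, 0, 40, 35],   # 75 of 80 counts are mitochondrial
}
kept = [name for name, counts in cells.items() if passes_hard_threshold(counts, genes)]
print(kept)
```

A fixed cutoff like this is simple but dataset-dependent, which is the motivation given above for adaptive approaches.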
## Doublet, multiplet detection
- [XenoCell](https://gitlab.com/XenoCell/XenoCell) - removal of multiplets in PDX scRNA-seq data. Multiplet - cells from human and mouse captured in one droplet. Assumes 50/50% of human/mouse cells, 10% mouse reads in host-specific reads cutoff. Removes mouse cells but does not introduce biases. Python, utilizes Xenome and the combined genome, wrapped in a Docker image. Input - paired-end FASTQs.
Paper
Cheloni, Stefano, Roman Hillje, Lucilla Luzi, Pier Giuseppe Pelicci, and Elena Gatti. “XenoCell: Classification of Cellular Barcodes in Single Cell Experiments from Xenograft Samples.” BMC Medical Genomics 14, no. 1 (December 2021): 34. https://doi.org/10.1186/s12920-021-00872-8.
- [doubletD](https://github.com/elkebir-group/doubletD) - doublet detection in single-cell DNA-seq data. Doublets in scDNA-seq data have a characteristic variant allele frequency spectrum due to increased copy number and allelic dropout. A maximum likelihood approach with a closed-form solution - stats in Methods. Simulated and real data, outperforms SCG, Scrublet, robust to the presence of CNAs, mixture of two cell types. Python3 implementation.
Paper
Weber, Leah L, Palash Sashittal, and Mohammed El-Kebir. "DoubletD: Detecting Doublets in Single-Cell DNA Sequencing Data" https://doi.org/10.1093/bioinformatics/btab266 Bioinformatics, (August 4, 2021)
- [souporcell](https://github.com/wheaton5/souporcell) - variant-based deconvolution of donors in scRNA-seq data. Clustering problem of cells x variant (number of reads supporting each allele), fit a mixture model with the cluster centers represented as the alternate allele fraction for each locus in the cluster. A deterministic annealing variant of the expectation maximization algorithm. Doublet identification by modeling alleles from beta-binomial distribution. Outperforms vireo and scSplit in simulated and experimental data.
Paper
Heaton, Haynes. “Souporcell: Robust Clustering of Single-Cell RNA-Seq Data by Genotype without Reference Genotypes.” Nature Methods 17 (2020).
- [DoubletFinder](https://github.com/chris-mcginnis-ucsf/DoubletFinder) - doublet detection using gene expression data. Simulates artificial doublets, incorporates them into existing scRNA-seq data. Integrates with Seurat (Figure 1). Three input parameters (the expected number of doublets, the number of artificial doublets pN, the neighborhood size pK) need to be tailored to data with different numbers of cell types and magnitudes of transcriptional heterogeneity. Bimodality Coefficient maximization to select pK. Benchmarked against ground-truth scRNA-seq datasets. Not optimal for homogeneous data.
Paper
McGinnis, Christopher S., Lyndsay M. Murrow, and Zev J. Gartner. "DoubletFinder: Doublet Detection in Single-Cell RNA Sequencing Data Using Artificial Nearest Neighbors" https://doi.org/10.1016/j.cels.2019.03.003 Cell Systems 8, no. 4 (April 2019)
- [demuxlet](https://github.com/statgen/demuxlet) - doublet detection based on genetic variation. Applicable to multiplex sequencing of different individuals. 50 SNPs are sufficient to assign singlets and doublets. A statistical model evaluating the likelihood of observing RNA-seq reads overlapping a set of SNPs from each cell-containing droplet.
Paper
Kang, Hyun Min, Meena Subramaniam, Sasha Targ, Michelle Nguyen, Lenka Maliskova, Elizabeth McCarthy, Eunice Wan, et al. “Multiplexed Droplet Single-Cell RNA-Sequencing Using Natural Genetic Variation.” Nature Biotechnology 36, no. 1 (January 2018): 89–94. https://doi.org/10.1038/nbt.4042.
- [scrublet](https://github.com/AllonKleinLab/scrublet) - Detect doublets in single-cell RNA-seq data.
Paper
Wolock, Samuel L, Romain Lopez, and Allon M Klein. "Scrublet: Computational Identification of Cell Doublets in Single-Cell Transcriptomic Data" https://doi.org/10.1101/357368 Preprint. Bioinformatics, July 9, 2018.
- [EmptyDrops](https://github.com/MarioniLab/EmptyDrops2017) - empty droplet detection in scRNA-seq data. Ambient RNA origin, existing approaches like 10% threshold, the knee point in the cumulative fraction of reads with respect to increasing total count. Approach - estimate the profile of the ambient RNA pool and test each barcode for deviations from this pool using a Dirichlet-multinomial model of UMI count sampling.
Paper
Marioni, John. “EmptyDrops: Distinguishing Cells from Empty Droplets in Droplet-Based Single-Cell RNA Sequencing Data,” 2019.
- [Solo](#solo) - semi-supervised deep learning for doublet identification. Variational autoencoder (scVI) followed by a classifier to detect doublets. Compared with Scrublet and DoubletFinder, improves area under the precision-recall curve.
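
Several expression-based detectors in this section simulate artificial doublets and score each real cell by how many of its nearest neighbors are artificial. The sketch below illustrates that idea only; it is not the DoubletFinder or Scrublet implementation. For determinism it averages every cross-cluster pair instead of sampling random pairs, and the two-gene toy profiles and `k=3` are assumptions.

```python
import math


def average_profiles(cell_a, cell_b):
    """Simulate a doublet as the average of two expression profiles."""
    return [(a + b) / 2 for a, b in zip(cell_a, cell_b)]


def artificial_neighbor_fraction(real_cells, artificial_cells, k):
    """For each real cell, the fraction of its k nearest neighbors that are artificial."""
    merged = [(cell, False) for cell in real_cells]
    merged += [(cell, True) for cell in artificial_cells]
    fractions = []
    for i, cell in enumerate(real_cells):
        # Distances to every other profile in the merged data (self excluded).
        neighbors = sorted(
            (math.dist(cell, other), is_artificial)
            for j, (other, is_artificial) in enumerate(merged)
            if j != i
        )
        fractions.append(sum(flag for _, flag in neighbors[:k]) / k)
    return fractions


# Toy data: two well-separated cell types and one suspicious cell in between.
cluster_a = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
cluster_b = [[10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]]
suspect = [5.5, 5.5]
real_cells = cluster_a + cluster_b + [suspect]

artificial = [average_profiles(a, b) for a in cluster_a for b in cluster_b]
scores = artificial_neighbor_fraction(real_cells, artificial, k=3)
print(scores)
```

Cells inside either cluster are surrounded by real neighbors, while the in-between cell is surrounded by simulated doublets. This also hints at why such methods struggle with homogeneous data, where simulated doublets resemble singlets.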
## Normalization
- [sctransform](https://cran.r-project.org/web/packages/sctransform/) - using the Pearson residuals from regularized negative binomial regression where sequencing depth is utilized as a covariate to remove technical artifacts. Interfaces with Seurat.
Paper
Hafemeister, Christoph, and Rahul Satija. "Normalization and Variance Stabilization of Single-Cell RNA-Seq Data Using Regularized Negative Binomial Regression" https://doi.org/10.1101/576827 BioRxiv, March 14, 2019.
- [SCnorm](https://www.biostat.wisc.edu/~kendzior/SCNORM/) - normalization for single-cell data. Quantile regression to estimate the dependence of transcript expression on sequencing depth for every gene. Genes with similar dependence are then grouped, and a second quantile regression is used to estimate scale factors within each group. Within-group adjustment for sequencing depth is then performed using the estimated scale factors to provide normalized estimates of expression. Good statistical methods description.
Paper
Bacher, Rhonda, Li-Fang Chu, Ning Leng, Audrey P Gasch, James A Thomson, Ron M Stewart, Michael Newton, and Christina Kendziorski. "SCnorm: Robust Normalization of Single-Cell RNA-Seq Data" https://doi.org/10.1038/nmeth.4263 Nature Methods 14, no. 6 (April 17, 2017)
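
The Pearson residuals mentioned for sctransform can be illustrated with a simplified calculation. The sketch below is not the sctransform procedure: instead of fitting a regularized negative binomial regression per gene, it assumes the expected count is the product of the cell and gene totals divided by the grand total, and it uses one fixed, hypothetical overdispersion `theta`.

```python
import math


def pearson_residuals(counts, theta=100.0):
    """Pearson residuals for a cells x genes count matrix (list of lists).

    Expected counts come from the marginal totals (an assumption standing in
    for a fitted regression), with negative binomial variance mu + mu^2 / theta.
    """
    cell_totals = [sum(row) for row in counts]
    gene_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(cell_totals)
    residuals = []
    for row, cell_total in zip(counts, cell_totals):
        res_row = []
        for x, gene_total in zip(row, gene_totals):
            mu = cell_total * gene_total / grand_total
            variance = mu + mu * mu / theta
            res_row.append((x - mu) / math.sqrt(variance) if variance > 0 else 0.0)
        residuals.append(res_row)
    return residuals


# Two cells that differ only in sequencing depth: residuals are all zero.
depth_only = pearson_residuals([[1, 2], [2, 4]])
# Two cells with equal depth but opposite expression: residuals differ in sign.
biological = pearson_residuals([[4, 0], [0, 4]])
print(depth_only)
print(biological)
```

In the first matrix the second cell is the first cell sequenced twice as deeply, so depth explains every count and nothing remains in the residuals. In the second matrix the depth is equal, so the residuals carry the expression difference.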
## Integration, Batch correction
- [Evaluation of 10 single-cell data integration methods and 4 preprocessing combinations](https://github.com/theislab/scib) on 77 batches of gene expression, chromatin accessibility, and simulated data (Table 1) in 9 integration tasks using 14 evaluation metrics. BBKNN, Scanorama, scVI perform well on complex tasks, Seurat performs well on simpler tasks but may eliminate biological signal. The use of Seurat v3 and Harmony is appropriate for simple integration tasks with distinct batch and biological structure. Batch in ATAC-seq is the most difficult to remove. Jupyter notebooks for full reproducibility.
Paper
Luecken, Md, M Büttner, K Chaichoompu, A Danese, M Interlandi, Mf Mueller, Dc Strobl, et al. “Benchmarking Atlas-Level Data Integration in Single-Cell Genomics.” Nature Methods, 23 December 2021 https://doi.org/10.1038/s41592-021-01336-8
- [Benchmark of 14 methods for scRNA-seq batch correction](https://github.com/JinmiaoChenLab/Batch-effect-removal-benchmarking). Using five scenarios: different technologies and same cells, non-identical cell types, multiple batches, big data, simulated data. Four benchmarking metrics. Harmony (fast), LIGER, and Seurat 3 perform well overall. For differential expression, ComBat, limma, MNN Correct perform well. Detailed description of 9 datasets and download links. [Data and scripts](https://github.com/JinmiaoChenLab/Batch-effect-removal-benchmarking).
Paper
Tran, Hoa Thi Nhu, Kok Siong Ang, Marion Chevrier, Xiaomeng Zhang, Nicole Yee Shin Lee, Michelle Goh, and Jinmiao Chen. "A Benchmark of Batch-Effect Correction Methods for Single-Cell RNA Sequencing Data" https://doi.org/10.1186/s13059-019-1850-9 Genome Biology 21, no. 1 (December 2020)
- [CellANOVA](https://github.com/Janezjz/cellanova) (cell state space analysis of variance) - scRNA-seq data integration. Statistical model to separate unwanted and biological variation. Operates on top of existing integration methods (Harmony, Seurat). Requires a control-pool set of samples, a set of samples whereby variation beyond what is preserved by the existing integration are not of interest to the study. Used to estimate a latent linear space that captures cell- and gene-specific unwanted batch variations. Tested on 4 experimental designs (case-control, longitudinal, irregular block design, scRNA and snRNA integration). Outperforms Seurat, Harmony, LIGER, Symphony integration, as judged by local inverse Simpson’s index (LISI). Python implementation.
Paper
Zhang, Zhaojun, Divij Mathew, Tristan Lim, Kaishu Mason, Clara Morral Martinez, Sijia Huang, E. John Wherry, et al. “Signal Recovery in Single Cell Batch Integration,” May 8, 2023. https://doi.org/10.1101/2023.05.05.539614.
- [SMAI](https://github.com/rongstat/SMAI) - spectral manifold alignment and inference framework for aligning single-cell sequencing data. Includes SMAI-test of (partial) alignability against the null hypothesis that two single-cell datasets are alignable up to some similarity transformation, that is, combinations of scaling, translation, and rotation. SMAI-align incorporates a high-dimensional shuffled Procrustes analysis, which iteratively searches for the sample correspondence and the best similarity transformation that minimizes the discrepancy between the intrinsic low-dimensional signal structures of the datasets. References that current integration methods distort the biology. Compared with Seurat, LIGER, Harmony, fastMNN, Scanorama, evaluated on integration of diverse tissues, technologies, 13 integration tasks. Assessment of false positives in differential expression, outperforms all methods. R implementation, works on Seurat objects, [tutorial](https://rongstat.github.io/SMAI_guide.io/SMAI-tutorial.html).
Paper
Ma, Rong, Eric D Sun, David Donoho, and James Zou. “Principled and Interpretable Alignability Testing and Integration of Single-Cell Data” 121, no. 10 (2024). https://doi.org/10.1073/pnas.2313719121
- [MOJITOO](https://github.com/CostaLab/MOJITOO) (Multi-mOdal Joint IntegraTion of cOmpOnents) - a single cell multi-modal integration method (does not require shared features, e.g., genes), uses canonical correlation analysis. Introduction of two main frameworks: metric learning (WNN, Schema) and latent variable learning (MOFA, scAI, totalVI, LIGER). Benchmarked against them on bi- and trimodal data (RNA, protein, ATAC). R implementation, compatible with Seurat, Signac.
Paper
Cheng, Mingbo, Zhijian Li, and Ivan Gesteira Costa Filho. “MOJITOO: A Fast and Universal Method for Integration of Multimodal Single Cell Data.” Preprint. Bioinformatics, January 21, 2022. https://doi.org/10.1101/2022.01.19.476907.
- [RPCI](https://github.com/bioinfoDZ/RISC) (Reference Principal Component Integration) - R package RISC for integration of scRNA-seq data using the gene eigenvectors from a reference dataset as a single reference space. Compared with CCA, shared cell type-based strategies. Tested on simulated and experimental datasets against 11 other integration approaches (Scanorama, Harmony, fastMNN, Anchor, among others) using four metrics (kBET scores, LISI, ARI, SW). Robust when using two and more datasets (e.g., timecourse). Scanorama generally ranks second. [Code to reproduce the paper](https://codeocean.com/capsule/9098032/tree/v1).
Paper
Liu, Yang, Tao Wang, Bin Zhou, and Deyou Zheng. “Robust Integration of Multiple Single-Cell RNA Sequencing Datasets Using a Single Reference Space.” Nature Biotechnology, March 25, 2021. https://doi.org/10.1038/s41587-021-00859-x.
- [MOFA2](https://github.com/bioFAM/MOFA2) - Multi-Omics Factor Analysis v2 (MOFA+), a statistical framework for the integration of single-cell multi-modal data. Reconstructs a low-dimensional representation of the data using variational inference (a stochastic variant parallelizable on GPU, 20-fold speed increase). Supports sparsity constraints, allowing to jointly model variation across multiple sample groups and data modalities. Infers K latent factors with associated feature weight matrices (per data modality, Figure 1a) that can be used for clustering, trajectory inference, variance decomposition etc. Input - multiple datasets measuring non-overlapping modalities, cells grouped by experiments, batches, or conditions. Python and R implementation.
Paper
Argelaguet, Ricard, Damien Arnol, Danila Bredikhin, Yonatan Deloro, Britta Velten, John C. Marioni, and Oliver Stegle. “MOFA+: A Statistical Framework for Comprehensive Integration of Multi-Modal Single-Cell Data.” Genome Biology 21, no. 1 (December 2020): 111. https://doi.org/10.1186/s13059-020-02015-1.
- [CarDEC](https://github.com/jlakkis/CarDEC) - Count adapted regularized Deep Embedded Clustering, a joint deep learning model that simultaneously clusters, denoises, corrects for multiple batch effects in gene expression space (Figure 1). Outperforms scVI, DCA, MNN, scDeepCluster. Separately treats highly and lowly variable genes. Improves integration of omics generated by multiple technologies, pseudotime reconstruction.
Paper
Lakkis, Justin, David Wang, Yuanchao Zhang, Gang Hu, Kui Wang, Huize Pan, Lyle Ungar, Muredach P. Reilly, Xiangjie Li, and Mingyao Li. “A Joint Deep Learning Model for Simultaneous Batch Effect Correction, Denoising and Clustering in Single-Cell Transcriptomics.” Preprint. Bioinformatics, September 25, 2020. https://doi.org/10.1101/2020.09.23.310003
- [iCellR](https://cran.r-project.org/web/packages/iCellR/index.html) - batch correction in scRNA-seq data. Combined Coverage Correction Alignment (CCCA) and Combined Principal Component Alignment (CPCA). CCCA - PCA into 30 dimensions, for each cell, take k=10 nearest neighbors, average gene expression, thus imputing the adjusted matrix. CPCA skips imputation, instead PCs themselves get averaged. Similar performance. Tested on nine PBMC datasets provided by the Broad institute to test batch effect. Outperforms MAGIC. [Data in text and .rda formats](https://genome.med.nyu.edu/results/external/iCellR/data/).
Paper
Khodadadi-Jamayran, Alireza, Joseph Pucella, Hua Zhou, Nicole Doudican, John Carucci, Adriana Heguy, Boris Reizis, and Aristotelis Tsirigos. "ICellR: Combined Coverage Correction and Principal Component Alignment for Batch Alignment in Single-Cell Sequencing Analysis" https://doi.org/10.1101/2020.03.31.019109 Preprint. Bioinformatics, April 1, 2020
- [scAlign](https://bioconductor.org/packages/scAlign/) - a deep learning method for alignment and integration of scRNA-seq datasets. Bidirectional mapping via a low-dimensional space. Can perform unsupervised, semi-supervised, and supervised (by cell type labels) integration. Outperforms scVI, MNN, scmap, MINT, scMERGE, Scanorama, Seurat (two latter perform well). [GitHub](https://github.com/quon-titative-biology/scAlign), [Bioconductor R package](https://bioconductor.org/packages/scAlign/).
Paper
Johansen, Nelson, and Gerald Quon. “ScAlign: A Tool for Alignment, Integration, and Rare Cell Identification from ScRNA-Seq Data.” Genome Biology 20, no. 1 (December 2019): 166. https://doi.org/10.1186/s13059-019-1766-4
- [BERMUDA](https://github.com/txWang/BERMUDA) - Batch Effect ReMoval Using Deep Autoencoders, for scRNA-seq data. Requires batches to share at least one common cell type. Five step framework: 1) preprocessing, 2) clustering of cells in each batch individually, 3) identifying similar cell clusters across different batches, 4) removing batch effect by training an autoencoder, 5) further analysis of batch-corrected data. Tested on simulated (splatter) and experimental (10X genomics) data.
Paper
Wang, Tongxin, Travis S. Johnson, Wei Shao, Zixiao Lu, Bryan R. Helm, Jie Zhang, and Kun Huang. "BERMUDA: A Novel Deep Transfer Learning Method for Single-Cell RNA Sequencing Batch Correction Reveals Hidden High-Resolution Cellular Subtypes" https://doi.org/10.1186/s13059-019-1764-6 Genome Biology 20, no. 1 (December 2019).
- [Scanorama](http://cb.csail.mit.edu/cb/scanorama/) - Python tool, integrates scRNA-seq datasets, identifies the shared cell types among all pairs of datasets (mutual nearest-neighbors matching in low-dimensional (100 SVD components) space) and uses this info for batch correction and merging. Tested on 26 scRNA-seq datasets, 9 technologies, simulated data. Compared with Seurat's CCA, scran MNN. [Links to many public datasets](https://www.nature.com/articles/s41587-019-0113-3#data-availability).
Paper
Hie, Brian, Bryan Bryson, and Bonnie Berger. “Efficient Integration of Heterogeneous Single-Cell Transcriptomes Using Scanorama.” Nature Biotechnology, May 6, 2019. https://doi.org/10.1038/s41587-019-0113-3.
- [BBKNN](https://github.com/Teichlab/bbknn) (batch balanced k nearest neighbours) - batch correction for scRNA-seq data. Neighborhood graphs, balanced across all batches of the data, separately for each batch, that are merged. Main assumption (as in mnnCorrect) - at least some cells of the same type exist across batches. Preserves data structure allowing subsequent embedding, trajectory reconstruction. Python, compatible with SCANPY, very fast.
Paper
Polański, Krzysztof, Matthew D Young, Zhichao Miao, Kerstin B Meyer, Sarah A Teichmann, and Jong-Eun Park. "BBKNN: Fast Batch Alignment of Single Cell Transcriptomes" https://doi.org/10.1093/bioinformatics/btz625 Bioinformatics, August 10, 2019
- [conos](https://github.com/hms-dbmi/conos) - joint analysis of scRNA-seq datasets through inter-sample mapping (mutual nearest-neighbor mapping) and constructing a joint graph. [Analysis scripts](http://pklab.med.harvard.edu/peterk/conos/).
Paper
Barkas, Nikolas, Viktor Petukhov, Daria Nikolaeva, Yaroslav Lozinsky, Samuel Demharter, Konstantin Khodosevich, and Peter V. Kharchenko. "Joint Analysis of Heterogeneous Single-Cell RNA-Seq Dataset Collections" https://doi.org/10.1038/s41592-019-0466-z Nature Methods, July 15, 2019.
- [LIGER](https://github.com/MacoskoLab/liger) - R package for integrating and analyzing multiple single-cell datasets, across conditions, technologies (scRNA-seq and methylation), or species (human and mouse). Integrative nonnegative matrix factorization (W and H matrices), dataset-specific and shared patterns (metagenes, matrix H). Graphs of factor loadings onto these patterns (shared factor neighborhood graph), then comparing patterns. Alignment and agreement metrics to assess performance, LIGER outperforms Seurat on agreement. Analysis of published blood cells, brain. [Human/mouse brain data](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE126836).
Paper
Welch, Joshua D., Velina Kozareva, Ashley Ferreira, Charles Vanderburg, Carly Martin, and Evan Z. Macosko. "Single-Cell Multi-Omic Integration Compares and Contrasts Features of Brain Cell Identity" https://doi.org/10.1016/j.cell.2019.05.006 Cell 177, no. 7 (June 13, 2019)
- [scMerge](https://github.com/SydneyBioX/scMerge/) - R package for batch effect removal and normalizing of multiple scRNA-seq datasets. fastRUVIII batch removal method. Tested on 14 datasets, compared with scran, MNN, ComBat, Seurat, ZINB-WaVE using Silhouette, ARI - better separation of clusters, pseudotime reconstruction.
Paper
Lin, Yingxin, Shila Ghazanfar, Kevin Wang, Johann A. Gagnon-Bartsch, Kitty K. Lo, Xianbin Su, Ze-Guang Han, et al. "ScMerge: Integration of Multiple Single-Cell Transcriptomics Datasets Leveraging Stable Expression and Pseudo-Replication" https://doi.org/10.1101/393280 September 12, 2018.
- [cellHarmony](https://github.com/AltAnalyze/cellHarmony-Align) - a Python 2.7 package for integration and comparison of scRNA-seq datasets, a part of [AltAnalyze](http://www.altanalyze.org/) workflow for RNA-Seq gene, splicing and pathway analysis. Uses a community clustering to produce a network graph and define communities in both the reference and query datasets, and alignment (label projection) strategy. Table 1 - comparison with other joint alignment and label projection methods. Differential expression using empirical Bayes moderated t-test and FDR, global, local, and co-regulated comparisons. Tested on several datasets, improves similarity to the author-defined ground truth. Support for 10x Genomics data format.
Paper
DePasquale, Erica AK, Phillip Dexheimer, Daniel Schnell, Kyle Ferchen, Stuart Hay, Íñigo Valiente-Alandí, Burns C. Blaxall, H. Leighton Grimes, and Nathan Salomonis. “CellHarmony: Cell-Level Matching and Holistic Comparison of Single-Cell Transcriptomes.” Preprint. Bioinformatics, September 8, 2018. https://doi.org/10.1101/412080.
- [MNN](https://bioconductor.org/packages/scran/) - mutual nearest neighbors method for single-cell batch correction. Assumptions: MNN exist between batches, batch is orthogonal to the biology. Cosine normalization, Euclidean distance, a pair-specific batch-correction vector as a vector difference between the expression profiles of the paired cells using selected genes of interest and hypervariable genes. Supplementary note 5 - algorithm. mnnCorrect function in the [scran](https://bioconductor.org/packages/scran/) package. [Code for paper](https://github.com/MarioniLab/MNN2017/).
Paper
Haghverdi, Laleh, Aaron T L Lun, Michael D Morgan, and John C Marioni. "Batch Effects in Single-Cell RNA-Sequencing Data Are Corrected by Matching Mutual Nearest Neighbors" https://doi.org/10.1038/nbt.4091 Nature Biotechnology, April 2, 2018.
- [batchelor](https://bioconductor.org/packages/batchelor/) - Single-Cell Batch Correction Methods, by Aaron Lun.
Paper
Haghverdi L, Lun ATL, Morgan MD, Marioni JC (2018). "Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors" https://doi.org/10.1038/nbt.4091 Nat. Biotechnol.
- [scLVM](https://github.com/PMBio/scLVM) - a modelling framework for single-cell RNA-seq data that can be used to dissect the observed heterogeneity into different sources and remove the variation explained by latent variables. Can correct for the cell cycle effect. Applied to naive T cells differentiating into TH2 cells.
Paper
Buettner, Florian, Kedar N Natarajan, F Paolo Casale, Valentina Proserpio, Antonio Scialdone, Fabian J Theis, Sarah A Teichmann, John C Marioni, and Oliver Stegle. "Computational Analysis of Cell-to-Cell Heterogeneity in Single-Cell RNA-Sequencing Data Reveals Hidden Subpopulations of Cells" https://doi.org/10.1038/nbt.3102 Nature Biotechnology 33, no. 2 (March 2015)
Buettner, Florian, Naruemon Pratanwanich, Davis J. McCarthy, John C. Marioni, and Oliver Stegle. "F-ScLVM: Scalable and Versatile Factor Analysis for Single-Cell RNA-Seq" https://doi.org/10.1186/s13059-017-1334-8 Genome Biology 18, no. 1 (December 2017) - f-scLVM - factorial single-cell latent variable model guided by pathway annotations to infer interpretable factors behind heterogeneity. PCA components are annotated by correlated genes and their enrichment in pathways. Decomposition of the original gene expression matrix to a sum of annotated, unannotated, and confounding components. Applied to their own naive T to TH2 cells, mESCs, reanalyzed 3005 neuronal cells. Simulated data. https://github.com/bioFAM/slalom
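
Mutual nearest neighbors matching recurs throughout this section (MNN, Scanorama, conos, BBKNN assumptions). The sketch below shows only the pairing step and a pair-derived correction vector. It is not the `mnnCorrect` implementation: cosine normalization and gene selection are omitted, and using one averaged correction vector for the whole batch is a simplifying assumption.

```python
import math


def nearest_indices(query, reference, k):
    """Indices of the k cells in `reference` closest to `query` (Euclidean)."""
    order = sorted(range(len(reference)), key=lambda j: math.dist(query, reference[j]))
    return set(order[:k])


def mutual_nearest_pairs(batch_a, batch_b, k=1):
    """Pairs (i, j) where cell i of A and cell j of B are in each other's k nearest."""
    a_to_b = [nearest_indices(cell, batch_b, k) for cell in batch_a]
    b_to_a = [nearest_indices(cell, batch_a, k) for cell in batch_b]
    return [
        (i, j)
        for i, neighbors in enumerate(a_to_b)
        for j in sorted(neighbors)
        if i in b_to_a[j]
    ]


def mean_correction_vector(batch_a, batch_b, pairs):
    """Average of the per-pair differences (B minus A) across all MNN pairs."""
    n_genes = len(batch_a[0])
    return [
        sum(batch_b[j][g] - batch_a[i][g] for i, j in pairs) / len(pairs)
        for g in range(n_genes)
    ]


# Toy data: batch B repeats batch A's two cell types with a (+1, +1) shift,
# plus one cell with no counterpart in batch A.
batch_a = [[0.0, 0.0], [10.0, 10.0]]
batch_b = [[1.0, 1.0], [11.0, 11.0], [5.0, 5.5]]

pairs = mutual_nearest_pairs(batch_a, batch_b, k=1)
shift = mean_correction_vector(batch_a, batch_b, pairs)
corrected_b = [[x - s for x, s in zip(cell, shift)] for cell in batch_b]
print(pairs, shift)
```

The unmatched cell in batch B forms no mutual pair, so it does not influence the estimated shift. This reflects the stated assumption that only shared cell populations should define the batch effect.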
## Imputation
- [Assessment of 18 scRNA-seq imputation methods](https://github.com/Winnie09/imputationBenchmark) (model-based, smooth-based, deep learning, matrix decomposition). Similarity of scRNA- and bulk RNA-seq profiles (Spearman), differential expression (MAST and Wilcoxon), clustering (k-means, Louvain), trajectory reconstruction (Monocle 2, TSCAN), didn't test velocity. scran for normalization. Imputation methods improve correlation with bulk RNA-seq, but have minimal effect on downstream analyses. MAGIC, kNN-smoothing, SAVER perform well overall. [Plate- and droplet-derived scRNA-seq cell line data, Additional File 4](https://static-content.springer.com/esm/art%3A10.1186%2Fs13059-020-02132-x/MediaObjects/13059_2020_2132_MOESM4_ESM.xlsx), [Summary table of the functionality of all imputation methods, Additional File 5](https://static-content.springer.com/esm/art%3A10.1186%2Fs13059-020-02132-x/MediaObjects/13059_2020_2132_MOESM4_ESM.xlsx).
Paper
Hou, Wenpin, Zhicheng Ji, Hongkai Ji, and Stephanie C. Hicks. "A Systematic Evaluation of Single-Cell RNA-Sequencing Imputation Methods" https://doi.org/10.1186/s13059-020-02132-x Genome Biology 21, no. 1 (December 2020)
- [Deepimpute](https://github.com/lanagarmire/DeepImpute) - scRNA-seq imputation using deep neural networks. Sub-networks, each processes up to 512 genes needed to be imputed. Four layers: Input - dense (ReLU activation) - 20% dropout - output. MSE as loss function. Outperforms MAGIC, DrImpute, ScImpute, SAVER, VIPER, and DCA on multiple metrics (PCC, several clustering metrics). Using 9 datasets.
Paper
Arisdakessian, Cédric, Olivier Poirion, Breck Yunits, Xun Zhu, and Lana X. Garmire. "DeepImpute: An Accurate, Fast, and Scalable Deep Neural Network Method to Impute Single-Cell RNA-Seq Data" https://doi.org/10.1186/s13059-019-1837-6 Genome Biology 20, no. 1 (December 2019)
- [SCRABBLE](https://github.com/tanlabcode/SCRABBLE) - scRNA-seq imputation constraining on bulk RNA-seq data. Matrix regularization optimizing a three-term objective function. Compared with DrImpute, scImpute, MAGIC, VIPER on simulated and real data. [Datasets](https://github.com/tanlabcode/SCRABBLE_PAPER). R and Matlab implementation.
Paper
Peng, Tao, Qin Zhu, Penghang Yin, and Kai Tan. "SCRABBLE: Single-Cell RNA-Seq Imputation Constrained by Bulk RNA-Seq Data" https://doi.org/10.1186/s13059-019-1681-8 Genome Biology 20, no. 1 (December 2019)
- [ENHANCE](https://github.com/yanailab/enhance-R), an algorithm that denoises single-cell RNA-Seq data by first performing nearest-neighbor aggregation and then inferring expression levels from principal components. Variance-stabilizing normalization of the data before PCA. Implements its own simulation procedure for simulating sampling noise. Outperforms MAGIC, SAVER, ALRA. [Python](https://github.com/yanailab/enhance), and [R](https://github.com/yanailab/enhance-R) implementations.
Paper
Wagner, Florian, Dalia Barkley, and Itai Yanai. "ENHANCE: Accurate Denoising of Single-Cell RNA-Seq Data" https://doi.org/10.1101/655365 Preprint. Bioinformatics, June 3, 2019.
- [scHinter](https://github.com/BMILAB/scHinter) - imputation for small-size scRNA-seq datasets. Three modules: voting-based ensemble distance for learning cell-cell similarity, a SMOTE-based random interpolation module for imputing dropout events, and a hierarchical model for multi-layer random interpolation. [RNA-seq blog](https://www.rna-seqblog.com/schinter-imputing-dropout-events-for-single-cell-rna-seq-data-with-limited-sample-size/).
Paper
Ye, Pengchao, Wenbin Ye, Congting Ye, Shuchao Li, Lishan Ye, Guoli Ji, and Xiaohui Wu. "ScHinter: Imputing Dropout Events for Single-Cell RNA-Seq Data with Limited Sample Size" https://doi.org/10.1093/bioinformatics/btz627 Bioinformatics, August 8, 2019.
- [netNMF-sc](https://github.com/raphael-group/netNMF-sc) - scRNA-seq nonnegative matrix factorization for imputation and dimensionality reduction for improved clustering. Uses gene-gene interaction network to constrain W gene matrix on prior knowledge (graph regularized NMF). Added penalization for dropouts. Tested on simulated and experimental data, compared with several imputation and clustering methods.
Paper
Elyanow, Rebecca, Bianca Dumitrascu, Barbara E Engelhardt, and Benjamin J Raphael. "NetNMF-Sc: Leveraging Gene-Gene Interactions for Imputation and Dimensionality Reduction in Single-Cell Expression Analysis" https://doi.org/10.1101/544346 BioRxiv, February 8, 2019.
- [scRMD](https://github.com/ChongC1990/scRMD) - dropout imputation in scRNA-seq via robust matrix decomposition into true expression matrix (further decomposed into a matrix of means and gene's random deviation from its mean) minus dropout matrix plus error matrix. A function to estimate the matrix of means and dropouts. Comparison with MAGIC, scImpute.
Paper
Chen, Chong, Changjing Wu, Linjie Wu, Yishu Wang, Minghua Deng, and Ruibin Xi. "ScRMD: Imputation for Single Cell RNA-Seq Data via Robust Matrix Decomposition" https://doi.org/10.1101/459404 November 4, 2018
- [SAVER](https://github.com/mohuangx/SAVER) (single-cell analysis via expression recovery) - scRNA-seq imputation (UMI matrix) utilizing gene-to-gene relationships. Recovers missing gene expression, removes technical variation. Assumes gene counts follow a negative binomial distribution, estimates the prior parameters in an empirical Bayes-like approach with a Poisson LASSO regression, using the expression of other genes as predictors. Tested using RNA FISH data as a reference, better recovers gene expression in Drop-seq data. Outperforms [MAGIC](#magic) and [scImpute](#scimpute).
Paper
Huang, Mo, Jingshu Wang, Eduardo Torre, Hannah Dueck, Sydney Shaffer, Roberto Bonasio, John I. Murray, Arjun Raj, Mingyao Li, and Nancy R. Zhang. “SAVER: Gene Expression Recovery for Single-Cell RNA Sequencing.” Nature Methods 15, no. 7 (July 2018): 539–42. https://doi.org/10.1038/s41592-018-0033-z.
- [netSmooth](https://github.com/BIMSBbioinfo/netSmooth) - network diffusion-based method that uses priors for the covariance structure of gene expression profiles to smooth scRNA-seq experiments. Incorporates prior knowledge (i.e. protein-protein interaction networks) for imputation. Note that dropout applies to whole transcriptome. Compared with MAGIC, scImpute. Improves clustering, biological interpretation.
Paper
Ronen, Jonathan, and Altuna Akalin. "NetSmooth: Network-Smoothing Based Imputation for Single Cell RNA-Seq" https://doi.org/10.12688/f1000research.13511.3 F1000Research 7 (July 10, 2018)
- [DCA](https://github.com/theislab/dca) - A deep count autoencoder network to denoise scRNA-seq data. Zero-inflated negative binomial model. Current approaches - scImpute, MAGIC, SAVER. Benchmarking by increased correlation between bulk and scRNA-seq data, between protein and RNA levels, between key regulatory genes, better DE concordance in bulk and scRNA-seq, improved clustering.
Paper
Eraslan, Gökcen, Lukas M. Simon, Maria Mircea, Nikola S. Mueller, and Fabian J. Theis. "Single Cell RNA-Seq Denoising Using a Deep Count Autoencoder" https://doi.org/10.1101/300681 April 13, 2018.
- [kNN-smoothing](https://github.com/yanailab/knn-smoothing) of scRNA-seq data, aggregates information from similar cells, improves signal-to-noise ratio. Based on the observation that gene expression in technical replicates is Poisson distributed. Freeman-Tukey transform to minimize variability of lowly expressed genes. Tested using real and simulated data. Improves clustering, PCA. Selection of k is critical and is discussed.
Paper
Wagner, Florian, Yun Yan, and Itai Yanai. "K-Nearest Neighbor Smoothing for High-Throughput Single-Cell RNA-Seq Data" https://doi.org/10.1101/217737 BioRxiv, April 9, 2018.
- [scImpute](https://github.com/Vivianstats/scImpute) - imputation of scRNA-seq data. Methodology: 1) Determine K subpopulations using PCA, remove outliers; 2) Mixture model of gene i in subpopulation k as gamma and normal distributions, estimate dropout probability d; 3) Impute dropout values by splitting the subpopulation into A (dropout larger than threshold t) and B (smaller). Information from B is used to impute A. Better than MAGIC, SAVER.
Paper
Li, Wei Vivian, and Jingyi Jessica Li. "An Accurate and Robust Imputation Method ScImpute for Single-Cell RNA-Seq Data" https://doi.org/10.1038/s41467-018-03405-7 Nature Communications 9, no. 1 (08 2018)
- [LATE](https://github.com/audreyqyfu/LATE) - Learning with AuToEncoder to impute scRNA-seq data. `TRANSLATE` (TRANSfer learning with LATE) uses reference (sc)RNA-seq dataset to learn initial parameter estimates. TensorFlow implementation for GPU and CPU. ReLU as an activation function. Various optimization techniques. Comparison with MAGIC, scVI, DCA, SAVER. Links to data.
Paper
Badsha, Md. Bahadur, Rui Li, Boxiang Liu, Yang I. Li, Min Xian, Nicholas E. Banovich, and Audrey Qiuyan Fu. "Imputation of Single-Cell Gene Expression with an Autoencoder Neural Network" https://doi.org/10.1101/504977 BioRxiv, January 1, 2018
- [MAGIC](https://github.com/KrishnaswamyLab/MAGIC) - Markov Affinity-based Graph Imputation of Cells. Only \~5-15% of scRNA-seq data is non-zero, the rest are drop-outs. Uses the diffusion operator to discover the manifold structure and impute gene expression. Detailed methods description. In real (bone marrow and retinal bipolar cells) and synthetic datasets, imputed scRNA-seq data clusters better, enhances gene interactions, restores expression of known surface markers, trajectories. scRNA-seq data is preprocessed by library size normalization and PCA (to retain 70% of variability). Comparison with SVD-based low-rank data approximation (LDA) and Nuclear-Norm-based Matrix Completion (NNMC). [GitHub](https://github.com/KrishnaswamyLab/MAGIC).
Paper
Van Dijk, David, Roshan Sharma, Juozas Nainys, Kristina Yim, Pooja Kathail, Ambrose J. Carr, Cassandra Burdziak et al. "Recovering gene interactions from single-cell data using data diffusion." Cell 174, no. 3 (2018): 716-729. https://doi.org/10.1016/j.cell.2018.05.061
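
Several entries above (MAGIC, kNN-smoothing, netSmooth) share one idea: borrow information from similar cells or genes to fill in dropouts. The Python sketch below illustrates the diffusion flavor of that idea on a toy count matrix. It is not the MAGIC package or its API; the function name `diffusion_impute`, the parameters `k` and `t`, and the Gaussian kernel with a k-th-neighbor bandwidth are assumptions made purely for illustration.

```python
import numpy as np

def diffusion_impute(X, k=5, t=3):
    """Toy diffusion smoothing of a cells x genes matrix (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between cells
    sq = np.sum(X ** 2, axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    # Adaptive bandwidth: distance to the k-th nearest neighbor (index 0 is the cell itself)
    sigma = np.maximum(np.sort(D, axis=1)[:, k], 1e-12)
    # Gaussian affinities, symmetrized
    A = np.exp(-(D / sigma[:, None]) ** 2)
    A = A + A.T
    # Row-normalize into a Markov transition matrix
    M = A / A.sum(axis=1, keepdims=True)
    # Diffuse for t steps and apply to the data
    return np.linalg.matrix_power(M, t) @ X

# Toy data: 30 cells x 8 genes of sparse Poisson counts (assumed, not real data)
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(30, 8)).astype(float)
imputed = diffusion_impute(X, k=5, t=3)
```

Because each imputed value is a weighted average over cells, zeros are replaced by small positive values wherever neighbors express the gene. Larger `t` smooths more aggressively and can erase real differences between cells, which is why the choice of diffusion time (or of `k` in kNN-smoothing) matters.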
## Dimensionality reduction
- [MultiMAP](https://github.com/Teichlab/MultiMAP) dimensionality reduction algorithm. Works with dataset-specific features (does not require features to be shared across datasets, e.g., 20K-gene scRNA-seq and 100K-peak scATAC-seq datasets). Generalizes the UMAP algorithm to data with different dimensions, constructs a nonlinear manifold, constructs a joint graph on the manifold (MultiGraph), cross-entropy minimization to optimize the low-dimensional embedding of the manifold and data. Allows specifying the influence of each dataset on the embedding. Tested on synthetic and experimental data, including spatial transcriptomics datasets, outperforms Seurat, LIGER, iNMF, Conos, GLUER, significantly faster and scalable.
Paper
Jain, M.S., Polanski, K., Conde, C.D. et al. MultiMAP: dimensionality reduction and integration of multimodal data. Genome Biol 22, 346 (2021). https://doi.org/10.1186/s13059-021-02565-y
- [Poincare maps](https://github.com/facebookresearch/PoincareMaps) for two-dimensional scRNA-seq data representation. Preserves local and global distances, hierarchy, the center of the Poincare disk can be considered as a root node. Three-step procedure: 1) k-nearest-neighbor graph, 2) global geodesic distances from the kNN graph, 3) two-dimensional embeddings in the Poincare disk with hyperbolic distances preserve the inferred geodesic distances. Compared with t-SNE, UMAP, PCA, Monocle 2, SAUCIE and several other visualization and lineage detection methods. Two metrics to compare embeddings, Qlocal and Qglobal. References to several public datasets used for reanalysis.
Paper
Klimovskaia, Anna, David Lopez-Paz, Léon Bottou, and Maximilian Nickel. "Poincaré Maps for Analyzing Complex Hierarchies in Single-Cell Data" https://doi.org/10.1038/s41467-020-16822-4 Nature Communications 11, no. 1 (December 2020)
- [scHPF](https://github.com/simslab/schpf) - single-cell hierarchical Poisson Factorization for discovering patterns of gene expressions and cells. A Bayesian factorization method, does not require normalization, explicitly models sparsity across cells and genes. Compared with PCA, NMF, FA, ZIFA, ZINB-WaVE on three datasets, it better captures statistical and biological properties of scRNA-seq data. Python implementation.
Paper
Levitin, Hanna Mendes, Jinzhou Yuan, Yim Ling Cheng, Francisco JR Ruiz, Erin C Bush, Jeffrey N Bruce, Peter Canoll, et al. "De Novo Gene Signature Identification from Single‐cell RNA‐seq with Hierarchical Poisson Factorization" https://doi.org/10.15252/msb.2018855 Molecular Systems Biology 15, no. 2 (February 2019)
- [SAUCIE](https://github.com/KrishnaswamyLab/SAUCIE) - deep neural network with regularization on layers to improve interpretability. Denoising, batch removal, imputation, visualization of low-dimensional representation. Extensive comparison on simulated and real data.
Paper
Amodio, Matthew, David van Dijk, Krishnan Srinivasan, William S Chen, Hussein Mohsen, Kevin R Moon, Allison Campbell, et al. "Exploring Single-Cell Data with Deep Multitasking Neural Networks" https://doi.org/10.1101/237065 August 27, 2018.
- [UMAP](http://github.com/lmcinnes/umap) - Uniform Manifold Approximation and Projection, dimensionality reduction using machine learning. Detailed statistical framework. Compared with t-SNE, better preserves global structure. [R implementation](https://github.com/jlmelville/uwot).
Paper
McInnes, Leland, and John Healy. "UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction" http://arxiv.org/abs/1802.03426 ArXiv:1802.03426 [Cs, Stat], February 9, 2018
- [CIDR](https://github.com/VCCRI/CIDR) - Clustering through Imputation and Dimensionality Reduction. Impute dropouts. Explicitly deconvolve Euclidean distance into distance driven by complete, partially complete, and dropout pairs. Principal Coordinate Analysis.
Paper
Lin, Peijie, Michael Troup, and Joshua W. K. Ho. "CIDR: Ultrafast and Accurate Clustering through Imputation for Single-Cell RNA-Seq Data" https://doi.org/10.1186/s13059-017-1188-0 Genome Biology 18, no. 1 (December 2017).
- [VASC](https://github.com/wang-research/VASC) - deep variational autoencoder for scRNA-seq data for dimensionality reduction and visualization. Tested on twenty datasets vs PCA, tSNE, ZIFA, and SIMLR. Four metrics to assess clustering performance: NMI (normalized mutual information score), ARI (adjusted rand index), HOM (homogeneity) and COM (completeness). No filtering, only log transformation. Keras implementation. [Datasets](https://hemberg-lab.github.io/scRNA.seq.datasets/).
Paper
Wang, Dongfang, and Jin Gu. "VASC: Dimension Reduction and Visualization of Single Cell RNA Sequencing Data by Deep Variational Autoencoder" https://doi.org/10.1101/199315 October 6, 2017.
- [ZINB-WAVE](https://bioconductor.org/packages/zinbwave/) - Zero-inflated negative binomial model for normalization, batch removal, and dimensionality reduction. Extends the RUV model with more careful definition of "unwanted" variation as it may be biological. Good statistical derivations in Methods. Refs to real and simulated scRNA-seq datasets.
Paper
Risso, Davide, Fanny Perraudeau, Svetlana Gribkova, Sandrine Dudoit, and Jean-Philippe Vert. "ZINB-WaVE: A General and Flexible Method for Signal Extraction from Single-Cell RNA-Seq Data" https://doi.org/10.1101/125112 BioRxiv, January 1, 2017.
- [RobustAutoencoder](https://github.com/zc8340311/RobustAutoencoder) - Autoencoder and robust PCA for gene expression representation, robust to outliers. Main idea - split the input data X into two parts, L (reconstructed data) and S (outliers and noise). Grouped "l2,1" norm - an l2 regularizer within a group and then an l1 regularizer between groups. Iterative procedure to obtain L and S. TensorFlow implementation.
Paper
Zhou, Chong, and Randy C. Paffenroth. "Anomaly Detection with Robust Deep Autoencoders" https://doi.org/10.1145/3097983.3098052 In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’17, 665–74. Halifax, NS, Canada: ACM Press, 2017.
- [ZIFA](https://github.com/epierson9/ZIFA) - Zero-inflated dimensionality reduction algorithm for single-cell data. Models dropout rate as double exponential, gives less weight to these counts. EM algorithm that incorporates imputation step for the expected gene expression level of drop-outs.
Paper
Pierson, Emma, and Christopher Yau. "ZIFA: Dimensionality Reduction for Zero-Inflated Single-Cell Gene Expression Analysis" https://doi.org/10.1186/s13059-015-0805-z Genome Biology 16 (November 2, 2015)
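
To make the zero-inflation idea behind entries such as ZIFA concrete, the sketch below simulates data from a factor model in which low expression values are more likely to be dropped. It assumes the dropout probability has the form exp(-λx²), which is one reading of the "double exponential" dropout rate mentioned above. The function name `simulate_zifa_like` and all parameter values are hypothetical; this simulates a generative model only and is not the ZIFA fitting algorithm.

```python
import numpy as np

def simulate_zifa_like(n_cells=500, n_genes=20, n_latent=2, lam=0.5, seed=0):
    """Simulate zero-inflated factor-model data (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    # Low-dimensional latent coordinates for each cell
    Z = rng.normal(size=(n_cells, n_latent))
    # Factor loadings and per-gene mean expression (log scale, assumed values)
    A = rng.normal(scale=0.3, size=(n_latent, n_genes))
    mu = np.linspace(0.5, 4.0, n_genes)
    # Expression before dropout: linear factor model plus Gaussian noise
    X = Z @ A + mu + rng.normal(scale=0.2, size=(n_cells, n_genes))
    # Assumed dropout probability: decays with squared expression
    p_drop = np.exp(-lam * X ** 2)
    dropped = rng.random(size=X.shape) < p_drop
    # Observed matrix: dropped entries are set to zero
    Y = np.where(dropped, 0.0, X)
    return Y, X, dropped

Y, X_true, dropped = simulate_zifa_like()
zero_frac = dropped.mean(axis=0)  # per-gene dropout fraction, ordered by increasing mean
```

In this toy setup the genes with the lowest mean lose most of their values while highly expressed genes are almost never dropped. Zero-inflated models account for exactly this dependence, so that zeros are not treated as true absence of expression.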
## Clustering
- [opt-SNE](https://github.com/omiq-ai/Multicore-opt-SNE) - data-driven automated parameter selection for t-SNE clustering. Utilizes Kullback-Leibler divergence evaluation in real time to tailor the early exaggeration and overall number of gradient descent iterations. Evaluated on flow cytometry data. C++/Python implementation.
Paper
Belkina, Anna C., Christopher O. Ciccolella, Rina Anno, Richard Halpert, Josef Spidlen, and Jennifer E. Snyder-Cappione. “Automated Optimized Parameters for T-Distributed Stochastic Neighbor Embedding Improve Visualization and Analysis of Large Datasets.” Nature Communications 10, no. 1 (December 2019): 5415. https://doi.org/10.1038/s41467-019-13055-y.
- [scSSA](https://github.com/houtongshuai123/scSSA/) - scRNA-seq clustering based on autoencoder for dimensionality reduction/denoising (improves performance), FastICA to make the data 2D, Gaussian mixture clustering. Outperforms Seurat, CIDR, and other methods on datasets from [Hemberg Lab](https://hemberg-lab.github.io/scRNA.seq.datasets/).
Paper
Zhao, Jian-Ping, Tong-Shuai Hou, Yansen Su, and Chun-Hou Zheng. “ScSSA: A Clustering Method for Single Cell RNA-Seq Data Based on Semi-Supervised Autoencoder.” Methods 208 (December 2022): 66–74. https://doi.org/10.1016/j.ymeth.2022.10.006.
- [Challenges in scRNA-seq clustering](https://www.rna-seqblog.com/challenges-in-unsupervised-clustering-of-single-cell-rna-seq-data/). Clustering strategies (dimensionality reduction, k-means, agglomerative/divisive hierarchical clustering, discrete vs. continuous clustering). Table 1 - summary of 15 clustering methods.
Paper
Kiselev, Vladimir Yu, Tallulah S. Andrews, and Martin Hemberg. "Challenges in Unsupervised Clustering of Single-Cell RNA-Seq Data" https://doi.org/10.1038/s41576-018-0088-9 Nature Reviews Genetics, January 7, 2019.
- [Recommendations to properly use t-SNE on large omics datasets](https://github.com/berenslab/rna-seq-tsne) (scRNA-seq in particular) to preserve global geometry. Overview of t-SNE, PCA, MDS, UMAP, their similarities, differences, strengths and weaknesses. PCA initialization (first two components are OK), a high learning rate of n/12, and multi-scale similarity kernels. For very large data, increase exaggeration. Strategies to align new points on an existing t-SNE plot, aligning two t-SNE visualizations. Extremely fast implementation is [FIt-SNE](https://github.com/KlugerLab/FIt-SNE). [Code to illustrate the use of t-SNE](https://github.com/berenslab/rna-seq-tsne).
Paper
Kobak, Dmitry, and Philipp Berens. "The Art of Using T-SNE for Single-Cell Transcriptomics" https://doi.org/10.1038/s41467-019-13056-x Nature Communications 10, no. 1 (December 2019)
- [BAMM-SC](https://github.com/CHPGenetics/BAMMSC) - scRNA-seq clustering. A Bayesian hierarchical Dirichlet multinomial mixture model, accounts for batch effect, operates on raw counts. Outperforms K-means, TSCAN, Seurat corrected for batch using MNN or CCA in simulated and experimental settings.
Paper
Sun, Zhe, Li Chen, Hongyi Xin, Yale Jiang, Qianhui Huang, Anthony R. Cillo, Tracy Tabib, et al. "A Bayesian Mixture Model for Clustering Droplet-Based Single-Cell Transcriptomic Data from Population Studies" https://doi.org/10.1038/s41467-019-09639-3 Nature Communications 10, no. 1 (December 2019)
- [Spectrum](https://cran.r-project.org/web/packages/Spectrum/) - a spectral clustering method for single- or multi-omics datasets. Self-tuning kernel that adapts to local density of the graph. Tensor product graph data integration method. Implementation of fast spectral clustering method (single dataset only). Finds optimal number of clusters using eigenvector distribution analysis. References to previous methods. Excellent methods description. Compared with M3C, CLEST, PINSplus, SNF, iClusterPlus, CIMLR, MUDAN. [GitHub](https://github.com/crj32/spectrum_manuscript).
Paper
John, Christopher R., David Watson, Michael R. Barnes, Costantino Pitzalis, and Myles J. Lewis. "Spectrum: Fast Density-Aware Spectral Clustering for Single and Multi-Omic Data" https://doi.org/10.1093/bioinformatics/btz704 Bioinformatics (Oxford, England), September 10, 2019
- [SAUCIE](#saucie) - a regularized autoencoder for scRNA-seq data denoising, batch correction, low-dimensional representation and clustering.
- [PanoView](https://github.com/mhu10/scPanoView) - scRNA-seq iterative clustering in an evolving 3D PCA space, Ordering Local Maximum by Convex hull (OLMC) to identify clusters of varying density. PCA on most variable genes, finding the optimal largest cluster within the first 3 PCs, repeating PCA for the remaining cells, etc. Tested on multiple simulated and experimental scRNA-seq datasets, compared with 9 methods, the Adjusted Rand Index as performance metric.
Paper
Hu, Ming-Wen, Dong Won Kim, Sheng Liu, Donald J. Zack, Seth Blackshaw, and Jiang Qian. "PanoView: An Iterative Clustering Method for Single-Cell RNA Sequencing Data" https://doi.org/10.1371/journal.pcbi.1007040 Edited by Qing Nie. PLOS Computational Biology 15, no. 8 (August 30, 2019)
- [FIt-SNE](https://github.com/KlugerLab/FIt-SNE) - accelerated version of t-SNE clustering for visualizing thousands/millions of cells. [t-SNE-Heatmaps](https://github.com/KlugerLab/t-SNE-Heatmaps) - discretized t-SNE clustering representation as a heatmap. [Detailed methods](https://gauss.math.yale.edu/~gcl22/blog/numerics/low-rank/t-sne/2018/01/11/low-rank-kernels.html).
Paper
Linderman, George C., Manas Rachh, Jeremy G. Hoskins, Stefan Steinerberger, and Yuval Kluger. "Fast Interpolation-Based t-SNE for Improved Visualization of Single-Cell RNA-Seq Data" https://doi.org/10.1038/s41592-018-0308-4 Nature Methods, February 11, 2019.
- [TooManyCells](https://github.com/GregorySchwartz/tooManyCellsR) - divisive hierarchical spectral clustering of scRNA-seq data. Uses truncated singular value decomposition to bipartition the cells. Newman-Girvan modularity Q to assess whether bipartition is significant or should be stopped. [BirchBeer](https://github.com/faryabiLab/birch-beer) visualization. Outperforms Phenograph, Seurat, Cellranger, Monocle, the latter is second in performance. Excels for rare populations. Normalization marginally affects performance.
Paper
Schwartz, Gregory W, Jelena Petrovic, Maria Fasolino, Yeqiao Zhou, Stanley Cai, Lanwei Xu, Warren S Pear, Golnaz Vahedi, and Robert B Faryabi. "TooManyCells Identifies and Visualizes Relationships of Single-Cell Clades" https://doi.org/10.1101/519660 BioRxiv, January 13, 2019.
- [scClustViz](https://github.com/BaderLab/scClustViz) - assessment of scRNA-seq clustering using differential expression (Wilcoxon test) as a guide. Testing for two differences: difference in detection rate (dDR) and log2 gene expression ratio (logGER). Two hypothesis tests: one cluster vs. all, each cluster vs. another cluster. Accepts SingleCellExperiment and Seurat objects (log2-transformed data), needs a data frame with different cluster assignments. Analysis within R, save as RData, visualize results in R Shiny app.
Paper
Innes, BT, and GD Bader. "ScClustViz - Single-Cell RNAseq Cluster Assessment and Visualization" https://doi.org/10.12688/f1000research.16198.2 F1000Research 7, no. 1522 (2019).
- [SHARP](https://github.com/shibiaowan/SHARP) - an ensemble random projection (RP)-based algorithm. Scalable, allows for clustering of 1.3 million cells (splitting the matrix into blocks, RP on each, then weighted ensemble clustering). Outperforms SC3, SIMLR, hierarchical clustering, tSNE + k-means. Tested on 17 public datasets. Robust to dropouts. Compatible with (UMI-based) counts (per million), FPKM/RPKM, TPM. Methods detailing four algorithm steps (data partition, RP, weighted ensemble clustering, similarity-based meta-clustering).
Paper
Wan, Shibiao, Junil Kim, and Kyoung Jae Won. "SHARP: Single-Cell RNA-Seq Hyper-Fast and Accurate Processing via Ensemble Random Projection" https://doi.org/10.1101/461640 Preprint. Bioinformatics, November 4, 2018
- [Performance evaluation of 14 scRNA-seq clustering algorithms](https://bioconductor.org/packages/DuoClustering2018/) using nine experimental and three simulated datasets. SC3 and Seurat perform best overall. Normalized Shannon entropy, adjusted Rand index for performance evaluation. Ensemble clustering doesn't help. [R scripts](https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison) and a [data package](https://bioconductor.org/packages/DuoClustering2018/) for clustering benchmarking with preprocessed and experimental scRNA-seq datasets.
Paper
Duò, Angelo, Mark D. Robinson, and Charlotte Soneson. "A Systematic Performance Evaluation of Clustering Methods for Single-Cell RNA-Seq Data" https://doi.org/10.12688/f1000research.15666.2 F1000Research 7 (September 10, 2018)
- [clusterExperiment](https://bioconductor.org/packages/clusterExperiment/) R package for scRNA-seq data visualization. Resampling-based Sequential Ensemble Clustering (RSEC) method. clusterMany - makeConsensus - makeDendrogram - mergeClusters pipeline. Biomarker detection by differential expression analysis between clusters.
Paper
Risso, Davide, Liam Purvis, Russell B. Fletcher, Diya Das, John Ngai, Sandrine Dudoit, and Elizabeth Purdom. "ClusterExperiment and RSEC: A Bioconductor Package and Framework for Clustering of Single-Cell and Other Large Gene Expression Datasets" https://doi.org/10.1371/journal.pcbi.1006378 Edited by Aaron E. Darling. PLOS Computational Biology 14, no. 9 (September 4, 2018)
- [PHATE](https://github.com/KrishnaswamyLab/PHATE) (Potential of Heat-diffusion for Affinity-based Transition Embedding) - low-dimensional embedding, denoising, and visualization, applicable to scRNA-seq, microbiome, SNP, Hi-C (as affinity matrices) and other data. Preserves biological structures and branching better than PCA, tSNE, diffusion maps. Robust to noise and subsampling. Detailed methods description and graphical representation of the algorithm. [Tweetorial](https://twitter.com/KrishnaswamyLab/status/1201935823056199680?s=20).
Paper
Moon, Kevin R., David van Dijk, Zheng Wang, Scott Gigante, Daniel Burkhardt, William Chen, Antonia van den Elzen, et al. "Visualizing Transitions and Structure for Biological Data Exploration" https://doi.org/10.1101/120378 June 28, 2018.
- [BISCUIT](https://github.com/sandhya212/BISCUIT_SingleCell_IMM_ICML_2016) - a Bayesian clustering and normalization method.
Paper
Azizi, Elham, Ambrose J. Carr, George Plitas, Andrew E. Cornish, Catherine Konopacki, Sandhya Prabhakaran, Juozas Nainys, et al. "Single-Cell Map of Diverse Immune Phenotypes in the Breast Tumor Microenvironment" https://doi.org/10.1016/j.cell.2018.05.060 Cell, June 2018.
- [scVAE](https://github.com/chgroenbech/scVAE) - Variational autoencoder frameworks for modelling raw RNA-seq counts, denoising the data to improve biologically plausible grouping in scRNA-seq data. Improvement in Rand index.
Paper
Grønbech, Christopher Heje, Maximillian Fornitz Vording, Pascal N Timshel, Casper Kaae Sønderby, Tune Hannes Pers, and Ole Winther. "ScVAE: Variational Auto-Encoders for Single-Cell Gene Expression Data" https://doi.org/10.1101/318295 May 16, 2018.
- [Conos](https://github.com/hms-dbmi/conos) - clustering of scRNA-seq samples by joint graph construction. Seurat or pagoda2 for data preprocessing, selection of hypervariable genes, initial clustering (KNN, or dimensionality reduction), then joint clustering. R package.
Paper
Barkas, Nikolas, Viktor Petukhov, Daria Nikolaeva, Yaroslav Lozinsky, Samuel Demharter, Konstantin Khodosevich, and Peter V Kharchenko. “Wiring Together Large Single-Cell RNA-Seq Sample Collections.” BioRxiv, January 1, 2018.
- [MetaCell](https://bitbucket.org/tanaylab/metacell/src/default/) - partitioning scRNA-seq data into metacells - disjoint and homogeneous/compact groups of cells exhibiting only sampling variance. Most variable genes are used to build a cell-to-cell similarity matrix (PCC), which is converted to a kNN similarity graph that is partitioned using bootstrapping to obtain subgraphs. Tested on several [10X datasets](https://support.10xgenomics.com/single-cell-gene-expression/datasets).
Paper
Baran, Yael, Arnau Sebe-Pedros, Yaniv Lubling, Amir Giladi, Elad Chomsky, Zohar Meir, Michael Hoichman, Aviezer Lifshitz, and Amos Tanay. "MetaCell: Analysis of Single Cell RNA-Seq Data Using k-NN Graph Partitions" https://doi.org/10.1101/437665 BioRxiv, January 1, 2018
- [SIMLR](https://github.com/BatzoglouLabSU/SIMLR) - scRNA-seq dimensionality reduction, clustering, and visualization based on multiple kernel-learned distance metric. Comparison with PCA, t-SNE, ZIFA. Seven datasets. R and Matlab implementation.
Paper
Wang, Bo, Junjie Zhu, Emma Pierson, Daniele Ramazzotti, and Serafim Batzoglou. "Visualization and Analysis of Single-Cell RNA-Seq Data by Kernel-Based Similarity Learning" https://doi.org/10.1101/460246 Nature Methods 14, no. 4 (April 2017)
- [SC3](https://bioconductor.org/packages/SC3/) - single-cell clustering. Multiple clustering iterations, consensus matrix, then hierarchical clustering. Benchmarking against other methods.
Paper
Kiselev, Vladimir Yu, Kristina Kirschner, Michael T Schaub, Tallulah Andrews, Andrew Yiu, Tamir Chandra, Kedar N Natarajan, et al. "SC3: Consensus Clustering of Single-Cell RNA-Seq Data" https://doi.org/10.1038/nmeth.4236 Nature Methods 14, no. 5 (March 27, 2017)
- [destiny](https://bioconductor.org/packages/destiny/) - R package for diffusion maps-based visualization of single-cell data.
Paper
Haghverdi, Laleh, Florian Buettner, and Fabian J. Theis. "Diffusion Maps for High-Dimensional Single-Cell Analysis of Differentiation Data" https://doi.org/10.1093/bioinformatics/btv325 Bioinformatics 31, no. 18 (September 15, 2015) - Introduction of other methods, Table 1 compares them. Methods details. Performance is similar to PCA and tSNE.
- [PhenoGraph](https://github.com/jacoblevine/PhenoGraph) - discovers subpopulations in scRNA-seq data. High-dimensional space is modeled as a nearest-neighbor graph, then the Louvain community detection algorithm. No assumptions about the size, number, or form of subpopulations.
Paper
Levine, Jacob H., Erin F. Simonds, Sean C. Bendall, Kara L. Davis, El-ad D. Amir, Michelle D. Tadmor, Oren Litvin, et al. "Data-Driven Phenotypic Dissection of AML Reveals Progenitor-like Cells That Correlate with Prognosis" https://doi.org/10.1016/j.cell.2015.05.047 Cell 162, no. 1 (July 2015)
- [SNN-Cliq](http://bioinfo.uncc.edu/SNNCliq/) - shared nearest neighbor clustering of scRNA-seq data, represented as a graph. Similarity between two data points based on the ranking of their shared neighborhood. Automatically determines the number of clusters, accommodates different densities and shapes. Compared with K-means and DBSCAN using Purity, Adjusted Rand Index, F1-score. Matlab, Python, R implementation.
Paper
Xu, Chen, and Zhengchang Su. "Identification of Cell Types from Single-Cell Transcriptomes Using a Novel Clustering Method" https://doi.org/10.1093/bioinformatics/btv088 Bioinformatics (Oxford, England) 31, no. 12 (June 15, 2015)
- [viSNE](https://cran.r-project.org/web/packages/Rtsne/) - the Barnes-Hut implementation of the t-SNE algorithm, improved and tailored for the analysis of single-cell data. [Details of tSNE](https://www.denovosoftware.com/site/manual/visne.htm), and the [Rtsne R package](https://cran.r-project.org/web/packages/Rtsne/).
Paper
Amir, El-ad David, Kara L Davis, Michelle D Tadmor, Erin F Simonds, Jacob H Levine, Sean C Bendall, Daniel K Shenfeld, Smita Krishnaswamy, Garry P Nolan, and Dana Pe’er. "ViSNE Enables Visualization of High Dimensional Single-Cell Data and Reveals Phenotypic Heterogeneity of Leukemia" https://doi.org/10.1038/nbt.2594 Nature Biotechnology 31, no. 6 (June 2013)
- [celda](https://github.com/compbiomed/celda) - CEllular Latent Dirichlet Allocation. Simultaneous clustering of cells into subpopulations and genes into transcriptional states. [Tutorials](https://compbiomed.github.io/celda_tutorials/). No preprint yet.
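
Many of the clustering entries above report the Adjusted Rand Index (ARI) as their performance metric. The sketch below computes ARI from a contingency table using only NumPy, as a minimal illustration of what the number measures. The function name `adjusted_rand_index` and the toy label vectors are assumptions made here; established implementations exist in common libraries and should be preferred in practice. The sketch assumes at least two cells.

```python
import numpy as np

def adjusted_rand_index(labels_true, labels_pred):
    """ARI between two partitions of the same cells (illustrative helper)."""
    # Map arbitrary labels to integer codes 0..K-1
    _, t = np.unique(np.ravel(labels_true), return_inverse=True)
    _, p = np.unique(np.ravel(labels_pred), return_inverse=True)
    # Contingency table: counts of cells per (true cluster, predicted cluster)
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)

    def pairs(x):
        # Number of unordered pairs that can be drawn from x items
        return x * (x - 1) / 2.0

    sum_cells = pairs(table).sum()             # pairs grouped together in both partitions
    sum_rows = pairs(table.sum(axis=1)).sum()  # pairs grouped together in the true partition
    sum_cols = pairs(table.sum(axis=0)).sum()  # pairs grouped together in the predicted one
    expected = sum_rows * sum_cols / pairs(table.sum())
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        # Degenerate case, e.g. both partitions put all cells in one cluster
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

# Same grouping with swapped label names: perfect agreement
ari_same = adjusted_rand_index([0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0])
# Predicted clusters cut across the true ones: worse than chance
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

ARI ignores the label names and only asks whether pairs of cells are grouped together consistently, so the relabeled example scores 1.0 while the crossed example scores below zero.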
### Time, trajectory inference
- [CellRank](https://cellrank.readthedocs.io/en/stable/) - single-cell fate mapping combining trajectory inference and RNA velocity directionality (scVelo), accounting for the stochastic nature of fate decisions and uncertainty in velocity vectors. Velocity alone is insufficient. Detects the initial, terminal and intermediate cell states and computes a global map of fate potentials. State transitions are modeled using a Markov chain. Stability index to automatically identify terminal states. Outperforms Palantir, STEMNET and FateID in diverse scenarios (development, regeneration, reprogramming, disease), fast, less memory, scalable. Input - (imputed) gene count matrix and velocity matrix (any vector field). Python, installable in Conda environment, Jupyter notebooks. [Tutorial](https://cellrank.readthedocs.io/en/stable/cellrank_basics.html), [Code to reproduce the results](https://github.com/theislab/cellrank_reproducibility). [Tweet](https://twitter.com/dana_peer/status/1481658478296907780?s=20) by Dana Pe'er.
Paper
Lange, Marius, Volker Bergen, Michal Klein, Manu Setty, Bernhard Reuter, Mostafa Bakhti, Heiko Lickert, et al. “CellRank for Directed Single-Cell Fate Mapping.” Nature Methods, January 13, 2022. https://doi.org/10.1038/s41592-021-01346-6
- [STREAM](https://stream.pinellolab.partners.org/) - trajectory analysis in both single-cell transcriptomic (scRNA-seq) and epigenomic (scATAC-seq) data. Ability to map new cells on reference trajectories. Exploration of cell type composition, relevant genes, TF binding dynamics. Modifi