https://github.com/rnabioco/clustifyr
Infer cell types in scRNA-seq data using bulk RNA-seq or gene sets
assign-identities clusters marker-genes rna-seq single-cell-rna-seq
- Host: GitHub
- URL: https://github.com/rnabioco/clustifyr
- Owner: rnabioco
- License: mit
- Created: 2018-06-21T13:27:55.000Z (almost 8 years ago)
- Default Branch: devel
- Last Pushed: 2025-04-16T20:43:48.000Z (about 1 year ago)
- Last Synced: 2025-10-30T22:44:19.918Z (7 months ago)
- Topics: assign-identities, clusters, marker-genes, rna-seq, single-cell-rna-seq
- Language: R
- Homepage: https://rnabioco.github.io/clustifyr/
- Size: 60.1 MB
- Stars: 123
- Watchers: 9
- Forks: 15
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
---
output: github_document
---
```{r, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    fig.path = "man/figures/",
    dpi = 300
)
```
```{r, echo=FALSE, message=FALSE}
st <- data.table::fread("https://bioconductor.org/packages/stats/bioc/clustifyr/clustifyr_stats.tab", data.table = FALSE, verbose = FALSE)
st_all <- dplyr::filter(st, Month == "all")
cl <- as.numeric(data.table::fread("https://raw.githubusercontent.com/raysinensis/clone_counts_public/main/clustifyr_total.txt", verbose = FALSE))
```
# clustifyr
[Bioconductor check workflow](https://github.com/rnabioco/clustifyr/actions/workflows/check-bioc.yml)
[Code coverage](https://app.codecov.io/gh/rnabioco/clustifyr?branch=devel)
[Bioconductor package page](https://bioconductor.org/packages/release/bioc/html/clustifyr.html)
[Download statistics](https://bioconductor.org/packages/stats/bioc/clustifyr/clustifyr_stats.tab)

clustifyr classifies cells and clusters in single-cell RNA sequencing experiments using reference bulk RNA-seq data sets, sorted microarray expression data, single-cell gene signatures, or lists of marker genes.
## Installation
Install the Bioconductor version with:
``` r
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("clustifyr")
```
Install the development version with:
``` r
BiocManager::install("rnabioco/clustifyr")
```
## Example usage
In this example we use the following built-in input data:
- an expression matrix of single cell RNA-seq data (`pbmc_matrix_small`)
- a metadata data.frame (`pbmc_meta`), with cluster assignments stored in the `"classified"` column
- a vector of variable genes (`pbmc_vargenes`)
- a matrix of mean normalized scRNA-seq UMI counts by cell type (`cbmc_ref`)
We then calculate correlation coefficients between each cluster and each reference cell type, and plot the resulting assignments on a pre-calculated projection (stored in `pbmc_meta`).
```{r readme_example, warning = FALSE, message = FALSE}
library(clustifyr)

# calculate correlation
res <- clustify(
    input = pbmc_matrix_small,
    metadata = pbmc_meta$classified,
    ref_mat = cbmc_ref,
    query_genes = pbmc_vargenes
)

# print assignments
cor_to_call(res)

# plot assignments on a projection
plot_best_call(
    cor_mat = res,
    metadata = pbmc_meta,
    cluster_col = "classified"
)
```
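To make the correlation step concrete, here is a minimal base-R sketch on toy data. It is not clustifyr's implementation: the matrices and the names `expr`, `clusters`, `ref`, and `query_genes` are hypothetical, and averaging per cluster followed by Spearman correlation is an assumption made for illustration.

```r
# Toy query data: 5 genes x 6 cells, two clusters of three cells (hypothetical values)
expr <- matrix(
    c(9, 8, 9, 1, 0, 1,
      7, 9, 8, 0, 1, 0,
      0, 1, 0, 8, 9, 7,
      1, 0, 1, 9, 8, 9,
      5, 5, 5, 5, 5, 5),
    nrow = 5, byrow = TRUE,
    dimnames = list(paste0("gene", 1:5), paste0("cell", 1:6))
)
clusters <- c("c1", "c1", "c1", "c2", "c2", "c2")

# Toy reference: mean expression per cell type (hypothetical values)
ref <- matrix(
    c(10, 1,
       8, 0,
       1, 9,
       0, 10),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("gene", 1:4), c("T cell", "B cell"))
)

# Restrict to genes present in the query, the reference, and the gene list
query_genes <- paste0("gene", 1:5)
genes <- Reduce(intersect, list(rownames(expr), rownames(ref), query_genes))

# Average expression per cluster (genes x clusters)
cluster_means <- sapply(
    split(seq_len(ncol(expr)), clusters),
    function(idx) rowMeans(expr[genes, idx, drop = FALSE])
)

# Correlate every cluster with every reference cell type
cor_mat <- cor(cluster_means, ref[genes, , drop = FALSE], method = "spearman")

# Call each cluster by its best-correlated reference cell type
calls <- colnames(cor_mat)[apply(cor_mat, 1, which.max)]
names(calls) <- rownames(cor_mat)
calls
```

In the sketch, `cor_mat` plays the role of the clusters-by-reference matrix that `cor_to_call()` turns into assignments in the example above.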
`clustify()` can also take a clustered `SingleCellExperiment` or `Seurat` object (Seurat v2 through v5) and assign identities to its clusters.
```{r example_seurat, warning = FALSE, message = FALSE}
# for SingleCellExperiment
sce_small <- sce_pbmc()
clustify(
    input = sce_small, # an SCE object
    ref_mat = cbmc_ref, # matrix of RNA-seq expression data for each cell type
    cluster_col = "cell_type", # name of column in colData containing cell clusters
    obj_out = TRUE # output SCE object with cell type inserted as "type" column
)

# for Seurat
library(Seurat)
s_small <- so_pbmc()
clustify(
    input = s_small,
    cluster_col = "RNA_snn_res.0.5",
    ref_mat = cbmc_ref,
    seurat_out = TRUE
)

# New output option: return the calls directly as a vector (in the order of
# the metadata), which can then be inserted into metadata data frames and
# other workflows
clustify(
    input = s_small,
    cluster_col = "RNA_snn_res.0.5",
    ref_mat = cbmc_ref,
    vec_out = TRUE
)[1:10]
```
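As a rough illustration of what a per-cell vector in metadata order looks like, the base-R sketch below expands per-cluster calls to one label per cell and stores them in a metadata data frame. The data frame `meta`, the `cluster_calls` vector, and the cell type labels are hypothetical and not output from clustifyr.

```r
# Hypothetical per-cell metadata with a cluster column
meta <- data.frame(
    cluster = c("0", "1", "0", "2", "1"),
    row.names = paste0("cell", 1:5)
)

# Hypothetical per-cluster calls, named by cluster id
cluster_calls <- c("0" = "CD4 T", "1" = "B", "2" = "NK")

# Look up each cell's cluster to get one call per cell, in metadata order
meta$type <- unname(cluster_calls[as.character(meta$cluster)])
meta
```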
New reference matrices can also be built directly from `SingleCellExperiment` and `Seurat` objects. Other scRNA-seq experiment object types are supported as well.
```{r example_ref_matrix}
# make reference from SingleCellExperiment objects
sce_small <- sce_pbmc()
sce_ref <- object_ref(
    input = sce_small, # SCE object
    cluster_col = "cell_type" # name of column in colData containing cell identities
)

# make reference from Seurat objects
s_small <- so_pbmc()
s_ref <- seurat_ref(
    seurat_object = s_small,
    cluster_col = "RNA_snn_res.0.5"
)
head(s_ref)
```
`clustify_lists()` assigns identities to a matrix, `SingleCellExperiment`, or `Seurat` object based on marker gene lists.
```{r example_seurat3, warning = FALSE, message = FALSE}
# matrix input with a metadata data.frame
clustify_lists(
    input = pbmc_matrix_small,
    metadata = pbmc_meta,
    cluster_col = "classified",
    marker = pbmc_markers,
    marker_inmatrix = FALSE
)

# Seurat object input
clustify_lists(
    input = s_small,
    marker = pbmc_markers,
    marker_inmatrix = FALSE,
    cluster_col = "RNA_snn_res.0.5",
    seurat_out = TRUE
)
```
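The idea behind marker-list assignment can be sketched with a simple set-overlap score. The example below uses the Jaccard index in base R on hypothetical marker lists. It is an illustration of the general approach under those assumptions, not the scoring that `clustify_lists()` performs internally.

```r
# Hypothetical marker genes detected for each query cluster
cluster_markers <- list(
    c1 = c("CD3D", "CD3E", "IL7R", "LTB"),
    c2 = c("MS4A1", "CD79A", "CD79B", "LTB")
)

# Hypothetical reference marker lists, one per cell type
ref_markers <- list(
    "T cell" = c("CD3D", "CD3E", "IL7R"),
    "B cell" = c("MS4A1", "CD79A", "CD79B")
)

# Jaccard index: shared genes divided by all genes in either set
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Score every cluster against every reference list (clusters x cell types)
scores <- sapply(ref_markers, function(ref_set) {
    sapply(cluster_markers, function(cl_set) jaccard(cl_set, ref_set))
})

# Call each cluster by its highest-scoring reference list
calls <- colnames(scores)[apply(scores, 1, which.max)]
names(calls) <- rownames(scores)
calls
```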
## Additional resources
* [Script](https://github.com/rnabioco/clustifyrdata/blob/master/inst/run_clustifyr.R) for benchmarking, compatible with [`scRNAseq_Benchmark`](https://github.com/tabdelaal/scRNAseq_Benchmark)
* Additional reference data (including Tabula Muris, ImmGen, etc.) are available in a supplemental package, [`clustifyrdatahub`](https://github.com/rnabioco/clustifyrdatahub). See also the [list of references](https://rnabioco.github.io/clustifyrdata/articles/download_refs.html) for individual downloads.
* See the [FAQ](https://github.com/rnabioco/clustifyr/wiki/Frequently-asked-questions) for more details.