---
output: github_document
---

```{r, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/",
  dpi = 300
)
```

```{r, echo=FALSE, message=FALSE}
st <- data.table::fread("https://bioconductor.org/packages/stats/bioc/clustifyr/clustifyr_stats.tab", data.table = FALSE, verbose = FALSE)
st_all <- dplyr::filter(st, Month == "all")
cl <- as.numeric(data.table::fread("https://raw.githubusercontent.com/raysinensis/clone_counts_public/main/clustifyr_total.txt", verbose = FALSE))
```

# clustifyr

[![R-CMD-check-bioc](https://github.com/rnabioco/clustifyr/actions/workflows/check-bioc.yml/badge.svg)](https://github.com/rnabioco/clustifyr/actions/workflows/check-bioc.yml)
[![Codecov test coverage](https://codecov.io/gh/rnabioco/clustifyr/branch/devel/graph/badge.svg)](https://app.codecov.io/gh/rnabioco/clustifyr?branch=devel)
[![platforms](https://bioconductor.org/shields/availability/release/clustifyr.svg)](https://bioconductor.org/packages/release/bioc/html/clustifyr.html)
[![bioc](https://bioconductor.org/shields/years-in-bioc/clustifyr.svg)](https://bioconductor.org/packages/release/bioc/html/clustifyr.html)
[![#downloads](`r paste0("https://img.shields.io/badge/%23%20downloads-", sum(st_all[[4]]) + cl, "-brightgreen")`)](https://bioconductor.org/packages/stats/bioc/clustifyr/clustifyr_stats.tab)

clustifyr classifies cells and clusters in single-cell RNA sequencing experiments using reference bulk RNA-seq data sets, sorted microarray expression data, single-cell gene signatures, or lists of marker genes.

## Installation

Install the Bioconductor version with:

``` r
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

BiocManager::install("clustifyr")
```

Install the development version with:

``` r
BiocManager::install("rnabioco/clustifyr")
```

## Example usage

In this example we use the following built-in input data:

- an expression matrix of single cell RNA-seq data (`pbmc_matrix_small`)
- a metadata data.frame (`pbmc_meta`), with cluster assignments stored in the `"classified"` column
- a vector of variable genes (`pbmc_vargenes`)
- a matrix of mean normalized scRNA-seq UMI counts by cell type (`cbmc_ref`)

We then calculate correlation coefficients and plot them on a pre-calculated projection (stored in `pbmc_meta`).

```{r readme_example, warning=FALSE, message=FALSE}
library(clustifyr)

# calculate correlation
res <- clustify(
  input = pbmc_matrix_small,
  metadata = pbmc_meta$classified,
  ref_mat = cbmc_ref,
  query_genes = pbmc_vargenes
)

# print assignments
cor_to_call(res)

# plot assignments on a projection
plot_best_call(
  cor_mat = res,
  metadata = pbmc_meta,
  cluster_col = "classified"
)
```
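To make the idea behind the example above more concrete, here is a minimal base R sketch of correlation-based assignment. It is not clustifyr's implementation: it assumes per-cluster average expression has already been computed, assumes a rank-based (Spearman) correlation, and uses invented toy values with hypothetical gene and cell type names.

```r
# Toy per-cluster average expression (genes x clusters); values are invented
cluster_avg <- cbind(
  c1 = c(8.7, 8.0, 1.0, 0.7),
  c2 = c(0.7, 1.3, 8.0, 8.7)
)
rownames(cluster_avg) <- c("geneA", "geneB", "geneC", "geneD")

# Toy reference (genes x cell types); names are hypothetical
ref <- cbind(
  Tcell = c(10, 8, 1, 0),
  Bcell = c(0, 2, 9, 10)
)
rownames(ref) <- rownames(cluster_avg)

# Restrict both matrices to a shared gene set (the role query_genes plays above)
genes <- intersect(rownames(cluster_avg), rownames(ref))

# Correlate every cluster with every reference cell type (clusters x cell types)
cor_mat <- cor(cluster_avg[genes, ], ref[genes, ], method = "spearman")

# Call each cluster as its best-correlated reference cell type
calls <- colnames(cor_mat)[apply(cor_mat, 1, which.max)]
names(calls) <- rownames(cor_mat)
calls
```

The resulting `cor_mat` has one row per cluster and one column per reference cell type, which is the same general shape that `cor_to_call()` consumes in the example above.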

`clustify()` can also take a clustered `SingleCellExperiment` or `Seurat` object (Seurat v2 through v5) and assign identities to its clusters.

```{r example_seurat, warning=FALSE, message=FALSE}
# for SingleCellExperiment
sce_small <- sce_pbmc()
clustify(
  input = sce_small, # an SCE object
  ref_mat = cbmc_ref, # matrix of RNA-seq expression data for each cell type
  cluster_col = "cell_type", # name of column in colData containing cell clusters
  obj_out = TRUE # output SCE object with cell type inserted as "type" column
)

# for Seurat
library(Seurat)
s_small <- so_pbmc()
clustify(
  input = s_small,
  cluster_col = "RNA_snn_res.0.5",
  ref_mat = cbmc_ref,
  seurat_out = TRUE
)

# alternative output: a vector of calls in the order of the metadata rows,
# which can be inserted into metadata data.frames or used in other workflows
clustify(
  input = s_small,
  cluster_col = "RNA_snn_res.0.5",
  ref_mat = cbmc_ref,
  vec_out = TRUE
)[1:10]
```
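As a small illustration of how a per-cell vector of calls, like the one returned with `vec_out = TRUE`, might be attached to a metadata table, the base R sketch below expands per-cluster labels to one value per cell. The metadata, cluster IDs, and labels are all invented, and the lookup is an assumption about a typical workflow rather than a description of clustifyr internals.

```r
# Toy metadata: one row per cell, with a hypothetical "cluster" column
meta <- data.frame(
  cluster = c("0", "1", "0", "2"),
  row.names = paste0("cell", 1:4),
  stringsAsFactors = FALSE
)

# Hypothetical per-cluster calls, named by cluster ID
cluster_calls <- c("0" = "CD4 T", "1" = "B", "2" = "Mk")

# Expand to one call per cell, in the row order of the metadata
cell_calls <- unname(cluster_calls[meta$cluster])

# Store the calls as a new metadata column
meta$type <- cell_calls
meta
```

Because the vector follows the row order of the metadata, it can be assigned as a column without any further matching.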

New reference matrices can also be built directly from `SingleCellExperiment` and `Seurat` objects. Other scRNA-seq experiment object types are supported as well.

```{r example_ref_matrix}
# make reference from SingleCellExperiment objects
sce_small <- sce_pbmc()
sce_ref <- object_ref(
  input = sce_small, # SCE object
  cluster_col = "cell_type" # name of column in colData containing cell identities
)

# make reference from Seurat objects
s_small <- so_pbmc()
s_ref <- seurat_ref(
  seurat_object = s_small,
  cluster_col = "RNA_snn_res.0.5"
)

head(s_ref)
```
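Conceptually, a reference matrix of this kind holds one expression profile per cell identity. The base R sketch below shows one plausible way to build such a matrix by averaging cells that share a label. It is an illustration under stated assumptions, not the code behind `object_ref()` or `seurat_ref()`: the expression values, the `cell_type` column, and the plain arithmetic mean are all choices made for this example.

```r
# Toy expression matrix (genes x cells); values are invented
expr <- matrix(
  c(4, 6, 0, 2,
    1, 3, 8, 6),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), paste0("cell", 1:4))
)

# Toy metadata: one row per cell, with a hypothetical "cell_type" column
meta <- data.frame(
  cell_type = c("T", "T", "B", "B"),
  row.names = colnames(expr)
)

# Group cell names by the chosen metadata column
cells_by_type <- split(rownames(meta), meta[["cell_type"]])

# Average each gene over the cells of every type (genes x cell types)
ref_sketch <- vapply(
  cells_by_type,
  function(cells) rowMeans(expr[, cells, drop = FALSE]),
  FUN.VALUE = numeric(nrow(expr))
)
ref_sketch
```

The result has genes as rows and cell identities as columns, matching the layout of `cbmc_ref` described earlier.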

`clustify_lists()` assigns identities to a matrix, `SingleCellExperiment`, or `Seurat` object based on marker gene lists.

```{r example_seurat3, warning=FALSE, message=FALSE}
clustify_lists(
  input = pbmc_matrix_small,
  metadata = pbmc_meta,
  cluster_col = "classified",
  marker = pbmc_markers,
  marker_inmatrix = FALSE
)

clustify_lists(
  input = s_small,
  marker = pbmc_markers,
  marker_inmatrix = FALSE,
  cluster_col = "RNA_snn_res.0.5",
  seurat_out = TRUE
)
```
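For intuition about marker-based assignment, the sketch below scores query clusters against marker gene lists using a simple Jaccard index in base R. This is a stand-in technique chosen for illustration and should not be read as the scoring that `clustify_lists()` performs. The cluster gene sets and marker lists are hypothetical.

```r
# Hypothetical genes detected as enriched in each query cluster
cluster_genes <- list(
  c1 = c("CD3D", "CD3E", "IL7R", "LTB"),
  c2 = c("MS4A1", "CD79A", "CD79B", "LTB")
)

# Hypothetical marker lists for known cell types
markers <- list(
  Tcell = c("CD3D", "CD3E", "IL7R"),
  Bcell = c("MS4A1", "CD79A", "CD79B")
)

# Jaccard index: size of the intersection over size of the union
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Score every cluster against every marker list (clusters x cell types)
scores <- sapply(markers, function(m) {
  sapply(cluster_genes, function(g) jaccard(g, m))
})

# Pick the best-scoring cell type for each cluster
best <- colnames(scores)[apply(scores, 1, which.max)]
names(best) <- rownames(scores)
best
```

Here each cluster shares three of its four genes with one marker list and none with the other, so the scores separate cleanly. Real data rarely does, which is one reason to compare several metrics before settling on calls.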

## Additional resources

* [Script](https://github.com/rnabioco/clustifyrdata/blob/master/inst/run_clustifyr.R) for benchmarking, compatible with [`scRNAseq_Benchmark`](https://github.com/tabdelaal/scRNAseq_Benchmark)

* Additional reference data (including Tabula Muris, ImmGen, etc.) are available in the supplemental package [`clustifyrdatahub`](https://github.com/rnabioco/clustifyrdatahub). See also the [list of references](https://rnabioco.github.io/clustifyrdata/articles/download_refs.html) for individual downloads.

* See the [FAQ](https://github.com/rnabioco/clustifyr/wiki/Frequently-asked-questions) for more details.