{"id":32111375,"url":"https://github.com/rnabioco/clustifyr","last_synced_at":"2025-10-30T22:44:32.198Z","repository":{"id":42680927,"uuid":"138173960","full_name":"rnabioco/clustifyr","owner":"rnabioco","description":"Infer cell types in scRNA-seq data using bulk RNA-seq or gene sets","archived":false,"fork":false,"pushed_at":"2025-04-16T20:43:48.000Z","size":63027,"stargazers_count":123,"open_issues_count":0,"forks_count":15,"subscribers_count":9,"default_branch":"devel","last_synced_at":"2025-10-30T22:44:19.918Z","etag":null,"topics":["assign-identities","clusters","marker-genes","rna-seq","single-cell-rna-seq"],"latest_commit_sha":null,"homepage":"https://rnabioco.github.io/clustifyr/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rnabioco.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS","contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2018-06-21T13:27:55.000Z","updated_at":"2025-10-26T17:51:10.000Z","dependencies_parsed_at":"2023-01-21T08:18:40.641Z","dependency_job_id":"ffafee47-b124-4ba2-8e1f-90b9ac770304","html_url":"https://github.com/rnabioco/clustifyr","commit_stats":{"total_commits":727,"total_committers":17,"mean_commits":42.76470588235294,"dds":0.5405777166437413,"last_synced_commit":"df4193cfb145355a03fef45721fbb4588ed3d418"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/rnabioco/clustifyr","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rnabioco%2Fclustifyr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rnabioco%2Fclustifyr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rnabioco%2Fclustifyr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rnabioco%2Fclustifyr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rnabioco","download_url":"https://codeload.github.com/rnabioco/clustifyr/tar.gz/refs/heads/devel","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rnabioco%2Fclustifyr/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281896642,"owners_count":26580138,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-30T02:00:06.501Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["assign-identities","clusters","marker-genes","rna-seq","single-cell-rna-seq"],"created_at":"2025-10-20T14:26:20.580Z","updated_at":"2025-10-30T22:44:32.192Z","avatar_url":"https://github.com/rnabioco.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n```{r, echo = FALSE, message = FALSE}\nknitr::opts_chunk$set(\n    collapse = TRUE,\n    comment = \"#\u003e\",\n    fig.path = \"man/figures/\",\n    dpi = 300\n)\n```\n\n```{r, echo=FALSE, message=FALSE}\nst \u003c- data.table::fread(\"https://bioconductor.org/packages/stats/bioc/clustifyr/clustifyr_stats.tab\", data.table = FALSE, verbose = FALSE)\nst_all \u003c- dplyr::filter(st, Month == \"all\")\ncl \u003c- as.numeric(data.table::fread(\"https://raw.githubusercontent.com/raysinensis/clone_counts_public/main/clustifyr_total.txt\", verbose = FALSE))\n```\n\n# clustifyr \n\n\u003c!-- badges: start --\u003e\n[![R-CMD-check-bioc](https://github.com/rnabioco/clustifyr/actions/workflows/check-bioc.yml/badge.svg)](https://github.com/rnabioco/clustifyr/actions/workflows/check-bioc.yml)\n[![Codecov test coverage](https://codecov.io/gh/rnabioco/clustifyr/branch/devel/graph/badge.svg)](https://app.codecov.io/gh/rnabioco/clustifyr?branch=devel)\n[![platforms](https://bioconductor.org/shields/availability/release/clustifyr.svg)](https://bioconductor.org/packages/release/bioc/html/clustifyr.html)\n[![bioc](https://bioconductor.org/shields/years-in-bioc/clustifyr.svg)](https://bioconductor.org/packages/release/bioc/html/clustifyr.html)\n[![#downloads](`r paste0(\"https://img.shields.io/badge/%23%20downloads-\", sum(st_all[[4]]) + cl, \"-brightgreen\")`)](https://bioconductor.org/packages/stats/bioc/clustifyr/clustifyr_stats.tab)\n\u003c!-- badges: end --\u003e\n\nclustifyr classifies cells and clusters in single-cell RNA sequencing experiments using reference bulk RNA-seq data sets, sorted microarray expression data, single-cell gene signatures, or lists of marker genes. \n\n## Installation\n\nInstall the Bioconductor version with:\n\n``` r\nif (!requireNamespace(\"BiocManager\", quietly = TRUE))\n    install.packages(\"BiocManager\")\n\nBiocManager::install(\"clustifyr\")\n```\n\nInstall the development version with:\n\n``` r\nBiocManager::install(\"rnabioco/clustifyr\")\n```\n \n## Example usage\n\nIn this example we use the following built-in input data:\n\n- an expression matrix of single cell RNA-seq data (`pbmc_matrix_small`)\n- a metadata data.frame (`pbmc_meta`), with cluster information stored (`\"classified\"`)\n- a vector of variable genes (`pbmc_vargenes`)\n- a matrix of mean normalized scRNA-seq UMI counts by cell type (`cbmc_ref`)\n\nWe then calculate correlation coefficients and plot them on a pre-calculated projection (stored in `pbmc_meta`).\n\n```{r readme_example, warning=F, message=F}\nlibrary(clustifyr)\n\n# calculate correlation\nres \u003c- clustify(\n    input = pbmc_matrix_small,\n    metadata = pbmc_meta$classified,\n    ref_mat = cbmc_ref,\n    query_genes = pbmc_vargenes\n)\n\n# print assignments\ncor_to_call(res)\n\n# plot assignments on a projection\nplot_best_call(\n    cor_mat = res,\n    metadata = pbmc_meta,\n    cluster_col = \"classified\"\n)\n```\n\n`clustify()` can take a clustered `SingleCellExperiment` or `seurat` object (from v2 up to v5) and assign identities.\n\n```{r example_seurat, warning=F, message=F}\n# for SingleCellExperiment\nsce_small \u003c- sce_pbmc()\nclustify(\n    input = sce_small, # an SCE object\n    ref_mat = cbmc_ref, # matrix of RNA-seq expression data for each cell type\n    cluster_col = \"cell_type\", # name of column in meta.data containing cell clusters\n    obj_out = TRUE # output SCE object with cell type inserted as \"type\" column\n)\n\n# for Seurat\nlibrary(Seurat)\ns_small \u003c- so_pbmc()\nclustify(\n    input = s_small,\n    cluster_col = \"RNA_snn_res.0.5\",\n    ref_mat = cbmc_ref,\n    seurat_out = TRUE\n)\n\n# New output option, directly as a vector (in the order of the metadata), which can then be inserted into metadata dataframes and other workflows\nclustify(\n    input = s_small,\n    cluster_col = \"RNA_snn_res.0.5\",\n    ref_mat = cbmc_ref,\n    vec_out = TRUE\n)[1:10]\n```\n\nNew reference matrix can be made directly from `SingleCellExperiment` and `Seurat` objects as well. Other scRNAseq experiment object types are supported as well.\n\n```{r example_ref_matrix}\n# make reference from SingleCellExperiment objects\nsce_small \u003c- sce_pbmc()\nsce_ref \u003c- object_ref(\n    input = sce_small, # SCE object\n    cluster_col = \"cell_type\" # name of column in colData containing cell identities\n)\n\n# make reference from seurat objects\ns_small \u003c- so_pbmc()\ns_ref \u003c- seurat_ref(\n    seurat_object = s_small,\n    cluster_col = \"RNA_snn_res.0.5\"\n)\n\nhead(s_ref)\n```\n\n`clustify_lists()` handles identity assignment of matrix or `SingleCellExperiment` and `seurat` objects based on marker gene lists.\n \n```{r example_seurat3, warning=F, message=F}\nclustify_lists(\n    input = pbmc_matrix_small,\n    metadata = pbmc_meta,\n    cluster_col = \"classified\",\n    marker = pbmc_markers,\n    marker_inmatrix = FALSE\n)\n\nclustify_lists(\n    input = s_small,\n    marker = pbmc_markers,\n    marker_inmatrix = FALSE,\n    cluster_col = \"RNA_snn_res.0.5\",\n    seurat_out = TRUE\n)\n```\n\n## Additional resources\n\n* [Script](https://github.com/rnabioco/clustifyrdata/blob/master/inst/run_clustifyr.R) for benchmarking, compatible with [`scRNAseq_Benchmark`](https://github.com/tabdelaal/scRNAseq_Benchmark)\n\n* Additional reference data (including tabula muris, immgen, etc) are available in a supplemental package [`clustifyrdatahub`](https://github.com/rnabioco/clustifyrdatahub). Also see [list](https://rnabioco.github.io/clustifyrdata/articles/download_refs.html) for individual downloads. \n\n* See the [FAQ](https://github.com/rnabioco/clustifyr/wiki/Frequently-asked-questions) for more details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frnabioco%2Fclustifyr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frnabioco%2Fclustifyr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frnabioco%2Fclustifyr/lists"}