{"id":32207820,"url":"https://github.com/dosorio/rpanglaodb","last_synced_at":"2025-10-22T05:59:44.336Z","repository":{"id":56934966,"uuid":"273343750","full_name":"dosorio/rPanglaoDB","owner":"dosorio","description":"An R package to download and merge labeled single-cell RNA-seq data from the PanglaoDB database into a Seurat object.","archived":false,"fork":false,"pushed_at":"2023-06-01T19:09:23.000Z","size":2347,"stargazers_count":26,"open_issues_count":3,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-10-22T05:59:39.016Z","etag":null,"topics":["data-integration","data-mining","rna-seq","single-cell","single-cell-rna-seq"],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dosorio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2020-06-18T21:29:52.000Z","updated_at":"2025-09-28T06:29:22.000Z","dependencies_parsed_at":"2022-08-21T01:10:12.213Z","dependency_job_id":"4427d7d3-cfe8-430e-8c52-976501e4cc80","html_url":"https://github.com/dosorio/rPanglaoDB","commit_stats":{"total_commits":131,"total_committers":2,"mean_commits":65.5,"dds":0.2748091603053435,"last_synced_commit":"1e23dd7b17a2fb130cb2ca4111ef1b639c4cf72a"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dosorio/rPanglaoDB","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dosorio%2FrPanglaoDB","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dosorio%2FrPanglaoDB/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dosorio%2FrPanglaoDB/
releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dosorio%2FrPanglaoDB/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dosorio","download_url":"https://codeload.github.com/dosorio/rPanglaoDB/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dosorio%2FrPanglaoDB/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280389301,"owners_count":26322507,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-22T02:00:06.515Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-integration","data-mining","rna-seq","single-cell","single-cell-rna-seq"],"created_at":"2025-10-22T05:59:36.603Z","updated_at":"2025-10-22T05:59:44.331Z","avatar_url":"https://github.com/dosorio.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"rPanglaoDB \n========\nAn R package to download and merge labeled single-cell RNA-seq data from the [PanglaoDB](https://panglaodb.se/) database into a [Seurat](https://CRAN.R-project.org/package=Seurat) object.\n\nInstall\n-------\nThis package requires `R` version 4.0 or higher. 
If you are using an older version of `R` you will be prompted to upgrade when you try to install the package.\n\nThe official release of `rPanglaoDB` is available on [CRAN](https://CRAN.R-project.org/package=rPanglaoDB). To install it from there, you can use the following command:\n```\n\u003e install.packages('rPanglaoDB', dependencies = TRUE)\n```\n\nIf you have `remotes` installed, you can install the latest stable version of the `rPanglaoDB` package directly from GitHub:\n\n```\n\u003e remotes::install_github('dosorio/rPanglaoDB')\n```\nAvailable functions\n-------\n| Function    | Description |\n| :------------- |:-------------|\n|getMarkers|\tReturn a `data frame` with the list of samples from the PanglaoDB database exhibiting a pattern of expression for a set of molecular markers.|\n|getSampleComposition| Return a `data frame` with the cell-type content for each sample from the PanglaoDB database.|\n|getSampleList| Return a `data frame` with the list of samples available at the PanglaoDB database.|\n|getSamples| Download and return the expression matrix and annotations from the PanglaoDB database in a `Seurat` object. 
|\n\nUsage\n-------\n#### Loading the package:\nLike any other R package, `rPanglaoDB` can be loaded using the `library` function as follows:\n```\n\u003e library(rPanglaoDB)\n```\n#### Accessing the list of available samples:\nTo access the list of available samples deposited in the PanglaoDB database you may use the `getSampleList()` function:\n```\n\u003e samplesList \u003c- getSampleList()\n```\nThis function returns a ``data frame`` with 6 columns matching the information provided [here](https://panglaodb.se/samples.html) by the PanglaoDB database.\n```\n\u003e head(samplesList)\n\n        SRA        SRS                          Tissue     Protocol      Species Cells\n1 SRA553822 SRS2119548   Cultured embryonic stem cells 10x chromium Homo sapiens  6501\n2 SRA570744 SRS2253536                 Lung mesenchyme 10x chromium Mus musculus  4611\n3 SRA598936 SRS2428405                   Kidney cortex 10x chromium Homo sapiens  3759\n4 SRA644036 SRS2808714 Cervical and lumbar spinal cord 10x chromium Mus musculus  1025\n5 SRA670243 SRS3078084                Ventral midbrain 10x chromium Mus musculus  5603\n6 SRA689041 SRS3166675                           Colon 10x chromium Mus musculus  2878\n```\n#### Accessing the cellular composition of a sample:\nTo access the cell-type content for each sample from the PanglaoDB database you may use the `getSampleComposition` function. This function returns the cell-type composition of the samples included in the PanglaoDB database in a `data frame` with 8 columns. 
For example, to retrieve the composition of the sample with SRS = SRS2119548 you may use the following code:\n```\n\u003e scSRS2119548 \u003c- getSampleComposition(srs = 'SRS2119548')\n\u003e head(scSRS2119548)\n\n          SRA        SRS                        Tissue     Protocol      Species Cluster Cells Cell Type\n1.1 SRA553822 SRS2119548 Cultured embryonic stem cells 10x chromium Homo sapiens       0  1572   Unknown\n1.2 SRA553822 SRS2119548 Cultured embryonic stem cells 10x chromium Homo sapiens       1   563   Unknown\n1.3 SRA553822 SRS2119548 Cultured embryonic stem cells 10x chromium Homo sapiens       2   280   Unknown\n1.4 SRA553822 SRS2119548 Cultured embryonic stem cells 10x chromium Homo sapiens       3   270   Unknown\n1.5 SRA553822 SRS2119548 Cultured embryonic stem cells 10x chromium Homo sapiens       4   220   Unknown\n1.6 SRA553822 SRS2119548 Cultured embryonic stem cells 10x chromium Homo sapiens       5   192   Unknown\n```\nThe retrieved information matches the SRS2119548 record reported by PanglaoDB, available [here](https://panglaodb.se/list_clusters_and_cell_types.html?sra=SRA553822\u0026srs=SRS2119548).\n\n#### Accessing the list of available samples with specific expression patterns:\nTo access the list of available samples with specific expression patterns you may use the `getMarkers()` function. This function returns the output of a query submitted through the search page [here](https://panglaodb.se/search.html) in the PanglaoDB database. \n\nAs an example, below we show how to retrieve the list of clusters containing two specific types of Endothelial cells. This type of cell acts as a barrier between vessels and tissues [(Aman et al., 2016)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5504427/). They are known to control the flow of substances and fluids into and out of a tissue. 
Endothelial cells line blood vessels and lymphatic vessels, and are found exclusively in vascularized tissue [(Bautch and Caron, 2015)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4355271/). Endothelial cells can be classified on the basis of a set of marker genes. For example, Lymphatic Endothelial Cells [(LEC)](https://promocell.com/product/human-dermal-lymphatic-endothelial-cells-hdlec/) are PECAM1 and PDPN positive, whereas Blood Endothelial Cells [(BEC)](https://promocell.com/product/human-dermal-blood-endothelial-cells-hdbec/) are PECAM1 and VWF positive but negative for PDPN and ACTA2. \n```\n\u003e BEC \u003c- getMarkers(include = c('PECAM1', 'VWF'), exclude = c('PDPN', 'ACTA2'))\n\u003e head(BEC)\n\n        SRA        SRS       Specie                           Tissue Cluster         Cell-Type                Markers\n1 SRA646572 SRS2833946 Homo sapiens           Human embryo forebrain      28 Endothelial cells +PECAM1+VWF-PDPN-ACTA2\n2 SRA646572 SRS2833947 Homo sapiens           Human embryo forebrain      24 Endothelial cells +PECAM1+VWF-PDPN-ACTA2\n3 SRA594999 SRS2397417 Homo sapiens Umbilical vein endothelial cells       0 Endothelial cells +PECAM1+VWF-PDPN-ACTA2\n4 SRA594999 SRS2397417 Homo sapiens Umbilical vein endothelial cells       2 Endothelial cells +PECAM1+VWF-PDPN-ACTA2\n5 SRA594999 SRS2397417 Homo sapiens Umbilical vein endothelial cells       3 Endothelial cells +PECAM1+VWF-PDPN-ACTA2\n6 SRA594999 SRS2397417 Homo sapiens Umbilical vein endothelial cells       4           Unknown +PECAM1+VWF-PDPN-ACTA2\n```\n```\n\u003e LEC \u003c- getMarkers(include = c('PECAM1', 'PDPN', 'PROX1'))\n\u003e head(LEC)\n\n        SRA        SRS       Specie                             Tissue Cluster         Cell-Type            Markers\n1 SRA640325 SRS2769051 Homo sapiens Lung proximal airway stromal cells      17 Endothelial cells +PECAM1+PDPN+PROX1\n2 SRA703206 SRS3296613 Homo sapiens         Colon (Ulcerative Colitis)      15           Unknown 
+PECAM1+PDPN+PROX1\n3 SRA782908 SRS3815606 Homo sapiens                            Decidua      13 Endothelial cells +PECAM1+PDPN+PROX1\n4 SRA637291 SRS2749416 Mus musculus                     Left Ventricle      17 Endothelial cells +PECAM1+PDPN+PROX1\n5 SRA652149 SRS2862117 Mus musculus         Lateral geniculate nucleus      11      Interneurons +PECAM1+PDPN+PROX1\n6 SRA611634 SRS2532206 Mus musculus                               Lung      18 Endothelial cells +PECAM1+PDPN+PROX1\n```\n\n#### Downloading the count matrices:\nOnce the desired samples are identified, their count matrices can be downloaded using the `getSamples` function. In the example below, we show how to download the set of Human Lymphatic Endothelial Cells by applying two filters in the `getSamples` function to the set of identified samples containing the desired phenotype (PECAM1+, PDPN+, PROX1+). By default, the output of the function is a `Seurat` object with all the samples merged. In this case, it is an object containing 1124 human endothelial cells. 
\n\n```\n\u003e countsLEC \u003c- getSamples(srs = unique(LEC$SRS), celltype = 'Endothelial cells', specie = 'Homo sapiens')\n|++++++++++++++++++++++++++++++++++++++++++++++++++| 100%\n\n\u003e countsLEC\nAn object of class Seurat \n39551 features across 1124 samples within 1 assay \nActive assay: RNA (39551 features, 0 variable features)\n```\nMetadata associated with the downloaded count matrices can be accessed using the `[[]]` operator.\n```\n\u003e head(countsLEC[[]])\n                 orig.ident nCount_RNA nFeature_RNA         CellTypes panglaoCluster                             Tissue       Specie\nAAACCTGTCAGTACGT SRS2769051       3137         1526 Endothelial cells             17 Lung proximal airway stromal cells Homo sapiens\nAAGGCAGAGGGAGTAA SRS2769051       1041          677 Endothelial cells             17 Lung proximal airway stromal cells Homo sapiens\nACCTTTAAGTAGGTGC SRS2769051       2431         1239 Endothelial cells             17 Lung proximal airway stromal cells Homo sapiens\nACGAGGAAGATGAGAG SRS2769051       2928         1470 Endothelial cells             17 Lung proximal airway stromal cells Homo sapiens\nACGGAGACAAGCTGTT SRS2769051       1971         1028 Endothelial cells             17 Lung proximal airway stromal cells Homo sapiens\nAGACGTTGTGCCTTGG SRS2769051       1176          750 Endothelial cells             17 Lung proximal airway stromal cells Homo sapiens\n```\n\nOptionally, if the unmerged samples are needed, you may set the `merge` parameter to `FALSE`. In this case, the output is a list of `Seurat` objects, one per sample requested in the input. 
\n```\n\u003e countsLEC \u003c- getSamples(srs = unique(LEC$SRS), celltype = 'Endothelial cells', specie = 'Homo sapiens', merge = FALSE)\n  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100%\n  \n\u003e countsLEC\n$SRS2769051\nAn object of class Seurat \n35225 features across 36 samples within 1 assay \nActive assay: RNA (35225 features, 0 variable features)\n\n$SRS3296613\nAn object of class Seurat \n32131 features across 860 samples within 1 assay \nActive assay: RNA (32131 features, 0 variable features)\n\n$SRS3815606\nAn object of class Seurat \n31724 features across 228 samples within 1 assay \nActive assay: RNA (31724 features, 0 variable features)\n\n```\n\nPost-processing\n-------\nOnce the desired samples are downloaded and merged, some post-processing is required to identify the cells exhibiting the desired phenotype. For that purpose, here we show how to integrate all the samples using [Seurat](https://CRAN.R-project.org/package=Seurat) and [Harmony](https://github.com/immunogenomics/harmony). 
The cluster exhibiting the desired phenotype is identified using the [Nebulosa](https://bioconductor.org/packages/Nebulosa/) package.\n```\n\u003e set.seed(1)\n\u003e countsLEC \u003c- Seurat::NormalizeData(countsLEC)\n\u003e countsLEC \u003c- Seurat::FindVariableFeatures(countsLEC)\n\u003e countsLEC \u003c- Seurat::ScaleData(countsLEC)\n\u003e countsLEC \u003c- Seurat::RunPCA(countsLEC, verbose = FALSE)\n\u003e countsLEC \u003c- harmony::RunHarmony(countsLEC, group.by.vars = 'orig.ident')\n\u003e countsLEC \u003c- Seurat::FindNeighbors(countsLEC, reduction = 'harmony')\n\u003e countsLEC \u003c- Seurat::FindClusters(countsLEC)\n\u003e countsLEC \u003c- Seurat::RunTSNE(countsLEC, reduction = 'harmony')\n\u003e Nebulosa::plot_density(countsLEC, features = c('PECAM1', 'PDPN', 'PROX1'), joint = TRUE)\n```\n![HDLEC](https://raw.githubusercontent.com/dosorio/rPanglaoDB/master/inst/plots/HDLEC.png)\n\nIn this example, cluster 4 is the one containing 121 Human Lymphatic Endothelial Cells with constitutive expression of PECAM1, PDPN, and PROX1.\n```\n\u003e Seurat::DotPlot(countsLEC, features = c('PECAM1', 'PDPN', 'PROX1')) + ggplot2::coord_flip()\n\n\u003e table(Seurat::Idents(countsLEC))\n  0   1   2   3   4   5   6   7   8   9 \n220 192 191 152 121  93  78  27  25  25 \n```\n![cellsHDLEC](https://raw.githubusercontent.com/dosorio/rPanglaoDB/master/inst/plots/cellsHDLEC.png)\n\nApplication case\n----------------\nTo show the potential of rPanglaoDB for collecting rare cell types by integrating multiple public datasets, we present a biological application collecting and characterizing a set of fibrocytes [here](https://htmlpreview.github.io/?https://raw.githubusercontent.com/dosorio/rPanglaoDB/master/inst/examples/Fibrocytes.nb.html)\n\nCitation\n-------\nTo cite package `rPanglaoDB` in publications use:\n```\n  Daniel Osorio, Marieke Kuijjer and James J. Cai (2021). rPanglaoDB: Download and Merge Single-Cell RNA-Seq Data from the PanglaoDB Database. R package. 
https://CRAN.R-project.org/package=rPanglaoDB\n```\nA `BibTeX` entry for `LaTeX` users is\n```\n  @Manual{,\n    title = {rPanglaoDB: Download and Merge Single-Cell RNA-Seq Data from the PanglaoDB Database},\n    author = {Daniel Osorio and Marieke Kuijjer and James J. Cai},\n    year = {2021},\n    note = {R package},\n    url = {https://CRAN.R-project.org/package=rPanglaoDB},\n  }\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdosorio%2Frpanglaodb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdosorio%2Frpanglaodb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdosorio%2Frpanglaodb/lists"}