{"id":21962518,"url":"https://github.com/SingleR-inc/celldex-py","last_synced_at":"2025-07-22T13:32:09.638Z","repository":{"id":241635231,"uuid":"807250091","full_name":"BiocPy/celldex","owner":"BiocPy","description":"Collection of reference cell type datasets","archived":false,"fork":false,"pushed_at":"2024-11-18T16:30:11.000Z","size":242,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-11-18T17:52:51.115Z","etag":null,"topics":["single-cell"],"latest_commit_sha":null,"homepage":"https://biocpy.github.io/celldex/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BiocPy.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.md","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-28T18:32:31.000Z","updated_at":"2024-11-11T19:14:31.000Z","dependencies_parsed_at":"2024-05-29T12:20:00.993Z","dependency_job_id":"11eeefd4-ae5e-47e6-9841-55dfcde668c3","html_url":"https://github.com/BiocPy/celldex","commit_stats":null,"previous_names":["biocpy/celldex"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BiocPy%2Fcelldex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BiocPy%2Fcelldex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BiocPy%2Fcelldex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BiocPy%2Fcelldex/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owner
s/BiocPy","download_url":"https://codeload.github.com/BiocPy/celldex/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":227101590,"owners_count":17731157,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["single-cell"],"created_at":"2024-11-29T10:42:51.057Z","updated_at":"2025-07-22T13:32:09.632Z","avatar_url":"https://github.com/BiocPy.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!-- These are examples of badges you might want to add to your README:\n     please update the URLs accordingly\n\n[![Built Status](https://api.cirrus-ci.com/github/\u003cUSER\u003e/celldex.svg?branch=main)](https://cirrus-ci.com/github/\u003cUSER\u003e/celldex)\n[![ReadTheDocs](https://readthedocs.org/projects/celldex/badge/?version=latest)](https://celldex.readthedocs.io/en/stable/)\n[![Coveralls](https://img.shields.io/coveralls/github/\u003cUSER\u003e/celldex/main.svg)](https://coveralls.io/r/\u003cUSER\u003e/celldex)\n[![Conda-Forge](https://img.shields.io/conda/vn/conda-forge/celldex.svg)](https://anaconda.org/conda-forge/celldex)\n[![Monthly Downloads](https://pepy.tech/badge/celldex/month)](https://pepy.tech/project/celldex)\n[![Twitter](https://img.shields.io/twitter/url/http/shields.io.svg?style=social\u0026label=Twitter)](https://twitter.com/celldex)\n--\u003e\n\n[![Project generated with 
PyScaffold](https://img.shields.io/badge/-PyScaffold-005CA0?logo=pyscaffold)](https://pyscaffold.org/)\n[![PyPI-Server](https://img.shields.io/pypi/v/celldex.svg)](https://pypi.org/project/celldex/)\n\n# celldex - reference cell type datasets\n\nThis package provides reference datasets with annotated cell types for convenient use by [BiocPy](https://github.com/biocpy) packages and workflows in Python.\nThese references were sourced and uploaded by the [**celldex** R/Bioconductor](https://bioconductor.org/packages/celldex) package.\n\nEach dataset is loaded as a [`SummarizedExperiment`](https://bioconductor.org/packages/SummarizedExperiment) that is ready for downstream analysis,\ne.g., in the [SingleR Python implementation](https://github.com/SingleR-inc/singler).\n\n## Installation\n\nTo get started, install the package from [PyPI](https://pypi.org/project/celldex/):\n\n```shell\npip install celldex\n```\n\n## Find reference datasets\n\nThe `list_references()` function lists all available reference datasets along with their metadata.\n\n```python\nfrom celldex import list_references\n\nrefs = list_references()\nprint(refs[[\"name\", \"version\"]].head(3))\n\n## output\n# |    | name             | version    |\n# |---:|:-----------------|:-----------|\n# |  0 | immgen           | 2024-02-26 |\n# |  1 | blueprint_encode | 2024-02-26 |\n# |  2 | dice             | 2024-02-26 |\n```\n\n## Fetch reference datasets\n\nFetch a dataset as a [SummarizedExperiment](https://github.com/biocpy/summarizedexperiment):\n\n```python\nfrom celldex import fetch_reference\n\nref = fetch_reference(\"immgen\", version=\"2024-02-26\")\nref2 = fetch_reference(\"hpca\", \"2024-02-26\")\n\nprint(ref)\n\n## output\n# class: SummarizedExperiment\n# dimensions: (22134, 830)\n# assays(1): ['logcounts']\n# row_data columns(0): []\n# row_names(22134): ['Zglp1', 'Vmn2r65', 'Gm10024', ..., 'Ifi44', 'Tiparp', 'Kdm1a']\n# column_data columns(3): ['label.main', 'label.fine', 'label.ont']\n# 
column_names(830): ['GSM1136119_EA07068_260297_MOGENE-1_0-ST-V1_MF.11C-11B+.LU_1.CEL', 'GSM1136120_EA07068_260298_MOGENE-1_0-ST-V1_MF.11C-11B+.LU_2.CEL', 'GSM1136121_EA07068_260299_MOGENE-1_0-ST-V1_MF.11C-11B+.LU_3.CEL', ..., 'GSM920653_EA07068_201207_MOGENE-1_0-ST-V1_TGD.VG4+24AHI.E17.TH_3.CEL', 'GSM920654_EA07068_201214_MOGENE-1_0-ST-V1_TGD.VG4+24ALO.E17.TH_1.CEL', 'GSM920655_EA07068_201215_MOGENE-1_0-ST-V1_TGD.VG4+24ALO.E17.TH_2.CEL']\n# metadata(0):\n```\n\n## Search for references\n\nOnly a limited number of references are available right now, but you can search them by keyword or by metadata field:\n\n```python\nres = search_references(\"human\")\nres = search_references(define_text_query(\"Immun%\", partial=True))\nres = search_references(define_text_query(\"10090\", field=\"taxonomy_id\"))\n```\n\n## Adding new reference datasets\n\nThese instructions follow the same steps outlined in the [scrnaseq package](https://github.com/biocpy/scrnaseq).\n\n1. Format your dataset as a `SummarizedExperiment`. Let's mock a reference dataset:\n\n     ***Note: The experiment object must include a 'logcounts' assay matrix containing log-normalized counts.***\n\n     ```python\n     import numpy as np\n     from summarizedexperiment import SummarizedExperiment\n     from biocframe import BiocFrame\n\n     mat = np.random.exponential(1.3, (100, 10))\n     row_names = [f\"GENE_{i}\" for i in range(mat.shape[0])]\n     col_names = list(\"ABCDEFGHIJ\")\n     sce = SummarizedExperiment(\n          assays={\"logcounts\": mat},\n          row_data=BiocFrame(row_names=row_names),\n          column_data=BiocFrame(data={\"label.fine\": col_names}),\n     )\n     ```\n\n2. Assemble the metadata for your reference dataset. This should be a dictionary as specified in the [Bioconductor metadata schema](https://github.com/ArtifactDB/bioconductor-metadata-index). Check out some examples from `fetch_metadata()`. 
Note that the `application.takane` property will be automatically added later, and so can be omitted from the list that you create.\n\n     ```python\n     meta = {\n          \"title\": \"New reference dataset\",\n          \"description\": \"This is a new reference dataset\",\n          \"taxonomy_id\": [\"10090\"],  # NCBI ID\n          \"genome\": [\"GRCm38\"],  # genome build\n          \"sources\": [{\"provider\": \"GEO\", \"id\": \"GSE12345\"}],\n          \"maintainer_name\": \"Jayaram kancherla\",\n          \"maintainer_email\": \"jayaram.kancherla@gmail.com\",\n     }\n     ```\n\n3. Save your `SummarizedExperiment`  object to disk with `save_reference()`. This saves the reference dataset into a \"staging directory\" using language-agnostic file formats - check out the [ArtifactDB](https://github.com/artifactdb) framework for more details.\n\n     ```python\n     import tempfile\n     from celldex import save_reference\n\n     # replace tmp with a staging directory\n     staging_dir = tempfile.mkdtemp()\n     save_reference(sce, staging_dir, meta)\n     ```\n\n     You can check that everything was correctly saved by reloading the on-disk data for inspection:\n\n     ```python\n     import dolomite_base as dl\n\n     dl.read_object(staging_dir)\n     ```\n\n4. Wait for us to grant temporary upload permissions to your GitHub account.\n\n5. Upload your staging directory to [**gypsum** backend](https://github.com/ArtifactDB/gypsum-worker) with `upload_reference()`. On the first call to this function, it will automatically prompt you to log into GitHub so that the backend can authenticate you. 
If you are on a system without browser access (e.g., most computing clusters), a [token](https://github.com/settings/tokens) can be manually supplied via `set_access_token()`.\n\n     ```python\n     from celldex import upload_reference\n\n     upload_reference(staging_dir, \"my_dataset_name\", \"my_version\")\n     ```\n\n     You can check that everything was successfully uploaded by calling `fetch_reference()` with the same name and version:\n\n     ```python\n     from celldex import fetch_reference\n\n     fetch_reference(\"my_dataset_name\", \"my_version\")\n     ```\n\n     If you realized you made a mistake, no worries. Use the following call to clear the erroneous dataset, and try again:\n\n     ```python\n     from gypsum_client import reject_probation\n\n     reject_probation(\"celldex\", \"my_dataset_name\", \"my_version\")\n     ```\n\n6. Comment on the PR to notify us that the dataset has finished uploading and you're happy with it. We'll review it and make sure everything's in order. If some fixes are required, we'll just clear the dataset so that you can upload a new version with the necessary changes. Otherwise, we'll approve the dataset. Note that once a version of a dataset is approved, no further changes can be made to that version; you'll have to upload a new version if you want to modify something.\n\n\u003c!-- pyscaffold-notes --\u003e\n\n## Note\n\nThis project has been set up using PyScaffold 4.5. For details and usage\ninformation on PyScaffold see https://pyscaffold.org/.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSingleR-inc%2Fcelldex-py","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSingleR-inc%2Fcelldex-py","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSingleR-inc%2Fcelldex-py/lists"}