{"id":38710894,"url":"https://github.com/deweylab/cello","last_synced_at":"2026-01-17T11:00:05.287Z","repository":{"id":37594388,"uuid":"179744768","full_name":"deweylab/CellO","owner":"deweylab","description":"CellO: Gene expression-based hierarchical cell type classification using the Cell Ontology","archived":false,"fork":false,"pushed_at":"2023-07-08T00:59:53.000Z","size":22938,"stargazers_count":68,"open_issues_count":11,"forks_count":14,"subscribers_count":5,"default_branch":"master","last_synced_at":"2026-01-04T21:08:10.735Z","etag":null,"topics":["bioinformatics","cell-biology","cell-type","cell-type-classification","computational-biology","machine-learning","ontologies","rna-seq","single-cell-rna-seq"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/deweylab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-04-05T19:45:38.000Z","updated_at":"2025-12-11T11:39:07.000Z","dependencies_parsed_at":"2023-09-27T04:45:06.878Z","dependency_job_id":null,"html_url":"https://github.com/deweylab/CellO","commit_stats":{"total_commits":170,"total_committers":5,"mean_commits":34.0,"dds":"0.30000000000000004","last_synced_commit":"5a8fe46d94c02bbd8002501f6ded8ac6cf8dfede"},"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"purl":"pkg:github/deweylab/CellO","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deweylab%2FCellO","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deweylab%2FCellO/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/deweylab%2FCellO/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deweylab%2FCellO/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/deweylab","download_url":"https://codeload.github.com/deweylab/CellO/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deweylab%2FCellO/sbom","scorecard":{"id":338639,"data":{"date":"2025-08-11","repo":{"name":"github.com/deweylab/CellO","commit":"5a8fe46d94c02bbd8002501f6ded8ac6cf8dfede"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":1.7,"checks":[{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Code-Review","score":0,"reason":"Found 2/28 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least 
privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a 
license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 4 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Vulnerabilities","score":0,"reason":"10 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: PYSEC-2021-856 / GHSA-5545-2q6w-2gh6","Warn: Project is vulnerable to: GHSA-6p56-wp2h-9hxr","Warn: Project is vulnerable to: PYSEC-2021-857 / GHSA-f7c7-j99h-c22f","Warn: Project is vulnerable to: GHSA-fpfv-jqm9-f5jm","Warn: Project is vulnerable to: PYSEC-2020-73","Warn: Project is vulnerable to: PYSEC-2020-107 / GHSA-jjw5-xxj6-pcv5","Warn: Project is vulnerable to: 
PYSEC-2024-110 / GHSA-jw8x-6495-233v","Warn: Project is vulnerable to: PYSEC-2020-108","Warn: Project is vulnerable to: PYSEC-2023-102","Warn: Project is vulnerable to: PYSEC-2023-114"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}}]},"last_synced_at":"2025-08-18T05:17:05.168Z","repository_id":37594388,"created_at":"2025-08-18T05:17:05.168Z","updated_at":"2025-08-18T05:17:05.168Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28506593,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-17T10:25:30.148Z","status":"ssl_error","status_checked_at":"2026-01-17T10:25:29.718Z","response_time":85,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","cell-biology","cell-type","cell-type-classification","computational-biology","machine-learning","ontologies","rna-seq","single-cell-rna-seq"],"created_at":"2026-01-17T11:00:05.018Z","updated_at":"2026-01-17T11:00:05.259Z","avatar_url":"https://github.com/deweylab.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CellO: *Cell O*ntology-based classification \u0026nbsp; \u003cimg src=\"https://raw.githubusercontent.com/deweylab/CellO/master/cello.png\" alt=\"alt text\" width=\"70px\" height=\"70px\"\u003e\r\n\r\n![PyPI Version](https://img.shields.io/pypi/v/cello-classify)  \r\n\r\n## About\r\n\r\nCellO (Cell Ontology-based classification) is a Python package for performing cell type classification of human RNA-seq data. CellO makes hierarchical predictions against the [Cell Ontology](http://www.obofoundry.org/ontology/cl.html). CellO's classifiers were trained on nearly all of the human primary cell, bulk RNA-seq data in the [Sequence Read Archive](https://www.ncbi.nlm.nih.gov/sra).\r\n\r\nFor more details regarding the underlying method, see the paper:\r\n[Bernstein, M.N., Ma, J., Gleicher, M., Dewey, C.N. (2020). CellO: Comprehensive and hierarchical cell type classification of human cells with the Cell Ontology. 
*iScience*, 24(1), 101913.](https://www.sciencedirect.com/science/article/pii/S258900422031110X) \r\n\r\nThere are two modes in which one can use CellO: within Python in conjunction with [Scanpy](), or from the command line. \r\n\r\n## Installation\r\n\r\nTo install CellO using pip, run the following command:\r\n\r\n`pip install cello-classify`\r\n\r\n## Running CellO from within Python\r\n\r\nCellO's API interfaces with the Scanpy Python library and can integrate into a more general single-cell analysis pipeline. For an example of how to use CellO with Scanpy, please see the [tutorial](https://github.com/deweylab/CellO/blob/package_for_pypi/tutorial/cello_tutorial.ipynb).\r\n\r\nThis tutorial can also be executed from a Google Colab notebook in the cloud: [https://colab.research.google.com/drive/1lNvzrP4bFDkEe1XXKLnO8PZ83StuvyWW?usp=sharing](https://colab.research.google.com/drive/1lNvzrP4bFDkEe1XXKLnO8PZ83StuvyWW?usp=sharing).\r\n\r\n## Running CellO from the command line\r\n\r\nCellO takes as input a gene expression matrix. CellO accepts data in multiple formats:\r\n* TSV: tab-separated value \r\n* CSV: comma-separated value\r\n* HDF5: a database in HDF5 format that includes three datasets: a dataset storing the expression matrix, a dataset storing the list of gene names (i.e. rows), and a dataset storing the list of cell IDs (i.e. 
columns)\r\n* 10x formatted directory: a directory in the 10x format including three files: ``matrix.mtx``, ``genes.tsv``, and ``barcodes.tsv``\r\n\r\nGiven an output-prefix provided to CellO (this can include the path to the output), CellO outputs three tables formatted as tab-separated-value files, along with a directory of log files: \r\n* ``\u003coutput_prefix\u003e.probability.tsv``: a NxM classification probability table of N cells and M cell types where element (i,j) is a probability value that describes CellO's confidence that cell i is of cell type j  \r\n* ``\u003coutput_prefix\u003e.binary.tsv``: a NxM binary-decision matrix where element (i,j) is 1 if CellO predicts cell i to be of cell type j and is 0 otherwise.\r\n* ``\u003coutput_prefix\u003e.most_specific.tsv``: a table mapping each cell to its most-specific predicted cell type\r\n* ``\u003coutput_prefix\u003e.log``: a directory of log files that record details of CellO's execution\r\n* ``\u003coutput_prefix\u003e.log/genes_absent_from_training_set.tsv``: if a new model is trained using the ``-t`` option, then this file will store the genes in CellO's training set that were _not_ found in the input dataset\r\n* ``\u003coutput_prefix\u003e.log/clustering.tsv``: a TSV file mapping each cell to its assigned cluster. Note that if pre-computed clusters are provided via the ``-p`` option, then this file will not be written. \r\n\r\nUsage:\r\n\r\n```\r\ncello_predict [options] input_file\r\n\r\nOptions:\r\n  -h, --help            show this help message and exit\r\n  -a ALGO, --algo=ALGO  Hierarchical classification algorithm to apply\r\n                        (default='IR'). Must be one of: 'IR' - Isotonic\r\n                        regression, 'CLR' - cascaded logistic regression\r\n  -d DATA_TYPE, --data_type=DATA_TYPE\r\n                        Data type (required). Must be one of: 'TSV', 'CSV',\r\n                        '10x', or 'HDF5'. 
Note: if 'HDF5' is used, then\r\n                        arguments must be provided to the h5_cell_key,\r\n                        h5_gene_key, and h5_expression_key parameters.\r\n  -c H5_CELL_KEY, --h5_cell_key=H5_CELL_KEY\r\n                        The key of the dataset within the input HDF5 file\r\n                        specifying which dataset stores the cell ID's.  This\r\n                        argument is only applicable if '-d HDF5' is used\r\n  -g H5_GENE_KEY, --h5_gene_key=H5_GENE_KEY\r\n                        The key of the dataset within the input HDF5 file\r\n                        specifying which dataset stores the gene names/ID's.\r\n                        This argument is only applicable if '-d HDF5' is used\r\n  -e H5_EXPRESSION_KEY, --h5_expression_key=H5_EXPRESSION_KEY\r\n                        The key of the dataset within the input HDF5 file\r\n                        specifying which dataset stores the expression matrix.\r\n                        This argument is only applicable if '-d HDF5' is used\r\n  -r, --rows_cells      Use this flag if expression matrix is organized as\r\n                        CELLS x GENES rather than GENES x CELLS. Not\r\n                        applicable when '-d 10x' is used.\r\n  -u UNITS, --units=UNITS\r\n                        Units of expression. Must be one of: 'COUNTS', 'CPM',\r\n                        'LOG1_CPM', 'TPM', 'LOG1_TPM'\r\n  -s ASSAY, --assay=ASSAY\r\n                        Sequencing assay. Must be one of: '3_PRIME',\r\n                        'FULL_LENGTH'\r\n  -t, --train_model     If the genes in the input matrix don't match what is\r\n                        expected by the classifier, then train a classifier on\r\n                        the input genes. 
The model will be saved to\r\n                        \u003coutput_prefix\u003e.model.dill\r\n  -m MODEL, --model=MODEL\r\n                        Path to pretrained model file.\r\n  -l REMOVE_ANATOMICAL, --remove_anatomical=REMOVE_ANATOMICAL\r\n                        A comma-separated list of terms ID's from the Uberon\r\n                        Ontology specifying which tissues to use to filter\r\n                        results. All cell types known to be resident to the\r\n                        input tissues will be filtered from the results.\r\n  -p PRE_CLUSTERING, --pre_clustering=PRE_CLUSTERING\r\n                        A TSV file with pre-clustered cells. The first column\r\n                        stores the cell names/ID's (i.e. the column names of\r\n                        the input expression matrix) and the second column\r\n                        stores integers referring to each cluster. The TSV\r\n                        file should not have column names.\r\n  -b, --ontology_term_ids\r\n                        Use the less readable, but more rigorous Cell Ontology\r\n                        term id's in output\r\n  -o OUTPUT_PREFIX, --output_prefix=OUTPUT_PREFIX\r\n                        Prefix for all output files. This prefix may contain a\r\n                        path.\r\n```\r\n\r\nNotably, the input expression data's genes must match the genes expected by the trained classifier.  If the genes match, then CellO will use a pre-trained classifier to classify the expression profiles (i.e. cells) in the input dataset. \r\n\r\nTo provide an example, here is how you would run CellO on a toy dataset stored in ``example_input/Zheng_PBMC_10x``. This dataset is a set of 1,000 cells subsampled from the [Zheng et al. (2017)](https://www.nature.com/articles/ncomms14049) dataset.  
To run CellO on this dataset, run this command:\r\n\r\n``cello_predict -d 10x -u COUNTS -s 3_PRIME example_input/Zheng_PBMC_10x -o test``\r\n\r\nNote that ``-o test`` specifies that all output files will have the prefix \"test\". The ``-d`` option specifies the input format, ``-u`` specifies the units of the expression matrix, and ``-s`` specifies the assay-type.  For a full list of available formats, units, and assay-types, run:\r\n\r\n``cello_predict -h``\r\n\r\n\r\n### Running CellO with a gene set that is incompatible with a pre-trained model\r\n\r\nIf the genes in the input file do not match the genes on which the model was trained, CellO can be told to train a classifier with only those genes included in the given input dataset by using the ``-t`` flag.  The trained model will be saved to a file named ``\u003coutput_prefix\u003e.model.dill`` where ``\u003coutput_prefix\u003e`` is the output-prefix argument provided via the ``-o`` option.  Training CellO usually takes under an hour. \r\n\r\nFor example, to train a model and run CellO on the file ``example_input/LX653_tumor.tsv``, run the command:\r\n\r\n``cello_predict -u COUNTS -s 3_PRIME -t -o test example_input/LX653_tumor.tsv``\r\n\r\nAlong with the classification results, this command will output a file ``test.model.dill``.\r\n\r\n### Running CellO with a custom model\r\n\r\nTraining a model on a new gene set needs to be done only once (see previous section); the saved model can then be reused on later runs. For example, to run CellO on ``example_input/LX653_tumor.tsv`` using a specific model stored in a file, run:\r\n\r\n``cello_predict -u COUNTS -s 3_PRIME -m test.model.dill -o test example_input/LX653_tumor.tsv``\r\n\r\nNote that ``-m test.model.dill`` tells CellO to use the model computed in the previous example.\r\n\r\n## Quantifying reads with Kallisto to match CellO's pre-trained models\r\n\r\nWe provide a command-line tool for quantifying raw reads with [Kallisto](https://pachterlab.github.io/kallisto/). 
Note that to run this script, Kallisto must be installed and available in your ``PATH`` environment variable.  This script will output an expression profile that includes all of the genes that CellO is expecting and thus, expression profiles created with this script are automatically compatible with CellO.\r\n\r\nThis script requires a preprocessed Kallisto reference.  To download the pre-built Kallisto reference that is compatible with CellO, run the command:\r\n\r\n``bash download_kallisto_reference.sh``\r\n\r\nThis command will download a directory called ``kallisto_reference`` in the current directory. To run Kallisto on a set of FASTQ files, run the command:\r\n\r\n``cello_quantify_sample \u003ccomma_delimited_fastq_files\u003e \u003ctmp_dir\u003e -o \u003ckallisto_output_file\u003e``\r\n\r\nwhere ``\u003ccomma_delimited_fastq_files\u003e`` is a comma-delimited set of FASTQ files containing all of the reads for a single RNA-seq sample and ``\u003ctmp_dir\u003e`` is the location where Kallisto will store its output files.  The file ``\u003ckallisto_output_file\u003e`` is a tab-separated-value table of the log(TPM+1) values that can be fed directly to CellO.  To run CellO on this output file, run:\r\n\r\n``cello_predict -u LOG1_TPM -s FULL_LENGTH \u003ckallisto_output_file\u003e -o \u003ccell_output_prefix\u003e``\r\n\r\nNote that the above command assumes that the assay is a full-length assay (meaning reads can originate from the full-length of the transcript).  
If this is a 3-prime assay (reads originate from only the 3'-end of the transcript), then ``-s FULL_LENGTH`` should be replaced with ``-s 3_PRIME`` in the above command.\r\n\r\n## Trouble-shooting\r\n\r\nIf upon running `pip install cello-classify` you receive an error installing Cython that looks like:\r\n\r\n```\r\nERROR: Command errored out with exit status 1:\r\n     command: /scratch/cdewey/test_cello/CellO-master/cello_env/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '\"'\"'/tmp/pip-install-wo2dj5q7/quadprog/setup.py'\"'\"'; __file__='\"'\"'/tmp/pip-install-wo2dj5q7/quadprog/setup.py'\"'\"';f=getattr(tokenize, '\"'\"'open'\"'\"', open)(__file__);code=f.read().replace('\"'\"'\\r\\n'\"'\"', '\"'\"'\\n'\"'\"');f.close();exec(compile(code, __file__, '\"'\"'exec'\"'\"'))' egg_info --egg-base pip-egg-info\r\n         cwd: /tmp/pip-install-wo2dj5q7/quadprog/\r\n    Complete output (5 lines):\r\n    Traceback (most recent call last):\r\n      File \"\u003cstring\u003e\", line 1, in \u003cmodule\u003e\r\n      File \"/tmp/pip-install-wo2dj5q7/quadprog/setup.py\", line 17, in \u003cmodule\u003e\r\n        from Cython.Build import cythonize\r\n    ModuleNotFoundError: No module named 'Cython'\r\n    ----------------------------------------\r\nERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.\r\n```\r\n\r\nthen you may try upgrading to the latest version of pip and Cython by running:\r\n\r\n```\r\npython -m pip install --upgrade pip\r\npip install --upgrade cython\r\n```\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeweylab%2Fcello","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdeweylab%2Fcello","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeweylab%2Fcello/lists"}