{"id":28436002,"url":"https://github.com/immunogenomics/starcat","last_synced_at":"2025-06-28T01:31:58.879Z","repository":{"id":237779051,"uuid":"794540838","full_name":"immunogenomics/starCAT","owner":"immunogenomics","description":"Implements *CellAnnotator (aka *CAT/starCAT), annotating scRNA-Seq with predefined gene expression programs","archived":false,"fork":false,"pushed_at":"2025-05-30T19:40:24.000Z","size":61385,"stargazers_count":23,"open_issues_count":1,"forks_count":3,"subscribers_count":12,"default_branch":"main","last_synced_at":"2025-06-02T01:36:13.205Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/immunogenomics.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-01T12:18:57.000Z","updated_at":"2025-05-30T19:40:27.000Z","dependencies_parsed_at":"2024-08-26T22:02:10.791Z","dependency_job_id":"e7767e94-d3b8-45a5-b724-29d3b08dc6ca","html_url":"https://github.com/immunogenomics/starCAT","commit_stats":null,"previous_names":["immunogenomics/starcat"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/immunogenomics/starCAT","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/immunogenomics%2FstarCAT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/immunogenomics%2FstarCAT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/immunogenomics%2FstarCAT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/immunogenomics%2FstarCAT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/immunogenomics","download_url":"https://codeload.github.com/immunogenomics/starCAT/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/immunogenomics%2FstarCAT/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262361289,"owners_count":23299070,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-05T21:09:40.213Z","updated_at":"2025-06-28T01:31:58.874Z","avatar_url":"https://github.com/immunogenomics.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"## starCAT \u003cimg src=\"https://drive.google.com/uc?export=view\u0026id=1W1in9vldkKdNe6ncwsHD6L6MSvfcKV6M\" width=\"130px\" align=\"right\" /\u003e\nImplements *CellAnnotator (aka *CAT/starCAT), annotating scRNA-Seq with predefined gene expression programs\n\u003cbr\u003e\n\n## Citation\n\nIf you use *CAT, please cite our [preprint](https://doi.org/10.1101/2024.05.03.592310).\n\n## Installation\n\nYou can install starCAT and its dependencies via the Python Package Index.\n```bash\npip install starcatpy\n```\n\nWe tested it with scikit-learn 1.3.2, AnnData 0.9.2, and python 3.8. To run the tutorials, you also need jupyter or jupyterlab as well as scanpy and cnmf:\n\n```bash\npip install jupyterlab scanpy cnmf\n```\n\n\n## Basic starCAT usage\nPlease see our tutorials in [python](Examples/starCAT_vignette.ipynb) and [R](Examples/starCAT_vignette_R.ipynb). 
A sample pipeline using a pre-built reference (TCAT.V1) is shown below. \n\n```python\n# Import the starCAT class (the PyPI package starcatpy installs the starcat module)\nfrom starcat import starCAT\n\n# Load default TCAT reference from starCAT database\ntcat = starCAT(reference='TCAT.V1')\n\n# tcat.ref.iloc[:5, :5]\n\n#                     A1BG       AARD     AARSD1      ABCA1     ABCB1\n# CellCycle-G2M   2.032614  22.965553  17.423538   3.478179  2.297279\n# Translation    35.445282   0.000000   9.245893   0.477994  0.000000\n# HLA            18.192997  14.632670   2.686475   3.937182  0.000000\n# ISG             0.436212   0.000000  18.078197  17.354506  0.000000\n# Mito           10.293049   0.000000  52.669895  14.615502  3.341488\n\n# Load cell x gene counts data\n# (datafn is the path to your counts file: .h5ad, .mtx.gz, or tab-delimited text)\nadata = tcat.load_counts(datafn)\n\n# Run starCAT\nusage, scores = tcat.fit_transform(adata)\n\nusage.iloc[0:2, 0:4]\n#                             CellCycle-G2M  Translation       HLA       ISG\n# CATGCCTAGTCGATAA-1-gPlexA4       0.000039     0.001042  0.001223  0.000162\n# AAGACCTGTAGCGTCC-1-gPlexC6       0.000246     0.100023  0.002991  0.042354\n\nscores.iloc[0:2, :]\n#                                  ASA  Proliferation  ASA_binary  \\\n# CATGCCTAGTCGATAA-1-gPlexA4  0.001556        0.00052       False   \n# AAGACCTGTAGCGTCC-1-gPlexC6  0.012503        0.01191       False   \n\n#                             Proliferation_binary Multinomial_Label  \n# CATGCCTAGTCGATAA-1-gPlexA4                 False         CD8_TEMRA  \n# AAGACCTGTAGCGTCC-1-gPlexC6                 False         CD4_Naive  \n\n\n```\n\n\nstarCAT can also be run from the command line.\n```bash\nstarcat --reference \"TCAT.V1\" --counts {counts_fn} --output-dir {output_dir} --name {output_name}\n```\n* --reference - name of a default reference to download (e.g. 
TCAT.V1) OR filepath containing a reference set of GEPs by genes (*.tsv/.csv/.txt), default is 'TCAT.V1'\n* --counts - filepath to input (cell x gene) counts matrix as a Matrix Market file (.mtx.gz), tab-delimited text file, or AnnData file (.h5ad)\n* --scores - optional path to a YAML file for calculating score add-ons, not necessary for pre-built references\n* --output-dir - the output directory. All output will be placed in {output-dir}/{name}... The default directory is '.'\n* --name - the output analysis prefix name, default is 'starCAT'\n\n\nFor code to reproduce figures and analyses from our manuscript, please refer to the [TCAT analysis](https://github.com/immunogenomics/TCAT_analysis) GitHub repository.\n\n\n## Alternate implementation\nFor small datasets (smaller than ~50,000 cells or 700 MB), try running starCAT without installing any packages on our [website](https://immunogenomics.io/starcat/).\n\n## Creating your own reference\n\nWe provide example scripts for constructing custom starCAT references from [a single cNMF run](./Examples/build_reference_vignette.ipynb) or [multiple cNMF runs](./Examples/build_multidataset_reference_vignette.ipynb). \n\n__Please let us know if you are interested in making your reference publicly available for others to use, analogous to our TCAT.V1 reference. You can email me at dkotliar@broadinstitute.org__\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimmunogenomics%2Fstarcat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fimmunogenomics%2Fstarcat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimmunogenomics%2Fstarcat/lists"}