https://github.com/immunogenomics/starcat

Implements *CellAnnotator (aka *CAT/starCAT), annotating scRNA-Seq with predefined gene expression programs

Host: GitHub
URL: https://github.com/immunogenomics/starcat
Owner: immunogenomics
License: mit
Created: 2024-05-01T12:18:57.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2025-05-30T19:40:24.000Z (about 1 year ago)
Last Synced: 2025-06-02T01:36:13.205Z (about 1 year ago)
Language: Jupyter Notebook
Homepage:
Size: 58.5 MB
Stars: 23
Watchers: 12
Forks: 3
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

## starCAT
Implements *CellAnnotator (aka *CAT/starCAT), annotating scRNA-Seq with predefined gene expression programs

## Citation

If you use *CAT, please cite our [preprint](https://doi.org/10.1101/2024.05.03.592310).

## Installation

You can install starCAT and its dependencies via the Python Package Index.
```bash
pip install starcatpy
```

starCAT was tested with scikit-learn 1.3.2, AnnData 0.9.2, and Python 3.8. To run the tutorials, you also need Jupyter or JupyterLab, as well as scanpy and cnmf:

```bash
pip install jupyterlab scanpy cnmf
```
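
To compare your environment against the tested versions listed above, a minimal sketch using only the Python standard library is shown below. The helper name `installed_versions` is hypothetical and not part of starCAT; the distribution names are assumed to match those on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions the README reports testing against (distribution names assumed from PyPI).
TESTED = {"scikit-learn": "1.3.2", "anndata": "0.9.2"}


def installed_versions(packages):
    """Return {package: installed version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Report starcatpy itself plus the dependencies with a known tested version.
report = installed_versions(["starcatpy", *TESTED])
for name, installed in report.items():
    print(f"{name}: installed={installed}, tested={TESTED.get(name, 'n/a')}")
```

A value of `None` means the package is not installed in the active environment.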

## Basic starCAT usage
Please see our tutorials in [Python](Examples/starCAT_vignette.ipynb) and [R](Examples/starCAT_vignette_R.ipynb). A sample pipeline using a pre-built reference (TCAT.V1) is shown below.

```python
from starcat import starCAT

# Load the default TCAT reference from the starCAT database
tcat = starCAT(reference='TCAT.V1')

# The reference is a GEP x gene matrix:
# tcat.ref.iloc[:5, :5]
#
#                     A1BG       AARD     AARSD1      ABCA1     ABCB1
# CellCycle-G2M   2.032614  22.965553  17.423538   3.478179  2.297279
# Translation    35.445282   0.000000   9.245893   0.477994  0.000000
# HLA            18.192997  14.632670   2.686475   3.937182  0.000000
# ISG             0.436212   0.000000  18.078197  17.354506  0.000000
# Mito           10.293049   0.000000  52.669895  14.615502  3.341488

# Load the cell x gene counts data (datafn is the path to your counts file)
adata = tcat.load_counts(datafn)

# Run starCAT
usage, scores = tcat.fit_transform(adata)

usage.iloc[0:2, 0:4]
#                             CellCycle-G2M  Translation       HLA       ISG
# CATGCCTAGTCGATAA-1-gPlexA4       0.000039     0.001042  0.001223  0.000162
# AAGACCTGTAGCGTCC-1-gPlexC6       0.000246     0.100023  0.002991  0.042354

scores.iloc[0:2, :]
#                                  ASA  Proliferation  ASA_binary  Proliferation_binary  Multinomial_Label
# CATGCCTAGTCGATAA-1-gPlexA4  0.001556        0.00052       False                 False          CD8_TEMRA
# AAGACCTGTAGCGTCC-1-gPlexC6  0.012503        0.01191       False                 False          CD4_Naive
```
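
Since `usage` is displayed as a cells x programs table, one simple follow-up is to find the highest-usage program for each cell. The sketch below is illustrative only: it assumes `usage` is a pandas DataFrame and rebuilds a toy table from the four columns and two cells printed above, not the full starCAT output.

```python
import pandas as pd

# Toy stand-in for `usage`, copied from the excerpt above (only 4 programs shown).
usage = pd.DataFrame(
    {
        "CellCycle-G2M": [0.000039, 0.000246],
        "Translation": [0.001042, 0.100023],
        "HLA": [0.001223, 0.002991],
        "ISG": [0.000162, 0.042354],
    },
    index=["CATGCCTAGTCGATAA-1-gPlexA4", "AAGACCTGTAGCGTCC-1-gPlexC6"],
)

# For each cell (row), record the program with the largest usage and its value.
summary = pd.DataFrame(
    {
        "top_program": usage.idxmax(axis=1),
        "top_usage": usage.max(axis=1),
    }
)
print(summary)
```

With real output, the same two calls apply to the full `usage` matrix returned by `fit_transform`; the top program among these four columns is not necessarily the top program overall.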

starCAT can also be run from the command line.
```bash
starcat --reference "TCAT.V1" --counts {counts_fn} --output-dir {output_dir} --name {output_name}
```
* --reference - name of a default reference to download (e.g. TCAT.V1) OR filepath to a reference set of GEPs by genes (*.tsv/.csv/.txt); default is 'TCAT.V1'
* --counts - filepath to the input (cell x gene) counts matrix as a Matrix Market file (.mtx.gz), tab-delimited text file, or AnnData file (.h5ad)
* --scores - optional path to a YAML file for calculating score add-ons; not necessary for pre-built references
* --output-dir - the output directory. All output will be placed in {output-dir}/{name}... The default directory is '.'
* --name - the output analysis prefix name, default is 'starCAT'
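
When scripting several runs, it can help to assemble the command from Python. The sketch below uses only the flags documented in the list above; the helper `build_starcat_command` and the file names `counts.h5ad`, `results`, and `my_run` are hypothetical placeholders, not part of starCAT.

```python
import shlex


def build_starcat_command(counts_fn, output_dir=".", name="starCAT",
                          reference="TCAT.V1", scores_fn=None):
    """Assemble the starcat CLI call as an argument list (hypothetical helper)."""
    cmd = [
        "starcat",
        "--reference", reference,
        "--counts", counts_fn,
        "--output-dir", output_dir,
        "--name", name,
    ]
    # --scores is optional and not needed for pre-built references.
    if scores_fn is not None:
        cmd += ["--scores", scores_fn]
    return cmd


cmd = build_starcat_command("counts.h5ad", output_dir="results", name="my_run")
print(shlex.join(cmd))
# prints: starcat --reference TCAT.V1 --counts counts.h5ad --output-dir results --name my_run
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it, provided the `starcat` executable is on your `PATH`.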

For code to reproduce the figures and analyses from our manuscript, please refer to the [TCAT analysis](https://github.com/immunogenomics/TCAT_analysis) GitHub repository.

## Alternate implementation
For small datasets (fewer than ~50,000 cells or smaller than 700 MB), you can run starCAT on our [website](https://immunogenomics.io/starcat/) without installing any packages.

## Creating your own reference

We provide example scripts for constructing custom starCAT references from [a single cNMF run](./Examples/build_reference_vignette.ipynb) or [multiple cNMF runs](./Examples/build_multidataset_reference_vignette.ipynb).

__Please let us know if you are interested in making your reference publicly available for others to use, analogous to our TCAT.V1 reference. You can email me at dkotliar@broadinstitute.org__
