https://github.com/biocpy/singler
Python bindings to the SingleR algorithm
- Host: GitHub
- URL: https://github.com/biocpy/singler
- Owner: BiocPy
- License: mit
- Created: 2023-08-30T23:41:56.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-06-27T16:28:21.000Z (11 months ago)
- Last Synced: 2024-11-05T02:37:26.452Z (6 months ago)
- Language: Python
- Homepage: https://biocpy.github.io/singler/
- Size: 353 KB
- Stars: 6
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.txt
- Authors: AUTHORS.md
# Tinder for single-cell data
## Overview
This package provides Python bindings to the [C++ implementation](https://github.com/LTLA/singlepp) of the [SingleR algorithm](https://github.com/LTLA/SingleR),
originally developed by [Aran et al. (2019)](https://www.nature.com/articles/s41590-018-0276-y).
It is designed to annotate cell types by matching cells to known references based on their expression profiles.
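
The matching idea can be illustrated with a toy sketch. This is not the library's implementation: it assumes a simplified scheme in which each cell is compared to every reference profile by Spearman correlation, each label is scored by a high quantile of those correlations, and the top-scoring label wins. Marker selection and fine-tuning are omitted, and the helper names (`rank`, `spearman`, `assign_label`) and the toy data are hypothetical.

```python
import numpy as np


def rank(x):
    # Ordinal ranks (ties broken by position, which is fine for a toy example).
    return np.argsort(np.argsort(x)).astype(float)


def spearman(a, b):
    # Spearman correlation is the Pearson correlation of the ranks.
    ra, rb = rank(a), rank(b)
    ra -= ra.mean()
    rb -= rb.mean()
    return float(ra @ rb / np.sqrt((ra @ ra) * (rb @ rb)))


def assign_label(cell, ref_profiles, ref_labels, quantile=0.8):
    # Score each label by a high quantile of the cell's correlations with
    # that label's reference profiles, then pick the highest-scoring label.
    scores = {}
    for label in sorted(set(ref_labels)):
        cors = [
            spearman(cell, profile)
            for profile, lab in zip(ref_profiles, ref_labels)
            if lab == label
        ]
        scores[label] = float(np.quantile(cors, quantile))
    best = max(scores, key=scores.get)
    return best, scores


# Invented toy reference: four profiles measured over five genes.
ref_profiles = np.array([
    [9.0, 8.0, 1.0, 0.5, 2.0],  # T-cells
    [8.5, 9.0, 0.8, 1.0, 2.5],  # T-cells
    [1.0, 0.5, 9.0, 8.0, 2.0],  # B-cells
    [0.8, 1.2, 8.5, 9.5, 3.0],  # B-cells
])
ref_labels = ["T-cells", "T-cells", "B-cells", "B-cells"]

# Invented test cell whose expression pattern resembles the T-cell profiles.
cell = np.array([7.0, 6.0, 0.0, 1.0, 2.0])

best, scores = assign_label(cell, ref_profiles, ref_labels)
print(best, scores)
```

Rank-based correlation only depends on the ordering of genes within each profile, which is why a sketch like this works even when the test cell and the reference are on different scales.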
So kind of like Tinder, but for cells.

## Quick start
First, let's load the well-known PBMC 4k dataset from 10x Genomics:
```python
import singlecellexperiment as sce
data = sce.read_tenx_h5("pbmc4k-tenx.h5", realize_assays=True)
mat = data.assay("counts")
features = [str(x) for x in data.row_data["name"]]
```

Or, if you are coming from the scverse ecosystem (i.e., `AnnData`), simply read the object as a `SingleCellExperiment` and extract the matrix and the features.
Read more on [SingleCellExperiment here](https://biocpy.github.io/tutorial/chapters/experiments/single_cell_experiment.html).

```python
import singlecellexperiment as sce

# `adata` is an existing AnnData object
sce_adata = sce.SingleCellExperiment.from_anndata(adata)
# or from a h5ad file
sce_h5ad = sce.read_h5ad("tests/data/adata.h5ad")
```

Now, we fetch the Blueprint/ENCODE reference:
```python
import celldex

ref_data = celldex.fetch_reference("blueprint_encode", "2024-02-26", realize_assays=True)
```

We can annotate each cell in `mat` with the reference:
```python
import singler

results = singler.annotate_single(
    test_data=mat,
    test_features=features,
    ref_data=ref_data,
    ref_labels="label.main",
)
```

The `results` data frame contains all of the assignments and the scores for each label:
```python
results.column("best")
## ['Monocytes',
## 'Monocytes',
## 'Monocytes',
## 'CD8+ T-cells',
## 'CD4+ T-cells',
## 'CD8+ T-cells',
## 'Monocytes',
## 'Monocytes',
## 'B-cells',
## ...
## ]

results.column("scores").column("Macrophages")
## array([0.35935275, 0.40833545, 0.37430726, ..., 0.32135929, 0.29728435,
## 0.40208581])
```

## Calling low-level functions
The `annotate_single()` function is a convenient wrapper around a number of lower-level functions in **singler**.
Advanced users may prefer to build the reference and run the classification separately.
This allows us to re-use the same reference for multiple datasets without repeating the build step.

```python
built = singler.build_single_reference(
    ref_data=ref_data.assay("logcounts"),
    ref_labels=ref_data.col_data.column("label.main"),
    ref_features=ref_data.get_row_names(),
    restrict_to=features,
)
```

And finally, we apply the pre-built reference to the test dataset to obtain our label assignments.
This can be repeated with different datasets that have the same features or a superset of `features`.

```python
output = singler.classify_single_reference(
    mat,
    test_features=features,
    ref_prebuilt=built,
)

output
## BiocFrame with 4340 rows and 3 columns
##             best                                   scores                delta
##    [0] Monocytes 0.33265560369962943:0.407117403330602...  0.40706830113982534
##    [1] Monocytes 0.4078771641637374:0.4783396310685646...  0.07000418564184802
##    [2] Monocytes 0.3517036021728629:0.4076971245524348...  0.30997293412307647
##              ...                                      ...                  ...
## [4337]  NK cells 0.3472631136865701:0.3937898240670208...  0.09640242155786138
## [4338]   B-cells 0.26974632191999887:0.334862058137758... 0.061215905058676856
## [4339] Monocytes 0.39390119034537324:0.468867490667427...  0.06678168346812047
```

## Integrating labels across references
We can use annotations from multiple references through the `annotate_integrated()` function:
```python
import singler
import celldex

blueprint_ref = celldex.fetch_reference("blueprint_encode", "2024-02-26", realize_assays=True)
immune_cell_ref = celldex.fetch_reference("dice", "2024-02-26", realize_assays=True)

single_results, integrated = singler.annotate_integrated(
    mat,
    features,
    ref_data_list=(blueprint_ref, immune_cell_ref),
    ref_labels_list="label.main",
    num_threads=6,
)
```

This annotates the test dataset against each reference individually to obtain the best per-reference label,
and then compares those labels across references to find the best one overall.
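
The two-stage idea can be sketched in plain Python. This is a deliberately naive stand-in, not the library's method: it assumes each reference has already produced one `(label, score)` pair per cell and simply keeps the pair with the highest score, whereas the library's actual cross-reference comparison is more involved. The function name `integrate_labels`, the reference names, and the scores are all hypothetical.

```python
def integrate_labels(per_reference):
    """Pick, for each cell, the best label across references.

    `per_reference` maps a reference name to a list with one
    (best_label, score) tuple per cell, in the same cell order.
    """
    names = list(per_reference)
    num_cells = len(per_reference[names[0]])
    best_label, best_reference = [], []
    for i in range(num_cells):
        # Naive rule (an assumption of this sketch): keep the reference
        # whose top label scored highest for this cell.
        winner = max(names, key=lambda name: per_reference[name][i][1])
        best_reference.append(winner)
        best_label.append(per_reference[winner][i][0])
    return best_label, best_reference


# Invented per-reference results for two cells.
per_reference = {
    "ref_a": [("Monocytes", 0.62), ("CD8+ T-cells", 0.41)],
    "ref_b": [("Monocytes", 0.55), ("NK cells", 0.47)],
}

best_label, best_reference = integrate_labels(per_reference)
print(best_label)
print(best_reference)
```

Keeping both the per-reference results and the integrated choice makes it easy to see which reference drove each final label.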
Both the single and integrated annotations are reported for diagnostics.

```python
integrated.column("best_label")
## ['Monocytes',
## 'Monocytes',
## 'Monocytes',
## 'CD8+ T-cells',
## 'CD4+ T-cells',
## 'CD8+ T-cells',
## 'Monocytes',
## 'Monocytes',
## ...
## ]

integrated.column("best_reference")
## ['Blueprint',
## 'Blueprint',
## 'Blueprint',
## 'Blueprint',
## 'Blueprint',
## 'Blueprint',
## 'Blueprint',
## ...
## ]
```

## Developer notes
Build the shared object file:
```shell
python setup.py build_ext --inplace
```

For quick testing:
```shell
pytest
```

For more complex testing:
```shell
python setup.py build_ext --inplace && tox
```

To rebuild the **ctypes** bindings with [**cpptypes**](https://github.com/BiocPy/ctypes-wrapper):
```shell
cpptypes src/singler/lib --py src/singler/_cpphelpers.py --cpp src/singler/lib/bindings.cpp --dll _core
```