https://github.com/BayraktarLab/cell2location

Comprehensive mapping of tissue cell architecture via integrated single cell and spatial transcriptomics (cell2location model)
Host: GitHub
URL: https://github.com/BayraktarLab/cell2location
Owner: BayraktarLab
License: apache-2.0
Created: 2020-05-10T23:38:37.000Z (about 4 years ago)
Default Branch: master
Last Pushed: 2024-04-29T22:25:30.000Z (2 months ago)
Last Synced: 2024-05-22T07:49:36.480Z (about 1 month ago)
Language: Python
Homepage: https://cell2location.readthedocs.io/en/latest/
Size: 47.4 MB
Stars: 281
Watchers: 7
Forks: 54
Open Issues: 92
Metadata Files:
- Readme: README.md
- License: LICENSE

### Comprehensive mapping of tissue cell architecture via integrated single cell and spatial transcriptomics (cell2location model)

[![Stars](https://img.shields.io/github/stars/BayraktarLab/cell2location?logo=GitHub&color=yellow)](https://github.com/BayraktarLab/cell2location/stargazers)

![Build Status](https://github.com/BayraktarLab/cell2location/actions/workflows/test.yml/badge.svg?event=push)

[![Documentation Status](https://readthedocs.org/projects/cell2location/badge/?version=latest)](https://cell2location.readthedocs.io/en/stable/?badge=latest)

[![Downloads](https://pepy.tech/badge/cell2location)](https://pepy.tech/project/cell2location)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/BayraktarLab/cell2location/blob/master/docs/notebooks/cell2location_tutorial.ipynb)

[![Docker image on quay.io](https://img.shields.io/badge/container-quay.io/vitkl/cell2location-brightgreen "Docker image on quay.io")](https://quay.io/vitkl/cell2location) 

If you use cell2location, please cite our paper:

Kleshchevnikov, V., Shmatko, A., Dann, E. et al. Cell2location maps fine-grained cell types in spatial transcriptomics. Nat Biotechnol (2022). https://doi.org/10.1038/s41587-021-01139-4

https://www.nature.com/articles/s41587-021-01139-4

Please note that cell2location requires two user-provided hyperparameters, `N_cells_per_location` and `detection_alpha`. For detailed guidance on setting these hyperparameters and on their impact, see [the flow diagram and the note](https://github.com/BayraktarLab/cell2location/blob/master/docs/images/Note_on_selecting_hyperparameters.pdf). Many real datasets (especially human) show within-slide variability in RNA detection sensitivity, so you should try both recommended settings of the `detection_alpha` parameter: `detection_alpha=200` for low within-slide technical variability and `detection_alpha=20` for high within-slide technical variability.
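
A minimal sketch of how one might organise the two runs: the hypothetical helper `recommended_settings()` returns one keyword-argument dictionary per recommended `detection_alpha` value, each paired with the same `N_cells_per_location` estimate (30 below is only a placeholder). The commented constructor call assumes the tutorial-style variable names `adata_vis` and `inf_aver`; check the documentation for the current signature.

```python
def recommended_settings(n_cells_per_location):
    """Return one keyword-argument dict per recommended detection_alpha value (hypothetical helper)."""
    if n_cells_per_location <= 0:
        raise ValueError("N_cells_per_location must be positive")
    return [
        # low within-slide technical variability in RNA detection sensitivity
        {"N_cells_per_location": n_cells_per_location, "detection_alpha": 200},
        # high within-slide technical variability (common in human data)
        {"N_cells_per_location": n_cells_per_location, "detection_alpha": 20},
    ]


for params in recommended_settings(30):  # 30 is a placeholder estimate
    print(params)
    # Assumed usage, with adata_vis and inf_aver as placeholder names:
    # mod = cell2location.models.Cell2location(adata_vis, cell_state_df=inf_aver, **params)
```

Keeping both configurations in one list makes it straightforward to fit both models and compare the resulting maps, as recommended above.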

Cell2location is a principled Bayesian model that can resolve fine-grained cell types in spatial transcriptomic data and create comprehensive cellular maps of diverse tissues. Cell2location accounts for technical sources of variation and borrows statistical strength across locations, thereby enabling the integration of single cell and spatial transcriptomics with higher sensitivity and resolution than existing tools. This is achieved by estimating which combination of cell types, at which cell abundances, could have generated the mRNA counts observed in the spatial data, while modelling technical effects (platform/technology effect, contaminating RNA, unexplained variance).

Overview of the spatial mapping approach and the workflow enabled by cell2location. From left to right: Single-cell RNA-seq and spatial transcriptomics profiles are generated from the same tissue (1). Cell2location takes scRNA-seq derived cell type reference signatures and spatial transcriptomics data as input (2, 3). The model then decomposes spatially resolved multi-cell RNA counts matrices into the reference signatures, thereby establishing a spatial mapping of cell types (4).    
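
The decomposition in step (4) can be illustrated with a toy, pure-Python sketch of the expected-count structure described in the paper cited above, as we understand it: the expected count at location `s` for gene `j` is `(m[j] * sum_f w[s][f] * g[f][j] + s_add[j]) * y[s]`, where `w` holds cell abundances, `g` the reference signatures, `m` a per-gene technology sensitivity, `s_add` an additive background such as contaminating RNA, and `y` a per-location detection sensitivity (the quantity whose within-slide variability `detection_alpha` regularises). The negative binomial noise, all priors and the inference itself are omitted, and the function and argument names are ours, not the package's API.

```python
def expected_counts(w, g, m, s_add, y):
    """Toy expected counts mu[s][j] = (m[j] * sum_f w[s][f] * g[f][j] + s_add[j]) * y[s].

    w     : cell abundance, one row per location, one column per cell type
    g     : reference signatures, one row per cell type, one column per gene
    m     : per-gene technology sensitivity
    s_add : per-gene additive background (e.g. contaminating RNA)
    y     : per-location RNA detection sensitivity
    """
    n_types = len(g)
    return [
        [
            (m[j] * sum(w_s[f] * g[f][j] for f in range(n_types)) + s_add[j]) * y_s
            for j in range(len(m))
        ]
        for w_s, y_s in zip(w, y)
    ]


# Two locations, two cell types, two genes (made-up numbers)
w = [[2, 0], [1, 3]]
g = [[1.0, 0.5], [0.0, 2.0]]
mu = expected_counts(w, g, m=[1.0, 1.0], s_add=[0.1, 0.1], y=[1.0, 0.5])
print(mu)  # approximately [[2.1, 1.1], [0.55, 3.3]]
```

Fitting the model runs this logic in reverse: given observed counts and fixed signatures `g`, cell2location infers `w` together with the technical terms.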

## Usage and Tutorials

The tutorial covering the estimation of reference cell type expression signatures, spatial mapping with cell2location and the downstream analysis is available at https://cell2location.readthedocs.io/en/latest/ and can be tried on [Google Colab](https://colab.research.google.com/github/BayraktarLab/cell2location/blob/master/docs/notebooks/cell2location_tutorial.ipynb).

Please report bugs via https://github.com/BayraktarLab/cell2location/issues and ask any usage questions about [cell2location](https://discourse.scverse.org/c/ecosytem/cell2location/42), [scvi-tools](https://discourse.scverse.org/c/help/scvi-tools/7) or [Visium data](https://discourse.scverse.org/c/general/visium/32) on the scverse community Discourse.

The cell2location package is implemented in a general way (using https://pyro.ai/ and https://scvi-tools.org/) to support multiple related models: for spatial mapping, for estimating reference cell type signatures and for downstream analysis.

## Installation

We suggest using a separate conda environment for installing cell2location.

Create a conda environment and install the `cell2location` package:

```bash
conda create -y -n cell2loc_env python=3.9
conda activate cell2loc_env
pip install cell2location[tutorials]
```
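
To confirm that the package was installed into the new environment (and not picked up from elsewhere on the system), a small standard-library check can be run with the environment's `python`. `importlib.metadata.version()` raises `PackageNotFoundError` when a distribution is not installed; the helper name `installed_version` is our own.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("interpreter:", sys.executable)  # should point inside the cell2loc_env environment
print("cell2location:", installed_version("cell2location"))  # None means it is missing here
```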

Finally, to use this environment in Jupyter notebooks, add a Jupyter kernel for it:

```bash
conda activate cell2loc_env
python -m ipykernel install --user --name=cell2loc_env --display-name='Environment (cell2loc_env)'
```

If you do not have conda, install Miniconda first:

```bash
cd /path/to/software
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# when the installer asks for the install location, use /path/to/software/miniconda3
```

Before installing cell2location and its dependencies, you may need to make sure that you are creating a fully isolated conda environment by telling Python NOT to use the user site-packages directory. Run this line before creating the conda environment, and every time before activating the environment in a new terminal session:

```bash
export PYTHONNOUSERSITE="literallyanyletters"
```
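
To see the effect, the sketch below starts a fresh interpreter with and without the variable and prints `site.ENABLE_USER_SITE`, the standard-library flag that controls whether the user site-packages directory (e.g. `~/.local/lib/python3.9/site-packages` on Linux) is added to `sys.path`. Python only checks that `PYTHONNOUSERSITE` is set to a non-empty value, which is why any letters work. The helper name `user_site_enabled` is our own.

```python
import os
import subprocess
import sys


def user_site_enabled(no_user_site):
    """Ask a fresh interpreter whether it enables the user site-packages directory."""
    env = dict(os.environ)
    if no_user_site:
        env["PYTHONNOUSERSITE"] = "literallyanyletters"  # any non-empty value works
    else:
        env.pop("PYTHONNOUSERSITE", None)
    result = subprocess.run(
        [sys.executable, "-c", "import site; print(site.ENABLE_USER_SITE)"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


print("without PYTHONNOUSERSITE:", user_site_enabled(False))  # often 'True' outside virtual envs
print("with PYTHONNOUSERSITE:   ", user_site_enabled(True))   # 'False'
```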

## Documentation and API details

User documentation is available at https://cell2location.readthedocs.io/en/latest/.

The cell2location architecture is designed to simplify building extended versions of the model that account for additional technical and biological information. We plan to provide a tutorial showing how to add new model classes; in the meantime, please get in touch if you would like to contribute or build on top of our package.

## Acknowledgements 

We thank all paper authors for their contributions:

Vitalii Kleshchevnikov, Artem Shmatko, Emma Dann, Alexander Aivazidis, Hamish W King, Tong Li, Artem Lomakin, Veronika Kedlian, Mika Sarkin Jain, Jun Sung Park, Lauma Ramona, Liz Tuck, Anna Arutyunyan, Roser Vento-Tormo, Moritz Gerstung, Louisa James, Oliver Stegle, Omer Ali Bayraktar

We also thank Pyro developers (Fritz Obermeyer, Martin Jankowiak), Krzysztof Polanski, Luz Garcia Alonso, Carlos Talavera-Lopez, Ni Huang for feedback on the package, Martin Prete for dockerising cell2location and other software support.

## FAQ

See https://github.com/BayraktarLab/cell2location/discussions

## Future development and experimental features

Future developments of cell2location focus on 1) scalability to between 100k and millions of locations using amortised inference of cell abundance (the same idea as used in VAEs), 2) extending cell2location to related spatial analysis tasks that require modification of the model (such as using cell type hierarchy information), and 3) incorporating features presented by more recently proposed methods (such as CAR spatial proximity modelling). We are also experimenting with NumPyro and JAX (https://github.com/vitkl/cell2location_numpyro).

## Tips

### Conda environment for A100 GPUs

```bash
export PYTHONNOUSERSITE="literallyanyletters"
conda create -y -n test_scvi16_cuda113 python=3.9
conda activate test_scvi16_cuda113
conda install -y -c anaconda hdf5 pytables git
pip install scvi-tools
pip install git+https://github.com/BayraktarLab/cell2location.git#egg=cell2location[tutorials]
# install PyTorch builds compiled for CUDA 11.3 (needed for A100 GPUs)
pip3 install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 -f https://download.pytorch.org/whl/torch_stable.html
conda activate test_scvi16_cuda113
python -m ipykernel install --user --name=test_scvi16_cuda113 --display-name='Environment (test_scvi16_cuda113)'
```
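
A quick way to confirm that the CUDA 11.3 wheel replaced any CPU-only build is to look at the local version tag of `torch.__version__` (e.g. `1.11.0+cu113`). The helper `cuda_build_tag()` below is a hypothetical, dependency-free sketch of that check; the commented lines show how it might be used inside the environment, assuming `torch` imports successfully there.

```python
def cuda_build_tag(torch_version):
    """Return the local build tag of a PyTorch version string ('cu113' in '1.11.0+cu113'), or None."""
    _, sep, local = torch_version.partition("+")
    return local if sep else None


# Inside the environment (assumes torch is installed):
#   import torch
#   cuda_build_tag(torch.__version__)  # expected 'cu113' for the wheel pinned above
#   torch.cuda.is_available()          # expected True when a GPU and a compatible driver are visible
print(cuda_build_tag("1.11.0+cu113"))  # cu113
print(cuda_build_tag("1.11.0"))        # None
```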

### Package version mismatches often originate from the Python user site, rather than the conda environment, being used to install some of the packages

Before installing cell2location and its dependencies, you may need to make sure that you are creating a fully isolated conda environment by telling Python NOT to use the user site-packages directory. Run this line before creating the conda environment, and every time before activating the environment in a new terminal session:

```bash
export PYTHONNOUSERSITE="literallyanyletters"
```

### Useful code for reading and combining multiple Visium sections

Keep information about the individual sections in a CSV file (for example, exported from a Google Sheet), with one row per section and a `Sample_ID` column:

```python
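import os
from glob import glob
from re import sub

import pandas as pd

# Folder with one Space Ranger output directory per section
# (placeholder path; keep the trailing slash so the glob pattern lists its contents)
sp_data_folder = './data/visium/'

sample_annot = pd.read_csv('./sample_annot.csv')

# Map each output folder to its Sample_ID by stripping the '..._WTSI_' prefix
# and the '_GRCh38-2020-A' suffix from the folder path
folders = glob(f'{sp_data_folder}*')
folder_by_sample = pd.Series(
    folders,
    index=[sub('^.+WTSI_', '', sub('_GRCh38-2020-A$', '', f)) for f in folders],
)
sample_annot['path'] = folder_by_sample[sample_annot['Sample_ID'].astype(str)].values
sample_annot['file'] = [os.path.basename(p) for p in sample_annot['path']]

sample_annot['Sample_ID'].unique()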
```
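
The folder-to-sample mapping above relies on the folder naming convention of this particular dataset. The hypothetical helper below isolates that regular-expression step so it can be checked on a single, made-up folder name before pointing it at real data; note that the greedy `^.+WTSI_` pattern removes everything up to the last `WTSI_`, so a Sample_ID that itself contains `WTSI_` would be truncated.

```python
from re import sub


def sample_id_from_folder(folder):
    """Strip the assumed '..._WTSI_' prefix and '_GRCh38-2020-A' suffix from a folder path."""
    return sub('^.+WTSI_', '', sub('_GRCh38-2020-A$', '', folder))


# Made-up folder name following the pattern the code above expects
print(sample_id_from_folder('/data/visium/SpaceRanger_WTSI_Sample42_GRCh38-2020-A'))  # Sample42
```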

Reading and concatenating samples.

```python
import os

import numpy as np
import scanpy as sc


def read_and_qc(sample_name, file, path, count_file='filtered_feature_bc_matrix.h5'):
    """
    Read one Visium file and add minimum metadata and QC metrics to adata.obs

    NOTE: var_names is ENSEMBL ID as it should be, you can always plot with sc.pl.scatter(gene_symbols='SYMBOL')
    """
    adata = sc.read_visium(os.path.join(path, str(file)),
                           count_file=count_file,
                           load_images=True)
    adata.obs['sample'] = str(sample_name)
    adata.var['SYMBOL'] = adata.var_names
    adata.var.rename(columns={'gene_ids': 'ENSEMBL'}, inplace=True)
    adata.var_names = adata.var['ENSEMBL']
    adata.var.drop(columns='ENSEMBL', inplace=True)

    # just in case there are non-unique ENSEMBL IDs
    adata.var_names_make_unique()

    # Calculate QC metrics
    sc.pp.calculate_qc_metrics(adata, inplace=True)
    # mitochondrial genes: 'MT-' prefix in human, 'mt-' in mouse
    adata.var['mt'] = [gene.upper().startswith('MT-') for gene in adata.var['SYMBOL']]
    mt_counts = adata[:, adata.var['mt'].values].X.sum(1)
    adata.obs['mt_frac'] = np.asarray(mt_counts).ravel() / adata.obs['total_counts']

    # add sample name to obs names so spot IDs stay unique after concatenation
    adata.obs_names = 's' + adata.obs['sample'] + '_' + adata.obs_names
    adata.obs.index.name = 'spot_id'

    # re-key the spatial metadata (images, scale factors) by sample name
    library_id = list(adata.uns['spatial'].keys())[0]
    adata.uns['spatial'] = {str(sample_name): adata.uns['spatial'][library_id]}
    print(adata.uns['spatial'].keys())

    return adata


def read_all_and_qc(
    sample_annot, Sample_ID_col, file_col, sp_data_folder,
    count_file='filtered_feature_bc_matrix.h5',
):
    """
    Read and concatenate all Visium files.
    """
    # string-typed copy: the caller's table stays unchanged between calls
    sample_annot = sample_annot.astype(str)
    # read every sample, pairing each Sample_ID with its own file
    slides = [
        read_and_qc(s, f, path=sp_data_folder, count_file=count_file)
        for s, f in zip(sample_annot[Sample_ID_col], sample_annot[file_col])
    ]
    # combine individual samples
    adata = slides[0].concatenate(
        slides[1:],
        batch_key="sample",
        uns_merge="unique",
        batch_categories=sample_annot[Sample_ID_col].tolist(),
        index_unique=None,
    )
    # add every column of the annotation table to the spots of the matching sample
    sample_annot.index = sample_annot[Sample_ID_col]
    adata.obs[sample_annot.columns] = sample_annot.reindex(index=adata.obs['sample']).values

    return adata


adata = read_all_and_qc(
    sample_annot=sample_annot,
    Sample_ID_col='Sample_ID',
    file_col='file',
    sp_data_folder=sp_data_folder,
    count_file='filtered_feature_bc_matrix.h5',
)
# the same sections read from the raw matrix, i.e. including spots outside the tissue
adata_incl_nontissue = read_all_and_qc(
    sample_annot=sample_annot,
    Sample_ID_col='Sample_ID',
    file_col='file',
    sp_data_folder=sp_data_folder,
    count_file='raw_feature_bc_matrix.h5',
)
```

Since anndata version 0.9.0 (released on 2023-04-11), `AnnData.concatenate()` has been deprecated in favour of `anndata.concat()`, according to the official release notes ([Reference](https://anndata.readthedocs.io/en/latest/release-notes/index.html#id4)). Here is `read_all_and_qc` updated accordingly (it reuses `read_and_qc` from above):

```python
import cell2location
from anndata import concat


def read_all_and_qc(
    sample_annot, Sample_ID_col, file_col, sp_data_folder,
    count_file='filtered_feature_bc_matrix.h5',
):
    """
    Read and concatenate all Visium files.
    """
    # string-typed copy: the caller's table stays unchanged between calls
    sample_annot = sample_annot.astype(str)
    # read all samples and store them in a list
    adatas = [
        read_and_qc(s, f, path=sp_data_folder, count_file=count_file)
        for s, f in zip(sample_annot[Sample_ID_col], sample_annot[file_col])
    ]
    # combine individual samples; label="batch" stores each spot's Sample_ID in adata.obs['batch']
    adata = concat(
        adatas,
        merge="unique",
        uns_merge="unique",
        label="batch",
        keys=sample_annot[Sample_ID_col].tolist(),
        index_unique=None,
    )
    # add every column of the annotation table to the spots of the matching sample
    sample_annot.index = sample_annot[Sample_ID_col]
    adata.obs[sample_annot.columns] = sample_annot.reindex(index=adata.obs['sample']).values

    return adata


adata = read_all_and_qc(
    sample_annot=sample_annot,
    Sample_ID_col='Sample_ID',
    file_col='file',
    sp_data_folder=sp_data_folder,
    count_file='filtered_feature_bc_matrix.h5',
)

cell2location.models.Cell2location.setup_anndata(
    adata=adata,
    batch_key="batch",
)
```
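
Both versions of `read_all_and_qc` pass `index_unique=None`, which is only safe because `read_and_qc` prefixes every spot barcode with `s<sample>_`. Slides of the same Visium product share one list of spot barcodes, so raw barcodes collide as soon as two sections are combined. The dependency-free sketch below mirrors that renaming on two illustrative barcodes; `prefixed_spot_ids` is a hypothetical helper, not part of any package.

```python
def prefixed_spot_ids(sample_name, barcodes):
    """Rename spots the way read_and_qc() does: 's' + sample name + '_' + barcode."""
    return ['s' + str(sample_name) + '_' + b for b in barcodes]


# Illustrative barcodes, reused on every section
barcodes = ['AAACAAGTATCTCCCA-1', 'AAACACCAATAACTGC-1']
raw = barcodes + barcodes
prefixed = prefixed_spot_ids('A1', barcodes) + prefixed_spot_ids('B1', barcodes)
print(len(raw), len(set(raw)))            # 4 2  (duplicate spot names)
print(len(prefixed), len(set(prefixed)))  # 4 4  (all spot names unique)
```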