https://github.com/morgan-feuz/cellbender

CellBender workflow for usage with scRNA-seq data

Host: GitHub
URL: https://github.com/morgan-feuz/cellbender
Owner: Morgan-Feuz
Created: 2024-08-26T16:40:11.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2024-08-26T17:46:43.000Z (about 1 year ago)
Last Synced: 2024-12-27T05:42:03.755Z (9 months ago)
Topics: python, single-cell-rna-seq


# Guide for Ambient RNA Removal from Single-Cell RNA-seq Data with CellBender

[CellBender](https://cellbender.readthedocs.io/en/latest/introduction/index.html) is a Python package for removing technical artifacts (such as ambient/background RNA) from single-cell omics data. In general, CellBender takes raw count matrices and molecule-level information (generated by alignment/mapping tools such as [Cell Ranger](https://www.10xgenomics.com/support/software/cell-ranger/latest), [Alevin](https://salmon.readthedocs.io/en/latest/alevin.html), [STARsolo](https://github.com/alexdobin/STAR/blob/master/README.md), etc.), then estimates and removes systematic biases and background noise to improve estimates of gene expression. The output of the main `remove-background` command includes a new `.h5` count matrix with background RNA removed, which can be used directly in downstream analysis (e.g., with [Seurat](https://satijalab.org/seurat/) or [Scanpy](https://scanpy.readthedocs.io/en/stable/)).

Additional Useful References:
+ https://cellbender.readthedocs.io/en/latest/
+ https://www.10xgenomics.com/analysis-guides/background-removal-guidance-for-single-cell-gene-expression-datasets-using-third-party-tools
+ https://cdn.10xgenomics.com/image/upload/v1660261286/support-documents/CG000329_TechnicalNote_InterpretingCellRangerWebSummaryFiles_RevA.pdf
+ https://www.10xgenomics.com/analysis-guides/introduction-to-ambient-rna-correction

-----------------------------------------------------------------------------------------------------------------------------------------------------------------
### Linux Installation

A few notes before installing:
+ CellBender runs much faster on real datasets (roughly 30 minutes with a GPU compared to 30+ hours without one, in my experience) if the machine has an appropriate [NVIDIA](https://www.nvidia.com/en-us/geforce/graphics-cards/) [graphics processing unit](https://www.ibm.com/topics/gpu) (GPU), the appropriate drivers, and the [CUDA toolkit](https://developer.nvidia.com/cuda-toolkit) installed.
+ If the machine has a GPU with the appropriate drivers installed, it should be detected automatically during the CellBender installation, and the matching version of [PyTorch](https://pytorch.org/) with CUDA support should be downloaded automatically as a CellBender dependency.
+ CUDA is an NVIDIA platform and only runs on NVIDIA GPUs, so GPU acceleration in CellBender may not work with other GPUs, such as [AMD](https://www.amd.com/en/products/graphics/desktops/radeon.html).
+ At the time of writing, PyTorch 2.0 supported CUDA 11.8 but not CUDA 12, so CUDA may need to be downgraded to run CellBender. See the Stack Overflow issue [here](https://stackoverflow.com/questions/76673010/experiencing-a-problem-with-accessing-cudaversion-12-while-using-cellbender) for more information. In most cases, however, the CellBender installation should pick up the right PyTorch build with CUDA support automatically.

```
# Check the Linux Mint version
$ lsb_release -a

# Check the underlying Ubuntu base version (Linux Mint only)
$ cat /etc/upstream-release/lsb-release

# Check the NVIDIA driver version and the highest CUDA version it supports
$ nvidia-smi
```

Install CellBender:
```
# Recommended to install inside its own Conda env
# Recommended to use Python v3.7
$ conda create -n CellBender python=3.7
$ conda activate CellBender
(CellBender) $ pip install cellbender
```
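
After installing, it can be useful to confirm that the PyTorch build pulled in by CellBender actually sees the GPU. The snippet below is a minimal sketch, not part of CellBender: the helper name `cuda_status` is made up for illustration, and it only relies on `torch.cuda.is_available()` and `torch.version.cuda` from PyTorch. It falls back gracefully if PyTorch is not installed in the active environment.

```python
import importlib.util


def cuda_status():
    """Report whether PyTorch is installed and whether it can see a CUDA GPU.

    `cuda_status` is a hypothetical helper for illustration only.
    """
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment.
        return {"torch_installed": False, "cuda_available": False, "cuda_build": None}

    import torch

    return {
        "torch_installed": True,
        # True only if a usable NVIDIA GPU and driver are present.
        "cuda_available": bool(torch.cuda.is_available()),
        # CUDA version PyTorch was built against; None for CPU-only builds.
        "cuda_build": torch.version.cuda,
    }


print(cuda_status())
```

If `cuda_available` is `False`, running `remove-background` with `--cuda` is unlikely to work, and the CUDA/PyTorch version notes above are the first thing to check.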

For more information on using Conda Environments, see the related post [here](https://github.com/Morgan-Feuz/conda-virual-envs).

------------------------------------------------------------------------------------------------------------------------------------------------------------------

### Basic Usage

```
# Basic usage
$ cellbender remove-background --cuda --input input_file.h5 --output output_file.h5

# Or, with more options set explicitly (replace each (value) with a number)
$ cellbender remove-background \
    --cuda \
    --input raw_feature_bc_matrix.h5 \
    --output output.h5 \
    --expected-cells (value) \
    --total-droplets-included (value) \
    --fpr 0.01 \
    --epochs 150
```

+ For CellBender, `remove-background` is the main command used to remove ambient RNA from a raw count matrix.
+ Note that `remove-background` should be run on a count matrix as part of the pre-processing steps, before any downstream analysis has been applied to the data.
+ The output of `remove-background` is a new `.h5` count matrix, where the ambient (or background) RNA has been estimated and removed.
+ As of CellBender v0.3.0, values for `--expected-cells` and `--total-droplets-included` typically do not need to be specified, as CellBender will choose reasonable values based on the input dataset.
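
When several samples need to be processed, it can help to assemble the command in a script. The following is a minimal sketch under stated assumptions: `build_remove_background_cmd` is a hypothetical helper (not part of CellBender), and it only uses the flags shown in the usage example above. Optional flags are left out unless a value is given, so CellBender can choose its own defaults.

```python
import shlex


def build_remove_background_cmd(input_h5, output_h5, use_cuda=True,
                                expected_cells=None,
                                total_droplets_included=None,
                                fpr=None, epochs=None):
    """Return the argument list for one `cellbender remove-background` run.

    Hypothetical helper for illustration; it only assembles the command.
    """
    cmd = ["cellbender", "remove-background",
           "--input", str(input_h5), "--output", str(output_h5)]
    if use_cuda:
        cmd.append("--cuda")
    optional = {
        "--expected-cells": expected_cells,
        "--total-droplets-included": total_droplets_included,
        "--fpr": fpr,
        "--epochs": epochs,
    }
    for flag, value in optional.items():
        if value is not None:  # omit flags the caller did not set
            cmd += [flag, str(value)]
    return cmd


cmd = build_remove_background_cmd("raw_feature_bc_matrix.h5", "output.h5",
                                  fpr=0.01, epochs=150)
print(" ".join(shlex.quote(part) for part in cmd))
# To actually launch the run: subprocess.run(cmd, check=True)
```

Passing the command as a list (rather than one string) avoids shell-quoting problems with file paths that contain spaces.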

The `remove-background` command will produce nine output files:
+ `output_report.html`: HTML report including plots and commentary, along with any warnings or suggestions for improved parameter settings.
+ `output.h5`: Full count matrix as an h5 file, with background RNA removed. This file contains all the original droplet barcodes.
+ `output_filtered.h5`: Filtered count matrix as an h5 file, with background RNA removed. The word “filtered” means that this file contains only the droplets which were determined to have a > 50% posterior probability of containing cells.
+ `output_cell_barcodes.csv`: CSV file containing all the droplet barcodes which were determined to have a > 50% posterior probability of containing cells. Barcodes are written in plain text. This information is also contained in each of the above outputs, but is included as a separate output for convenient use in certain downstream applications.
+ `output.pdf`: PDF file that provides a standard graphical summary of the inference procedure.
+ `output.log`: Log file produced by the `cellbender remove-background` run.
+ `output_metrics.csv`: Metrics describing the run, potentially to be used to flag problematic runs when using CellBender as part of a large-scale automated pipeline.
+ `ckpt.tar.gz`: Checkpoint file which contains the trained model and the full posterior.
+ `output_posterior.h5`: The full posterior probability of noise counts. This is not normally used downstream.
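
In an automated pipeline, a quick check that all nine files exist can catch failed runs early. The sketch below assumes the file names follow the pattern in the list above (the stem of the `--output` path plus a suffix). The helper names are hypothetical, and looking for `ckpt.tar.gz` next to the other outputs is an assumption that may not match where CellBender actually writes it.

```python
from pathlib import Path


def expected_outputs(output_h5):
    """List the files a run is expected to produce for a given --output path.

    Hypothetical helper; naming pattern assumed from the list above.
    """
    out = Path(output_h5)
    stem, folder = out.stem, out.parent
    names = [
        f"{stem}_report.html",
        out.name,
        f"{stem}_filtered.h5",
        f"{stem}_cell_barcodes.csv",
        f"{stem}.pdf",
        f"{stem}.log",
        f"{stem}_metrics.csv",
        "ckpt.tar.gz",  # assumed to sit alongside the other outputs
        f"{stem}_posterior.h5",
    ]
    return [folder / name for name in names]


def missing_outputs(output_h5):
    """Return the expected files that are not present on disk."""
    return [path for path in expected_outputs(output_h5) if not path.exists()]


for path in missing_outputs("output.h5"):
    print(f"missing: {path}")
```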

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

### Combining CellBender Output with Seurat

Newer versions of [Seurat](https://satijalab.org/seurat/) use a data loader function, `Read10X_h5`, which is not currently compatible with the CellBender output file format. However, the files can be made compatible with a few simple extra steps:

Solution from the CellBender Documentation:
```
# From a Python env in which PyTables is installed
$ ptrepack --complevel 5 tiny_output_filtered.h5:/matrix tiny_output_filtered_seurat.h5:/matrix
```
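
For more than a handful of samples, the `ptrepack` call above can be generated per file. This is a small sketch, assuming the same `--complevel 5` setting and `/matrix` node as the command above; `build_ptrepack_cmd` and the `_seurat` suffix convention are illustrative choices, not part of CellBender or PyTables.

```python
from pathlib import Path


def build_ptrepack_cmd(cellbender_h5, suffix="_seurat"):
    """Return the ptrepack argument list for one CellBender output file.

    Hypothetical helper; the output name is the input name plus `suffix`.
    """
    src = Path(cellbender_h5)
    dst = src.with_name(f"{src.stem}{suffix}{src.suffix}")
    return ["ptrepack", "--complevel", "5", f"{src}:/matrix", f"{dst}:/matrix"]


print(" ".join(build_ptrepack_cmd("tiny_output_filtered.h5")))
# To run the conversion: subprocess.run(build_ptrepack_cmd(path), check=True)
```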

Solution from scCustomize Documentation:
```r
cell_bender_mat <- Read_CellBender_h5_Mat(file_name = "PATH/SampleA_out_filtered.h5")
```
The scCustomize documentation can be found [here](https://samuel-marsh.github.io/scCustomize/articles/Cell_Bender_Functions.html).
