https://github.com/morgan-feuz/cellbender
CellBender workflow for usage with scRNA-seq data
- Host: GitHub
- URL: https://github.com/morgan-feuz/cellbender
- Owner: Morgan-Feuz
- Created: 2024-08-26T16:40:11.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-26T17:46:43.000Z (about 1 year ago)
- Last Synced: 2024-12-27T05:42:03.755Z (9 months ago)
- Topics: python, single-cell-rna-seq
- Homepage:
- Size: 3.91 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Guide for Ambient RNA Removal from Single-Cell RNA-seq Data with CellBender
[CellBender](https://cellbender.readthedocs.io/en/latest/introduction/index.html) is a Python package for removing technical artifacts (such as ambient/background RNA) from single-cell omics data. In general, CellBender takes raw count matrices and molecule-level information (generated by alignment/mapping tools such as [Cell Ranger](https://www.10xgenomics.com/support/software/cell-ranger/latest), [Alevin](https://salmon.readthedocs.io/en/latest/alevin.html), [STARsolo](https://github.com/alexdobin/STAR/blob/master/README.md), etc.) and then estimates and removes systematic biases and background noise to improve estimates of gene expression. The output of the main `remove-background` command includes a new `.h5` count matrix with background RNA removed, which can be easily used in downstream analysis (e.g., with [Seurat](https://satijalab.org/seurat/) or [Scanpy](https://scanpy.readthedocs.io/en/stable/)).
Additional Useful References:
+ https://cellbender.readthedocs.io/en/latest/
+ https://www.10xgenomics.com/analysis-guides/background-removal-guidance-for-single-cell-gene-expression-datasets-using-third-party-tools
+ https://cdn.10xgenomics.com/image/upload/v1660261286/support-documents/CG000329_TechnicalNote_InterpretingCellRangerWebSummaryFiles_RevA.pdf
+ https://www.10xgenomics.com/analysis-guides/introduction-to-ambient-rna-correction

------------------------------------------------------------------------------------------------------------------------------------------------------------------
### Linux Installation

A few notes before installing:
+ CellBender runs much faster on real datasets (in my experience, ~30 minutes with a GPU compared to 30+ hours without one) if the machine in use has an appropriate [NVIDIA](https://www.nvidia.com/en-us/geforce/graphics-cards/) [graphics processing unit](https://www.ibm.com/topics/gpu) (GPU), the appropriate drivers, and the [CUDA toolkit](https://developer.nvidia.com/cuda-toolkit) installed.
+ If the machine in use has a GPU with the appropriate drivers installed, it should be automatically detected during the CellBender installation, and the appropriate version of [PyTorch](https://pytorch.org/) with CUDA support should automatically be downloaded as a CellBender dependency as well.
+ CUDA is an NVIDIA technology, so this guide assumes an NVIDIA GPU. CUDA may not work properly with other GPUs, such as [AMD](https://www.amd.com/en/products/graphics/desktops/radeon.html).
+ PyTorch 2.0 currently supports CUDA only up to 11.8, so a newer CUDA version may need to be downgraded to run CellBender. See the Stack Overflow question [here](https://stackoverflow.com/questions/76673010/experiencing-a-problem-with-accessing-cudaversion-12-while-using-cellbender) for more information. However, the CellBender installation should be able to automatically pick the right PyTorch build with CUDA support.

```
# Check system Linux Mint version
$ lsb_release -a

# Check the Ubuntu version that Linux Mint is based on
$ cat /etc/upstream-release/lsb-release

# Check the NVIDIA driver and the CUDA version it supports
$ nvidia-smi
```

Install CellBender:
```
# Recommended to install inside its own Conda env
# Recommended to use Python v3.7
$ conda create -n CellBender python=3.7
$ conda activate CellBender
(CellBender) $ pip install cellbender
```

For more information on using Conda environments, see the related post [here](https://github.com/Morgan-Feuz/conda-virual-envs).
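After installation, it can be worth confirming that PyTorch actually sees the GPU before starting a long run. The following is a minimal sketch, not part of CellBender itself: the helper name `describe_gpu_support` is hypothetical, and it only relies on the standard PyTorch calls `torch.cuda.is_available()` and `torch.cuda.get_device_name()`. It is meant to be run inside the activated Conda environment.

```python
try:
    import torch
except (ImportError, OSError):
    # PyTorch is missing or could not load its libraries in this environment
    torch = None


def describe_gpu_support():
    """Report whether PyTorch is importable and whether it can see a CUDA GPU.

    Hypothetical helper for illustration; CellBender does not ship it.
    """
    info = {"torch_installed": False, "cuda_available": False, "device_name": None}
    if torch is None:
        return info
    info["torch_installed"] = True
    info["cuda_available"] = bool(torch.cuda.is_available())
    if info["cuda_available"]:
        # Name of the first visible CUDA device
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    print(describe_gpu_support())
```

If `cuda_available` comes back `False` on a machine with an NVIDIA GPU, the driver or CUDA version notes above are a reasonable first place to look.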
------------------------------------------------------------------------------------------------------------------------------------------------------------------
### Basic Usage
```
# basic usage
$ cellbender remove-background --cuda --input input_file.h5 --output output_file.h5

# or
$ cellbender remove-background \
    --cuda \
    --input raw_feature_bc_matrix.h5 \
    --output output.h5 \
    --expected-cells (value) \
    --total-droplets-included (value) \
    --fpr 0.01 \
    --epochs 150
```
+ For CellBender, `remove-background` is the main command used to remove ambient RNA from a raw count matrix.
+ Note that `remove-background` should be run on a count matrix as part of the pre-processing steps, before any downstream analysis has been applied to the data.
+ The output of `remove-background` is a new `.h5` count matrix, where the ambient (or background) RNA has been estimated and removed.
+ As of CellBender v0.3.0, values for `--expected-cells` and `--total-droplets-included` do not typically need to be specified, as CellBender will choose reasonable values based on the input dataset.
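When several samples need the same settings, the command above can be assembled programmatically. This is a small sketch under stated assumptions: the function name `build_remove_background_cmd` and its defaults are hypothetical, and it only uses the flags already shown in this section. The optional flags are left out unless a value is given, in line with the v0.3.0 note above.

```python
import shlex


def build_remove_background_cmd(input_h5, output_h5, cuda=True,
                                expected_cells=None,
                                total_droplets_included=None,
                                fpr=0.01, epochs=150):
    """Assemble a `cellbender remove-background` call as an argument list.

    Hypothetical helper; it only arranges flags shown in this guide.
    """
    cmd = ["cellbender", "remove-background"]
    if cuda:
        cmd.append("--cuda")
    cmd += ["--input", str(input_h5), "--output", str(output_h5)]
    # Optional flags are only added when a value is supplied
    if expected_cells is not None:
        cmd += ["--expected-cells", str(expected_cells)]
    if total_droplets_included is not None:
        cmd += ["--total-droplets-included", str(total_droplets_included)]
    cmd += ["--fpr", str(fpr), "--epochs", str(epochs)]
    return cmd


cmd = build_remove_background_cmd("raw_feature_bc_matrix.h5", "output.h5")
# Print the command in a shell-safe form for inspection
print(" ".join(shlex.quote(part) for part in cmd))
# To launch the run for real (requires CellBender on the PATH):
# import subprocess; subprocess.run(cmd, check=True)
```

Returning a list rather than a single string keeps file names with spaces intact if the command is later passed to `subprocess.run`.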
The `remove-background` command will produce nine output files:
+ `output_report.html`: HTML report including plots and commentary, along with any warnings or suggestions for improved parameter settings.
+ `output.h5`: Full count matrix as an h5 file, with background RNA removed. This file contains all the original droplet barcodes.
+ `output_filtered.h5`: Filtered count matrix as an h5 file, with background RNA removed. The word “filtered” means that this file contains only the droplets which were determined to have a > 50% posterior probability of containing cells.
+ `output_cell_barcodes.csv`: CSV file containing all the droplet barcodes which were determined to have a > 50% posterior probability of containing cells. Barcodes are written in plain text. This information is also contained in each of the above outputs, but is included as a separate output for convenient use in certain downstream applications.
+ `output.pdf`: PDF file that provides a standard graphical summary of the inference procedure.
+ `output.log`: Log file produced by the `cellbender remove-background` run.
+ `output_metrics.csv`: Metrics describing the run, potentially to be used to flag problematic runs when using CellBender as part of a large-scale automated pipeline.
+ `ckpt.tar.gz`: Checkpoint file which contains the trained model and the full posterior.
+ `output_posterior.h5`: The full posterior probability of noise counts. This is not normally used downstream.

------------------------------------------------------------------------------------------------------------------------------------------------------------------
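For automated pipelines, the nine output files of `remove-background` listed in the Basic Usage section can be checked after a run. The sketch below assumes that the file names are derived from the `--output` name exactly as in that list (for `--output output.h5`) and that all files are written next to the output file. Both points are inferred from the listing rather than confirmed, and the helper names are hypothetical.

```python
from pathlib import Path


def expected_outputs(output_h5):
    """List the nine files expected for a given --output path.

    Assumption: names follow the pattern shown in this guide and the files
    sit in the same folder as the --output file.
    """
    out = Path(output_h5)
    stem = out.stem  # "output" for "output.h5"
    names = [
        f"{stem}_report.html",
        f"{stem}.h5",
        f"{stem}_filtered.h5",
        f"{stem}_cell_barcodes.csv",
        f"{stem}.pdf",
        f"{stem}.log",
        f"{stem}_metrics.csv",
        "ckpt.tar.gz",
        f"{stem}_posterior.h5",
    ]
    return [out.parent / name for name in names]


def missing_outputs(output_h5):
    """Return the expected files that are not present on disk."""
    return [path for path in expected_outputs(output_h5) if not path.exists()]


if __name__ == "__main__":
    for path in missing_outputs("output.h5"):
        print(f"missing: {path}")
```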
### Combining CellBender Output with Seurat
Newer versions of [Seurat](https://satijalab.org/seurat/) use a data loader function, `Read10X_h5`, which is not currently compatible with the CellBender output file format. However, the files can be made compatible with a few simple extra steps:
Solution from the CellBender Documentation:
```
# From a Python env in which PyTables is installed
$ ptrepack --complevel 5 tiny_output_filtered.h5:/matrix tiny_output_filtered_seurat.h5:/matrix
```

Solution from the scCustomize Documentation:
```r
cell_bender_mat <- Read_CellBender_h5_Mat(file_name = "PATH/SampleA_out_filtered.h5")
```
The scCustomize documentation can be found [here](https://samuel-marsh.github.io/scCustomize/articles/Cell_Bender_Functions.html).