Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/const-ae/transformgampoi

Variance stabilizing transformation for Gamma Poisson distributed data
https://github.com/const-ae/transformgampoi

Last synced: about 2 months ago
JSON representation

Variance stabilizing transformation for Gamma Poisson distributed data

Awesome Lists containing this project

README

        

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-"
)
```

# transformGamPoi

R package that accompanies our paper 'Comparison of transformations for single-cell RNA-seq data
' (https://www.nature.com/articles/s41592-023-01814-1).

`transformGamPoi` provides methods to stabilize the variance of single cell count data:

* acosh transformation based on the delta method
* shifted logarithm (log(x + c)) with a pseudo-count c, so that it approximates the acosh transformation
* randomized quantile and Pearson residuals

## Installation

You can install the current development version of `transformGamPoi` by typing the following into the [_R_](https://cloud.r-project.org/) console:
``` r
# install.packages("devtools")
devtools::install_github("const-ae/transformGamPoi")
```
The installation should only take a few seconds and work across all major operating systems (MacOS, Linux, Windows).

## Example

Let's compare the different variance-stabilizing transformations.

We start by loading the `transformGamPoi` package and setting a seed to make sure the results are reproducible.
```{r loadLibraries}
library(transformGamPoi)
set.seed(1)
```

We then load some example data, which we subset to 1000 genes and 500 cells
```{r loadData}
sce <- TENxPBMCData::TENxPBMCData("pbmc4k")
sce_red <- sce[sample(which(rowSums2(counts(sce)) > 0), 1000),
sample(ncol(sce), 500)]
```

We calculate the different variance-stabilizing transformations. We can either use the generic `transformGamPoi()` method and specify the `transformation`, or we use the low-level functions `acosh_transform()`, `shifted_log_transform()`, and `residual_transform()` which provide more settings. All functions return a matrix, which we can for example insert back into the `SingleCellExperiment` object:
```{r applyVSTs}
assay(sce_red, "acosh") <- transformGamPoi(sce_red, transformation = "acosh")
assay(sce_red, "shifted_log") <- shifted_log_transform(sce_red, overdispersion = 0.1)
# For large datasets, we can also do the processing without
# loading the full dataset into memory (on_disk = TRUE)
assay(sce_red, "rand_quant") <- residual_transform(sce_red, "randomized_quantile", on_disk = FALSE)
assay(sce_red, "pearson") <- residual_transform(sce_red, "pearson", clipping = TRUE, on_disk = FALSE)
```

Finally, we compare the variance of the genes after transformation using a scatter plot
```{r plotMeanVar, warning=FALSE}
par(pch = 20, cex = 1.15)
mus <- rowMeans2(counts(sce_red))
plot(mus, rowVars(assay(sce_red, "acosh")), log = "x", col = "#1b9e77aa", cex = 0.6,
xlab = "Log Gene Means", ylab = "Variance after transformation")
points(mus, rowVars(assay(sce_red, "shifted_log")), col = "#d95f02aa", cex = 0.6)
points(mus, rowVars(assay(sce_red, "pearson")), col = "#7570b3aa", cex = 0.6)
points(mus, rowVars(assay(sce_red, "rand_quant")), col = "#e7298aaa", cex = 0.6)
legend("topleft", legend = c("acosh", "shifted log", "Pearson Resid.", "Rand. Quantile Resid."),
col = c("#1b9e77", "#d95f02", "#7570b3", "#e7298a"), pch = 16)
```

# See also

There are a number of preprocessing methods and packages out there. Of particular interests are

* [sctransform](https://github.com/ChristophH/sctransform) by Christoph Hafemeister and the [Satija lab](https://satijalab.org/). The original developers of the Pearson residual variance-stabilizing transformation approach for single cell data.
* [scuttle::logNormCounts()](https://bioconductor.org/packages/release/bioc/html/scuttle.html) by Aaron Lun. This is an alternative to the `shifted_log_transform()` and plays nicely together with the Bioconductor universe. For more information, I highly recommend to take a look at the [normalization](https://bioconductor.org/books/release/OSCA/normalization.html) section of the [OSCA book](https://bioconductor.org/books/release/OSCA/).
* [Sanity](https://github.com/jmbreda/Sanity) by Jérémie Breda _et al._. This method is not directly concerned with variance stabilization, but still provides an interesting approach for single cell data preprocessing.

# Session Info

```{r}
sessionInfo()
```