https://github.com/drisso/zinb_analysis
Data analysis and simulations for the ZINB-WaVE paper
- Host: GitHub
- URL: https://github.com/drisso/zinb_analysis
- Owner: drisso
- Created: 2016-09-23T19:03:19.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2020-07-23T17:52:48.000Z (almost 6 years ago)
- Last Synced: 2024-12-17T13:51:37.823Z (over 1 year ago)
- Language: R
- Size: 285 MB
- Stars: 11
- Watchers: 6
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Data analysis and simulations for the ZINB-WaVE paper
This repository is designed to allow interested people to reproduce the results and figures of the paper:
Risso D, Perraudeau F, Gribkova S, Dudoit S, Vert JP. ZINB-WaVE: A general and flexible method for signal extraction from single-cell RNA-seq data. bioRxiv. doi: https://doi.org/10.1101/125112
## Dependencies
To run the code in this repo, you need `R` (>=3.3), `python` (>=2.7), and the following packages.
### R packages
- [zinbwave](https://github.com/drisso/zinbwave)
- cluster
- matrixStats
- magrittr
- RColorBrewer
- ggplot2
- reshape
- dplyr
- knitr
- rmarkdown
- mclust
- cowplot
- rARPACK
- Rtsne
- parallel
- digest
### Bioconductor packages
- EDASeq
- biomaRt
- scRNAseq
- SummarizedExperiment
- edgeR
- scran
- scater
- scone
- DESeq2
### Python packages
- [ZIFA](https://github.com/epierson9/ZIFA)
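Before running any analysis, it can help to check which of the R and Bioconductor packages listed above are missing from the local library. The snippet below is a minimal sketch: `missing_packages()` is a hypothetical helper (not part of this repository), and the two vectors simply copy the lists above.

```r
# Packages copied from the "R packages" and "Bioconductor packages" lists above.
r_pkgs <- c("zinbwave", "cluster", "matrixStats", "magrittr", "RColorBrewer",
            "ggplot2", "reshape", "dplyr", "knitr", "rmarkdown", "mclust",
            "cowplot", "rARPACK", "Rtsne", "parallel", "digest")
bioc_pkgs <- c("EDASeq", "biomaRt", "scRNAseq", "SummarizedExperiment",
               "edgeR", "scran", "scater", "scone", "DESeq2")

# Hypothetical helper: return the packages that are not in the local library.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

message("Missing R packages: ",
        paste(missing_packages(r_pkgs), collapse = ", "))
message("Missing Bioconductor packages: ",
        paste(missing_packages(bioc_pkgs), collapse = ", "))
```

An empty list after the colon means every package in that group was found.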
### A note on `zinbwave` version
To exactly reproduce the analyses of the paper, version `0.1.1` of the `zinbwave` package is required. This can be installed in R with the following code.
```{r}
library(devtools)
install_github("drisso/archive-zinbwave@v0.1.1")
```
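After installing, one way to confirm that the expected version is in use is to compare the installed version against `0.1.1`. This is only a sketch: `has_exact_version()` is a hypothetical helper introduced here for illustration.

```r
# Hypothetical helper: TRUE only if `pkg` is installed at exactly `version`.
has_exact_version <- function(pkg, version) {
  # system.file() returns "" when the package is not installed.
  if (!nzchar(system.file(package = pkg))) {
    return(FALSE)
  }
  utils::packageVersion(pkg) == package_version(version)
}

if (has_exact_version("zinbwave", "0.1.1")) {
  message("zinbwave 0.1.1 is installed.")
} else {
  message("zinbwave 0.1.1 not found; results may differ from the paper.")
}
```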
The `zinbwave` package is under active development: we are constantly fixing bugs, adding new features, and improving the documentation. For all purposes other than exactly reproducing the analyses of our paper, we therefore recommend installing the latest stable release from Bioconductor. To do so, use the following code.
```{r}
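# Install BiocManager only if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("zinbwave")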
```
## Getting started
### Real data
For each of the real datasets analyzed in the paper, there is a `.Rmd` file and a `.R` file in the `real_data` folder, e.g.,
for the Patel data, the files are [patel_covariates.Rmd](https://github.com/drisso/zinb_analysis/blob/master/real_data/patel_covariates.Rmd)
and [patel_plots.R](https://github.com/drisso/zinb_analysis/blob/master/real_data/patel_plots.R).
Compile the `.Rmd` file first. This has two effects: (i) it creates an HTML report with useful analyses of
the dataset; and (ii) it creates a `.rda` file with the results of `zinbwave`, `pca`, and `zifa`. Once this file is generated,
use the `.R` file to generate the dataset-specific plots found in the paper.
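The two steps for the Patel data could be scripted roughly as follows. This is a sketch under assumptions: `run_real_data()` is a hypothetical helper, the code is assumed to run from the root of a local clone, and `chdir = TRUE` assumes the plotting script expects to run from its own folder.

```r
# Hypothetical helper: compile a dataset's report, then run its plotting script.
run_real_data <- function(rmd_file, plot_script) {
  stopifnot(file.exists(rmd_file), file.exists(plot_script))
  # Step 1: compile the report; per the README this also writes the .rda file.
  rmarkdown::render(rmd_file)
  # Step 2: generate the plots (assumption: run from the script's own folder).
  source(plot_script, chdir = TRUE)
  invisible(TRUE)
}

rmd_file <- file.path("real_data", "patel_covariates.Rmd")
plot_script <- file.path("real_data", "patel_plots.R")

if (file.exists(rmd_file) && file.exists(plot_script)) {
  run_real_data(rmd_file, plot_script)
} else {
  message("Files not found; run this from the root of the cloned repository.")
}
```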
To generate the plots related to silhouette width, one needs to source the [silhouette.R](https://github.com/drisso/zinb_analysis/blob/master/real_data/silhouette.R)
file.
To generate the plots related to the goodness-of-fit, run the `.Rmd` files in the `real_data` folder starting with `goodness_of_fit`, e.g., for the Patel data, the file is [goodness_of_fit_patel.Rmd](https://github.com/drisso/zinb_analysis/blob/master/real_data/goodness_of_fit_patel.Rmd).
The Patel data are stored in `real_data/Patel.zip`. Please unzip this file before running the Patel analysis.
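The archive can also be extracted from within R. The sketch below uses `utils::unzip()`; `extract_archive()` is a hypothetical helper, and extracting next to the archive (into `real_data`) is an assumption about where the analysis expects the files.

```r
# Hypothetical helper: extract a zip archive and return the extracted paths.
# By default the files are written next to the archive (an assumption).
extract_archive <- function(zip_file, exdir = dirname(zip_file)) {
  stopifnot(file.exists(zip_file))
  utils::unzip(zip_file, exdir = exdir)
}

patel_zip <- file.path("real_data", "Patel.zip")

if (file.exists(patel_zip)) {
  extracted <- extract_archive(patel_zip)
  print(extracted)
} else {
  message("Patel.zip not found; run this from the root of the cloned repository.")
}
```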
### Simulations
To create the simulated datasets from the real datasets used in the paper, first run the code in [simFunction.R](https://github.com/drisso/zinb_analysis/blob/master/sims/figures/simFunction.R). Then, run the `.R` files in the folders in `sims/figures`. Finally, run [figuresPaper.Rmd](https://github.com/drisso/zinb_analysis/blob/master/sims/figures/figuresPaper.Rmd).
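The order of these steps can be sketched as below. `list_sim_scripts()` is a hypothetical helper that only lists the `.R` files under `sims/figures`. It does not run them, since some scripts (for example the n=10,000 fits) are meant to be launched separately. The code assumes it is run from the root of a local clone.

```r
# Hypothetical helper: list the .R scripts under `dir`, leaving out simFunction.R.
list_sim_scripts <- function(dir, exclude = "simFunction.R") {
  scripts <- list.files(dir, pattern = "\\.R$", recursive = TRUE,
                        full.names = TRUE)
  sort(scripts[!basename(scripts) %in% exclude])
}

sim_dir <- file.path("sims", "figures")

if (dir.exists(sim_dir)) {
  # Step 1: load the simulation functions.
  source(file.path(sim_dir, "simFunction.R"))
  # Step 2: inspect the scripts, then run the ones you need.
  print(list_sim_scripts(sim_dir))
  # Step 3, once the scripts have finished:
  # rmarkdown::render(file.path(sim_dir, "figuresPaper.Rmd"))
} else {
  message("sims/figures not found; run this from the root of the cloned repository.")
}
```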
To simulate the datasets from the Lun & Marioni model, run [lunSim.R](https://github.com/drisso/zinb_analysis/blob/master/sims/figures/fig6e-g/lunSim.R). It uses the file [function.rds](https://github.com/drisso/zinb_analysis/blob/master/sims/figures/fig6e-g/function.rds), which was generated by the steps described in the Methods section of the paper. Then, run [fitZinbLun.R](https://github.com/drisso/zinb_analysis/blob/master/sims/figures/fig6e-g/fitZinbLun.R).
To fit the simulated datasets with n=10,000 cells, we used a [Makefile](https://github.com/drisso/zinb_analysis/blob/master/sims/figures/fig6ad-S13-S14/Makefile) to launch jobs on a server. Alternatively, you can just call [fitZinb10000.R](https://github.com/drisso/zinb_analysis/blob/master/sims/figures/fig6ad-S13-S14/fitZinb10000.R) from your terminal with the arguments you want.
For any questions or issues with the code in this repository, please use the "Issues" tab.