Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/statOmics/zinbwaveZinger
Integrating zingeR with ZINB-WaVE weights
https://github.com/statOmics/zinbwaveZinger
Last synced: 23 days ago
JSON representation
Integrating zingeR with ZINB-WaVE weights
- Host: GitHub
- URL: https://github.com/statOmics/zinbwaveZinger
- Owner: statOmics
- Created: 2017-08-29T07:03:59.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2018-02-15T21:01:02.000Z (over 6 years ago)
- Last Synced: 2024-03-15T19:10:21.561Z (4 months ago)
- Language: TeX
- Homepage:
- Size: 29.2 MB
- Stars: 22
- Watchers: 9
- Forks: 9
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Lists
- awesome_single_cell - zinbwaveZinger - [R] - We introduce a weighting strategy, based on a zero-inflated negative binomial model, that identifies excess zero counts and generates gene- and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance for scRNA-seq. https://doi.org/10.1186/s13059-018-1406-4 (Software packages / RNA-seq)
- awesome-single-cell - zinbwaveZinger - [R] - We introduce a weighting strategy, based on a zero-inflated negative binomial model, that identifies excess zero counts and generates gene- and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance for scRNA-seq. https://doi.org/10.1186/s13059-018-1406-4 (Software packages / Count modelling and normalization)
- awesome-single-cell - zinbwaveZinger - [R] - We introduce a weighting strategy, based on a zero-inflated negative binomial model, that identifies excess zero counts and generates gene- and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance for scRNA-seq. https://doi.org/10.1186/s13059-018-1406-4 (Software packages / RNA-seq)
README
# Data analysis for the ZINB-WaVE / zingeR paper
This repository is designed to allow interested people to reproduce the results and figures of our paper called 'Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications', currently on bioRxiv at https://www.biorxiv.org/content/early/2018/01/18/250126.
All code in the repository is distributed under the GPL-3 license.For examples on how to use the method, please see the [zinbwave vignette](https://github.com/drisso/zinbwave/blob/master/vignettes/intro.Rmd#differential-expression) or the [zingeR vignette](https://github.com/statOmics/zingeR/blob/master/vignettes/zingeRVignette_v2.Rmd#differential-expression-analysis) on how to use the observation weights in your analysis.
For any questions or issues with the code on this repository, please use the "Issues" tab.
## Dependencies
To be able to run the code in this repo, it is required to have `R` (>=3.4) and the following packages.
### R packages
- [zingeR](https://github.com/statOmics/zingeR)
- countsimQC
- Seurat
- ggplot2
- RColorBrewer
- devtools
- scales
- gridBase
- grid
- Matrix
- dplyr
- plyr
- cowplot
- Rtsne
- FNN
- rARPACK### Bioconductor packages
- zinbwave
- limma
- edgeR
- DESeq2
- BiocParallel
- doParallel
- Biobase
- scde
- SingleCellExperiment
- MAST
- [xCell](https://github.com/dviraran/xCell)
- fgsea
- clusterExperiment
- GSEABase## Getting started
### Simulations
The functions used to estimate parameters based on real scRNA-seq data, and to simulate the expression counts can be found in the [simulationHelpFunctions_v7_diffInZero.R](https://github.com/statOmics/zinbwaveZinger/blob/master/zingeRsimulationFunctions/simulationHelpFunctions_v7_diffInZero.R) file in the zingeRsimulationFunctions folder. This framework has been used to simulate all scRNA-seq datasets.
We have also simulated a bulk RNA-seq dataset and code for this simulation can be found in the [rnaseqSim.R](https://github.com/statOmics/zinbwaveZinger/blob/master/zinbwaveSimulations/RNASeq/rnaseqSim.R) file.
The quality of the simulated datasets has been evaluated using the [countsimQC package](https://github.com/csoneson/countsimQC), and code for this evaluation can be found in the [zinbwaveSimulations/evaluateSimulatedData](https://github.com/statOmics/zinbwaveZinger/tree/master/zinbwaveSimulations/evaluateSimulatedData) folder.
The code for the evaluations on the simulated Islam, Trapnell and 10X datasets can be found in the respective [islam_sims_fc2](https://github.com/statOmics/zinbwaveZinger/tree/master/zinbwaveSimulations/islam_sims_fc2), [trapnell_sims_fc2](https://github.com/statOmics/zinbwaveZinger/tree/master/zinbwaveSimulations/trapnell_sims_fc2) and [tenX_sims_fc2](https://github.com/statOmics/zinbwaveZinger/tree/master/zinbwaveSimulations/tenX_sims_fc2) folders. In the respective files, the FDP-TPR plots are saved and the final Figures can be recreated with the [fdrTprPlots.R](https://github.com/statOmics/zinbwaveZinger/blob/master/zinbwaveSimulations/fdrTprPlots.R) file.
We have investigated the effect of the penalty parameter on the simulated Islam and 10x datasets, which can respectively be found in the [islam_sims_fc2_epsilon](https://github.com/statOmics/zinbwaveZinger/tree/master/zinbwaveSimulations/islam_sims_fc2_epsilon) and [tenX_sims_fc2_epsilon](https://github.com/statOmics/zinbwaveZinger/tree/master/zinbwaveSimulations/tenX_sims_fc2_epsilon) folders.### Mock comparisons
To generate the plots related to the false positive rate control, run [FPR_mocks_tenx.Rmd](https://github.com/statOmics/zinbwaveZinger/blob/master/fpr/tenx/FPR_mocks_tenx.Rmd) for the 10X genomics dataset and [FPR_mocks.Rmd](https://github.com/statOmics/zinbwaveZinger/blob/master/fpr/usoskin/FPR_mocks.Rmd) for the Usoskin dataset. Additionally, to generate the plots related to PCER when the penalization parameter of ZINB-WaVE is varied, run [FPR_mocks_eps_tenx.Rmd](https://github.com/statOmics/zinbwaveZinger/blob/master/fpr/epsilon_tenx/FPR_mocks_eps_tenx.Rmd) for the 10X genomics dataset and [FPR_mocks_eps_usoskin.Rmd](https://github.com/statOmics/zinbwaveZinger/blob/master/fpr/epsilon_usoskin/FPR_mocks_eps_usoskin.Rmd) for the Usoskin dataset.
### Real data
To generate the plots related to the analysis of the 10x Genomics PBMC dataset, first run [createDataObject.Rmd](https://github.com/statOmics/zinbwaveZinger/blob/master/realdata/createdata/createDataObject.Rmd) to create the data files. Then, to generate the data when the clustering is done using PCA, run both [de_seurat.Rmd](https://github.com/statOmics/zinbwaveZinger/blob/master/realdata/clusteringPCA/de_seurat.Rmd) and [de_othermethods.Rmd](https://github.com/statOmics/zinbwaveZinger/blob/master/realdata/clusteringPCA/de_othermethods.Rmd). There are two files to run instead of one unique file because packages `Seurat` and `zinbwave` load both many packages and R complains that there are too many packages loaded. To generate the plots when the clustering is done using ZINB-WaVE, run [dimredZinbwave.Rmd](https://github.com/statOmics/zinbwaveZinger/blob/master/realdata/clusteringW/dimredZinbwave.Rmd), [clusterW.Rmd](https://github.com/statOmics/zinbwaveZinger/blob/master/realdata/clusteringW/clusterW.Rmd), and then [de.Rmd](https://github.com/statOmics/zinbwaveZinger/blob/master/realdata/clusteringW/de.Rmd). Finally, to generate the plots, run [plotPaper.Rmd](https://github.com/statOmics/zinbwaveZinger/blob/master/realdata/plotPaper.Rmd).
To generate the results and plots for the differential expression analysis between the cell types identified in the Usoskin dataset, run the [deAnalysis.Rmd](https://github.com/statOmics/zinbwaveZinger/blob/master/realdata/usoskin/deAnalysis.Rmd) file.
### Time benchmarking
For each of the real datasets Islam, Usoskin, 10X genomics, respectively run [benchmark_islam.Rmd](https://github.com/statOmics/zinbwaveZinger/blob/master/timebenchmark/islam/benchmark_islam.Rmd), [benchmark_usoskin.Rmd](https://github.com/statOmics/zinbwaveZinger/blob/master/timebenchmark/usoskin/benchmark_usoskin.Rmd), [benchmark_tenx.Rmd](https://github.com/statOmics/zinbwaveZinger/blob/master/timebenchmark/tenx/benchmark_tenx.Rmd). Finally, run [benchmark_all.Rmd](https://github.com/statOmics/zinbwaveZinger/blob/master/timebenchmark/benchmark_all.Rmd).