https://github.com/helenalc/msc-thesis
- Host: GitHub
- URL: https://github.com/helenalc/msc-thesis
- Owner: HelenaLC
- Created: 2018-08-14T12:51:26.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2018-08-30T17:49:35.000Z (almost 8 years ago)
- Last Synced: 2025-02-17T01:29:44.763Z (over 1 year ago)
- Language: R
- Size: 12.6 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# MSc thesis on "Differential Analysis of scRNA-seq data with complex experimental designs"
### contents
- **scripts:** R code to reproduce all analyses & figures.
- **results:** Figures & data produced by scripts.
### packages
- **pkg:** R package containing method wrappers and utilities for plotting & evaluation.
- **fda:** `fda` package fork w/ modified code of `tperm.fd()` to decrease runtime.
- **scDD:** `scDD` package fork w/ modified version of `simulateSet()` to prevent repeated running of `findIndex()` & make simulated counts non-continuous.
### scripts
- **dd_patterns.R**:
Generates a schematic of differential distribution patterns
(reproduces: *dd_patterns*)
- **ECDFs.R**:
Generates an exemplary set of ECDFs for a 3 vs. 3 sample comparison.
(reproduces: *ECDFs*)
- **scDD-sim_ex.R**:
Visualises an exemplary `scDD` simulation
(reproduces: *scDD_sim_ex-med_exprs*, *scDD_sim_ex-expr_profiles*)
- **simDD-seurat.R**:
Evaluates `Seurat` clustering performance on
10 simulation replicates w/ randomised parameters
(reproduces: *simDD-seurat_scores*)
- **simDD-sim_qc.R**:
Generates basic quality control plots for `simDD` simulated data
and data from Koh et al.
(reproduces: *qc_var_explained*, *qc_lib_sizes*,
*qc_top_expr*, *qc_expr_freq_vs_mean*, *qc_disp_vs_mean*)
- **scDD-null_sim.R**:
Evaluates method performances on 3 replicates of a null simulation
(reproduces: *scDD_null_sim*)
- **diffcyt_runmodes.R**:
Evaluates the performance of `diffcyt` for varying data inputs & summary statistics
(reproduces: *diffcyt_runmodes*)
- **kang-data_prep.R**:
Performs `Seurat` preprocessing & constructs a `daFrame` from
the Kang et al. raw data available under GEO accession GSE96583
- **kang-data_overview.R**:
Generates general data overview plots for the Kang et al. data set
(reproduces: *kang_cluster_props*, *kang_tsne*, *kang_nb_cells*)
- **kang-DS_analysis.R**:
Performs differential analysis using `diffcyt` & `edgeR` methods
& compares obtained results with those published
(reproduces: *kang_nb_de_gs*, *kang_overlap*, *kang_pvals*,
*kang_top_undetected*, *kang_highest_pvals_xxx*)
- **runtimes-nb_gs.R**:
Measures method runtimes for increasing numbers of genes
(reproduces: *runtimes*)
- **runtimes-FDA_reso.R**, **runtimes-FDA_nperm.R**:
Measure `FDA` runtimes for increasing `reso` and `n_perm` parameters
(reproduces: *runtimes_FDA_reso/nperm*)
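
### example
To give a rough idea of what a script such as **ECDFs.R** produces, here is a minimal sketch of per-sample ECDFs for a 3 vs. 3 comparison. It uses base R only and simulated values; the number of cells, group labels, effect size & colours are assumptions made for illustration and are not taken from the thesis scripts.

```r
# Hypothetical illustration on simulated values, not thesis data.
set.seed(1)

n_cells <- 200                        # cells per sample (assumed)
groups <- rep(c("A", "B"), each = 3)  # 3 vs. 3 samples

# Simulate expression of a single gene per sample;
# group B is shifted upwards to mimic a differential distribution.
samples <- lapply(groups, function(g) {
  rnorm(n_cells, mean = if (g == "A") 1 else 2, sd = 0.5)
})
names(samples) <- paste0(groups, 1:3)

# Evaluate one ECDF per sample on a shared grid of expression values.
grid <- seq(min(unlist(samples)), max(unlist(samples)), length.out = 100)
ecdfs <- sapply(samples, function(x) ecdf(x)(grid))

# Plot all six curves as step functions, coloured by group.
cols <- ifelse(groups == "A", "steelblue", "tomato")
matplot(grid, ecdfs, type = "s", lty = 1, col = cols,
        xlab = "expression", ylab = "ECDF")
legend("bottomright", legend = c("group A", "group B"),
       col = c("steelblue", "tomato"), lty = 1)
```

Evaluating all ECDFs on a common grid yields one curve per sample of equal length, so that curves can be compared point by point across groups.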