https://github.com/helenalc/type-state
- Host: GitHub
- URL: https://github.com/helenalc/type-state
- Owner: HelenaLC
- Created: 2023-04-14T08:47:23.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-11-20T10:19:30.000Z (over 1 year ago)
- Last Synced: 2025-02-17T01:29:44.683Z (over 1 year ago)
- Language: R
- Size: 298 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
### setup
- workflow was implemented and last executed successfully with
**R v4.4.1 with Bioc 3.20, and Python v3.11.3 with Snakemake v7.26.0**
- R version and library have to be specified in the `config.yaml` file
(e.g., `R: "R_LIBS_USER=/path/to/library /path/to/R/executable"`)
- `.Rprofile` is used for handling and printing command line arguments
- `logs/` captures `.Rout` files from `R CMD BATCH` executions
- `data/` contains any synthetic and real data
- intermediate results are generated in `outs/`
- visualizations are generated in `plts/`
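
To illustrate the argument handling mentioned above, here is a minimal sketch of how command line arguments could be parsed and printed in base R. It assumes arguments arrive as `name=value` pairs after `--args`; the helper `parse_args()` and the example argument names and paths are hypothetical and not taken from the repository's `.Rprofile`.

```r
# Hypothetical helper (not from the repository): split "name=value"
# strings into a named list; defaults to the trailing command line arguments.
parse_args <- function(args = commandArgs(trailingOnly = TRUE)) {
    kv <- strsplit(args, "=", fixed = TRUE)
    keys <- vapply(kv, `[`, character(1), 1)
    # re-join the remainder so that values containing "=" stay intact
    vals <- vapply(kv, function(x) paste(x[-1], collapse = "="), character(1))
    as.list(setNames(vals, keys))
}

# Illustrative input; these names and paths are made up for the sketch.
args <- parse_args(c("sim=example", "res=outs/sim/example.rds"))

# Print each argument so that it would show up in the .Rout log.
for (nm in names(args)) cat(nm, "=", args[[nm]], "\n")
```

With `R CMD BATCH`, such arguments are typically passed as a quoted `"--args name=value ..."` option, and anything printed ends up in the corresponding `.Rout` file.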
### workflow
- `` denotes a wildcard, namely: `t`ype, `s`tate, `b`atch,
`sim`ulation, `sco`re, `sel`ection, `sta`tistic,
`das` = differential state analysis method
- `00-get_sim/dat.R`
  - **out:** for simulations, `data/sim/00-raw/t,s,b.rds`,
    for real data, `data/dat/00-raw/.rds` (`` = dataset identifier)
  - synthetic data generation (`splatter::splatPopSimulate()`)
  - hereafter, `t,s,b` = ``
- `01-pro_sim/dat.R`
  - **in:** `data/sim|dat/00-raw/.rds`
    **out:** `data/sim|dat/01-fil/.rds`
  - minimal filtering keeping genes with count > 1
    in ≥ 10 cells, and cells with ≥ 10 detected genes
  - log-library size normalization (`scater::logNormCounts()`)
  - highly variable gene (HVG) selection (`scran::modelGeneVar()`)
  - principal component analysis (PCA) using HVGs (`scater::runPCA()`)
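
The filtering rule above can be sketched in base R on a plain count matrix (genes in rows, cells in columns). This is an illustration only: it assumes genes are filtered before cells and that "detected" means a count above zero, neither of which is stated in the README. The function `filter_counts()` and the toy matrix are hypothetical, and the normalization, HVG selection and PCA steps are not reproduced here.

```r
# Hypothetical helper: keep genes with count > min_count in at least
# min_cells cells, then keep cells with at least min_genes detected genes.
filter_counts <- function(counts, min_count = 1, min_cells = 10, min_genes = 10) {
    keep_genes <- rowSums(counts > min_count) >= min_cells
    counts <- counts[keep_genes, , drop = FALSE]
    # assumption: a gene is "detected" in a cell if its count is > 0
    keep_cells <- colSums(counts > 0) >= min_genes
    counts[, keep_cells, drop = FALSE]
}

# Toy data: 30 genes x 20 cells.
counts <- matrix(0L, nrow = 30, ncol = 20,
    dimnames = list(paste0("gene", 1:30), paste0("cell", 1:20)))
counts[1:20, 1:12] <- 2L  # count > 1 in 12 cells: these genes pass
counts[21:30, ] <- 1L     # never exceed a count of 1: these genes fail

fil <- filter_counts(counts)
dim(fil)  # 20 genes x 12 cells remain
```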
- `02-sco.R`
  - **in:** `data/sim|dat/01-fil/.rds`
    **out:** `outs/sim|dat/sco-,.rds`
  - source method from one of `02-sco-.R`
  - compute gene-level metrics to quantify type-/state-specificity
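
As a rough idea of what a gene-level specificity metric can look like, the sketch below computes, per gene, the fraction of expression variance explained by a grouping of cells (between-group over total sum of squares). This is a generic illustration and not one of the repository's `02-sco-.R` methods; the function `var_explained()`, the labels and the toy values are all hypothetical.

```r
# Hypothetical score: per-gene fraction of variance explained by `group`
# for a genes x cells matrix `x` (with at least two genes).
var_explained <- function(x, group) {
    group <- factor(group)
    gene_means <- rowMeans(x)
    total <- rowSums((x - gene_means)^2)
    # genes x groups matrix of group-wise mean expression
    means <- sapply(levels(group), function(g)
        rowMeans(x[, group == g, drop = FALSE]))
    n <- as.numeric(table(group))
    between <- rowSums(sweep((means - gene_means)^2, 2, n, `*`))
    ifelse(total > 0, between / total, 0)
}

# Toy data: gene1 varies by type only, gene2 varies by state only.
x <- rbind(gene1 = c(0, 0, 4, 4), gene2 = c(0, 4, 0, 4))
type <- c("A", "A", "B", "B")
state <- c("ctrl", "stim", "ctrl", "stim")

type_score <- var_explained(x, type)    # gene1 = 1, gene2 = 0
state_score <- var_explained(x, state)  # gene1 = 0, gene2 = 1
```

Scoring the same genes against type labels and against state labels gives two numbers per gene, which is the kind of contrast a type- versus state-specificity metric is meant to capture.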
- `03-sel.R`
  - **in:** `outs/sim|dat/sco-,.rds`
    **out:** `outs/sim|dat/sel-,.rds`
  - source method from one of `03-sel-.R`
  - select genes for reprocessing
- `04-rep.R`
  - **in:** `outs/sim|dat/sco-,.rds`
    **out:** `data/sim|dat/02-rep/,.rds`
  - data reprocessing (PCA, clustering, reduction)
- `05-sta.R`
  - **in:** `data/sim|dat/02-rep/,.rds`
    **out:** `outs/sim|dat/sta-,,.rds`
  - source method from one of `05-sta-.R`
  - compute evaluation statistics
- `06-das.R`
  - **in:** `data/sim|dat/02-rep/,.rds`
    **out:** `outs/sim|dat/das-,,.rds`
  - source method from one of `06-das-.R`
  - perform differential state analysis (DSA)
- `07-eva.R`
  - standalone script applied to experimental data only
  - collects results across all feature selection strategies,
    selects the top-ranked 10, 20, ..., 90% of features, and recomputes
    evaluation statistics on the correspondingly reprocessed data (PCA, clustering)
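
Selecting the top-ranked 10 to 90% of features can be sketched as follows. The sketch assumes that higher scores rank first and that the number of features is rounded up; the helper `top_fraction()` and the toy scores are hypothetical and do not come from `07-eva.R`.

```r
# Hypothetical helper: names of the top `pct` percent of features,
# ranked by decreasing score (number of features rounded up).
top_fraction <- function(score, pct) {
    n <- ceiling(length(score) * pct / 100)
    names(sort(score, decreasing = TRUE))[seq_len(n)]
}

# Toy scores for ten genes.
score <- setNames(
    c(0.9, 0.1, 0.5, 0.7, 0.3, 0.8, 0.2, 0.6, 0.4, 1.0),
    paste0("gene", 1:10))

# One selection per percentage: 10, 20, ..., 90.
pcts <- seq(10, 90, by = 10)
sel <- lapply(pcts, function(p) top_fraction(score, p))
names(sel) <- paste0(pcts, "%")

lengths(sel)   # 1, 2, ..., 9 features
sel[["30%"]]   # "gene10" "gene1" "gene6"
```

Each selection would then be passed through reprocessing (PCA, clustering) before the evaluation statistics are recomputed.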
- `08-plt_-.R`
  - **in:** `outs/sim/.rds`
    **out:** `plt/sim/-.pdf`
  - e.g., `08-plt_das-F1.R` collects all DSA results
    (`outs/sim/das-,,.rds`) and plots F1 scores
  - visualization of synthetic data analysis results
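
For reference, the F1 score mentioned above is the harmonic mean of precision and recall. A minimal base R sketch, comparing a set of genes called differential against a simulated ground truth, is given below; the function `f1_score()` and the gene names are hypothetical, and how the repository thresholds DSA results before scoring is not specified here.

```r
# Hypothetical helper: F1 score of called vs. true differential genes.
f1_score <- function(called, truth) {
    tp <- length(intersect(called, truth))  # true positives
    fp <- length(setdiff(called, truth))    # false positives
    fn <- length(setdiff(truth, called))    # false negatives
    if (tp == 0) return(0)
    # equivalent to 2 * precision * recall / (precision + recall)
    2 * tp / (2 * tp + fp + fn)
}

truth <- c("gene1", "gene2", "gene3", "gene4")
called <- c("gene1", "gene2", "gene5")

f1_score(called, truth)  # 4/7, i.e. about 0.571
```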
- `08-qlt_-.R`
  - **in:** `outs/dat/.rds`
    **out:** `plt/dat/-.pdf`
  - visualization of experimental data analysis results
- `09-aes.R`
  - sourced to fix the order of feature scores (`SCO`),
    ground truth-based (`DES`) and other selections (`SEL`),
    and differential state analysis methods (`DAS`) across plots
- `10-session_info.R`
  - lists all R packages used (across CRAN, GitHub, and Bioconductor)
    and may be used to install them; writes the corresponding
    `sessionInfo()` output to `session_info.txt`
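
Writing `sessionInfo()` output to a text file takes a single line of base R. The sketch below writes to a temporary directory to avoid touching the working directory; the variable `out` is an assumption of this example, whereas the actual script writes `session_info.txt`.

```r
# Capture the printed session information and write it to a text file.
out <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(sessionInfo()), out)

# Inspect the first few lines (R version, platform, ...).
head(readLines(out), 3)
```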