{"id":16320100,"url":"https://github.com/helenalc/muscat-comparison","last_synced_at":"2026-01-23T13:46:56.234Z","repository":{"id":82293726,"uuid":"175012554","full_name":"HelenaLC/muscat-comparison","owner":"HelenaLC","description":null,"archived":false,"fork":false,"pushed_at":"2024-05-14T19:09:35.000Z","size":114164,"stargazers_count":12,"open_issues_count":3,"forks_count":9,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-05-13T14:31:04.475Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HelenaLC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-03-11T14:03:03.000Z","updated_at":"2025-01-05T15:13:43.000Z","dependencies_parsed_at":null,"dependency_job_id":"d96d76f1-0442-40c9-ae60-a4854da20fdd","html_url":"https://github.com/HelenaLC/muscat-comparison","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/HelenaLC/muscat-comparison","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HelenaLC%2Fmuscat-comparison","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HelenaLC%2Fmuscat-comparison/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HelenaLC%2Fmuscat-comparison/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HelenaLC%2Fmuscat-comparison/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HelenaLC","download_url":"https:/
/codeload.github.com/HelenaLC/muscat-comparison/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HelenaLC%2Fmuscat-comparison/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28693331,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-23T11:01:27.039Z","status":"ssl_error","status_checked_at":"2026-01-23T11:00:26.909Z","response_time":59,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-10T22:29:05.089Z","updated_at":"2026-01-23T13:46:56.210Z","avatar_url":"https://github.com/HelenaLC.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# On the discovery of population-specific state transitions \u003cbr\u003e from multi-sample multi-condition scRNA-seq data\n\nThis repository contains all the necessary code to perform the evaluations and analyses from our preprint available on [bioRxiv](https://www.biorxiv.org/content/10.1101/713412v1).\n\nAnalyses discussed in the **Differential state analysis of mouse cortex exposed to LPS treatment** results section are provided as a browsable `workflowr`\u003csup\u003e[1](#f1)\u003c/sup\u003e website [HERE](http://htmlpreview.github.io/?https://github.com/HelenaLC/muscat-comparison/blob/master/LPS/docs/index.html).\n\n## 
Prerequisites\n\nTo install the required libraries, we'll first install the `BiocManager` package:\n\n```r\ninstall.packages(\"BiocManager\")\n```\n\nThe code in this repository was developed using **R v3.6.2** and **Bioconductor v3.10**. To check which versions of R and Bioconductor are currently in use, run:\n\n```r\nversion\nBiocManager::version()\n```\n\nFinally, the code chunk below will install all package dependencies:\n\n```r\n# install 'ggrastr' from GitHub\nBiocManager::install(\"VPetukhov/ggrastr\")\n\n# install packages from CRAN \u0026 Bioconductor\npkgs \u003c- c(\"AnnotationDbi\",\"circlize\",\"countsimQC\",\"cowplot\",\"data.table\",\n    \"DESeq2\",\"DropletUtils\",\"dplyr\",\"edgeR\",\"ggplot2\",\"iCOBRA\",\"kSamples\",\n    \"jsonlite\",\"limma\",\"M3C\",\"magrittr\",\"MAST\",\"Matrix\",\"muscat\",\"msigdbr\",\n    \"org.Mm.eg.db\",\"pheatmap\",\"purrr\",\"RColorBrewer\",\"readxl\",\"reshape2\",\n    \"S4Vectors\",\"scater\",\"scDD\",\"scds\",\"scran\",\"sctransform\",\"Seurat\",\n    \"SingleCellExperiment\",\"topGO\",\"UpSetR\",\"viridis\",\"workflowr\",\"yaml\")\nBiocManager::install(pkgs, ask = FALSE)\n```\n\n## Setup\n\nThe R version and library have to be specified under `R` in the `config.yaml` file (e.g., `R: \"R_LIBS_USER=/path/to/library /path/to/R/executable\"`). If you run into any issues, I recommend running the specified character string from the command line and making sure that the outputs of `version` and `.libPaths()` are what you expect them to be.\n\nWithout modifications, the `Snakemake` comparison relies on 2 reference datasets for data simulation, method execution and comparison. These (or any other references) have to be downloaded, saved as `.rds` objects with appropriate names, and placed inside the `data/raw_data` directory. This is exemplified here for the Kang et al. 
reference used in the preprint:\n\n```r\n# install \u0026 load 'ExperimentHub'\nBiocManager::install(\"ExperimentHub\")\nlibrary(ExperimentHub)\n\n# initialize hub instance\neh \u003c- ExperimentHub()\n\n# list data available in 'muscData'\n(q \u003c- query(eh, c(\"Kang\", \"muscData\")))\n\n# load 'SingleCellExperiment's using IDs from above\nsce \u003c- eh[[q$ah_id]]\n\n# save as .rds; name should be '\u003cid\u003e_sce0.rds'\nfn \u003c- \"kang_sce0.rds\"\ndir \u003c- file.path(\"...\", \"data\", \"raw_data\")\nsaveRDS(sce, file.path(dir, fn))\n```\n\nFinally, execution of the `Snakemake` file requires running the `setup.R` script **once** to create all required directories as well as simulation, method and run parameters:\n\n```r\n# from within R\nsource(\"setup.R\")\n\n# from the terminal\nRscript setup.R\n```\n\nThe `Snakemake` workflow should now run. A couple more points to note:\n\n1. The `sim/run/meth_pars.R` scripts in `scripts` are re-executed with every `Snakemake` run, so any changes made to them are picked up automatically (e.g., when a new simulation scenario or method is added).\n1. Running the whole workflow is computationally expensive (~3 days using 40 cores). For development purposes, we recommend limiting it to 1 reference, fewer simulation replicates and/or fewer genes per simulation. Most importantly, at least initially, including one or no mixed model based methods will greatly speed things up!\n\n## How to...\n\n1. **add a new reference**\n    * `\u003cid\u003e_sce0.rds` has to be in place as described above\n    * `\"\u003cid\u003e\"` has to be added under `dids` in the `config.yaml` file\n    * a `scripts/prep_\u003cid\u003e.R` script has to be added to, for example, ensure that unique sample identifiers exist and remove unassigned cells or cell multiplets\n1. **skip an existing reference**\n    * simply remove the corresponding ID under `dids` in `config.yaml`\n1. 
**add a new method**\n    * for a single method, add a new line (with unique identifier) for that method in the corresponding `data.frame` constructed in `scripts/meth_pars.R`\n    * for a new group of methods, add `id` under `ids` in the first line of `scripts/meth_pars.R` and code to construct a `data.frame` of appropriate format (must include an `id` column; see current methods for examples). Secondly, add an `apply_\u003cid\u003e.R` script under `scripts` that takes as input a SCE with `colData` columns `cluster/sample/group_id` and returns a `data.frame` with `p_adj.loc` and `p_adj.glb` values for each cluster-gene (see current `apply_x.R` scripts for examples)\n1. **skip an existing method**\n    * to exclude a group of methods, comment out the corresponding `ids` in the first line of `scripts/meth_pars.R` (e.g., to skip all mixed model based methods, one would comment out `\"mm\"`)\n    * to exclude a single method, remove that method from the corresponding `data.frame` in `scripts/meth_pars.R`\n\n***\n\n### Workflow structure in detail\n\nIn brief, our `Snakemake` workflow for method comparison is organized into\n\n- a `config.yaml` file specifying key parameters and directories\n- a `scripts` folder housing all utilized scripts (see below)\n- a `data` folder containing raw (reference) and simulated data\n- a `meta` folder for simulation, runmode, and method parameters\n- a `results` folder where all results are generated (as `.rds` files)\n- a `plots` folder where all output plots are generated  \n(as `.pdf` or `.png` and `.rds` files for `ggplot` objects)\n\nThe table below summarizes the different R scripts in `scripts`:\n\nscript      | description \n:-----------|:-----------------------------------------------\n`prep_X`    | generates a reference SCE for simulation by\u003cbr\u003ei) keeping samples from one condition only; and,\u003cbr\u003eii) unifying relevant cell metadata names to `\"cluster/sample/group_id\"`\n`prep_sim` | prepares a reference SCE for 
simulation by\u003cbr\u003ei) retaining subpopulation-sample combinations with at least 100 cells; and,\u003cbr\u003eii) estimating cell / gene parameters (offsets / coefficients and dispersions)\n`sim_pars`  | for ea. simulation ID, generates a `.json` file in `meta/sim_pars`\u003cbr\u003ethat specifies simulation parameters (e.g., prob. of DS, nb. of simulation replicates)\n`run_pars`  | for ea. reference and simulation ID, generates a `.json` file in `meta/run_pars`\u003cbr\u003ethat specifies runmode parameters (e.g., nb. of cells/genes to sample, nb. of run replicates) \n`meth_pars` | for ea. method ID, generates a `.json` file in `meta/meth_pars`\u003cbr\u003ethat specifies method parameters\n`sim_data`  | provided with a reference dataset and simulation parameters,\u003cbr\u003esimulates data and writes a SCE to `data/sim_data`\n`apply_X`   | wrapper to run DS method of type X (`pb`, `mm`, `ad`, `mast`, `scdd`)\n`run_meth`  | reads in simulated data, method parameters, and performs DS analysis\u003cbr\u003eby running the corresponding `apply_X` script\n`run_meth_lps` | wrapper to apply method to the LPS dataset\n`plot_null` | for ea. reference ID, plots nominal p-value distributions for all null simulations\n`plot_perf_cat`     | plots TPR-FDR-points across DD categories for ea. p-value adjustment type (`p_adj.loc/glb`)\n`plot_perf_by_nx`   | plots TPR-FDR-points across the nb. of `x` (cells = `c`, samples = `s`)\n`plot_perf_by_xs`   | plots TPR-FDR-points across increasingly unbalanced sample/group-sizes\n`plot_perf_by_expr` | plots TPR-FDR-points across expression-level groups\n`plot_upset`        | plots an upset plot for the top gene-subpopulation combinations across methods and simulation replications\n`plot_lfc`          | scatter plots of simulated vs. 
estimated logFC stratified by method and DD category\n`plot_pb_mean_disp` | provided with a reference dataset, simulates a null dataset (no DS, no type-genes)\u003cbr\u003eand plots pseudobulk-level mean-dispersion estimates for simulated vs. reference data\n`plot_runtimes`     | barplots of runtimes vs. nb. of genes/cells\n`utils`        | various helpers for data handling, formatting, and plotting\n`session_info` | generates a `.txt` file capturing the output of `session_info()`\n\n### References\n\n\u003ca name=\"f1\"\u003e[1]\u003c/a\u003e:\nJohn Blischak, Peter Carbonetto and Matthew Stephens (2019).  \nworkflowr: A Framework for Reproducible and Collaborative Data Science.  \nR package version 1.4.0. https://CRAN.R-project.org/package=workflowr","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhelenalc%2Fmuscat-comparison","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhelenalc%2Fmuscat-comparison","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhelenalc%2Fmuscat-comparison/lists"}