{"id":24527246,"url":"https://github.com/biobakery/maaslin2","last_synced_at":"2025-08-09T16:15:54.969Z","repository":{"id":38215415,"uuid":"186669667","full_name":"biobakery/Maaslin2","owner":"biobakery","description":"MaAsLin2: Microbiome Multivariate Association with Linear Models","archived":false,"fork":false,"pushed_at":"2024-11-11T02:56:50.000Z","size":1243,"stargazers_count":138,"open_issues_count":1,"forks_count":35,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-05-17T12:08:26.108Z","etag":null,"topics":["biobakery","bioconductor","differential-abundance-analysis","false-discovery-rate","metagenomics","microbiome","multiple-covariates","public","repeated-measures","tools"],"latest_commit_sha":null,"homepage":"http://huttenhower.sph.harvard.edu/maaslin2","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/biobakery.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-05-14T17:30:43.000Z","updated_at":"2025-05-15T05:52:45.000Z","dependencies_parsed_at":"2024-11-11T06:01:14.396Z","dependency_job_id":null,"html_url":"https://github.com/biobakery/Maaslin2","commit_stats":{"total_commits":497,"total_committers":18,"mean_commits":27.61111111111111,"dds":0.4386317907444668,"last_synced_commit":"550f3d1812900cd051cd714486dd5bb7d23436ea"},"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/biobakery/Maaslin2","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/biobakery%2FMaaslin2","tags_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/biobakery%2FMaaslin2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/biobakery%2FMaaslin2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/biobakery%2FMaaslin2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/biobakery","download_url":"https://codeload.github.com/biobakery/Maaslin2/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/biobakery%2FMaaslin2/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267177221,"owners_count":24047945,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-26T02:00:08.937Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["biobakery","bioconductor","differential-abundance-analysis","false-discovery-rate","metagenomics","microbiome","multiple-covariates","public","repeated-measures","tools"],"created_at":"2025-01-22T06:17:45.628Z","updated_at":"2025-08-09T16:15:54.909Z","avatar_url":"https://github.com/biobakery.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\n# MaAsLin2 User Manual #\n\nMaAsLin2 is the next generation of MaAsLin (Microbiome Multivariable Association with Linear Models).\n\n[MaAsLin2](http://huttenhower.sph.harvard.edu/maaslin2) is a comprehensive R package for efficiently determining 
multivariable association between clinical metadata and microbial meta-omics features. MaAsLin2 relies on general linear models to accommodate most modern epidemiological study designs, including cross-sectional and longitudinal, along with a variety of filtering, normalization, and transform methods.\n\nIf you use the MaAsLin2 software, please cite our manuscript: \n\nMallick H, Rahnavard A, McIver LJ, Ma S, Zhang Y, Nguyen LH, Tickle TL, Weingart G, Ren B, Schwager EH, Chatterjee S, Thompson KN, Wilkinson JE, Subramanian A, Lu Y, Waldron L, Paulson JN, Franzosa EA, Bravo HC, Huttenhower C (2021). [Multivariable Association Discovery in Population-scale Meta-omics Studies](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009442). PLoS Computational Biology, 17(11):e1009442.\n\nCheck out the [MaAsLin 2 tutorial](https://github.com/biobakery/biobakery/wiki/maaslin2) for an overview of analysis options.\n\nIf you have questions, please direct them to:   \n[MaAsLin2 Forum](https://forum.biobakery.org/c/Downstream-analysis-and-statistics/MaAsLin2)    \n[Google Groups](https://groups.google.com/forum/#!forum/maaslin-users) (Read only)\n\n![](https://github.com/biobakery/Maaslin2/workflows/build%20and%20test/badge.svg)\n\n\u003ca href=\"http://www.bioconductor.org/packages/devel/bioc/html/Maaslin2.html#since\"\u003e\u003cimg border=\"0\" src=\"http://www.bioconductor.org/shields/years-in-bioc/Maaslin2.svg\" title=\"How long since the package was first in a released Bioconductor version (or is it in devel only).\"\u003e\u003c/a\u003e \n\n--------------------------------------------\n\n## Contents ##\n* [Description](#description)\n* [Requirements](#requirements)\n* [Installation](#installation)\n* [How to Run](#how-to-run)\n    * [Input Files](#input-files)\n    * [Output Files](#output-files)\n    * [Run a Demo](#run-a-demo)\n    * [Options](#options)\n* [Visualization](#visualization)\n* [Troubleshooting](#troubleshooting)\n\n## Description 
##\n\nMaAsLin2 finds associations between microbiome multi-omics features and complex metadata in population-scale epidemiological studies. The software includes multiple analysis methods (including support for multiple covariates and repeated measures), filtering, normalization, and transform options to customize analysis for your specific study. \n\n## Requirements ##\n\nMaAsLin2 is an R package that can be run on the command line or as an R function.\n\n## Installation ##\n\nMaAsLin2 can be run from the command line or as an R function.\n\nIf only running from the command line, you do not need to install the MaAsLin2 package but you will need to install the MaAsLin2 dependencies.\n\n### From command line ###\n\n1. Download the source: [MaAsLin2.master.zip](https://github.com/biobakery/Maaslin2/archive/master.zip)\n2. Decompress the download: \n    * ``$ unzip master.zip``\n3. Install the Bioconductor dependencies edgeR and metagenomeSeq. \n4. Install the CRAN dependencies:\n    * ``$ R -q -e \"install.packages(c('lmerTest','pbapply','car','dplyr','vegan','chemometrics','ggplot2','pheatmap','hash','logging','data.table','glmmTMB','MASS','cplm','pscl'), repos='http://cran.r-project.org')\"``\n5. Install the MaAsLin2 package (only required if running as an R function): \n    * ``$ R CMD INSTALL maaslin2``\n\n### From R ###\n\nTo install the latest release version of MaAsLin 2:\n\n```{r, eval=FALSE}\nif(!requireNamespace(\"BiocManager\", quietly = TRUE))\n    install.packages(\"BiocManager\")\nBiocManager::install(\"Maaslin2\")\n```\n\nTo install the latest development version of MaAsLin 2:\n\n```{r, eval=FALSE}\ninstall.packages(\"devtools\")\nlibrary(\"devtools\")\ninstall_github(\"biobakery/Maaslin2\")\n```\n\n## How to Run ##\n\nMaAsLin2 can be run from the command line or as an R function. Both \nmethods require the same arguments, have the same options, \nand use the same default settings.\n\n### Input Files ###\n\nMaAsLin2 requires two input files.\n\n1. 
Data (or features) file\n    * This file is tab-delimited.\n    * Formatted with features as columns and samples as rows.\n    * The transpose of this format is also okay.\n    * Possible features in this file include taxonomy or genes.\n2. Metadata file\n    * This file is tab-delimited.\n    * Formatted with metadata as columns and samples as rows.\n    * The transpose of this format is also okay.\n    * Possible metadata in this file include gender or age.\n\nThe data file can contain samples not included in the metadata file\n(along with the reverse case). For both cases, those samples not \nincluded in both files will be removed from the analysis. \nAlso, the samples do not need to be in the same order in the two files.\n\nNOTE: If running MaAsLin2 as a function, the data and metadata \ninputs can be of type ``data.frame`` instead of a path to a file.\n\n### Output Files ###\n\nMaAsLin2 generates two types of output files: data and visualization.\n\n1. Data output files\n    * ``all_results.tsv``\n        * This includes the same data as the data.frame returned.\n        * This file contains all results ordered by increasing q-value.\n        * The first columns are the metadata and feature names.\n        * The next two columns are the value and coefficient from the model.\n        * The next column is the standard deviation from the model.\n        * The ``N`` column is the total number of data points.\n        * The ``N.not.zero`` column is the total of non-zero data points.\n        * The pvalue from the calculation is the second to last column.\n        * The qvalue is computed with `p.adjust` with the correction method.\n    * ``significant_results.tsv``\n        * This file is a subset of the results in the first file.\n        * It only includes associations with q-values \u003c= the threshold.\n    * ``features``\n        * This folder includes the filtered, normalized, and transformed versions of the input feature table.\n        * These steps are 
performed sequentially in the above order.\n        * If an option is set such that a step does not change the data, the resulting table will still be output.\n    * ``models.rds``\n        * This file contains a list with every model fit object.\n        * It will only be generated if save_models is set to TRUE.\n    * ``residuals.rds``\n        * This file contains a data frame with residuals for each feature.\n    * ``fitted.rds``\n        * This file contains a data frame with fitted values for each feature.\n    * ``ranef.rds``\n        * This file contains a data frame with extracted random effects for each feature (if random effects are specified).\n    * ``maaslin2.log``\n        * This file contains all log information for the run.\n        * It includes all settings, warnings, errors, and steps run.\n2. Visualization output files\n    * ``heatmap.pdf``\n        * This file contains a heatmap of the significant associations.\n    * ``[a-z/0-9]+.pdf``\n        * A plot is generated for each significant association.\n        * Scatter plots are used for continuous metadata.\n        * Box plots are for categorical data.\n        * Data points plotted are after filtering but prior to normalization and transform.\n\n### Run a Demo ###\n\nExample input files can be found in the ``inst/extdata`` folder \nof the MaAsLin2 source. The files provided were generated from\nthe HMP2 data which can be downloaded from https://ibdmdb.org/ .\n\n``HMP2_taxonomy.tsv``: is a tab-delimited file with species as columns and samples as rows. It is a subset of the taxonomy file so it just includes the species abundances for all samples.\n\n``HMP2_metadata.tsv``: is a tab-delimited file with samples as rows and metadata as columns. 
It is a subset of the metadata file so that it just includes some of the fields.\n\n\n#### Command line ####\n\n``$ Maaslin2.R --fixed_effects=\"diagnosis,dysbiosisnonIBD,dysbiosisUC,dysbiosisCD,antibiotics,age\" --random_effects=\"site,subject\" --standardize=FALSE inst/extdata/HMP2_taxonomy.tsv inst/extdata/HMP2_metadata.tsv demo_output``\n\n* Make sure to provide the full path to the MaAsLin2 executable (i.e., ./R/Maaslin2.R).\n* In the demo command:\n    * ``HMP2_taxonomy.tsv`` is the path to your data (or features) file\n    * ``HMP2_metadata.tsv`` is the path to your metadata file\n    * ``demo_output`` is the path to the folder to write the output\n\n\n#### In R ####\n\n```{r}\nlibrary(Maaslin2)\ninput_data \u003c- system.file(\n    'extdata','HMP2_taxonomy.tsv', package=\"Maaslin2\")\ninput_metadata \u003c- system.file(\n    'extdata','HMP2_metadata.tsv', package=\"Maaslin2\")\nfit_data \u003c- Maaslin2(\n  input_data, input_metadata, 'demo_output',\n  fixed_effects = c('diagnosis', 'dysbiosisnonIBD','dysbiosisUC','dysbiosisCD', 'antibiotics', 'age'),\n  random_effects = c('site', 'subject'),\n  reference=c(\"diagnosis,CD\"),\n  standardize = FALSE, cores=1)\n```\n\n##### Session Info #####\n\nSession info from running the demo in R can be displayed with the following command.\n\n```{r}\nsessionInfo()\n```\n\n### Options ###\n\nRun MaAsLin2 help to print a list of the options and the default settings.\n\n\n$ Maaslin2.R --help\nUsage: ./R/Maaslin2.R [options] \u003cdata.tsv\u003e \u003cmetadata.tsv\u003e \u003coutput_folder\u003e\n\n\nOptions:\n    -h, --help\n        Show this help message and exit\n\n    -a MIN_ABUNDANCE, --min_abundance=MIN_ABUNDANCE\n        The minimum abundance for each feature [ Default: 0 ]   \n\n    -p MIN_PREVALENCE, --min_prevalence=MIN_PREVALENCE\n        The minimum percent of samples for which a feature \n        is detected at minimum abundance [ Default: 0.1 ]\n\n    -b MIN_VARIANCE, --min_variance=MIN_VARIANCE\n       Keep 
features with variance greater than [ Default: 0.0 ]\n\n    -s MAX_SIGNIFICANCE, --max_significance=MAX_SIGNIFICANCE\n        The q-value threshold for significance [ Default: 0.25 ]\n\n    -n NORMALIZATION, --normalization=NORMALIZATION\n        The normalization method to apply [ Default: TSS ]\n        [ Choices: TSS, CLR, CSS, NONE, TMM ]\n\n    -t TRANSFORM, --transform=TRANSFORM\n        The transform to apply [ Default: LOG ]\n        [ Choices: LOG, LOGIT, AST, NONE ]\n\n    -m ANALYSIS_METHOD, --analysis_method=ANALYSIS_METHOD\n        The analysis method to apply [ Default: LM ]\n        [ Choices: LM, CPLM, NEGBIN, ZINB ]\n\n    -r RANDOM_EFFECTS, --random_effects=RANDOM_EFFECTS\n        The random effects for the model, comma-delimited\n        for multiple effects [ Default: none ]\n\n    -f FIXED_EFFECTS, --fixed_effects=FIXED_EFFECTS\n        The fixed effects for the model, comma-delimited\n        for multiple effects [ Default: all ]\n\n    -c CORRECTION, --correction=CORRECTION\n        The correction method for computing the \n        q-value [ Default: BH ]\n\n    -z STANDARDIZE, --standardize=STANDARDIZE\n        Apply z-score so continuous metadata are \n        on the same scale [ Default: TRUE ]\n\n    -l PLOT_HEATMAP, --plot_heatmap=PLOT_HEATMAP\n        Generate a heatmap for the significant \n        associations [ Default: TRUE ]\n\n    -i HEATMAP_FIRST_N, --heatmap_first_n=HEATMAP_FIRST_N\n        In heatmap, plot top N features with significant \n        associations [ Default: 50 ]\n\n    -o PLOT_SCATTER, --plot_scatter=PLOT_SCATTER\n        Generate scatter plots for the significant\n        associations [ Default: TRUE ]\n        \n    -g MAX_PNGS, --max_pngs=MAX_PNGS\n        The maximum number of scatter plots for significant associations \n        to save as png files [ Default: 10 ]\n    \n    -O SAVE_SCATTER, --save_scatter=SAVE_SCATTER\n        Save all scatter plot ggplot objects\n        to an RData file [ Default: FALSE 
]\n\n    -e CORES, --cores=CORES\n        The number of R processes to run in parallel\n        [ Default: 1 ]\n        \n    -j SAVE_MODELS, --save_models=SAVE_MODELS\n        Return the full model outputs and save to an RData file\n        [ Default: FALSE ]\n    \n    -d REFERENCE, --reference=REFERENCE\n        The factor to use as a reference level for a categorical variable \n        provided as a string of 'variable,reference', semi-colon delimited for \n        multiple variables. Not required if metadata is passed as a factor or \n        for variables with less than two levels but can be set regardless.\n        [ Default: NA ] \n\n## Contributions ##\nThanks go to these wonderful people:\n\n- Nick Waters \u003cnickp60@gmail.com\u003e\n    * Design of the PR and attribution process\n\n## Troubleshooting ##\n\n1. Question: When I run from the command line I see the error ``Maaslin2.R: command not found``. How do I fix this? \n    * Answer: Provide the full path to the executable when running Maaslin2.R.\n2. Question: When I run as a function I see the error ``Error in library(Maaslin2): there is no package called 'Maaslin2'``. How do I fix this? \n    * Answer: Install the R package and then try loading the library again.\n3. Question: When I try to install the R package I see errors about dependencies not being installed. Why is this?\n    * Answer: Installing the R package will not automatically install the packages MaAsLin2 requires. Please install the dependencies and then install the MaAsLin2 R package.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbiobakery%2Fmaaslin2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbiobakery%2Fmaaslin2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbiobakery%2Fmaaslin2/lists"}