{"id":22303095,"url":"https://github.com/Al-Murphy/MungeSumstats","last_synced_at":"2025-07-29T03:33:41.179Z","repository":{"id":41460223,"uuid":"352932559","full_name":"Al-Murphy/MungeSumstats","owner":"Al-Murphy","description":"Rapid standardisation and quality control of GWAS or QTL summary statistics","archived":false,"fork":false,"pushed_at":"2025-07-14T12:51:43.000Z","size":9814,"stargazers_count":83,"open_issues_count":10,"forks_count":18,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-07-14T16:30:06.823Z","etag":null,"topics":["bioconductor-package","bioinformatics","database-api","genomics","gwas","qtl","r","r-package","standardisation","summary-statistics","vcf-files"],"latest_commit_sha":null,"homepage":"https://al-murphy.github.io/MungeSumstats/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Al-Murphy.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-03-30T08:49:37.000Z","updated_at":"2025-07-14T15:57:46.000Z","dependencies_parsed_at":"2024-08-01T13:03:49.725Z","dependency_job_id":"ed21caa2-d209-4299-b99e-2a15b42ec495","html_url":"https://github.com/Al-Murphy/MungeSumstats","commit_stats":{"total_commits":349,"total_committers":12,"mean_commits":"29.083333333333332","dds":"0.42979942693409745","last_synced_commit":"6b36d67ce8ae8de04f1149ab94f7e3ded47bb5f3"},"previous_names":["al-murphy/mungesumstats"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Al-Murphy/MungeSumstats","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/Al-Murphy%2FMungeSumstats","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Al-Murphy%2FMungeSumstats/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Al-Murphy%2FMungeSumstats/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Al-Murphy%2FMungeSumstats/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Al-Murphy","download_url":"https://codeload.github.com/Al-Murphy/MungeSumstats/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Al-Murphy%2FMungeSumstats/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267623198,"owners_count":24117140,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-29T02:00:12.549Z","response_time":2574,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioconductor-package","bioinformatics","database-api","genomics","gwas","qtl","r","r-package","standardisation","summary-statistics","vcf-files"],"created_at":"2024-12-03T18:42:28.884Z","updated_at":"2025-07-29T03:33:41.170Z","avatar_url":"https://github.com/Al-Murphy.png","language":"R","funding_links":[],"categories":["Genomic data wrangling"],"sub_categories":["Mendelian randomization in _cis_"],"readme":"---\ntitle: \"`MungeSumstats`: Standardise the format of GWAS summary statistics\"\nauthor: 
\"\u003ch5\u003e\u003ci\u003eAuthors\u003c/i\u003e: Alan Murphy, Brian Schilder and Nathan Skene\u003c/h5\u003e\"\ndate: \"\u003ch5\u003e\u003ci\u003eUpdated\u003c/i\u003e: `r format(Sys.Date(), '%b-%d-%Y')`\u003c/h5\u003e\"\nbibliography: vignettes/MungeSumstats.bib\ncsl: vignettes/nature.csl\noutput: rmarkdown::github_document\nvignette: \u003e\n  %\\VignetteIndexEntry{MungeSumstats}\n  %\\VignetteEngine{knitr::rmarkdown}\n  %\\usepackage[utf8]{inputenc}    \n---\n\n\u003c!-- Readme.md is generated from Readme.Rmd. Please edit that file --\u003e\n\n```{r, echo=FALSE}\nknitr::opts_chunk$set(tidy = FALSE,\n                      warning = FALSE, \n                      message = FALSE)\n```\n\n\u003c!-- badges: start --\u003e\n`r badger::badge_bioc_release(color = \"black\")`\n`r badger::badge_github_version(color = \"black\")`\n`r badger::badge_last_commit(branch = \"master\")`\n`r badger::badge_bioc_download(by = \"total\", color = \"blue\")`\n`r badger::badge_license()`\n`r badger::badge_doi(doi = \"https://doi.org/10.1093/bioinformatics/btab665\", color=\"blue\")`\n\u003c!-- badges: end --\u003e\n\u003c!--`r badger::badge_github_actions(action = \"rworkflows\")`--\u003e\n\u003c!--``r badger::badge_codecov(branch = \"master\")` --\u003e\n\n# Introduction\n\nThe `MungeSumstats` package is designed to facilitate the standardisation of GWAS summary statistics. \n\n## Overview\n\nThe package is designed to handle the lack of standardisation of output files by the GWAS community. The [MRC IEU Open GWAS](https://gwas.mrcieu.ac.uk/) team have \nprovided full summary statistics for \u003e10k GWAS, which are API-accessible via the  [`ieugwasr`](https://mrcieu.github.io/ieugwasr/) and [`gwasvcf`](https://github.com/MRCIEU/gwasvcf) packages. 
But these GWAS are only standardised in the sense that they are in VCF format, and can be \nfully standardised with `MungeSumstats`.\n\n`MungeSumstats` provides a framework to standardise the format for any GWAS summary statistics, including those in VCF format, enabling downstream integration and analysis. It addresses the most common discrepancies across summary statistic files, and offers a range of adjustable Quality Control (QC) steps.\n\n## Citation\n\nIf you use `MungeSumstats`, please cite the original authors of the GWAS \nas well as:  \n\n\u003e Alan E Murphy, Brian M Schilder, Nathan G Skene (2021) \nMungeSumstats: A Bioconductor package for the\nstandardisation and quality control of many GWAS summary\nstatistics. \n*Bioinformatics*, btab665, https://doi.org/10.1093/bioinformatics/btab665\n\n\n# Installing `MungeSumstats`\n\n`MungeSumstats` is available on \n[Bioconductor](https://bioconductor.org/packages/MungeSumstats). \nTo install `MungeSumstats` from Bioconductor, run:\n\n```R\nif (!require(\"BiocManager\")) install.packages(\"BiocManager\")\n\nBiocManager::install(\"MungeSumstats\")\n```\n\nYou can then load the package:\n\n```R\nlibrary(MungeSumstats)\n```\n\nNote that there is also a \n[docker image for MungeSumstats](https://hub.docker.com/r/neurogenomicslab/mungesumstats).\n\nNote that for a number of the checks employed by `MungeSumstats` a reference \ngenome is used. 
If your GWAS summary statistics file of interest relates to\n*GRCh38*, you will need to install `SNPlocs.Hsapiens.dbSNP155.GRCh38` and \n`BSgenome.Hsapiens.NCBI.GRCh38` from Bioconductor as follows:\n\n```R\nBiocManager::install(\"SNPlocs.Hsapiens.dbSNP155.GRCh38\")\nBiocManager::install(\"BSgenome.Hsapiens.NCBI.GRCh38\")\n```\n\nIf your GWAS summary statistics file of interest relates to *GRCh37*, you will \nneed to install `SNPlocs.Hsapiens.dbSNP155.GRCh37` and \n`BSgenome.Hsapiens.1000genomes.hs37d5` from Bioconductor as follows:\n\n```R\nBiocManager::install(\"SNPlocs.Hsapiens.dbSNP155.GRCh37\")\nBiocManager::install(\"BSgenome.Hsapiens.1000genomes.hs37d5\")\n```\n\nThese may take some time to install and are not included in the package as some \nusers may only need one of *GRCh37*/*GRCh38*. If you are unsure of the genome \nbuild, MungeSumstats can also infer this information from your data.\n\n# Getting started\n\nSee the [Getting started vignette website](https://al-murphy.github.io/MungeSumstats/articles/MungeSumstats.html)\nfor up-to-date instructions on usage.\n\nSee the [OpenGWAS vignette website](https://al-murphy.github.io/MungeSumstats/articles/OpenGWAS.html)\nfor information on how to use MungeSumstats to access, standardise and perform\nquality control on GWAS Summary Statistics from the MRC IEU [Open GWAS Project](https://gwas.mrcieu.ac.uk/).\n\n**NOTE**: to authenticate, you need to generate a token from the OpenGWAS website. \nThe token behaves like a password, and it will be used to authorise the requests \nyou make to the OpenGWAS API. Here are the steps to generate the token and then \nhave `ieugwasr` automatically use it for your queries:\n  \n1. Log in to https://api.opengwas.io/profile/\n2. Generate a new token\n3. Add `OPENGWAS_JWT=\u003ctoken\u003e` to your .Renviron file; this can be edited in R by \nrunning `usethis::edit_r_environ()`\n4. Restart your R session\n5. 
To check that your token is being recognised, run \n`ieugwasr::get_opengwas_jwt()`. If it returns a long random string then you are \nauthenticated.\n6. To check that your token is working, run `ieugwasr::user()`. It will make a \nrequest to the API for your user information using your token. It should return \na list with your user information. If it returns an error, then your token is \nnot working.\n7. Make sure you have submitted use\n\nPlease read carefully through the [FAQ website](https://github.com/Al-Murphy/MungeSumstats/wiki/FAQ) \nfor any queries about running MungeSumstats. If you have any problems outside of this, \nplease do file an [Issue](https://github.com/al-murphy/MungeSumstats/issues) \nhere on GitHub.\n\n# Future Enhancements\n\nThe `MungeSumstats` package aims to be able to handle the most common\nsummary statistic file formats including VCF. If your file cannot be\nformatted by `MungeSumstats`, feel free to report an [Issue](https://github.com/al-murphy/MungeSumstats/issues) \non GitHub along with your summary statistics file header. \n\nWe also encourage people to edit the code to resolve their particular issues \ntoo and are happy to incorporate these through pull requests on GitHub. If your\nsummary statistic file headers are not recognised by `MungeSumstats` but \ncorrespond to one of \n\n```\nSNP, BP, CHR, A1, A2, P, Z, OR, BETA, LOG_ODDS, SIGNED_SUMSTAT, N, N_CAS, N_CON, \nNSTUDY, INFO or FRQ, \n```\n\nfeel free to update the `data(\"sumstatsColHeaders\")` following the \napproach in the *data.R* file and add your mapping. 
Then use a [Pull Request](https://github.com/al-murphy/MungeSumstats/pulls) on \nGitHub and we will incorporate this change into the package.\n\n# Contributors\n\nWe would like to acknowledge all those who have contributed to `MungeSumstats` \ndevelopment:\n\n * [Alan Murphy](https://github.com/Al-Murphy)\n * [Nathan Skene](https://github.com/NathanSkene)\n * [Brian Schilder](https://github.com/bschilder)\n * [Shea Andrews](https://github.com/sjfandrews)\n * [Jonathan Griffiths](https://github.com/jonathangriffiths)\n * [Kitty Murphy](https://github.com/KittyMurphy)\n * [Mykhaylo Malakhov](https://github.com/MykMal)\n * [Alasdair Warwick](https://github.com/rmgpanw)\n * [Ao Lu](https://github.com/leoarrow1)\n * [Sufyan Sualeman](https://github.com/sufyansuleman)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAl-Murphy%2FMungeSumstats","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FAl-Murphy%2FMungeSumstats","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAl-Murphy%2FMungeSumstats/lists"}