{"id":22303098,"url":"https://github.com/Ararder/tidyGWAS","last_synced_at":"2025-07-29T03:33:45.236Z","repository":{"id":185053376,"uuid":"672908506","full_name":"Ararder/tidyGWAS","owner":"Ararder","description":"Clean and identify errors in GWAS summary statistics","archived":false,"fork":false,"pushed_at":"2025-07-11T17:57:14.000Z","size":38704,"stargazers_count":9,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-11T19:34:38.455Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://ararder.github.io/tidyGWAS/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Ararder.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-07-31T12:45:05.000Z","updated_at":"2025-07-11T17:54:59.000Z","dependencies_parsed_at":"2024-09-11T20:37:56.354Z","dependency_job_id":"f36ab4d2-7e02-4523-9162-c4d8ea023914","html_url":"https://github.com/Ararder/tidyGWAS","commit_stats":null,"previous_names":["ararder/tidygwas"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Ararder/tidyGWAS","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ararder%2FtidyGWAS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ararder%2FtidyGWAS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ararder%2FtidyGWAS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ararder%2FtidyGWAS/manifests","owner_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/owners/Ararder","download_url":"https://codeload.github.com/Ararder/tidyGWAS/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ararder%2FtidyGWAS/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267623217,"owners_count":24117146,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-29T02:00:12.549Z","response_time":2574,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-03T18:42:28.937Z","updated_at":"2025-07-29T03:33:43.992Z","avatar_url":"https://github.com/Ararder.png","language":"R","funding_links":[],"categories":["Genomic data wrangling"],"sub_categories":["Mendelian randomization in _cis_"],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n# tidyGWAS\n\n\u003c!-- badges: start --\u003e\n\n[![Codecov test coverage](https://codecov.io/gh/Ararder/tidyGWAS/branch/main/graph/badge.svg)](https://app.codecov.io/gh/Ararder/tidyGWAS?branch=main)\n\u003c!-- badges: end --\u003e\n\n\nGenome-wide summary statistics are becoming a staple in many different genetics and genomics\nanalysis pipelines. 
Often, the filters recommended for each pipeline differ,\nrequiring each pipeline to have a step where summary statistics are \"munged\".\n\ntidyGWAS aims to provide a standardized format *before* any pipeline-specific munging is done.\nWith that in mind, tidyGWAS is conservative in removing rows, and\nby default keeps both indels and multi-allelic variants.\n\n`tidyGWAS` does the following:\n\n1.  Detection of duplicated rows (based on RSID_REF_ALT or CHR_POS_REF_ALT)\n\n2.  Standardized column names\n\n3.  Automatic updating of [merged](https://www.ncbi.nlm.nih.gov/books/NBK573473/) RSIDs\n\n4.  Detection and optional removal of deletions/insertions (\"indels\")\n\n5.  Detection of non-rsID values in the RSID column, and automatic parsing of the common CHR:POS or CHR:POS:REF:ALT format\n\n6.  Standardization of CHR values (e.g. \"23\" -\\\u003e \"X\", \"chr1\" -\\\u003e \"1\")\n\n7.  Validation of standard GWAS columns: B, SE, P, N, FREQ, Z, CaseN, ControlN, A1, A2\n\n    1.  Extremely small p-values are by default converted to 2.225074e-308 (the smallest positive normalized double in R, `.Machine$double.xmin`)\n\n8.  Imputation of missing columns: RSID from CHR:POS, or CHR:POS from RSID. Any of B, SE, P, Z are imputed when missing and derivable from the available columns\n\n9.  Validation of CHR:POS:RSID by matching with dbSNP v.155\n\n10. 
Cleaned sumstats are provided with coordinates on both GRCh37 and GRCh38, with TRUE/FALSE flags for indels and variants that are multi-allelic in the dataset\n\nFrom working with standardized GWAS formats, we've found that having both GRCh37 and GRCh38 coordinates, together with standardized column names, significantly speeds up downstream analysis.\n\nThe computationally intensive part of aligning summary statistics with dbSNP 155 (\\\u003e 940 million rows) for both GRCh37 and GRCh38 (in total 1.8 billion rows) is implemented using the [Apache Arrow R](https://arrow.apache.org/docs/r/) package, allowing the full function to run in \\\u003c5 minutes, using less than 16 GB, on \\~7 million rows on a MacBook Pro M2.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FArarder%2FtidyGWAS","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FArarder%2FtidyGWAS","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FArarder%2FtidyGWAS/lists"}