{"id":32110815,"url":"https://github.com/abbvie-computationalgenomics/saigegds","last_synced_at":"2026-02-21T17:02:28.929Z","repository":{"id":80789872,"uuid":"199720416","full_name":"AbbVie-ComputationalGenomics/SAIGEgds","owner":"AbbVie-ComputationalGenomics","description":"Scalable Implementation of generalized mixed models using GDS files in Phenome-Wide Association Studies","archived":false,"fork":false,"pushed_at":"2023-03-30T23:13:52.000Z","size":2739,"stargazers_count":7,"open_issues_count":3,"forks_count":4,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-10-22T10:38:32.697Z","etag":null,"topics":["gds","gwas","mixed-model","phewas"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AbbVie-ComputationalGenomics.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-07-30T20:11:22.000Z","updated_at":"2023-09-06T19:28:58.000Z","dependencies_parsed_at":"2023-10-20T16:14:06.557Z","dependency_job_id":null,"html_url":"https://github.com/AbbVie-ComputationalGenomics/SAIGEgds","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AbbVie-ComputationalGenomics/SAIGEgds","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AbbVie-ComputationalGenomics%2FSAIGEgds","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AbbVie-ComputationalGenomics%2FSAIGEgds/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AbbVie-ComputationalGenomics%2FSAIGEgds/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/repositories/AbbVie-ComputationalGenomics%2FSAIGEgds/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AbbVie-ComputationalGenomics","download_url":"https://codeload.github.com/AbbVie-ComputationalGenomics/SAIGEgds/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AbbVie-ComputationalGenomics%2FSAIGEgds/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29688216,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-21T15:51:39.154Z","status":"ssl_error","status_checked_at":"2026-02-21T15:49:03.425Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gds","gwas","mixed-model","phewas"],"created_at":"2025-10-20T14:04:03.495Z","updated_at":"2026-02-21T17:02:28.923Z","avatar_url":"https://github.com/AbbVie-ComputationalGenomics.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"SAIGEgds: Scalable Implementation of Generalized mixed models in PheWAS using GDS files\n====\n\n![GPLv3](http://www.gnu.org/graphics/gplv3-88x31.png)\n[GNU General Public License, GPLv3](http://www.gnu.org/copyleft/gpl.html)\n\n\n## Features\n\nScalable implementation of generalized mixed models with the support of Genomic Data Structure ([GDS](https://github.com/zhengxwen/SeqArray)) 
files and a highly optimized C++ implementation. It is designed for single-variant tests in large-scale phenome-wide association studies (PheWAS) with millions of variants and hundreds of thousands of samples (e.g., [UK Biobank genotype data](https://www.ukbiobank.ac.uk/scientists-3/genetic-data)), controlling for case-control imbalance and sample structure.\n\nThe implementation of SAIGEgds is based on the original [SAIGE](https://github.com/weizhouUMICH/SAIGE) R package (v0.29.4.4) [Zhou et al. 2018]. It is implemented in optimized C++ code that takes advantage of the sparse structure of genotypes. All single-precision floating-point calculations in [SAIGE](https://github.com/weizhouUMICH/SAIGE) are replaced by double-precision calculations in SAIGEgds. SAIGEgds also implements some of the [SPAtest](https://cran.r-project.org/web/packages/SPAtest/index.html) functions in C to speed up the saddlepoint approximation.\n\nBenchmarks using the UK Biobank White British genotype data (N=430K) with coronary heart disease and simulated cases show that SAIGEgds is 5 to 6 times faster than the SAIGE R package in the steps of fitting null models and calculating p-values. 
When used in conjunction with high-performance computing (HPC) clusters and/or cloud resources, SAIGEgds provides an efficient analysis pipeline for biobank-scale PheWAS.\n\n\n## Bioconductor\n\nRelease Version: v1.12.1 ([http://www.bioconductor.org/packages/SAIGEgds](http://www.bioconductor.org/packages/SAIGEgds))\n\n* [Help Documents](https://rdrr.io/bioc/SAIGEgds/man)\n* [Tutorial](http://www.bioconductor.org/packages/release/bioc/vignettes/SAIGEgds/inst/doc/SAIGEgds.html)\n* [News](http://www.bioconductor.org/packages/release/bioc/news/SAIGEgds/NEWS)\n\n\n## Package Maintainer\n\n[Xiuwen Zheng](mailto:xiuwen.zheng@abbvie.com)\n\n\n## Installation\n\n* Requires R (≥ v3.5.0), [gdsfmt](http://www.bioconductor.org/packages/gdsfmt) (≥ v1.20.0), [SeqArray](http://www.bioconductor.org/packages/SeqArray) (≥ v1.32.0)\n\n* Recommended compiler: [GNU GCC (≥ v6.0)](https://gcc.gnu.org); C++11 support is required\n\n* Bioconductor repository\n```R\nif (!requireNamespace(\"BiocManager\", quietly=TRUE))\n    install.packages(\"BiocManager\")\nBiocManager::install(\"SAIGEgds\")\n```\nThe `BiocManager::install()` approach may require that you build from source, i.e., 
`make` and compilers must be installed on your system -- see the [R FAQ](http://cran.r-project.org/faqs.html) for your operating system; you may also need to install dependencies manually.\n\n* Development version from GitHub (for developers/testers only)\n```R\nlibrary(\"devtools\")\ninstall_github(\"AbbVie-ComputationalGenomics/SAIGEgds\")\n```\n\n\n## Package vignette\n\nIf the package is installed from the Bioconductor repository or rebuilt from source, users can start R and enter the following to view the documentation:\n```R\nbrowseVignettes(\"SAIGEgds\")\n```\n\n\n## Examples\n\n```R\nlibrary(SeqArray)\nlibrary(SAIGEgds)\n\n# open the GDS file for genetic relationship matrix (GRM)\ngrm_fn \u003c- system.file(\"extdata\", \"grm1k_10k_snp.gds\", package=\"SAIGEgds\")\n(grm_gds \u003c- seqOpen(grm_fn))\n\n# load phenotype\nphenofn \u003c- system.file(\"extdata\", \"pheno.txt.gz\", package=\"SAIGEgds\")\npheno \u003c- read.table(phenofn, header=TRUE, as.is=TRUE)\nhead(pheno)\n##   sample.id y     yy      x1 x2\n## 1        s1 0 4.5542  1.5118  1\n## 2        s2 0 3.7941  0.3898  1\n## 3        s3 0 5.0411 -0.6212  1\n## ...\n\n# fit the null model\nglmm \u003c- seqFitNullGLMM_SPA(y ~ x1 + x2, pheno, grm_gds, trait.type=\"binary\",\n    sample.col=\"sample.id\", num.thread=2)\n## SAIGE association analysis:\n## Filtering variants:\n## [==================================================] 100%, completed, 0s\n## Fit the null model: y ~ x1 + x2 + var(GRM)\n##     # of samples: 1,000\n##     # of variants: 9,976\n##     using 2 threads\n## ...\n\n# close the file\nseqClose(grm_gds)\n\n\n\n################################\n\n# open the GDS file for association testing\ngeno_fn \u003c- system.file(\"extdata\", \"assoc_100snp.gds\", package=\"SAIGEgds\")\n(geno_gds \u003c- seqOpen(geno_fn))\n## File: assoc_100snp.gds (10.5K)\n## +    [  ] *\n## |--+ description   [  ] *\n## |--+ sample.id   { Str8 1000 LZMA_ra(12.6%), 625B }\n## |--+ variant.id   { Int32 100 LZMA_ra(48.5%), 201B } *\n## 
...\n\n\n# p-value calculation\nassoc \u003c- seqAssocGLMM_SPA(geno_gds, glmm, mac=10, parallel=2)\n## SAIGE association analysis:\n##     # of samples: 1,000\n##     # of variants: 100\n##     MAF threshold: NaN\n##     MAC threshold: 10\n##     missing threshold for variants: 0.1\n##     p-value threshold for SPA adjustment: 0.05\n##     variance ratio for approximation: 0.9391186\n##     # of processes: 2\n## [==================================================] 100%, completed, 0s\n## # of variants after filtering by MAF, MAC and missing thresholds: 38\n## Done.\n\nhead(assoc)\n##   id chr pos rs.id ref alt AF.alt mac  num       beta        SE      pval pval.noadj converged\n## 1  4   1   4   rs4   A   C 0.0100  20 1000  -0.074992  0.791685  0.924533   0.924533      TRUE\n## 2 12   1  12  rs12   A   C 0.0150  30 1000  -0.091001  0.657140  0.889861   0.889861      TRUE\n## 3 14   1  14  rs14   A   C 0.0375  75 1000  -0.075455  0.434152  0.862023   0.862023      TRUE\n## ...\n\n\n# close the file\nseqClose(geno_gds)\n```\n\n\n## Citations\n\nZheng X, Davis J.Wade. SAIGEgds -- an efficient statistical tool for large-scale PheWAS with mixed models. *Bioinformatics* (2020). [DOI: 10.1093/bioinformatics/btaa731](http://dx.doi.org/10.1093/bioinformatics/btaa731).\n\nZhou W, Nielsen JB, Fritsche LG, Dey R, Gabrielsen ME, Wolford BN, LeFaive J, VandeHaar P, Gagliano SA, Gifford A, Bastarache LA, Wei WQ, Denny JC, Lin M, Hveem K, Kang HM, Abecasis GR, Willer CJ, Lee S. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. *Nat Genet* (2018). Sep;50(9):1335-1341. [DOI: 10.1038/s41588-018-0184-y](https://www.nature.com/articles/s41588-018-0184-y).\n\nZheng X, Gogarten S, Lawrence M, Stilp A, Conomos M, Weir BS, Laurie C, Levine D. SeqArray -- A storage-efficient high-performance data format for WGS variant calls. *Bioinformatics* (2017). 
[DOI: 10.1093/bioinformatics/btx145](http://dx.doi.org/10.1093/bioinformatics/btx145).\n\n\n## See Also\n\n[SeqArray](https://www.bioconductor.org/packages/SeqArray): Data management of large-scale whole-genome sequence variant calls\n\n[gds2bgen](https://github.com/zhengxwen/gds2bgen): Format conversion from bgen to gds\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabbvie-computationalgenomics%2Fsaigegds","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fabbvie-computationalgenomics%2Fsaigegds","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabbvie-computationalgenomics%2Fsaigegds/lists"}