{"id":14067212,"url":"https://github.com/privefl/bigsnpr","last_synced_at":"2026-02-18T22:36:50.516Z","repository":{"id":38326166,"uuid":"62644144","full_name":"privefl/bigsnpr","owner":"privefl","description":"R package for the analysis of massive SNP arrays.","archived":false,"fork":false,"pushed_at":"2025-08-21T06:39:06.000Z","size":114602,"stargazers_count":218,"open_issues_count":21,"forks_count":45,"subscribers_count":8,"default_branch":"master","last_synced_at":"2026-01-26T17:58:29.879Z","etag":null,"topics":["big-data","bioinformatics","memory-mapped-file","parallel-computing","polygenic-scores","population-structure-inference","r","r-package","snp-data","statistical-methods"],"latest_commit_sha":null,"homepage":"https://privefl.github.io/bigsnpr/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/privefl.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2016-07-05T14:36:34.000Z","updated_at":"2025-12-31T05:33:16.000Z","dependencies_parsed_at":"2023-02-13T21:46:16.014Z","dependency_job_id":"6877c1b1-d326-4bb9-8ae3-94a130325370","html_url":"https://github.com/privefl/bigsnpr","commit_stats":{"total_commits":1047,"total_committers":13,"mean_commits":80.53846153846153,"dds":0.06399235912129897,"last_synced_commit":"5ad76449c51ce34d6422170a47bd4889d6299aba"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/privefl/bigsnpr","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/privefl%2Fbigsnpr","tags_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/privefl%2Fbigsnpr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/privefl%2Fbigsnpr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/privefl%2Fbigsnpr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/privefl","download_url":"https://codeload.github.com/privefl/bigsnpr/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/privefl%2Fbigsnpr/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29597270,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-18T22:25:43.180Z","status":"ssl_error","status_checked_at":"2026-02-18T22:25:42.766Z","response_time":162,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data","bioinformatics","memory-mapped-file","parallel-computing","polygenic-scores","population-structure-inference","r","r-package","snp-data","statistical-methods"],"created_at":"2024-08-13T07:05:29.026Z","updated_at":"2026-02-18T22:36:50.493Z","avatar_url":"https://github.com/privefl.png","language":"R","funding_links":[],"categories":["R","Genomic data wrangling"],"sub_categories":["Mendelian randomization in _cis_"],"readme":"\u003c!-- badges: start --\u003e\n[![R build 
status](https://github.com/privefl/bigsnpr/workflows/R-CMD-check/badge.svg)](https://github.com/privefl/bigsnpr/actions)\n[![Codecov test coverage](https://codecov.io/gh/privefl/bigsnpr/branch/master/graph/badge.svg)](https://app.codecov.io/gh/privefl/bigsnpr?branch=master)\n[![CRAN status](https://www.r-pkg.org/badges/version/bigsnpr)](https://CRAN.R-project.org/package=bigsnpr)\n[![DOI](https://zenodo.org/badge/doi/10.1093/bioinformatics/bty185.svg)](http://dx.doi.org/10.1093/bioinformatics/bty185)\n\u003c!-- badges: end --\u003e\n \n \n# bigsnpr\n\n{bigsnpr} is an R package for the analysis of massive SNP arrays, primarily designed for human genetics. It enhances the features of [package {bigstatsr}](https://privefl.github.io/bigstatsr/) for the purpose of analyzing genotype data.\n\nTo get you started:\n\n- [Quick demo](https://privefl.github.io/bigsnpr/articles/demo.html)\n\n- List of functions [from bigsnpr](https://privefl.github.io/bigsnpr/reference/index.html) and [from bigstatsr](https://privefl.github.io/bigstatsr/reference/index.html)\n\n- [Extended documentation with more examples](https://privefl.github.io/bigsnpr-extdoc/) + [course recording](https://youtu.be/7VxBT5A_AcA)\n\n\n## Installation\n\nIn R, run\n\n```r\n# install.packages(\"remotes\")\nremotes::install_github(\"privefl/bigsnpr\")\n```\n\nor for the CRAN version\n\n```r\ninstall.packages(\"bigsnpr\")\n```\n\n\n## Input formats\n\nThis package reads *bed*/*bim*/*fam* files (PLINK preferred format) using functions `snp_readBed()` and `snp_readBed2()`. Before reading into this package's special format, quality control and conversion can be done using PLINK, which can be called directly from R using `snp_plinkQC()` and `snp_plinkKINGQC()`.\n\nThis package can also read **UK Biobank BGEN files** using function `snp_readBGEN()`. This function takes around 40 minutes to read 1M variants for 400K individuals using 15 cores.\n\nThis package uses a class called `bigSNP` for representing SNP data. 
A `bigSNP` object is a list with some elements:\n\n- `$genotypes`: A [`FBM.code256`](https://privefl.github.io/bigstatsr/reference/FBM.code256-class.html). Rows are samples and columns are variants. This stores genotype calls or **dosages** (rounded to 2 decimal places).\n- `$fam`: A `data.frame` with some information on the individuals.\n- `$map`: A `data.frame` with some information on the variants.\n\n**Note that most of the algorithms of this package don't handle missing values.** You can use `snp_fastImpute()` (taking a few hours for a chip of 15K x 300K) and `snp_fastImputeSimple()` (taking a few minutes only) to impute missing values of *genotyped* variants.\n\nPackage {bigsnpr} also provides functions that directly work on bed files with a few missing values (the `bed_*()` functions). See paper [\"Efficient toolkit implementing..\"](https://doi.org/10.1093/bioinformatics/btaa520).\n\n\n## Polygenic scores\n\nPolygenic scores are one of the main focuses of this package. There are 3 main methods currently available:\n\n- Penalized regressions with individual-level data (see [paper](https://doi.org/10.1534/genetics.119.302019) and [tutorial](https://privefl.github.io/bigstatsr/articles/penalized-regressions.html))\n\n- Clumping and Thresholding (C+T) and Stacked C+T (SCT) with summary statistics and individual-level data (see [paper](https://doi.org/10.1016/j.ajhg.2019.11.001) and [tutorial](https://privefl.github.io/bigsnpr/articles/SCT.html)).\n\n- LDpred2 with summary statistics (see [paper](https://doi.org/10.1093/bioinformatics/btaa1029) and [tutorial](https://privefl.github.io/bigsnpr/articles/LDpred2.html)), and lassosum2\n\n\n## Possible upcoming features\n\n- Multiple imputation for GWAS (https://doi.org/10.1371/journal.pgen.1006091).\n\n- More interactive (visual) QC.\n\nYou can request a feature by opening an issue.\n\n\n## Bug report / Support\n\n[How to make a great R reproducible example?](https://stackoverflow.com/q/5963269/6103040)\n\nPlease 
open an issue if you find a bug.\n\nIf you want help using {bigstatsr} (the `big_*()` functions), please open an issue on [{bigstatsr}'s repo](https://github.com/privefl/bigstatsr/issues), or post on Stack Overflow with the tag *bigstatsr*.\n\nI will always redirect you to GitHub issues if you email me, so that others can benefit from our discussion.\n\n\n## References\n\n- Privé, Florian, et al. [\"Efficient analysis of large-scale genome-wide data with two R packages: bigstatsr and bigsnpr.\"](https://doi.org/10.1093/bioinformatics/bty185) *Bioinformatics* 34.16 (2018): 2781-2787.\n\n- Privé, Florian, et al. [\"Efficient implementation of penalized regression for genetic risk prediction.\"](https://doi.org/10.1534/genetics.119.302019) *Genetics* 212.1 (2019): 65-74.\n\n- Privé, Florian, et al. [\"Making the most of Clumping and Thresholding for polygenic scores.\"](https://doi.org/10.1016/j.ajhg.2019.11.001) *The American Journal of Human Genetics* 105.6 (2019): 1213-1221.\n\n- Privé, Florian, et al. [\"Efficient toolkit implementing best practices for principal component analysis of population genetic data.\"](https://doi.org/10.1093/bioinformatics/btaa520) *Bioinformatics* 36.16 (2020): 4449-4457.\n\n- Privé, Florian, et al. [\"LDpred2: better, faster, stronger.\"](https://doi.org/10.1093/bioinformatics/btaa1029) *Bioinformatics* 36.22-23 (2020): 5424-5431.\n\n- Privé, Florian. [\"Optimal linkage disequilibrium splitting.\"](https://doi.org/10.1093/bioinformatics/btab519) *Bioinformatics* 38.1 (2022): 255–256.\n\n- Privé, Florian. [\"Using the UK Biobank as a global reference of worldwide populations: application to measuring ancestry diversity from GWAS summary statistics.\"](https://doi.org/10.1093/bioinformatics/btac348) *Bioinformatics* 38.13 (2022): 3477-3480.\n\n- Privé, Florian, et al. 
[\"Identifying and correcting for misspecifications in GWAS summary statistics and polygenic scores.\"](https://doi.org/10.1016/j.xhgg.2022.100136) *Human Genetics and Genomics Advances* 3.4 (2022).\n\n- Privé, Florian, et al. [\"Inferring disease architecture and predictive ability with LDpred2-auto.\"](https://doi.org/10.1101/2022.10.10.511629) *The American Journal of Human Genetics* 110.12 (2023): 2042-2055.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprivefl%2Fbigsnpr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprivefl%2Fbigsnpr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprivefl%2Fbigsnpr/lists"}