{"id":13858026,"url":"https://github.com/privefl/bigstatsr","last_synced_at":"2026-02-19T20:31:42.346Z","repository":{"id":14781132,"uuid":"68272648","full_name":"privefl/bigstatsr","owner":"privefl","description":"R package for statistical tools with big matrices stored on disk.","archived":false,"fork":false,"pushed_at":"2025-07-29T09:30:41.000Z","size":40209,"stargazers_count":180,"open_issues_count":8,"forks_count":29,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-12-09T00:29:02.403Z","etag":null,"topics":["big-data","large-matrices","memory-mapped-file","parallel-computing","r","r-package","statistical-methods"],"latest_commit_sha":null,"homepage":"https://privefl.github.io/bigstatsr/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/privefl.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2016-09-15T06:50:50.000Z","updated_at":"2025-12-06T06:46:16.000Z","dependencies_parsed_at":"2024-06-19T11:18:06.586Z","dependency_job_id":"493b024a-95db-477e-86ee-4c8e169d3412","html_url":"https://github.com/privefl/bigstatsr","commit_stats":{"total_commits":906,"total_committers":6,"mean_commits":151.0,"dds":0.04304635761589404,"last_synced_commit":"f307d297570d91127f118ce3bd4557a155057e84"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/privefl/bigstatsr","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/privefl%2Fbigstatsr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/privefl%2Fbigstatsr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/privefl%2Fbigstatsr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/privefl%2Fbigstatsr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/privefl","download_url":"https://codeload.github.com/privefl/bigstatsr/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/privefl%2Fbigstatsr/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29630829,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-19T18:02:07.722Z","status":"ssl_error","status_checked_at":"2026-02-19T18:01:46.144Z","response_time":117,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data","large-matrices","memory-mapped-file","parallel-computing","r","r-package","statistical-methods"],"created_at":"2024-08-05T03:01:54.256Z","updated_at":"2026-02-19T20:31:42.322Z","avatar_url":"https://github.com/privefl.png","language":"R","funding_links":[],"categories":["R"],"sub_categories":[],"readme":"\u003c!-- badges: start --\u003e\n[![R build status](https://github.com/privefl/bigstatsr/workflows/R-CMD-check/badge.svg)](https://github.com/privefl/bigstatsr/actions)\n[![Codecov test coverage](https://codecov.io/gh/privefl/bigstatsr/branch/master/graph/badge.svg)](https://app.codecov.io/gh/privefl/bigstatsr?branch=master)\n[![CRAN status](https://www.r-pkg.org/badges/version/bigstatsr)](https://CRAN.R-project.org/package=bigstatsr)\n[![DOI](https://zenodo.org/badge/doi/10.1093/bioinformatics/bty185.svg)](https://doi.org/10.1093/bioinformatics/bty185)\n\u003c!-- badges: end --\u003e\n\n\n# bigstatsr\n\n\u003cimg src=\"https://raw.githubusercontent.com/privefl/bigstatsr/master/bigstatsr.png\" width=\"130\" align=\"right\"\u003e\n\nR package {bigstatsr} provides functions for fast statistical analysis of large-scale data encoded as matrices. The package can handle matrices that are too large to fit in memory thanks to memory-mapping to binary files on disk. This is very similar to the format `big.matrix` provided by [R package {bigmemory}](https://github.com/kaneplusplus/bigmemory), which is **no longer used** by this package (see [the corresponding vignette](https://privefl.github.io/bigstatsr/articles/bigstatsr-and-bigmemory.html)).\nAs inputs, package {bigstatsr} uses [Filebacked Big Matrices (FBM)](https://privefl.github.io/bigstatsr/reference/FBM-class.html).\n\n[**LIST OF FEATURES**](https://privefl.github.io/bigstatsr/reference/index.html)\n\n**Note that most of the algorithms of this package don't handle missing values.**\n\n\n## Installation\n\n```r\n# For the CRAN version\ninstall.packages(\"bigstatsr\")\n# For the latest version\nremotes::install_github(\"privefl/bigstatsr\")\n```\n\n## Small example\n\n```r\nlibrary(bigstatsr)\n\n# Create the data on disk\nX \u003c- FBM(5e3, 10e3, backingfile = \"test\")$save()\n# If you open a new session you can do\nX \u003c- big_attach(\"test.rds\")\n\n# Fill it by chunks with random values\nU \u003c- matrix(0, nrow(X), 5); U[] \u003c- rnorm(length(U))\nV \u003c- matrix(0, ncol(X), 5); V[] \u003c- rnorm(length(V))\nNCORES \u003c- nb_cores()\n# X = U V^T + E\nbig_apply(X, a.FUN = function(X, ind, U, V) {\n  X[, ind] \u003c- tcrossprod(U, V[ind, ]) + rnorm(nrow(X) * length(ind))\n  NULL  ## you don't want to return anything here\n}, a.combine = 'c', ncores = NCORES, U = U, V = V)\n# Check some values\nX[1:5, 1:5]\n\n# Compute first 10 PCs\nobj.svd \u003c- big_randomSVD(X, fun.scaling = big_scale(), \n                         k = 10, ncores = NCORES)\nplot(obj.svd)\n\n# Cleanup\nunlink(paste0(\"test\", c(\".bk\", \".rds\")))\n```\n\nLearn more with this \n[introduction to package {bigstatsr}](https://privefl.github.io/R-presentation/bigstatsr.html).\n\nIf you want to use Rcpp code, look at [this tutorial](https://privefl.github.io/R-presentation/Rcpp.html).\n\n\n## Some use cases\n\n### Parallelization\n\nPackage {bigstatsr} uses package {foreach} for its parallelization tasks. Learn more on parallelism with {foreach} with [this tutorial](https://privefl.github.io/blog/a-guide-to-parallelism-in-r/).\n\n- [Permute matrix columns in parallel](https://stackoverflow.com/q/48832010/6103040)\n\n- [Parallelized search until found](https://stackoverflow.com/q/49056271/6103040)\n\n### Large datasets\n\n- [Computing the null space of a big matrix](https://stackoverflow.com/questions/46253537/computing-the-null-space-of-a-bigmatrix-in-r/) (works if one dimension is not too large)\n\n- [Rowwise matrix multiplication](https://stackoverflow.com/q/48879643/6103040)\n\n- [Operating with a big.matrix](https://stackoverflow.com/q/42111876/6103040)\n\n\n## Bug report / Help\n\n[How to make a great R reproducible example?](https://stackoverflow.com/q/5963269/6103040)\n\nPlease open an issue if you find a bug.\n\nIf you want help using {bigstatsr}, please open an issue as well or post on Stack Overflow with the tag *bigstatsr*. \n\nI will always redirect you to GitHub issues if you email me, so that others can benefit from our discussion.\n\n\n## References\n\n- Privé, Florian, et al. \"Efficient analysis of large-scale genome-wide data with two R packages: bigstatsr and bigsnpr.\" Bioinformatics 34.16 (2018): 2781-2787.\n\n- Privé, Florian, Hugues Aschard, and Michael GB Blum. \"Efficient implementation of penalized regression for genetic risk prediction.\" Genetics 212.1 (2019): 65-74.\n\n\u003cbr\u003e\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprivefl%2Fbigstatsr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprivefl%2Fbigstatsr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprivefl%2Fbigstatsr/lists"}