{"id":13857961,"url":"https://github.com/privefl/bigreadr","last_synced_at":"2026-02-21T06:33:08.630Z","repository":{"id":56935071,"uuid":"141929348","full_name":"privefl/bigreadr","owner":"privefl","description":"R package to read large text files based on splitting + data.table::fread","archived":false,"fork":false,"pushed_at":"2022-12-06T14:47:23.000Z","size":276,"stargazers_count":43,"open_issues_count":1,"forks_count":5,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-12-09T15:53:23.578Z","etag":null,"topics":["large-dataset","r-package","read-csv"],"latest_commit_sha":null,"homepage":"https://privefl.github.io/bigreadr/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/privefl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-07-22T20:44:47.000Z","updated_at":"2025-10-06T09:25:02.000Z","dependencies_parsed_at":"2023-01-24T02:01:29.614Z","dependency_job_id":null,"html_url":"https://github.com/privefl/bigreadr","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/privefl/bigreadr","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/privefl%2Fbigreadr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/privefl%2Fbigreadr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/privefl%2Fbigreadr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/privefl%2Fbigreadr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/privefl","download_url":"https://codeload.github.com/privefl/bigreadr/tar.gz/refs/he
ads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/privefl%2Fbigreadr/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29675471,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-21T06:23:40.028Z","status":"ssl_error","status_checked_at":"2026-02-21T06:23:39.222Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["large-dataset","r-package","read-csv"],"created_at":"2024-08-05T03:01:52.075Z","updated_at":"2026-02-21T06:33:08.609Z","avatar_url":"https://github.com/privefl.png","language":"R","funding_links":[],"categories":["R"],"sub_categories":[],"readme":"\u003c!-- badges: start --\u003e\n[![R-CMD-check](https://github.com/privefl/bigreadr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/privefl/bigreadr/actions/workflows/R-CMD-check.yaml)\n[![CRAN status](https://www.r-pkg.org/badges/version/bigreadr)](https://cran.r-project.org/package=bigreadr)\n[![Codecov test coverage](https://codecov.io/gh/privefl/bigreadr/branch/master/graph/badge.svg)](https://app.codecov.io/gh/privefl/bigreadr?branch=master)\n\u003c!-- badges: end --\u003e\n\n\n# R package {bigreadr}\n\nRead large text files based on splitting + `data.table::fread`\n\n\n## Example\n\n```r\n# 
remotes::install_github(\"privefl/bigreadr\")\nlibrary(bigreadr)\n\n# Create a temporary file of ~141 MB (just as an example)\ncsv \u003c- fwrite2(iris[rep(seq_len(nrow(iris)), 1e4), rep(1:5, 4)], tempfile())\nformat(file.size(csv), big.mark = \",\")\n\n## Splitting lines (1)\n# Read all data (by parts); a plain `fread` would be faster here\nnlines(csv)  ## 1.5M lines -\u003e split every 500,000\nbig_iris1 \u003c- big_fread1(csv, every_nlines = 5e5)\n# Read and subset (by parts)\nbig_iris1_setosa \u003c- big_fread1(csv, every_nlines = 5e5, .transform = function(df) {\n  dplyr::filter(df, Species == \"setosa\")\n})\n\n## Splitting columns (2)\nbig_iris2 \u003c- big_fread2(csv, nb_parts = 3)\n# Read and subset (by parts)\nspecies_setosa \u003c- (fread2(csv, select = 5)[[1]] == \"setosa\")\nbig_iris2_setosa \u003c- big_fread2(csv, nb_parts = 3, .transform = function(df) {\n  dplyr::filter(df, species_setosa)\n})\n\n## Verification\nidentical(big_iris1_setosa, dplyr::filter(big_iris1, Species == \"setosa\"))\nidentical(big_iris2, big_iris1)\nidentical(big_iris2_setosa, big_iris1_setosa)\n```\n\n## Use cases\n\nPlease send me your use cases!\n\n- [Convert a CSV to SQLite by parts](https://privefl.github.io/bigreadr/articles/csv2sqlite.html)\n\n- [Read a text file as a disk.frame](https://diskframe.com/articles/ingesting-data.html)\n\n- [Read a text file as a Filebacked Big Matrix](https://privefl.github.io/bigstatsr/reference/big_read.html)\n\n- [Read a text file as a Filebacked Data Frame](https://privefl.github.io/bigdfr/reference/FDF_read.html)\n\n- Read multiple files at once using `bigreadr::fread2()`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprivefl%2Fbigreadr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprivefl%2Fbigreadr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprivefl%2Fbigreadr/lists"}