{"id":13398617,"url":"https://github.com/tidyverse/vroom","last_synced_at":"2025-12-12T01:04:51.743Z","repository":{"id":37908442,"uuid":"161399255","full_name":"tidyverse/vroom","owner":"tidyverse","description":"Fast reading of delimited files","archived":false,"fork":false,"pushed_at":"2024-08-24T00:03:22.000Z","size":22349,"stargazers_count":626,"open_issues_count":74,"forks_count":62,"subscribers_count":17,"default_branch":"main","last_synced_at":"2025-03-31T15:18:54.930Z","etag":null,"topics":["csv","csv-parser","fixed-width-text","r","tsv","tsv-parser"],"latest_commit_sha":null,"homepage":"https://vroom.r-lib.org","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tidyverse.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-12-11T22:00:39.000Z","updated_at":"2025-03-29T20:21:32.000Z","dependencies_parsed_at":"2024-01-24T05:47:45.815Z","dependency_job_id":"5738cb8c-c909-4d7d-8d29-4535d013023f","html_url":"https://github.com/tidyverse/vroom","commit_stats":{"total_commits":1260,"total_committers":27,"mean_commits":"46.666666666666664","dds":"0.13968253968253963","last_synced_commit":"73c90c4fe490c0588b20ac527c40fcb1c683683e"},"previous_names":["r-lib/vroom"],"tags_count":23,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tidyverse%2Fvroom","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tidyverse%2Fvroom/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ti
dyverse%2Fvroom/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tidyverse%2Fvroom/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tidyverse","download_url":"https://codeload.github.com/tidyverse/vroom/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248154995,"owners_count":21056542,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","csv-parser","fixed-width-text","r","tsv","tsv-parser"],"created_at":"2024-07-30T19:00:29.363Z","updated_at":"2025-12-12T01:04:51.671Z","avatar_url":"https://github.com/tidyverse.png","language":"C++","funding_links":[],"categories":["C++"],"sub_categories":[],"readme":"---\noutput:\n  github_document:\n    html_preview: false\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\noptions(tibble.print_min = 3)\n```\n\n# 🏎💨vroom \u003ca href='https://vroom.r-lib.org'\u003e\u003cimg src='man/figures/logo.png' align=\"right\" height=\"135\" /\u003e\u003c/a\u003e\n\n\u003c!-- badges: start --\u003e\n[![R-CMD-check](https://github.com/tidyverse/vroom/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidyverse/vroom/actions/workflows/R-CMD-check.yaml)\n[![Codecov test coverage](https://codecov.io/gh/tidyverse/vroom/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidyverse/vroom?branch=main)\n[![CRAN status](https://www.r-pkg.org/badges/version/vroom)](https://cran.r-project.org/package=vroom)\n[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)\n\u003c!-- badges: end --\u003e\n\n```{r echo = FALSE, message = FALSE}\ntm \u003c- vroom::vroom(system.file(\"bench\", \"taxi.tsv\", package = \"vroom\"))\nversions \u003c- vroom::vroom(system.file(\"bench\", \"session_info.tsv\", package = \"vroom\"))\n\n# Use the base version number for read.delim\nversions$package[versions$package == \"base\"] \u003c- \"read.delim\"\n\nlibrary(dplyr)\ntbl \u003c- tm %\u003e% filter(type == \"real\", op == \"read\", reading_package %in% c(\"data.table\", \"readr\", \"read.delim\") | manip_package == \"base\") %\u003e%\n  rename(package = reading_package) %\u003e%\n  left_join(versions) %\u003e%\n  transmute(\n    package = package,\n    version = ondiskversion,\n    \"time (sec)\" = time,\n    speedup = max(time) / time,\n    \"throughput\" = paste0(prettyunits::pretty_bytes(size / time), \"/sec\")\n  ) %\u003e%\n  arrange(desc(speedup))\n```\n\nThe fastest delimited reader for R, **`r filter(tbl, package == \"vroom\") %\u003e% pull(\"throughput\") %\u003e% 
trimws()`**.\n\n\u003cimg src=\"https://raw.githubusercontent.com/tidyverse/vroom/main/img/taylor.gif\" align=\"right\" /\u003e\n\nBut that's impossible! How can it be [so fast](https://vroom.r-lib.org/articles/benchmarks.html)?\n\nvroom doesn't stop to actually _read_ all of your data; it simply indexes where each record is located so it can be read later.\nThe vectors returned use the [Altrep framework](https://svn.r-project.org/R/branches/ALTREP/ALTREP.html) to lazily load the data on demand when it is accessed, so you only pay for what you use.\nThis lazy access is done automatically, so no changes to your R data-manipulation code are needed.\n\nTo further improve performance, vroom also uses multiple threads for indexing, for materializing non-character columns, and for writing.\n\n```{r, echo = FALSE}\nknitr::kable(tbl, digits = 2, align = \"lrrrr\")\n```\n\n## Features\n\nvroom has nearly all of the parsing features of\n[readr](https://readr.tidyverse.org) for delimited and fixed-width files, including\n\n- delimiter guessing\\*\n- custom delimiters (including multi-byte\\* and Unicode\\* delimiters)\n- specification of column types (including type guessing)\n  - numeric types (double, integer, big integer\\*, number)\n  - logical types\n  - datetime types (datetime, date, time)\n  - categorical types (characters, factors)\n- column selection, like `dplyr::select()`\\*\n- skipping headers, comments and blank lines\n- quoted fields\n- double and backslashed escapes\n- whitespace trimming\n- Windows newlines\n- [reading from multiple files or connections\\*](#reading-multiple-files)\n- embedded newlines in headers and fields\\*\\*\n- writing delimited files with as-needed quoting\n- robustness to invalid inputs (vroom has been extensively tested with the\n  [afl](https://lcamtuf.coredump.cx/afl/) fuzz tester)\\*\n\n\\* *these are additional features not in readr.*\n\n\\*\\* *requires `num_threads = 1`.*\n\n## Installation\n\nInstall vroom from CRAN 
with:\n\n```r\ninstall.packages(\"vroom\")\n```\n\nAlternatively, if you need the development version from\n[GitHub](https://github.com/) install it with:\n\n``` r\n# install.packages(\"pak\")\npak::pak(\"tidyverse/vroom\")\n```\n## Usage\n\nSee [getting started](https://vroom.r-lib.org/articles/vroom.html)\nto jump start your use of vroom!\n\nvroom uses the same interface as readr to specify column types.\n\n```{r, include = FALSE}\ntibble::rownames_to_column(mtcars, \"model\") %\u003e%\n  vroom::vroom_write(\"mtcars.tsv\", delim = \"\\t\")\n```\n\n```{r example}\nvroom::vroom(\"mtcars.tsv\",\n  col_types = list(cyl = \"i\", gear = \"f\",hp = \"i\", disp = \"_\",\n                   drat = \"_\", vs = \"l\", am = \"l\", carb = \"i\")\n)\n```\n\n```{r, include = FALSE}\nunlink(\"mtcars.tsv\")\n```\n\n## Reading multiple files\n\nvroom natively supports reading from multiple files (or even multiple\nconnections!).\n\nFirst we generate some files to read by splitting the nycflights dataset by\nairline.\nFor the sake of the example, we'll just take the first 2 lines of each file.\n```{r}\nlibrary(nycflights13)\npurrr::iwalk(\n  split(flights, flights$carrier),\n  ~ { .x$carrier[[1]]; vroom::vroom_write(head(.x, 2), glue::glue(\"flights_{.y}.tsv\"), delim = \"\\t\") }\n)\n```\n\nThen we can efficiently read them into one tibble by passing the filenames directly to vroom.\nThe `id` argument can be used to request a column that reveals the filename that each row originated from.\n\n```{r}\nfiles \u003c- fs::dir_ls(glob = \"flights*tsv\")\nfiles\nvroom::vroom(files, id = \"source\")\n```\n\n```{r, include = FALSE}\nfs::file_delete(files)\n```\n\n## Learning more\n\n- [Getting started with vroom](https://vroom.r-lib.org/articles/vroom.html)\n- [📽 vroom: Because Life is too short to read slow](https://www.youtube.com/watch?v=RA9AjqZXxMU\u0026t=10s) - Presentation at UseR!2019 ([slides](https://speakerdeck.com/jimhester/vroom))\n- [📹 vroom: Read and write rectangular data 
quickly](https://www.youtube.com/watch?v=ZP_y5eaAc60) - a video tour of the vroom features.\n\n## Benchmarks\n\nThe speed quoted above is from a real `r format(fs::fs_bytes(tm$size[[1]]))` dataset with `r format(tm$rows[[1]], big.mark = \",\")` rows and `r tm$cols[[1]]` columns.\nSee the [benchmark article](https://vroom.r-lib.org/articles/benchmarks.html)\nfor full details of the dataset and\n[bench/](https://github.com/tidyverse/vroom/tree/main/inst/bench) for the code\nused to retrieve the data and perform the benchmarks.\n\n# Environment variables\n\nIn addition to the arguments to the `vroom()` function, you can control the\nbehavior of vroom with a few environment variables. Most users will not\nneed to set these.\n\n- `VROOM_TEMP_PATH` - Path to the directory used to store temporary files when\n  reading from an R connection. If unset, defaults to the R session's temporary\n  directory (`tempdir()`).\n- `VROOM_THREADS` - The number of processor threads to use when indexing and\n  parsing. If unset, defaults to `parallel::detectCores()`.\n- `VROOM_SHOW_PROGRESS` - Whether to show the progress bar when indexing.\n  Regardless of this setting, the progress bar is disabled in non-interactive\n  settings, R notebooks, when running tests with testthat and when knitting\n  documents.\n- `VROOM_CONNECTION_SIZE` - The size (in bytes) of the connection buffer when\n  reading from connections (default is 128 KiB).\n- `VROOM_WRITE_BUFFER_LINES` - The number of lines to use for each buffer when\n  writing files (default: 1000).\n\nThere is also a family of variables to control use of the Altrep framework.\nFor versions of R where the Altrep framework is unavailable (R \u003c 3.5.0) they\nare automatically turned off and the variables have no effect. 
The variables\ncan take one of `true`, `false`, `TRUE`, `FALSE`, `1`, or `0`.\n\n- `VROOM_USE_ALTREP_NUMERICS` - If set, use Altrep for _all_ numeric types\n  (default `false`).\n\nThere are also individual variables for each type. Currently only\n`VROOM_USE_ALTREP_CHR` defaults to `true`.\n\n- `VROOM_USE_ALTREP_CHR`\n- `VROOM_USE_ALTREP_FCT`\n- `VROOM_USE_ALTREP_INT`\n- `VROOM_USE_ALTREP_BIG_INT`\n- `VROOM_USE_ALTREP_DBL`\n- `VROOM_USE_ALTREP_NUM`\n- `VROOM_USE_ALTREP_LGL`\n- `VROOM_USE_ALTREP_DTTM`\n- `VROOM_USE_ALTREP_DATE`\n- `VROOM_USE_ALTREP_TIME`\n\n## RStudio caveats\n\nRStudio's environment pane calls `object.size()` when it refreshes the pane, which\nfor Altrep objects can be extremely slow. RStudio 1.2.1335+ includes the fixes\n([RStudio#4210](https://github.com/rstudio/rstudio/pull/4210),\n[RStudio#4292](https://github.com/rstudio/rstudio/pull/4292)) for this issue,\nso it is recommended you use at least that version.\n\n## Thanks\n\n- [Gabe Becker](https://github.com/gmbecker), [Luke\n  Tierney](https://homepage.divms.uiowa.edu/~luke/) and [Tomas Kalibera](https://github.com/kalibera) for\n  conceiving, implementing and maintaining the [Altrep\n  framework](https://svn.r-project.org/R/branches/ALTREP/ALTREP.html)\n- [Romain François](https://github.com/romainfrancois), whose\n  [Altrepisode](https://web.archive.org/web/20200315075838/https://purrple.cat/blog/2018/10/14/altrep-and-cpp/) package\n  and [related blog-posts](https://web.archive.org/web/20200315075838/https://purrple.cat/blog/2018/10/14/altrep-and-cpp/) were a great guide for creating new Altrep objects in C++.\n- [Matt Dowle](https://github.com/mattdowle) and the rest of the [Rdatatable](https://github.com/Rdatatable) team; `data.table::fread()` is blazing fast and great motivation to see how fast we could go!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftidyverse%2Fvroom","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftidyverse%2Fvroom","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftidyverse%2Fvroom/lists"}