{"id":16572620,"url":"https://github.com/knapply/fastgz","last_synced_at":"2025-10-26T20:02:11.795Z","repository":{"id":105873341,"uuid":"226726589","full_name":"knapply/fastgz","owner":"knapply","description":"Fast reading of .gz files to R character vectors.","archived":false,"fork":false,"pushed_at":"2019-12-09T01:31:20.000Z","size":60,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-05T14:28:40.071Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://knapply.github.io/fastgz/","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/knapply.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-12-08T20:24:23.000Z","updated_at":"2020-01-22T18:19:43.000Z","dependencies_parsed_at":null,"dependency_job_id":"0964830f-acda-445e-8562-9bd803d9de0d","html_url":"https://github.com/knapply/fastgz","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/knapply/fastgz","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/knapply%2Ffastgz","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/knapply%2Ffastgz/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/knapply%2Ffastgz/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/knapply%2Ffastgz/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/knapply","download_url":"https://codeload.githu
b.com/knapply/fastgz/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/knapply%2Ffastgz/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260040202,"owners_count":22949770,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-11T21:28:06.817Z","updated_at":"2025-10-10T19:18:13.294Z","avatar_url":"https://github.com/knapply.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput:\n  github_document:\n    html_preview: true\n  html_document:\n    keep_md: yes\nalways_allow_html: yes\neditor_options: \n  chunk_output_type: console\n---\n\n\u003c!-- README.Rmd generates README.md. 
--\u003e\n\n```{r, echo=FALSE}\nknitr::opts_chunk$set(\n  # collapse = TRUE,\n  fig.align = \"center\",\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/\",\n  message = FALSE,\n  warning = FALSE,\n  out.width=\"100%\"\n)\noptions(width = 120)\n```\n\n\n# `{fastgz}`\n\n\n\u003c!-- badges: start --\u003e\n[![CRAN status](https://www.r-pkg.org/badges/version/fastgz)](https://cran.r-project.org/package=fastgz)\n[![Lifecycle](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)\n[![GitHub last commit](https://img.shields.io/github/last-commit/knapply/fastgz.svg)](https://github.com/knapply/fastgz/commits/master)\n[![Codecov test coverage](https://codecov.io/gh/knapply/fastgz/branch/master/graph/badge.svg)](https://codecov.io/gh/knapply/fastgz?branch=master)\n[![AppVeyor build status](https://ci.appveyor.com/api/projects/status/github/knapply/fastgz?branch=master\u0026svg=true)](https://ci.appveyor.com/project/knapply/fastgz)\n[![Travis-CI Build Status](https://travis-ci.org/knapply/fastgz.svg?branch=master)](https://travis-ci.org/knapply/fastgz)\n[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)\n[![GitHub code size in bytes](https://img.shields.io/github/languages/code-size/knapply/fastgz.svg)](https://github.com/knapply/fastgz)\n[![HitCount](http://hits.dwyl.io/knapply/fastgz.svg)](http://hits.dwyl.io/knapply/fastgz)\n\u003c!-- badges: end --\u003e\n\n\n# Why?\n\nFiles of non-trivial size are typically gzip files. `base::readLines()` is surprisingly quick at reading them, but we can go a tad faster. On the other hand, `readr::read_lines()` decompresses the file before reading it, which is... less than ideal.\n\n`{fastgz}` contains two simple helpers:\n\n1. `fastgz::read_gz_file()` reads an entire file(s) into a single `character()`\n2. 
`fastgz::read_gz_lines()` is the equivalent of `base::readLines()`/`readr::read_lines()`\n\nRather than relying on the `apply()`/`purrr::map()` families, you can pass multiple file paths to both.\n\n# Benchmarks\n\n```{r}\nlibrary(fastgz)\nlibrary(microbenchmark)\nlibrary(ggplot2)\n\nfile_dir \u003c- readRDS(\"big_file_path\")\n\nfiles \u003c- dir(file_dir, pattern = \"\\\\.gz$\", full.names = TRUE)[1:3]\n\nscales::number_bytes(sum(file.size(files)))\n```\n\n\n```{r}\nres \u003c- microbenchmark(\n  fastgz_single = fastgz_single \u003c- read_gz_lines(files[[1]]),\n  base_single   = base_single   \u003c- readLines(files[[1]]),\n  fastgz_multi  = fastgz_multi  \u003c- read_gz_lines(files),\n  base_multi    = base_multi    \u003c- unlist(lapply(files, readLines), \n                                          use.names = FALSE)\n  ,\n  times = 3\n)\n\nidentical(fastgz_single, base_single) \u0026\u0026 identical(fastgz_multi, base_multi)\nlapply(list(single = fastgz_single, multi = fastgz_multi), pryr::object_size)\nres\nautoplot(res)\n```\n\n\n# Shout Outs\n\n* [`{Rcpp}`](http://www.rcpp.org/)\n* [`Gzstream`](https://www.cs.unc.edu/Research/compgeom/gzstream/)\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fknapply%2Ffastgz","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fknapply%2Ffastgz","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fknapply%2Ffastgz/lists"}