{"id":13423396,"url":"https://github.com/coolbutuseless/zstdlite","last_synced_at":"2025-10-11T14:35:18.725Z","repository":{"id":95359935,"uuid":"273480771","full_name":"coolbutuseless/zstdlite","owner":"coolbutuseless","description":"Fast, configurable in-memory compression of R objects with zstd","archived":false,"fork":false,"pushed_at":"2024-04-13T10:16:12.000Z","size":919,"stargazers_count":25,"open_issues_count":7,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-04-13T20:09:07.047Z","etag":null,"topics":["zstd"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/coolbutuseless.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2020-06-19T11:47:11.000Z","updated_at":"2024-04-14T21:29:40.338Z","dependencies_parsed_at":null,"dependency_job_id":"e422b6bb-7a1a-49b3-a578-1f2ea876c94d","html_url":"https://github.com/coolbutuseless/zstdlite","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/coolbutuseless/zstdlite","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coolbutuseless%2Fzstdlite","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coolbutuseless%2Fzstdlite/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coolbutuseless%2Fzstdlite/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coolbutuseless%2Fzstdlite/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/coolbutuseles
s","download_url":"https://codeload.github.com/coolbutuseless/zstdlite/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coolbutuseless%2Fzstdlite/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279007448,"owners_count":26084313,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-11T02:00:06.511Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["zstd"],"created_at":"2024-07-31T00:00:33.627Z","updated_at":"2025-10-11T14:35:18.706Z","avatar_url":"https://github.com/coolbutuseless.png","language":"C","funding_links":[],"categories":["C"],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = FALSE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n\nlibrary(dplyr)\nlibrary(zstdlite)\n\n\nif (FALSE) {\n  covr::report(covr::package_coverage(\n    line_exclusions = list('src/zstd.c', 'src/zstd.h')\n  ))\n}\n\nif (FALSE) {\n  pkgdown::build_site(override = list(destination = \"../coolbutuseless.github.io/package/yyjsonr\"))\n}\n```\n\n# zstdlite\n\n\u003c!-- badges: start --\u003e\n![](https://img.shields.io/badge/cool-useless-green.svg)\n[![R-CMD-check](https://github.com/coolbutuseless/zstdlite/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/coolbutuseless/zstdlite/actions/workflows/R-CMD-check.yaml)\n\u003c!-- badges: end --\u003e\n\n`zstdlite` provides access to the very fast (and highly configurable) \n [zstd](https://github.com/facebook/zstd) library for serialization of\nR objects and compression/decompression of raw byte buffers and strings.\n\n[zstd](https://github.com/facebook/zstd) code provided with this package is v1.5.6, \nand is included under its BSD license (compatible with the MIT license for this package).\n\n\n\n## What's in the box\n\n* `zstdfile()`\n    * A connection object (like `gzfile()` or `url()`) which supports Zstandard \n      compressed data.\n    * Supports read/write of both text and binary data.  e.g. `readLines()` and \n      `readBin()`\n    * Can be used by any R code which supports connections.\n    * Like `gzcon()`, `zstdfile()` can also write to other connections.\n        * e.g. 
`zstdfile(fifo(\"out\"))`\n* `zstd_serialize()` and `zstd_unserialize()` \n    * convert arbitrary R objects to/from a compressed representation\n    * this is equivalent to base R's `serialize()`/`unserialize()` with the addition\n      of `zstd` compression on the serialized data\n* `zstd_compress()` and `zstd_decompress()` are for compressing/decompressing strings and raw vectors -\n  usually for interfacing with other systems e.g. data was already compressed on the command line.\n* `zstd_info()` returns a named list of information about a compressed data source\n* `zstd_cctx()` and `zstd_dctx()` initialize compression and \n  decompression contexts, respectively.  Options:\n    * `level` compression level in range [-5, 22]. Default: 3\n    * `num_threads` number of threads to use. Default: 1\n    * `include_checksum`/`validate_checksum` Default: FALSE\n    * `dict` specify dictionary for aiding compression\n* `zstd_train_dict_compress()` and `zstd_train_dict_serialize()` for creating\n  dictionaries which can speed up compression/decompression\n\n\n## Comparison to `saveRDS()`/`readRDS()`\n\nThe image below compares `{zstdlite}` with `saveRDS()` for saving compressed\nrepresentations of R objects.  
(See `man/benchmarks.R` for code)\n\nThings to note in this comparison for the particular data used:\n\n* `zstd` compression can be much faster than the compression options \n  offered by `saveRDS()`\n* `zstd` decompression speed is very fast and (mostly) independent of compression\n  settings\n* Compressing with `xz` and `bzip2` can both produce more compressed representations but\n  at the expense of slow compression/decompression.\n\n\n\u003cimg src=\"man/figures/comparison.png\" width=\"100%\" /\u003e\n\n\n## Installation\n\nTo install from r-universe:\n\n``` r\ninstall.packages('zstdlite', repos = c('https://coolbutuseless.r-universe.dev', 'https://cloud.r-project.org'))\n```\n\nTo install the latest version from [GitHub](https://github.com/coolbutuseless/zstdlite):\n\n``` r\n# install.packages('remotes')\nremotes::install_github('coolbutuseless/zstdlite')\n```\n\n## Basic Usage of `zstd_serialize()` and `zstd_unserialize()`\n\n`zstd_serialize()` and `zstd_unserialize()` are direct analogues of base R's\n`serialize()` and `unserialize()`.\n\nBecause `zstd_serialize()` and `zstd_unserialize()` use R's serialization \nmechanism, they will save/load (almost) any R object, e.g. data.frames, lists, environments, etc.\n\n```{r}\ncompressed_bytes \u003c- zstd_serialize(head(mtcars))\nlength(compressed_bytes) \nhead(compressed_bytes, 100)\n\nzstd_unserialize(compressed_bytes) \n```\n\n\n## Using a `zstdfile()` connection\n\nUse `zstdfile()` to allow read/write access of compressed data from any R code\nor package which supports connections.\n\n```{r}\ntmp \u003c- tempfile()\ndat \u003c- as.raw(1:255)\nwriteBin(dat, zstdfile(tmp))\nreadBin(zstdfile(tmp), raw(), 255)\n```\n\n\n## Using contexts to set compression arguments\n\nThe `zstd` algorithm uses *contexts* to control the compression and decompression.\nEvery time data is compressed/decompressed a context is created.\n\n*Contexts* can be created ahead-of-time, or created on-the-fly.  
There can \nbe some speed advantages to creating a *context* ahead of time and reusing\nit for multiple compression operations.\n\nThe following ways of calling `zstd_serialize()` are equivalent:\n\n```{r eval=FALSE}\nzstd_serialize(data1, num_threads = 3, level = 20)\nzstd_serialize(data2, num_threads = 3, level = 20)\n```\n\n```{r eval=FALSE}\ncctx \u003c- zstd_cctx(num_threads = 3, level = 20)\nzstd_serialize(data1, cctx = cctx)\nzstd_serialize(data2, cctx = cctx)\n```\n\n\n\n\n\n\n## Using `zstd_compress()`/`zstd_decompress()` for raw data and strings\n\n`zstd_serialize()`/`zstd_unserialize()` produce compressed data that is really only \nuseful when working in R, or sharing data with other R users.\n\nIn contrast, `zstd_compress()`/`zstd_decompress()` operate on raw vectors\nand strings, and these functions are suitable \nfor handling compressed data which is compatible with other systems and languages.\n\nExamples:\n\n* reading compressed JSON files\n* writing compressed data for storage in a database that will be accessed \n  by different computer languages and operating systems.\n\n\n```{r eval=FALSE, echo=FALSE}\nzz \u003c- as.list(mtcars[1,]) |\u003e jsonlite::toJSON(pretty = TRUE)\nwriteLines(zz, \"man/figures/data.json\")\nsystem(\"zstd man/figures/data.json\")\n```\n\n#### Reading a compressed JSON file\n\nJSON files are often compressed.  
In this case, the `data.json` file was \ncompressed using the `zstd` command-line tool, i.e.\n\n    zstd data.json -o data.json.zst\n    \nThis compressed file can be read directly into R as uncompressed bytes (in a \nraw vector), or as a string\n\n```{r}\n#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n# Read a compressed JSON file as raw bytes or as a string\n#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nzstd_decompress(\"man/figures/data.json.zst\")\nzstd_decompress(\"man/figures/data.json.zst\", type = 'string') |\u003e cat()\n```\n\n#### Compressing a string\n\nWhen transmitting large amounts of text to another system, we may wish to \nfirst compress it.\n\nThe following string (`manifesto`) is compressed by `zstd_compress()` and\ncan be uncompressed by any system which supports the `zstd` library, or\neven just using the `zstd` command-line tool.\n\n\n```{r}\n#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n# Compress a string directly \n#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nmanifesto \u003c- paste(lorem::ipsum(paragraphs = 100), collapse = \"\\n\")\nlobstr::obj_size(manifesto)\n\ncompressed \u003c- zstd_compress(manifesto, level = 22)\nlobstr::obj_size(compressed)\n\nidentical(\n  zstd_decompress(compressed, type = 'string'),\n  manifesto\n)\n```\n\n\n## Limitations\n\n* Reference objects which need to be serialized with a `refhook` approach are not handled.\n\n\n## Dictionary-based compression\n\nFrom the `zstd` documentation:\n\n    Zstd can use dictionaries to improve compression ratio of small data.\n    Traditionally small files don't compress well because there is very little\n    repetition in a single sample, since it is small. 
But, if you are compressing\n    many similar files, like a bunch of JSON records that share the same\n    structure, you can train a dictionary on ahead of time on some samples of\n    these files. Then, zstd can use the dictionary to find repetitions that are\n    present across samples. This can vastly improve compression ratio.\n\n\n### Dictionary Example\n\n\n\nThe following shows that using a dictionary for this specific \nexample doubles the compression ratio!\n\n```{r}\nset.seed(2024)\ncountries \u003c- rownames(LifeCycleSavings)\n\n#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n# In this example consider the case of having a named vector of rankings of \n# countries.  Each ranking will be compressed separately and stored (say in a database)\n#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nrankings \u003c- lapply(\n  1:1000, \n  \\(x) setNames(sample(length(countries)), countries)\n)\n\n#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n# Create a dictionary\n#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\ndict \u003c- zstd_train_dict_serialize(rankings, size = 1500, optim = TRUE)\n\n#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n# Setup Compression contexts to use this dictionary\n#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\ncctx_nodict \u003c- zstd_cctx(level = 13) # No dictionary. 
For comparison\ncctx_dict   \u003c- zstd_cctx(level = 13, dict = dict)\n\n#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n# When using the dictionary, what is the size of the compressed data compared\n# to not using a dictionary here?\n#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\ns1 \u003c- lapply(rankings, \\(x) zstd_serialize(x, cctx = cctx_nodict)) |\u003e lengths() |\u003e sum()\ns2 \u003c- lapply(rankings, \\(x) zstd_serialize(x, cctx = cctx_dict  )) |\u003e lengths() |\u003e sum()\n```\n\n```{r echo=FALSE}\ns0 \u003c- lapply(rankings, \\(x) serialize(x, NULL)) |\u003e lengths() |\u003e sum()\ncat(\"Compression ratio                :\", round(s0/s1, 1), \"\\n\")\ncat(\"Compression ratio with dictionary:\", round(s0/s2, 1), \"\\n\")\n```\n\n\n\n\n## Acknowledgements\n\n* Yann Collet for creating \n[lz4](https://github.com/lz4/lz4) and [zstd](https://github.com/facebook/zstd)\n* R Core for developing and maintaining such a wonderful language.\n* CRAN maintainers, for patiently shepherding packages onto CRAN and maintaining\n  the repository\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoolbutuseless%2Fzstdlite","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcoolbutuseless%2Fzstdlite","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoolbutuseless%2Fzstdlite/lists"}