{"id":15554881,"url":"https://github.com/qsbase/qs","last_synced_at":"2025-12-12T01:03:09.322Z","repository":{"id":46121368,"uuid":"163551739","full_name":"qsbase/qs","owner":"qsbase","description":"Quick serialization of R objects","archived":false,"fork":false,"pushed_at":"2025-03-20T23:50:23.000Z","size":4941,"stargazers_count":421,"open_issues_count":2,"forks_count":20,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-05-06T17:55:04.513Z","etag":null,"topics":["compression","data-storage","encoding","r","serialization"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/qsbase.png","metadata":{"files":{"readme":"README.md","changelog":"ChangeLog","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-12-30T00:54:43.000Z","updated_at":"2025-05-04T08:53:11.000Z","dependencies_parsed_at":"2023-12-14T08:31:16.767Z","dependency_job_id":"f828944c-0163-4f10-bba3-d58245fd1fbd","html_url":"https://github.com/qsbase/qs","commit_stats":{"total_commits":243,"total_committers":9,"mean_commits":27.0,"dds":"0.15637860082304522","last_synced_commit":"3838fbe07802e33d979054cc9a45ce2eb4e510a5"},"previous_names":["traversc/qs"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qsbase%2Fqs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qsbase%2Fqs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qsbase%2Fqs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qsbase%2Fqs/man
ifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/qsbase","download_url":"https://codeload.github.com/qsbase/qs/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254436949,"owners_count":22070947,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compression","data-storage","encoding","r","serialization"],"created_at":"2024-10-02T15:03:57.613Z","updated_at":"2025-12-12T01:03:09.282Z","avatar_url":"https://github.com/qsbase.png","language":"C","readme":"Using qs\n================\n\n[![R-CMD-check](https://github.com/qsbase/qs/workflows/R-CMD-check/badge.svg)](https://github.com/qsbase/qs/actions)\n[![CRAN-Status-Badge](https://www.r-pkg.org/badges/version/qs)](https://cran.r-project.org/package=qs)\n[![CRAN-Downloads-Badge](https://cranlogs.r-pkg.org/badges/qs)](https://cran.r-project.org/package=qs)\n[![CRAN-Downloads-Total-Badge](https://cranlogs.r-pkg.org/badges/grand-total/qs)](https://cran.r-project.org/package=qs)\n\n*Quick serialization of R objects*\n\n`qs` provides an interface for quickly saving and reading objects to and\nfrom disk. The goal of this package is to provide a lightning-fast and\ncomplete replacement for the `saveRDS` and `readRDS` functions in R.\n\nInspired by the `fst` package, `qs` uses a similar block-compression\ndesign using either the `lz4` or `zstd` compression libraries. 
It\ndiffers in that it applies a more general approach for attributes and\nobject references.\n\n`saveRDS` and `readRDS` are the standard for serialization of R data,\nbut these functions are not optimized for speed. On the other hand,\n`fst` is extremely fast, but only works on `data.frame`s and certain\ncolumn types.\n\n`qs` is both extremely fast and general: it can serialize any R object,\nlike `saveRDS`, and it is as fast as, and sometimes faster than, `fst`.\n\n## Usage\n\n``` r\nlibrary(qs)\ndf1 \u003c- data.frame(x = rnorm(5e6), y = sample(5e6), z = sample(letters, 5e6, replace = TRUE))\nqsave(df1, \"myfile.qs\")\ndf2 \u003c- qread(\"myfile.qs\")\n```\n\n## Installation\n\n``` r\n# CRAN version\ninstall.packages(\"qs\")\n\n# CRAN version, compiled from source (recommended)\nremotes::install_cran(\"qs\", type = \"source\", configure.args = \"--with-simd=AVX2\")\n```\n\n## Features\n\nThe table below compares the features of different serialization\napproaches in R.\n\n|                      | qs |        fst         | saveRDS |\n| -------------------- | :-: | :----------------: | :-----: |\n| Not Slow             | ✔  |         ✔          |    ❌    |\n| Numeric Vectors      | ✔  |         ✔          |    ✔    |\n| Integer Vectors      | ✔  |         ✔          |    ✔    |\n| Logical Vectors      | ✔  |         ✔          |    ✔    |\n| Character Vectors    | ✔  |         ✔          |    ✔    |\n| Character Encoding   | ✔  | (vector-wide only) |    ✔    |\n| Complex Vectors      | ✔  |         ❌          |    ✔    |\n| Data.Frames          | ✔  |         ✔          |    ✔    |\n| On disk row access   | ❌  |         ✔          |    ❌    |\n| Random column access | ❌  |         ✔          |    ❌    |\n| Attributes           | ✔  |        Some        |    ✔    |\n| Lists / Nested Lists | ✔  |         ❌          |    ✔    |\n| Multi-threaded       | ✔  |         ✔          |    ❌    |\n\n`qs` also includes a number of advanced features:\n\n  - For character vectors, qs also 
has the option of using the new\n    ALTREP system (R version 3.5+) to quickly read in string data.\n  - For numerical data (numeric, integer, logical and complex vectors)\n    `qs` implements byte shuffling filters (adopted from the Blosc\n    meta-compression library). These filters use extended CPU\n    instruction sets (either SSE2 or AVX2).\n  - `qs` also efficiently serializes S4 objects, environments, and other\n    complex objects.\n\nFor certain types of data, these features can improve performance by\norders of magnitude. See the sections below for more details.\n\n## Summary Benchmarks\n\nThe following benchmarks compare `qs`, `fst` and\n`saveRDS`/`readRDS` in base R for serializing and de-serializing a\nmedium-sized `data.frame` with 5 million rows (approximately 115 MB in\nmemory):\n\n``` r\ndata.frame(a = rnorm(5e6),\n           b = rpois(5e6, 100),\n           c = sample(starnames$IAU, 5e6, TRUE),\n           d = sample(state.name, 5e6, TRUE),\n           stringsAsFactors = FALSE)\n```\n\n`qs` is highly parameterized and can be tuned by the user to extract as\nmuch speed and compression as possible, if desired. For simplicity, `qs`\ncomes with 4 presets that trade off speed against compression ratio: “fast”,\n“balanced”, “high” and “archive”.\n\nThe plots below summarize the performance of `saveRDS`, `qs` and `fst`\nwith various parameters:\n\n### Serializing\n\n![](vignettes/df_bench_write.png \"df_bench_write\")\n\n### De-serializing\n\n![](vignettes/df_bench_read.png \"df_bench_read\")\n\n*(Benchmarks are based on `qs` ver. 0.21.2, `fst` ver. 
0.9.0 and R\n3.6.1.)*\n\nBenchmarking write and read speed is tricky: results depend heavily on\nfactors such as the operating system, the hardware, the distribution of\nthe data, and even the state of the R instance. Read speed is also\naffected by various hardware and software memory caches.\n\nGenerally speaking, `qs` and `fst` are considerably faster than\n`saveRDS`, whether single-threaded or multi-threaded compression is\nused. `qs` also achieves a better compression ratio\nthrough various optimizations (e.g., see the “Byte shuffle” section below).\n\n## ALTREP character vectors\n\nThe ALTREP system (new as of R 3.5.0) allows package developers to\nrepresent R objects using their own custom memory layout. This allows a\npotentially large speedup in processing certain types of data.\n\nIn `qs`, `ALTREP` character vectors are implemented via the\n[`stringfish`](https://github.com/traversc/stringfish) package and can\nbe used by setting `use_alt_rep=TRUE` in the `qread` function. The\nbenchmark below shows the time it takes to `qread` several million\nrandom strings (`nchar = 80`) with and without `ALTREP`.\n\n![](vignettes/altrep_bench.png \"altrep_bench\")\n\nThe large speedup demonstrates why one would want to consider the\nsystem, but there are caveats: downstream processing functions must be\n`ALTREP`-aware. See the\n[`stringfish`](https://github.com/traversc/stringfish) package for more\ndetails.\n\n## Byte shuffle\n\nByte shuffling (adopted from the Blosc meta-compression library) is a\nway of re-organizing data to be more amenable to compression. An integer\nin R occupies four bytes and ranges from -(2^31 - 1) to 2^31 - 1.\nHowever, most real data doesn’t use anywhere near the range of possible\ninteger values. 
For example, if the data represented percentages,\n0% to 100%, the three high-order bytes of every value would be zero.\n\nByte shuffling rearranges the data such that all of the first bytes are\nblocked together, all of the second bytes are blocked together, and so\non. This procedure often makes it very easy for compression algorithms\nto find repeated patterns and can often improve compression ratio by\norders of magnitude. In the example below, shuffle compression achieves\na compression ratio of over 1000:1. See `?qsave` for more details.\n\n``` r\n# With byte shuffling\nx \u003c- 1:1e8\nqsave(x, \"mydat.qs\", preset = \"custom\", shuffle_control = 15, algorithm = \"zstd\")\ncat( \"Compression Ratio: \", as.numeric(object.size(x)) / file.info(\"mydat.qs\")$size, \"\\n\" )\n# Compression Ratio:  1389.164\n\n# Without byte shuffling\nx \u003c- 1:1e8\nqsave(x, \"mydat.qs\", preset = \"custom\", shuffle_control = 0, algorithm = \"zstd\")\ncat( \"Compression Ratio: \", as.numeric(object.size(x)) / file.info(\"mydat.qs\")$size, \"\\n\" )\n# Compression Ratio:  1.479294\n```\n\n## Serializing to memory\n\nYou can use `qs` to directly serialize objects to memory.\n\nExample:\n\n``` r\nlibrary(qs)\nx \u003c- qserialize(c(1, 2, 3))\nqdeserialize(x)\n# [1] 1 2 3\n```\n\n## Serializing objects to ASCII\n\nThe `qs` package includes two sets of utility functions for converting\nbinary data to ASCII:\n\n  - `base85_encode` and `base85_decode`\n  - `base91_encode` and `base91_decode`\n\nThese functions are similar to base64 encoding functions found in\nvarious packages, but offer greater space efficiency.\n\nExample:\n\n``` r\nenc \u003c- base91_encode(qserialize(datasets::mtcars, preset = \"custom\", compress_level = 22))\ndec \u003c- qdeserialize(base91_decode(enc))\n```\n\n(Note: base91 strings may contain double quote characters (`\"`) and need to\nbe single-quoted if stored as a string literal.)\n\nSee the help files for additional details and the history behind these\nalgorithms.\n\n## Using qs 
within Rcpp\n\n`qs` functions can be called directly within C++ code via Rcpp.\n\nExample C++ script:\n\n    // [[Rcpp::depends(qs)]]\n    #include \u003cRcpp.h\u003e\n    #include \u003cqs.h\u003e\n    using namespace Rcpp;\n    \n    // [[Rcpp::export]]\n    void test() {\n      qs::qsave(IntegerVector::create(1,2,3), \"/tmp/myfile.qs\", \"high\", \"zstd\", 1, 15, true, 1);\n    }\n\nR side:\n\n``` r\nlibrary(qs)\nlibrary(Rcpp)\nsourceCpp(\"test.cpp\")\n# Save the file using the Rcpp interface\ntest()\n# Read in the file created through the Rcpp interface\nqread(\"/tmp/myfile.qs\")\n# [1] 1 2 3\n```\n\nThe C++ functions do not have default parameters; all parameters must be\nspecified.\n\n## Future developments\n\n  - Additional compression algorithms\n  - Improved ALTREP serialization\n  - Re-write of multithreading code\n  - Mac M1 optimizations (NEON) and checking\n\nFuture versions will be backwards compatible with the current version.\n","funding_links":[],"categories":["C"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqsbase%2Fqs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqsbase%2Fqs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqsbase%2Fqs/lists"}