https://github.com/coolbutuseless/lz4lite
Very Fast compression/decompression of in-memory numeric vectors with LZ4
- Host: GitHub
- URL: https://github.com/coolbutuseless/lz4lite
- Owner: coolbutuseless
- License: other
- Created: 2020-06-16T10:12:12.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2020-09-26T00:18:23.000Z (over 4 years ago)
- Last Synced: 2025-02-01T21:02:28.618Z (2 months ago)
- Language: C
- Homepage:
- Size: 117 KB
- Stars: 20
- Watchers: 4
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
README
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = FALSE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

library(dplyr)
library(lz4lite)
```

# lz4lite

[Lifecycle: experimental](https://www.tidyverse.org/lifecycle/#experimental)
[GitHub Actions](https://github.com/coolbutuseless/lz4lite/actions)

`lz4lite` provides access to the extremely fast compression in [lz4](https://github.com/lz4/lz4)
for performing in-memory compression.

As of v0.2.0, `lz4lite` can serialize and compress any R object understood by
`base::serialize()`.

If the input is known to be an atomic, numeric vector, and you do not care about any attributes or
names on this vector, then `lz4_compress()`/`lz4_uncompress()` can be used. These
are bespoke serialization routines for atomic numeric vectors that run faster since
they avoid R's internals.

For a more general solution to fast serialization of R objects, see the
[fst](https://github.com/fstpackage/fst) or [qs](https://cran.r-project.org/package=qs) packages.

The lz4 code currently provided with this package is v1.9.3.

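
As a rough illustration of the trade-off between the two interfaces, the sketch below round-trips a small named integer vector through both of them. It uses only function names that appear in this README, passes `acceleration = 1` explicitly as the benchmarks do, and is wrapped in `requireNamespace()` so that it does nothing when `lz4lite` is not installed. The variable names are made up for the example. Going by the description above, the `lz4_serialize()` route should preserve the whole object, while the `lz4_compress()` route is expected to return the values without their names.

```r
# Sketch only: compare the two round trips when lz4lite is available.
if (requireNamespace("lz4lite", quietly = TRUE)) {
  x <- c(a = 1L, b = 2L, c = 3L)  # named integer vector (illustrative data)

  # General route: serialize and compress the full R object
  via_serialize <- lz4lite::lz4_unserialize(lz4lite::lz4_serialize(x))

  # Bespoke route: compresses only the data held in the vector
  via_compress <- lz4lite::lz4_uncompress(
    lz4lite::lz4_compress(x, acceleration = 1)
  )

  print(names(via_serialize))  # names should survive the serialize route
  print(names(via_compress))   # per the README, attributes are dropped here
}
```

If names, dims or other attributes matter, the serialize route is the safer choice; the compress route suits plain numeric data where only the values are needed.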
### What's in the box
* **For arbitrary R objects**
    * `lz4_serialize()`/`lz4_unserialize()` serialize and compress any R object.
* **For atomic vectors with numeric values**
    * `lz4_compress()`/`lz4_uncompress()`
        - compress the data within a vector of raw, integer, real, complex or logical values
        - faster than `lz4_serialize()`/`lz4_unserialize()` but throws away all attributes, i.e. names, dims etc.

### Installation
You can install from [GitHub](https://github.com/coolbutuseless/lz4lite) with:
``` r
# install.packages('remotes')
remotes::install_github('coolbutuseless/lz4lite')
```

## Basic usage of lz4lite
```{r}
dat <- mtcars

buf <- lz4_serialize(dat)
length(buf) # Number of bytes

# compression ratio
length(buf)/length(serialize(dat, NULL))

head(lz4_unserialize(buf))
```

## Compressing 5 million Integers
```{r}
library(lz4lite)

max_hc <- 12
set.seed(1)
N <- 5e6
input_ints <- sample(1:3, N, prob = (1:3)^3, replace = TRUE)
serialize_base <- serialize(input_ints, NULL, xdr = FALSE)
serialize_lo <- lz4_serialize(input_ints, acceleration = 1)
serialize_hi_3 <- lz4hc_serialize(input_ints, level = 3)
serialize_hi_9 <- lz4hc_serialize(input_ints, level = 9)
serialize_hi_12 <- lz4hc_serialize(input_ints, level = max_hc)
compress_lo <- lz4_compress(input_ints, acceleration = 1)
compress_hi_3 <- lz4hc_compress(input_ints, level = 3)
compress_hi_9 <- lz4hc_compress(input_ints, level = 9)
compress_hi_12 <- lz4hc_compress(input_ints, level = max_hc)
```

```{r echo = FALSE}
lens <- c(
length(serialize(input_ints, NULL, xdr = FALSE)),
length(lz4_serialize(input_ints, acceleration = 1)),
length(lz4hc_serialize(input_ints, level = 3)),
length(lz4hc_serialize(input_ints, level = 9)),
length(lz4hc_serialize(input_ints, level = max_hc)),
length(lz4_compress (input_ints, acceleration = 1)),
length(lz4hc_compress (input_ints, level = 3)),
length(lz4hc_compress (input_ints, level = 9)),
length(lz4hc_compress (input_ints, level = max_hc))
)
```

```{r}
library(lz4lite)

res <- bench::mark(
serialize(input_ints, NULL, xdr = FALSE),
lz4_serialize(input_ints, acceleration = 1),
lz4hc_serialize(input_ints, level = 3),
lz4hc_serialize(input_ints, level = 9),
lz4hc_serialize(input_ints, level = max_hc),
lz4_compress (input_ints, acceleration = 1),
lz4hc_compress (input_ints, level = 3),
lz4hc_compress (input_ints, level = 9),
lz4hc_compress (input_ints, level = max_hc),
check = FALSE
)
```

```{r echo = FALSE}
res %>%
mutate(`MB/s` = round(N*4/1024^2 / as.numeric(median), 1)) %>%
mutate(`itr/sec` = round(`itr/sec`)) %>%
mutate(compression_ratio = round(lens/(N*4), 3)) %>%
select(expression, median, `itr/sec`, `MB/s`, compression_ratio) %>%
knitr::kable()
```

### Uncompressing 5 million integers

Uncompression speed varies slightly depending upon the compressed size.

```{r}
res <- bench::mark(
lz4_uncompress(compress_lo),
lz4_uncompress(compress_hi_3),
lz4_uncompress(compress_hi_9),
lz4_uncompress(compress_hi_12)
)
```

```{r echo = FALSE}
res %>%
mutate(`MB/s` = round(N*4/1024^2 / as.numeric(median), 1)) %>%
mutate(`itr/sec` = round(`itr/sec`)) %>%
select(expression, median, `itr/sec`, `MB/s`) %>%
knitr::kable()
```

### Unserializing 5 million integers

Unserialization speed varies slightly depending upon the compressed size.

```{r}
res <- bench::mark(
unserialize(serialize_base),
lz4_unserialize(serialize_lo),
lz4_unserialize(serialize_hi_3),
lz4_unserialize(serialize_hi_9),
lz4_unserialize(serialize_hi_12)
)
```

```{r echo = FALSE}
res %>%
mutate(`MB/s` = round(N*4/1024^2 / as.numeric(median), 1)) %>%
mutate(`itr/sec` = round(`itr/sec`)) %>%
select(expression, median, `itr/sec`, `MB/s`) %>%
knitr::kable()
```

## Technical bits

### Framing of the compressed data
* `lz4lite` does **not** use the standard LZ4 frame to store data.
* The compressed representation is the compressed data prefixed with
  a custom 8-byte header consisting of
    * 3 bytes = 'LZ4'
    * If this was produced with `lz4_serialize()` the next byte is 0x00,
      otherwise it is a byte representing the SEXP of the encoded object.
    * 4-byte length value i.e. the number of bytes in the
      original uncompressed data.
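
The header layout just described can be sketched in base R. This is only an illustration under stated assumptions: `parse_lz4lite_header()` is a hypothetical helper that is not part of `lz4lite`, the header below is built by hand and is not real package output, and the byte order of the 4-byte length is assumed to be little-endian because the README does not state it.

```r
# Hypothetical helper (not part of lz4lite): split the 8-byte header
# into the three fields described above.
parse_lz4lite_header <- function(buf) {
  stopifnot(is.raw(buf), length(buf) >= 8)
  list(
    magic     = rawToChar(buf[1:3]),  # expected to be "LZ4"
    type_byte = as.integer(buf[4]),   # 0x00 for lz4_serialize() output
    # ASSUMPTION: little-endian byte order; the README does not specify it.
    uncompressed_len = readBin(buf[5:8], what = "integer", n = 1L,
                               size = 4L, endian = "little")
  )
}

# Hand-built header for illustration: magic, type byte 0x00, length 400
hdr <- c(
  charToRaw("LZ4"),
  as.raw(0x00),
  writeBin(400L, raw(), size = 4L, endian = "little")
)

str(parse_lz4lite_header(hdr))
```

Because `readBin()` reads the length as a signed 4-byte integer, this sketch would misreport lengths above 2^31 - 1.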
* This data representation
    * is not compatible with the standard LZ4 frame format.
    * is likely to evolve (so currently do not plan on compressing something in
      one version of `lz4lite` and uncompressing in another version).

## Related Software
* [lz4](https://github.com/lz4/lz4) and [zstd](https://github.com/facebook/zstd) - both by Yann Collet
* [fst](https://github.com/fstpackage/fst) for serialisation of data.frames using
lz4 and zstd
* [qs](https://cran.r-project.org/package=qs) for fast serialization of arbitrary R objects
  with lz4 and zstd

## Acknowledgements
* Yann Collet for releasing, maintaining and advancing
  [lz4](https://github.com/lz4/lz4) and [zstd](https://github.com/facebook/zstd)
* R Core for developing and maintaining such a wonderful language.
* CRAN maintainers, for patiently shepherding packages onto CRAN and maintaining
the repository