https://github.com/hyparam/hyparquet-compressors
Decompressors for hyparquet
- Host: GitHub
- URL: https://github.com/hyparam/hyparquet-compressors
- Owner: hyparam
- License: mit
- Created: 2024-05-09T04:18:52.000Z (about 2 years ago)
- Default Branch: master
- Last Pushed: 2025-03-20T09:51:34.000Z (about 1 year ago)
- Last Synced: 2025-05-10T03:01:32.393Z (about 1 year ago)
- Topics: brotli, decompress, decompression, decompressor, gzip, hyperparam, javascript, js, lz4, parquet, zstd
- Language: JavaScript
- Homepage:
- Size: 820 KB
- Stars: 13
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# hyparquet decompressors

[npm](https://www.npmjs.com/package/hyparquet-compressors)
[workflow status](https://github.com/hyparam/hyparquet-compressors/actions)
[MIT license](https://opensource.org/licenses/MIT)

This package exports a `compressors` object intended to be passed to [hyparquet](https://github.com/hyparam/hyparquet).

[Apache Parquet](https://parquet.apache.org) is a popular columnar storage format, widely used in data engineering, data science, and machine learning for efficiently storing and processing large datasets. Parquet supports a number of compression formats, but most parquet files use snappy compression.

By default, the hyparquet library only supports `uncompressed` and `snappy` compressed files. The `hyparquet-compressors` package extends support to every standard parquet compression format except LZO.

The `hyparquet-compressors` package works in both Node.js and the browser. It uses only JavaScript and WebAssembly packages, with no system dependencies.
## Usage
```js
import { parquetRead } from 'hyparquet'
import { compressors } from 'hyparquet-compressors'

// file: the parquet data as an ArrayBuffer or hyparquet AsyncBuffer
await parquetRead({ file, compressors, onComplete: console.log })
```
See the [hyparquet](https://github.com/hyparam/hyparquet) repo for more information.
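Conceptually, the `compressors` object maps parquet codec names (`GZIP`, `BROTLI`, `ZSTD`, ...) to decompressor functions. The sketch below assumes each function has the shape `(input: Uint8Array, outputLength: number) => Uint8Array`, which is our understanding of hyparquet's `Compressors` type; `decompressPage` and `fakeCompressors` are hypothetical names for illustration, not part of either package's API.

```js
/**
 * Hypothetical helper, not part of hyparquet's API: look up the decompressor
 * for a parquet codec name and check the size of its output.
 * Assumes decompressors have the shape (input, outputLength) => Uint8Array.
 */
function decompressPage(codec, input, outputLength, compressors = {}) {
  if (codec === 'UNCOMPRESSED') return input // nothing to do
  const decompress = compressors[codec]
  if (!decompress) {
    // e.g. LZO, which hyparquet-compressors does not implement
    throw new Error(`unsupported parquet compression codec: ${codec}`)
  }
  const output = decompress(input, outputLength)
  if (output.length !== outputLength) {
    throw new Error(`${codec}: expected ${outputLength} bytes, got ${output.length}`)
  }
  return output
}

// Stand-in "decompressor" that repeats every input byte twice (not real gzip)
const fakeCompressors = {
  GZIP: (input, outputLength) => {
    const out = new Uint8Array(outputLength)
    for (let i = 0; i < outputLength; i++) out[i] = input[i >> 1]
    return out
  },
}

console.log(Array.from(decompressPage('GZIP', new Uint8Array([1, 2]), 4, fakeCompressors)))
// [ 1, 1, 2, 2 ]
```

Checking the output length against the size recorded in the page header is a cheap way to catch a decompressor bug or a corrupt page early.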
# Compression formats
Parquet compression types supported with `hyparquet-compressors`:
- [x] Uncompressed
- [x] Snappy
- [x] Gzip
- [ ] LZO
- [x] Brotli
- [x] LZ4
- [x] ZSTD
- [x] LZ4_RAW
## Snappy
Snappy decompression uses [hysnappy](https://github.com/hyparam/hysnappy), which provides fast snappy decompression with a minimal WebAssembly module.
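For readers curious what a snappy decoder has to do, here is a minimal pure-JavaScript sketch of the raw snappy block format: a varint length preamble followed by literal and back-reference (copy) elements. It is an illustration of the format only, not hysnappy's WebAssembly code, and it assumes a single well-formed raw block with little input validation.

```js
/**
 * Minimal raw snappy block decoder (illustrative sketch, not hysnappy).
 * Element tags: low 2 bits select literal (0) or copy with a 1-, 2-, or
 * 4-byte offset (1, 2, 3).
 */
function snappyUncompress(input) {
  let pos = 0
  // Preamble: uncompressed length as a little-endian base-128 varint
  let length = 0
  let shift = 0
  let byte
  do {
    byte = input[pos++]
    length += (byte & 0x7f) * 2 ** shift
    shift += 7
  } while (byte & 0x80)

  const output = new Uint8Array(length)
  let out = 0
  while (pos < input.length) {
    const tag = input[pos++]
    const type = tag & 0x3
    if (type === 0) {
      // Literal: (length - 1) in the upper 6 bits, or in 1-4 extra bytes if >= 60
      let len = tag >>> 2
      if (len >= 60) {
        const extraBytes = len - 59
        len = 0
        for (let i = 0; i < extraBytes; i++) len += input[pos++] * 2 ** (8 * i)
      }
      len += 1
      output.set(input.subarray(pos, pos + len), out)
      pos += len
      out += len
      continue
    }
    let len, offset
    if (type === 1) {
      // Copy, 1-byte offset: length 4-11, 11-bit offset (3 high bits in the tag)
      len = ((tag >>> 2) & 0x7) + 4
      offset = ((tag >>> 5) << 8) | input[pos++]
    } else if (type === 2) {
      // Copy, 2-byte little-endian offset
      len = (tag >>> 2) + 1
      offset = input[pos] | (input[pos + 1] << 8)
      pos += 2
    } else {
      // Copy, 4-byte little-endian offset
      len = (tag >>> 2) + 1
      offset = (input[pos] | (input[pos + 1] << 8) | (input[pos + 2] << 16)) + input[pos + 3] * 2 ** 24
      pos += 4
    }
    if (offset === 0 || offset > out) throw new Error('snappy: invalid copy offset')
    // Copy byte by byte so overlapping copies (offset < len) repeat recent output
    for (let i = 0; i < len; i++, out++) output[out] = output[out - offset]
  }
  if (out !== length) throw new Error('snappy: length mismatch')
  return output
}

// Literal "abc" followed by a 6-byte copy from offset 3
const bytes = snappyUncompress(new Uint8Array([0x09, 0x08, 0x61, 0x62, 0x63, 0x09, 0x03]))
console.log(new TextDecoder().decode(bytes)) // abcabcabc
```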
## Gzip
A new gzip implementation adapted from [fflate](https://github.com/101arrowz/fflate).
It includes modifications to handle multiple back-to-back gzip streams, which sometimes occur in parquet files but were not supported by fflate.
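Handling back-to-back streams comes down to walking gzip members (RFC 1952) one after another: parse the header, inflate the deflate data, read the 8-byte trailer, and check whether another `0x1f 0x8b` header follows. The sketch below covers only the framing part and omits the inflate step; the function names are hypothetical and this is not the package's actual code.

```js
// Skip a zero-terminated header string (FNAME or FCOMMENT)
function skipZeroTerminated(bytes, pos) {
  while (pos < bytes.length && bytes[pos] !== 0) pos++
  if (pos >= bytes.length) throw new Error('gzip: unterminated header string')
  return pos + 1
}

// Return the offset where the deflate data of the gzip member at `start` begins
function gzipDataOffset(bytes, start = 0) {
  if (bytes[start] !== 0x1f || bytes[start + 1] !== 0x8b) throw new Error('gzip: bad magic')
  if (bytes[start + 2] !== 8) throw new Error('gzip: unsupported compression method')
  const flags = bytes[start + 3]
  let pos = start + 10 // magic(2) CM(1) FLG(1) MTIME(4) XFL(1) OS(1)
  if (flags & 0x04) pos += 2 + (bytes[pos] | (bytes[pos + 1] << 8)) // FEXTRA
  if (flags & 0x08) pos = skipZeroTerminated(bytes, pos) // FNAME
  if (flags & 0x10) pos = skipZeroTerminated(bytes, pos) // FCOMMENT
  if (flags & 0x02) pos += 2 // FHCRC
  if (pos > bytes.length) throw new Error('gzip: truncated header')
  return pos
}

// Read the 8-byte trailer after a member's deflate data: CRC32 and ISIZE,
// both little-endian; ISIZE is the uncompressed size modulo 2^32
function readGzipTrailer(bytes, pos) {
  const u32 = p => (bytes[p] | (bytes[p + 1] << 8) | (bytes[p + 2] << 16) | (bytes[p + 3] << 24)) >>> 0
  return { crc32: u32(pos), isize: u32(pos + 4), next: pos + 8 }
}

// After a trailer, another member may follow immediately
function startsGzipMember(bytes, pos) {
  return pos + 1 < bytes.length && bytes[pos] === 0x1f && bytes[pos + 1] === 0x8b
}
```

A decoder that stops after the first trailer silently drops every later member, which is why a single-stream gzip decoder can return truncated pages for such files.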
## Brotli
Includes a minimal port of [brotli.js](https://github.com/foliojs/brotli.js). The built-in brotli dictionary is stored pre-compressed with gzip to minimize the distribution bundle size.
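The trade-off behind a pre-compressed dictionary is a smaller bundle in exchange for a one-time decode. A minimal sketch of the "expand on first use, then cache" pattern follows; `lazyDictionary`, `compressedDictionary`, and `inflate` are hypothetical placeholders (in a real build, `inflate` would be a gzip decompressor and the input the embedded dictionary bytes), not names from this package.

```js
/**
 * Hypothetical sketch: decode an embedded compressed dictionary only when
 * it is first needed, then reuse the decoded bytes.
 * `inflate` stands in for a gzip decompressor.
 */
function lazyDictionary(compressedDictionary, inflate) {
  let dictionary // undefined until the first call
  return function getDictionary() {
    if (dictionary === undefined) dictionary = inflate(compressedDictionary)
    return dictionary
  }
}

// Usage with a stand-in "inflate" that doubles each byte and counts calls
let calls = 0
const getDictionary = lazyDictionary(new Uint8Array([1, 2, 3]), input => {
  calls++
  return input.map(b => b * 2)
})
getDictionary()
getDictionary()
console.log(calls) // 1
```

Files that never use brotli never pay the decode cost.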
## LZ4
A new LZ4 implementation, which also supports the legacy Hadoop LZ4 frame format used by some older parquet files.
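Parquet's deprecated `LZ4` codec was commonly written with Hadoop framing, while `LZ4_RAW` is a bare LZ4 block. The sketch below shows both layers: a raw LZ4 block decoder, and a loop over Hadoop frames under the assumed layout of a big-endian uint32 decompressed size, a big-endian uint32 compressed size, then one raw LZ4 block per frame. The function names are hypothetical and this is not the package's implementation.

```js
// Decode one raw LZ4 block into `output` starting at `outPos`; returns the end offset
function lz4BlockDecompress(input, output, outPos = 0) {
  let pos = 0
  let out = outPos
  while (pos < input.length) {
    const token = input[pos++]
    // Literal length: high nibble, extended by extra bytes while they equal 255
    let literals = token >>> 4
    if (literals === 15) {
      let b
      do { b = input[pos++]; literals += b } while (b === 255)
    }
    output.set(input.subarray(pos, pos + literals), out)
    pos += literals
    out += literals
    if (pos >= input.length) break // the last sequence has no match part
    // Match: 2-byte little-endian offset, length = low nibble + 4 (extended the same way)
    const offset = input[pos] | (input[pos + 1] << 8)
    pos += 2
    let matchLength = token & 0x0f
    if (matchLength === 15) {
      let b
      do { b = input[pos++]; matchLength += b } while (b === 255)
    }
    matchLength += 4
    if (offset === 0 || offset > out - outPos) throw new Error('lz4: invalid offset')
    // Byte-by-byte copy so overlapping matches repeat recent output
    for (let i = 0; i < matchLength; i++, out++) output[out] = output[out - offset]
  }
  return out
}

// Decode a sequence of Hadoop-framed LZ4 blocks (assumed layout, see above)
function lz4HadoopDecompress(input, outputLength) {
  const output = new Uint8Array(outputLength)
  const view = new DataView(input.buffer, input.byteOffset, input.byteLength)
  let pos = 0
  let out = 0
  while (pos + 8 <= input.length) {
    const decompressedSize = view.getUint32(pos) // DataView defaults to big-endian
    const compressedSize = view.getUint32(pos + 4)
    pos += 8
    const end = lz4BlockDecompress(input.subarray(pos, pos + compressedSize), output, out)
    if (end - out !== decompressedSize) throw new Error('lz4: frame size mismatch')
    pos += compressedSize
    out = end
  }
  if (out !== outputLength) throw new Error('lz4: output length mismatch')
  return output
}

// One frame: 9 bytes decompressed, 6 bytes compressed ("abc" + 6-byte match at offset 3)
const framed = new Uint8Array([0, 0, 0, 9, 0, 0, 0, 6, 0x32, 0x61, 0x62, 0x63, 0x03, 0x00])
console.log(new TextDecoder().decode(lz4HadoopDecompress(framed, 9))) // abcabcabc
```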
## Zstd
Uses [fzstd](https://github.com/101arrowz/fzstd) for Zstandard decompression.
# Bundle size
| File | Size |
| --- | --- |
| hyparquet-compressors.min.js | 116.1kb |
| hyparquet-compressors.min.js.gz | 75.2kb |
# References
- https://parquet.apache.org/docs/file-format/data-pages/compression/
- https://en.wikipedia.org/wiki/Brotli
- https://en.wikipedia.org/wiki/Gzip
- https://en.wikipedia.org/wiki/LZ4_(compression_algorithm)
- https://en.wikipedia.org/wiki/Snappy_(compression)
- https://en.wikipedia.org/wiki/Zstd
- https://github.com/101arrowz/fflate
- https://github.com/101arrowz/fzstd
- https://github.com/foliojs/brotli.js
- https://github.com/hyparam/hysnappy