https://github.com/philipaconrad/gzipstreamwriter
Ever wanted to merge gzipped []byte blobs together without decompressing first? Now you can.
compression go golang gzip gzip-compression
Last synced: 12 months ago
- Host: GitHub
- URL: https://github.com/philipaconrad/gzipstreamwriter
- Owner: philipaconrad
- License: apache-2.0
- Created: 2024-10-09T06:45:07.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-25T17:40:20.000Z (about 1 year ago)
- Last Synced: 2025-04-25T18:42:01.100Z (about 1 year ago)
- Topics: compression, go, golang, gzip, gzip-compression
- Language: Go
- Homepage:
- Size: 68.4 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# GzipStreamWriter
*Ever wanted to merge gzipped `[]byte` blobs together without decompressing first? Now you can.*
This project exists to solve a very specific problem: efficiently concatenating gzip blobs together, as if they'd all been written as a single stream. This is trickier than it sounds, because we *don't* want to decompress the gzipped blobs while writing them!
## Design Goals:
- Allow writing either raw bytes or compressed gzip blobs to a destination, resulting in a valid, concatenated gzip blob at the destination, as if the blob had been written "all at once" as a single gzip byte stream.
- Avoid decompressing the compressed blobs.
- Avoid excessive memory and CPU burn if possible.
- Avoid exposing the awful complex parts to happy-path users. Still provide some support scaffolding for hard-mode folks.
## Design Anti-Goals:
- Best possible compression performance: We know that some usage patterns can result in a less-than-optimal overall compression ratio.
## Algorithm
For writing compressed blobs:
- Drop the header.
- Extract CRC32 and uncompressed length fields from the trailer, and drop the trailer.
- Write the blob to the stream.
- Update the running CRC32 by combining it with the blob's CRC32, using the XOR-based `crc32_combine` trick from zlib.
- Add the uncompressed length from the trailer to the running length field.
For uncompressed `[]byte` writes:
- Update the length field using `len(slice)`.
- Compress into a gzip blob.
- Extract the CRC32 from the trailer.
- Drop header and trailer.
- Write the blob to the stream.
This gives us a powerful abstraction that "does the right thing" behind the scenes, while being ridiculously cheaper to compute than decompressing and recompressing compressed gzip data.
## Go Version Support
I'm currently supporting the latest Go major version.
Previously, I'd aimed to support the current major version minus 2, but there's a world of performance and features available on Go 1.24+, and I'd like to have access to those things.
The larger open source project that this library was originally developed for has caught up to Go 1.23+, so this library should be usable for them soon.