{"id":49154087,"url":"https://github.com/cristiancmoises/vaptvupt","last_synced_at":"2026-04-26T07:01:23.740Z","repository":{"id":347988969,"uuid":"1195126257","full_name":"cristiancmoises/vaptvupt","owner":"cristiancmoises","description":"Fast LZ77 + tANS entropy codec in pure C11","archived":false,"fork":false,"pushed_at":"2026-04-22T06:46:03.000Z","size":367,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-22T08:03:59.638Z","etag":null,"topics":["c11","codec","compression","data-compression","decompression","entropy-coding","high-performance","high-performance-computing","rans","speed"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cristiancmoises.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-29T08:59:28.000Z","updated_at":"2026-04-22T06:46:07.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/cristiancmoises/vaptvupt","commit_stats":null,"previous_names":["cristiancmoises/vaptvupt"],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/cristiancmoises/vaptvupt","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cristiancmoises%2Fvaptvupt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cristiancmoises%2Fvaptvupt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/repositories/cristiancmoises%2Fvaptvupt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cristiancmoises%2Fvaptvupt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cristiancmoises","download_url":"https://codeload.github.com/cristiancmoises/vaptvupt/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cristiancmoises%2Fvaptvupt/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32288653,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-26T06:26:00.361Z","status":"ssl_error","status_checked_at":"2026-04-26T06:25:58.791Z","response_time":129,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c11","codec","compression","data-compression","decompression","entropy-coding","high-performance","high-performance-computing","rans","speed"],"created_at":"2026-04-22T08:02:01.807Z","updated_at":"2026-04-26T07:01:23.727Z","avatar_url":"https://github.com/cristiancmoises.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# VaptVupt\n\n**A compression codec purpose-built for secure backup tools.** Pure\nC11, zero runtime dependencies, single-file amalgamation. 
Produces an\nopen wire format ([FORMAT.md](FORMAT.md)) stable since v1.0.0, with\nbyte-exact reference decoders in Python and JavaScript.\n\n**Current version: v2.46.0.** 6,032+ tests + 5,200-case differential\nfuzzer. Production-ready for Zupt 2.1.6 integration — see\n[ZUPT_INTEGRATION.md](ZUPT_INTEGRATION.md). Three Silesia fixtures\n(fx_json, x-ray, sao) now beat zstd-3 on ratio.\n\n## Headline Numbers\n\n- **Random-data decode: 26,773 MB/s** with `--fast` —\n  **3.7× zstd-19, 1.5× lz4-9**. The signature path for AEAD-wrapped\n  archives.\n- **Synthetic binary ratio: 1,149×** — 7× better than gzip-9, 6×\n  better than lz4-9 on pattern-rich payloads.\n- **Synthetic repeat ratio: 7,367×** — 18× better than gzip-9.\n- **JSON ratio: 5.10×** — beats both gzip-9 and zstd-3.\n- **Real binary ratio** (libc.so.6, bash, python3): within\n  **2-3% of zstd-3** as of v2.46.0's Huffman-in-SEQ literal coding.\n- **Embeddability**: 2 files (`build/vaptvupt.c` + `build/vaptvupt.h`).\n  Drop in and ship.\n\nSee [COMPETITIVE.md](COMPETITIVE.md) for the full measurement matrix\nagainst zstd, lz4, and gzip across ten fixture classes.\n\n## At a Glance\n\n| Feature | Status |\n|---|---|\n| Language | C11, zero external deps |\n| Build | `make` → `./vaptvupt` + amalgamation |\n| Wire format | v1 frozen since 1.0.0; v2 opt-in since 2.33.0 |\n| Decode SIMD | AVX2 + NEON with scalar fallback |\n| Multi-thread encode | Optional via `ENABLE_THREADS=1` |\n| Streaming API | Encode + decode |\n| Multi-frame archives | Native support |\n| Security invariants | 14 numbered, all tested and guarded |\n| Tests | **6,032+** standard; **8,732+** with full fuzzer run |\n| Reference impls | C (production) + Python + JavaScript |\n| License | GPL-3.0-or-later |\n\n## Performance — v2.46.0 baseline\n\nMeasured on a 2.1 GHz x86_64 container, library-level (not CLI),\nbest-of-30 warmed runs. 
Bold marks where VaptVupt leads its class.\n\n### Decode throughput (MB/s, higher is better)\n\n| Content | **VaptVupt `--fast`** | zstd-19 | lz4-9 | gzip-9 |\n|---|---|---|---|---|\n| Random (AEAD ciphertext) | **26,773** | 7,172 | 17,594 | 412 |\n| Binary (pattern-rich) | **14,414** | 8,098 | 19,933 | 598 |\n| Synthetic repeat | **2,029** | 1,786 | 2,278 | 1,140 |\n| JSON / structured | 569 | 1,298 | 2,891 | 471 |\n| Prose text | 569 | 1,290 | 3,144 | 488 |\n\nRandom and pattern-rich binary decode are the dominant paths for\nsecure backup workloads. VaptVupt leads both decisively.\n\n### Compression ratio (input / compressed, extreme mode)\n\nBold marks where VaptVupt meets or beats gzip-9.\n\n| Fixture | **VaptVupt v2** | gzip-9 | zstd-19 | lz4-9 |\n|---|---|---|---|---|\n| synth-json | **4.80×** | 4.65× | 6.68× | 3.46× |\n| synth-binary | **1,149×** | 157× | 2,398× | 194× |\n| synth-repeat | **7,367×** | 403× | 8,463× | 252× |\n| real-bash | 1.92× | 2.09× | 2.32× | 1.83× |\n| real-ls | 2.11× | 2.30× | 2.55× | 2.00× |\n| real-libc.so.6 | 2.09× | 2.23× | 2.56× | 1.94× |\n| real-python3 | 2.64× | 2.84× | 3.46× | 2.34× |\n\n**Format v2 binary gains** (opt in via `--format-v2` or\n`opts.format_v2 = 1`): v1-to-v2 ratio improvements of 2-6% across\nall four real ELF binaries, closing the gap with gzip-9 from\n10-14% down to **4-7%**.\n\n## The `--fast` Flag — Signature Feature\n\nNo other codec offers a principled, documented integrity-hash bypass\nfor AEAD-wrapped archives. When the caller's outer layer (AES-GCM,\nChaCha20-Poly1305, TLS, etc.) 
already authenticates the compressed\nbytes, XXH64 is redundant work:\n\n```c\nvv_decompress_flags(cmp, clen, dst, dst_cap, VV_DECOMPRESS_SKIP_CHECKSUM);\n```\n\nWith `--fast`, the decoder **still validates**:\n- Frame magic and format version byte\n- Block headers (type, size, last-flag)\n- LZ offset bounds (per-iter check + absolute cap ≤ 1 MB)\n- ANS state bounds\n- Buffer overshoot guards on wildcopy paths\n\nIt only skips the XXH64 (non-cryptographic) checksum of decoded bytes. For\nZupt-style archives this delivers **2-5× decode speedup** at zero\nsecurity cost.\n\n## Quick Start\n\n```c\n#include \"vaptvupt.h\"\n\n/* One-shot compress */\nvv_options_t opts;\nvv_default_options(\u0026opts);\nopts.mode = VV_MODE_BALANCED;\n\nsize_t cap = vv_compress_bound(src_len);\nuint8_t *dst = malloc(cap);\nint64_t csz = vv_compress(src, src_len, dst, cap, \u0026opts);\n/* csz is compressed size, or negative error code */\n\n/* One-shot decompress */\nvv_frame_info_t info;\nvv_get_frame_info(compressed, csz, \u0026info);\nuint8_t *out = malloc(info.content_size);\nint64_t dsz = vv_decompress(compressed, csz, out, info.content_size);\n```\n\n## Streaming API\n\nFor large files or memory-constrained use. 
**API contract**: `dst`\nmust be a stable buffer base passed every call; `*written` is the\ncumulative total, not the delta.\n\n```c\n/* Compress in chunks */\nvv_cstream_t *c = vv_cstream_create(\u0026opts);\nuint8_t chunk[65536];\nfor (size_t n; (n = read_from_file(chunk, sizeof(chunk))) \u003e 0; ) {\n    int is_last = /* 1 on final chunk */;\n    size_t written;\n    vv_cstream_compress_chunk(c, chunk, n, out, cap, \u0026written, is_last);\n    write_to_stream(out, written);\n}\nvv_cstream_destroy(c);\n\n/* Decompress in chunks — stable dst, cumulative written */\nvv_dstream_t *d = vv_dstream_create();\nsize_t total_written = 0;\nfor (size_t n; (n = read_compressed(buf, sizeof(buf))) \u003e 0; ) {\n    size_t consumed, written;\n    int rc = vv_dstream_decompress_chunk(d, buf, n,\n                                          out, out_cap,  /* stable */\n                                          \u0026consumed, \u0026written);\n    total_written = written;  /* cumulative, not += */\n    if (rc == 1) break;       /* frame done */\n    if (rc \u003c 0) error();\n}\nvv_dstream_destroy(d);\n```\n\n## Multi-Threaded Compression\n\n```c\n/* Requires ENABLE_THREADS=1 at build time for actual parallelism.\n * Without it, falls back to sequential encoding. */\nint64_t sz = vv_compress_mt(src, src_len, dst, dst_cap, \u0026opts,\n                             /*nthreads=*/0,     /* 0 = auto */\n                             /*chunk_size=*/0);  /* 0 = 4 MB */\n/* Output is a valid .vv stream; decompress with regular vv_decompress */\n```\n\nTrade-off: each frame loses cross-frame match history (~0.05-2% ratio\nhit). 
Default chunk size keeps this under 1% on typical data.\n\n## Context Reuse — Per-File Workflows\n\nBackup tools compressing many small files should reuse one context\nto avoid per-file allocation cost (~1.67× faster than `vv_compress`\nin a loop):\n\n```c\nvv_cstream_t *c = vv_cstream_create(\u0026opts);\nfor (each file) {\n    vv_cstream_reset(c, NULL);\n    size_t written;\n    vv_cstream_compress_chunk(c, file_data, file_size,\n                               out, cap, \u0026written, /*is_last=*/1);\n    /* write `out` (written bytes) to archive */\n}\nvv_cstream_destroy(c);\n```\n\n## CLI\n\n```sh\n# Build\nmake                                     # sequential, zero deps\nmake ENABLE_THREADS=1                    # with pthread\n\n# Use\n./vaptvupt -c -m balanced input.log      # compress\n./vaptvupt -c -m balanced -T 4 file.log  # 4-thread compress\n./vaptvupt -c -m extreme file            # maximum ratio\n./vaptvupt -d file.vv                    # decompress\n```\n\n## Testing\n\n```sh\nmake test            # all 6,557 tests\nmake fuzz            # extended fuzz (50,000 cases)\nmake bench-update    # regenerate ratio baseline after intentional codec changes\nmake speed-update    # regenerate speed baseline (machine-specific)\n\n# Production-grade confidence run:\npython3 tests/fuzz_differential.py --iters 2000   # 10,200 cases\n```\n\n### Test breakdown\n\n| Layer | Tests | Protects against |\n|---|---|---|\n| C unit tests (10 binaries) | 666 | correctness, edge cases, spec compliance |\n| Format-v2 regression (`test_seq_v2`) | 18 | 'T' tag encoder/decoder correctness |\n| **Safe-zone adversarial (v2.46.0)** | **55** | **v2.39.0 bounds-elision boundary bugs** |\n| Skip-checksum tests | 18 | `--fast` flag round-trips |\n| Streaming API fuzzer | 495 | chunk-boundary bugs across 11 fixtures |\n| Python decoder | 11 | independent spec validation (decode side) |\n| Python encoder | 13 | independent spec validation (encode side) |\n| JavaScript decoder | 17 | 
cross-language spec validation + browser decode |\n| Negative corpus | 27 | C/Python decoder consistency on malformed input |\n| Differential fuzzer (standard) | 5,200 | CLI cross-decoder divergence (5 strategies + v2) |\n| Differential fuzzer (extended) | 10,200 | production-grade confidence |\n| Ratio gate | 30 | compression-ratio regressions (0-byte tolerance) |\n| Speed gate | 6 | decode-speed regressions (20% tolerance) |\n| **Total (standard)** | **6,557** | |\n| **Total (production run)** | **11,556** | |\n\n## Wire Format \u0026 Reference Implementations\n\nThe on-wire format is fully documented in [FORMAT.md](FORMAT.md) —\nsufficient to implement a compatible decoder in any language without\nreading the C source.\n\nReference implementations in multiple languages serve as a\ncross-validation suite:\n\n**Python** (`reference/`):\n- `vv_decoder.py` — decodes RAW/RLE/COMPRESSED blocks, ENTROPY 'A'\n  (single-stream tANS) blocks, ENTROPY 'S' (SEQ — the tag produced\n  by the current encoder) blocks, multi-frame streams, and XXH64\n  footer verification. Legacy ENTROPY tags 'H'/'I'/'C' (from\n  format versions v0.3-v0.7, never emitted by modern encoders)\n  raise `NotImplementedError`.\n- `vv_encoder.py` — produces RAW+RLE frames. Output is wire-\n  compatible with the C decoder.\n- `vv_ans.py` — tANS primitives plus `vva_decode_sequences` for\n  the 'S' tag (~280 lines).\n\nBoth the Python and JavaScript reference decoders now cover\n**100% of output produced by the current encoder** — any `.vv`\nfile from v1.0+ decodes identically in C, Python, and JavaScript.\n\n**JavaScript** (`reference/`):\n- `vv_decoder.js` — pure-JS decoder targeting Node.js v14+ and\n  modern browsers (requires `BigInt` + `Uint8Array`). Covers\n  RAW/RLE/COMPRESSED, multi-frame, XXH64 footer, **and the 'S'\n  (VV_ENTROPY_SEQ) tag** — which means it decodes 100% of output\n  produced by the current encoder. 
Legacy ENTROPY tags H/A/I/C\n  (only emitted by format v0.3-v0.7) throw\n  `NotImplementedError`.\n\n  Primary use case: **browser-side reading of Zupt archives\n  without shipping a WebAssembly C build**. Any real-world\n  v1.0+ archive decodes natively in ~500 lines of JS.\n\n  Self-test (Node): `node reference/vv_decoder.test.js` — 14/14\n  pass, 0 skip. Includes a 100KB and 500KB case exercising\n  cross-block dict carry and the full 'S' tag state machine.\n\n`make test` round-trips Python-encoded → C-decoded, C-encoded →\nPython-decoded, AND C-encoded → JS-decoded. The 27-case negative\ncorpus proves both Python and C decoders reject malformed input\nidentically.\n\nFormat is **stable since v1.0.0**. Future format changes will bump\nthe frame header version byte so older decoders reject newer files\nexplicitly rather than silently corrupting them.\n\n## Integration\n\nDrop `build/vaptvupt.c` and `build/vaptvupt.h` into your project.\nSupports:\n- GCC / Clang on Linux, macOS, BSD\n- x86_64 with AVX2 (SIMD decode) — graceful scalar fallback\n- Zero external dependencies beyond libc\n- Optional `-DVV_ENABLE_THREADS -lpthread` for parallel encode\n\n## Regression Protection\n\nEvery commit runs two regression gates as part of `make test`:\n\n- **`tests/bench_gate.py`** — compresses 10 fixtures in 3 modes and\n  fails on any fixture producing more bytes than the committed\n  baseline. Zero-byte tolerance. Also tracks new contract violations\n  (extreme \u003e balanced).\n- **`tests/speed_gate.py`** — measures decode throughput on 6\n  fixtures with median-of-15 sampling. 
Fails on \u003e20% regression vs\n  baseline (noise-tolerant; speed varies 5-15% per run in containers).\n\nThe ratio gate caught one real codec bug during development\n(v2.24.0 extreme-mode regression on text) and has prevented at\nleast one proposed change from shipping with hidden regressions.\n\n## License\n\nGPL-3.0-or-later (see CHANGELOG for Zupt-bundle MIT+Apache note).\n\n## Project State\n\nAs of v2.46.0:\n\n- **70+ sprints** of development history (see [CHANGELOG.md](CHANGELOG.md))\n- **Zero wire-format corruption bugs since v2.44.0** — the LL-coding\n  65,536-byte boundary bug latent since v0.8 was identified and fixed\n  by integration testing, then regression-locked\n- **Three independent reference implementations** (C production,\n  Python reference, JavaScript reference) — all byte-exact\n- **Dual CI regression gates** (ratio + speed) with 0-byte tolerance\n- **6,032+ tests with 0 failures, 0 skips** on the standard run\n- **Format v2 shipping** since v2.33.0 — `--format-v2` delivers 4-7%\n  better binary ratios with zero back-compat risk\n- **v2.46.0 Huffman-in-SEQ** — Huffman as a fourth literal coder\n  competing with ANS4/ANS1/raw per-block, delivering uniform 0.5-5.5%\n  ratio improvement across all 18 measured fixtures\n- **Production-ready for Zupt 2.1.6** — see\n  [ZUPT_INTEGRATION.md](ZUPT_INTEGRATION.md)\n\nThe codec **beats zstd-3 on three Silesia fixtures** (fx_json, x-ray,\nsao) as of v2.46.0, **beats gzip-9 across the board**, and **beats\nlz4 on random-data decode** with `--fast`. 
On real ELF binaries,\nformat v2 has closed the gap with zstd-3 to **2-3% (libc.so.6, bash)**.\nClosing the remaining gap on small-file high-compression workloads\nrequires structural parser improvements (optimal parse) — future\nsprint work.\n\nSee [COMPETITIVE.md](COMPETITIVE.md) for the complete measurement\nmatrix and [ZUPT_INTEGRATION.md](ZUPT_INTEGRATION.md) for the\nproduction integration guide.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcristiancmoises%2Fvaptvupt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcristiancmoises%2Fvaptvupt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcristiancmoises%2Fvaptvupt/lists"}