{"id":35771466,"url":"https://github.com/pmarreck/par2z","last_synced_at":"2026-01-07T04:19:06.168Z","repository":{"id":331158057,"uuid":"1125371729","full_name":"pmarreck/par2z","owner":"pmarreck","description":"A cleanroom reimplementation of the par2 error-correction algorithm, library and cli in Zig, based only on publicly-available specs (NOT its source code), and with a more permissive license, a C ABI wrapper, and a LuaJIT FFI example. Full test coverage. NOTE: gpt-5.2-codex and Claude Opus 4.5 AI assisted in parts.","archived":false,"fork":false,"pushed_at":"2026-01-03T20:07:12.000Z","size":401,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"yolo","last_synced_at":"2026-01-03T20:29:13.720Z","etag":null,"topics":["claude-code","codex-cli","error-correcting-code","library","par2"],"latest_commit_sha":null,"homepage":"","language":"Zig","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pmarreck.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2025-12-30T15:57:59.000Z","updated_at":"2026-01-03T20:07:15.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/pmarreck/par2z","commit_stats":null,"previous_names":["pmarreck/par2z"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/pmarreck/par2z","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pmarreck%2Fpar2z","tags_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pmarreck%2Fpar2z/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pmarreck%2Fpar2z/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pmarreck%2Fpar2z/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pmarreck","download_url":"https://codeload.github.com/pmarreck/par2z/tar.gz/refs/heads/yolo","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pmarreck%2Fpar2z/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28232396,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2026-01-07T02:00:05.975Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["claude-code","codex-cli","error-correcting-code","library","par2"],"created_at":"2026-01-07T04:19:05.156Z","updated_at":"2026-01-07T04:19:06.162Z","avatar_url":"https://github.com/pmarreck.png","language":"Zig","readme":"# par2z\n\n## Overview\nCleanroom PAR2 implementation with a Zig core, C ABI for FFI (Swift/LuaJIT), and a standalone CLI.\n\n## API Layers\n- High-level recovery/verification API in `src/core/api.zig`.\n- Block-level API in `src/core/block_api.zig` for slice-by-slice workflows and custom storage backends.\n- Storage adapters in `src/core/storage.zig` (memory-backed and file-backed) so 
recovery can run without loading whole files up front.\n- CLI operations moved into `src/ops.zig` (callable from Zig and suitable for C/Swift wrappers). `src/cli.zig` is now a thin CLI parser + I/O shim.\n\n## C API Examples\nThe C ABI is declared in `include/par2.h`. Memory and stream inputs do not touch disk.\n\nThread pool configuration (optional): the library uses a global thread pool by default. You can configure the global pool size or supply your own pool handle via the C ABI.\nHandles are independent and safe to run concurrently. The only shared global state is the thread pool configuration, so set or swap pools before starting work and avoid changing it while operations are active.\n\nCreate from memory (no temp files), write `.par2` to a path:\n```c\n#include \"par2.h\"\n\nconst uint8_t data[] = {0,1,2,3,4,5,6,7};\nPar2CreateHandle *create = NULL;\npar2_create_new(NULL, \u0026create);\npar2_create_add_memory(create, \"data.bin\", data, sizeof(data));\npar2_create_set_output_path(create, \"set.par2\");\npar2_create_run(create);\npar2_create_destroy(create);\n\n// optional: configure global pool size (0 = default)\npar2_thread_pool_configure(0);\n```\n\nVerify from in-memory PAR2 bytes and a stream input:\n```c\n#include \u003cstring.h\u003e\n#include \"par2.h\"\n\nstruct MemCtx { const uint8_t *data; size_t len; };\nstatic size_t read_at(void *ctx, uint64_t off, uint8_t *out, size_t len) {\n\tstruct MemCtx *m = (struct MemCtx *)ctx;\n\tif (off \u003e= m-\u003elen) return 0;\n\tsize_t avail = m-\u003elen - (size_t)off;\n\tsize_t n = (avail \u003c len) ? 
avail : len;\n\tmemcpy(out, m-\u003edata + off, n);\n\treturn n;\n}\n\nPar2VerifyHandle *verify = NULL;\npar2_verify_new(NULL, \u0026verify);\npar2_verify_add_par2_data(verify, par2_bytes, par2_len, \"set.par2\"); // call multiple times for volumes\npar2_verify_add_stream(verify, \"data.bin\", data_len, read_at, \u0026mem_ctx);\npar2_verify_run(verify);\npar2_verify_destroy(verify);\n```\n\nRecover with output callback (no disk output):\n```c\n#include \"par2.h\"\n\nstatic size_t write_out(void *ctx, const uint8_t *data, size_t len) {\n\t(void)ctx;\n\t/* append to a buffer */\n\treturn len;\n}\n\nstatic Par2Error open_out(void *ctx, const char *path, Par2Output *out) {\n\t(void)path;\n\tout-\u003ectx = ctx;\n\tout-\u003ewrite = write_out;\n\tout-\u003eclose = NULL;\n\treturn PAR2_OK;\n}\n\nPar2RecoverHandle *recover = NULL;\npar2_recover_new(NULL, \u0026recover);\npar2_recover_set_par2_path(recover, \"set.par2\");\npar2_recover_add_path(recover, \"data.bin\");\npar2_recover_set_output_open(recover, open_out, NULL);\npar2_recover_run(recover);\npar2_recover_destroy(recover);\n\n// optional: caller-owned pool\nPar2ThreadPool *pool = NULL;\npar2_thread_pool_create(4, \u0026pool);\npar2_thread_pool_set_global(pool);\npar2_thread_pool_set_global(NULL);\npar2_thread_pool_destroy(pool);\n```\n\n### Swift (FFI)\nMinimal Swift usage with `dlopen` (or link against a built dylib):\n```swift\nimport Foundation\n\ntypealias Par2CreateHandle = OpaquePointer\ntypealias Par2Error = Int32\n\n@_silgen_name(\"par2_create_new\") func par2_create_new(_ opts: UnsafeRawPointer?, _ out: UnsafeMutablePointer\u003cPar2CreateHandle?\u003e) -\u003e Par2Error\n@_silgen_name(\"par2_create_add_memory\") func par2_create_add_memory(_ h: Par2CreateHandle?, _ name: UnsafePointer\u003cCChar\u003e, _ data: UnsafePointer\u003cUInt8\u003e, _ len: Int) -\u003e Par2Error\n@_silgen_name(\"par2_create_set_output_path\") func par2_create_set_output_path(_ h: Par2CreateHandle?, _ path: 
UnsafePointer\u003cCChar\u003e) -\u003e Par2Error\n@_silgen_name(\"par2_create_run\") func par2_create_run(_ h: Par2CreateHandle?) -\u003e Par2Error\n@_silgen_name(\"par2_create_destroy\") func par2_create_destroy(_ h: Par2CreateHandle?)\n\nlet payload: [UInt8] = [0,1,2,3,4,5,6,7]\nvar handle: Par2CreateHandle?\n_ = par2_create_new(nil, \u0026handle)\npayload.withUnsafeBytes { buf in\n\t_ = par2_create_add_memory(handle, \"data.bin\", buf.bindMemory(to: UInt8.self).baseAddress!, buf.count)\n}\n_ = par2_create_set_output_path(handle, \"set.par2\")\n_ = par2_create_run(handle)\npar2_create_destroy(handle)\n```\n\n### LuaJIT (FFI)\n```lua\nlocal ffi = require(\"ffi\")\nffi.cdef[[\ntypedef struct Par2CreateHandle Par2CreateHandle;\ntypedef int Par2Error;\nPar2Error par2_create_new(const void *opts, Par2CreateHandle **out_handle);\nPar2Error par2_create_add_memory(Par2CreateHandle *h, const char *name, const uint8_t *data, size_t len);\nPar2Error par2_create_set_output_path(Par2CreateHandle *h, const char *par2_path);\nPar2Error par2_create_run(Par2CreateHandle *h);\nvoid par2_create_destroy(Par2CreateHandle *h);\n]]\n\nlocal lib = ffi.load(\"par2\") -- or full path to libpar2.dylib/.so\nlocal data = ffi.new(\"uint8_t[8]\", {0,1,2,3,4,5,6,7})\nlocal handle = ffi.new(\"Par2CreateHandle*[1]\")\nlib.par2_create_new(nil, handle)\nlib.par2_create_add_memory(handle[0], \"data.bin\", data, 8)\nlib.par2_create_set_output_path(handle[0], \"set.par2\")\nlib.par2_create_run(handle[0])\nlib.par2_create_destroy(handle[0])\n```\n\n## CLI\n- Verify: `par2z-cli verify [options] \u003cpar2 file\u003e [data files...]`\n- Recover: `par2z-cli recover [options] \u003cpar2 file\u003e [data files...]`\n- Recover to stdout: `par2z-cli recover --stdout [options] \u003cpar2 file\u003e [data files...]`\n- Create: `par2z-cli create [options] \u003cpar2 file\u003e \u003cdata files...\u003e`\n- LuaJIT adapter CLI (FFI): `par2z-cli-luajit` (installed to `zig-out/bin/par2z-cli-luajit` by `zig 
build`)\n\nBehavior notes:\n- `verify`/`recover` match inputs by exact path when possible, then by basename. Ambiguous basenames cause an error unless exact paths are used.\n- Defaults: redundancy 5%, block size via file-size heuristic (bitrot_guard).\n- Use `--mute-defaults` or set `PAR2_MUTE_DEFAULTS` (non-empty, not `0`/`false`) to suppress default reporting and derived plan on stderr.\n- Set `STDOUT_TO_STDERR` (non-empty, not `0`/`false`) to redirect informational stdout messages to stderr (does not affect `--stdout` file data).\n- `--tar` on `create` or `recover` emits a tar stream on stdout (main+volumes or recovered files).\n- Binary output note: for `--stdout`/`--tar`, avoid capturing stdout into shell variables unless you use a binary-safe wrapper (e.g., `capture -p`).\n- `--include-input-slices` emits `FileSlic` packets (large size increase).\n- `--emit-packed` emits `PkdMain` and `PkdRecvS` packets.\n- RFSC packets are emitted by default when recovery volumes exceed 16 KiB; use `--no-rfsc` to skip.\n- Volume files duplicate `Main`, `FileDesc`, `IFSC`, and `Creator` by default for compatibility; use `--no-volume-meta` to omit.\n- Unicode filename packets are emitted when non-ASCII file names are present.\n- Unicode comment packets are emitted when transliteration is possible; otherwise Unicode-only.\n\nVerify/Recover options:\n- `-B \u003cpath\u003e`: basepath used to resolve relative `FileDesc` names.\n- `-m \u003cMB\u003e`: memory cap (fail if estimated or actual usage exceeds).\n- `-v/-q`: verbosity control (`-q -q` is silent).\n- `-o, --out-dir \u003cdir\u003e`: output directory for recovered files.\n- `--stdout`: recover to stdout (requires exactly one missing file).\n- `--allow-unsafe-paths`: allow absolute/`..` paths from `FileDesc` (unsafe).\n\nCreate options:\n- `-s \u003cbytes\u003e` / `--block-size \u003cbytes\u003e`: block size (mutually exclusive with `-b`).\n- `-b \u003ccount\u003e` / `--block-count \u003ccount\u003e`: block count (mutually 
exclusive with `-s`).\n- `-r \u003cpercent\u003e` / `--redundancy-percent \u003cpercent\u003e`: redundancy percent (mutually exclusive with `-c`).\n- `-c \u003ccount\u003e` / `--recovery-blocks \u003ccount\u003e`: recovery blocks (mutually exclusive with `-r`).\n- `-f \u003cindex\u003e`: first recovery block number (offsets volume indices).\n- `-u`: uniform recovery file sizes.\n- `-l`: limit recovery file sizes (based on largest input file).\n- `-n \u003ccount\u003e`: number of recovery files (max 31; incompatible with `-l`).\n- `-R`: recurse into subdirectories for input paths.\n\nFull-file hash verification:\n- `verify` falls back to full-file MD5 when IFSC packets are missing.\n- `recover` always validates the full-file MD5 after reconstruction.\n\n## Testing\n- Unit tests: `nix develop -c ./test`\n- If `zig build test` hangs on C-API tests (Zig `--listen` runner issue), use: `nix develop -c zig build test-direct`\n- Integration recovery test (par2 cross-check): `nix develop -c ./test-integration`\n- Optional stress tests:\n  - `PAR2_STRESS=1` enables stress-only unit tests.\n  - `PAR2_STRESS_SIZE=\u003cbytes\u003e` sets large-file size for the stress test (default 134217728).\n  - Example: `PAR2_STRESS=1 PAR2_STRESS_SIZE=268435456 nix develop -c ./test`\n- Memory usage (RSS) logging: `./memtest`\n  - Uses `/usr/bin/time -l` on macOS or `/usr/bin/time -v` on Linux.\n  - Logs max RSS in bytes to `mem-results.tsv` by default.\n  - `PAR2_MEM_SIZE`, `PAR2_MEM_BLOCK_SIZE`, `PAR2_MEM_REDUNDANCY`, `PAR2_MEM_ITERS`, `PAR2_MEM_SEED`, `PAR2_MEM_SEQ` are supported.\n\n## Benchmarks\n\nRecent results (16 MiB file, 4 KiB blocks, 10% redundancy, Apple M-series):\n\n| Tool | Create | Verify | Repair |\n|------|--------|--------|--------|\n| par2cmdline 0.8.1 | 12.2 MiB/s | 168.4 MiB/s | 85.6 MiB/s |\n| par2cmdline-turbo 1.3.0 | 172.0 MiB/s | 363.6 MiB/s | 111.9 MiB/s |\n| par2z-cli | 10.4 MiB/s | 166.7 MiB/s | 56.9 MiB/s |\n\nSee `bench-results.tsv` for the full benchmark log 
(last updated 2025-12-31T18:50:43Z).\npar2z performs comparably to the original par2cmdline. par2cmdline-turbo is significantly faster, likely due to hand-optimized SIMD assembly for GF(2^16) multiplication (we have not examined its source code to maintain cleanroom status). See `TODO.md` for optimization opportunities.\n\nRun `bench` or `bench-all` to compare implementations:\n\nEnv vars:\n- `PAR2_CLI_BIN` path to our CLI (default `zig-out/bin/par2z-cli`)\n- `PAR2_OTHER_BIN` path to other PAR2 CLI (default `par2`)\n- `PAR2_BENCH_SIZE` bytes (default 67108864)\n- `PAR2_BENCH_BLOCK_SIZE` bytes (default 4096)\n- `PAR2_BENCH_REDUNDANCY` percent (default 10)\n- `PAR2_BENCH_ITERS` iterations (default 3)\n- `PAR2_BENCH_CORRUPT_BYTES` bytes to corrupt before repair (default 4096)\n- `PAR2_PRNG_GEN` path to deterministic generator (default `zig-out/bin/prng-gen`)\n- `PAR2_BENCH_SEED` seed for deterministic data (default 1)\n- `PAR2_BENCH_SEQ` stream selector for deterministic data (default 1)\n- `PAR2_BENCH_OPTIMIZE` Zig optimize mode (default ReleaseFast)\n- `PAR2_BENCH_BUILD` rebuild par2z-cli before running (default 1; set 0 to skip)\n- `PAR2_BENCH_LOG` path to bench log (default `bench-results.tsv`)\n\nExample:\n```\nPAR2_BENCH_SIZE=134217728 PAR2_BENCH_ITERS=1 ./bench\n```\n\nSweep example:\n```\nPAR2_BENCH_SIZE=16777216 PAR2_BENCH_ITERS=3 ./bench\nPAR2_BENCH_SIZE=67108864 PAR2_BENCH_ITERS=3 ./bench\nPAR2_BENCH_SIZE=268435456 PAR2_BENCH_BLOCK_SIZE=16384 PAR2_BENCH_ITERS=3 ./bench\n```\n\n## Long-Term Data Integrity (Research Notes)\n\nThese notes summarize published guidance and field studies that shape how much redundancy is needed for long-term storage. 
They are design constraints, not guarantees.\n\nSSD unpowered retention (JEDEC context):\n- Enterprise-class SSDs are typically required to retain data for at least 3 months at 40C when fully worn (JEDEC JESD218/JESD219 context).\n- Client-class SSDs are typically required to retain data for at least 1 year at 30C when fully worn (JEDEC JESD218/JESD219 context).\n- Retention degrades with higher temperature and higher wear; vendors recommend periodic power-on refresh or full read to refresh NAND charge.\n\nHDD latent sector errors:\n- Large field studies show latent sector errors are not independent and exhibit spatial/temporal locality; scrubbing helps catch these before they stack up.\n\nDesign implications for redundancy:\n- Parity budgets (1-5%) are most effective when paired with periodic scrubbing.\n- For SSDs stored unpowered beyond JEDEC retention windows, parity alone is not sufficient; require periodic refresh or additional independent copies.\n\nSources:\n- Dell SSD/NVMe data retention guidance (JEDEC references and power-off recommendations): https://www.dell.com/support/kbdoc/en-mv/000198930/ssd-data-retention-considerations-when-powering-off-systems-for-a-prolonged-duration\n- NetApp latent sector error study (1.53M disks over 32 months): https://www.netapp.com/atg/publications/publications-an-analysis-of-latent-sector-errors-in-disk-drives-20074817/\n- Curtiss-Wright summary of JEDEC client retention requirements and temperature effects: https://defense-solutions.curtisswright.com/media-center/blog/extended-temperatures-flash-memory\n- Example enterprise SSD spec listing 3 months power-off retention at 40C (JESD218): https://www.digikey.com/en/htmldatasheets/production/2042810/0/0/1/intel-ssd-dc-s3520-series-for-150gb.html\n\n## Archival Use Guidance (Non-Normative)\n\nThis project targets long-term archival workflows that want strong integrity without full data duplication. 
It is designed to add a parity layer on top of existing storage, not to replace independent backups.\n\nPractical expectations:\n- 1-5% parity can address many bit-rot and small loss events when paired with periodic scrubbing.\n- Parity cannot guarantee recovery after catastrophic device failure or long unpowered retention beyond vendor guidance.\n- For higher confidence, use parity plus at least one independent copy stored on a separate device or location.\n\n## License\nApache-2.0. See `LICENSE`.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpmarreck%2Fpar2z","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpmarreck%2Fpar2z","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpmarreck%2Fpar2z/lists"}