{"id":13836050,"url":"https://github.com/avaneev/lzav","last_synced_at":"2025-07-10T13:31:24.849Z","repository":{"id":182373415,"uuid":"668395975","full_name":"avaneev/lzav","owner":"avaneev","description":"Fast In-Memory Data Compression Algorithm (inline C/C++) 460+MB/s compress, 2500+MB/s decompress, ratio% better than LZ4, Snappy, and Zstd@-1","archived":false,"fork":false,"pushed_at":"2024-02-26T03:10:29.000Z","size":64,"stargazers_count":282,"open_issues_count":0,"forks_count":15,"subscribers_count":9,"default_branch":"main","last_synced_at":"2024-03-02T06:15:34.970Z","etag":null,"topics":["compress","compression","compression-algorithm","compression-library","compressor","data-compression","data-compression-algorithms","lossless-compression","lossless-compression-algorithm","lossless-data-compression","lz4","lz77","lzf","snappy"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/avaneev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-19T17:50:42.000Z","updated_at":"2024-05-30T00:44:06.707Z","dependencies_parsed_at":"2023-12-03T05:20:17.555Z","dependency_job_id":"348b6645-cff8-4d08-9547-44e25fb1b901","html_url":"https://github.com/avaneev/lzav","commit_stats":null,"previous_names":["avaneev/lzav"],"tags_count":17,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/avaneev%2Flzav","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/avaneev%2Flzav/tags","releases_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/repositories/avaneev%2Flzav/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/avaneev%2Flzav/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/avaneev","download_url":"https://codeload.github.com/avaneev/lzav/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225638946,"owners_count":17500654,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compress","compression","compression-algorithm","compression-library","compressor","data-compression","data-compression-algorithms","lossless-compression","lossless-compression-algorithm","lossless-data-compression","lz4","lz77","lzf","snappy"],"created_at":"2024-08-04T15:00:34.132Z","updated_at":"2025-07-10T13:31:24.844Z","avatar_url":"https://github.com/avaneev.png","language":"C","readme":"# LZAV - Fast Data Compression Algorithm (in C/C++)\r\n\r\n## Introduction\r\n\r\nLZAV is a fast general-purpose in-memory data compression algorithm based on\r\nthe now-classic [LZ77](https://wikipedia.org/wiki/LZ77_and_LZ78) lossless data\r\ncompression method. LZAV holds a good position on the Pareto landscape of\r\nfactors, among many similar in-memory (non-streaming) compression algorithms.\r\n\r\nThe LZAV algorithm's code is portable, cross-platform, scalar, header-only,\r\ninlineable C (C++ compatible). It supports big- and little-endian platforms,\r\nand any memory alignment model. The algorithm is efficient on both 32- and\r\n64-bit platforms. Incompressible data expands only marginally. 
It is compliant with\r\nWebAssembly (WASI libc), and runs at only about half the speed of native\r\ncode.\r\n\r\nLZAV does not sacrifice internal out-of-bounds (OOB) checks for decompression\r\nspeed. This means that LZAV can be used in strict conditions where OOB memory\r\nwrites (and especially reads) that lead to a trap are unacceptable (e.g.,\r\nreal-time, system, server software). LZAV can be used safely (causing neither\r\ncrashes nor undefined behavior) even when decompressing malformed or damaged\r\ncompressed data. This means that LZAV does not require calculation of a\r\nchecksum (or hash) of the compressed data. Only a checksum of the\r\nuncompressed data may be required, depending on the application's guarantees.\r\n\r\nThe internal functions available in the `lzav.h` file allow you to easily\r\nimplement, and experiment with, your own compression algorithms. The LZAV\r\nstream format and decompressor have a potential for high decompression speeds\r\nand compression ratios, which depend on the way the data is compressed.\r\n\r\n## Usage\r\n\r\nTo compress data:\r\n\r\n```c\r\n#include \"lzav.h\"\r\n\r\nint max_len = lzav_compress_bound( src_len );\r\nvoid* comp_buf = malloc( max_len );\r\nint comp_len = lzav_compress_default( src_buf, comp_buf, src_len, max_len );\r\n\r\nif( comp_len == 0 \u0026\u0026 src_len != 0 )\r\n{\r\n    // Error handling.\r\n}\r\n```\r\n\r\nTo decompress data:\r\n\r\n```c\r\n#include \"lzav.h\"\r\n\r\nvoid* decomp_buf = malloc( src_len );\r\nint l = lzav_decompress( comp_buf, decomp_buf, comp_len, src_len );\r\n\r\nif( l \u003c 0 )\r\n{\r\n    // Error handling.\r\n}\r\n```\r\n\r\nTo compress data with a higher ratio, for non-time-critical uses (e.g.,\r\ncompression of an application's static assets):\r\n\r\n```c\r\n#include \"lzav.h\"\r\n\r\nint max_len = lzav_compress_bound_hi( src_len ); // Note another bound function!\r\nvoid* comp_buf = malloc( max_len );\r\nint comp_len = lzav_compress_hi( src_buf, comp_buf, src_len, max_len );\r\n\r\nif( comp_len == 0 
\u0026\u0026 src_len != 0 )\r\n{\r\n    // Error handling.\r\n}\r\n```\r\n\r\nThe LZAV algorithm and its source code (which is\r\n[ISO C99](https://en.wikipedia.org/wiki/C99)) were quality-tested with the\r\nClang, GCC, MSVC, and Intel C++ compilers; on x86, x86-64 (Intel, AMD), and\r\nAArch64 (Apple Silicon) architectures; on Windows 10, AlmaLinux 9.3, and\r\nmacOS 15.3.1. Full C++ compliance is enabled conditionally and automatically\r\nwhen the source code is compiled with a C++ compiler.\r\n\r\n## Customizing the C++ namespace\r\n\r\nIf, for some reason, it is undesirable in a C++ environment to export LZAV\r\nsymbols into the global namespace, the `LZAV_NS_CUSTOM` macro can be defined\r\nexternally:\r\n\r\n```c++\r\n#define LZAV_NS_CUSTOM lzav\r\n#include \"lzav.h\"\r\n```\r\n\r\nSimilarly, LZAV symbols can be placed into any other custom namespace (e.g.,\r\na namespace with data compression functions):\r\n\r\n```c++\r\n#define LZAV_NS_CUSTOM my_namespace\r\n#include \"lzav.h\"\r\n```\r\n\r\nThis way, LZAV symbols and functions can be referenced like\r\n`my_namespace::lzav_compress_default(...)`. Note that since all LZAV functions\r\nhave a `static inline` specifier, there can be no ABI conflicts, even if the\r\nheader is included in unrelated, mixed C/C++, compilation units.\r\n\r\n## Comparisons\r\n\r\nThe tables below present ballpark performance numbers of the LZAV algorithm\r\n(based on the Silesia dataset).\r\n\r\nWhile LZ4 appears to compress faster there, LZAV provides 14.8% better\r\nmemory storage savings. This is a significant benefit in database and\r\nfile system use cases, since compression is only about 35% slower while CPUs\r\nrarely run at their maximum capacity anyway (considering cached data writes\r\nare deferred to background threads), and disk I/O times are reduced due to\r\nthe better compression. 
In general, LZAV holds a very strong position in this\r\nclass of data compression algorithms, if one considers all factors:\r\ncompression and decompression speeds, compression ratio, and, not less\r\nimportantly, code maintainability: LZAV is maximally portable and has a\r\nrather small independent codebase.\r\n\r\nThe performance of LZAV is not limited to the presented ballpark numbers.\r\nDepending on the data being compressed, LZAV can achieve 800 MB/s compression\r\nand 5000 MB/s decompression speeds. Incompressible data decompresses at a\r\n10000 MB/s rate, which is not far from plain `memcpy`. There are cases like\r\nthe [enwik9 dataset](https://mattmahoney.net/dc/textdata.html) where LZAV\r\nprovides 21.7% higher memory storage savings compared to LZ4. However, on\r\nsmall data (below 50 KB), the compression ratio difference between LZAV and\r\nLZ4 diminishes, and LZ4 may have some advantage (due to its smaller minimal\r\nback-reference length).\r\n\r\nThe LZAV algorithm's geomean performance on a variety of datasets is 550 +/-\r\n150 MB/s compression and 3800 +/- 1300 MB/s decompression speeds, on 4+ GHz\r\n64-bit processors released since 2019. 
Note that the algorithm exhibits adaptive\r\nqualities, and its actual performance depends on the data being compressed.\r\nLZAV may show exceptional performance on your specific data, including, but\r\nnot limited to: sparse databases, log files, HTML/XML files.\r\n\r\nIt is also worth noting that compression methods like LZAV and LZ4 usually\r\nhave an advantage over dictionary- and entropy-based coding in that\r\nhash-table-based compression has small operational and memory overhead, while\r\nthe classic LZ77 decompression has no overhead at all - this is especially\r\nrelevant for smaller data.\r\n\r\nFor a more comprehensive benchmark of in-memory compression algorithms, you\r\nmay visit [lzbench](https://github.com/inikep/lzbench).\r\n\r\n### Apple clang 15.0.0 arm64, macOS 15.3.1, Apple M1, 3.5 GHz\r\n\r\nSilesia compression corpus\r\n\r\n|Compressor      |Compression    |Decompression  |Ratio %        |\r\n|----            |----           |----           |----           |\r\n|**LZAV 4.22**   |618 MB/s       |3820 MB/s      |40.57          |\r\n|LZ4 1.9.4       |700 MB/s       |4570 MB/s      |47.60          |\r\n|Snappy 1.1.10   |495 MB/s       |3230 MB/s      |48.22          |\r\n|LZF 3.6         |395 MB/s       |800 MB/s       |48.15          |\r\n|**LZAV 4.22 HI**|133 MB/s       |3830 MB/s      |35.30          |\r\n|LZ4HC 1.9.4 -9  |40 MB/s        |4360 MB/s      |36.75          |\r\n\r\n### LLVM clang 18.1.8 x86-64, AlmaLinux 9.3, Xeon E-2386G (RocketLake), 5.1 GHz\r\n\r\nSilesia compression corpus\r\n\r\n|Compressor      |Compression    |Decompression  |Ratio %        |\r\n|----            |----           |----           |----           |\r\n|**LZAV 4.22**   |600 MB/s       |3550 MB/s      |40.57          |\r\n|LZ4 1.9.4       |848 MB/s       |4980 MB/s      |47.60          |\r\n|Snappy 1.1.10   |690 MB/s       |3360 MB/s      |48.22          |\r\n|LZF 3.6         |455 MB/s       |1000 MB/s      |48.15          |\r\n|**LZAV 4.22 HI**|117 MB/s       |3530 
MB/s      |35.30          |\r\n|LZ4HC 1.9.4 -9  |43 MB/s        |4920 MB/s      |36.75          |\r\n\r\n### LLVM clang-cl 18.1.8 x86-64, Windows 10, Ryzen 3700X (Zen2), 4.2 GHz\r\n\r\nSilesia compression corpus\r\n\r\n|Compressor      |Compression    |Decompression  |Ratio %        |\r\n|----            |----           |----           |----           |\r\n|**LZAV 4.22**   |520 MB/s       |3060 MB/s      |40.57          |\r\n|LZ4 1.9.4       |675 MB/s       |4560 MB/s      |47.60          |\r\n|Snappy 1.1.10   |415 MB/s       |2440 MB/s      |48.22          |\r\n|LZF 3.6         |310 MB/s       |700 MB/s       |48.15          |\r\n|**LZAV 4.22 HI**|116 MB/s       |3090 MB/s      |35.30          |\r\n|LZ4HC 1.9.4 -9  |36 MB/s        |4430 MB/s      |36.75          |\r\n\r\nP.S. The popular Zstd's benchmark was not included here, because Zstd is not\r\na pure LZ77, is much harder to integrate, and has a much larger code size - a\r\ndifferent league, close to zlib. Here are the author's Zstd measurements with\r\n[TurboBench](https://github.com/powturbo/TurboBench/releases), on a Ryzen\r\n3700X, on the Silesia dataset:\r\n\r\n|Compressor      |Compression    |Decompression  |Ratio %        |\r\n|----            |----           |----           |----           |\r\n|zstd 1.5.5 -1   |460 MB/s       |1870 MB/s      |41.0           |\r\n|zstd 1.5.5 1    |436 MB/s       |1400 MB/s      |34.6           |\r\n\r\n## Notes\r\n\r\n1. The LZAV API is equivalent to neither the LZ4 nor the Snappy API. For\r\nexample, the \"dstl\" parameter in the decompressor should specify the original\r\nuncompressed length, which should have been previously stored in some way\r\nindependent of LZAV.\r\n\r\n2. From a technical point of view, peak decompression speeds of LZAV have an\r\nimplicit limitation arising from its more complex stream format, compared to\r\nLZ4: LZAV decompression requires more code branching. Another limiting factor\r\nis a rather big 8 MiB LZ77 window, which is not CPU cache-friendly. 
On the\r\nother hand, without these features it would not be possible to achieve\r\ncompetitive compression ratios while having fast compression speeds.\r\n\r\n3. LZAV supports compression of continuous data blocks of up to 2 GB. Larger\r\ndata should be compressed in chunks of at least 32 MB. Using smaller chunks\r\nmay reduce the achieved compression ratio.\r\n\r\n## Thanks\r\n\r\n* [Paul Dreik](https://github.com/pauldreik), for finding memcpy UB in the\r\ndecompressor.\r\n","funding_links":[],"categories":["C","Compression"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Favaneev%2Flzav","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Favaneev%2Flzav","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Favaneev%2Flzav/lists"}