{"id":18507878,"url":"https://github.com/espadrine/compressor-benchmark","last_synced_at":"2026-01-24T14:45:16.691Z","repository":{"id":136937618,"uuid":"211620409","full_name":"espadrine/compressor-benchmark","owner":"espadrine","description":"Look into the range of real-world circumstances where various compression programs win.","archived":false,"fork":false,"pushed_at":"2019-09-30T21:34:44.000Z","size":28,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-05-14T09:17:38.901Z","etag":null,"topics":["benchmark","brotli","bzip2","compression","gzip","loading","lz4","lzip","pareto","sending","xz","zstandard"],"latest_commit_sha":null,"homepage":null,"language":"Makefile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/espadrine.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-09-29T07:13:16.000Z","updated_at":"2019-09-30T21:34:46.000Z","dependencies_parsed_at":null,"dependency_job_id":"6a92bf25-ca8e-4f00-9653-21737fad2a9c","html_url":"https://github.com/espadrine/compressor-benchmark","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/espadrine/compressor-benchmark","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/espadrine%2Fcompressor-benchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/espadrine%2Fcompressor-benchmark/tags",
"releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/espadrine%2Fcompressor-benchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/espadrine%2Fcompressor-benchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/espadrine","download_url":"https://codeload.github.com/espadrine/compressor-benchmark/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/espadrine%2Fcompressor-benchmark/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28730186,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-24T10:24:43.181Z","status":"ssl_error","status_checked_at":"2026-01-24T10:24:36.112Z","response_time":89,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","brotli","bzip2","compression","gzip","loading","lz4","lzip","pareto","sending","xz","zstandard"],"created_at":"2024-11-06T15:12:42.937Z","updated_at":"2026-01-24T14:45:16.683Z","avatar_url":"https://github.com/espadrine.png","language":"Makefile","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Compressor benchmark\n\n[Many][zstd benchmark] [benchmarks][mattmahoney] [focus][Lzip benchmark] on\nabstract measurements, analyzing the relationship between compression speed\nand ratio without giving any intuition about their real-world benefits.\n\n(Notable exceptions are the [LZ4 benchmark][LZ4] and [Squash][], but they are\neither not as 
extensive or not as focused as this article.)\n\nThis benchmark aims to bridge that gap, while comparing the most widely deployed\ncompression schemes in the world.\n\n## Aims\n\nThe motivation for compressing data mostly falls into three categories:\n\n### 1. Size\n\nYou wish to reduce *disk, memory or network data use*. For instance, you run a\npackage distribution service, or you store lots of logs or files. What you want\nto maximize then is the **compression ratio**: how many megabytes of content a\nmegabyte of compressed data holds once decompressed. For instance, a ratio of 3\nmeans that a 1 MB compressed file on disk stores 3 MB of content, effectively\ntripling the available disk space.\n\n![Compression chart](./plots/compression.svg)\n\nThe winner there is **Lzip**, closely followed by **XZ**.\n\n(Lzip and XZ use the same compression algorithm, LZMA, which relies on range\nencoding, a derivative of arithmetic coding: a compute-intensive method with\nvery high compression power.)\n\nOf course, storage costs may need to be balanced against the CPU cost of running\nthe compression. The faster the compression, the lower the CPU cost.\n\n![Compression speed chart](./plots/compression-speed.svg)\n\nIf the constraint on both parameters is a linear combination of them, it can be\ndrawn as a line on the plot that “falls” from the top right. If disk space\nmatters more, the line is closer to horizontal; if CPU matters more, it is\ncloser to vertical. (That said, [storage costs usually dominate][jdlm].)\n\nA rising line would mean that one of the two quantities is unwanted, so all\npositive slopes can be discarded. Imagine that line starting horizontal at the\ntop and slowly tilting downward: it comes into contact with various compressors\nin turn, until it becomes vertical.\n\nThe compressors it touches form a convex hull around the others; they are the\nbest choices for every possible weighting of storage against CPU cost.\n\n### 2. Loading\n\nYou want *fast downloads*. 
For instance, you serve assets for a website.\nFor compression to pay off there, the files must have been compressed once,\nahead of time, and stored on disk. The time needed to load a file from the\nserver into your client is then the sum of two parts: first the compressed file\nis transferred, then the client extracts it.\n\nThe hope is that the time saved by transferring fewer bytes more than makes up\nfor the time spent extracting them.\n\n         Transfer 1 MB of raw data at 1 MB/s\n    ┌───────────────────────────────────────────┐\n    └───────────────────────────────────────────┘\n    └──────────────── 1 second ─────────────────┘\n\n      Transfer (1÷ratio) MB   Extract 1 MB\n    ┌───────────────────────┬──────────────┐\n    └───────────────────────┴──────────────┘\n    └──────── 203 ms ───────┴──── 3 ms ────┘\n\nUsually, for a given algorithm, the **decompression speed** (the number of\nmegabytes decompressed per second) is roughly the same regardless of the\ncompression level (i.e., how aggressively the compressor tried to shrink the\nfile). So the extraction time depends only on the size of the raw data.\n\nTherefore, a higher compression level, and thus a higher ratio, always yields a\nfaster download: it cuts the transfer time without lengthening the extraction.\nSo each program only needs to be compared at the flags that yield its highest\nratio.\n\n![Loading chart](./plots/loading.svg)\n\nThe winners in this field seem to be Zstandard and Brotli, with XZ and Lzip not\nfar behind (in that order).\n\n(At least at 1 MB/s. As network bandwidth approaches the decompression speed,\ndecompression becomes the bottleneck. Past a certain point, loading compressed\ndata becomes slower than sending it without compression, and only compressors\nwith much higher decompression speeds, like Zstandard and LZ4, can compete.)\n\n
### 3. Sending\n\nIf you want the *fastest transfer time* and don't need to keep the compressed\nversion, you may still gain a little transfer speed by compressing the file\nfirst, as long as the network bandwidth is not too high.\n\nTo compute this, we add the time it takes to compress the file to the loading\ntime computed previously.\n\nAt 1 MB/s, whenever the sending time (compression + transfer + extraction) per\nmegabyte exceeds 1 second, we are better off sending the raw file.\n\nUnlike the decompression speed, the **compression speed** drops linearly as the\ncompression level rises. As a result, for any given compressor, the sending time\nforms a U-shaped curve over compression levels: low levels send too many bytes,\nhigh levels spend too long compressing.\n\n![Sending chart](./plots/sending.svg)\n\nThe winner seems to be Zstandard -6, with Brotli -5 not far behind, twenty\nmilliseconds later. Gzip -5 follows another 20 milliseconds later, closely\nfollowed by bzip2. XZ and Lzip are several tens of milliseconds slower than the\nrest, even at their lowest compression levels.\n\nThe big surprise is bzip2. With its consistently high compression speed, it\nremains competitive throughout its levels. 
If you are ready to give up\na tiny bit of transfer time to save space, each megabyte sent can carry one\nmore megabyte of data.\n\n## Compression utilities\n\nI chose tools that ❶ were readily available, ❷ were [popular][Google Trends],\nand ❸ covered various segments of the Pareto frontier.\n\n\u003ctable\u003e\n  \u003ctr\u003e\u003cth\u003e Name \u003cth\u003e Algorithm                   \u003cth\u003e Also used in\n  \u003ctr\u003e\u003ctd\u003e\u003ca href='https://www.gzip.org/'\u003egzip\u003c/a\u003e\n                \u003ctd\u003e DEFLATE (LZSS, Huffman)     \u003ctd\u003e ZIP, PNG\n  \u003ctr\u003e\u003ctd\u003e\u003ca href='http://sourceware.org/bzip2/'\u003ebzip2\u003c/a\u003e\n                \u003ctd\u003e BWT, MTF, RLE, Huffman      \u003ctd\u003e\n  \u003ctr\u003e\u003ctd\u003e\u003ca href='https://tukaani.org/xz/'\u003eXZ\u003c/a\u003e\n                \u003ctd\u003e LZMA (LZ77, range encoding) \u003ctd\u003e 7-Zip\n  \u003ctr\u003e\u003ctd\u003e\u003ca href='https://www.nongnu.org/lzip/lzip.html'\u003eLzip\u003c/a\u003e\n                \u003ctd\u003e LZMA (LZ77, range encoding) \u003ctd\u003e 7-Zip\n  \u003ctr\u003e\u003ctd\u003e\u003ca href='https://github.com/google/brotli/blob/master/README.md'\u003eBrotli\u003c/a\u003e\n                \u003ctd\u003e LZ77, Huffman, context modeling \u003ctd\u003e WOFF2\n  \u003ctr\u003e\u003ctd\u003e\u003ca href='https://lz4.github.io/lz4/'\u003eLZ4\u003c/a\u003e\n                \u003ctd\u003e LZ77                        \u003ctd\u003e\n  \u003ctr\u003e\u003ctd\u003e\u003ca href='https://facebook.github.io/zstd/'\u003eZstandard\u003c/a\u003e\n                \u003ctd\u003e LZ77, tANS, Huffman         \u003ctd\u003e\n\u003c/table\u003e\n\n## Caveats\n\n- No algorithm can compress every input to a smaller size. 
In particular,\n  already-compressed inputs (such as JPEG or PNG images) may actually compress to\n  a bigger file.\n- Certain implementations use CPU-dependent optimizations, so their performance\n  may differ on other hardware.\n\n[zstd benchmark]: https://raw.githubusercontent.com/facebook/zstd/master/doc/images/DCspeed5.png\n[mattmahoney]: http://mattmahoney.net/dc/text.html\n[Lzip benchmark]: https://www.nongnu.org/lzip/lzip_benchmark.html\n[Squash]: https://quixdb.github.io/squash-benchmark/\n[jdlm]: https://jdlm.info/articles/2017/05/01/compression-pareto-docker-gnuplot.html\n[Google Trends]: https://trends.google.com/trends/explore?cat=32\u0026date=today%205-y\u0026q=%2Fm%2F03bzt,%2Fm%2F0hjcb,%2Fm%2F063ynsr,%2Fg%2F11c1p5xyz2,%2Fm%2F011v70tt\n[gzip]: https://www.gzip.org/\n[bzip2]: http://sourceware.org/bzip2/\n[XZ]: https://tukaani.org/xz/\n[Lzip]: https://www.nongnu.org/lzip/lzip.html\n[Brotli]: https://github.com/google/brotli/blob/master/README.md\n[LZ4]: https://lz4.github.io/lz4/\n[Zstandard]: https://facebook.github.io/zstd/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fespadrine%2Fcompressor-benchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fespadrine%2Fcompressor-benchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fespadrine%2Fcompressor-benchmark/lists"}