{"id":23369771,"url":"https://github.com/fast-pack/streamvbyte","last_synced_at":"2025-04-08T09:06:54.838Z","repository":{"id":48917339,"uuid":"56248127","full_name":"fast-pack/streamvbyte","owner":"fast-pack","description":"Fast integer compression in C using the StreamVByte codec","archived":false,"fork":false,"pushed_at":"2025-02-09T02:04:18.000Z","size":272,"stargazers_count":393,"open_issues_count":11,"forks_count":39,"subscribers_count":23,"default_branch":"master","last_synced_at":"2025-04-01T07:48:12.033Z","etag":null,"topics":["arm","compression","integer-compression","neon","simd","ssse3","x64"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fast-pack.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-04-14T15:19:32.000Z","updated_at":"2025-03-24T17:31:36.000Z","dependencies_parsed_at":"2024-01-02T16:26:08.071Z","dependency_job_id":"f271a6b2-ad3f-4f07-aad9-84dcef67bc13","html_url":"https://github.com/fast-pack/streamvbyte","commit_stats":null,"previous_names":["fast-pack/streamvbyte"],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fast-pack%2Fstreamvbyte","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fast-pack%2Fstreamvbyte/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fast-pack%2Fstreamvbyte/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fast-pack%2Fstreamvbyte/manifes
ts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fast-pack","download_url":"https://codeload.github.com/fast-pack/streamvbyte/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247809964,"owners_count":20999816,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arm","compression","integer-compression","neon","simd","ssse3","x64"],"created_at":"2024-12-21T15:06:00.501Z","updated_at":"2025-04-08T09:06:54.815Z","avatar_url":"https://github.com/fast-pack.png","language":"C","readme":"streamvbyte\n===========\n[![Ubuntu 22.04 CI (GCC 9, 10, 11 and 12, LLVM 12, 13, 14)](https://github.com/lemire/streamvbyte/actions/workflows/ubuntu22.yml/badge.svg)](https://github.com/lemire/streamvbyte/actions/workflows/ubuntu22.yml)\n[![Ubuntu 20.04 CI (GCC 9.4 and 10, LLVM 10 and 11)](https://github.com/lemire/streamvbyte/actions/workflows/ubuntu20.yml/badge.svg)](https://github.com/lemire/streamvbyte/actions/workflows/ubuntu20.yml)\n[![macOS 11 CI (LLVM 13, GCC 10, 11, 12)](https://github.com/lemire/streamvbyte/actions/workflows/macos.yml/badge.svg)](https://github.com/lemire/streamvbyte/actions/workflows/macos.yml)\n[![VS16-CI](https://github.com/lemire/streamvbyte/actions/workflows/vs16.yml/badge.svg)](https://github.com/lemire/streamvbyte/actions/workflows/vs16.yml)\n[![VS17-CI](https://github.com/lemire/streamvbyte/actions/workflows/vs.yml/badge.svg)](https://github.com/lemire/streamvbyte/actions/workflows/vs.yml)\n\nStreamVByte is a new integer compression technique that applies SIMD instructions 
(vectorization) to\nGoogle's Group Varint approach. The net result is faster than other byte-oriented compression\ntechniques.\n\nThe approach is patent-free, and the code is available under the Apache License.\n\n\nIt includes fast differential coding.\n\nIt assumes a recent x64 processor (most Intel and AMD processors released after 2010) or an ARM processor with NEON instructions (which is almost all of them except for the tiny cores). Big-endian processors are unsupported at this time, but they are getting to be extremely rare.\n\nThe code should build using most standard-compliant C99 compilers. The provided makefile\nexpects a Linux-like system. We also provide a CMake build.\n\n# Requirements\n\n* A C99 compatible compiler (GCC 9 and up, LLVM 10 and up, Visual Studio 2019 and up).\n* We support macOS, Linux and Windows. It should be easy to extend support to FreeBSD and other POSIX systems.\n\nFor high performance, you should have either a 64-bit ARM processor or a 64-bit x64 system with SSE 4.1 support. 
SSE 4.1 was added to Intel processors in 2007 so it is almost certain that your Intel or AMD processor supports it.\n\n# Users\n\nThis library is used by\n\n * [UpscaleDB](https://github.com/cruppstahl/upscaledb),\n * Redis' [RediSearch](https://github.com/RedisLabsModules/RediSearch),\n * [StarRocks](https://github.com/StarRocks/starrocks/),\n * [Facebook Thrift](https://github.com/facebook/fbthrift),\n * [Trinity Information Retrieval framework](https://github.com/phaistos-networks/Trinity),\n * [tilemaker](https://github.com/systemed/tilemaker).\n\n# Usage\n\n\nSee `examples/example.c` for an example.\n\nShort code sample:\n```C\n// suppose that datain is an array of uint32_t integers\nsize_t compsize = streamvbyte_encode(datain, N, compressedbuffer); // encoding\n// here the result is stored in compressedbuffer using compsize bytes\nstreamvbyte_decode(compressedbuffer, recovdata, N); // decoding (fast)\n```\n\nIf the values are sorted, then it might be preferable to use differential coding:\n```C\n// suppose that datain is an array of uint32_t integers\nsize_t compsize = streamvbyte_delta_encode(datain, N, compressedbuffer,0); // encoding\n// here the result is stored in compressedbuffer using compsize bytes\nstreamvbyte_delta_decode(compressedbuffer, recovdata, N,0); // decoding (fast)\n```\nYou have to know how many integers were coded when you decompress. 
You can store this\ninformation along with the compressed stream.\n\nDuring decoding, the library may read up to `STREAMVBYTE_PADDING` extra bytes\nfrom the input buffer (these bytes are read but never used).\n\nTo verify that the expected size of a stream is correct, you may validate it before\ndecoding:\n```C\n// compressedbuffer, compsize, recovdata, N are as above\nif (streamvbyte_validate_stream(compressedbuffer, compsize, N)) {\n    // the stream is safe to decode\n    streamvbyte_decode(compressedbuffer, recovdata, N);\n} else {\n    // there's a mismatch between the expected size of the data (N) and the contents of\n    // the stream, so performing a decode is unsafe since the behaviour is undefined\n}\n```\n\n\n\n\n### 1. Building with CMake:\n\nWe expect a recent CMake. Please make sure that your version of CMake is up-to-date or you may\nneed to adapt our instructions.\n\nThe CMake build system also offers a `libstreamvbyte_static` static library\n(`libstreamvbyte_static.a` under Linux) in addition to\nthe `libstreamvbyte` shared library (`libstreamvbyte.so` under Linux).\n\n`-DCMAKE_INSTALL_PREFIX:PATH=/path/to/install` is optional.\nIt defaults to /usr/local/{include,lib}.\n\n\n\n```\ncmake -DCMAKE_BUILD_TYPE=Release \\\n         -DCMAKE_INSTALL_PREFIX:PATH=/path/to/install \\\n\t -DSTREAMVBYTE_ENABLE_EXAMPLES=ON \\\n\t -DSTREAMVBYTE_ENABLE_TESTS=ON -B build\n\ncmake --build build\n# run the tests like:\nctest --test-dir build\n\n```\n\n#### Installation with CMake\n\n```\ncmake --install build \n```\n\n#### Benchmarking with CMake\n\n\nAfter building, you may run our benchmark as follows:\n\n```\n./build/test/perf\n```\n\nThe benchmarks are not currently built under Windows.\n\n\n### 2. 
Building with Makefile:\n\n      make\n      ./unit\n\n#### Installation with Makefile\n\nYou can install the library (as a dynamic library) on your machine if you have root access:\n\n      sudo make install\n\nTo uninstall, simply type:\n\n      sudo make uninstall\n\nIt is recommended that you try ``make dyntest`` before proceeding.\n\n#### Benchmarking with Makefile\n\n\nYou can try to benchmark the speed in this manner:\n\n      make perf\n      ./perf\n\nMake sure to run ``make test`` before, as a sanity test.\n\n\nSigned integers\n-----------------\n\nWe do not directly support signed integers, but you can use fast functions to convert signed integers to unsigned integers.\n\n```C\n\n#include \"streamvbyte_zigzag.h\"\n\nzigzag_encode(mysignedints, myunsignedints, number); // mysignedints =\u003e myunsignedints\n\nzigzag_decode(myunsignedints, mysignedints, number); // myunsignedints =\u003e mysignedints\n```\n\nTechnical posts\n---------------\n\n* [Trinity Updates and integer codes benchmarks](https://medium.com/@markpapadakis/trinity-updates-and-integer-codes-benchmarks-6a4fa2eb3fd1) by Mark Papadakis\n* [Stream VByte: breaking new speed records for integer compression](https://lemire.me/blog/2017/09/27/stream-vbyte-breaking-new-speed-records-for-integer-compression/) by Daniel Lemire\n\n\nAlternative encoding\n-------------------------------\n\nBy default, Stream VByte uses 1, 2, 3 or 4 bytes per integer.\nIn the case where you expect many of your integers to be zero, you might try\nthe ``streamvbyte_encode_0124`` and ``streamvbyte_decode_0124`` which use\n0, 1, 2, or 4 bytes per integer.\n\n\nStream VByte in other languages\n--------------------------------\n\n- Rust version by Marshall Pierce ([repository](https://bitbucket.org/marshallpierce/stream-vbyte-rust))\n- Rust version by Trevor McCulloch ([repository](https://github.com/mccullocht/streamvbyte64))\n- Go version by Nelz ([repository](https://github.com/nelz9999/stream-vbyte-go))\n- Go version by 
Milan Patel (SIMD-accelerated) ([repository](https://github.com/theMPatel/streamvbyte-simdgo))\n- Go version by Michal Hruby (with SSE4 \u0026 NEON support) ([repository](https://github.com/mhr3/streamvbyte))\n- Zig version by Nick Gates ([repository](https://github.com/fulcrum-so/streamvbyte-zig))\n- Python version by Chris Seymour ([repository](https://github.com/iiSeymour/pystreamvbyte))\n\nFormat Specification\n---------------------\n\nWe specify the format as follows.\n\nWe do not store how many integers (``count``) are compressed\nin the compressed data per se. If you want to store\nthe data stream (e.g., to disk), you need to add this\ninformation. It is intentionally left out because, in\napplications, it is often the case that there are better\nways to store this count.\n\nThere are two streams:\n\n- The data starts with an array of \"control bytes\". There\n   are (count + 3) / 4 of them (using integer division).\n- Following the array of control bytes, there are data bytes.\n\nWe can interpret the control bytes as a sequence of 2-bit words.\nThe first 2-bit word is made of the least significant 2 bits\nin the first byte, and so forth. There are four 2-bit words\nwritten in each byte.\n\nStarting from the first 2-bit word, we have a corresponding\nsequence in the data bytes, written in sequence from the beginning:\n - When the 2-bit word is 00, there is a single data byte.\n - When the 2-bit word is 01, there are two data bytes.\n - When the 2-bit word is 10, there are three data bytes.\n - When the 2-bit word is 11, there are four data bytes.\n\nThe data bytes are stored using a little-endian encoding.\n\n\nConsider the following example:\n\n```\ncontrol bytes: [0x40 0x55 ... 
]\ndata bytes: [0x00 0x64 0xc8 0x2c 0x01 0x90 0x01 0xf4 0x01 0x58 0x02 0xbc 0x02 ...]\n```\n\nThe first control byte is 0x40, or the four 2-bit words ``00 00 00 01`` (listed starting from the least significant bits).\nThe second control byte is 0x55, or the four 2-bit words ``01 01 01 01``.\nThus the first three values are given by the first three bytes:\n``0x00, 0x64, 0xc8`` (or 0, 100, 200 in base 10). The next five values are stored\nusing two bytes each: ``0x2c 0x01, 0x90 0x01, 0xf4 0x01, 0x58 0x02, 0xbc 0x02``.\nAs little-endian integers, these are to be interpreted as 300, 400, 500, 600, 700.\n\nThus, to recap, the sequence of integers (0,100,200,300,400,500,600,700) gets encoded as the 15 bytes ``0x40 0x55 0x00 0x64 0xc8 0x2c 0x01 0x90 0x01 0xf4 0x01 0x58 0x02 0xbc 0x02``.\n\nIf the ``count`` is not divisible by four, then we include a final partial group where the unused 2-bit words are zero and correspond to no data byte.\n\nReference\n---------\n\n* Daniel Lemire, Nathan Kurz, Christoph Rupp, [Stream VByte: Faster Byte-Oriented Integer Compression](https://arxiv.org/abs/1709.08990), Information Processing Letters 130, 2018.\n\nSee also\n--------\n* SIMDCompressionAndIntersection: A C++ library to compress and intersect sorted lists of integers using SIMD instructions https://github.com/lemire/SIMDCompressionAndIntersection\n* The FastPFOR C++ library: Fast integer compression https://github.com/lemire/FastPFor\n* High-performance dictionary coding https://github.com/lemire/dictionary\n* LittleIntPacker: C library to pack and unpack short arrays of integers as fast as possible https://github.com/lemire/LittleIntPacker\n* The SIMDComp library: A simple C library for compressing lists of integers using binary packing https://github.com/lemire/simdcomp\n* MaskedVByte: Fast decoder for VByte-compressed integers https://github.com/lemire/MaskedVByte\n* CSharpFastPFOR: A C# integer compression library https://github.com/Genbox/CSharpFastPFOR\n* JavaFastPFOR: A Java integer compression library 
https://github.com/lemire/JavaFastPFOR\n* Encoding: Integer Compression Libraries for Go https://github.com/zhenjl/encoding\n* FrameOfReference is a C++ library dedicated to frame-of-reference (FOR) compression: https://github.com/lemire/FrameOfReference\n* libvbyte: A fast implementation for varbyte 32bit/64bit integer compression https://github.com/cruppstahl/libvbyte\n* TurboPFor is a C library that offers lots of interesting optimizations. Well worth checking! (GPL license) https://github.com/powturbo/TurboPFor\n* Oroch is a C++ library that offers a usable API (MIT license) https://github.com/ademakov/Oroch\n","funding_links":[],"categories":["C"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffast-pack%2Fstreamvbyte","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffast-pack%2Fstreamvbyte","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffast-pack%2Fstreamvbyte/lists"}