{"id":13418577,"url":"https://github.com/fast-pack/FastPFOR","last_synced_at":"2025-03-15T03:31:35.213Z","repository":{"id":37549339,"uuid":"4797147","full_name":"fast-pack/FastPFOR","owner":"fast-pack","description":"The FastPFOR C++ library: Fast integer compression","archived":false,"fork":false,"pushed_at":"2025-02-26T22:32:04.000Z","size":5818,"stargazers_count":906,"open_issues_count":15,"forks_count":126,"subscribers_count":42,"default_branch":"master","last_synced_at":"2025-03-07T03:49:48.946Z","etag":null,"topics":["compression-schemes","simd-compression","sorted-lists"],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fast-pack.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2012-06-26T15:50:06.000Z","updated_at":"2025-03-06T12:18:04.000Z","dependencies_parsed_at":"2024-06-10T15:10:22.019Z","dependency_job_id":"392316a0-239d-46c3-8150-e7886971838e","html_url":"https://github.com/fast-pack/FastPFOR","commit_stats":null,"previous_names":["fast-pack/fastpfor"],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fast-pack%2FFastPFOR","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fast-pack%2FFastPFOR/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fast-pack%2FFastPFOR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fast-pack%2FFastPFOR/manifests","owner_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/owners/fast-pack","download_url":"https://codeload.github.com/fast-pack/FastPFOR/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243681024,"owners_count":20330152,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compression-schemes","simd-compression","sorted-lists"],"created_at":"2024-07-30T22:01:03.984Z","updated_at":"2025-03-15T03:31:31.771Z","avatar_url":"https://github.com/fast-pack.png","language":"C++","readme":"# The FastPFOR C++ library: Fast integer compression\n![Ubuntu-CI](https://github.com/lemire/FastPFor/workflows/Ubuntu-CI/badge.svg)\n\n\n## What is this?\n\nA research library with integer compression schemes.\nIt is broadly applicable to the compression of arrays of\n32-bit integers where most integers are small.\nThe library seeks to exploit SIMD instructions (SSE)\nwhenever possible.\n\nThis library can decode at least 4 billion compressed integers per second on most\ndesktop or laptop processors. That is, it can decompress data at a rate of 15 GB/s.\nThis is significantly faster than generic codecs like gzip, LZO, Snappy or LZ4.\n\nIt is used by the [zsearch engine](http://victorparmar.github.com/zsearch/)\nas well as in [GMAP and GSNAP](http://research-pub.gene.com/gmap/). DuckDB derived some of their code from this library. It\nhas [been ported to Java](https://github.com/lemire/JavaFastPFOR),\n[C#](https://github.com/Genbox/CSharpFastPFOR) and\n[Go](https://github.com/reducedb/encoding). 
The Java port is used by\n[ClueWeb Tools](https://github.com/lintool/clueweb).\n\n[Apache Lucene version 4.6.x uses a compression format derived from our FastPFOR\nscheme](http://lucene.apache.org/core/4_6_1/core/org/apache/lucene/util/PForDeltaDocIdSet.html).\n\n## Python bindings\n\n- We have Python bindings: https://github.com/searchivarius/PyFastPFor\n\n## Myths\n\nMyth: SIMD compression requires very large blocks of integers (1024 or more).\n\nFact: This is not true. Our fastest scheme (SIMDBinaryPacking) works over blocks of 128 integers.\n[Another very fast scheme (Stream VByte) works over blocks of four integers](https://github.com/lemire/streamvbyte).\n\nMyth: SIMD compression means high speed but less compression.\n\nFact: This is wrong. Some schemes cannot easily be accelerated\nwith SIMD instructions, but many of those that can be compress very well.\n\n## Working with sorted lists of integers\n\nIf you are working primarily with sorted lists of integers, then\nyou might want to use differential coding. That is, you may want to\ncompress the deltas (the differences between successive values) instead of the integers themselves. 
The current\nlibrary (fastpfor) is generic and was not optimized for this purpose.\nHowever, we have another library designed to compress sorted integer\nlists:\n\nhttps://github.com/lemire/SIMDCompressionAndIntersection\n\nThis other library (SIMDCompressionAndIntersection) also comes complete\nwith new SIMD-based intersection algorithms.\n\nThere is also a C library for differential coding (fast computation of\ndeltas, and recovery from deltas):\n\nhttps://github.com/lemire/FastDifferentialCoding\n\n## Other recommended libraries\n\n* Fast integer compression in Go: https://github.com/ronanh/intcomp\n* High-performance dictionary coding https://github.com/lemire/dictionary\n* LittleIntPacker: C library to pack and unpack short arrays of integers as fast as possible https://github.com/lemire/LittleIntPacker\n* The SIMDComp library: A simple C library for compressing lists of integers using binary packing https://github.com/lemire/simdcomp\n* StreamVByte: Fast integer compression in C using the StreamVByte codec https://github.com/lemire/streamvbyte\n* MaskedVByte: Fast decoder for VByte-compressed integers https://github.com/lemire/MaskedVByte\n* CSharpFastPFOR: A C# integer compression library https://github.com/Genbox/CSharpFastPFOR\n* JavaFastPFOR: A Java integer compression library https://github.com/lemire/JavaFastPFOR\n* Encoding: Integer Compression Libraries for Go https://github.com/zhenjl/encoding\n* FrameOfReference is a C++ library dedicated to frame-of-reference (FOR) compression: https://github.com/lemire/FrameOfReference\n* libvbyte: A fast implementation for varbyte 32-bit/64-bit integer compression https://github.com/cruppstahl/libvbyte\n* TurboPFor is a C library that offers lots of interesting optimizations. Well worth checking! 
(GPL license) https://github.com/powturbo/TurboPFor-Integer-Compression\n* Oroch is a C++ library that offers a usable API (MIT license) https://github.com/ademakov/Oroch\n\n## Reference and documentation\n\nFor a simple example, please see \n\nexample.cpp \n\nin the root directory of this project.\n\nPlease see:\n\n* Daniel Lemire, Nathan Kurz, Christoph Rupp, Stream VByte: Faster Byte-Oriented Integer Compression, Information Processing Letters 130, 2018. https://arxiv.org/abs/1709.08990\n* Daniel Lemire and Leonid Boytsov, Decoding billions of integers per second through vectorization, Software Practice \u0026 Experience 45 (1), 2015.  http://arxiv.org/abs/1209.2137 http://onlinelibrary.wiley.com/doi/10.1002/spe.2203/abstract\n* Daniel Lemire, Leonid Boytsov, Nathan Kurz, SIMD Compression and the Intersection of Sorted Integers, Software Practice \u0026 Experience 46 (6), 2016 http://arxiv.org/abs/1401.6399\n* Jeff Plaisance, Nathan Kurz, Daniel Lemire, Vectorized VByte Decoding, International Symposium on Web Algorithms 2015, 2015. http://arxiv.org/abs/1503.07387\n* Wayne Xin Zhao, Xudong Zhang, Daniel Lemire, Dongdong Shan, Jian-Yun Nie, Hongfei Yan, Ji-Rong Wen, A General SIMD-based Approach to Accelerating Compression Algorithms, ACM Transactions on Information Systems 33 (3), 2015. http://arxiv.org/abs/1502.01916\n\n\nThis library was used by several papers including the following:\n\n* Jianguo Wang, Chunbin Lin, Yannis Papakonstantinou, Steven Swanson, An Experimental Study of Bitmap Compression vs. Inverted List Compression, SIGMOD 2017 http://db.ucsd.edu/wp-content/uploads/2017/03/sidm338-wangA.pdf\n* P. Damme, D. Habich, J. Hildebrandt, W. Lehner, Lightweight Data Compression Algorithms: An Experimental Survey (Experiments and Analyses), EDBT 2017 http://openproceedings.org/2017/conf/edbt/paper-146.pdf\n* P. Damme, D. Habich, J. Hildebrandt, W. 
Lehner, Insights into the Comparative Evaluation of Lightweight Data Compression Algorithms, EDBT 2017 http://openproceedings.org/2017/conf/edbt/paper-414.pdf\n* G. Ottaviano, R. Venturini, Partitioned Elias-Fano Indexes, ACM SIGIR 2014 http://www.di.unipi.it/~ottavian/files/elias_fano_sigir14.pdf\n* M. Petri, A. Moffat, J. S. Culpepper, Score-Safe Term Dependency Processing With Hybrid Indexes, ACM SIGIR 2014 http://www.culpepper.io/publications/sp074-petri.pdf\n\nIt has also inspired related work such as:\n\n* T. D. Wu, Bitpacking techniques for indexing genomes: I. Hash tables, Algorithms for Molecular Biology 11 (5), 2016. http://almob.biomedcentral.com/articles/10.1186/s13015-016-0069-5\n\n## License\n\nThis code is licensed under the Apache License, Version 2.0 (ASL2.0).\n\n## Software Requirements\n\nThis code requires a compiler supporting C++11. This was\na design decision.\n\nIt builds under\n\n* clang++ 3.2 (LLVM 3.2) or better,\n* Intel icpc (ICC) 13.0.1 or better,\n* MinGW32 (x64-4.8.1-posix-seh-rev5),\n* Microsoft VS 2012 or better,\n* and GNU GCC 4.7 or better.\n\nThe code was tested under Windows, Linux and macOS.\n\n## Hardware Requirements\n\nOn an x64 platform, your processor should support SSSE3. This includes almost every Intel or AMD processor\nsold after 2006. (Note: the key schemes require merely SSE2.) Some specific binaries will only run if your processor\nsupports SSE4.1. However, they have been used purely for specific tests.\n\nWe also support ARM platforms by wrapping the SIMD code with SIMDe.\n\n## Building with CMake\n\nYou need CMake. On most Linux distributions, you can simply do the following:\n\n      git clone https://github.com/lemire/FastPFor.git\n      cd FastPFor\n      mkdir build\n      cd build\n      cmake ..\n      cmake --build .\n\nIt may be necessary to set the CXX variable. 
The project is installable (`make install` works).\n\nTo create project files for Microsoft Visual Studio, it might be useful to target 64-bit Windows (e.g., see http://www.cmake.org/cmake/help/v3.0/generator/Visual%20Studio%2012%202013.html).\n\n### Multithreaded context\n\nYou should not assume that our objects are thread-safe.\nIf you have several threads, each thread should have its own IntegerCODEC\nobjects to ensure that there are no concurrency problems.\n\n\n## Why C++11?\n\nWith minor changes, all schemes will compile fine under\ncompilers that do not support C++11. And porting the code\nto C should not be a challenge.\n\nIn any case, we already support 3 major C++ compilers so portability\nis not a major issue.\n\n## What if I prefer Java?\n\nMany schemes cannot be efficiently ported to Java. However,\nsome have been. Please see:\n\nhttps://github.com/lemire/JavaFastPFOR\n\n## What if I prefer C#?\n\nSee CSharpFastPFOR: A C# integer compression library https://github.com/Genbox/CSharpFastPFOR\n\n## What if I prefer Go?\n\nSee Encoding: Integer Compression Libraries for Go https://github.com/zhenjl/encoding\n\n## Testing\n\nIf you used CMake to generate the build files, the `check` target will\nrun the unit tests. For example, if you generated Unix Makefiles,\n\n    make check\n\nwill do it.\n\n## Simple benchmark\n\n    make codecs\n    ./codecs --clusterdynamic\n    ./codecs --uniformdynamic\n\n## Optional: Snappy\n\nTyping \"make allallall\" will install some testing binaries that depend\non Google Snappy. If you want to build these, you need to install\nGoogle Snappy. You can do so on a recent Ubuntu machine as:\n\n    sudo apt-get install libsnappy-dev\n\n## Processing data files\n\nTyping \"make\" will generate an \"inmemorybenchmark\"\nexecutable that can process data files.\n\nYou can use it to process arrays of (sorted!) 
integers\non disk using the following 32-bit format: 1 unsigned 32-bit\ninteger indicating the array length, followed by the corresponding\nnumber of 32-bit integers. Repeat for each array.\n\n(It is assumed that the integers are sorted.)\n\n\nOnce you have such a binary file (say, somefilename), you can\nprocess it with our inmemorybenchmark:\n\n    ./inmemorybenchmark --minlength 10000 somefilename\n\nThe \"minlength\" flag skips short arrays. (Warning: timings over\nshort arrays are unreliable.)\n\n\n## Testing with the Gov2 and ClueWeb09 data sets\n\nAs of April 2014, we recommend getting our archive at\n\nhttp://lemire.me/data/integercompression2014.html\n\nIt is the data that was used for the following paper:\n\nDaniel Lemire, Leonid Boytsov, Nathan Kurz, SIMD Compression and the Intersection of Sorted Integers, arXiv: 1401.6399, 2014\nhttp://arxiv.org/abs/1401.6399\n\n\n## I used your code and I get segmentation faults\n\nOur code is thoroughly tested.\n\nOne common issue is that people do not provide large enough buffers.\nSome schemes can compress so poorly on some inputs that the compressed data\ngenerated will be much larger than the input data.\n\n## Is any of this code subject to patents?\n\nI (D. Lemire) did not patent anything.\n\nHowever, we implemented varint-G8UI, which was patented by its authors.\nDO NOT use varint-G8UI if you want to avoid patents.\n\nThe rest of the library *should be* patent-free.\n\n## Funding\n\nThis work was supported by NSERC grant number 26143.\n","funding_links":[],"categories":["TODO scan for Android support in followings","Maths"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffast-pack%2FFastPFOR","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffast-pack%2FFastPFOR","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffast-pack%2FFastPFOR/lists"}