{"id":27837052,"url":"https://github.com/Cydhra/vers","last_synced_at":"2025-05-02T18:05:56.515Z","repository":{"id":166179826,"uuid":"632415033","full_name":"Cydhra/vers","owner":"Cydhra","description":"very efficient rank and select","archived":false,"fork":false,"pushed_at":"2024-06-28T21:19:02.000Z","size":505,"stargazers_count":56,"open_issues_count":2,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-07-02T20:59:30.671Z","etag":null,"topics":["data-structures","rust","succinct-bit-vector"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Cydhra.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-25T11:09:07.000Z","updated_at":"2024-07-07T18:34:44.484Z","dependencies_parsed_at":"2024-03-05T01:32:32.094Z","dependency_job_id":"8974b690-c309-499e-b0be-2c9cdce177b9","html_url":"https://github.com/Cydhra/vers","commit_stats":null,"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Cydhra%2Fvers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Cydhra%2Fvers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Cydhra%2Fvers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Cydhra%2Fvers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Cydhra","download_url":"https://codeload.github.com/Cydhra/vers/tar.gz/refs/
heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252084816,"owners_count":21692163,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-structures","rust","succinct-bit-vector"],"created_at":"2025-05-02T18:05:51.962Z","updated_at":"2025-05-02T18:05:56.500Z","avatar_url":"https://github.com/Cydhra.png","language":"Rust","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"# Vers - Very Efficient Rank and Select\n\n[![crates.io](https://img.shields.io/crates/v/vers-vecs.svg)](https://crates.io/crates/vers-vecs)\n[![rust](https://github.com/cydhra/vers/actions/workflows/rust.yml/badge.svg)](https://github.com/Cydhra/vers)\n[![docs](https://docs.rs/vers-vecs/badge.svg)](https://docs.rs/vers-vecs)\n\nVers (vers-vecs on crates.io)\ncontains pure-Rust implementations of several data structures backed by rank and select operations.\nWhen using this library, it is strongly recommended to enable the `BMI2` and `popcnt` features for x86_64 CPUs\nor compile with the `target-cpu=native` flag,\nsince the intrinsics speed up both `rank` and `select` operations by a factor of 2-3.\n\n## Data Structures\n- A fully-featured bit vector with no memory overhead.\n- A succinct bit vector supporting fast rank and select queries.\n- An Elias-Fano encoding of monotone sequences supporting constant-time predecessor/successor queries.\n- Two Range Minimum Query vector structures for constant-time range minimum queries.\n- A Wavelet Matrix supporting `O(k)` rank, select, statistical, predecessor, and successor queries.\n- A 
succinct tree structure (BP Tree) supporting level-ordered and depth-first-ordered tree navigation and subtree queries.\n\n## Why Vers?\n- Vers is among the fastest publicly available bit vector implementations for rank and select operations.\n- Vers has a substantially lower memory overhead than its competitors.\n- Without crate features, all data structures are implemented in pure Rust and have no dependencies outside the standard library.\n- All functionality is extensively documented.\n- Vers aims to provide more functionality for its data structures than competitors \n  (e.g., Elias-Fano sequences and the Wavelet Matrix support predecessor and successor queries, \n  the Wavelet Matrix supports statistical queries, all data structures implement various iterators, etc.).\n\n## Crate Features\n- `simd`: Enables the use of SIMD instructions for rank and select operations.\nThis feature requires AVX-512 support and uses unsafe code.\nIt also enables a special iterator for the rank/select bit vector that uses vectorized operations.\nThe feature only works on nightly Rust.\nEnabling it on stable Rust is a no-op, because the required CPU features are not available there.\n- `serde`: Enables serialization and deserialization of the data structures using the `serde` crate.\n- `u16_lookup`: Enables a larger lookup table for BP tree queries. 
The larger table requires 128 KiB instead of 4 KiB.\n\n## Benchmarks\nI benchmarked the implementations against publicly available implementations of the same data structures.\nThe benchmarking code is available in the [vers-benchmarks](https://github.com/Cydhra/vers_benchmarks) repository.\nThe benchmark uses the `simd` feature of rsdict, which requires nightly Rust.\n\nI performed the benchmarks on a Ryzen 9 7950X with 32GB of RAM.\nSome of the results are shown below.\nAll benchmarks were run with the `target-cpu=native` flag enabled, and the `simd` feature enabled for Vers.\nMore results can be found in the benchmark repository.\n\nBenchmarks for the Wavelet Matrix are still missing because I want to improve the benchmarking code before I do them.\nBecause Wavelet Matrices have very little room for engineering, there aren't any surprising results to be expected, though.\nThe performance solely depends on the bit vector implementation, so the results will be similar to the bit vector benchmarks.\nThe only exception is the [qwt](https://crates.io/crates/qwt) crate, which uses quad vectors instead,\nand is substantially faster than any other crate due to the reduced number of cache misses.\n\n### Bit-Vector\n#### Rank \u0026 Select\nThe bit vector implementation is among the fastest publicly available implementations for rank and select operations.\nNote that the `succinct` crate substantially outperforms Vers' `rank` operation but does not provide an efficient select operation.\n\nThe x-axis is the number of bits in the bit vector.\nAn increase in all runtimes can be observed for input sizes exceeding the L2 cache size (16 MB).\n\n| Legend            | Crate                                   | Notes                               |\n|-------------------|-----------------------------------------|-------------------------------------|\n| bio               | https://crates.io/crates/bio            | with adaptive block-size            |\n| fair bio          | 
https://crates.io/crates/bio            | with constant block-size            |\n| fid               | https://crates.io/crates/fid            |                                     |\n| indexed bitvector | https://crates.io/crates/indexed_bitvec |                                     |\n| rank9             | https://crates.io/crates/succinct       | Fastest of multiple implementations |\n| rsdict            | https://crates.io/crates/rsdict         |                                     |\n| vers              | https://github.com/Cydhra/vers          |                                     |\n| sucds-rank9       | https://crates.io/crates/sucds          |                                     |\n| sucds-darray      | https://crates.io/crates/sucds          | Dense Set Implementation            |\n| bitm              | https://crates.io/crates/bitm           |                                     |\n\n![Bit-Vector Rank Benchmark](images/rank_comparison.svg)\n![Bit-Vector Select Benchmark](images/select_comparison.svg)\n\n#### Heap Size\n\nThe memory overhead of the bit vector implementation is significantly lower than that of other implementations.\nThe x-axis is the number of bits in the bit vector,\nthe y-axis is the additional overhead in percent compared to the size of the bit vector.\nOnly the fastest competitors are shown, to make the graph more readable\n(I would like to add the bio crate data structure as well, since it is the only truly succinct one,\nbut it does not offer an operation to measure the heap size.\nThe same is true for the `bitm` crate, which claims to have a lower memory overhead compared to `Vers`,\nbut does not offer a convenient way of measuring it).\nVers achieves its high speeds with significantly less memory overhead, as can be seen in the heap size benchmark.\nThe legend contains the measurement for the biggest input size,\nbecause I assume that the overhead approaches a constant value for large inputs.\n\n![Bit-Vector Heap Size 
Benchmark](images/heap.svg)\n\n### Elias-Fano\nThe benchmark compares the access times for random elements in the sequence.\nThe x-axis is the number of elements in the sequence.\nNote that the elias-fano crate is inefficient with random-order access.\nIn-order access benchmarks can be found in the benchmark repository.\n\n![Elias-Fano Randomized](images/elias_fano_access_random.svg)\n\nThe following two benchmarks show the predecessor query times for average element distribution and the \nworst-case element distribution.\nNote that Vers' worst-case query times are logarithmic, while `sucds` has linear worst-case query times.\n\n![Elias-Fano Average Case](images/elias_fano_pred_random.svg)\n![Elias-Fano Worst Case](images/elias_fano_pred_adversarial.svg)\n\n### Range Minimum Query\nThe Range Minimum Query implementations are compared against the \n[range_minimum_query](https://crates.io/crates/range_minimum_query) and \n[librualg](https://crates.io/crates/librualg) crates.\nVers outperforms both crates by a significant margin with both implementations.\nAn increase in runtime can be observed for input sizes exceeding the L3 cache size (64 MB).\nThe increase occurs earlier for the `BinaryRMQ` implementation, because it has a substantially higher memory overhead.\nFor the same reason, the final two measurements for the `BinaryRMQ` implementation are missing (the data structure\nexceeded the available 32 GB main memory).\n\n(Yes, the naming of both implementations is unfortunate, but they will stay until I do a major version bump.)\n\n![RMQ Comparison](images/rmq_comparison.svg)\n\n# Intrinsics\nThis crate uses compiler intrinsics for bit manipulation. 
The intrinsics are supported by\nall modern x86_64 CPUs, but not by other architectures.\nThere are fallback implementations if the intrinsics are not available, but they are significantly slower.\nUsing this library on `x86` CPUs without enabling the `BMI2` and `popcnt` target features is not recommended.\n\nThe intrinsics in question are `popcnt` (supported since SSE4.2 resp. SSE4a on AMD, 2007-2008),\n`pdep` (supported with BMI2 since Intel Haswell resp. AMD Excavator, in hardware since AMD Zen 3, 2011-2013),\nand `tzcnt` (supported with BMI1 since Intel Haswell resp. AMD Jaguar, ca. 2013).\n\n## Safety\nThis crate uses no unsafe code, with the only exception being the compiler intrinsic for `pdep`.\nThe intrinsics cannot fail with the provided inputs (provided they are\nsupported by the target machine), so even if they were to be implemented incorrectly, no\nmemory corruption can occur (only incorrect results).\n\nUnsafe code is hidden behind the public API.\n\n## Dependencies\nThe library has no dependencies outside the Rust standard library by default.\nIt has a plethora of dependencies for benchmarking purposes, but these are not required for normal use.\nOptionally, the `serde` feature can be enabled to allow serialization and deserialization of the data structures,\nwhich requires the `serde` crate and its `derive` feature.\n\n## License\nLicensed under either of\n\n* Apache License, Version 2.0\n  ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)\n* MIT license\n  ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)\n\nat your option.\n\nThis project includes code developed by [Gonzalo Brito Gadeschi](https://github.com/gnzlbg/bitintr)\noriginally licensed under the MIT license.\nIt is redistributed under the above dual license.\n\n## Contribution\nUnless you explicitly state otherwise, any contribution intentionally submitted\nfor inclusion in the work by you, as defined in the Apache-2.0 license, shall be\ndual licensed 
as above, without any additional terms or conditions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FCydhra%2Fvers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FCydhra%2Fvers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FCydhra%2Fvers/lists"}