{"id":22157992,"url":"https://github.com/qratorlabs/pysdsl","last_synced_at":"2026-04-30T18:31:37.394Z","repository":{"id":137989560,"uuid":"141998217","full_name":"QratorLabs/pysdsl","owner":"QratorLabs","description":"Python bindings to Succinct Data Structure Library 2.0","archived":false,"fork":false,"pushed_at":"2019-05-18T18:52:36.000Z","size":135,"stargazers_count":30,"open_issues_count":9,"forks_count":12,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-24T14:48:28.316Z","etag":null,"topics":["data-structures","pybind11","python","succinct","succinct-data-structure"],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/QratorLabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-07-23T10:18:29.000Z","updated_at":"2024-08-31T09:11:53.000Z","dependencies_parsed_at":null,"dependency_job_id":"587f0710-207d-45f6-bf10-61d4642d4c08","html_url":"https://github.com/QratorLabs/pysdsl","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/QratorLabs/pysdsl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QratorLabs%2Fpysdsl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QratorLabs%2Fpysdsl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QratorLabs%2Fpysdsl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QratorLabs%2Fpysdsl/manifests","owner_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/owners/QratorLabs","download_url":"https://codeload.github.com/QratorLabs/pysdsl/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QratorLabs%2Fpysdsl/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32473804,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-30T13:12:12.517Z","status":"ssl_error","status_checked_at":"2026-04-30T13:12:06.837Z","response_time":57,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-structures","pybind11","python","succinct","succinct-data-structure"],"created_at":"2024-12-02T03:16:47.640Z","updated_at":"2026-04-30T18:31:37.376Z","avatar_url":"https://github.com/QratorLabs.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Python bindings to Succinct Data Structure Library 2.0\n\nThe Succinct Data Structure Library ([SDSL][SDSL]) is a powerful and flexible C++11 library implementing succinct data structures. In total, the library contains the highlights of 40 [research publications][SDSLLIT]. Succinct data structures can represent an object (such as a bitvector or a tree) in space close to the information-theoretic lower bound of the object while supporting operations of the original object efficiently. 
The theoretical time complexities of an operation performed on the classical data structure and on the equivalent succinct data structure are (most of the time) identical.\n\nMost of the examples from the [SDSL cheat sheet][SDSL-CHEAT-SHEET] and the [SDSL tutorial][SDSL-TUTORIAL] are implemented.\n\n## Mutable bit-compressed vectors\n\nCore classes (see `pysdsl.int_vector` for a dict of all of them):\n\n * `pysdsl.IntVector(size, default_value, bit_width=64)` — dynamic bit width\n * `pysdsl.BitVector(size, default_value)` — static (fixed) bit width (1)\n * `pysdsl.Int4Vector(size, default_value)` — static bit width (4)\n * `pysdsl.Int8Vector(size, default_value)` — static bit width (8)\n * `pysdsl.Int16Vector(size, default_value)` — static bit width (16)\n * `pysdsl.Int24Vector(size, default_value)` — static bit width (24)\n * `pysdsl.Int32Vector(size, default_value)` — static bit width (32)\n * `pysdsl.Int64Vector(size, default_value)` — static bit width (64)\n\nConstruction from Python sequences is also supported.\n\n```python\n\nIn [1]: import pysdsl\n\nIn [2]: %time v = pysdsl.IntVector(1024 * 1024 * 256)\nCPU times: user 914 ms, sys: 509 ms, total: 1.42 s\nWall time: 1.42 s\n\nIn [3]: v.size_in_mega_bytes\nOut[3]: 2048.000008583069\n\nIn [4]: %time v.set_to_id()  # like *v = range(len(v))\nCPU times: user 8.19 s, sys: 1.3 ms, total: 8.19 s\nWall time: 8.19 s\n\nIn [5]: v.width\nOut[5]: 64\n\nIn [6]: %time v.bit_compress()\nCPU times: user 23.3 s, sys: 155 ms, total: 23.5 s\nWall time: 23.5 s\n\nIn [7]: v.width\nOut[7]: 28\n\nIn [8]: v.size_in_mega_bytes\nOut[8]: 896.0000085830688\n\n```\n\nBuffer interface:\n\n```python\nIn [9]: import array\n\nIn [10]: v = pysdsl.Int64Vector([1, 2, 3])\n\nIn [11]: array.array('Q', v)\nOut[11]: array('Q', [1, 2, 3])\n```\n\n## Immutable compressed integer vectors\n\n(See `pysdsl.enc_vector`):\n\n * `EncVectorEliasDelta(IntVector)`\n * `EncVectorEliasGamma(IntVector)`\n * `EncVectorFibonacci(IntVector)`\n * `EncVectorComma2(IntVector)`\n * 
`EncVectorComma4(IntVector)`\n\n```python\nIn [9]: %time ev = pysdsl.EncVectorEliasDelta(v)\nCPU times: user 26.5 s, sys: 31.8 ms, total: 26.5 s\nWall time: 26.5 s\n\nIn [10]: ev.size_in_mega_bytes\nOut[10]: 45.75003242492676\n```\n\nEncoding values with variable-length codes (see `pysdsl.variable_length_codes_vector`):\n\n * `VariableLengthCodesVectorEliasDelta(IntVector)`\n * `VariableLengthCodesVectorEliasGamma(IntVector)`\n * `VariableLengthCodesVectorFibonacci(IntVector)`\n * `VariableLengthCodesVectorComma2(IntVector)`\n * `VariableLengthCodesVectorComma4(IntVector)`\n\nEncoding values with the \"escaping\" technique (see `pysdsl.direct_accessible_codes_vector`):\n\n * `DirectAccessibleCodesVector(IntVector)`\n * `DirectAccessibleCodesVector8(IntVector)`\n * `DirectAccessibleCodesVector16(IntVector)`\n * `DirectAccessibleCodesVector63(IntVector)`\n * `DirectAccessibleCodesVectorDP(IntVector)` — number of layers is chosen\n                                                with dynamic programming\n * `DirectAccessibleCodesVectorDPRRR(IntVector)` — same but built on top of\n                                                   RamanRamanRaoVector (see later)\n\nConstruction from Python sequences is also supported.\n\n## Immutable compressed bit (boolean) vectors\n\n(See `pysdsl.all_immutable_bitvectors`)\n\n * `BitVectorInterLeaved64(BitVector)`\n * `BitVectorInterLeaved128(BitVector)`\n * `BitVectorInterLeaved256(BitVector)`\n * `BitVectorInterLeaved512(BitVector)` — A bit vector which interleaves the\n                                          original `BitVector` with rank information\n                                          (see later)\n * `SDVector(BitVector)` — A bit vector which compresses very sparsely populated\n                           bit vectors by representing the positions of 1 by the\n                           Elias-Fano representation for\n                           non-decreasing sequences\n * `RamanRamanRaoVector15(BitVector)`\n * 
`RamanRamanRaoVector63(BitVector)`\n * `RamanRamanRaoVector256(BitVector)` — An H₀-compressed bitvector representation.\n * `HybVector8(BitVector)`\n * `HybVector16(BitVector)` — A hybrid-encoded compressed bitvector\n                              representation\n\nSee also: `pysdsl.raman_raman_rao_vectors`, `pysdsl.sparse_bit_vectors`,\n`pysdsl.hybrid_bit_vectors` and `pysdsl.bit_vector_interleaved`.\n\n## Rank and select operations on bitvectors\n\nFor bitvector `v` `rank(i)` for pattern `P` (by default `P` is a bitstring of\nlen 1: `1`) is the number of patterns `P` in the prefix `[0..i)` in vector `v`.\n\nFor bitvector `v` `select(i)` for pattern `P` (by default `P`=`1`) is the\nposition of the `i`-th occurrence of pattern `P` in vector `v`.\n\nCreate support instances for rank and/or select for different patterns via:\n\n * `v.init_rank()` or `v.init_rank_1()` for ranks of pattern `1`\n    (e.g. the number of set bits in `v`)\n * `v.init_rank_0()` for ranks of pattern `0`\n * `v.init_rank_00()` (if supported by vector class) for ranks of pattern `00`\n * `v.init_rank_01()` (if supported by vector class) for ranks of pattern `01`\n * `v.init_rank_10()` (if supported by vector class) for ranks of pattern `10`\n * `v.init_rank_11()` (if supported by vector class) for ranks of pattern `11`\n * `v.init_support()` or `v.init_support_1()` for support of pattern `1`\n    (e.g. 
the positions of set bits)\n * `v.init_support_0()` for support of pattern `0`\n * `v.init_support_00()` (if supported by vector class) for support of pattern `00`\n * `v.init_support_01()` (if supported by vector class) for support of pattern `01`\n * `v.init_support_10()` (if supported by vector class) for support of pattern `10`\n * `v.init_support_11()` (if supported by vector class) for support of pattern `11`\n\nOnce a support instance `s` is created, call it (`s(idx)` or `s.__call__(idx)`)\nor use the corresponding methods `s.rank(idx)` or `s.select(idx)` to get\nthe results.\n\n`s.rank(idx)` and `s.select(idx)` are undefined if the original bitvector is\nmutable and was modified.\n\n\n## Wavelet trees\n\nThe wavelet tree is a data structure that provides three efficient methods:\n\n* The `[]`-operator: `wt[i]` returns the `i`-th symbol of the vector for which the wavelet tree was built.\n* The rank method: `wt.rank(i, c)` returns the number of occurrences of symbol `c` in the prefix `[0..i-1]` of the vector for which the wavelet tree was built.\n* The select method: `wt.select(j, c)` returns the index `i` from `[0..size()-1]` of the `j`-th occurrence of symbol `c`.\n\n## Compressed suffix arrays\n\nA suffix array is a sorted array of all suffixes of a string.\n\nSDSL supports bitcompressed and compressed suffix arrays.\n\nThe byte representation of the original IntVector should contain no zero symbols in order to construct a SuffixArray.\n\n## Objects memory structure\n\nAny object has a `.structure` property with technical information about an\nobject. 
`.structure_json` is also provided for web-view implementations.\nThe `.write_structure_json()` method writes that information to a file.\n\nThe `.size_in_bytes` and `.size_in_mega_bytes` properties show how much memory the\nobject occupies.\n\n## Saving/Loading objects\n\nAll objects provide a `.store_to_checked_file()` method that saves the\nobject to a file.\n\nAll classes provide a `.load_from_checkded_file()` static method that\nloads an object stored with `.store_to_checked_file()`.\n\n\n## Building\n\nRequirements: static libraries for sdsl and divsufsort.\n\nCall `pip` with binaries disabled to fetch sources and build the package:\n\n```bash\npip install --no-binary :all: pysdsl\n```\n\n\n[SDSL]: https://github.com/simongog/sdsl-lite\n[SDSLLIT]: https://github.com/simongog/sdsl-lite/wiki/Literature\n\"Succinct Data Structure Literature\"\n[SDSL-CHEAT-SHEET]: https://simongog.github.io/assets/data/sdsl-cheatsheet.pdf\n[SDSL-TUTORIAL]: https://simongog.github.io/assets/data/sdsl-slides/tutorial\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqratorlabs%2Fpysdsl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqratorlabs%2Fpysdsl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqratorlabs%2Fpysdsl/lists"}