{"id":20321365,"url":"https://github.com/openacid/succinct","last_synced_at":"2025-04-11T19:09:47.304Z","repository":{"id":57632287,"uuid":"334454511","full_name":"openacid/succinct","owner":"openacid","description":"succinct static kv","archived":false,"fork":false,"pushed_at":"2021-02-05T07:13:25.000Z","size":25,"stargazers_count":45,"open_issues_count":0,"forks_count":3,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-03-25T15:03:12.466Z","etag":null,"topics":["go","golang","kv","static","succicnt"],"latest_commit_sha":null,"homepage":"https://openacid.github.io/","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/openacid.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-01-30T16:10:01.000Z","updated_at":"2025-02-03T01:07:03.000Z","dependencies_parsed_at":"2022-08-31T13:12:11.281Z","dependency_job_id":null,"html_url":"https://github.com/openacid/succinct","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":"openacid/gotmpl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openacid%2Fsuccinct","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openacid%2Fsuccinct/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openacid%2Fsuccinct/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openacid%2Fsuccinct/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/openacid","download_url":"https://codeload.github.com/openacid/succinct/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https:/
/github.com","kind":"github","repositories_count":248465343,"owners_count":21108244,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["go","golang","kv","static","succicnt"],"created_at":"2024-11-14T19:14:40.179Z","updated_at":"2025-04-11T19:09:47.281Z","avatar_url":"https://github.com/openacid.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# succinct\n\n[![Travis](https://travis-ci.com/openacid/succinct.svg?branch=main)](https://travis-ci.com/openacid/succinct)\n![test](https://github.com/openacid/succinct/workflows/test/badge.svg)\n\n[![Report card](https://goreportcard.com/badge/github.com/openacid/succinct)](https://goreportcard.com/report/github.com/openacid/succinct)\n[![Coverage Status](https://coveralls.io/repos/github/openacid/succinct/badge.svg?branch=main\u0026service=github)](https://coveralls.io/github/openacid/succinct?branch=main\u0026service=github)\n\n[![GoDoc](https://godoc.org/github.com/openacid/succinct?status.svg)](http://godoc.org/github.com/openacid/succinct)\n[![PkgGoDev](https://pkg.go.dev/badge/github.com/openacid/succinct)](https://pkg.go.dev/github.com/openacid/succinct)\n[![Sourcegraph](https://sourcegraph.com/github.com/openacid/succinct/-/badge.svg)](https://sourcegraph.com/github.com/openacid/succinct?badge)\n\nsuccinct provides several static succinct data types\n\n\u003c!-- START doctoc generated TOC please keep comment here to allow auto update --\u003e\n\u003c!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE --\u003e\n\n\n- [Succinct Set](#succinct-set)\n  - [Synopsis](#synopsis)\n  - 
[Performance](#performance)\n  - [Implementation](#implementation)\n- [License](#license)\n\n\u003c!-- END doctoc generated TOC please keep comment here to allow auto update --\u003e\n\n# Succinct Set\n\nIntroduction in Chinese: [A compressed prefix tree in 100 lines of code: 50% smaller](https://blog.openacid.com/algo/succinctset/)\n\nSet is a succinct, sorted, static string set implementation that uses a compacted trie as\nstorage. Its space cost is roughly half the size of the original data.\n\n## Synopsis\n\n```go\npackage succinct\n\nimport \"fmt\"\n\nfunc ExampleNewSet() {\n\tkeys := []string{\n\t\t\"A\", \"Aani\", \"Aaron\", \"Aaronic\", \"Aaronical\", \"Aaronite\",\n\t\t\"Aaronitic\", \"Aaru\", \"Ab\", \"Ababdeh\", \"Ababua\", \"Abadite\",\n\t}\n\ts := NewSet(keys)\n\tfor _, k := range []string{\"Aani\", \"Foo\", \"Ababdeh\"} {\n\t\tfound := s.Has(k)\n\t\tfmt.Printf(\"lookup %10s, found: %v\\n\", k, found)\n\t}\n\n\t// Output:\n\t//\n\t// lookup       Aani, found: true\n\t// lookup        Foo, found: false\n\t// lookup    Ababdeh, found: true\n}\n```\n\n## Performance\n\n-   200,000 real-world words collected from the web:\n    - The space cost is **57%** of the original data size.\n    - A `Has()` call costs about `350 ns` with a **Zipf** workload.\n\n    Original size: 2204 KB\n\n    Compared with binary search over a string array and Google [btree][]:\n\n    | Data         | Engine       | Size(KB) | Size/original | ns/op |\n    | :--          | :--          | --:      | --:           | --:   |\n    | 200kweb2     | bsearch      |  5890    |  267%         | 229   |\n    | 200kweb2     | succinct.Set |  1258    |   57%         | 356   |\n    | 200kweb2     | btree        | 12191    |  553%         | 483   |\n\n    \u003e A string in Go has two fields: a pointer to the text content and a length.\n    \u003e Thus the space overhead is quite high with small strings.\n    \u003e [btree][] internally has more pointers and indirections (interfaces).\n\n\n-   870,000 real-world IPv4 addresses:\n    - The space cost is **67%** of the original data size.\n 
   - A `Has()` call costs about `500 ns` with a **Zipf** workload.\n\n    Original size: 6823 KB\n\n    | Data         | Engine       | Size(KB) | Size/original | ns/op |\n    | :--          | :--          | --:      | --:           | --:   |\n    | 870k_ip4_hex | bsearch      | 17057    |  500%         | 276   |\n    | 870k_ip4_hex | succinct.Set |  2316    |   67%         | 496   |\n    | 870k_ip4_hex | btree        | 40388    | 1183%         | 577   |\n\n\n\n## Implementation\n\nIt stores sorted strings in a compacted trie (AKA prefix tree). A trie node has\nat most 256 outgoing labels. A label is just a single byte. E.g., [ab, abc,\nabcd, axy, buv] is represented with a trie like the following (numbers are node\nIDs):\n\n    ^ -a-\u003e 1 -b-\u003e 3 $\n      |      |      `c-\u003e 6 $\n      |      |             `d-\u003e 9 $\n      |      `x-\u003e 4 -y-\u003e 7 $\n      `b-\u003e 2 -u-\u003e 5 -v-\u003e 8 $\n\nInternally it uses a packed []byte and a bitmap with `len([]byte)` bits to\ndescribe the outgoing labels of a node:\n\n    ^: ab  00\n    1: bx  00\n    2: u   0\n    3: c   0\n    4: y   0\n    5: v   0\n    6: d   0\n    7: ø\n    8: ø\n    9: ø\n\nIn storage, it packs the labels together and joins the per-node bitmaps with the separator `1`:\n\n    labels(ignore space): \"ab bx u c y v d\"\n    label bitmap:          0010010101010101111\n\nIn this way every node has a `0` pointing to it (except the root node)\nand has a corresponding `1` for it:\n\n                                   .-----.\n                           .--.    | .---|-.\n                           |.-|--. 
| | .-|-|.\n                           || ↓  ↓ | | | ↓ ↓↓\n    labels(ignore space):  ab bx u c y v d øøø\n    label bitmap:          0010010101010101111\n    node-id:               0  1  2 3 4 5 6 789\n                              || | ↑ ↑ ↑ |   ↑\n                              || `-|-|-' `---'\n                              |`---|-'\n                              `----'\n\nTo walk from a parent node along a label to a child node, count the number of\n`0` bits up to the label's bit position, then find where the corresponding\n`1` is:\n\n    childNodeId = select1(rank0(i))\n\nIn our implementation, it is:\n\n    nodeId = countZeros(ss.labelBitmap, ss.ranks, bmIdx+1)\n    bmIdx = selectIthOne(ss.labelBitmap, ss.ranks, ss.selects, nodeId-1) + 1\n\nFinally, leaf nodes are indicated by another bitmap `leaves`, in which a `1` at\nthe i-th bit indicates that the i-th node is a leaf:\n\n    leaves: 0001001111\n\n# License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n[btree]: https://github.com/google/btree","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenacid%2Fsuccinct","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenacid%2Fsuccinct","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenacid%2Fsuccinct/lists"}