{"id":20321397,"url":"https://github.com/openacid/slimarray","last_synced_at":"2026-02-11T12:38:49.679Z","repository":{"id":37654150,"uuid":"311189692","full_name":"openacid/slimarray","owner":"openacid","description":"SlimArray compresses uint32 into several bits, by using a polynomial to describe overall trend of an array.","archived":false,"fork":false,"pushed_at":"2022-05-30T13:47:01.000Z","size":6955,"stargazers_count":52,"open_issues_count":0,"forks_count":3,"subscribers_count":4,"default_branch":"main","last_synced_at":"2026-01-15T07:28:14.217Z","etag":null,"topics":["array","compacted","compress","go","golang","memory","space"],"latest_commit_sha":null,"homepage":"https://openacid.github.io/","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/openacid.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-11-09T01:11:50.000Z","updated_at":"2025-04-24T06:44:15.000Z","dependencies_parsed_at":"2022-09-04T21:41:32.422Z","dependency_job_id":null,"html_url":"https://github.com/openacid/slimarray","commit_stats":null,"previous_names":["openacid/polyarray"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/openacid/slimarray","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openacid%2Fslimarray","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openacid%2Fslimarray/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openacid%2Fslimarray/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openacid%2Fslimarray/manifests","owner_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/owners/openacid","download_url":"https://codeload.github.com/openacid/slimarray/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openacid%2Fslimarray/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29333113,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-11T06:13:03.264Z","status":"ssl_error","status_checked_at":"2026-02-11T06:12:55.843Z","response_time":97,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["array","compacted","compress","go","golang","memory","space"],"created_at":"2024-11-14T19:14:52.259Z","updated_at":"2026-02-11T12:38:49.664Z","avatar_url":"https://github.com/openacid.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# slimarray\n\n[![Travis](https://travis-ci.com/openacid/slimarray.svg?branch=main)](https://travis-ci.com/openacid/slimarray)\n![test](https://github.com/openacid/slimarray/workflows/test/badge.svg)\n\n[![Report card](https://goreportcard.com/badge/github.com/openacid/slimarray)](https://goreportcard.com/report/github.com/openacid/slimarray)\n[![Coverage 
Status](https://coveralls.io/repos/github/openacid/slimarray/badge.svg?branch=main\u0026service=github)](https://coveralls.io/github/openacid/slimarray?branch=main\u0026service=github)\n\n[![GoDoc](https://godoc.org/github.com/openacid/slimarray?status.svg)](http://godoc.org/github.com/openacid/slimarray)\n[![PkgGoDev](https://pkg.go.dev/badge/github.com/openacid/slimarray)](https://pkg.go.dev/github.com/openacid/slimarray)\n[![Sourcegraph](https://sourcegraph.com/github.com/openacid/slimarray/-/badge.svg)](https://sourcegraph.com/github.com/openacid/slimarray?badge)\n\nSlimArray is a space-efficient, static `uint32` array.\nIt uses a polynomial to compress and store an array.\nFor a SlimArray holding a million sorted numbers in the range `[0, 1000*1000]`:\n- a `uint32` requires only **5 bits** (17% of original data);\n- compressing a `uint32` takes **110 ns**, i.e., about 9 million inserts per second;\n- reading a `uint32` with `Get()` takes **7 ns**;\n- batch reading with `Slice()` takes **3.8 ns**/elt.\n\nSlimBytes is an array of var-length records (a record is a `[]byte`), which is indexed by SlimArray.\nThus the memory overhead of storing the `offset` and `length` of each record is very low, e.g., about **8 bits/record**,\ncompared to a typical implementation that uses an offset of type int (`32 to 64 bits/record`).\nA `Get()` takes **15 ns**.\n\nChinese introduction: [https://blog.openacid.com/algo/slimarray/](https://blog.openacid.com/algo/slimarray/)\n\n\u003c!-- START doctoc generated TOC please keep comment here to allow auto update --\u003e\n\u003c!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE --\u003e\n\n\n- [Why](#why)\n- [What It Is And What It Is Not](#what-it-is-and-what-it-is-not)\n- [Limitation](#limitation)\n- [Install](#install)\n- [Synopsis](#synopsis)\n  - [Build a SlimArray](#build-a-slimarray)\n  - [Build a SlimBytes](#build-a-slimbytes)\n- [How it works](#how-it-works)\n    - [The General Idea](#the-general-idea)\n    - [What It Is And What It Is 
Not](#what-it-is-and-what-it-is-not-1)\n    - [Data Structure](#data-structure)\n    - [Uncompressed Data Structures](#uncompressed-data-structures)\n    - [Compact](#compact)\n\n\u003c!-- END doctoc generated TOC please keep comment here to allow auto update --\u003e\n\n# Why\n\n- **Space efficient**: In a sorted array, an elt takes only about **10 bits** to\n    store a 32-bit int.\n\n| Data size | Data Set                 | gzip size | slimarray size | avg size   | ratio |\n| --:       | :--                      | --:       | :--            | --:        | --:   |\n| 1,000     | rand u32: [0, 1000]      | x         | 824 byte       | 6 bit/elt  | 18%   |\n| 1,000,000 | rand u32: [0, 1,000,000] | x         | 702 KB         | 5 bit/elt  | 15%   |\n| 1,000,000 | IPv4 DB                  | 2 MB      | 2 MB           | 16 bit/elt | 50%   |\n| 600       | [slim][] star count      | 602 byte  | 832 byte       | 10 bit/elt | 26%   |\n\n[slim]: https://github.com/openacid/slim\n\n- **Fast**: `Get()`: 7 ns/op. Building: 110 ns/elt. Run and see the benchmark: `go test . -bench=.`.\n\n- **Adaptive**: It does not require the data to be totally sorted to compress\n    it. E.g., SlimArray is well suited to storing online user histogram data.\n\n- **Ready for transport**: slimarray is protobuf-defined and has the same structure in memory as\non disk. No cost to load or dump.\n\n\n# What It Is And What It Is Not\n\nAnother space-efficient data structure for storing a uint32 array is a trie (aka prefix\ntree or radix tree). It is possible to use a bitmap-based, btree-like structure\nto reduce space (very likely it provides a higher compression rate in such a case).\nBut it requires the array to be **sorted**.\n\nSlimArray does not have such a restriction. It is more adaptive to the data\nlayout. 
To achieve a high compression rate, it only requires that the data have an\noverall trend, e.g., **roughly sorted**.\n\nAdditionally, it also accepts duplicated elements in the array, which\na bitmap-based or tree-like data structure does not allow.\n\nIn the [ipv4-list](./example/iplist) example, we feed 450,000 IPv4 addresses to SlimArray.\nWe see that SlimArray is about as small as the gzip-ed data (`2.1 MB vs 2.0 MB`),\nwhile it provides instant access to the data without decompressing it.\nAnd in the [slimstar](./example/slimstar) example, SlimArray memory usage vs gzip-ed data is 832 bytes vs 602 bytes.\n\n\n# Limitation\n\n- **Static**: slimarray is a static data structure that cannot be modified\nafter creation. Thus slimarray is ideal for a time-series database, i.e., one whose data\nset is huge but never changes.\n\n- **32 bits**: currently slimarray supports only one element type, `uint32`.\n\n\n# Install\n\n```sh\ngo get github.com/openacid/slimarray\n```\n\n# Synopsis\n\n## Build a SlimArray\n\n```go\npackage slimarray_test\n\nimport (\n\t\"fmt\"\n\n\t\"github.com/openacid/slimarray\"\n)\n\nfunc ExampleSlimArray() {\n\n\tnums := []uint32{\n\t\t0, 16, 32, 48, 64, 79, 95, 111, 126, 142, 158, 174, 190, 206, 222, 236,\n\t\t252, 268, 275, 278, 281, 283, 285, 289, 296, 301, 304, 307, 311, 313, 318,\n\t\t321, 325, 328, 335, 339, 344, 348, 353, 357, 360, 364, 369, 372, 377, 383,\n\t\t387, 393, 399, 404, 407, 410, 415, 418, 420, 422, 426, 430, 434, 439, 444,\n\t\t446, 448, 451, 456, 459, 462, 465, 470, 473, 479, 482, 488, 490, 494, 500,\n\t\t506, 509, 513, 519, 521, 528, 530, 534, 537, 540, 544, 546, 551, 556, 560,\n\t\t566, 568, 572, 574, 576, 580, 585, 588, 592, 594, 600, 603, 606, 608, 610,\n\t\t614, 620, 623, 628, 630, 632, 638, 644, 647, 653, 658, 660, 662, 665, 670,\n\t\t672, 676, 681, 683, 687, 689, 691, 693, 695, 697, 703, 706, 710, 715, 719,\n\t\t722, 726, 731, 735, 737, 741, 748, 750, 753, 757, 763, 766, 768, 775, 777,\n\t\t782, 785, 791, 795, 798, 800, 806, 811, 815, 818, 821, 824, 829, 
832, 836,\n\t\t838, 842, 846, 850, 855, 860, 865, 870, 875, 878, 882, 886, 890, 895, 900,\n\t\t906, 910, 913, 916, 921, 925, 929, 932, 937, 940, 942, 944, 946, 952, 954,\n\t\t956, 958, 962, 966, 968, 971, 975, 979, 983, 987, 989, 994, 997, 1000,\n\t}\n\n\ta := slimarray.NewU32(nums)\n\n\tfmt.Println(\"last elt is:\", a.Get(int32(a.Len()-1)))\n\n\tst := a.Stat()\n\tfor _, k := range []string{\n\t\t\"elt_width\",\n\t\t\"mem_elts\",\n\t\t\"bits/elt\"} {\n\t\tfmt.Printf(\"%10s : %d\\n\", k, st[k])\n\t}\n\n\t// Unordered output:\n\t// last elt is: 1000\n\t//  elt_width : 3\n\t//   mem_elts : 112\n\t//   bits/elt : 16\n}\n```\n\n\n## Build a SlimBytes\n\n```go\npackage slimarray_test\n\nimport (\n\t\"fmt\"\n\n\t\"github.com/openacid/slimarray\"\n)\n\nfunc ExampleSlimBytes() {\n\n\trecords := [][]byte{\n\t\t[]byte(\"SlimBytes\"),\n\t\t[]byte(\"is\"),\n\t\t[]byte(\"an\"),\n\t\t[]byte(\"array\"),\n\t\t[]byte(\"of\"),\n\t\t[]byte(\"var-length\"),\n\t\t[]byte(\"records(a\"),\n\t\t[]byte(\"record\"),\n\t\t[]byte(\"is\"),\n\t\t[]byte(\"a\"),\n\t\t[]byte(\"[]byte\"),\n\t\t[]byte(\"which\"),\n\t\t[]byte(\"is\"),\n\t\t[]byte(\"indexed\"),\n\t\t[]byte(\"by\"),\n\t\t[]byte(\"SlimArray\"),\n\t}\n\n\ta, err := slimarray.NewBytes(records)\n\t_ = err\n\n\tfor i := 0; i \u003c 16; i++ {\n\t\tfmt.Print(string(a.Get(int32(i))), \" \")\n\t}\n\tfmt.Println()\n\n\t// Output:\n\t// SlimBytes is an array of var-length records(a record is a []byte which is indexed by SlimArray\n}\n```\n\n# How it works\n\nPackage slimarray uses a polynomial to compress and store an array of uint32. A\nuint32 costs only 5 bits in a sorted array of a million numbers in the range [0,\n1000*1000].\n\n\n### The General Idea\n\nWe use a polynomial y = a + bx + cx² to describe the overall trend of the\nnumbers. For every number i we add a residual to close the gap between y(i)\nand nums[i]. E.g. 
If there are 4 numbers: 0, 15, 33, 50, the polynomial and\nresiduals are:\n\n    y = 16x\n    0, -1, 1, 2\n\nIn this case each residual requires 3 bits. To retrieve the\nnumbers, we evaluate y(i) and add the residual to it:\n\n    get(0) = y(0) + 0 = 16 * 0 + 0 = 0\n    get(1) = y(1) - 1 = 16 * 1 - 1 = 15\n    get(2) = y(2) + 1 = 16 * 2 + 1 = 33\n    get(3) = y(3) + 2 = 16 * 3 + 2 = 50\n\n\n### What It Is And What It Is Not\n\nAnother space-efficient data structure for storing a uint32 array is a trie (aka prefix\ntree or radix tree). It is possible to use a bitmap-based, btree-like structure to\nreduce space (very likely it provides a higher compression rate in such a case). But\nit requires the array to be sorted.\n\nSlimArray does not have such a restriction. It is more adaptive to the data layout.\nTo achieve a high compression rate, it only requires that the data have an overall trend,\ne.g., roughly sorted, as seen in the 4-integer example above. Additionally, it\nalso accepts duplicated elements in the array, which a bitmap-based or tree-like\ndata structure does not allow.\n\n\n### Data Structure\n\nSlimArray splits the entire array into segments (Seg), each of which has 1024\nnumbers. It then splits every segment into several spans. Every span has its\nown polynomial. A span has 16*k numbers. A segment has at most 64 spans.\n\n            seg[0]                      seg[1]\n            1024 nums                   1024 nums\n    |-------+---------------+---|---------------------------|...\n     span[0]    span[1]\n     16 nums    32 nums      ..\n\n\n### Uncompressed Data Structures\n\nA SlimArray is a compacted data structure. 
The original data structures are\ndefined as follows (assuming the original user data is `nums []uint32`):\n\n    Seg struct {\n      SpansBitmap   uint64      // describes the span layout\n      Rank          uint64      // count of `1` in preceding Seg.\n      Spans         []Span\n    }\n\n    Span struct {\n      width         int32       // is retrieved from SpansBitmap\n\n      Polynomial [3]double      //\n      Config struct {           //\n        Offset        int32     // residual offset\n        ResidualWidth int32     // number of bits a residual requires\n      }\n      Residuals  [width][ResidualWidth]bit // packed into SlimArray.Residuals\n    }\n\nA span stores 16*k numbers, where k ∈ [1, 64).\n\n`Seg.SpansBitmap` describes the layout of Span-s in a Seg. The i-th \"1\"\nindicates where the last 16 numbers are in the i-th Span, e.g.:\n\n    001011110000......\n    \u003c-- least significant bit\n\nIn the above example:\n\n    span[0] has 16*3 nums in it.\n    span[1] has 16*2 nums in it.\n    span[2] has 16*1 nums in it.\n\n`Seg.Rank` caches the total count of \"1\" in all preceding Seg.SpansBitmap. This\naccelerates locating a Span in the packed field SlimArray.Polynomials.\n\n`Span.width` is the count of numbers stored in this span. It does not need to be\nstored because it can be calculated by counting the \"0\" between two \"1\" in\n`Seg.SpansBitmap`.\n\n`Span.Polynomial` stores the 3 coefficients of the polynomial describing the overall\ntrend of this span, i.e., the `[a, b, c]` in `y = a + bx + cx²`.\n\n`Span.Config.Offset` adjusts the offset used to locate a residual. In a span we want\nto have that:\n\n    residual position = Config.Offset + (i%1024) * Config.ResidualWidth\n\nBut if the preceding span has a smaller residual width, the \"offset\" could be\nnegative, e.g.: span[0] has 16 residuals of width 0, and span[1] has\nresiduals of width 4. 
Then the \"offset\" of span[1] is `-16*4`, so that\n`(-16*4) + i * 4` is the correct residual position for i in [16, 32).\n\n`Span.Config.ResidualWidth` specifies the number of bits used to store every residual\nin this span; it must be a power of 2: `2^k`.\n\n`Span.Residuals` is an array of residuals of length `Span.width`. Every elt in\nit is a `ResidualWidth`-bit integer.\n\n\n### Compact\n\nSlimArray compacts `Seg` into a dense format:\n\n    SlimArray.Bitmap = [\n      Seg[0].SpansBitmap,\n      Seg[1].SpansBitmap,\n      ... ]\n\n    SlimArray.Polynomials = [\n      Seg[0].Spans[0].Polynomial,\n      Seg[0].Spans[1].Polynomial,\n      ...\n      Seg[1].Spans[0].Polynomial,\n      Seg[1].Spans[1].Polynomial,\n      ...\n    ]\n\n    SlimArray.Configs = [\n      Seg[0].Spans[0].Config,\n      Seg[0].Spans[1].Config,\n      ...\n      Seg[1].Spans[0].Config,\n      Seg[1].Spans[1].Config,\n      ...\n    ]\n\n`SlimArray.Residuals` simply packs the residuals of every nums[i] together.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenacid%2Fslimarray","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenacid%2Fslimarray","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenacid%2Fslimarray/lists"}