# Nibs Serialization Format

Nibs is a new binary serialization format with the following set of priorities:

## Fast Random Access Reads

This format is designed to be read in place (similar to Cap'n Proto) so that arbitrarily large documents can be read with minimal memory or compute requirements. For example, a 1 TiB nibs document could be read from a virtual block device whose blocks are fetched from the network on demand, and the initial latency to start walking the data structure would be nearly instant. Large documents could also be written to local NVMe drives and memory-mapped with `mmap`.

To enable this random access, every value is either inline (just the nibs pair) or carries a length prefix, so that a single indirection can jump to the next value.
Also, some types like the [proposed `array` type](https://github.com/creationix/nibs/issues/4) enable O(1) lookup into arrays of any size.

Userspace types using the [proposed tags](https://github.com/creationix/nibs/issues/4) can enable O(1) misses and O(log n) hits for trees via userspace bloom filters and hash array mapped tries.

## Self Documenting

Nibs documents are similar to JSON in that the objects are self-documenting (unlike protobuf, which depends on implicit external schemas). This works very well for dynamic programming environments like JavaScript and Lua, or for replacing existing JSON workloads.

But if a developer chooses to also have schemas, it's possible to encode values as a nibs `list`, and the code would then know what each positional value is.

## Compact on the Wire

Nibs strikes a balance between compactness and simplicity. Especially when combined with [the `ref` type](https://github.com/creationix/nibs/issues/4), typical JSON payloads can be made considerably smaller. Numbers are very compact, binary data can be embedded as-is without base64 or hex encoding, etc.

## Simple to Implement

One of the main goals of nibs compared to existing formats is to be simple to implement. A single developer with experience writing serialization libraries should be able to get an initial working version running very quickly, so that new languages and tools can easily adopt it. This also means these libraries are likely to have no dependencies of their own, which keeps them lean.

This is a much simpler format than pretty much any existing format except JSON.

## Simple to Understand

Another goal is for the format itself to be simple to understand and reason about. The nibs-pair encoding is the same for all value types. The types are grouped by similar behavior. Anything complex is pushed out to userspace.

## Superset of JSON

Any value that can be encoded in JSON can also be encoded in nibs.
In this way it's similar to the msgpack and CBOR formats, but it's much faster to read since it doesn't require parsing the whole document first.

There is also a [proposal to add a textual representation that is a superset of JSON](https://github.com/creationix/nibs/issues/3). This would make it even easier to integrate into systems that use JSON or need textual representations of data (like config files or documentation).

## Implementations

- [JavaScript](js/README.md)
- [LuaJIT](lua/README.md)
- Go (coming soon)

## Binary Nibs Format Specification

All multi-byte numbers in this spec are little-endian.

In this document *"should"* means that an implementation is recommended to work this way, while *"must"* means that it is not considered spec compliant without said behavior.

## Integer Pair Encoding

There are 5 possible encoding patterns depending on the size of the second number:

```txt
xxxx yyyy
xxxx 1100 yyyyyyyy
xxxx 1101 yyyyyyyy yyyyyyyy
xxxx 1110 yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy
xxxx 1111 yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy
          yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy
```

Here the `x`s are a `u4` and the `y`s are semantically a `u64`, using zero extension for the smaller sizes. The `x` nibble is the high nibble of the first byte. If the low nibble is 0-11 it is the value itself; 12, 13, 14, and 15 mean the value follows in the next 1, 2, 4, or 8 bytes.

Encoders *should* only use the smallest possible encoding for a given value.

Decoders *must* accept all.
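A minimal C sketch of the pair encoding, assuming the type nibble `x` is the high nibble of the first byte (as in the `0c 54` example under ZigZag Integers) and that the longer forms are little-endian. `encodePair` and `decodePair` are hypothetical names, not APIs from the JavaScript or LuaJIT implementations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: write the pair (small, big) into `out` (at least 9 bytes),
// always choosing the smallest of the five forms. Returns the bytes written.
size_t encodePair(uint8_t *out, unsigned small, uint64_t big) {
  assert(small < 16);
  uint8_t head = (uint8_t)(small << 4);
  if (big < 12) {  // xxxx yyyy: the value fits in the low nibble
    out[0] = head | (uint8_t)big;
    return 1;
  }
  unsigned width;  // number of little-endian bytes that follow the first byte
  if (big <= UINT8_MAX) { out[0] = head | 0xc; width = 1; }
  else if (big <= UINT16_MAX) { out[0] = head | 0xd; width = 2; }
  else if (big <= UINT32_MAX) { out[0] = head | 0xe; width = 4; }
  else { out[0] = head | 0xf; width = 8; }
  for (unsigned i = 0; i < width; i++) out[1 + i] = (uint8_t)(big >> (8 * i));
  return 1 + width;
}

// Hypothetical helper: read a pair from `buf` (of length `len`), accepting all
// five forms. Returns the bytes consumed, or 0 if the buffer is too short.
size_t decodePair(const uint8_t *buf, size_t len, unsigned *small, uint64_t *big) {
  if (len < 1) return 0;
  *small = buf[0] >> 4;
  unsigned low = buf[0] & 0xf;
  if (low < 12) { *big = low; return 1; }
  unsigned width = 1u << (low - 12);  // 1100 -> 1, 1101 -> 2, 1110 -> 4, 1111 -> 8
  if (len - 1 < width) return 0;
  uint64_t v = 0;
  for (unsigned i = 0; i < width; i++) v |= (uint64_t)buf[1 + i] << (8 * i);
  *big = v;
  return 1 + width;
}
```

For example, `encodePair(buf, 0, 84)` writes `0c 54` and returns 2, matching the `ZigZag-8(84)` example further down.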
## Nibs Value Types

For each encoded integer pair, the first (small) number is the type and the second (big) number is its parameter:

```c++
enum Type {

    // Inline types.
    ZigZag    = 0,   // big = zigzag encoded i64
    Float     = 1,   // big = IEEE 754 binary64 bits of the float
    Simple    = 2,   // big = subtype (false, true, null)
    Ref       = 3,   // big = index into the refs of the nearest enclosing Scope

    // slots 4-7 reserved

    // Prefixed length types.
    Bytes     = 8,   // big = len (raw octets)
    Utf8      = 9,   // big = len (utf-8 encoded unicode string)
    HexString = 0xa, // big = len (lowercase hex string stored as binary)
    List      = 0xb, // big = len (list of nibs values)
    Map       = 0xc, // big = len (list of alternating nibs keys and values)
    Array     = 0xd, // big = len (array index then list)
                     // small2 = width, big2 = count
    Trie      = 0xe, // big = len (trie index then list)
                     // small2 = width, big2 = count
    Scope     = 0xf, // big = len (array index, refs, then the wrapped value)
                     // small2 = width, big2 = count
};
```

### ZigZag Integers

The `ZigZag` integer type has `i64` range, but is stored with zigzag encoding so that common values of small magnitude, negative ones included, can use the smaller nibs representations.

Zigzag encoding interleaves negative and non-negative values:

`(0 = 0, -1 = 1, 1 = 2, -2 = 3, 2 = 4, -3 = 5, 3 = 6 ...)`

```c
#include <stdint.h>

// Convert between signed value and `u64` bitfield representations.
uint64_t encodeZigZag(int64_t i) {
  // Shift as unsigned to avoid undefined behavior for negative i; (i >> 63) is
  // all ones for negative i (arithmetic shift, as on mainstream compilers).
  return ((uint64_t)i << 1) ^ (uint64_t)(i >> 63);
}
int64_t decodeZigZag(uint64_t i) {
  return (int64_t)((i >> 1) ^ -(i & 1));
}
```

The best way to show this is with some examples going from encoded bytes to disassembly to final semantic meaning.

```lua
00 --> ZigZag(0)
--> 0

03 --> ZigZag(3)
--> -2

0c 54 --> ZigZag-8(84)
--> 42

0d d0 07 --> ZigZag-16(2000)
--> 1000

0e 40 0d 03 00 --> ZigZag-32(200000)
--> 100000

0f 00 c8 17 a8 04 00 00 00 --> ZigZag-64(20000000000)
--> 10000000000
```

### Floating Point Numbers

The `Float` type is stored as an IEEE 754 binary64 value (aka `double`) bitcast to `u64`.

```c
#include <stdint.h>
#include <string.h>

// Convert between `f64` (double precision floating point) and `u64` bitfield
// representations. memcpy is the portable bitcast; pointer casts break strict aliasing.
uint64_t encodeDouble(double d) {
  uint64_t bits;
  memcpy(&bits, &d, sizeof bits);
  return bits;
}
double decodeDouble(uint64_t bits) {
  double d;
  memcpy(&d, &bits, sizeof d);
  return d;
}
```

This means that in practice a float will nearly always use the largest (64-bit) form, since a `double` almost always
uses its high bits.

```lua
1f 18 2d 44 54 fb 21 09 40 --> Float-64(0x400921fb54442d18)
--> 3.1415926535897930

1f 00 00 00 00 00 00 f0 7f --> Float-64(0x7ff0000000000000)
--> inf

1f 00 00 00 00 00 00 f0 ff --> Float-64(0xfff0000000000000)
--> -inf

1f 00 00 00 00 00 00 f8 ff --> Float-64(0xfff8000000000000)
--> nan
```

### Simple SubTypes

The simple type has its own subtype enum for booleans and null.

```c++
enum SubType {
    False     = 0,
    True      = 1,
    Null      = 2,

    // slots 3-7 reserved
};
```

These are simple indeed to encode.

```lua
20 --> Simple(0)
--> false

21 --> Simple(1)
--> true

22 --> Simple(2)
--> null
```

### Bytes

Bytes are a container for raw octets.

```lua
84 --> Bytes(4)
  de ad be ef --> 0xde 0xad 0xbe 0xef
--> <deadbeef>
```

### Utf8 Unicode Strings

Most strings are stored as utf-8 encoded unicode wrapped in nibs. Codepoints beyond 16 bits are allowed, and so are surrogate pairs, but it is recommended to encode such codepoints directly in utf-8's shorter native 4-byte form rather than as surrogate pairs.

```lua
9b --> Utf8(11)
  f0 9f 8f b5 --> `🏵`
  52 4f 53 45 54 54 45 --> `R` `O` `S` `E` `T` `T` `E`
--> "🏵ROSETTE"

9c 18 --> Utf8-8(24)
  f0 9f 9f a5 --> `🟥`
  f0 9f 9f a7 --> `🟧`
  f0 9f 9f a8 --> `🟨`
  f0 9f 9f a9 --> `🟩`
  f0 9f 9f a6 --> `🟦`
  f0 9f 9f aa --> `🟪`
--> "🟥🟧🟨🟩🟦🟪"

95 --> Utf8(5)
  f0 9f 91 b6 --> `👶`
  21 --> `!`
--> "👶!"
```

### Hex Strings

Hex strings are an optimization for common string values that consist of an even number of lowercase hexadecimal characters. They take half the space because each pair of hex digits is stored as one byte, but externally they are still strings.

```lua
a4 --> HexString(4)
  de ad be ef --> 0xde 0xad 0xbe 0xef
--> "deadbeef"
```
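A minimal C sketch of the packing step, assuming the encoder tries `HexString` for a string and writes the `a`-type pair header (with `big` = number of packed bytes) separately. `hexPack` and `hexUnpack` are hypothetical names. Uppercase digits are rejected, since they would come back lowercase and not round-trip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Value of a lowercase hex digit, or -1 if `c` is not one.
static int hexDigit(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;  // uppercase and non-hex characters do not qualify
}

// Pack `len` characters of `s` into `out` (len / 2 bytes). Returns false if the
// string does not qualify; on failure `out` may be partially written.
bool hexPack(const char *s, size_t len, uint8_t *out) {
  assert(s != NULL && out != NULL);
  if (len % 2 != 0) return false;
  for (size_t i = 0; i < len; i += 2) {
    int hi = hexDigit(s[i]), lo = hexDigit(s[i + 1]);
    if (hi < 0 || lo < 0) return false;
    out[i / 2] = (uint8_t)(hi << 4 | lo);
  }
  return true;
}

// Expand `n` packed bytes into 2 * n lowercase hex characters plus a NUL.
void hexUnpack(const uint8_t *bytes, size_t n, char *out) {
  static const char digits[] = "0123456789abcdef";
  for (size_t i = 0; i < n; i++) {
    out[2 * i] = digits[bytes[i] >> 4];
    out[2 * i + 1] = digits[bytes[i] & 0xf];
  }
  out[2 * n] = '\0';
}
```

Packing `"deadbeef"` yields the four bytes `de ad be ef` shown in the example, stored after an `a4` header.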
### List

The `list` type is an ordered list of values. It's encoded as zero or more nibs-encoded values concatenated back to back. Lookups are O(n), since the items need to be scanned linearly.

```lua
b0 --> List(0)
--> []

b3 --> List(3)
  02 --> ZigZag(2)
  04 --> ZigZag(4)
  06 --> ZigZag(6)
--> [1,2,3]

b6 --> List(6)
  b1 --> List(1)
    02 --> ZigZag(2)
  b1 --> List(1)
    04 --> ZigZag(4)
  b1 --> List(1)
    06 --> ZigZag(6)
--> [[1],[2],[3]]
```

### Map

Map is encoded the same way, except that the items alternate between keys and values. Lookup by key is therefore also a linear scan, walking up to 2n values for n entries.

```lua
cb --> Map(11)
  94 --> Utf8(4)
    6e 61 6d 65 --> `n` `a` `m` `e`
  93 --> Utf8(3)
    54 69 6d --> `T` `i` `m`
  21 --> Simple(1)
  20 --> Simple(0)
--> {"name":"Tim",true:false}
```
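To make the linear cost concrete, here is a sketch of a map lookup that walks the alternating keys and values, using each value's pair header to skip over it without decoding it. It assumes types 0-7 are inline and 8-15 are length-prefixed (per the type table), re-declares a small pair reader so the block stands alone, and compares keys by their encoded bytes. `readPair`, `skipValue`, and `mapGet` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Read one pair header; returns its size in bytes, or 0 if truncated.
static size_t readPair(const uint8_t *buf, size_t len, unsigned *small, uint64_t *big) {
  if (len < 1) return 0;
  *small = buf[0] >> 4;
  unsigned low = buf[0] & 0xf;
  if (low < 12) { *big = low; return 1; }
  size_t width = (size_t)1 << (low - 12);  // 1, 2, 4 or 8 bytes follow
  if (len - 1 < width) return 0;
  *big = 0;
  for (size_t i = 0; i < width; i++) *big |= (uint64_t)buf[1 + i] << (8 * i);
  return 1 + width;
}

// Size of the whole value at buf without decoding it: inline types (0-7) are just
// the pair; length-prefixed types (8-15) add `big` payload bytes. 0 if truncated.
static size_t skipValue(const uint8_t *buf, size_t len) {
  unsigned type;
  uint64_t big;
  size_t head = readPair(buf, len, &type, &big);
  if (head == 0) return 0;
  if (type < 8) return head;
  if (big > len - head) return 0;
  return head + (size_t)big;
}

// Linear-scan lookup of an already-encoded key in an encoded Map value.
// On a hit, stores the value's offset within `map` and returns 1; otherwise 0.
int mapGet(const uint8_t *map, size_t len, const uint8_t *key, size_t keyLen,
           size_t *valueOffset) {
  assert(key != NULL && valueOffset != NULL);
  unsigned type;
  uint64_t big;
  size_t pos = readPair(map, len, &type, &big);
  if (pos == 0 || type != 0xc || big > len - pos) return 0;  // not a complete Map
  size_t end = pos + (size_t)big;
  while (pos < end) {
    size_t k = skipValue(map + pos, end - pos);          // key
    if (k == 0) return 0;
    size_t v = skipValue(map + pos + k, end - pos - k);  // value
    if (v == 0) return 0;
    if (k == keyLen && memcmp(map + pos, key, keyLen) == 0) {
      *valueOffset = pos + k;
      return 1;
    }
    pos += k + v;
  }
  return 0;
}
```

Comparing encoded bytes relies on encoders using the smallest encoding, as recommended; a reader that must accept non-minimal encodings would compare decoded keys instead. Against the `cb` example, looking up `94 6e 61 6d 65` (`"name"`) returns offset 6, where `"Tim"` starts.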
### Array

The `array` type is like list, except it includes an array of pointers before the payload to enable O(1) lookups.

This index is encoded via a secondary nibs pair where small is the byte width of the pointers and big is the number of entries. This is followed by the pointers, stored as offsets from the end of the index (the start of the list of values).

```lua
d7 --> Array(7)
  13 --> ArrayIndex(width=1,count=3)
    00 --> Pointer(0)
    01 --> Pointer(1)
    02 --> Pointer(2)
  02 --> ZigZag(2)
  04 --> ZigZag(4)
  06 --> ZigZag(6)
--> [1,2,3]
```

### Trie

A trie is an indexed map. Its index is a HAMT ([Hash Array Mapped Trie](https://en.wikipedia.org/wiki/Hash_array_mapped_trie)) prefix trie built from hashes of the nibs-encoded map keys. The keys need to be mapped to uniformly distributed hashes; by default nibs uses the [xxhash64](https://github.com/Cyan4973/xxHash) algorithm.

The secondary nibs pair holds the pointer width in bytes and the number of entries in the index (hash seed, bitfields, and pointers).

Example key hashing:

```c++
key = "name"                   // "name"
encoded = nibs.encode(key)     // <946e616d65>
seed = 0                       // 0
hash = xxhash64(encoded, seed) // 0xff0dd0ea8d956135ULL
```

```lua
ec 11 --> Trie-8(17)
  14 --> TrieIndex(width=1,count=4)
    00 --> HashSeed(0)
    21 --> Bitmask([0,5])
    8a --> Leaf(10)
    80 --> Leaf(0)
  94 --> Utf8(4)
    6e --> 'n'
    61 --> 'a'
    6d --> 'm'
    65 --> 'e'
  94 --> Utf8(4)
    4e --> 'N'
    69 --> 'i'
    62 --> 'b'
    73 --> 's'
  21 --> Simple(1)
  20 --> Simple(0)
--> {"name":"Nibs",true:false}
```

With a worse seed, the same value needs an internal node:

```lua
ec 13 --> Trie-8(19)
  16 --> TrieIndex(width=1,count=6)
    03 --> HashSeed(3)
    04 --> Bitmask([2])
    00 --> Pointer(0)
    22 --> Bitmask([1,5])
    80 --> Leaf(0)
    8a --> Leaf(10)
  94 --> Utf8(4)
    6e --> 'n'
    61 --> 'a'
    6d --> 'm'
    65 --> 'e'
  94 --> Utf8(4)
    4e --> 'N'
    69 --> 'i'
    62 --> 'b'
    73 --> 's'
  21 --> Simple(1)
  20 --> Simple(0)
--> {"name":"Nibs",true:false}
```

### HAMT Encoding

Each node in the trie index has a bitfield so that only used pointers need to be stored.

For example, consider a simplified 4-bit wide trie node with 4 hashes pointing to values at offsets 0,1,2,3:

- `0101` -> 0
- `0011` -> 1
- `1010` -> 2
- `1011` -> 3

Since the width is 4 bits, a bitfield has only 4 slots, so the hash is consumed 2 bits at a time, starting with the least significant bits.

This means the root node has 3 entries, for `01`, `10`, and `11`. Since two keys share the low bits `11`, a second node is needed.

```c++
// Hash config
 0000 // (seed 0)
// Root Node (xxxx)
 1110 // Bitfield [1,2,3]
1 000 // xx01 -> leaf 0
1 010 // xx10 -> leaf 2
0 000 // xx11 -> node 0
// Second Node (xx11)
 0101 // Bitfield [0,2]
1 001 // 0011 -> leaf 1
1 011 // 1011 -> leaf 3
```

For each 1 in the bitfield, a pointer follows in the node. The least significant bit is slot 0 and the most significant is slot 3.

Pointers to leaves have a 1 in the most significant bit; the remaining bits are the value's offset from the start of the map's contents (just after the index). Pointers to internal nodes have a 0 in the most significant bit, followed by the child node's offset from the end of the pointer.
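The HAMT walk described above can be sketched in C. This is a sketch under stated assumptions: `nodes` points at the root node (just after the hash seed entry), every bitfield and pointer is `width` bytes (1, 2, 4, or 8), so each level consumes log2(width * 8) hash bits, and the caller has already hashed the encoded key with the stored seed. `hamtLookup` is a hypothetical name. A leaf hit only means the hash prefix matched, so a reader must still compare the key stored at that offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Read a little-endian unsigned integer of `width` bytes.
static uint64_t readLE(const uint8_t *p, unsigned width) {
  uint64_t v = 0;
  for (unsigned i = 0; i < width; i++) v |= (uint64_t)p[i] << (8 * i);
  return v;
}

// Returns the leaf offset (from the start of the key/value list) or -1 on a miss.
int64_t hamtLookup(const uint8_t *nodes, size_t len, unsigned width, uint64_t hash) {
  assert(width == 1 || width == 2 || width == 4 || width == 8);
  unsigned bits = 3;                        // width 1 -> 8 slots -> 3 bits
  while ((1u << bits) < width * 8) bits++;  // width 2 -> 4, 4 -> 5, 8 -> 6
  uint64_t leafFlag = (uint64_t)1 << (width * 8 - 1);
  size_t pos = 0;
  for (unsigned shift = 0; shift < 64; shift += bits) {
    if (pos + width > len) return -1;  // malformed index
    uint64_t bitfield = readLE(nodes + pos, width);
    unsigned segment = (unsigned)((hash >> shift) & ((1u << bits) - 1));
    uint64_t bit = (uint64_t)1 << segment;
    if (!(bitfield & bit)) return -1;  // empty slot: guaranteed miss
    // Pointers exist only for set bits, so the slot is the popcount below `bit`.
    size_t slot = 0;
    for (uint64_t below = bitfield & (bit - 1); below; below &= below - 1) slot++;
    size_t ptrPos = pos + width + slot * width;
    if (ptrPos + width > len) return -1;
    uint64_t ptr = readLE(nodes + ptrPos, width);
    if (ptr & leafFlag) return (int64_t)(ptr & ~leafFlag);  // leaf pointer
    if (ptr > len - (ptrPos + width)) return -1;
    pos = ptrPos + width + (size_t)ptr;  // internal: offset from end of pointer
  }
  return -1;
}
```

With the seed-0 index from the first trie example (`21 8a 80`) and the `"name"` hash `0xff0dd0ea8d956135`, the low three bits are `101`, so slot 5 is selected and the walk returns leaf 0, the offset of the `"name"` key.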
### References

The `ref` type is used to reference into a userspace table of values. The table is found in the nearest `scope` wrapping the current value.

Typically the scope is the outermost value in a nibs document, so that all data can reuse the same refs array.

A scope is encoded like an array, except that its semantic meaning is special: all entries except the last store the referenced values, and the last entry is the wrapped value, which can reference them by index.

For example, consider the following value:

```js
// Original Value
[ { color: "red", fruits: ["apple", "strawberry"] },
  { color: "green", fruits: ["apple"] },
  { color: "yellow", fruits: ["apple", "banana"] } ]
```

A good refs table for this would pull out the repeated strings, since their refs overhead is smaller than their encoding costs:

```js
// Refs Table
[ "color", "fruits", "apple" ]
```

Then the encoded value would look more like this with the refs applied:

```js
// encoding with refs inside a RefScope
RefScope(
  "color", "fruits", "apple",
  [ { &0: "red", &1: [&2, "strawberry"] },
    { &0: "green", &1: [&2] },
    { &0: "yellow", &1: [&2, "banana"] } ]
)
```

In this example, the overhead and savings of the refs table are:

```txt
+2 <- RefScope-8
+1 <- IndexHeader
+4 <- 4 pointers 1 byte each (3 refs + the wrapped value)
-5 <- "color" to Ref(0)
-6 <- "fruits" to Ref(1)
-5 <- "apple" to Ref(2)
-5 <- "color" to Ref(0)
-6 <- "fruits" to Ref(1)
-5 <- "color" to Ref(0)
-5 <- "apple" to Ref(2)
-6 <- "fruits" to Ref(1)
-5 <- "apple" to Ref(2)
+6 <- "color"
+7 <- "fruits"
+6 <- "apple"
-2 <- some nibs pairs jump to inline instead of 8 bit length
------------------------
24 bytes saved!
```
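The accounting above can be turned into a rough per-value estimate for deciding what to move into the refs table. This is a hypothetical heuristic, not part of the spec: it assumes every occurrence becomes a `Ref` pair and each table entry costs its encoded size plus one index pointer, and it ignores the fixed scope overhead and lengths that shrink into smaller pair forms. `pairSize` and `refSavings` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

// Size in bytes of a nibs pair whose big value is `big` (smallest encoding).
static unsigned pairSize(uint64_t big) {
  if (big < 12) return 1;
  if (big <= UINT8_MAX) return 2;
  if (big <= UINT16_MAX) return 3;
  if (big <= UINT32_MAX) return 5;
  return 9;
}

// Estimated bytes saved by moving a value (encodedLen bytes, used `count` times)
// into the refs table at `refIndex`, with `ptrWidth`-byte index pointers.
// A negative result means the ref costs more than it saves.
long refSavings(unsigned encodedLen, unsigned count, uint64_t refIndex, unsigned ptrWidth) {
  assert(ptrWidth == 1 || ptrWidth == 2 || ptrWidth == 4 || ptrWidth == 8);
  long perUse = (long)encodedLen - (long)pairSize(refIndex);  // value -> Ref pair
  long tableCost = (long)encodedLen + (long)ptrWidth;          // stored once + pointer
  return (long)count * perUse - tableCost;
}
```

For the fruit example this gives 8, 10, and 8 bytes for `"color"`, `"fruits"`, and `"apple"`, while the single-use `"strawberry"` comes out at -2, so it stays inline.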
Another example is encoding `[4,2,3,1]` using the refs `[1,2,3,4]`:

```lua
fc 0f --> Scope-8(15)
  15 --> ArrayIndex(width=1,count=5)
    00 --> Pointer(0) -> 1
    01 --> Pointer(1) -> 2
    02 --> Pointer(2) -> 3
    03 --> Pointer(3) -> 4
    04 --> Pointer(4) -> value
  02 --> ZigZag(2) = 1
  04 --> ZigZag(4) = 2
  06 --> ZigZag(6) = 3
  08 --> ZigZag(8) = 4
  b4 --> List(4)
    33 --> Ref(3) -> Pointer(3) -> 4
    31 --> Ref(1) -> Pointer(1) -> 2
    32 --> Ref(2) -> Pointer(2) -> 3
    30 --> Ref(0) -> Pointer(0) -> 1
--> RefScope(1,2,3,4,[&3,&1,&2,&0])
```

Note that refs are always zero-indexed, even if your language normally starts indices at 1.

```lua
fb --> Scope(11)
  13 --> ArrayIndex(width=1,count=3)
    00 --> Pointer(0) -> "dead"
    03 --> Pointer(3) -> "beef"
    06 --> Pointer(6) -> value
  a2 --> HexString(2)
    de ad --> 0xde 0xad
  a2 --> HexString(2)
    be ef --> 0xbe 0xef
  31 --> Ref(1) -> "beef"
--> RefScope("dead","beef",&1)
```

The larger ref example from above would be encoded like this:

```lua
fc 4f --> Scope-8(79)
  14 --> ArrayIndex(width=1,count=4)
    00 --> Ptr(0)
    06 --> Ptr(6)
    0d --> Ptr(13)
    13 --> Ptr(19)
  95636f6c6f72 --> "color"
  96667275697473 --> "fruits"
  956170706c65 --> "apple"
  bc 35 --> List-8(53)
    cc 14 --> Map-8(20)
      30 --> Ref(0)
      93726564 --> "red"
      31 --> Ref(1)
      bc 0c --> List-8(12)
        32 --> Ref(2)
        9a73747261776265727279 --> "strawberry"
    ca --> Map(10)
      30 --> Ref(0)
      95677265656e --> "green"
      31 --> Ref(1)
      b1 --> List(1)
        32 --> Ref(2)
    cc 12 --> Map-8(18)
      30 --> Ref(0)
      9679656c6c6f77 --> "yellow"
      31 --> Ref(1)
      b8 --> List(8)
        32 --> Ref(2)
        9662616e616e61 --> "banana"
```
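Because a scope is laid out like an array, resolving a ref is a single jump through the index: read pointer `k` and add it to the end of the index. A minimal C sketch, assuming the index layout described under Array (secondary pair with small = pointer width and big = entry count, then little-endian pointers relative to the end of the index). `indexEntry` is a hypothetical helper, not an API from the JavaScript or LuaJIT implementations; it works for `Array` and `Scope` values alike.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Read a little-endian unsigned integer of `width` bytes.
static uint64_t readLE(const uint8_t *p, unsigned width) {
  uint64_t v = 0;
  for (unsigned i = 0; i < width; i++) v |= (uint64_t)p[i] << (8 * i);
  return v;
}

// Read one pair header; returns its size in bytes, or 0 if truncated.
static size_t readPair(const uint8_t *buf, size_t len, unsigned *small, uint64_t *big) {
  if (len < 1) return 0;
  *small = buf[0] >> 4;
  unsigned low = buf[0] & 0xf;
  if (low < 12) { *big = low; return 1; }
  unsigned width = 1u << (low - 12);
  if (len - 1 < width) return 0;
  *big = readLE(buf + 1, width);
  return 1 + width;
}

// Offset (within buf) of entry i of an encoded Array or Scope value, or 0 if i is
// out of range or the data is malformed. *count receives the number of entries;
// for a Scope, entries 0..count-2 are the refs and entry count-1 is the wrapped value.
size_t indexEntry(const uint8_t *buf, size_t len, uint64_t i, uint64_t *count) {
  assert(buf != NULL && count != NULL);
  unsigned type, width;
  uint64_t body, n;
  size_t pos = readPair(buf, len, &type, &body);
  if (pos == 0 || (type != 0xd && type != 0xf) || body > len - pos) return 0;
  len = pos + (size_t)body;                                  // stay inside this value
  size_t head = readPair(buf + pos, len - pos, &width, &n);  // secondary pair
  if (head == 0 || (width != 1 && width != 2 && width != 4 && width != 8)) return 0;
  pos += head;
  if (n > (len - pos) / width) return 0;
  size_t values = pos + (size_t)n * width;  // pointers are offsets from here
  *count = n;
  if (i >= n) return 0;
  uint64_t ptr = readLE(buf + pos + (size_t)i * width, width);
  if (ptr >= len - values) return 0;
  return values + (size_t)ptr;
}
```

For the `dead`/`beef` scope (`fb 13 00 03 06 a2 de ad a2 be ef 31`), entry 1 starts at byte offset 8 (`a2 be ef`), and entry 2, the wrapped value, is the final `31`.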