https://github.com/aep/madpack
stupid small json binary encoding
- Host: GitHub
- URL: https://github.com/aep/madpack
- Owner: aep
- Created: 2020-08-20T17:02:28.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2021-07-13T14:37:03.000Z (almost 5 years ago)
- Last Synced: 2025-02-24T14:54:32.667Z (over 1 year ago)
- Language: Go
- Size: 35.2 KB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# madpack
stupid small json binary encoding.
uses mad hax (tm) to pack all valid json faster and smaller than **everything else**
(tested: msgpack, ubjson, protobuf, gzip and zstandard)
## building & usage
get the [zetz.it](https://zetz.it) compiler, then build and run

    zz build --release
    ./target/release/bin/madpack < some.json > some.madpack
    ./target/release/bin/madpack --unpack < some.madpack

zetz can automatically create packages for many languages, including npm, python, rust and golang.
so let's say we have this small json
```json
{
"sha256": "beep boop yadda",
"commitmsg": "hella",
"stable": false,
"contentsize": 2332
}
```

    $ wc -c < some.json
    108

compressing it with zstandard (it's really good)

    $ zstd -c some.json | wc -c
    99

madpack beats zstd out of the box for small messages

    $ ./target/release/bin/madpack < some.json | wc -c
    65

but preshared indices are the real magic

    $ ./target/release/bin/madpack --make-index < some.json > some.madindex
    $ ./target/release/bin/madpack --index some.madindex < some.json | wc -c
    29

## preshared index for structured data
The way to beat zipped json is to create an index from a data sample or schema and simply not include that index in the encoded file.
This is what protobuf does, but we still beat protobuf with other mad hax.
Also, unlike with protobuf, the message structure and values can be fully recovered without the index; only the key names are replaced by numbers.
Disaster recovery and bug hunting are much easier with a fully intact message structure.
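
The idea can be illustrated with a small Python sketch. This is not madpack's actual code or wire format: the helper names `make_index`, `strip_keys` and `restore_keys` are hypothetical, and the sketch only assumes that an index is an ordered list of key names and that keys are numbered from 1.

```python
def make_index(sample):
    """Collect every key name in a sample document, in first-seen order."""
    keys = []

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key not in keys:
                    keys.append(key)
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(sample)
    return keys


def strip_keys(node, index):
    """Replace each indexed key name with its 1-based key number."""
    if isinstance(node, dict):
        return {
            (index.index(k) + 1 if k in index else k): strip_keys(v, index)
            for k, v in node.items()
        }
    if isinstance(node, list):
        return [strip_keys(item, index) for item in node]
    return node


def restore_keys(node, index):
    """Map key numbers back to names; keys outside the index stay untouched."""
    if isinstance(node, dict):
        return {
            (index[k - 1] if isinstance(k, int) and 1 <= k <= len(index) else k):
                restore_keys(v, index)
            for k, v in node.items()
        }
    if isinstance(node, list):
        return [restore_keys(item, index) for item in node]
    return node


doc = {
    "sha256": "beep boop yadda",
    "commitmsg": "hella",
    "stable": False,
    "contentsize": 2332,
}
index = make_index(doc)          # shared out of band, never sent with the message
packed = strip_keys(doc, index)  # what would actually be encoded
print(packed)  # {1: 'beep boop yadda', 2: 'hella', 3: False, 4: 2332}
```

Without the index, `packed` still shows the full structure and every value, just with numbers where the key names were; with the index, `restore_keys(packed, index)` gives back the original document.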
to create an index from any json run

    ./target/release/bin/madpack --make-index < some.json > some.madindex

you can then use the index in encoding and decoding

    ./target/release/bin/madpack --index some.madindex < some.json > some.madpack
    ./target/release/bin/madpack --index some.madindex --unpack < some.madpack

if you ever lose the index, you can still read the message and guess the key names from context

    ./target/release/bin/madpack --unpack < some.madpack
    {
    "1" : 77,
    "2" : "0.10.1-4-g2bc35c7-dirty-R3riyLGLmb",
    "3" : true,

## encoding
- there's a maximum of 65535 unique strings per file
- key strings can only be 65535 bytes long
- in maps, every member is preceded by a key byte
- in arrays, every member is preceded by a value byte
### key byte

    000x xxxx  0x00  value is u8
    001x xxxx  0x20  value is u16
    010x xxxx  0x40  value is f32
    011x xxxx  0x60  value is bytes u8
    100x xxxx  0x80  value is string u8
    101x xxxx  0xa0  value is map
    110x xxxx  0xc0  value is array
    111x xxxx  0xe0  full value byte follows

    xxx0 0000  0x00  reserved
    xxx0 0001  0x01  key number 1
    xxx1 1010  0x1a  key number 26
    xxx1 1011  0x1b  key number as a u8
    xxx1 1100  0x1c  key number as a u16
    xxx1 1101  0x1d  key is a string size u8
    xxx1 1110  0x1e  key is a string size u16

    1111 1111  0xff  end

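
As a reading aid, here is a minimal Python sketch of how a key byte splits into its two fields according to the table above. The function name `describe_key_byte` is hypothetical, and the sketch assumes that key fields between `0x01` and `0x1a` map directly to key numbers 1 to 26 (the table only lists the two endpoints). It does not read any bytes that follow the key byte.

```python
# Top three bits: the type of the value that follows the key.
VALUE_TYPES = {
    0x00: "u8",
    0x20: "u16",
    0x40: "f32",
    0x60: "bytes u8",
    0x80: "string u8",
    0xA0: "map",
    0xC0: "array",
    0xE0: "full value byte follows",
}

# Low five bits: special key encodings listed in the table.
SPECIAL_KEYS = {
    0x00: "reserved",
    0x1B: "key number as a u8",
    0x1C: "key number as a u16",
    0x1D: "key is a string size u8",
    0x1E: "key is a string size u16",
}


def describe_key_byte(byte):
    """Return (value type, key description) for a single key byte."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a byte")
    if byte == 0xFF:
        return ("end", "end")
    value_type = VALUE_TYPES[byte & 0xE0]
    key_field = byte & 0x1F
    if key_field in SPECIAL_KEYS:
        key = SPECIAL_KEYS[key_field]
    elif key_field <= 0x1A:
        # Assumption: inline key numbers run contiguously from 1 to 26.
        key = f"key number {key_field}"
    else:
        # 0x1f is not listed in the table outside of the 0xff end marker.
        key = "not listed"
    return (value_type, key)


print(describe_key_byte(0x81))  # ('string u8', 'key number 1')
print(describe_key_byte(0xBA))  # ('map', 'key number 26')
```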
### value byte

    0000 0000  0x00  literal 0
    0110 1111  0x6f  literal 111
    0111 0000  0x70  u8
    0111 0001  0x71  u16
    0111 0010  0x72  u32
    0111 0011  0x73  u64
    0111 0100  0x74  i8
    0111 0101  0x75  i16
    0111 0110  0x76  i32
    0111 0111  0x77  i64
    0111 1000  0x78  null
    0111 1001  0x79  true
    0111 1010  0x7a  false
    0111 1011  0x7b  map
    0111 1100  0x7c  array
    0111 1101  0x7d  f32
    0111 1110  0x7e  f64
    0111 1111  0x7f  ext

dynamic size:

    1000 xxxx  0x80  string
    1001 xxxx  0x90  bytes
    1010 xxxx  0xa0  reserved
    1011 xxxx  0xb0  reserved
    1100 xxxx  0xc0  reserved
    1101 xxxx  0xd0  reserved
    1110 xxxx  0xe0  reserved
    1111 xxxx  0xf0  reserved

    0000  size 0
    1011  size 11
    1100  size is in the next 1 byte
    1101  size is in the next 2 bytes
    1110  size is in the next 4 bytes
    1111  size is in the next 8 bytes

    1111 1111  0xff  end
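
The value byte table can be read as a small classifier. The Python sketch below is only an illustration of the table, not madpack's implementation: `describe_value_byte` is a hypothetical name, and it assumes that literals and inline sizes between the listed endpoints (0 to 111, and 0 to 11) map directly to the byte or nibble value. The byte order of any size bytes that follow is not stated here, so the sketch only reports how many of them to expect.

```python
# Fixed-type value bytes 0x70..0x7f, in table order.
FIXED_TYPES = [
    "u8", "u16", "u32", "u64",
    "i8", "i16", "i32", "i64",
    "null", "true", "false", "map",
    "array", "f32", "f64", "ext",
]

# Size nibbles that say "the size is stored in the next N bytes".
SIZE_BYTES = {0x0C: 1, 0x0D: 2, 0x0E: 4, 0x0F: 8}


def describe_value_byte(byte):
    """Return (kind, inline value or size, number of size bytes that follow)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a byte")
    if byte == 0xFF:
        return ("end", None, 0)
    if byte <= 0x6F:
        # Assumption: literals run contiguously from 0 to 111.
        return ("literal", byte, 0)
    if byte <= 0x7F:
        return (FIXED_TYPES[byte - 0x70], None, 0)
    kind = {0x80: "string", 0x90: "bytes"}.get(byte & 0xF0)
    if kind is None:
        return ("reserved", None, 0)
    nibble = byte & 0x0F
    if nibble <= 0x0B:
        # Assumption: inline sizes run contiguously from 0 to 11.
        return (kind, nibble, 0)
    return (kind, None, SIZE_BYTES[nibble])


print(describe_value_byte(0x6F))  # ('literal', 111, 0)
print(describe_value_byte(0x85))  # ('string', 5, 0)
print(describe_value_byte(0x9D))  # ('bytes', None, 2)
```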