https://github.com/risingwavelabs/jsonbb

# jsonbb

[![Crate](https://img.shields.io/crates/v/jsonbb.svg)](https://crates.io/crates/jsonbb)
[![Docs](https://docs.rs/jsonbb/badge.svg)](https://docs.rs/jsonbb)

`jsonbb` is a binary representation of JSON values. It is inspired by [JSONB](https://www.postgresql.org/docs/current/datatype-json.html) in PostgreSQL and optimized for fast parsing.

## Usage

`jsonbb` provides an API similar to `serde_json` for constructing and querying JSON values.

```rust
// Deserialize a JSON value from a string of JSON text.
let value: jsonbb::Value = r#"{"name": ["foo", "bar"]}"#.parse().unwrap();

// Serialize a JSON value into JSON text.
let json = value.to_string();
assert_eq!(json, r#"{"name":["foo","bar"]}"#);
```

Because `jsonbb` is a binary format, you can extract the underlying byte slice from a value or read a JSON value from a byte slice.

```rust
// Get the underlying byte slice of a JSON value.
let jsonbb = value.as_bytes();

// Read a JSON value from a byte slice.
let value = jsonbb::ValueRef::from_bytes(jsonbb);
```

You can query JSON values through a familiar indexing API and then build new JSON values using the `Builder` API.

```rust
// Indexing
let name = value.get("name").unwrap();
let foo = name.get(0).unwrap();
assert_eq!(foo.as_str().unwrap(), "foo");

// Build a JSON value.
let mut builder = jsonbb::Builder::<Vec<u8>>::new();
builder.begin_object();
builder.add_string("name");
builder.add_value(foo);
builder.end_object();
let value = builder.finish();
assert_eq!(value.to_string(), r#"{"name":"foo"}"#);
```

## Format

`jsonbb` stores JSON values in contiguous memory. By avoiding dynamic memory allocation, it is more cache-friendly and makes both **parsing** and **querying** efficient.

It has the following key features:

1. Memory Continuity: The content of any JSON subtree is stored contiguously, so a subtree can be copied with a single `memcpy`. This makes indexing operations highly efficient.
2. Post-Order Traversal: JSON nodes are stored in post-order traversal sequence. When parsing JSON strings, output can be sequentially written to the buffer without additional memory allocation and movement. This results in highly efficient parsing operations.
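
To make the two properties concrete, here is a minimal sketch of a post-order layout. It is a toy encoding invented for illustration and is not `jsonbb`'s actual byte format: the `Node` type, the tag constants, and the `encode` and `subtree_range` helpers are all hypothetical. It only shows that writing children before their parent lets the encoder append to one buffer, and that every subtree ends up in one contiguous byte range.

```rust
/// A tiny JSON-like tree used only for this illustration (hypothetical type).
enum Node {
    Num(u32),
    Array(Vec<Node>),
}

// Hypothetical one-byte tags; the real format is not described here.
const TAG_NUM: u8 = 0;
const TAG_ARRAY: u8 = 1;

/// Appends `node` to `buf` in post-order and returns the number of bytes written.
/// Children are written first; the parent's trailer (byte length of its
/// children, then its tag) is appended last, so the buffer is only appended to.
fn encode(node: &Node, buf: &mut Vec<u8>) -> usize {
    let start = buf.len();
    match node {
        Node::Num(n) => {
            buf.extend_from_slice(&n.to_le_bytes());
            buf.push(TAG_NUM);
        }
        Node::Array(items) => {
            for item in items {
                encode(item, buf);
            }
            let children_len = (buf.len() - start) as u32;
            buf.extend_from_slice(&children_len.to_le_bytes());
            buf.push(TAG_ARRAY);
        }
    }
    buf.len() - start
}

/// Returns the byte range of the subtree whose last byte is at `end - 1`.
/// Every node ends with a 4-byte little-endian word followed by a tag byte.
fn subtree_range(buf: &[u8], end: usize) -> std::ops::Range<usize> {
    let tag = buf[end - 1];
    let word = u32::from_le_bytes([buf[end - 5], buf[end - 4], buf[end - 3], buf[end - 2]]);
    match tag {
        // A number is just its 4-byte value plus the tag.
        TAG_NUM => (end - 5)..end,
        // An array is its children followed by the 5-byte trailer.
        TAG_ARRAY => (end - 5 - word as usize)..end,
        _ => panic!("unknown tag"),
    }
}
```

In this toy layout, encoding `[1, [2, 3]]` produces 25 bytes, and the inner array `[2, 3]` occupies bytes 5 to 20. Copying that slice out yields a valid standalone value, which is the property that makes a single contiguous copy sufficient.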

## Performance Comparison

| item[^0] | jsonbb | [jsonb] | [serde_json] | [simd_json] |
| --------------------------- | --------- | --------- | -------------- | -------------- |
| `canada.parse()` | 4.7394 ms | 12.640 ms | 10.806 ms | 6.0767 ms [^1] |
| `canada.to_json()` | 5.7694 ms | 20.420 ms | 5.5702 ms | 3.0548 ms |
| `canada.size()` | 2,117,412 B | 1,892,844 B | | |
| `canada["type"]`[^2] | 39.181 ns[^2.1] | 316.51 ns[^2.2] | 67.202 ns [^2.3] | 27.102 ns [^2.4] |
| `citm_catalog["areaNames"]` | 92.363 ns | 328.70 ns | 2.1190 µs [^3] | 1.9012 µs [^3] |
| `from("1234567890")` | 26.840 ns | 91.037 ns | 45.130 ns | 21.513 ns |
| `a == b` | 66.513 ns | 115.89 ns | 39.213 ns | 41.675 ns |
| `a < b` | 71.793 ns | 120.77 ns | not supported | not supported |
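
The `canada["type"]` row mostly measures key lookup, and the footnotes attribute the gap between `jsonbb` and `jsonb` to binary search on sorted keys versus linear search on unsorted keys. The sketch below contrasts the two strategies on a plain slice of key-value pairs. `get_sorted` and `get_unsorted` are hypothetical helpers written for this illustration and do not reflect either library's internals.

```rust
use std::cmp::Ordering;

/// Looks up `key` with binary search. `entries` must be sorted by key.
/// Needs O(log n) key comparisons.
fn get_sorted<'a>(entries: &[(&'a str, &'a str)], key: &str) -> Option<&'a str> {
    entries
        .binary_search_by(|entry| -> Ordering { entry.0.cmp(key) })
        .ok()
        .map(|i| entries[i].1)
}

/// Looks up `key` with a linear scan. `entries` may be in any order.
/// Needs O(n) key comparisons in the worst case.
fn get_unsorted<'a>(entries: &[(&'a str, &'a str)], key: &str) -> Option<&'a str> {
    entries.iter().find(|(k, _)| *k == key).map(|(_, v)| *v)
}
```

Binary search only works if keys are kept sorted when the object is built, so the cost moves from lookup time to construction time.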

[jsonb]: https://docs.rs/jsonb/0.3.0/jsonb/
[serde_json]: https://docs.rs/serde_json/1.0.107/serde_json/
[simd_json]: https://docs.rs/simd-json/0.12.0/simd_json/

[^0]: JSON files for benchmark: [canada](https://github.com/datafuselabs/jsonb/blob/6b3f03effc08e1ca3cad69199e4cb1398e482757/data/canada.json), [citm_catalog](https://github.com/datafuselabs/jsonb/blob/6b3f03effc08e1ca3cad69199e4cb1398e482757/data/citm_catalog.json)

[^1]: Parsed to [`simd_json::OwnedValue`](https://docs.rs/simd-json/0.12.0/simd_json/value/owned/enum.Value.html) for a fair comparison.

[^2]: `canada["type"]` returns a string, so the primary overhead of this operation lies in indexing.

[^2.1]: `jsonbb` uses binary search on sorted keys
[^2.2]: `jsonb` uses linear search on unsorted keys
[^2.3]: `serde_json` uses `BTreeMap`
[^2.4]: `simd_json` uses `HashMap`

[^3]: `citm_catalog["areaNames"]` returns an object with 17 key-value string pairs. Both `serde_json` and `simd_json` are slower here because they allocate memory dynamically for each string. In contrast, `jsonbb` uses a flat representation, so the result can be copied with a direct `memcpy`.
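
The allocation effect described for `citm_catalog["areaNames"]` can be sketched with a toy comparison between a flat buffer and a vector of owned strings. The length-prefixed layout and the `pack`, `unpack`, `copy_flat`, and `copy_owned` helpers are assumptions made for this example, not the layout or API of any of the benchmarked libraries.

```rust
/// Packs strings into one flat buffer: a 4-byte little-endian length, then the bytes.
fn pack(strings: &[&str]) -> Vec<u8> {
    let mut buf = Vec::new();
    for s in strings {
        buf.extend_from_slice(&(s.len() as u32).to_le_bytes());
        buf.extend_from_slice(s.as_bytes());
    }
    buf
}

/// Reads the strings back as borrowed slices, with no allocation per string.
fn unpack(buf: &[u8]) -> Vec<&str> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < buf.len() {
        let len = u32::from_le_bytes([buf[pos], buf[pos + 1], buf[pos + 2], buf[pos + 3]]) as usize;
        pos += 4;
        out.push(std::str::from_utf8(&buf[pos..pos + len]).expect("valid UTF-8"));
        pos += len;
    }
    out
}

/// Copying the flat form is one allocation and one contiguous copy,
/// however many strings the buffer holds.
fn copy_flat(buf: &[u8]) -> Vec<u8> {
    buf.to_vec()
}

/// Copying owned strings allocates once per non-empty string, plus the outer `Vec`.
fn copy_owned(strings: &[String]) -> Vec<String> {
    strings.iter().cloned().collect()
}
```

With 17 string pairs, the owned form pays for dozens of small allocations on every copy, while the flat form pays for one.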