https://github.com/tiw302/cjsonx
High-performance C11 JSON parser with a 16-byte Flat-DOM, SIMD acceleration
- Host: GitHub
- URL: https://github.com/tiw302/cjsonx
- Owner: tiw302
- License: mit
- Created: 2026-05-30T14:08:52.000Z (25 days ago)
- Default Branch: master
- Last Pushed: 2026-06-10T12:30:16.000Z (15 days ago)
- Last Synced: 2026-06-10T14:09:01.584Z (14 days ago)
- Topics: avx2, c, c11, embedded, header-only, high-performance, json, json-parser, memory-efficient, neon, parser, simd, wasm, webassembly
- Language: C
- Homepage: https://tiw302.github.io/cjsonx/
- Size: 975 KB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Extreme-performance JSON parser for C11/C++ featuring a 16-byte ultra-compact DOM.
# cjsonx
**[Read the Official Documentation: docs/index.md](https://tiw302.github.io/cjsonx/)**
**[Try the Live WebAssembly Demo: https://tiw302.github.io/cjsonx/demo/](https://tiw302.github.io/cjsonx/demo/)**
> **Verified Compatibility — Cross-Platform Passing**
| Architecture | Platform | Verified Backend |
| :--- | :--- | :--- |
| **x86_64 (Modern)** | Linux / Windows | **AVX2** (Vectorized) |
| **ARM64 (Apple)** | macOS (M1/M2/M3) | **NEON** (Vectorized) |
| **WebAssembly** | Chrome / Node.js | **WASM-SIMD128** |
| **RISC-V64** | Linux (QEMU) | **Scalar** C11 |
| **General Desktop** | Linux / Windows | **Scalar** C11 Fallback |
---
## Table of Contents
| Introduction | Setup & Build | Docs & Metrics |
|---|---|---|
| [Overview](#introduction) | [Requirements](#requirements) | [API Reference](#api-reference) |
| [Why cjsonx?](#why-cjsonx) | [Toolchains](#verified-toolchains) | [Documentation](#documentation) |
| [Philosophy](#design-philosophy) | [Installation](#build-and-installation) | [Examples](#examples) |
| [Limits & Guarantees](#limits--guarantees) | [AI Methodology](#development-methodology--ai-assistance) | [Benchmarks](#benchmark-results) |
| [License](#license) | | |
---
## Introduction
**cjsonx** is a header-only C11 library for parsing JSON. It targets high parsing throughput (up to ~898 MB/s on `citm_catalog.json` in the benchmarks below) while offering a fully mutable, compact Flat-DOM built from 16-byte nodes.
Built on a two-stage architecture, `cjsonx` first locates and validates structural characters using SIMD bitmasks (AVX2/NEON/WASM-SIMD), then runs a recursive descent parser that uses the Eisel-Lemire algorithm to convert numbers to 64-bit IEEE 754 floating-point values.
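
To make the "structural bitmask" idea concrete, here is a minimal scalar sketch of what the first stage produces for one block of input. It is not cjsonx's code: the function names are hypothetical, and a SIMD backend would typically compute the same mask with vector compares followed by a movemask-style instruction instead of a byte loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 for the six JSON structural characters, 0 otherwise. */
static int is_structural(unsigned char c) {
    return c == '{' || c == '}' || c == '[' || c == ']' || c == ':' || c == ',';
}

/*
 * Builds a bitmask for up to 64 input bytes: bit i is set when block[i]
 * is a structural character.
 * NOTE: illustrative only. This sketch does not track string boundaries,
 * so structural characters inside string literals are flagged as well.
 */
static uint64_t structural_mask(const char *block, size_t len) {
    uint64_t mask = 0;
    if (len > 64) len = 64;
    for (size_t i = 0; i < len; i++) {
        if (is_structural((unsigned char)block[i])) mask |= (uint64_t)1 << i;
    }
    return mask;
}

/* Counts set bits, i.e. the number of structural characters in the block. */
static int count_bits(uint64_t m) {
    int n = 0;
    while (m) { m &= m - 1; n++; }
    return n;
}
```

For the input `{"a":[1,2]}` the structural characters sit at offsets 0, 4, 5, 7, 9 and 10, so six bits are set. A second stage can then walk the set bits instead of inspecting every byte.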
---
## Why cjsonx?
Standard JSON parsers often face specific limitations: they can be slower due to heavy heap allocation per node (using `malloc` recursively), or they consume excessive memory per node (e.g., standard parsers often require 56-64 bytes per node).
`cjsonx` was built to address these specific use cases by providing a fully mutable DOM while drastically reducing memory overhead and maximizing computational throughput:
| Parser | Parse Speed | DOM Node Size | Allocation Strategy | Portability |
|---|---|---|---|---|
| `cJSON` | 73-408 MB/s (measured below) | ~64 bytes | Heavy (O(N) Malloc) | Universal |
| `jsmn` | ~600 MB/s (not measured below) | Tokenizer Only | None | Universal |
| `yyjson` | 811-1027 MB/s (measured below) | 16-24 bytes | Arena | High |
| **cjsonx** | **347-899 MB/s (measured below)** | **16 bytes (Fixed)** | **Flat Arena** | **Universal** |
cjsonx aims to provide an alternative: **delivering high throughput and a fully mutable DOM while maintaining an incredibly dense 16-byte memory footprint.**
---
## Trade-offs & Alternatives (When NOT to use cjsonx)
We believe in engineering honesty. `cjsonx` is built for a specific niche and is *not* a silver bullet. You should evaluate alternatives if your requirements match the following:
- **Need the absolute fastest C++ parser?** Use [simdjson](https://github.com/simdjson/simdjson). It runs at 3-6 GB/s and is the industry gold standard for C++ server backends. `cjsonx` is pure C11 and cannot compete with their multi-year optimized C++ engine.
- **Need a battle-tested, general-purpose C parser?** Use [yyjson](https://github.com/ibireme/yyjson). It is incredibly fast, highly optimized for general use cases, and has a massive community.
- **Need to drop in a ubiquitous, legacy C parser?** Use [cJSON](https://github.com/DaveGamble/cJSON). It's older and much slower, but it works on ancient C89 compilers and has no modern standard requirements. (Note: `cjsonx` also runs without SIMD on any platform via its Scalar fallback, but requires a C11-compliant compiler).
**So when *should* you use cjsonx?**
1. **High-Performance Mutable Data:** You need a pure C11 parser that allows you to read, edit, add, and remove JSON nodes rapidly, and stringify them back to JSON text without rebuilding the entire document.
2. **Strict Memory Constraints (IoT/RTOS):** You need high-speed parsing on a tight memory budget. Each 16-byte node is a quarter the size of a ~64-byte cJSON node, and `cjsonx_parse_with_buffer()` provides a zero-allocation mode for embedded systems.
3. **WASM Edge Functions (Cloudflare Workers / Fastly):** You need a pure C11 parser that compiles effortlessly to WebAssembly and leverages WASM-SIMD128 for native execution at the edge, without the heavy overhead of C++ engines.
---
## Design Philosophy
The library is built around four strict constraints:
**Flat Arena DOM.** There are no per-node calls to `malloc`. The entire document tree is parsed sequentially into a contiguous array of 16-byte structs. This keeps nodes cache-local and enables O(1) skipping over complex objects and arrays during iteration.
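
The following sketch illustrates how a flat array of 16-byte nodes can support O(1) skipping. The layout is an assumption made for illustration and is not cjsonx's actual struct: `node_t`, its `next` index, and the 8-bit tag plus 24-bit length packing are all hypothetical names and choices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { T_NUMBER = 1, T_ARRAY = 2 };   /* hypothetical type tags */

/* Hypothetical 16-byte node; NOT the real cjsonx layout. */
typedef struct {
    uint32_t type_len;  /* low 8 bits: type tag, high 24 bits: length or count */
    uint32_t next;      /* index of the first node after this node's subtree */
    union { double num; int64_t i64; const char *str; } u;  /* 8-byte payload */
} node_t;

static_assert(sizeof(node_t) == 16, "node_t must stay 16 bytes");

#define NODE_MAX_LEN 0xFFFFFFu  /* 2^24 - 1 = 16,777,215 */

static uint32_t node_type(const node_t *n) { return n->type_len & 0xFFu; }

static void node_set(node_t *n, uint32_t type, uint32_t len,
                     uint32_t next, double num) {
    n->type_len = ((len & NODE_MAX_LEN) << 8) | (type & 0xFFu);
    n->next = next;
    n->u.num = num;
}

/* Sums the direct numeric children of the array at index arr. A nested
 * container is skipped in O(1) by jumping straight to its `next` index. */
static double sum_direct_numbers(const node_t *nodes, uint32_t arr) {
    double total = 0.0;
    for (uint32_t i = arr + 1; i < nodes[arr].next; i = nodes[i].next) {
        if (node_type(&nodes[i]) == T_NUMBER) total += nodes[i].u.num;
    }
    return total;
}

/* Flat encoding of [1, [2, 3], 4]: six nodes in one contiguous array. */
static double demo_flat_dom(void) {
    node_t nodes[6];
    node_set(&nodes[0], T_ARRAY,  3, 6, 0.0);
    node_set(&nodes[1], T_NUMBER, 0, 2, 1.0);
    node_set(&nodes[2], T_ARRAY,  2, 5, 0.0);
    node_set(&nodes[3], T_NUMBER, 0, 4, 2.0);
    node_set(&nodes[4], T_NUMBER, 0, 5, 3.0);
    node_set(&nodes[5], T_NUMBER, 0, 6, 4.0);
    return sum_direct_numbers(nodes, 0);  /* 1 + 4; nested [2, 3] is skipped */
}
```

In this sketch the loop never visits the children of the nested array: it reads one index and jumps past the whole subtree, which is the property the paragraph above describes.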
**State-of-the-art Number Parsing.** `cjsonx` incorporates the Eisel-Lemire fast float algorithm directly into its lexical analysis phase. It parses 99.9% of all IEEE 754 floating-point numbers natively using a single fast path, falling back to strict standard library parsing only on extreme mathematical edge cases.
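
For intuition about what a "fast path" in number parsing looks like, here is a sketch of the simpler classical fast path (often attributed to Clinger), not the Eisel-Lemire algorithm itself and not cjsonx's code. It applies only when the decimal digits fit in 53 bits and the power of ten is at most 10^22; Eisel-Lemire extends exact conversion to a far wider range of inputs. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Powers of ten that are exactly representable as IEEE 754 doubles. */
static const double POW10[23] = {
    1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,
    1e8,  1e9,  1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
    1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22
};

/*
 * Converts digits * 10^exp10 to a double when both operands are exact.
 * Returns 1 and stores a correctly rounded result, or returns 0 to tell
 * the caller to use a slower, more general algorithm.
 * Assumes double arithmetic is evaluated in double precision
 * (FLT_EVAL_METHOD == 0), as on x86_64 and ARM64.
 */
static int fast_path_double(uint64_t digits, int exp10, double *out) {
    if (digits > ((uint64_t)1 << 53) || exp10 < -22 || exp10 > 22) return 0;
    double d = (double)digits;  /* exact: the value fits in the significand */
    *out = (exp10 < 0) ? d / POW10[-exp10] : d * POW10[exp10];
    return 1;
}

/* Demo wrapper: returns -1.0 when the fast path does not apply. */
static double demo_fast_path(uint64_t digits, int exp10) {
    double v;
    return fast_path_double(digits, exp10, &v) ? v : -1.0;
}
```

Because both operands are exact and a single IEEE 754 multiplication or division rounds once, the result matches what a correctly rounding parser returns for the same text, for example `314` with exponent `-2` yields `3.14`.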
**Zero OS-Dependencies.** The library is built entirely on standard C11. It does not rely on OS-specific file I/O or POSIX headers. It compiles seamlessly to WebAssembly, embedded ARM targets, and standard desktop operating systems.
**True Zero-Allocation Mode.** For strict embedded constraints, the `cjsonx_parse_with_buffer()` API completely bypasses `malloc` by parsing the JSON entirely into a user-provided fixed-size stack buffer or RTOS memory pool.
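
One common way to parse into caller-owned memory is a bump arena layered over the supplied buffer. The sketch below shows that general technique under stated assumptions: `arena_t`, `arena_init`, and `arena_alloc` are hypothetical names, and the README does not say how `cjsonx_parse_with_buffer()` manages its buffer internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump arena over caller-owned memory; not cjsonx internals. */
typedef struct {
    unsigned char *base;  /* start of the user-provided buffer */
    size_t cap;           /* total buffer size in bytes */
    size_t used;          /* bytes handed out so far, including padding */
} arena_t;

static void arena_init(arena_t *a, void *buffer, size_t size) {
    a->base = buffer;
    a->cap = size;
    a->used = 0;
}

/* Returns `size` bytes aligned to `align` (a power of two), or NULL when
 * the buffer is exhausted. Never calls malloc. */
static void *arena_alloc(arena_t *a, size_t size, size_t align) {
    uintptr_t start = (uintptr_t)a->base + a->used;
    size_t pad = (size_t)(-start & (align - 1));
    size_t avail = a->cap - a->used;
    if (pad > avail || size > avail - pad) return NULL;
    void *p = a->base + a->used + pad;
    a->used += pad + size;
    return p;
}

/* Carves 16-byte slots out of a 64-byte stack buffer until it is full. */
static int demo_slots(void) {
    _Alignas(16) unsigned char buf[64];
    arena_t a;
    int n = 0;
    arena_init(&a, buf, sizeof buf);
    while (arena_alloc(&a, 16, 8) != NULL) n++;
    return n;
}
```

With this approach the failure mode is a `NULL` return when the buffer is too small, rather than heap growth, which suits fixed-size stack buffers and RTOS memory pools.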
---
## Limits & Guarantees
Professional-grade software requires transparent technical boundaries. Here is exactly what `cjsonx` guarantees, and where it draws the line:
- **RFC 8259 Compliance:** `cjsonx` strictly adheres to RFC 8259 and ECMA-404. It correctly rejects structural anomalies, unescaped control characters, and deeply nested bombs.
- **Thread Safety:** The core parsing engine is entirely stateless. Multiple threads can safely parse different JSON documents concurrently without any mutexes or locks.
- **Length Limit:** The maximum byte length of any single string or serialized container is just under 16 MiB (16,777,215 bytes, i.e. 2^24 - 1, because the length is stored in a 24-bit field packed into the 16-byte DOM node).
- **Nesting Depth Limit:** The stringification routines enforce a maximum nesting depth of 512 (`CJSONX_MAX_DEPTH`) to prevent stack overflow when printing deeply nested JSON.
- **Builder Performance:** Pushing elements to an array via `cjsonx_array_push` is an O(N) operation because it traverses the list of siblings to locate the end of the array. Repeated sequential pushes to build large arrays will result in O(N^2) complexity.
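
The quadratic cost of repeated pushes can be seen in a small model. This sketch uses a plain singly linked list with hypothetical names (`item_t`, `push_walk`); it models only the "walk the siblings to find the end" behaviour described above, not cjsonx's node format.

```c
#include <assert.h>
#include <stddef.h>

typedef struct item { struct item *next; } item_t;

/* Appends by walking from the head, so each push costs O(N).
 * Returns the number of links traversed to reach the last element. */
static size_t push_walk(item_t **head, item_t *it) {
    size_t steps = 0;
    it->next = NULL;
    if (*head == NULL) { *head = it; return 0; }
    item_t *cur = *head;
    while (cur->next != NULL) { cur = cur->next; steps++; }
    cur->next = it;
    return steps;
}

/* Total links walked while pushing n items one by one (n capped at 1024).
 * The k-th push (k >= 2) walks k - 2 links, so the total is
 * (n - 1)(n - 2) / 2, which grows as O(N^2). */
static size_t total_steps(size_t n) {
    static item_t pool[1024];
    item_t *head = NULL;
    size_t total = 0;
    if (n > 1024) n = 1024;
    for (size_t i = 0; i < n; i++) total += push_walk(&head, &pool[i]);
    return total;
}
```

In this model, 100 pushes walk 4,851 links while 1,000 pushes walk 498,501: ten times the elements costs roughly a hundred times the traversal work.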
---
## Requirements
| Component | Requirement |
|---|---|
| C Standard | C11 or later |
| Compiler | GCC 4.9+, Clang 3.5+, MSVC 2019+, Emscripten 3.0+ |
| Dependencies| None (Standard C Library only) |
---
## Verified Toolchains
The following toolchains are tested on every commit via GitHub Actions:
| Toolchain | Platform | Backend |
|---|---|---|
| GCC | Linux x86_64 | Scalar, AVX2 |
| GCC (riscv64-linux-gnu) | Linux RISC-V64 (QEMU) | Scalar |
| Clang | macOS Apple Silicon | NEON |
| MSVC | Windows x64 | Scalar, AVX2 |
| Emscripten | WASM (Node.js) | WASM-SIMD, Scalar |
---
## Build and Installation
`cjsonx` is entirely header-only.
### Single-Header Distribution (Recommended)
The simplest integration is copying the amalgamated `single_include/cjsonx.h` into your project. Define the implementation macro in **exactly one** C file to compile the core functions:
```c
#define CJSONX_IMPLEMENTATION
#include "cjsonx.h"
```
All other translation units should include the header without the macro.
### CMake (System Install)
You can build the test suites and install the library system-wide:
```bash
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
sudo cmake --install build
```
Then in your project's `CMakeLists.txt`:
```cmake
find_package(cjsonx REQUIRED)
target_link_libraries(my_app PRIVATE cjsonx::cjsonx)
```
---
## API Reference
### Core Parsing
| Function | Signature | Description |
|---|---|---|
| `cjsonx_parse` | `cjsonx_doc_t* cjsonx_parse(const char* json, size_t length)` | Parses a JSON string into a managed document tree. Returns `NULL` on fatal memory error. Check `doc->is_valid` for syntax status. |
| `cjsonx_parse_ex` | `cjsonx_doc_t* cjsonx_parse_ex(const char* json, size_t length, cjsonx_allocator_t* alloc)` | Parses a JSON string using custom memory allocation hooks. |
| `cjsonx_parse_with_buffer` | `cjsonx_doc_t* cjsonx_parse_with_buffer(const char* json, size_t length, void* buffer, size_t buffer_size)` | Zero-allocation mode parsing JSON directly into a user-provided buffer. |
| `cjsonx_doc_free` | `void cjsonx_doc_free(cjsonx_doc_t* doc)` | Frees the entire document arena in a single call. |
| `cjsonx_error_string` | `const char* cjsonx_error_string(cjsonx_error_t err)` | Translates an error code into a human-readable string. |
### DOM Access
| Function | Signature | Description |
|---|---|---|
| `cjsonx_get` | `cjsonx_val_t cjsonx_get(cjsonx_val_t obj, const char* key)` | Retrieves a child node from an Object by its exact string key. |
| `cjsonx_get_index` | `cjsonx_val_t cjsonx_get_index(cjsonx_val_t arr, size_t index)` | Retrieves a child node from an Array by its index. |
| `cjsonx_get_type` | `cjsonx_type_t cjsonx_get_type(cjsonx_val_t val)` | Returns the type of the node (`CJSONX_STRING`, `CJSONX_NUMBER`, etc.). |
| `cjsonx_num` | `double cjsonx_num(cjsonx_val_t val)` | Retrieves the numerical value as a `double`. |
| `cjsonx_int` | `int64_t cjsonx_int(cjsonx_val_t val)` | Retrieves the numerical value as a 64-bit integer. |
| `cjsonx_str` | `const char* cjsonx_str(cjsonx_val_t val)` | Retrieves the string pointer. Note: strings may not be null-terminated if they are zero-copy references. |
| `cjsonx_str_len` | `size_t cjsonx_str_len(cjsonx_val_t val)` | Returns the exact length of the string. |
| `cjsonx_size` | `size_t cjsonx_size(cjsonx_val_t val)` | Returns the element count of an Array or Object. |
| `cjsonx_bool` | `bool cjsonx_bool(cjsonx_val_t val)` | Retrieves the boolean value. |
| `cjsonx_is_null` | `bool cjsonx_is_null(cjsonx_val_t val)` | Returns `true` if the node is explicitly a JSON `null` or is empty/invalid. |
| `cjsonx_pointer_get` | `cjsonx_val_t cjsonx_pointer_get(cjsonx_val_t root, const char* path)` | Retrieves a node using an RFC 6901 JSON Pointer path. |
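
RFC 6901 paths are sequences of `/`-prefixed reference tokens in which `~1` stands for `/` and `~0` stands for `~`. The sketch below shows that decoding step in isolation; `next_token` is a hypothetical helper written for illustration and is not part of the cjsonx API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copies the next reference token from *path into out, decoding ~0 and ~1.
 * *path must point at the '/' that starts the token; on success it is
 * advanced to the next '/' or to the terminating NUL.
 * Returns 0 on success, -1 on a missing '/', a malformed escape, or when
 * out is too small.
 */
static int next_token(const char **path, char *out, size_t out_size) {
    const char *p = *path;
    size_t n = 0;
    if (out_size == 0 || *p != '/') return -1;
    p++;
    while (*p != '\0' && *p != '/') {
        char c = *p++;
        if (c == '~') {
            if (*p == '0') c = '~';
            else if (*p == '1') c = '/';
            else return -1;          /* '~' must be followed by 0 or 1 */
            p++;
        }
        if (n + 1 >= out_size) return -1;
        out[n++] = c;
    }
    out[n] = '\0';
    *path = p;
    return 0;
}

/* Returns 1 when "/a~1b/m~0n" decodes to the tokens "a/b" and "m~n". */
static int demo_pointer(void) {
    const char *p = "/a~1b/m~0n";
    char tok[16];
    if (next_token(&p, tok, sizeof tok) != 0 || strcmp(tok, "a/b") != 0) return 0;
    if (next_token(&p, tok, sizeof tok) != 0 || strcmp(tok, "m~n") != 0) return 0;
    return *p == '\0';
}
```

Each decoded token is then used as an object key or parsed as an array index, one level of the document at a time.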
### Iteration
| Function | Signature | Description |
|---|---|---|
| `cjsonx_iter_init` | `cjsonx_iter_t cjsonx_iter_init(cjsonx_val_t val)` | Initializes a lightweight iterator for an Array or Object. |
| `cjsonx_iter_next` | `bool cjsonx_iter_next(cjsonx_iter_t* iter)` | Advances the iterator to the next element or key-value pair. |
### Mutation & Builder API
| Function | Signature | Description |
|---|---|---|
| `cjsonx_create_null` | `cjsonx_val_t cjsonx_create_null(cjsonx_doc_t* doc)` | Creates a `null` node. |
| `cjsonx_create_bool` | `cjsonx_val_t cjsonx_create_bool(cjsonx_doc_t* doc, bool val)` | Creates a boolean node. |
| `cjsonx_create_number` | `cjsonx_val_t cjsonx_create_number(cjsonx_doc_t* doc, double val)` | Creates a number node. |
| `cjsonx_create_string` | `cjsonx_val_t cjsonx_create_string(cjsonx_doc_t* doc, const char* str)` | Creates a string node (copies string to arena). |
| `cjsonx_create_object` | `cjsonx_val_t cjsonx_create_object(cjsonx_doc_t* doc)` | Creates an empty Object node. |
| `cjsonx_create_array` | `cjsonx_val_t cjsonx_create_array(cjsonx_doc_t* doc)` | Creates an empty Array node. |
| `cjsonx_object_set` | `bool cjsonx_object_set(cjsonx_val_t obj, const char* key, cjsonx_val_t val)` | Inserts or overwrites a key-value pair in an Object. |
| `cjsonx_array_push` | `bool cjsonx_array_push(cjsonx_val_t arr, cjsonx_val_t val)` | Appends a value to an Array. |
| `cjsonx_object_remove` | `bool cjsonx_object_remove(cjsonx_val_t obj, const char* key)` | Removes a key-value pair from an Object. |
| `cjsonx_array_remove` | `bool cjsonx_array_remove(cjsonx_val_t arr, size_t index)` | Removes a value at the given index from an Array. |
| `cjsonx_clone_val` | `cjsonx_val_t cjsonx_clone_val(cjsonx_doc_t* dest_doc, cjsonx_val_t src_val)` | Recursively clones a value node and its children into another document arena. |
| `cjsonx_merge_patch` | `cjsonx_val_t cjsonx_merge_patch(cjsonx_val_t target, cjsonx_val_t patch)` | Applies an RFC 7396 JSON Merge Patch to a target node. |
| `cjsonx_stringify` | `char* cjsonx_stringify(cjsonx_doc_t* doc)` | Converts document to minified JSON string (malloc'd). |
| `cjsonx_stringify_format` | `char* cjsonx_stringify_format(cjsonx_doc_t* doc, int indent)` | Converts document to pretty JSON string with indent spaces. |
### File I/O Utilities
| Function | Signature | Description |
|---|---|---|
| `cjsonx_read_file` | `cjsonx_doc_t* cjsonx_read_file(const char* path)` | Reads and parses a JSON file. |
| `cjsonx_read_file_ex` | `cjsonx_doc_t* cjsonx_read_file_ex(const char* path, cjsonx_allocator_t* alloc)` | Reads and parses a JSON file using a custom allocator. |
| `cjsonx_write_file` | `bool cjsonx_write_file(const char* path, cjsonx_doc_t* doc)` | Serializes a document to a file (minified). |
| `cjsonx_write_file_format` | `bool cjsonx_write_file_format(const char* path, cjsonx_doc_t* doc, int indent)` | Serializes a document to a file (pretty printed). |
---
## Documentation
Check out the `docs/` directory for deep-dives into the architecture and API:
- [The cjsonx Algorithm](docs/algorithm.md): Detailed explanation of the 2-stage SIMD scanning and Eisel-Lemire numerical parsing engine.
- [API Reference](docs/api_reference.md): Complete guide to all functions, structures, and memory safety guarantees.
## Examples
Runnable examples are provided in the `examples/` directory.
**`dom_access.c`**
Demonstrates parsing a JSON string held in memory and retrieving a key from the root object.
```c
#define CJSONX_IMPLEMENTATION
#include "cjsonx.h"
#include <stdio.h>
#include <string.h>

int main(void) {
    const char* json = "{\"name\": \"cjsonx\", \"speed\": \"insane\"}";
    cjsonx_doc_t* doc = cjsonx_parse(json, strlen(json));
    if (doc == NULL) {
        return 1;  /* fatal memory error */
    }
    if (doc->is_valid) {
        cjsonx_val_t name = cjsonx_get(doc->root, "name");
        if (cjsonx_get_type(name) == CJSONX_STRING) {
            /* Strings may not be null-terminated, so print with an explicit length. */
            printf("Parsed name: %.*s\n", (int)cjsonx_str_len(name), cjsonx_str(name));
        }
    }
    /* Release the arena whether or not the document was valid. */
    cjsonx_doc_free(doc);
    return 0;
}
```
**`error_handling.c`**
Demonstrates extracting byte offsets and exact error messages when parsing malformed JSON payloads.
---
## Benchmark Results
Benchmarks were executed on a modern x86_64 CPU (GCC -O3 -march=native). We track **Parse Speed**, **Stringify Speed**, and the **Peak Memory** (Maximum RAM allocated during the parse operation).
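
As a reference for how an MB/s figure is typically derived, here is a minimal sketch. It is not the `bench_compare` implementation: the helper names are hypothetical, and it assumes 1 MB = 1,000,000 bytes, since the README does not state whether the benchmark uses 10^6 or 2^20.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Converts a byte count and an elapsed time into MB/s.
 * ASSUMPTION: 1 MB = 1,000,000 bytes. Returns 0.0 for a non-positive time. */
static double throughput_mb_s(size_t bytes, double seconds) {
    if (seconds <= 0.0) return 0.0;
    return ((double)bytes / 1e6) / seconds;
}

/* Any routine that consumes a buffer, e.g. a wrapper around a parse call. */
typedef void (*work_fn)(const char *data, size_t len);

/* Runs fn over the same buffer `iters` times and reports the average
 * throughput. clock() measures CPU time, which is a reasonable proxy for
 * a single-threaded parse loop. */
static double measure_mb_s(work_fn fn, const char *data, size_t len, int iters) {
    clock_t t0 = clock();
    for (int i = 0; i < iters; i++) fn(data, len);
    clock_t t1 = clock();
    double seconds = (double)(t1 - t0) / (double)CLOCKS_PER_SEC;
    return throughput_mb_s(len * (size_t)iters, seconds);
}
```

Repeating the work many times matters for small inputs: a single parse of a 0.60 MB file finishes in about a millisecond at these speeds, which is close to the resolution of many timers.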
> **Note on Memory**: `cjsonx` uses a Flat DOM approach with exactly 16 bytes per node. By optimizing initial node allocation capacity and performing a shrink-to-fit step at the end of parsing, `cjsonx` now achieves the lowest peak memory usage among tested libraries while maintaining high parsing throughput.
### 1. `twitter.json` (0.60 MB)
| Library | Parse (MB/s) | Stringify (MB/s) | Peak Mem (MB) |
|---------|--------------|------------------|---------------|
| **cjsonx** | 514.35 | 1929.13 | **0.92** |
| yyjson | **1026.59** | **4890.89** | 1.20 |
| cJSON | 408.14 | 636.55 | 1.23 |
### 2. `citm_catalog.json` (1.65 MB)
| Library | Parse (MB/s) | Stringify (MB/s) | Peak Mem (MB) |
|---------|--------------|------------------|---------------|
| **cjsonx** | **898.58** | 2233.71 | **2.07** |
| yyjson | 810.77 | **6899.93** | 3.29 |
| cJSON | 274.59 | 773.56 | 2.57 |
### 3. `canada.json` (2.15 MB) - Heavy Floating-Point Arrays
| Library | Parse (MB/s) | Stringify (MB/s) | Peak Mem (MB) |
|---------|--------------|------------------|---------------|
| **cjsonx** | 346.86 | 273.80 | **4.70** |
| yyjson | **820.91** | **712.29** | 7.87 |
| cJSON | 73.08 | 26.46 | 10.20 |
View raw console output from bench_compare
```console
tiw@tiw-CachyOS ~/Public/cjsonx (master)
❯./build/bench_compare benchmarks/datasets/citm_catalog.json && ./build/bench_compare benchmarks/datasets/twitter.json && ./build/bench_compare benchmarks/datasets/canada.json
Dataset: benchmarks/datasets/citm_catalog.json (1.65 MB)
========================================================================
Library | Parse (MB/s) | Stringify (MB/s) | Peak Mem (MB)
-----------|-----------------|------------------|-----------------------
cjsonx | 898.58 | 2233.71 | 2.07
yyjson | 810.77 | 6899.93 | 3.29
cJSON | 274.59 | 773.56 | 2.57
========================================================================
Dataset: benchmarks/datasets/twitter.json (0.60 MB)
========================================================================
Library | Parse (MB/s) | Stringify (MB/s) | Peak Mem (MB)
-----------|-----------------|------------------|-----------------------
cjsonx | 514.35 | 1929.13 | 0.92
yyjson | 1026.59 | 4890.89 | 1.20
cJSON | 408.14 | 636.55 | 1.23
========================================================================
Dataset: benchmarks/datasets/canada.json (2.15 MB)
========================================================================
Library | Parse (MB/s) | Stringify (MB/s) | Peak Mem (MB)
-----------|-----------------|------------------|-----------------------
cjsonx | 346.86 | 273.80 | 4.70
yyjson | 820.91 | 712.29 | 7.87
cJSON | 73.08 | 26.46 | 10.20
========================================================================
tiw@tiw-CachyOS ~/Public/cjsonx (master)
❯
```
### Analysis
`cjsonx` reaches its highest parsing throughput on `citm_catalog.json` (~898 MB/s), where it parses faster than `yyjson` (898.58 vs. 810.77 MB/s). On `twitter.json` and `canada.json` it trails `yyjson` on parse speed, and `yyjson` is faster at stringification on all three datasets. Against `cJSON`, `cjsonx` is faster at both parsing and stringification on every dataset, and it records the lowest peak memory of the three libraries in each test.
---
## Development Methodology & AI Assistance
Building a memory-safe, SIMD-accelerated C parser from scratch involves handling incredibly complex edge cases—from vectorized bit-masking to IEEE 754 catastrophic cancellation bounds.
To achieve this level of stability and performance within a short timeframe, this project was architected and rigorously verified in collaboration with **Advanced Agentic AI**. AI was specifically utilized to:
- Stress-test the Eisel-Lemire numerical engine against extreme floating-point edge cases and with libFuzzer.
- Assist in planning the memory layout and cache-locality of the 16-byte arena DOM.
- Automate the generation of robust cross-platform CI/CD pipelines (Linux, macOS, Windows, WASM).
However, **human agency remains at the core of this project**. Every single line of code generated or suggested was manually inspected, audited, and strictly verified. The core architecture, algorithms, and memory design were meticulously human-planned. This hybrid approach—combining human architectural vision with AI-driven debugging and verification—allowed us to push the boundaries of performance and reliability in a modern C library without compromising security or code ownership.
---
## Author's Note
I'm just a kid building projects as a hobby. Thank you for showing interest in my little library! It really means a lot to me. :)
---
## License
This project is licensed under the [MIT License](LICENSE) - see the [LICENSE](LICENSE) file for details.