An open API service indexing awesome lists of open source software.

https://github.com/tiw302/cjsonx

High-performance C11 JSON parser with a 16-byte Flat-DOM, SIMD acceleration
https://github.com/tiw302/cjsonx

avx2 c c11 embedded header-only high-performance json json-parser memory-efficient neon parser simd wasm webassembly

Last synced: 13 days ago
JSON representation

High-performance C11 JSON parser with a 16-byte Flat-DOM, SIMD acceleration

Awesome Lists containing this project

README

          


cjsonx Logo


Extreme-performance JSON parser for C11/C++ featuring a 16-byte ultra-compact DOM.

# cjsonx

[![Linux](https://github.com/tiw302/cjsonx/actions/workflows/linux.yml/badge.svg)](https://github.com/tiw302/cjsonx/actions)
[![macOS](https://github.com/tiw302/cjsonx/actions/workflows/macos.yml/badge.svg)](https://github.com/tiw302/cjsonx/actions)
[![Windows](https://github.com/tiw302/cjsonx/actions/workflows/windows.yml/badge.svg)](https://github.com/tiw302/cjsonx/actions)
[![WASM](https://github.com/tiw302/cjsonx/actions/workflows/wasm.yml/badge.svg)](https://github.com/tiw302/cjsonx/actions)

[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Language](https://img.shields.io/badge/Language-C11-00599C.svg)](https://en.wikipedia.org/wiki/C11_(C_standard_revision))
[![Header-Only](https://img.shields.io/badge/Library-Header--Only-brightgreen.svg)](#installation)
[![Dependencies](https://img.shields.io/badge/Dependencies-None-blueviolet.svg)](#introduction)

**[Read the Official Documentation: docs/index.md](https://tiw302.github.io/cjsonx/)**

**[Try the Live WebAssembly Demo: https://tiw302.github.io/cjsonx/demo/](https://tiw302.github.io/cjsonx/demo/)**

> **Verified Compatibility — Cross-Platform Passing**

| Architecture | Platform | Verified Backend |
| :--- | :--- | :--- |
| **x86_64 (Modern)** | Linux / Windows | **AVX2** (Vectorized) |
| **ARM64 (Apple)** | macOS (M1/M2/M3) | **NEON** (Vectorized) |
| **WebAssembly** | Chrome / Node.js | **WASM-SIMD128** |
| **RISC-V64** | Linux (QEMU) | **Scalar** C11 |
| **General Desktop** | Linux / Windows | **Scalar** C11 Fallback |

---

## Table of Contents

| Introduction | Setup & Build | Docs & Metrics |
|---|---|---|
| [Overview](#introduction) | [Requirements](#requirements) | [API Reference](#api-reference) |
| [Why cjsonx?](#why-cjsonx) | [Toolchains](#verified-toolchains) | [Documentation](#documentation) |
| [Philosophy](#design-philosophy) | [Installation](#build-and-installation) | [Examples](#examples) |
| [Limits & Guarantees](#limits--guarantees) | [AI Methodology](#development-methodology--ai-assistance) | [Benchmarks](#benchmark-results) |
| [License](#license) | | |

---

## Introduction

**cjsonx** is a header-only C library for parsing JSON. It is designed to achieve high parsing speeds (exceeding 1.0 GB/s on modern hardware) while offering a fully mutable, ultra-compact 16-byte Flat-DOM.

Built on top of a highly optimized dual-stage architecture, `cjsonx` validates structural characters using SIMD bitmasks (AVX2/NEON/WASM-SIMD) before applying a recursive descent parsing phase that utilizes the state-of-the-art Eisel-Lemire algorithm for blazing-fast 64-bit IEEE 754 floating-point numerical conversions.

---

## Why cjsonx?

Standard JSON parsers often face specific limitations: they can be slower due to heavy heap allocation per node (using `malloc` recursively), or they consume excessive memory per node (e.g., standard parsers often require 56-64 bytes per node).

`cjsonx` was built to address these specific use cases by providing a fully mutable DOM while drastically reducing memory overhead and maximizing computational throughput:

| Parser | Speed (Large Payload) | DOM Node Size | Allocation Strategy | Portability |
|---|---|---|---|---|
| `cJSON` | ~130 MB/s | ~64 bytes | Heavy (O(N) Malloc) | Universal |
| `jsmn` | ~600 MB/s | Tokenizer Only | None | Universal |
| `yyjson` | ~1000+ MB/s | 16-24 bytes | Arena | High |
| **cjsonx** | **~1000+ MB/s** | **16 bytes (Fixed)** | **Flat Arena** | **Universal** |

cjsonx aims to provide an alternative: **delivering high throughput and a fully mutable DOM while maintaining an incredibly dense 16-byte memory footprint.**

---

## Trade-offs & Alternatives (When NOT to use cjsonx)

We believe in engineering honesty. `cjsonx` is built for a specific niche and is *not* a silver bullet. You should evaluate alternatives if your requirements match the following:

- **Need the absolute fastest C++ parser?** Use [simdjson](https://github.com/simdjson/simdjson). It runs at 3-6 GB/s and is the industry gold standard for C++ server backends. `cjsonx` is pure C11 and cannot compete with their multi-year optimized C++ engine.
- **Need a battle-tested, general-purpose C parser?** Use [yyjson](https://github.com/ibireme/yyjson). It is incredibly fast, highly optimized for general use cases, and has a massive community.
- **Need to drop in a ubiquitous, legacy C parser?** Use [cJSON](https://github.com/DaveGamble/cJSON). It's older and much slower, but it works on ancient C89 compilers and has no modern standard requirements. (Note: `cjsonx` also runs without SIMD on any platform via its Scalar fallback, but requires a C11-compliant compiler).

**So when *should* you use cjsonx?**

1. **High-Performance Mutable Data:** You need a pure C11 parser that allows you to read, edit, add, and remove JSON nodes rapidly, and stringify them back to JSON text without rebuilding the entire document.
2. **Strict Memory Constraints (IoT/RTOS):** You need high-speed parsing but absolutely **refuse to waste memory**. Our 16-byte nodes use 4x less RAM than traditional parsers like cJSON. Additionally, `cjsonx_parse_with_buffer()` provides a True Zero-Allocation mode for embedded systems.
3. **WASM Edge Functions (Cloudflare Workers / Fastly):** You need a pure C11 parser that compiles effortlessly to WebAssembly and leverages WASM-SIMD128 for native execution at the edge, without the heavy overhead of C++ engines.

---

## Design Philosophy

The library is built around three strict constraints:

**Flat Arena DOM.** There are no calls to `malloc` per node. The entire document tree is parsed sequentially into a continuous array of 16-byte structs. This guarantees cache locality and enables O(1) skipping over complex objects and arrays during iteration.

**State-of-the-art Number Parsing.** `cjsonx` incorporates the Eisel-Lemire fast float algorithm directly into its lexical analysis phase. It parses 99.9% of all IEEE 754 floating-point numbers natively using a single fast path, falling back to strict standard library parsing only on extreme mathematical edge cases.

**Zero OS-Dependencies.** The library is built entirely on standard C11. It does not rely on OS-specific file I/O or POSIX headers. It compiles seamlessly to WebAssembly, embedded ARM targets, and standard desktop operating systems.

**True Zero-Allocation Mode.** For strict embedded constraints, the `cjsonx_parse_with_buffer()` API completely bypasses `malloc` by parsing the JSON entirely into a user-provided fixed-size stack buffer or RTOS memory pool.

---

## Limits & Guarantees

Professional-grade software requires transparent technical boundaries. Here is exactly what `cjsonx` guarantees, and where it draws the line:

- **RFC 8259 Compliance:** `cjsonx` strictly adheres to RFC 8259 and ECMA-404. It correctly rejects structural anomalies, unescaped control characters, and deeply nested bombs.
- **Thread Safety:** The core parsing engine is entirely stateless. Multiple threads can safely parse different JSON documents concurrently without any mutexes or locks.
- **Length Limit:** The maximum byte length of any single string or serialized container is 16MB (specifically, 16,777,215 bytes, due to the 24-bit length field packed in the 16-byte DOM node structure).
- **Nesting Depth Limit:** The stringification routines enforce a maximum nesting depth limit of 512 (`CJSONX_MAX_DEPTH`) to prevent stack overflow when printing extremely nested JSON.
- **Builder Performance:** Pushing elements to an array via `cjsonx_array_push` is an O(N) operation because it traverses the list of siblings to locate the end of the array. Repeated sequential pushes to build large arrays will result in O(N^2) complexity.

---

## Requirements

| Component | Requirement |
|---|---|
| C Standard | C11 or later |
| Compiler | GCC 4.9+, Clang 3.5+, MSVC 2019+, Emscripten 3.0+ |
| Dependencies| None (Standard C Library only) |

---

## Verified Toolchains

The following toolchains are tested on every commit via GitHub Actions:

| Toolchain | Platform | Backend |
|---|---|---|
| GCC | Linux x86_64 | Scalar, AVX2 |
| GCC (riscv64-linux-gnu) | Linux RISC-V64 (QEMU) | Scalar |
| Clang | macOS Apple Silicon | NEON |
| MSVC | Windows x64 | Scalar, AVX2 |
| Emscripten | WASM (Node.js) | WASM-SIMD, Scalar |

---

## Build and Installation

`cjsonx` is entirely header-only.

### Single-Header Distribution (Recommended)

The simplest integration is copying the amalgamated `single_include/cjsonx.h` into your project. Define the implementation macro in **exactly one** C file to compile the core functions:

```c
#define CJSONX_IMPLEMENTATION
#include "cjsonx.h"
```

All other translation units should include the header without the macro.

### CMake (System Install)

You can build the test suites and install the library system-wide:

```bash
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
sudo cmake --install build
```

Then in your project's `CMakeLists.txt`:

```cmake
find_package(cjsonx REQUIRED)
target_link_libraries(my_app PRIVATE cjsonx::cjsonx)
```

---

## API Reference

### Core Parsing

| Function | Signature | Description |
|---|---|---|
| `cjsonx_parse` | `cjsonx_doc_t* cjsonx_parse(const char* json, size_t length)` | Parses a JSON string into a managed document tree. Returns `NULL` on fatal memory error. Check `doc->is_valid` for syntax status. |
| `cjsonx_parse_ex` | `cjsonx_doc_t* cjsonx_parse_ex(const char* json, size_t length, cjsonx_allocator_t* alloc)` | Parses a JSON string using custom memory allocation hooks. |
| `cjsonx_parse_with_buffer` | `cjsonx_doc_t* cjsonx_parse_with_buffer(const char* json, size_t length, void* buffer, size_t buffer_size)` | Zero-allocation mode parsing JSON directly into a user-provided buffer. |
| `cjsonx_doc_free` | `void cjsonx_doc_free(cjsonx_doc_t* doc)` | Frees the entire document arena in a single call. |
| `cjsonx_error_string` | `const char* cjsonx_error_string(cjsonx_error_t err)` | Translates an error code into a human-readable string. |

### DOM Access

| Function | Signature | Description |
|---|---|---|
| `cjsonx_get` | `cjsonx_val_t cjsonx_get(cjsonx_val_t obj, const char* key)` | Retrieves a child node from an Object by its exact string key. |
| `cjsonx_get_index` | `cjsonx_val_t cjsonx_get_index(cjsonx_val_t arr, size_t index)` | Retrieves a child node from an Array by its index. |
| `cjsonx_get_type` | `cjsonx_type_t cjsonx_get_type(cjsonx_val_t val)` | Returns the type of the node (`CJSONX_STRING`, `CJSONX_NUMBER`, etc.). |
| `cjsonx_num` | `double cjsonx_num(cjsonx_val_t val)` | Retrieves the numerical value as a float. |
| `cjsonx_int` | `int64_t cjsonx_int(cjsonx_val_t val)` | Retrieves the numerical value as a 64-bit integer. |
| `cjsonx_str` | `const char* cjsonx_str(cjsonx_val_t val)` | Retrieves the string pointer. Note: strings may not be null-terminated if they are zero-copy references. |
| `cjsonx_str_len` | `size_t cjsonx_str_len(cjsonx_val_t val)` | Returns the exact length of the string. |
| `cjsonx_size` | `size_t cjsonx_size(cjsonx_val_t val)` | Returns the element count of an Array or Object. |
| `cjsonx_bool` | `bool cjsonx_bool(cjsonx_val_t val)` | Retrieves the boolean value. |
| `cjsonx_is_null` | `bool cjsonx_is_null(cjsonx_val_t val)` | Returns `true` if the node is explicitly a JSON `null` or is empty/invalid. |
| `cjsonx_pointer_get` | `cjsonx_val_t cjsonx_pointer_get(cjsonx_val_t root, const char* path)` | Retrieves a node using a RFC 6901 JSON Pointer path. |

### Iteration

| Function | Signature | Description |
|---|---|---|
| `cjsonx_iter_init` | `cjsonx_iter_t cjsonx_iter_init(cjsonx_val_t val)` | Initializes a lightweight iterator for an Array or Object. |
| `cjsonx_iter_next` | `bool cjsonx_iter_next(cjsonx_iter_t* iter)` | Advances the iterator to the next element or key-value pair. |

### Mutation & Builder API

| Function | Signature | Description |
|---|---|---|
| `cjsonx_create_null` | `cjsonx_val_t cjsonx_create_null(cjsonx_doc_t* doc)` | Creates a `null` node. |
| `cjsonx_create_bool` | `cjsonx_val_t cjsonx_create_bool(cjsonx_doc_t* doc, bool val)` | Creates a boolean node. |
| `cjsonx_create_number` | `cjsonx_val_t cjsonx_create_number(cjsonx_doc_t* doc, double val)` | Creates a number node. |
| `cjsonx_create_string` | `cjsonx_val_t cjsonx_create_string(cjsonx_doc_t* doc, const char* str)` | Creates a string node (copies string to arena). |
| `cjsonx_create_object` | `cjsonx_val_t cjsonx_create_object(cjsonx_doc_t* doc)` | Creates an empty Object node. |
| `cjsonx_create_array` | `cjsonx_val_t cjsonx_create_array(cjsonx_doc_t* doc)` | Creates an empty Array node. |
| `cjsonx_object_set` | `bool cjsonx_object_set(cjsonx_val_t obj, const char* key, cjsonx_val_t val)` | Inserts or overwrites a key-value pair in an Object. |
| `cjsonx_array_push` | `bool cjsonx_array_push(cjsonx_val_t arr, cjsonx_val_t val)` | Appends a value to an Array. |
| `cjsonx_object_remove` | `bool cjsonx_object_remove(cjsonx_val_t obj, const char* key)` | Removes a key-value pair from an Object. |
| `cjsonx_array_remove` | `bool cjsonx_array_remove(cjsonx_val_t arr, size_t index)` | Removes a value at the given index from an Array. |
| `cjsonx_clone_val` | `cjsonx_val_t cjsonx_clone_val(cjsonx_doc_t* dest_doc, cjsonx_val_t src_val)` | Recursively clones a value node and its children into another document arena. |
| `cjsonx_merge_patch` | `cjsonx_val_t cjsonx_merge_patch(cjsonx_val_t target, cjsonx_val_t patch)` | Applies an RFC 7396 JSON Merge Patch to a target node. |
| `cjsonx_stringify` | `char* cjsonx_stringify(cjsonx_doc_t* doc)` | Converts document to minified JSON string (malloc'd). |
| `cjsonx_stringify_format` | `char* cjsonx_stringify_format(cjsonx_doc_t* doc, int indent)` | Converts document to pretty JSON string with indent spaces. |

### File I/O Utilities

| Function | Signature | Description |
|---|---|---|
| `cjsonx_read_file` | `cjsonx_doc_t* cjsonx_read_file(const char* path)` | Reads and parses a JSON file. |
| `cjsonx_read_file_ex` | `cjsonx_doc_t* cjsonx_read_file_ex(const char* path, cjsonx_allocator_t* alloc)` | Reads and parses a JSON file using a custom allocator. |
| `cjsonx_write_file` | `bool cjsonx_write_file(const char* path, cjsonx_doc_t* doc)` | Serializes a document to a file (minified). |
| `cjsonx_write_file_format` | `bool cjsonx_write_file_format(const char* path, cjsonx_doc_t* doc, int indent)` | Serializes a document to a file (pretty printed). |

---

## Documentation

Check out the `docs/` directory for deep-dives into the architecture and API:

- [The cjsonx Algorithm](docs/algorithm.md): Detailed explanation of the 2-stage SIMD scanning and Eisel-Lemire numerical parsing engine.
- [API Reference](docs/api_reference.md): Complete guide to all functions, structures, and memory safety guarantees.

## Examples

Runnable examples are provided in the `examples/` directory.

**`dom_access.c`**
Demonstrates basic file loading, parsing, and retrieving keys from the root object.

```c
#define CJSONX_IMPLEMENTATION
#include "cjsonx.h"
#include
#include

int main() {
const char* json = "{\"name\": \"cjsonx\", \"speed\": \"insane\"}";

cjsonx_doc_t* doc = cjsonx_parse(json, strlen(json));
if (doc && doc->is_valid) {
cjsonx_val_t name = cjsonx_get(doc->root, "name");
if (cjsonx_get_type(name) == CJSONX_STRING) {
printf("Parsed name: %.*s\n", (int)cjsonx_str_len(name), cjsonx_str(name));
}
cjsonx_doc_free(doc);
}
return 0;
}
```

**`error_handling.c`**
Demonstrates extracting byte offsets and exact error messages when parsing malformed JSON payloads.

---

## Benchmark Results

Benchmarks were executed on a modern x86_64 CPU (GCC -O3 -march=native). We track **Parse Speed**, **Stringify Speed**, and the **Peak Memory** (Maximum RAM allocated during the parse operation).

> **Note on Memory**: `cjsonx` uses a Flat DOM approach with exactly 16 bytes per node. By optimizing initial node allocation capacity and performing a shrink-to-fit step at the end of parsing, `cjsonx` now achieves the lowest peak memory usage among tested libraries while maintaining high parsing throughput.

### 1. `twitter.json` (0.60 MB)

| Library | Parse (MB/s) | Stringify (MB/s) | Peak Mem (MB) |
|---------|--------------|------------------|---------------|
| **cjsonx** | 514.35 | 1929.13 | **0.92** |
| yyjson | **1026.59** | **4890.89** | 1.20 |
| cJSON | 408.14 | 636.55 | 1.23 |

### 2. `citm_catalog.json` (1.65 MB)

| Library | Parse (MB/s) | Stringify (MB/s) | Peak Mem (MB) |
|---------|--------------|------------------|---------------|
| **cjsonx** | **898.58** | 2233.71 | **2.07** |
| yyjson | 810.77 | **6899.93** | 3.29 |
| cJSON | 274.59 | 773.56 | 2.57 |

### 3. `canada.json` (2.15 MB) - Heavy Floating-Point Arrays

| Library | Parse (MB/s) | Stringify (MB/s) | Peak Mem (MB) |
|---------|--------------|------------------|---------------|
| **cjsonx** | 346.86 | 273.80 | **4.70** |
| yyjson | **820.91** | **712.29** | 7.87 |
| cJSON | 73.08 | 26.46 | 10.20 |

View raw console output from bench_compare

```console
tiw@tiw-CachyOS ~/Public/cjsonx (master)
❯./build/bench_compare benchmarks/datasets/citm_catalog.json && ./build/bench_compare benchmarks/datasets/twitter.json && ./build/bench_compare benchmarks/datasets/canada.json

Dataset: benchmarks/datasets/citm_catalog.json (1.65 MB)
========================================================================
Library | Parse (MB/s) | Stringify (MB/s) | Peak Mem (MB)
-----------|-----------------|------------------|-----------------------
cjsonx | 898.58 | 2233.71 | 2.07
yyjson | 810.77 | 6899.93 | 3.29
cJSON | 274.59 | 773.56 | 2.57
========================================================================

Dataset: benchmarks/datasets/twitter.json (0.60 MB)
========================================================================
Library | Parse (MB/s) | Stringify (MB/s) | Peak Mem (MB)
-----------|-----------------|------------------|-----------------------
cjsonx | 514.35 | 1929.13 | 0.92
yyjson | 1026.59 | 4890.89 | 1.20
cJSON | 408.14 | 636.55 | 1.23
========================================================================

Dataset: benchmarks/datasets/canada.json (2.15 MB)
========================================================================
Library | Parse (MB/s) | Stringify (MB/s) | Peak Mem (MB)
-----------|-----------------|------------------|-----------------------
cjsonx | 346.86 | 273.80 | 4.70
yyjson | 820.91 | 712.29 | 7.87
cJSON | 73.08 | 26.46 | 10.20
========================================================================

tiw@tiw-CachyOS ~/Public/cjsonx (master)

```

### Analysis

`cjsonx` demonstrates significant parsing throughput on large payloads, measuring up to ~898 MB/s on `citm_catalog.json`. This provides a performance profile comparable to, and often exceeding, modern parsers like `yyjson` during tree construction, while dramatically outperforming legacy standards like `cJSON` in computational speed and maintaining the lowest peak memory overhead.

---

## Development Methodology & AI Assistance

Building a memory-safe, SIMD-accelerated C parser from scratch involves handling incredibly complex edge cases—from vectorized bit-masking to IEEE 754 catastrophic cancellation bounds.

To achieve this level of stability and performance within a short timeframe, this project was architected and rigorously verified in collaboration with **Advanced Agentic AI**. AI was specifically utilized to:

- Stress-test the Eisel-Lemire numerical engine against extreme floating-point edge cases and LibFuzzer.
- Assist in planning the memory layout and cache-locality of the 16-byte arena DOM.
- Automate the generation of robust cross-platform CI/CD pipelines (Linux, macOS, Windows, WASM).

However, **human agency remains at the core of this project**. Every single line of code generated or suggested was manually inspected, audited, and strictly verified. The core architecture, algorithms, and memory design were meticulously human-planned. This hybrid approach—combining human architectural vision with AI-driven debugging and verification—allowed us to push the boundaries of performance and reliability in a modern C library without compromising security or code ownership.

---

## Author's Note

I'm just a kid building projects as a hobby. Thank you for showing interest in my little library! It really means a lot to me. :)

---

## License

This project is licensed under the [MIT License](LICENSE) - see the [LICENSE](LICENSE) file for details.