An open API service indexing awesome lists of open source software.

https://github.com/sebyx07/json-asm

⚡ The fastest JSON parser/serializer - hand-optimized x86-64 & ARM64 assembly with C API. AVX-512/AVX2/NEON SIMD, 24-byte nodes, zero-copy parsing
https://github.com/sebyx07/json-asm

arm64 assembly avx2 avx512 c high-performance json neon parser performance serializer simd x86-64

Last synced: about 2 months ago
JSON representation

⚡ The fastest JSON parser/serializer - hand-optimized x86-64 & ARM64 assembly with C API. AVX-512/AVX2/NEON SIMD, 24-byte nodes, zero-copy parsing

Awesome Lists containing this project

README

          

# json-asm

The fastest JSON parser and serializer, written in hand-optimized assembly (x86-64 & ARM64) with a clean C API.

## Overview

**json-asm** is a high-performance JSON library that pushes the boundaries of parsing and serialization speed through hand-optimized assembly. It leverages every available SIMD instruction set and CPU feature for maximum throughput.

### Key Features

- **Multi-architecture assembly** - Hand-optimized for x86-64 (AVX-512/AVX2/SSE4.2) and ARM64 (NEON/SVE/SVE2)
- **Maximum SIMD utilization** - 512-bit vector operations where available
- **Branch prediction optimization** - Computed gotos, branchless algorithms, cmov usage
- **Cache hierarchy optimization** - Prefetching, cache-line alignment, TLB-friendly allocation
- **Zero-copy parsing** - Minimal memory allocations during parse operations
- **Lock-free data structures** - Wait-free progress on multi-threaded access
- **Simple C API** - Easy to integrate, no C++ runtime dependencies

### Supported Architectures

| Architecture | SIMD Support | Status |
|--------------|-------------------------------------|--------|
| x86-64 | AVX-512, AVX2, SSE4.2, BMI2, POPCNT | Full |
| ARM64 | SVE2, SVE, NEON | Full |

## Performance

Preliminary benchmark results on available hardware:

### Development System (AMD EPYC 7282 @ AVX2)

Tested with 399-byte JSON document (44 values, nested objects/arrays):

| Operation | Throughput | Time per operation | Memory per value |
|------------|------------|-------------------|------------------|
| Parse | ~290 MB/s | 1.38 μs | 24 bytes |
| Stringify | ~275 MB/s | 1.45 μs | 24 bytes |

**Test Environment:**
- CPU: AMD EPYC 7282 16-Core Processor
- SIMD: AVX2 (256-bit vectors)
- Compiler: GCC with -O3 -march=native
- Test data: Small JSON with mixed types

**Note:** These are preliminary results from development hardware. Performance will vary significantly based on:
- CPU architecture and generation
- Available SIMD instructions (SSE4.2 vs AVX2 vs AVX-512)
- JSON document size and structure
- Memory bandwidth and cache characteristics

The library automatically selects the best SIMD implementation for your CPU at runtime.

*Comprehensive benchmarks comparing against other JSON libraries coming soon.*

## Quick Start

```c
#include

int main(void) {
const char *json = "{\"name\": \"json-asm\", \"fast\": true}";

// Parse JSON
json_doc *doc = json_parse(json, strlen(json));
if (!doc) {
fprintf(stderr, "Parse error\n");
return 1;
}

// Access values
json_val *root = json_doc_root(doc);
json_val *name = json_obj_get(root, "name");
printf("Name: %s\n", json_get_str(name));

// Cleanup
json_doc_free(doc);
return 0;
}
```

## Building

```bash
# Requirements:
# x86-64: NASM 2.15+, GCC 10+/Clang 12+
# ARM64: GCC 10+/Clang 12+ (uses inline assembly)

make
make install # installs to /usr/local by default

# Run tests
make test

# Run benchmarks
make bench
```

See [docs/building.md](docs/building.md) for detailed build instructions.

## Documentation

- [API Reference](docs/api.md) - Complete C API documentation
- [Architecture](docs/architecture.md) - Internal design, SIMD optimizations, CPU features
- [Benchmarks](docs/benchmarks.md) - Performance methodology and comparisons
- [Building](docs/building.md) - Build instructions for x86-64 and ARM64
- [Examples](docs/examples.md) - Usage examples and patterns

## Why Assembly?

Modern compilers are excellent, but JSON parsing has predictable patterns that benefit from manual optimization:

### x86-64 Optimizations
- **AVX-512 masked operations** - Process 64 bytes with predicated loads/stores
- **AVX2 string scanning** - Find structural chars across 32 bytes simultaneously
- **BMI2 PEXT/PDEP** - Branchless bit manipulation for escape handling
- **POPCNT/LZCNT/TZCNT** - O(1) bit counting for mask processing

### ARM64 Optimizations
- **SVE/SVE2 variable-length vectors** - Scalable from 128 to 2048 bits
- **NEON 128-bit operations** - Parallel character classification
- **AdvSIMD dot product** - Fast numeric accumulation
- **Load/store pair instructions** - Efficient memory access patterns

### Micro-architectural Optimizations
- **Branchless number parsing** - Eliminates misprediction penalties (~15 cycles)
- **Computed goto dispatch** - O(1) state machine transitions
- **Explicit prefetching** - Hide memory latency for sequential access
- **Register pressure optimization** - Keep hot data in registers
- **Instruction-level parallelism** - Maximize superscalar execution

See [docs/architecture.md](docs/architecture.md) for technical details.

## Design Approach

json-asm uses hand-optimized assembly to achieve high performance:

| Aspect | json-asm |
|---------------------|----------------------------------|
| Implementation | Native assembly (x86-64 + ARM64) |
| SIMD x86-64 | AVX-512/AVX2/SSE4.2 hand-written |
| SIMD ARM64 | NEON/SVE/SVE2 hand-written |
| String scanning | Custom SIMD routines |
| Number parsing | Branchless assembly algorithms |
| Memory layout | Compact 24 bytes/value |

By controlling every instruction in hot paths, assembly allows for:
- Explicit SIMD instruction selection
- Branchless algorithms to avoid misprediction penalties
- Cache-line alignment and prefetching control
- Register allocation optimization

## CPU Feature Detection

json-asm automatically detects and uses the best available instructions:

```
x86-64 dispatch (best to worst):
AVX-512 + BMI2 → AVX2 + BMI2 → AVX2 → SSE4.2 → Scalar

ARM64 dispatch (best to worst):
SVE2 → SVE → NEON → Scalar
```

Runtime detection ensures optimal performance on any conforming CPU.

## API Overview

```c
// Parsing
json_doc *json_parse(const char *json, size_t len);
json_doc *json_parse_file(const char *path);

// Document access
json_val *json_doc_root(json_doc *doc);
void json_doc_free(json_doc *doc);

// Value inspection
json_type json_get_type(json_val *val);
const char *json_get_str(json_val *val);
double json_get_num(json_val *val);
int64_t json_get_int(json_val *val);
bool json_get_bool(json_val *val);

// Object/Array access
json_val *json_obj_get(json_val *obj, const char *key);
json_val *json_arr_get(json_val *arr, size_t index);

// Serialization
char *json_stringify(json_val *val);
```

See [docs/api.md](docs/api.md) for the complete API reference.

## Requirements

### x86-64
- **CPU**: x86-64-v2 minimum (SSE4.2), x86-64-v4 recommended (AVX-512)
- **OS**: Linux, macOS, Windows, FreeBSD
- **Build**: NASM 2.15+, GCC 10+ or Clang 12+

### ARM64
- **CPU**: ARMv8-A minimum, ARMv9-A recommended (SVE2)
- **OS**: Linux, macOS (Apple Silicon), Android
- **Build**: GCC 10+ or Clang 12+

## License

MIT License. See [LICENSE](LICENSE) for details.

## Contributing

Contributions welcome! Performance improvements should include benchmark results. Please read the architecture docs before working on assembly code.