https://github.com/sebyx07/json-asm
⚡ The fastest JSON parser/serializer - hand-optimized x86-64 & ARM64 assembly with C API. AVX-512/AVX2/NEON SIMD, 24-byte nodes, zero-copy parsing
https://github.com/sebyx07/json-asm
arm64 assembly avx2 avx512 c high-performance json neon parser performance serializer simd x86-64
Last synced: about 2 months ago
JSON representation
⚡ The fastest JSON parser/serializer - hand-optimized x86-64 & ARM64 assembly with C API. AVX-512/AVX2/NEON SIMD, 24-byte nodes, zero-copy parsing
- Host: GitHub
- URL: https://github.com/sebyx07/json-asm
- Owner: sebyx07
- License: mit
- Created: 2025-12-19T16:51:58.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-12-19T17:06:09.000Z (6 months ago)
- Last Synced: 2026-02-28T20:29:17.754Z (4 months ago)
- Topics: arm64, assembly, avx2, avx512, c, high-performance, json, neon, parser, performance, serializer, simd, x86-64
- Language: C
- Homepage: https://github.com/sebyx07/json-asm
- Size: 69.3 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# json-asm
The fastest JSON parser and serializer, written in hand-optimized assembly (x86-64 & ARM64) with a clean C API.
## Overview
**json-asm** is a high-performance JSON library that pushes the boundaries of parsing and serialization speed through hand-optimized assembly. It leverages every available SIMD instruction set and CPU feature for maximum throughput.
### Key Features
- **Multi-architecture assembly** - Hand-optimized for x86-64 (AVX-512/AVX2/SSE4.2) and ARM64 (NEON/SVE/SVE2)
- **Maximum SIMD utilization** - 512-bit vector operations where available
- **Branch prediction optimization** - Computed gotos, branchless algorithms, cmov usage
- **Cache hierarchy optimization** - Prefetching, cache-line alignment, TLB-friendly allocation
- **Zero-copy parsing** - Minimal memory allocations during parse operations
- **Lock-free data structures** - Wait-free progress on multi-threaded access
- **Simple C API** - Easy to integrate, no C++ runtime dependencies
### Supported Architectures
| Architecture | SIMD Support | Status |
|--------------|-------------------------------------|--------|
| x86-64 | AVX-512, AVX2, SSE4.2, BMI2, POPCNT | Full |
| ARM64 | SVE2, SVE, NEON | Full |
## Performance
Preliminary benchmark results on available hardware:
### Development System (AMD EPYC 7282 @ AVX2)
Tested with 399-byte JSON document (44 values, nested objects/arrays):
| Operation | Throughput | Time per operation | Memory per value |
|------------|------------|-------------------|------------------|
| Parse | ~290 MB/s | 1.38 μs | 24 bytes |
| Stringify | ~275 MB/s | 1.45 μs | 24 bytes |
**Test Environment:**
- CPU: AMD EPYC 7282 16-Core Processor
- SIMD: AVX2 (256-bit vectors)
- Compiler: GCC with -O3 -march=native
- Test data: Small JSON with mixed types
**Note:** These are preliminary results from development hardware. Performance will vary significantly based on:
- CPU architecture and generation
- Available SIMD instructions (SSE4.2 vs AVX2 vs AVX-512)
- JSON document size and structure
- Memory bandwidth and cache characteristics
The library automatically selects the best SIMD implementation for your CPU at runtime.
*Comprehensive benchmarks comparing against other JSON libraries coming soon.*
## Quick Start
```c
#include
int main(void) {
const char *json = "{\"name\": \"json-asm\", \"fast\": true}";
// Parse JSON
json_doc *doc = json_parse(json, strlen(json));
if (!doc) {
fprintf(stderr, "Parse error\n");
return 1;
}
// Access values
json_val *root = json_doc_root(doc);
json_val *name = json_obj_get(root, "name");
printf("Name: %s\n", json_get_str(name));
// Cleanup
json_doc_free(doc);
return 0;
}
```
## Building
```bash
# Requirements:
# x86-64: NASM 2.15+, GCC 10+/Clang 12+
# ARM64: GCC 10+/Clang 12+ (uses inline assembly)
make
make install # installs to /usr/local by default
# Run tests
make test
# Run benchmarks
make bench
```
See [docs/building.md](docs/building.md) for detailed build instructions.
## Documentation
- [API Reference](docs/api.md) - Complete C API documentation
- [Architecture](docs/architecture.md) - Internal design, SIMD optimizations, CPU features
- [Benchmarks](docs/benchmarks.md) - Performance methodology and comparisons
- [Building](docs/building.md) - Build instructions for x86-64 and ARM64
- [Examples](docs/examples.md) - Usage examples and patterns
## Why Assembly?
Modern compilers are excellent, but JSON parsing has predictable patterns that benefit from manual optimization:
### x86-64 Optimizations
- **AVX-512 masked operations** - Process 64 bytes with predicated loads/stores
- **AVX2 string scanning** - Find structural chars across 32 bytes simultaneously
- **BMI2 PEXT/PDEP** - Branchless bit manipulation for escape handling
- **POPCNT/LZCNT/TZCNT** - O(1) bit counting for mask processing
### ARM64 Optimizations
- **SVE/SVE2 variable-length vectors** - Scalable from 128 to 2048 bits
- **NEON 128-bit operations** - Parallel character classification
- **AdvSIMD dot product** - Fast numeric accumulation
- **Load/store pair instructions** - Efficient memory access patterns
### Micro-architectural Optimizations
- **Branchless number parsing** - Eliminates misprediction penalties (~15 cycles)
- **Computed goto dispatch** - O(1) state machine transitions
- **Explicit prefetching** - Hide memory latency for sequential access
- **Register pressure optimization** - Keep hot data in registers
- **Instruction-level parallelism** - Maximize superscalar execution
See [docs/architecture.md](docs/architecture.md) for technical details.
## Design Approach
json-asm uses hand-optimized assembly to achieve high performance:
| Aspect | json-asm |
|---------------------|----------------------------------|
| Implementation | Native assembly (x86-64 + ARM64) |
| SIMD x86-64 | AVX-512/AVX2/SSE4.2 hand-written |
| SIMD ARM64 | NEON/SVE/SVE2 hand-written |
| String scanning | Custom SIMD routines |
| Number parsing | Branchless assembly algorithms |
| Memory layout | Compact 24 bytes/value |
By controlling every instruction in hot paths, assembly allows for:
- Explicit SIMD instruction selection
- Branchless algorithms to avoid misprediction penalties
- Cache-line alignment and prefetching control
- Register allocation optimization
## CPU Feature Detection
json-asm automatically detects and uses the best available instructions:
```
x86-64 dispatch (best to worst):
AVX-512 + BMI2 → AVX2 + BMI2 → AVX2 → SSE4.2 → Scalar
ARM64 dispatch (best to worst):
SVE2 → SVE → NEON → Scalar
```
Runtime detection ensures optimal performance on any conforming CPU.
## API Overview
```c
// Parsing
json_doc *json_parse(const char *json, size_t len);
json_doc *json_parse_file(const char *path);
// Document access
json_val *json_doc_root(json_doc *doc);
void json_doc_free(json_doc *doc);
// Value inspection
json_type json_get_type(json_val *val);
const char *json_get_str(json_val *val);
double json_get_num(json_val *val);
int64_t json_get_int(json_val *val);
bool json_get_bool(json_val *val);
// Object/Array access
json_val *json_obj_get(json_val *obj, const char *key);
json_val *json_arr_get(json_val *arr, size_t index);
// Serialization
char *json_stringify(json_val *val);
```
See [docs/api.md](docs/api.md) for the complete API reference.
## Requirements
### x86-64
- **CPU**: x86-64-v2 minimum (SSE4.2), x86-64-v4 recommended (AVX-512)
- **OS**: Linux, macOS, Windows, FreeBSD
- **Build**: NASM 2.15+, GCC 10+ or Clang 12+
### ARM64
- **CPU**: ARMv8-A minimum, ARMv9-A recommended (SVE2)
- **OS**: Linux, macOS (Apple Silicon), Android
- **Build**: GCC 10+ or Clang 12+
## License
MIT License. See [LICENSE](LICENSE) for details.
## Contributing
Contributions welcome! Performance improvements should include benchmark results. Please read the architecture docs before working on assembly code.