https://github.com/bug-ops/fast-yaml

Parse YAML at Rust speed. Full 1.2.2 spec, zero unsafe code, built-in linter, parallel processing. Native bindings for Python & Node.js.

# fast-yaml

[![CI Status](https://img.shields.io/github/actions/workflow/status/bug-ops/fast-yaml/ci.yml?branch=main)](https://github.com/bug-ops/fast-yaml/actions)
[![codecov](https://codecov.io/gh/bug-ops/fast-yaml/graph/badge.svg?token=E33WB16NUD)](https://codecov.io/gh/bug-ops/fast-yaml)
[![Crates.io](https://img.shields.io/crates/v/fast-yaml-cli)](https://crates.io/crates/fast-yaml-cli)
[![docs.rs](https://img.shields.io/docsrs/fast-yaml-core)](https://docs.rs/fast-yaml-core)
[![PyPI](https://img.shields.io/pypi/v/fastyaml-rs)](https://pypi.org/project/fastyaml-rs/)
[![npm](https://img.shields.io/npm/v/fastyaml-rs)](https://www.npmjs.com/package/fastyaml-rs)
[![License](https://img.shields.io/badge/license-MIT%2FApache--2.0-blue)](LICENSE-MIT)

**High-performance YAML 1.2.2 parser for Python and Node.js, powered by Rust.**

Drop-in replacement for PyYAML and js-yaml. Matches or beats PyYAML C on small/medium files, **2-4x faster** than pure Python, **1.2-1.4x faster** than js-yaml. Full YAML 1.2.2 Core Schema compliance, comprehensive linting, and multi-threaded parallel processing.

> [!IMPORTANT]
> **YAML 1.2.2 Compliance** — Unlike PyYAML (YAML 1.1), `fast-yaml` follows the modern YAML 1.2.2 specification. This means `yes/no/on/off` are strings, not booleans.

## Installation

```bash
# Python
pip install fastyaml-rs

# Node.js
npm install fastyaml-rs

# CLI
cargo install fast-yaml-cli
```

> [!WARNING]
> Requires Rust 1.88+, Python 3.10+ or Node.js 20+

### Build from source

```bash
git clone https://github.com/bug-ops/fast-yaml.git
cd fast-yaml

# Python
uv sync && uv run maturin develop

# Node.js
cd nodejs && npm install && npm run build
```

## Quick Start

### Python

```python
import fast_yaml

data = fast_yaml.safe_load("""
name: fast-yaml
features: [fast, safe, yaml-1.2.2]
""")

yaml_str = fast_yaml.safe_dump(data)
```

> [!TIP]
> Migrating from PyYAML? Just change your import: `import fast_yaml as yaml`

### Node.js

```typescript
import { safeLoad, safeDump } from 'fastyaml-rs';

const data = safeLoad(`name: fast-yaml`);
const yamlStr = safeDump(data);
```

### CLI

```bash
# Single file operations
fy parse config.yaml # Validate syntax
fy format -i config.yaml # Format in-place
fy convert json config.yaml # YAML → JSON
fy lint config.yaml # Lint with diagnostics

# Batch mode (directories, globs, multiple files)
fy format -i src/ # Format entire directory
fy format -i "**/*.yaml" # Format with glob pattern
fy format -i -j 8 project/ # Parallel processing (8 workers)
fy lint --exclude "tests/**" . # Lint all except tests
```
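When the CLI is driven from a script, the batch invocations above can be assembled as an argument list. The following is a minimal sketch that uses only the flags shown in this section (`-i`, `-j`, `--exclude`); the helper name `build_fy_command` is hypothetical and not part of fast-yaml.

```python
def build_fy_command(subcommand, paths, *, in_place=False, jobs=None, exclude=None):
    """Assemble an argv list for the `fy` CLI (hypothetical helper).

    Only flags that appear in the CLI examples above are emitted.
    """
    argv = ["fy", subcommand]
    if in_place:
        argv.append("-i")                   # format in-place
    if jobs is not None:
        argv += ["-j", str(jobs)]           # number of parallel workers
    if exclude is not None:
        argv += ["--exclude", exclude]      # glob pattern to skip
    argv += list(paths)                     # files, directories, or globs
    return argv


print(build_fy_command("format", ["project/"], in_place=True, jobs=8))
print(build_fy_command("lint", ["."], exclude="tests/**"))
```

The resulting list can be passed to `subprocess.run(argv, check=True)`, which avoids shell quoting issues with glob patterns such as `**/*.yaml`.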

> [!TIP]
> Batch mode activates automatically for directories, globs, or multiple files. Supports parallel processing, include/exclude patterns, and respects `.gitignore`.

## Features

- **High Performance** — Matches PyYAML C on small/medium files, 2-4x faster than pure Python
- **YAML 1.2.2** — Full Core Schema compliance
- **Drop-in API** — Compatible with PyYAML/js-yaml
- **Batch Processing** — Multi-file operations with parallel workers, glob patterns, .gitignore support
- **Linting** — Rich diagnostics with line/column tracking
- **Parallel** — Multi-threaded processing for large files
- **Safe** — Memory-safe Rust with minimal `unsafe` (FFI boundaries only, explicitly documented)

> [!TIP]
> Parallel processing provides 3-6x speedup on 4-8 core systems for multi-document files.


### Linting

```python
from fast_yaml._core.lint import lint

diagnostics = lint("key: value\nkey: duplicate")
for diag in diagnostics:
    # Each diagnostic carries a severity, a message, and a source span
    print(f"{diag.severity}: {diag.message} at line {diag.span.start.line}")
```
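For editor or CI integration it can help to render diagnostics in a `file:line: severity: message` layout. The sketch below assumes only the attributes used in the snippet above (`severity`, `message`, `span.start.line`); the function name and the stand-in diagnostic, including its severity and message text, are made up for illustration and do not show fast-yaml's real output.

```python
from types import SimpleNamespace


def format_diagnostic(diag, filename="<string>"):
    """Render one diagnostic as `file:line: severity: message`."""
    line = diag.span.start.line
    return f"{filename}:{line}: {diag.severity}: {diag.message}"


# Stand-in object with the same attribute shape as the diagnostics above.
fake = SimpleNamespace(
    severity="warning",
    message="duplicate key 'key'",
    span=SimpleNamespace(start=SimpleNamespace(line=2)),
)
print(format_diagnostic(fake, "config.yaml"))
# config.yaml:2: warning: duplicate key 'key'
```

With fast-yaml installed, the objects returned by `lint(...)` would be passed in place of the stand-in.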

### Parallel Processing

```python
from fast_yaml._core.parallel import parse_parallel, ParallelConfig

config = ParallelConfig(thread_count=4, max_input_size=100*1024*1024)
docs = parse_parallel(multi_doc_yaml, config)
```

## Performance

> [!NOTE]
> Three separate benchmark suites: **Python API** (vs PyYAML), **Node.js API** (vs js-yaml), and **CLI Batch Mode** (vs yamlfmt).

> [!NOTE]
> Process startup overhead (~15ms for Python, ~20-25ms for Node.js) affects small file benchmarks. In long-running servers (persistent processes), speedups would be 2-4x higher.

> [!TIP]
> Batch mode is where fast-yaml excels with parallel processing. Use `-j` to specify worker count.


### Python API vs PyYAML

**Parse (loading):**

| File Size | fast-yaml | PyYAML (C) | PyYAML (pure) | vs C | vs pure |
|-----------|-----------|------------|---------------|------|---------|
| Small (502B) | **15.5 ms** | 20.2 ms | 20.8 ms | **1.30x** | **1.34x** |
| Medium (44KB) | **26.3 ms** | 26.4 ms | 61.2 ms | **1.00x** | **2.33x** |
| Large (449KB) | 130.3 ms | **79.3 ms** | 429.6 ms | 0.61x | **3.30x** |

**Dump (serialization):**

| File Size | fast-yaml | PyYAML (C) | PyYAML (pure) | vs C | vs pure |
|-----------|-----------|------------|---------------|------|---------|
| Small (502B) | **15.7 ms** | 20.8 ms | 21.2 ms | **1.33x** | **1.35x** |
| Medium (44KB) | **31.6 ms** | 31.7 ms | 82.7 ms | **1.00x** | **2.62x** |
| Large (449KB) | 177.6 ms | **131.1 ms** | 653.8 ms | 0.74x | **3.68x** |

**Key findings:**
- **Small/Medium files**: fast-yaml matches or beats PyYAML C (1.0-1.3x speedup)
- **Pure Python**: fast-yaml consistently 1.3-3.7x faster across all sizes
- **Large files**: PyYAML's C extension is faster on a single large file (fast-yaml reaches 0.61-0.74x of its speed); for multi-document streams, use fast-yaml's parallel mode

Full benchmarks: [benches/comparison](benches/comparison/)
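The "vs C" and "vs pure" columns are the baseline time divided by the fast-yaml time, so values below 1.0 mean fast-yaml is slower. As a quick check, this sketch recomputes the ratios for the parse table from the timings listed above; the `speedup` helper is only illustrative.

```python
def speedup(baseline_ms, fast_yaml_ms):
    """Ratio of baseline time to fast-yaml time, rounded to two decimals."""
    return round(baseline_ms / fast_yaml_ms, 2)


# (fast-yaml, PyYAML C, PyYAML pure) parse timings in ms, copied from the table
parse_rows = {
    "Small (502B)": (15.5, 20.2, 20.8),
    "Medium (44KB)": (26.3, 26.4, 61.2),
    "Large (449KB)": (130.3, 79.3, 429.6),
}

for label, (fy, c_ext, pure) in parse_rows.items():
    print(f"{label}: vs C {speedup(c_ext, fy):.2f}x, vs pure {speedup(pure, fy):.2f}x")
# Small (502B): vs C 1.30x, vs pure 1.34x
# Medium (44KB): vs C 1.00x, vs pure 2.33x
# Large (449KB): vs C 0.61x, vs pure 3.30x
```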

### Node.js API vs js-yaml (Apple M3 Pro, 12 cores)

**Parse (loading):**

| File Size | fast-yaml | js-yaml | Speedup |
|-----------|-----------|---------|---------|
| Small (502B) | **24.4 ms** | 28.1 ms | **1.15x** |
| Medium (44KB) | **26.2 ms** | 31.9 ms | **1.22x** |
| Large (449KB) | **40.4 ms** | 48.3 ms | **1.20x** |

**Dump (serialization):**

| File Size | fast-yaml | js-yaml | Speedup |
|-----------|-----------|---------|---------|
| Small (502B) | **24.1 ms** | 29.3 ms | **1.22x** |
| Medium (44KB) | **27.1 ms** | 34.9 ms | **1.29x** |
| Large (449KB) | **50.7 ms** | 72.1 ms | **1.42x** |

**Key findings:**
- **Consistent advantage**: fast-yaml 1.15-1.42x faster across all scenarios
- **Best performance**: Large file dump operations (1.42x speedup)
- **V8 JIT competitive**: js-yaml benefits from V8's TurboFan optimization, so the speedup is smaller than the one measured against pure-Python PyYAML
- **Real-world servers**: In persistent processes without startup overhead, expect 2-4x speedup

### CLI Single-File vs yamlfmt (Apple M3 Pro, 12 cores)

| File Size | fast-yaml | yamlfmt | Result |
|-----------|-----------|---------|--------|
| Small (502 bytes) | **1.7 ms** | 3.1 ms | **1.80x faster** ✓ |
| Medium (45 KB) | **2.5 ms** | 2.9 ms | **1.19x faster** ✓ |
| Large (460 KB) | 8.4 ms | **2.9 ms** | yamlfmt 2.88x faster |

### CLI Batch Mode vs yamlfmt

| Workload | fast-yaml (parallel) | yamlfmt (sequential) | Speedup |
|----------|---------------------|----------------------|---------|
| 50 files (26 KB) | **4.3 ms** | 10.3 ms | **2.40x faster** ✓ |
| 200 files (204 KB) | **8.0 ms** | 52.7 ms | **6.63x faster** ✓ |
| 500 files (1 MB) | **15.5 ms** | 244.7 ms | **15.77x faster** ⚡ |
| 1000 files (1 MB) | **23.4 ms** | 323.4 ms | **13.80x faster** ⚡ |

**Key takeaway:** Batch mode with parallel workers is 2.4-15.8x faster than yamlfmt in these runs (6.6x or more from 200 files up), which makes it well suited to formatting entire codebases.

```bash
# Run benchmarks
bash benches/comparison/scripts/run_python_benchmark.sh # Python API
bash benches/comparison/scripts/run_nodejs_benchmark.sh # Node.js API
bash benches/comparison/scripts/run_batch_benchmark.sh # CLI batch mode
```

**Test environment:** macOS 14, Apple M3 Pro (12 cores), fast-yaml 0.4.1, PyYAML 6.0.3, js-yaml 4.1.1, Node.js 25.2.1, yamlfmt 0.21.0
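Because the figures above include process startup, an in-process measurement is a useful complement when estimating behaviour in a long-running service. The following is a rough sketch, not the project's benchmark harness: `best_of` is a hypothetical helper, and `json.loads` is used as a stand-in parser so the snippet runs without fast-yaml installed.

```python
import json
import time


def best_of(fn, arg, repeat=5):
    """Return the fastest wall-clock time in milliseconds over `repeat` calls."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(arg)
        best = min(best, (time.perf_counter() - start) * 1000.0)
    return best


# Stand-in workload; swap json.loads for fast_yaml.safe_load or yaml.safe_load
# to compare parsers inside one already-started interpreter.
sample = json.dumps({"name": "fast-yaml", "features": ["fast", "safe"]})
elapsed_ms = best_of(json.loads, sample)
print(f"best of 5: {elapsed_ms:.4f} ms")
```

Taking the minimum over several runs reduces noise from the scheduler and from cold caches on the first call.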

## YAML 1.2.2 Differences

The table below lists scalars that fast-yaml (YAML 1.2.2) resolves differently from PyYAML (YAML 1.1):

| Feature | PyYAML (YAML 1.1) | fast-yaml (YAML 1.2.2) |
|---------|-------------------|------------------------|
| `yes/no` | `True/False` | `"yes"/"no"` (strings) |
| `on/off` | `True/False` | `"on"/"off"` (strings) |
| `014` (YAML 1.1 octal) | `12` | `14` (decimal) |
| `0o14` (YAML 1.2 octal) | `"0o14"` (string) | `12` |

```python
fast_yaml.safe_load("yes") # "yes" (string, not True!)
fast_yaml.safe_load("0o14") # 12 (octal)
fast_yaml.safe_load("014") # 14 (decimal, NOT octal!)
```
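Before migrating, it can be useful to scan existing files for plain scalars whose meaning changes. The sketch below is a hypothetical audit helper, not part of fast-yaml: it flags the `yes/no/on/off` spellings and leading-zero octals from the table above, and it is not an exhaustive list of YAML 1.1 versus 1.2 differences.

```python
import re
from typing import Optional

# Spellings PyYAML resolves to booleans that YAML 1.2.2 treats as strings
_LEGACY_BOOL = re.compile(r"^(yes|Yes|YES|no|No|NO|on|On|ON|off|Off|OFF)$")
# Leading-zero integers: octal in YAML 1.1, decimal in YAML 1.2.2
_LEGACY_OCTAL = re.compile(r"^[-+]?0[0-7]+$")


def changed_in_yaml_12(scalar: str) -> Optional[str]:
    """Return a short tag if this plain scalar changes meaning, else None."""
    if _LEGACY_BOOL.match(scalar):
        return "bool"    # was True/False, now a string
    if _LEGACY_OCTAL.match(scalar):
        return "octal"   # was base 8, now base 10
    return None


for value in ["yes", "off", "014", "0o14", "true", "0"]:
    print(value, "->", changed_in_yaml_12(value))
# yes -> bool
# off -> bool
# 014 -> octal
# 0o14 -> None
# true -> None
# 0 -> None
```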

## API Reference

### Loading YAML

```python
# Single document
data = fast_yaml.safe_load(yaml_string)

# Multiple documents
for doc in fast_yaml.safe_load_all(yaml_string):
    print(doc)

# PyYAML-compatible
data = fast_yaml.load(yaml_string, Loader=fast_yaml.SafeLoader)
```

### Dumping YAML

```python
yaml_str = fast_yaml.safe_dump(data)

# With options
yaml_str = fast_yaml.dump(
    data,
    indent=2,
    width=80,
    explicit_start=True,
    sort_keys=False,
)

# Multiple documents
yaml_str = fast_yaml.safe_dump_all([doc1, doc2, doc3])
```

### Type mappings

| YAML Type | Python Type |
|-----------|-------------|
| `null`, `~` | `None` |
| `true`, `false` | `bool` |
| `123`, `0x1F`, `0o17` | `int` |
| `1.23`, `.inf`, `.nan` | `float` |
| `"string"`, `'string'` | `str` |
| `[a, b, c]` | `list` |
| `{a: 1, b: 2}` | `dict` |
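Code that consumes loaded data can assert that it only contains the Python types listed in the table. The following is a small sketch under that assumption; `is_plain_yaml_value` is a hypothetical helper and not part of fast-yaml.

```python
def is_plain_yaml_value(value):
    """True if `value` is built only from the types in the mapping table."""
    if value is None or isinstance(value, (bool, int, float, str)):
        return True
    if isinstance(value, list):
        return all(is_plain_yaml_value(item) for item in value)
    if isinstance(value, dict):
        return all(
            is_plain_yaml_value(key) and is_plain_yaml_value(item)
            for key, item in value.items()
        )
    return False  # sets, tuples, custom objects, ...


print(is_plain_yaml_value({"name": "fast-yaml", "tags": ["a", 1, 2.5, None, True]}))
# True
print(is_plain_yaml_value({"tags": {"a", "b"}}))
# False
```

A check like this is also handy before dumping, since it catches values that have no entry in the table.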

## Security

Input validation prevents denial-of-service attacks.

**Security limits:**

| Limit | Default | Configurable |
|-------|---------|--------------|
| Max input size | 100 MB | Yes (up to 1GB) |
| Max documents | 100,000 | Yes (up to 10M) |
| Max threads | 128 | Yes |
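Callers that accept untrusted input may also want to reject oversized payloads before handing them to any parser. This is a minimal application-side sketch that mirrors the 100 MB default from the table; the function and constant names are hypothetical, and it does not show how fast-yaml enforces its limits internally.

```python
DEFAULT_MAX_INPUT_BYTES = 100 * 1024 * 1024  # mirrors the 100 MB default above


def check_input_size(data, max_bytes=DEFAULT_MAX_INPUT_BYTES):
    """Return the payload size in bytes, or raise ValueError if it is too large."""
    size = len(data.encode("utf-8")) if isinstance(data, str) else len(data)
    if size > max_bytes:
        raise ValueError(f"input is {size} bytes, limit is {max_bytes}")
    return size


print(check_input_size("name: fast-yaml\n"))
# 16
```

Measuring the UTF-8 encoded length rather than the character count keeps the check meaningful for non-ASCII documents.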

## Project

### Project structure

```
fast-yaml/
├── crates/
│   ├── fast-yaml-core/      # Core YAML parser/emitter
│   ├── fast-yaml-linter/    # Linting engine
│   ├── fast-yaml-parallel/  # Multi-threaded processing
│   └── fast-yaml-ffi/       # FFI utilities
├── python/                  # PyO3 Python bindings
├── nodejs/                  # NAPI-RS Node.js bindings
└── Cargo.toml               # Workspace manifest
```

### Technology stack

| Component | Library |
|-----------|---------|
| YAML Parser | [saphyr](https://github.com/saphyr-rs/saphyr) |
| Python Bindings | [PyO3](https://pyo3.rs/) |
| Node.js Bindings | [NAPI-RS](https://napi.rs/) |
| Parallelism | [Rayon](https://github.com/rayon-rs/rayon) |

**Rust 2024 Edition** • **Python 3.10+** • **Node.js 20+**

## Contributing

Contributions welcome! All PRs must pass CI checks:

```bash
cargo +nightly fmt --all
cargo clippy --workspace --all-targets -- -D warnings
cargo nextest run --workspace
```

## FAQ

### Why not just use PyYAML?

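PyYAML is excellent. Use fast-yaml when you need more speed than pure-Python PyYAML (1.3-3.7x faster in the benchmarks above, and on par with PyYAML's C extension for small and medium files), YAML 1.2.2 compliance, built-in linting, or parallel processing.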

### Is this a drop-in replacement?

For `safe_*` functions, yes. Just change `import yaml` to `import fast_yaml as yaml`. Note that YAML 1.2.2 has different boolean/octal handling.
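Projects that cannot require fast-yaml everywhere sometimes pick the module at import time. The sketch below shows one way to do that with the standard library; `first_available` is a hypothetical helper. Keep in mind that a fallback to PyYAML also falls back to YAML 1.1 boolean and octal handling, so the two branches are not behaviourally identical.

```python
import importlib
import importlib.util


def first_available(candidates):
    """Import and return the first installed top-level module from `candidates`."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return importlib.import_module(name)
    raise ImportError(f"none of {list(candidates)} is installed")


# Intended use (commented out so the snippet runs without either package):
# yaml = first_available(["fast_yaml", "yaml"])
# data = yaml.safe_load("name: fast-yaml")

# Demonstration with a module that is always present:
print(first_available(["no_such_module_xyz_123", "json"]).__name__)
# json
```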

### When should I use parallel processing?

Use `parse_parallel()` for multi-document YAML files (separated by `---`) larger than 1MB with 4+ CPU cores. For single documents, use `safe_load()`.
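The rule of thumb above can be written down as a small dispatch check. This is a rough sketch with hypothetical helper names: it counts `---` lines with a regular expression, so a `---` inside a block scalar would be miscounted, and the thresholds are simply the ones quoted in the answer.

```python
import os
import re

# A document separator: `---` at the start of a line, followed by whitespace or EOL
_DOC_SEPARATOR = re.compile(r"^---(?:\s|$)", re.MULTILINE)


def count_documents(text):
    """Estimate the number of documents in a YAML stream (heuristic)."""
    stripped = text.lstrip()
    separators = sum(1 for _ in _DOC_SEPARATOR.finditer(stripped))
    if _DOC_SEPARATOR.match(stripped):
        separators -= 1  # a leading `---` opens the first document
    return separators + 1


def should_use_parallel(text, cpu_count=None, min_bytes=1024 * 1024, min_cores=4):
    """Apply the >1MB, multi-document, 4+ cores rule of thumb."""
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    size = len(text.encode("utf-8"))
    return size > min_bytes and cores >= min_cores and count_documents(text) > 1


big_stream = "a: 1\n---\n" * 200_000  # about 1.8 MB, many documents
print(should_use_parallel(big_stream, cpu_count=8))
# True
print(should_use_parallel("a: 1\n---\nb: 2\n", cpu_count=8))
# False
```

When the check returns `True`, the input would go to `parse_parallel()`; otherwise `safe_load()` or `safe_load_all()` is the simpler choice.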

## License

Licensed under [MIT](LICENSE-MIT) or [Apache-2.0](LICENSE-APACHE) at your option.