https://github.com/bitflight-devops/mcp-json-yaml-toml

A structured data reader and writer like 'jq' and 'yq' for AI Agents

# mcp-json-yaml-toml


A token-efficient, schema-aware MCP server for safely reading and modifying JSON, YAML, and TOML files



---

Stop AI coding tools from breaking your data files: no more grep guesswork, hallucinated fields, or data that violates the file's schema. This MCP server gives AI assistants a strict, round-trip-safe interface for working with structured data.

## The Problem

AI coding tools often destroy structured data files:

- They **grep** through huge JSON, YAML, and TOML files (such as JSON logs or AI transcript files) and guess at keys.
- They **hallucinate** fields that never existed.
- They make **sed and regex** edits that leave files in invalid states.
- They break **YAML indentation** and **TOML syntax**.
- They can't **validate** changes before writing.

## The Solution

**mcp-json-yaml-toml** provides AI assistants with proper tools for structured data:

- **Token-efficient**: Extract exactly what you need without loading entire files.
- **Schema validation**: Enforce correctness using SchemaStore.org or custom schemas.
- **Safe modifications**: **Enforced validation on write**; preserve comments and formatting.
- **Multi-format**: JSON, YAML, and TOML through a unified interface.
- **Directive-based detection**: Support for `# yaml-language-server`, `#:schema`, and `$schema` keys in all formats.
- **Constraint-based guided generation**: Native LMQL support for proactive validation of partial inputs.
- **Local-First**: All processing happens locally. No data ever leaves your machine.
- **Transparent JIT Assets**: The server auto-downloads `yq` if missing and fetches missing schemas from SchemaStore.org for local caching.

> [!NOTE]
>
> **JSONC Support**: Files with `.jsonc` extension (JSON with Comments) are fully supported for **reading**, **querying**, and **schema validation**. However, **write operations will strip comments** due to library limitations.
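The "enforced validation on write" guarantee can be illustrated with a common validate-then-atomically-replace pattern. This is a minimal sketch using only the Python standard library; `safe_write_json` and the `require_name` validator are hypothetical names, not the server's API, and the sketch handles plain JSON only (it does not preserve comments or formatting).

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Callable

# Hypothetical validator type: raises ValueError when a document is invalid.
Validator = Callable[[Any], None]


def safe_write_json(path: Path, data: Any, validate: Validator) -> None:
    """Validate first, then replace the file atomically (hypothetical helper, not the server's API)."""
    validate(data)  # refuse to touch the file if the new data is invalid
    text = json.dumps(data, indent=2) + "\n"
    # Write to a temporary file in the same directory, then swap it in with os.replace,
    # so readers see either the old file or the complete new one, never a partial write.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)  # clean up the temp file if anything failed
        raise


def require_name(doc: Any) -> None:
    """Example validator standing in for a real JSON Schema check."""
    if not isinstance(doc, dict) or not isinstance(doc.get("name"), str):
        raise ValueError("document must be an object with a string 'name'")
```

With this pattern, `safe_write_json(Path("config.json"), {"name": 1}, require_name)` raises `ValueError` and leaves `config.json` exactly as it was.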

---

## Getting Started

### Prerequisites

- **Python ≥ 3.11** installed.
- An **MCP-compatible client** (Claude Code, Cursor, Windsurf, Gemini 2.0, n8n, etc.).

### Installation

The server uses `uvx` for automatic dependency management and zero-config execution.

#### AI Agents & CLI Tools

```bash
uvx mcp-json-yaml-toml
```

#### Claude Code (CLI)

```bash
claude mcp add --scope user mcp-json-yaml-toml -- uvx mcp-json-yaml-toml
```

#### Other MCP Clients

Add this to your client's MCP configuration:

```json
{
  "mcpServers": {
    "json-yaml-toml": {
      "command": "uvx",
      "args": ["mcp-json-yaml-toml"]
    }
  }
}
```
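If a client keeps its servers in a JSON file with the `mcpServers` shape shown above, the entry can also be added programmatically without clobbering servers that are already configured. A minimal sketch; `add_server` is a hypothetical helper, the `other`/`node` entry is made-up sample data, and where each client stores this file varies by client.

```python
import json

SERVER_NAME = "json-yaml-toml"


def add_server(config_text: str) -> str:
    """Register the server under "mcpServers", keeping any servers already configured."""
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    # Re-running simply overwrites this one entry, so the operation is idempotent.
    servers[SERVER_NAME] = {"command": "uvx", "args": ["mcp-json-yaml-toml"]}
    return json.dumps(config, indent=2)


# Hypothetical existing config with one unrelated server.
existing = '{"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}'
print(add_server(existing))
```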

> [!TIP]
> See [docs/clients.md](docs/clients.md) for detailed setup guides for Cursor, VS Code, and more.

---

## Schema Discovery & Recognition

The server automatically identifies the correct JSON schema for your files using multiple strategies:

1. **Directives**: Recognizes `# yaml-language-server: $schema=...` and `#:schema ...` directives.
2. **In-File Keys**: Detects `$schema` keys in JSON and YAML (also supports quoted `"$schema"` in TOML).
3. **Local IDE Config**: Discovers schemas from VS Code/Cursor extension settings and caches.
4. **SchemaStore.org**: Performs glob-based auto-detection against thousands of known formats.
5. **Manual Association**: Use the `data_schema` tool to bind a file to a specific schema URL or name.
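A rough sketch of the first two strategies (directives and in-file keys), assuming simple line-based matching. `find_schema` and its regular expressions are illustrative assumptions, not the server's detection code; a real implementation would also cover IDE settings and SchemaStore globs.

```python
import json
import re
from typing import Optional

# Illustrative patterns for the directive and key forms listed above (assumptions).
YAML_LS_DIRECTIVE = re.compile(r"^#\s*yaml-language-server:\s*\$schema=(\S+)")
SCHEMA_DIRECTIVE = re.compile(r"^#:schema\s+(\S+)")
YAML_SCHEMA_KEY = re.compile(r"""^\$schema:\s*["']?([^"'\s]+)""")
TOML_SCHEMA_KEY = re.compile(r'^"\$schema"\s*=\s*"([^"]+)"')


def find_schema(text: str, fmt: str) -> Optional[str]:
    """Return a schema URL found via a comment directive or a $schema key, else None."""
    # 1. Comment directives (YAML and TOML allow comments; plain JSON does not).
    for line in text.splitlines():
        for pattern in (YAML_LS_DIRECTIVE, SCHEMA_DIRECTIVE):
            match = pattern.match(line.strip())
            if match:
                return match.group(1)
    # 2. In-file $schema keys. Line-based simplification: only unindented lines are checked.
    if fmt in ("yaml", "toml"):
        key_pattern = YAML_SCHEMA_KEY if fmt == "yaml" else TOML_SCHEMA_KEY
        for line in text.splitlines():
            match = key_pattern.match(line)
            if match:
                return match.group(1)
    elif fmt == "json":
        try:
            doc = json.loads(text)
        except json.JSONDecodeError:
            return None
        if isinstance(doc, dict) and isinstance(doc.get("$schema"), str):
            return doc["$schema"]
    return None
```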

---

## LMQL & Guided Generation

This server provides native support for **LMQL (Language Model Query Language)** to enable **Guided Generation**. This allows AI agents to validate partial inputs (e.g., path expressions) incrementally before execution.

- **Incremental Validation**: Check partial inputs (e.g., `.data.us`) and get the remaining pattern needed.
- **Improved Reliability**: Eliminate syntax errors by guiding the LLM toward valid tool inputs.
- **Rich Feedback**: Get suggestions and detailed error messages for common mistakes.
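The incremental-validation idea can be illustrated without LMQL: walk a partial input through a small state machine and report whether it is complete, still a valid prefix, or already broken. This sketch does not use LMQL or the server's `constraint_validate` tool; `check_path_prefix` and its toy grammar (dot-separated identifiers and `[n]` indexes) are assumptions that cover only a sliver of real yq path syntax.

```python
import string
from typing import Tuple

IDENT_START = set(string.ascii_letters + "_")
IDENT_CHARS = IDENT_START | set(string.digits + "-")

# What each unfinished state still needs before the input becomes a complete path.
HINTS = {
    "start": "'.' or '['",
    "dot": "an identifier",
    "bracket": "a digit",
    "digits": "a digit or ']'",
}


def check_path_prefix(text: str) -> Tuple[str, str]:
    """Classify a partial path as 'complete', 'partial', or 'invalid', plus a hint.

    Toy grammar (an assumption):  path := segment+ ;  segment := "." ident | "[" digits "]"
    """
    state = "start"
    for i, ch in enumerate(text):
        if state in ("start", "ident", "closed") and ch == ".":
            state = "dot"
        elif state in ("start", "ident", "closed") and ch == "[":
            state = "bracket"
        elif state == "dot" and ch in IDENT_START:
            state = "ident"
        elif state == "ident" and ch in IDENT_CHARS:
            state = "ident"
        elif state in ("bracket", "digits") and ch in string.digits:
            state = "digits"
        elif state == "digits" and ch == "]":
            state = "closed"
        else:
            return "invalid", f"unexpected {ch!r} at position {i}"
    if state in ("ident", "closed"):
        return "complete", "optionally '.' or '[' to go deeper"
    return "partial", f"expected {HINTS[state]}"
```

For example, `.data.` is reported as partial with the hint "expected an identifier", while `data` (missing the leading dot) is rejected at position 0.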

> [!TIP]
> See the [Deep Dive: LMQL Constraints](docs/tools.md#deep-dive-lmql-constraints) for detailed usage examples.

---

## Available Tools

| Tool | Description |
| --------------------- | ---------------------------------------------- |
| `data` | Get, set, or delete values at specific paths |
| `data_query` | Advanced yq/jq expressions for transformations |
| `data_schema` | Manage schemas and validate files |
| `data_convert` | Convert between JSON, YAML, and TOML |
| `data_merge` | Deep merge structured data files |
| `constraint_validate` | Validate inputs against LMQL constraints |
| `constraint_list` | List available generation constraints |

> [!NOTE]
> Conversion **TO TOML** is not supported due to yq's internal encoder limitations for complex structures.
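A minimal sketch of deep-merge semantics as commonly understood: nested mappings merge key by key, and the right-hand document wins on conflicts. It assumes lists are replaced rather than concatenated; how `data_merge` actually treats arrays is not stated here, and `deep_merge` is a hypothetical name.

```python
import copy
from typing import Any


def deep_merge(base: Any, override: Any) -> Any:
    """Merge `override` into `base` without mutating either input (hypothetical helper)."""
    if isinstance(base, dict) and isinstance(override, dict):
        merged = copy.deepcopy(base)
        for key, value in override.items():
            if key in merged:
                merged[key] = deep_merge(merged[key], value)  # recurse into shared keys
            else:
                merged[key] = copy.deepcopy(value)  # new key: take override's value
        return merged
    # Scalars, lists, and type mismatches: the override value replaces the base value.
    return copy.deepcopy(override)
```

Merging `{"server": {"host": "localhost", "port": 80}}` with `{"server": {"port": 8080}}` keeps `host` and updates `port` to `8080`.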

---

## Development

### Setup

```bash
git clone https://github.com/bitflight-devops/mcp-json-yaml-toml.git
cd mcp-json-yaml-toml
uv sync
```

### Testing

```bash
# Run all tests (coverage included)
uv run pytest
```

### Code Quality

The project uses `prek` (a Rust-based pre-commit tool) for unified linting and formatting. AI Agents MUST use the scoped verification command:

```bash
# Recommended: verify only the files you touched (pass their paths after --files)
uv run prek run --files
```

> [!IMPORTANT]
> Avoid `--all-files` during feature development to keep PR diffs clean and preserve git history.

---

## Project Structure

```text
mcp-json-yaml-toml/
├── packages/mcp_json_yaml_toml/   # Core logic
│   ├── server.py                  # MCP implementation
│   ├── yq_wrapper.py              # Binary management
│   └── schemas.py                 # Schema validation
├── .github/                       # CI/CD and assets
├── docs/                          # Documentation
└── pyproject.toml                 # Project config
```


```mermaid
graph TD
    Repo[mcp-json-yaml-toml]
    Repo --> Packages[packages/mcp_json_yaml_toml]
    Repo --> Github[.github]
    Repo --> Docs[docs]
    Repo --> Config[pyproject.toml]

    subgraph "Core Logic"
        Packages --> Server["server.py<br/>MCP Server & Tools"]
        Packages --> Schemas["schemas.py<br/>Schema Validation"]
        Packages --> Constraints["lmql_constraints.py<br/>LMQL Constraints"]
        Packages --> YQ["yq_wrapper.py<br/>Binary Manager"]
        Packages --> YAML["yaml_optimizer.py<br/>YAML Anchors"]
        Packages --> TOML["toml_utils.py<br/>TOML Utils"]
        Packages --> Conf["config.py<br/>Config Manager"]
    end

    style Packages fill:#f9f,stroke:#333,stroke-width:2px
    style Repo fill:#eee,stroke:#333,stroke-width:4px
```

---

## Token Efficiency Experiment

Two identical Claude Code sub-agents were given the same task: read `~/.claude.json` and report every MCP server listed, including command, args, and env vars.

### Setup

- **Agent A** — standard prompt, used the built-in `Read` tool
- **Agent B** — same prompt with one line appended: `You must use the mcp__json-yaml-toml for all file interactions.`

Both agents used the `sonnet` model.

### Prompts

**Agent A prompt:**

```text
Read the file ~/.claude.json and report back:
1. Every MCP server listed in the mcpServers section
2. For each server: the command, args, and any env vars configured

Just report the raw findings. Do not summarize or interpret.
```

**Agent B prompt:**

```text
Read the file ~/.claude.json and report back:
1. Every MCP server listed in the mcpServers section
2. For each server: the command, args, and any env vars configured

You must use the mcp__json-yaml-toml for all file interactions.

Just report the raw findings. Do not summarize or interpret.
```

### Results

Both agents returned identical findings (8 MCP servers with correct configs).

| Metric | Agent A (Read tool) | Agent B (mcp-json-yaml-toml) |
| ---------------- | ------------------- | ---------------------------- |
| **Total tokens** | 37,119 | 28,734 |
| **Tool uses** | 4 | 2 |
| **Duration** | 29.3s | 12.7s |

Agent B used **22.6% fewer tokens** and completed in **43% of the time** with half the tool calls.
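These percentages follow directly from the results table; this snippet recomputes them from the reported figures.

```python
# Figures copied from the results table above.
agent_a = {"tokens": 37_119, "tool_uses": 4, "seconds": 29.3}
agent_b = {"tokens": 28_734, "tool_uses": 2, "seconds": 12.7}

# Relative token savings of Agent B versus Agent A.
token_savings = (agent_a["tokens"] - agent_b["tokens"]) / agent_a["tokens"]
# Agent B's duration as a fraction of Agent A's.
time_ratio = agent_b["seconds"] / agent_a["seconds"]

print(f"token savings: {token_savings:.1%}")  # 22.6%
print(f"time ratio:    {time_ratio:.0%}")     # 43%
print(f"tool calls:    {agent_b['tool_uses']} vs {agent_a['tool_uses']}")
```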

### Why

The `Read` tool loads the entire file into context. `~/.claude.json` is a large file — the agent had to consume all of it to find the `mcpServers` section. The MCP server's `data_query` tool extracted just the `mcpServers` section directly, keeping the context window small.
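A minimal sketch of the difference, using the standard `json` module and a synthetic stand-in for a large config file (not the real `~/.claude.json`). The whole file is parsed, but only the requested section is serialized and handed back; character counts serve as a rough proxy for tokens. `extract_section` is a hypothetical helper; the actual `data_query` tool works with yq expressions instead.

```python
import json


def extract_section(text: str, key: str) -> str:
    """Parse the full document but return only one top-level section (hypothetical helper)."""
    doc = json.loads(text)
    return json.dumps(doc.get(key, {}), indent=2)


# Synthetic stand-in: one small section of interest plus a lot of unrelated data.
sample = json.dumps({
    "mcpServers": {"json-yaml-toml": {"command": "uvx", "args": ["mcp-json-yaml-toml"]}},
    "unrelated": {f"entry-{i}": {"history": ["..."] * 20} for i in range(200)},
}, indent=2)

section = extract_section(sample, "mcpServers")
print(f"full file: {len(sample)} chars, returned section: {len(section)} chars")
```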

---


Built with FastMCP, yq, and LMQL