https://github.com/argahsuknesib/toon-ld

Token Oriented Object Notation (TOON) for Linked Data

context-window json-ld knowledge-graph linked-data llm rag rdf rust semantic-web serialization token-optimization token-oriented-object-notation wasm

Last synced: 7 months ago

Host: GitHub
URL: https://github.com/argahsuknesib/toon-ld
Owner: argahsuknesib
License: mit
Created: 2025-12-19T12:43:52.000Z (7 months ago)
Default Branch: main
Last Pushed: 2025-12-21T23:31:26.000Z (7 months ago)
Last Synced: 2025-12-23T10:39:55.823Z (7 months ago)
Topics: context-window, json-ld, knowledge-graph, linked-data, llm, rag, rdf, rust, semantic-web, serialization, token-optimization, token-oriented-object-notation, wasm
Language: Rust
Homepage:
Size: 1 MB
Stars: 6
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

# TOON-LD

[![npm](https://img.shields.io/npm/v/toon-ld)](https://www.npmjs.com/package/toon-ld)

[![PyPI](https://img.shields.io/pypi/v/toon-ld)](https://pypi.org/project/toon-ld/)

[![Crates.io](https://img.shields.io/crates/v/toon-ld)](https://crates.io/crates/toon-ld)

**Token-Oriented Object Notation for Linked Data** — A lossless **Knowledge Graph Compression** format for LLM Context Windows.

TOON-LD reduces token usage by **40-60%** compared to JSON-LD, so about 1.7 to 2.5 times as much structured data fits into the same prompt budget in RAG (Retrieval-Augmented Generation) applications.

It works by extending standard TOON syntax with Linked Data semantics, meaning **every valid TOON-LD document is also a valid TOON document**. Base TOON parsers can process it natively, while TOON-LD processors unlock the full semantic graph.

## Why TOON-LD?

**The Problem:** Knowledge graphs serialized as JSON-LD are verbose because every node repeats its full set of keys. Using them in RAG pipelines burns through token budgets and hits context limits fast.

**The Solution:** TOON-LD acts as a compression layer. It combines the semantic expressiveness of RDF with radical token efficiency through tabular arrays. By eliminating repetitive keys and using CSV-like rows for uniform data, TOON-LD fits significantly more information into LLM context windows without losing structure.

## Features

- **Pure TOON Extension**: Every TOON-LD document is valid TOON, just as every JSON-LD document is valid JSON

- **Tabular Arrays**: Serialize arrays of objects as CSV-like rows with shared headers

- **40-60% Token Reduction**: Fewer tokens means lower costs and more data in context

- **Full JSON-LD Compatibility**: Round-trip conversion without data loss

- **All JSON-LD 1.1 Keywords**: Complete support for `@context`, `@graph`, `@id`, `@type`, value nodes, etc.

- **Cross-Platform**: Rust, WebAssembly (npm), and Python (PyPI) implementations

- **High Performance**: Optimized serialization with automatic tabular array detection

## Benchmarks

Real-world token savings across different dataset sizes:

| Records | JSON-LD Size | TOON-LD Size | Size Saved | Tokens Saved |
|---------|--------------|--------------|------------|--------------|
| 10      | 862 B        | 518 B        | **39.9%**  | **54.2%**    |
| 100     | 8,782 B      | 5,109 B      | **41.8%**  | **56.3%**    |
| 1,000   | 90,682 B     | 53,710 B     | **40.8%**  | **56.5%**    |
| 10,000  | 936,682 B    | 566,711 B    | **39.5%**  | **53.4%**    |

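**Key takeaway**: Savings hold steady as datasets grow. From 10 to 10,000 records, TOON-LD saves 53-57% of tokens and about 40% of bytes, which matters most when data must fit in an LLM context window.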
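
The "Size Saved" column can be re-derived from the byte counts in the table. The sketch below does that arithmetic and also turns a savings percentage into a capacity multiplier for a fixed budget. The function names are illustrative and not part of the toon-ld API.

```python
# Byte sizes copied from the benchmark table: records -> (JSON-LD, TOON-LD).
SIZES = {
    10: (862, 518),
    100: (8_782, 5_109),
    1_000: (90_682, 53_710),
    10_000: (936_682, 566_711),
}


def percent_saved(before: int, after: int) -> float:
    """Relative reduction in percent, rounded to one decimal place."""
    return round(100 * (before - after) / before, 1)


def capacity_multiplier(saved_percent: float) -> float:
    """How many times more data fits in a fixed budget after the reduction."""
    return 1 / (1 - saved_percent / 100)


size_saved = {n: percent_saved(a, b) for n, (a, b) in SIZES.items()}
print(size_saved)
print(round(capacity_multiplier(54.2), 2))
```

A 54.2% token reduction therefore corresponds to a bit more than twice as many records in the same context window.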

### Sparsity Analysis

TOON-LD's efficiency depends on data sparsity. Shape-based partitioning (enabled by default) ensures TOON-LD remains efficient even for highly heterogeneous data.

![Token Efficiency Graph](docs/images/benchmark_sparsity.png)

![Savings Percentage Graph](docs/images/benchmark_savings.png)

- **Low Sparsity (0-30%)**: Both Union and Partition approaches save ~40-50% tokens.

- **High Sparsity (60%+)**: Partitioning significantly outperforms the Union schema, maintaining efficiency where standard tabular formats fail.

#### Token Cost Analysis

**Union Schema**: High cost when `null_count` is large (sparse data).

**Partitioned Schema**: Low cost when partitions have dense, non-overlapping fields.

**Break-even point**: At roughly 30% sparsity, the two approaches cost about the same.

**Partitioning excels when:**

- High field diversity (heterogeneous graphs)

- Large datasets

- Mixed entity types
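
To make the Union versus Partition trade-off concrete, here is a minimal sketch that groups records by their exact key set and compares the number of table cells each approach has to emit. It uses cell count as a rough proxy for token cost and ignores the extra header line each partition needs. The function names are hypothetical, and this is not necessarily how the library's shape-based partitioning is implemented.

```python
from collections import defaultdict


def union_cells(records):
    """Cells in one table whose header is the union of all keys (gaps become null)."""
    header = set().union(*(r.keys() for r in records))
    return len(header) * len(records)


def partition_by_shape(records):
    """Group records by their exact key set, so every resulting table is dense."""
    groups = defaultdict(list)
    for record in records:
        groups[tuple(sorted(record))].append(record)
    return dict(groups)


def partitioned_cells(records):
    """Total cells across all per-shape tables."""
    shapes = partition_by_shape(records)
    return sum(len(shape) * len(rows) for shape, rows in shapes.items())


records = [
    {"@id": "ex:1", "foaf:name": "Alice", "foaf:age": 30},
    {"@id": "ex:2", "foaf:name": "Bob", "foaf:age": 25},
    {"@id": "ex:3", "schema:price": 9.5, "schema:sku": "A1"},
    {"@id": "ex:4", "schema:price": 12.0, "schema:sku": "B2"},
]

print(union_cells(records))        # 5 columns x 4 rows
print(partitioned_cells(records))  # two dense tables of 3 columns x 2 rows
```

In this toy dataset the union table has 20 cells, 8 of them null (40% sparsity), while the two partitions need only 12 cells between them.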

## Quick Example

**JSON-LD:**

```json
{
  "@context": {
    "foaf": "http://xmlns.com/foaf/0.1/"
  },
  "@graph": [
    {"@id": "ex:1", "@type": "foaf:Person", "foaf:name": "Alice", "foaf:age": 30},
    {"@id": "ex:2", "@type": "foaf:Person", "foaf:name": "Bob", "foaf:age": 25}
  ]
}
```

**TOON-LD:**

```
@context:
  foaf: http://xmlns.com/foaf/0.1/
@graph[2]{@id,@type,foaf:age,foaf:name}:
  ex:1, foaf:Person, 30, Alice
  ex:2, foaf:Person, 25, Bob
```

Notice how object keys appear once in the header instead of repeating for each object.
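
The header-plus-rows layout can be sketched in a few lines of Python. This is a simplified illustration rather than the library's encoder: it assumes every object has the same keys, sorts the field names alphabetically (which happens to match the example), and writes values with `str()` without any quoting or escaping. `encode_table` is a hypothetical helper name.

```python
def encode_table(key, objects, sep=", "):
    """Render a uniform list of dicts as a TOON-style tabular array.

    Simplification: values are written with str() and never quoted or escaped.
    """
    fields = sorted(objects[0])
    if any(sorted(obj) != fields for obj in objects):
        raise ValueError("all objects must share the same keys")
    # Field names appear once in the header; each object becomes one row.
    header = f"{key}[{len(objects)}]{{{','.join(fields)}}}:"
    rows = ["  " + sep.join(str(obj[f]) for f in fields) for obj in objects]
    return "\n".join([header, *rows])


graph = [
    {"@id": "ex:1", "@type": "foaf:Person", "foaf:name": "Alice", "foaf:age": 30},
    {"@id": "ex:2", "@type": "foaf:Person", "foaf:name": "Bob", "foaf:age": 25},
]

print(encode_table("@graph", graph))
```

Run on the two `foaf:Person` nodes, this reproduces the `@graph[2]{...}` block of the TOON-LD example.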

## How TOON-LD Extends TOON

Just as JSON-LD extends JSON by adding semantic meaning to certain key names (those starting with `@`), TOON-LD extends TOON the same way:

- **No new syntax**: TOON-LD uses only standard TOON syntax (objects, arrays, tabular format)

- **Semantic interpretation**: Keys like `@context`, `@id`, `@type` have special JSON-LD meaning

- **Full compatibility**: Any TOON parser can parse TOON-LD documents

- **Value nodes**: Language tags and datatypes use tabular format for efficiency

Example value node with language tag:

```
title[2]{@value,@language}:
  The Hobbit,en
  Der Hobbit,de
```

This is standard TOON tabular syntax that base TOON parsers handle natively, while TOON-LD processors interpret it as JSON-LD value nodes.
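
As an illustration of the decoding direction, the sketch below parses one tabular block into a list of dictionaries, which for `@value`/`@language` headers are exactly JSON-LD value nodes. It is a naive reader written for this example: it splits rows on commas and does not handle quoted values, escapes, or nesting, so it is not a substitute for a real TOON parser. `decode_table` and `HEADER` are hypothetical names.

```python
import re

# key[count]{field1,field2,...}:
HEADER = re.compile(r"^(?P<key>[^\[\s]+)\[(?P<count>\d+)\]\{(?P<fields>[^}]*)\}:$")


def decode_table(text):
    """Parse one tabular block into (key, rows). Naive: comma split, no quoting."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    match = HEADER.match(lines[0].strip())
    if match is None:
        raise ValueError("not a tabular array header")
    fields = match["fields"].split(",")
    rows = [
        dict(zip(fields, (cell.strip() for cell in ln.split(","))))
        for ln in lines[1:]
    ]
    if len(rows) != int(match["count"]):
        raise ValueError("row count does not match the declared length")
    return match["key"], rows


block = """title[2]{@value,@language}:
  The Hobbit,en
  Der Hobbit,de
"""

key, nodes = decode_table(block)
print(key, nodes)
```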

## Installation

### Rust

```toml
[dependencies]
toon-ld = "0.2"
```

### CLI

```bash
cargo install toon-cli
```

### Python

```bash
pip install toon-ld
```

### JavaScript/TypeScript

```bash
npm install toon-ld
```

## Quick Start

### CLI

```bash
# Convert JSON-LD to TOON-LD
toon-ld convert -i data.jsonld -o data.toon

# Convert back to JSON-LD
toon-ld convert -i data.toon -o data.jsonld

# Run benchmark
toon-ld benchmark --max-records 10000
```

### Rust

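```rust
use toon_ld::{encode, decode};

// `?` needs a function that returns Result. Boxing the error assumes the
// crate's error type implements std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let json_ld = r#"{"@context": {"foaf": "http://xmlns.com/foaf/0.1/"}, "foaf:name": "Alice"}"#;
    let toon = encode(json_ld)?; // JSON-LD string -> TOON-LD string
    let back = decode(&toon)?; // TOON-LD string -> JSON-LD string
    println!("{toon}\n{back}");
    Ok(())
}
```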

### Python

```python
import toon_ld

json_ld = '{"@context": {"foaf": "http://xmlns.com/foaf/0.1/"}, "foaf:name": "Alice"}'
toon_str = toon_ld.encode(json_ld)
json_str = toon_ld.decode(toon_str)
```

### JavaScript

```javascript
import { encode, decode, parse, stringify } from 'toon-ld';

// String conversion
const jsonLd = '{"@context": {"foaf": "http://xmlns.com/foaf/0.1/"}, "foaf:name": "Alice"}';
const toon = encode(jsonLd);
const json = decode(toon);

// Object helpers
const data = parse(toon); // Returns JS Object
console.log(data['foaf:name']); // "Alice"
const toonStr = stringify({ "foaf:name": "Bob" }); // Takes JS Object
```

## Key Concepts

### Tabular Arrays

Arrays of objects share a header with field names, followed by CSV-like rows:

```
@context:
  foaf: http://xmlns.com/foaf/0.1/
  vcard: http://www.w3.org/2006/vcard/ns#
foaf:knows[3]{foaf:name,foaf:age,vcard:locality}:
  Alice, 30, null
  Bob, null, Portland
  Carol, 28, Seattle
```

### Value Nodes

Language tags and datatypes use standard TOON object or tabular syntax:

```
@context:
  dc: http://purl.org/dc/terms/
  schema: http://schema.org/
  xsd: http://www.w3.org/2001/XMLSchema#
dc:title:
  @value: Bonjour
  @language: fr
schema:datePublished:
  @value: "2024-01-15"
  @type: xsd:date
```

Or using tabular format for multiple values:

```
dc:titles[2]{@value,@language}:
  Bonjour,fr
  Hello,en
```

### Context Support

Automatic URI compaction using `@context`:

```
@context:
  foaf: http://xmlns.com/foaf/0.1/
foaf:name: Alice
```
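
Prefix-based compaction can be sketched as a pair of small functions. This is a deliberately reduced version of what JSON-LD compaction does: it only handles simple `prefix -> namespace IRI` mappings and picks the longest matching namespace. `compact_iri` and `expand_iri` are hypothetical helper names, not functions exported by toon-ld.

```python
def compact_iri(iri, context):
    """Replace the longest matching namespace IRI with its prefix."""
    best = None
    for prefix, namespace in context.items():
        if iri.startswith(namespace) and (best is None or len(namespace) > len(best[1])):
            best = (prefix, namespace)
    if best is None:
        return iri  # no prefix applies, keep the full IRI
    prefix, namespace = best
    return f"{prefix}:{iri[len(namespace):]}"


def expand_iri(term, context):
    """Inverse of compact_iri for terms whose prefix is defined in the context."""
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix
    return term


context = {"foaf": "http://xmlns.com/foaf/0.1/"}

compacted = compact_iri("http://xmlns.com/foaf/0.1/name", context)
print(compacted)
print(expand_iri(compacted, context))
```

Because each namespace IRI is written once in `@context`, every property that uses it pays only for the short prefix.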

## Project Structure

- `toon-core/` - Core Rust implementation

- `toon-cli/` - Command-line tool

- `toon-wasm/` - WebAssembly bindings (npm)

- `toon-py/` - Python bindings (PyPI)

## Building from Source

```bash
# Run every command from the repository root

# Build all workspace members
cargo build --release

# Run tests
cargo test --workspace

# Build WASM package (the subshell leaves the working directory unchanged)
(cd toon-wasm && wasm-pack build --target web)

# Build Python wheel
(cd toon-py && maturin build --release)
```

## License

MIT License - See [LICENSE](LICENSE) for details.
