https://github.com/fabio-rovai/tardygrada

Trust infrastructure for AI agents. Know who produced a value, when, and that it hasn't been tampered with. Zero dependencies. Pure C.

agent-framework ai-agents ai-safety byzantine-fault-tolerance c coq cryptography ed25519 formal-verification hallucination-detection llm mcp mcp-server ontology programming-language verification zero-dependencies


Host: GitHub
URL: https://github.com/fabio-rovai/tardygrada
Owner: fabio-rovai
License: mit
Created: 2026-03-31T14:18:51.000Z (4 months ago)
Default Branch: main
Last Pushed: 2026-04-09T12:48:24.000Z (4 months ago)
Last Synced: 2026-04-11T13:38:54.888Z (4 months ago)
Topics: agent-framework, ai-agents, ai-safety, byzantine-fault-tolerance, c, coq, cryptography, ed25519, formal-verification, hallucination-detection, llm, mcp, mcp-server, ontology, programming-language, verification, zero-dependencies
Language: C
Homepage:
Size: 32.7 MB
Stars: 17
Watchers: 0
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


[![CI](https://github.com/fabio-rovai/tardygrada/actions/workflows/ci.yml/badge.svg)](https://github.com/fabio-rovai/tardygrada/actions/workflows/ci.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)

Catch lazy agents, contradicting claims, and tampered data


---

## Your agent says it checked three sources. Did it?

Your document says "completed on time" on page 2 and "delayed 3 months" on page 7. Did anyone notice?

Your scoring pipeline passed through 5 agents. Can you prove the scores weren't changed along the way?

```bash
git clone https://github.com/fabio-rovai/tardygrada && cd tardygrada && make
tardy run "Paris is in France"                    # VERIFIED (80%)
tardy verify-doc report.md                        # 2 contradictions found
tardy daemon start && tardy run "check this"      # persistent, remembers everything
```

---

## What it does

### Catches lazy agents

Your agent claims it queried the knowledge base, consulted sources, and cross-checked. Tardygrada records every operation independently — like a dashcam. If the agent faked it, you'll know.

| What the agent did | Laziness type | Caught? |
|---|---|:-:|
| Did nothing, produced output anyway | NoWork | Yes |
| Skimmed instead of analyzing | ShallowWork | Yes |
| Fabricated evidence of work | FakeProof | Yes |
| Copied another agent's answer | CopiedWork | Yes |
| "Verified" itself in a circle | CircularVerification | Yes |
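The dashcam idea can be illustrated by comparing what an agent reports with what an independent recorder observed. The sketch below is not Tardygrada's implementation: the struct layout, the function name `check_work`, and the decision rules are assumptions made for illustration, and only the labels NoWork and FakeProof come from the table above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdicts. NoWork and FakeProof mirror the table above;
   the enum itself is an assumption of this sketch. */
typedef enum { WORK_OK, WORK_NONE, WORK_FAKE_PROOF } work_verdict;

/* What the agent says it did versus what an independent recorder saw. */
typedef struct {
    size_t claimed_queries;   /* operations the agent reports */
    size_t recorded_queries;  /* operations the recorder logged */
} work_trace;

/* Compare the agent's claim with the independent record. */
work_verdict check_work(const work_trace *t)
{
    assert(t != NULL);
    if (t->recorded_queries == 0)
        return WORK_NONE;         /* output produced with no recorded work */
    if (t->claimed_queries > t->recorded_queries)
        return WORK_FAKE_PROOF;   /* claims more work than was observed */
    return WORK_OK;
}
```

The point of the design is that the verdict depends only on the recorder's log, so an agent cannot improve its result by describing work it never performed.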

### Catches contradicting claims

"The project was completed on time." and "The project was delayed by 3 months." — both sound fine alone. Together, they're a contradiction. Existing tools check claims one by one and miss this.

Tardygrada checks them together. Three layers:

- Logical contradictions (direct opposites, impossible combinations)

- Numeric contradictions (the math doesn't add up)

- Domain contradictions (the science doesn't work)

```bash
tardy verify-doc paper.md
# [CONFLICT] Lines 42 vs 89:
#   "We used no external APIs"
#   "API costs totalled $2,400"
#   → claims no APIs but reports API costs
```

### Catches tampered data

A score of 8.5 stored in a Python dict — any agent can silently change it to 9.5. In Tardygrada, values are locked by the operating system. Tampering requires breaking SHA-256 or forging an ed25519 signature.
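To show how a stored digest exposes a changed value, here is a small C sketch. It deliberately swaps in FNV-1a, a non-cryptographic hash, in place of SHA-256 so the example stays short. That substitution is an assumption of the sketch and gives none of SHA-256's security. The names `sealed_score`, `seal_score`, and `score_intact` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, a NON-cryptographic stand-in for SHA-256. It demonstrates the
   recompute-and-compare check, not the security level. */
uint64_t fnv1a64(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;              /* FNV prime */
    }
    return h;
}

/* A value stored together with the digest taken when it was sealed. */
typedef struct {
    double value;
    uint64_t digest;
} sealed_score;

sealed_score seal_score(double value)
{
    sealed_score s;
    s.value = value;
    s.digest = fnv1a64(&s.value, sizeof s.value);
    return s;
}

/* Recompute the digest and compare it with the stored one. */
bool score_intact(const sealed_score *s)
{
    assert(s != NULL);
    return fnv1a64(&s->value, sizeof s->value) == s->digest;
}
```

Sealing 8.5 and then overwriting the value with 9.5 makes `score_intact` return `false`, because the stored digest no longer matches the bytes.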

---

## Get started

**Just the CLI:**

```bash
make                                    # builds in < 3 seconds
tardy run "your claim here"             # verify anything
tardy verify-doc your-file.md           # scan for contradictions
```

**Persistent mode** (remembers between runs):

```bash
tardy daemon start                      # start background service
tardy run "claim"                       # uses persistent knowledge base
tardy daemon status                     # see what it knows
tardy daemon stop                       # clean shutdown
```

**Inside Claude Code** (MCP server):

```json
{
  "mcpServers": {
    "tardygrada": {
      "command": "tardygrada",
      "args": ["mcp-bridge"]
    }
  }
}
```

Then just ask: *"verify this document for contradictions"*

**Inside Claude Code** (session monitor):

```bash
/targyactivate
```

Activates Tardygrada as a contradiction monitor for the entire session. Every claim you and Claude make is recorded in the palace memory and checked against session history. If either side contradicts itself, Tardygrada flags it. Say `targy off` to deactivate.

**Inside Qwen Code** (MCP server):

Qwen Code uses newline-delimited JSON-RPC instead of Content-Length framing. Use the included adapter:

```json
{
  "mcpServers": {
    "tardygrada": {
      "command": "/bin/bash",
      "args": ["path/to/tardygrada/hooks/targy-mcp-wrapper.sh"]
    }
  }
}
```

This gives Qwen Code access to `verify_claim`, `verify_document`, `spawn_agent`, `read_agent`, and `daemon_status` as native MCP tools. The wrapper starts the daemon automatically if it isn't running.
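The framing difference the adapter bridges can be sketched in C. The included wrapper is a shell script, so the function below is only an illustration of one direction of the conversion, assuming the server side expects a `Content-Length` header followed by a blank line and then the JSON body. The name `frame_message` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Wrap one newline-delimited JSON-RPC message in Content-Length framing.
   Returns the number of bytes written (excluding the terminating NUL),
   or -1 if the output buffer is too small. */
int frame_message(const char *line, char *out, size_t cap)
{
    assert(line != NULL && out != NULL);
    size_t len = strcspn(line, "\r\n");   /* body length without the newline */
    int n = snprintf(out, cap, "Content-Length: %zu\r\n\r\n%.*s",
                     len, (int)len, line);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

Passing the line `"{}\n"` yields the bytes `Content-Length: 2`, a blank line, and then `{}`. The reverse direction would strip the header and append a newline.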

**Convert your existing agents:**

```bash
tardy terraform /path/to/crewai         # 153K lines → 53 instructions
tardy terraform /path/to/llamaindex     # 237K lines → 15 instructions
```

---

## How well does it work?

### Laziness detection

| | Precision | Recall | F1 |
|---|:-:|:-:|:-:|
| Clear cases (60 traces) | 1.00 | 1.00 | 1.00 |
| + Adversarial (100 total) | 1.00 | 0.85 | **0.92** |

100 traces total. Zero false positives. Smart copiers who change 10-15% of the text slip through (similarity below threshold) — a known limitation. No existing tool does any of this.
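The F1 column follows from precision and recall as their harmonic mean. A small C helper makes the arithmetic explicit. The function name `f1_score` is ours and is not part of the benchmark code.

```c
#include <assert.h>

/* F1 is the harmonic mean of precision and recall. */
double f1_score(double precision, double recall)
{
    assert(precision >= 0.0 && recall >= 0.0);
    if (precision + recall == 0.0)
        return 0.0;   /* avoid dividing by zero when both are zero */
    return 2.0 * precision * recall / (precision + recall);
}
```

For the adversarial row, `f1_score(1.00, 0.85)` is about 0.919, which rounds to the reported 0.92.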

### Contradiction and hallucination detection

| Dataset | What it is | Tardygrada | Best alternative |
|---|---|:-:|:-:|
| Clear contradictions (125) | Designed compositional | **95%** | SelfCheck: 59% |
| + Borderline cases (225 total) | Soft/ambiguous contradictions | **69%** | SelfCheck: 38% |
| **AgentHallu (693 trajectories)** | Real agent hallucinations, 7 frameworks | **F1: 0.58** | DeepSeek-V3.1: 0.52 |
| **ContraDoc (891 docs)** | Real documents, human-annotated | **F1: 0.58** | SelfCheck: 0.16 |
| HaluEval (500 responses) | Individual factual errors | F1: 0.03 | SelfCheck: 0.32 |

Detection runs in two modes: deterministic (all benchmarks use this) or LLM-enhanced for broader coverage. Typical speeds: 5.7ms/trajectory (AgentHallu), 7.5ms/document (ContraDoc), 0.015ms/case (synthetic).

On ContraDoc (891 real documents) — **F1 0.58**, up from 0.16 after fixing a bug where the benchmark accidentally used the SelfCheck baseline instead of proper triple checking. Recall jumped from 9.1% to 64.8%.

On AgentHallu (693 real agent trajectories) — **F1 0.58**, beats DeepSeek-V3.1 (0.52). GPT-5 gets 0.70 but costs per-trajectory API calls.

HaluEval (individual factual errors) — F1 0.03. Expected: our pipeline catches contradictions between claims, not individual factual mistakes. SelfCheck does better here (0.32) because its loose heuristics accidentally catch some errors.

> **What runs where:** Contradiction detection (verify-doc, all benchmarks) uses the internal decomposition + consistency + numeric layers — no external calls. Claim grounding (`tardy run "claim"`) optionally connects to [open-ontologies](https://github.com/fabio-rovai/open-ontologies) for OWL reasoning, or uses the built-in Datalog engine. Different features, different paths.

**AgentHallu per-category recall**

| Category | Recall |
|---|:-:|
| Reasoning | 68% |
| Planning | 66% |
| Retrieval | 59% |
| Human-Interaction | 53% |
| Tool-Use | 21% |

**Detailed breakdown (clear cases)**

| Difficulty | Detection |
|---|:-:|
| Easy (direct opposites) | 100% |
| Medium (logical) | 100% |
| Hard (math/physics) | 96% |
| Subtle (domain knowledge) | 92% |
| Very subtle (statistical) | 88% |

### Scaling

| Agents | Time |
|-------:|-----:|
| 5 | 0.6 ms |
| 500 | 21 ms |
| 5,000 | 97 ms |
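One way to read the table is as average time per agent. The helper below is only a worked-arithmetic sketch, and the name `ms_per_agent` is hypothetical.

```c
#include <assert.h>

/* Average time per agent, in milliseconds. */
double ms_per_agent(double total_ms, long agents)
{
    assert(agents > 0);
    return total_ms / (double)agents;
}
```

Applied to the three rows, this gives 0.12 ms, 0.042 ms, and about 0.019 ms per agent, so total time grows no faster than linearly with agent count over this range.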

---

## Under the hood

### How verification works

```mermaid
graph LR
    subgraph Pipeline["Verification Pipeline"]
        direction LR
        C["Claim"] --> D["Decompose"]
        D --> G["Ground"]
        G --> CON["Consistency"]
        CON --> P["Probabilistic"]
        P --> PR["Protocol"]
        PR --> F["Certification"]
        F --> CR["Cross-Rep"]
        CR --> W["Work Verify"]
        W --> V{"VERIFIED /<br/>CONFLICT /<br/>UNVERIFIABLE"}
    end
    style Pipeline fill:transparent
```

Claims are decomposed into triples, grounded against a knowledge base, checked for consistency, scored probabilistically, and verified for work integrity. Eight layers, all deterministic.
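To make the decomposition step concrete, the sketch below models a claim as a subject, predicate, object triple and flags two triples that disagree. The `triple` layout, the function `triples_conflict`, and the assumption that the predicate is single-valued are all illustrative and are not taken from the Tardygrada source.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* A claim reduced to subject, predicate, object (hypothetical layout). */
typedef struct {
    const char *subject;
    const char *predicate;
    const char *object;
} triple;

/* Two triples conflict when they give the same subject and predicate
   different objects. This assumes the predicate is single-valued. */
bool triples_conflict(const triple *a, const triple *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a->subject, b->subject) == 0
        && strcmp(a->predicate, b->predicate) == 0
        && strcmp(a->object, b->object) != 0;
}
```

Under this model, "completed on time" and "delayed 3 months" become two triples about the same project and the same schedule predicate with different objects, so the check reports a conflict.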

### How tamper protection works

```mermaid
graph LR
    subgraph Trust["Protection Levels"]
        direction LR
        MUT["Mutable"] --> DEF["Default<br/>(OS-locked)"]
        DEF --> VER["Verified<br/>(+ SHA-256)"]
        VER --> HARD["Hardened<br/>(+ replicas)"]
        HARD --> SOV["Sovereign<br/>(+ ed25519 + BFT)"]
    end
    style Trust fill:transparent
```

Values are protected at the operating system level. The OS kernel enforces read-only memory. SHA-256 hashes detect any change. Ed25519 signatures prove authorship. BFT consensus requires corrupting multiple independent replicas.
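On POSIX systems, kernel-enforced read-only memory is commonly obtained with `mmap` and `mprotect`. The sketch below assumes that mechanism, which the text above does not confirm. The function name `lock_value` is hypothetical, and the code covers only the OS-locked level, not hashing, replication, or signatures.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/* Copy a value into its own page, then ask the kernel to make that page
   read-only. Returns NULL on failure. A later write through the returned
   address faults instead of silently succeeding. */
const double *lock_value(double value)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;
    assert(sizeof value <= (size_t)page);

    void *mem = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    memcpy(mem, &value, sizeof value);
    if (mprotect(mem, (size_t)page, PROT_READ) != 0) {
        munmap(mem, (size_t)page);
        return NULL;
    }
    return mem;
}
```

Reading through the returned pointer works as usual. Code in the same process could still call `mprotect` again to lift the protection, which is one reason to layer hashes and signatures on top of it.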

### How the daemon works

```mermaid
graph TB
    subgraph visible["What you see"]
        USER["You"] --> CLI["tardy run / verify-doc"]
    end
    subgraph hidden["What happens"]
        CLI --> DAEMON["Persistent daemon"]
        DAEMON --> AGENTS["Living agents"]
        DAEMON --> KB["Growing knowledge base"]
        DAEMON --> VERIFY["Verification pipeline"]
    end
    style visible fill:transparent
    style hidden fill:transparent
```

The daemon keeps agents alive between commands. The knowledge base grows as verified claims accumulate. Sovereign agents persist to disk on shutdown and reload on restart.

### Architecture

```mermaid
graph TB
    subgraph Tardygrada["Tardygrada"]
        CLI_CMD["CLI"] --> DAEMON_S["Daemon"]
        DAEMON_S --> VM["VM Core"]
        VM --> VERIFY_S["Verification"]
        VM --> ONTO["Knowledge Base"]
        VM --> CRYPTO_S["Cryptography"]
        VERIFY_S --> DECOMP_S["Decompose"]
        VERIFY_S --> NUMERIC_S["Numeric Check"]
        VERIFY_S --> DOMAIN_S["Domain Check"]
        VERIFY_S --> WORK_S["Work Verify"]
    end
    subgraph External["Optional integrations"]
        BITF["brain-in-the-fish<br/>(multi-agent debate)"]
        OO["open-ontologies<br/>(OWL reasoning)"]
    end
    VM -- "coordinate" --> BITF
    VM -- "grounded_in" --> OO
    style Tardygrada fill:transparent
    style External fill:transparent
```

### The language (for power users)

```
agent MedicalAdvisor @sovereign @semantics(truth.min_confidence: 0.99) {
    invariant(trust_min: @verified)
    let diagnosis: Fact = receive("symptom analysis") grounded_in(medical) @verified
    let data: str = exec("sqlite3 patients.db 'SELECT * FROM current'")
    coordinate {analyzer, validator} on("verify diagnosis") consensus(ProofWeight)
}
```

Every value is an agent. Programs compile to servers. `receive()` accepts claims from external systems. `@sovereign` means the value is cryptographically signed and replicated. `coordinate` dispatches to multi-agent debate.

You don't need to learn this to use Tardygrada. The CLI and daemon handle everything.

### Reproduce all evaluations

```bash
cd evaluation && make
./laziness_bench           # 60 traces, F1 1.00
./hallucination_bench      # 500 cases, 95% compositional
./scaling_bench            # 5→5000 agents, linear
./ablation_bench           # layer-by-layer analysis
./contradoc_bench          # 891 real documents (external)
./halueval_bench           # 500 HaluEval examples (external)
```

---

## Research

Built on: [AgentSpec](https://arxiv.org/abs/2503.18666) (ICSE 2026), [Bythos](https://arxiv.org/abs/2302.01527) (Coq BFT), Minsky frames (1974), CRDTs (Shapiro 2011), Datalog (1986).

Evaluated against: [SelfCheckGPT](https://arxiv.org/abs/2303.08896) (EMNLP 2023), [FActScore](https://aclanthology.org/2023.emnlp-main.741/) (EMNLP 2023), [ContraDoc](https://aclanthology.org/2024.naacl-long.362/) (NAACL 2024), [HaluEval](https://huggingface.co/datasets/pminervini/HaluEval).

Related: [Mündler et al.](https://arxiv.org/abs/2305.15852) (ICLR 2024), [Fang et al.](https://arxiv.org/abs/2409.11283) (AAAI 2025), [He et al.](https://arxiv.org/abs/2601.13600) (2026).

## License

MIT
