https://github.com/eliasstepanik/verdict

Guarded agent runtime for Rust
Host: GitHub
URL: https://github.com/eliasstepanik/verdict
Owner: eliasstepanik
License: other
Created: 2026-06-10T13:47:42.000Z (17 days ago)
Default Branch: master
Last Pushed: 2026-06-10T21:00:13.000Z (17 days ago)
Last Synced: 2026-06-10T22:11:26.697Z (16 days ago)
Topics: agent, ai-agents, ai-safety, autonomous-agents, guardrails, llm, llm-framework, mcp, pipeline, rust
Language: Rust
Size: 258 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# verdict

**A Rust framework for building agents that actually complete their work through code-enforced structure, guarded execution, and composable pipelines.**

![Social Preview](.github/social-preview.png)

[![Rust](https://img.shields.io/badge/rust-1.70+-orange.svg)](https://www.rust-lang.org/)

[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)

[![Status](https://img.shields.io/badge/status-10%20phases%20complete-brightgreen.svg)]()

## How It Works

```
User Task Input
      │
      ▼
┌─────────────────────────────────────────────────────┐
│                     Pipeline                         │
│                                                      │
│  Step 1: Generate Code                               │
│  ┌──────────────────────────────────────────────┐   │
│  │ guard_in:  Guard::None  ✓                    │   │
│  │ action:    LlmCall { "write hello world" }   │   │
│  │ guard_out: Guard::ValidRustSyntax            │   │
│  │ verdict:   Automated(Guard::ValidRustSyntax) │   │
│  └──────────────────────────────────────────────┘   │
│         │ ✅ Guard passes → proceed                  │
│         ▼                                            │
│  Step 2: Run Tests (LoopUntil)                       │
│  ┌──────────────────────────────────────────────┐   │
│  │ body:      ToolCall { "cargo test" }         │   │
│  │ condition: Guard::TestsPass                  │   │
│  │ max_iter:  10                                │   │
│  │ on_fail:   DelegateAgent("debugger")         │   │
│  └──────────────────────────────────────────────┘   │
│         │ ✅ Tests pass → proceed                    │
│         ▼                                            │
│  Step 3: User Review                                 │
│  ┌──────────────────────────────────────────────┐   │
│  │ action:  UserInput { "Approve this diff?" }  │   │
│  │ verdict: UserApproval                        │   │
│  └──────────────────────────────────────────────┘   │
│         │ ✅ Approved → pipeline succeeds             │
└─────────────────────────────────────────────────────┘
      │
      ▼
  PipelineResult { success: true, cost: $0.002, ... }
  AuditLog [GuardPass, LlmCall, ToolCall, UserApproval]
```

Every guard is **enforced in code** — not just hoped for in a prompt.

---

## Overview

Verdict is a Rust framework designed for building autonomous agents that can be **trusted** to complete complex tasks through hard, verifiable guarantees—not soft prompts and hopes.

Traditional agent frameworks are built around LLM calls + tool definitions. Verdict is different:

- **Guards**: Hard conditions (not soft suggestions) that check preconditions, postconditions, and loop invariants

- **Pipelines**: DAG structures of steps, each with its own guard-driven verdict and scoped tool access

- **Verdicts**: Automated or user-approval gates that decide whether a step succeeded and should proceed

- **Agents**: Reusable agent objects with their own pipelines, tools, skills, and policies—can delegate to each other

- **Registry**: Central coordination for agents, tools (built-in, MCP, local Rust functions), and skills

- **Budget tracking**: Cost control, token limits, and rate limiting built in from the start

- **Evaluation**: Test-driven agent improvement via suites that automatically validate agent quality

- **Self-improvement**: Agents can propose patches to themselves, but only after strict guards and user approval

- **Audit logging**: Full trace of every step, tool call, decision, and cost for compliance and debugging

Verdict was built in **10 phases**, each unlocking new capabilities:

| Phase | Theme | Features |
|-------|-------|----------|
| **1** | Core Pipeline & Guards | Pipeline execution, Guard evaluation, basic Verdicts |
| **2** | Tool Registry & Audit | Tool trait, built-in tools, audit logging, cost tracking |
| **3** | MCP Integration | MCP server support, tool discovery, namespaced tool calls |
| **4** | Agent Delegation | AgentRegistry, DelegateAgent action, delegation policy |
| **5** | Skills | SkillRegistry, Skill definitions, UseSkill action, built-in skills |
| **6** | Built-in Agents | 6 specialist agents (planner, coder, reviewer, debugger, reflector, orchestrator) |
| **7** | Safety & Production | InjectionScanner, SecretScanner, enhanced guards, deployment patterns |
| **8** | Self-Improvement | EvaluationSuite, SelfUpdateEngine, agent versioning & promotion |
| **9** | Advanced Execution | Plugin system, HotReload, RemoteAgent, MonitoringServer, WebUI |
| **10** | Stub Completion | Real LLM provider, HTTP tool, MCP JSON-RPC, TOML/YAML guard parsing |

---

## Features

### Core Execution

- ✅ **Pipeline execution** with DAG support and parallel steps

- ✅ **Guard-driven safety** (pre-conditions, post-conditions, loop invariants)

- ✅ **Verdict gates** for automated or user-approval-based step progression

- ✅ **Conditional branching** via `Guard` composition (`AllOf`, `AnyOf`, `Not`)

- ✅ **Loop control** with `LoopUntil` and iteration failure modes

### Agents & Delegation

- ✅ **Agent registry** for centralized agent management

- ✅ **Agent delegation** with depth control, allowlists, and policy inheritance

- ✅ **Agent versioning** for self-improvement tracking

- ✅ **Scoped tool inheritance** (agent → pipeline → step → skill)

### Tools & Resources

- ✅ **Tool registry** (built-in, MCP, local functions, CLI)

- ✅ **MCP (Model Context Protocol)** server integration

- ✅ **Local Rust function tools** via `FunctionTool`

- ✅ **Tool scoping** (ReadOnly, ReadWrite, Allow-list, Deny-list, Intersection, Union)

- ✅ **Tool audit logging** with full call tracing

### Skills & Knowledge

- ✅ **Skill registry** with reusable capabilities

- ✅ **Built-in skills**: rust_debugging, code_review, api_design, test_writing, refactoring

- ✅ **Skill mode selection** (PromptOnly, Pipeline, Auto)

- ✅ **Skill examples & evaluation** for quality assurance

### Safety & Control

- ✅ **Budget tracking** (cost, tokens, rate limits)

- ✅ **Injection detection** (prompt injection & secret detection)

- ✅ **Permission management** (filesystem isolation, network policies)

- ✅ **Workspace isolation** (temp dirs, sandboxing, per-task separation)

- ✅ **Extensive guard library** (50+ guard types covering syntax, output, files, security, delegation)
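
To make the budget-tracking idea concrete, here is a minimal sketch of a tracker that refuses a charge once a cost or token limit would be exceeded. `SketchBudget` and `BudgetError` are hypothetical names for illustration and do not reflect the real `BudgetTracker` API. Rate limiting is left out.

```rust
/// Why a charge was refused in this simplified model.
#[derive(Debug, PartialEq)]
pub enum BudgetError {
    CostExceeded { limit_usd: f64, attempted_usd: f64 },
    TokensExceeded { limit: u64, attempted: u64 },
}

/// Hypothetical budget with hard limits on spend and token usage.
pub struct SketchBudget {
    max_cost_usd: f64,
    max_tokens: u64,
    spent_usd: f64,
    used_tokens: u64,
}

impl SketchBudget {
    pub fn new(max_cost_usd: f64, max_tokens: u64) -> Self {
        Self { max_cost_usd, max_tokens, spent_usd: 0.0, used_tokens: 0 }
    }

    /// Records a charge only if both limits still hold afterwards.
    /// A refused charge leaves the tracker unchanged.
    pub fn charge(&mut self, cost_usd: f64, tokens: u64) -> Result<(), BudgetError> {
        let new_cost = self.spent_usd + cost_usd;
        let new_tokens = self.used_tokens.saturating_add(tokens);
        if new_cost > self.max_cost_usd {
            return Err(BudgetError::CostExceeded {
                limit_usd: self.max_cost_usd,
                attempted_usd: new_cost,
            });
        }
        if new_tokens > self.max_tokens {
            return Err(BudgetError::TokensExceeded {
                limit: self.max_tokens,
                attempted: new_tokens,
            });
        }
        self.spent_usd = new_cost;
        self.used_tokens = new_tokens;
        Ok(())
    }

    /// Tokens still available before the limit is reached.
    pub fn remaining_tokens(&self) -> u64 {
        self.max_tokens - self.used_tokens
    }
}
```

Checking the limits before mutating state means a rejected call cannot push the tracker past its budget, which matches the "hard condition" framing used throughout this README.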

### Evaluation & Self-Improvement

- ✅ **Evaluation suites** for testing agent quality

- ✅ **Self-update engine** with patching & validation

- ✅ **Automated improvement loops** with guard-gated promotion

- ✅ **Cost-benefit analysis** for self-updates

### Monitoring & Debugging

- ✅ **Comprehensive audit logging** (JSON-serializable events)

- ✅ **Pipeline tracing** with step results and timing

- ✅ **Monitoring server** (HTTP + WebUI)

- ✅ **Hot-reload support** for live agent updates

---

## Installation

Add to your `Cargo.toml`:

```toml

[dependencies]

verdict = "0.1"

tokio = { version = "1", features = ["rt", "rt-multi-thread", "macros", "time"] }

serde_json = "1"

```

---

## Quick Start

Here's a minimal pipeline with a single LLM step whose output is checked by a guard:

```rust

use verdict::prelude::*;

use serde_json::json;

#[tokio::main]

async fn main() -> Result<(), Box<dyn std::error::Error>> {

    // Create a simple pipeline step

    let coding_step = AgentStep {

        name: "write_code".into(),

        guard_in: Guard::None,

        action: StepAction::LlmCall {

            system: "You are a code generator.".into(),

            user: "Write a hello world function in Rust.".into(),

            model: None,

        },

        guard_out: Guard::ValidRustSyntax,

        verdict: Verdict::Automated(Guard::ValidRustSyntax),

        tools: ToolSet::ReadOnly,

        injection_protection: InjectionProtection::Strict,

        output_schema: None,

    };

    let pipeline = Pipeline {

        name: "simple_code_gen".into(),

        steps: vec![coding_step],

        on_failure: FailureMode::Abort,

        max_retries: 1,

    };

    // Create a runner and execute

    let mut runner = PipelineRunner::new();

    let result = runner.run(&pipeline, json!({})).await?;

    println!("Pipeline result: {:?}", result);

    println!("Output: {}", result.output.raw);

    Ok(())

}

```

This example:

1. Defines a single `AgentStep` with an `LlmCall` action

2. Sets up guards: the input guard is `Guard::None` (always passes), and the output must satisfy `Guard::ValidRustSyntax`

3. Creates a `Pipeline` containing the step

4. Runs it with a `PipelineRunner`

5. Prints the result and the raw output

---

## Using a Real LLM Provider

Verdict ships with a built-in OpenAI-compatible provider. Any endpoint that speaks the OpenAI chat completions API works — OpenAI, Anthropic via proxy, Ollama, LM Studio, etc.

### From environment variables

```rust

use verdict::prelude::*;
use std::sync::Arc;

// Reads OPENAI_API_KEY (required), OPENAI_BASE_URL, OPENAI_MODEL from env

let client = LlmClient::from_env()?;

let mut runner = PipelineRunner::new().with_llm_client(Arc::new(client));

```

### Hardcoded provider

```rust

use verdict::prelude::*;

use verdict::llm::OpenAiCompatibleProvider;

use std::sync::Arc;

let provider = OpenAiCompatibleProvider::new(

    "https://api.openai.com".into(),  // base URL (without /v1)

    "sk-your-api-key".into(),

    "gpt-4o".into(),                  // default model

);

let client = Arc::new(LlmClient::new(Arc::new(provider)));

let mut runner = PipelineRunner::new().with_llm_client(client);

```

### Per-step model routing

Each `LlmCall` step can override the model, which is useful for routing easy tasks to a fast, cheap model and hard tasks to a more capable one:

```rust

use verdict::action::ProviderSpec;

AgentStep {

    action: StepAction::LlmCall {

        system: "You are an expert analyst.".into(),

        user: "Analyse this in depth.".into(),

        model: Some(ProviderSpec {

            model: "claude-opus-4-7".into(),

            provider: "openai-compatible".into(),

        }),

    },

    // ...

}

```

---

## Setting Up in Your Application

Here's a complete step-by-step guide for using Verdict as a library in a real Rust project.

### Step 1: Add to `Cargo.toml`

```toml

[dependencies]

verdict = { path = "./verdict" }  # or from crates.io once published

tokio = { version = "1", features = ["rt", "rt-multi-thread", "macros", "time"] }

serde_json = "1"

env_logger = "0.11"  # optional, only needed for the logging call in Step 2

```

### Step 2: Create Your Main with Async Runtime

```rust

use verdict::prelude::*;

use serde_json::json;

use std::sync::Arc;

#[tokio::main]

async fn main() -> Result<(), Box<dyn std::error::Error>> {

    // Set up logging (optional; requires the `env_logger` crate)

    env_logger::init();

    // Create registries

    let tool_registry = Arc::new(ToolRegistry::with_builtins());

    let agent_registry = Arc::new(AgentRegistry::new());

    let skill_registry = Arc::new(SkillRegistry::new());

    // Register agents

    agent_registry.register(coder_agent());

    agent_registry.register(reviewer_agent());

    agent_registry.register(debugger_agent());

    agent_registry.register(planner_agent());

    agent_registry.register(reflector_agent());

    // Create a pipeline

    let my_pipeline = Pipeline {

        name: "test_pipeline".into(),

        steps: vec![

            AgentStep {

                name: "generate_code".into(),

                guard_in: Guard::None,

                action: StepAction::LlmCall {

                    system: "You are a code generator.".into(),

                    user: "Write a function that adds two numbers.".into(),

                    model: None,

                },

                guard_out: Guard::NonEmptyOutput,

                verdict: Verdict::Automated(Guard::NonEmptyOutput),

                tools: ToolSet::ReadOnly,

                injection_protection: InjectionProtection::Strict,

                output_schema: None,

            },

        ],

        on_failure: FailureMode::Abort,

        max_retries: 1,

    };

    // Create a runner

    let mut runner = PipelineRunner::with_registries(

        tool_registry.clone(),

        agent_registry.clone(),

    ).with_skill_registry(skill_registry);

    // Get the planner agent

    let agent = planner_agent();

    // Run the pipeline

    let result = runner.run(

        &my_pipeline,

        &agent,

        json!({}),

    ).await?;

    // Inspect results

    println!("Pipeline completed: {:?}", result.success);

    println!("Output: {}", result.output.raw);

    println!("Cost: ${:.4}", result.cost);

    // Access audit log

    for entry in runner.audit_log.entries() {

        println!("Audit: {:?}", entry.event);

    }

    Ok(())

}

```

### Step 3: Handle Pipeline Results

The `PipelineResult` contains:

```rust

pub struct PipelineResult {

    pub success: bool,

    pub output: StepOutput,

    pub cost: f64,

    pub step_results: HashMap,

}

```

Check the result and take action:

```rust

match runner.run(&pipeline, &agent, input).await {

    Ok(result) => {

        if result.success {

            println!("Pipeline succeeded. Output:\n{}", result.output.raw);

        } else {

            println!("Pipeline failed");

        }

        println!("Total cost: ${:.4}", result.cost);

    }

    Err(e) => {

        eprintln!("Pipeline error: {:?}", e);

    }

}

```

---

## Running the Monitoring Web UI

Verdict includes a built-in `MonitoringServer` that serves:

- An HTML dashboard showing pipeline execution in real-time

- A JSON API for audit log entries

- A JSON API for trace data

### Starting the Monitoring Server

Create the server and run it as a background task:

```rust
use verdict::prelude::*;
use serde_json::json;
use std::net::SocketAddr;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // ... your pipeline setup ...
    let mut runner = PipelineRunner::new();
    let agent = planner_agent();
    let pipeline = my_pipeline(); // your own function that builds a `Pipeline`
    let input = json!({});

    // Run the pipeline and collect audit log
    let result = runner.run(&pipeline, &agent, input).await?;
    println!("Pipeline completed: {:?}", result.success);

    // Create the monitoring server
    let audit_log = runner.audit_log.clone();  // AuditLog is cloneable
    let trace = runner.trace.clone();           // PipelineTrace is cloneable
    let monitoring_server = MonitoringServer::new(audit_log, trace);

    // Spawn the server on a background task
    let server_addr: SocketAddr = "127.0.0.1:8080".parse()?;
    tokio::spawn(async move {
        if let Err(e) = monitoring_server.serve(server_addr).await {
            eprintln!("Monitoring server error: {:?}", e);
        }
    });

    // Server is now listening. Open a browser:
    println!("📊 Monitoring UI available at http://127.0.0.1:8080");

    // Let the server run while you do other work
    tokio::time::sleep(tokio::time::Duration::from_secs(300)).await;
    Ok(())
}
```

### MonitoringServer Endpoints

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/` | GET | HTML dashboard (interactive UI) |
| `/api/entries` | GET | JSON array of audit log entries |
| `/api/trace` | GET | JSON trace object with step timing |

### Example: Full Integration with Monitoring

```rust
use verdict::prelude::*;
use serde_json::json;
use std::net::SocketAddr;
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create registries
    let tool_registry = Arc::new(ToolRegistry::with_builtins());
    let agent_registry = Arc::new(AgentRegistry::new());

    // Register agents
    agent_registry.register(coder_agent());
    agent_registry.register(reviewer_agent());

    // Create runner
    let mut runner = PipelineRunner::with_registries(tool_registry, agent_registry);

    // Start monitoring server in background
    let audit_log = runner.audit_log.clone();
    let trace = runner.trace.clone();
    // Parse the address here so that `?` reports a bad address from `main`
    let addr: SocketAddr = "127.0.0.1:8080".parse()?;

    tokio::spawn(async move {
        let server = MonitoringServer::new(audit_log, trace);
        if let Err(e) = server.serve(addr).await {
            eprintln!("Monitoring server error: {:?}", e);
        }
    });

    println!("✅ Monitoring server started on http://127.0.0.1:8080");

    // Run your pipeline (`my_pipeline()` is your own function that builds a `Pipeline`)
    let pipeline = my_pipeline();
    let agent = coder_agent();
    let input = json!({"task": "implement feature X"});
    let result = runner.run(&pipeline, &agent, input).await?;

    println!("Pipeline result: {:?}", result.success);
    println!("Check the dashboard at http://127.0.0.1:8080 for details");

    // Keep the server alive without blocking the async runtime
    tokio::time::sleep(std::time::Duration::from_secs(600)).await;
    Ok(())
}
```

---

## Core Concepts

### Pipeline & AgentStep

A **Pipeline** is a DAG of **AgentStep**s to be executed sequentially (or with controlled concurrency). Each step has:

- **name**: unique identifier

- **guard_in**: precondition (must pass before execution)

- **action**: what to do (LlmCall, ToolCall, DelegateAgent, LoopUntil, Custom, UserInput, UseSkill, SubPipeline, RemoteAgent)

- **guard_out**: postcondition (output must satisfy)

- **verdict**: decides success (Automated or UserApproval)

- **tools**: scoped tool allowlist

- **injection_protection**: input sanitization level
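
The way these fields interact can be sketched as a small state machine: check the precondition, run the action, check the postcondition, then ask the verdict. The code below is a simplified model with hypothetical names (`SketchStep`, `StepOutcome`, `run_step`) and plain closures in place of Verdict's real types. The ordering of `guard_out` before `verdict` is an assumption based on the field descriptions above.

```rust
/// Outcome of one step in this simplified model.
#[derive(Debug, PartialEq)]
pub enum StepOutcome {
    /// The precondition failed, so the action never ran.
    BlockedByGuardIn,
    /// The action ran but its output failed the postcondition.
    RejectedByGuardOut(String),
    /// Both guards passed but the verdict declined the output.
    RejectedByVerdict(String),
    /// Both guards and the verdict accepted the output.
    Passed(String),
}

/// Hypothetical stand-in for a step: every field is a plain closure.
pub struct SketchStep<'a> {
    pub guard_in: &'a dyn Fn(&str) -> bool,
    pub action: &'a dyn Fn(&str) -> String,
    pub guard_out: &'a dyn Fn(&str) -> bool,
    pub verdict: &'a dyn Fn(&str) -> bool,
}

/// Runs one step: precondition, action, postcondition, verdict.
pub fn run_step(step: &SketchStep<'_>, input: &str) -> StepOutcome {
    if !(step.guard_in)(input) {
        return StepOutcome::BlockedByGuardIn;
    }
    let output = (step.action)(input);
    if !(step.guard_out)(&output) {
        return StepOutcome::RejectedByGuardOut(output);
    }
    if !(step.verdict)(&output) {
        return StepOutcome::RejectedByVerdict(output);
    }
    StepOutcome::Passed(output)
}
```

Because every exit path is an explicit enum variant, a caller cannot mistake a blocked or rejected step for a successful one, which is the point of enforcing guards in code rather than in a prompt.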

### Guard & GuardEngine

A **Guard** is a verifiable condition. The **GuardEngine** evaluates guards against step context. Guards include:

- Output validation: `ValidJson`, `ValidRustSyntax`, `MatchesSchema`, `MaxTokens`

- File checks: `FileExists`, `FileContains`, `FormatPass`, `LintPass`, `Compiles`, `TestsPass`

- Security: `NoSecretsInOutput`, `NoPermissionEscalation`, `DiffTouchesAllowedPaths`

- Delegation: `MaxDelegationDepth`, `OnlyAllowedAgentsUsed`

- Composition: `AllOf`, `AnyOf`, `Not`
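
Guard composition is easiest to see as a recursive evaluator. The sketch below models a handful of guards over a plain string output. `SketchGuard` and `passes` are hypothetical names, only a few variants are modeled, and treating whitespace-only output as empty is an assumption. The real `GuardEngine` evaluates guards against a richer step context.

```rust
/// Hypothetical subset of guards, evaluated against a plain string.
pub enum SketchGuard {
    /// Always passes.
    None,
    /// Passes when the output contains something other than whitespace (assumed).
    NonEmptyOutput,
    /// Passes when the output has at most this many lines.
    MaxLines(usize),
    /// Passes when every inner guard passes.
    AllOf(Vec<SketchGuard>),
    /// Passes when at least one inner guard passes.
    AnyOf(Vec<SketchGuard>),
    /// Passes when the inner guard fails.
    Not(Box<SketchGuard>),
}

/// Recursively evaluates `guard` against `output`.
pub fn passes(guard: &SketchGuard, output: &str) -> bool {
    match guard {
        SketchGuard::None => true,
        SketchGuard::NonEmptyOutput => !output.trim().is_empty(),
        SketchGuard::MaxLines(n) => output.lines().count() <= *n,
        SketchGuard::AllOf(guards) => guards.iter().all(|g| passes(g, output)),
        SketchGuard::AnyOf(guards) => guards.iter().any(|g| passes(g, output)),
        SketchGuard::Not(inner) => !passes(inner, output),
    }
}
```

In this model an empty `AllOf` passes and an empty `AnyOf` fails, following the usual conventions for `all` and `any` over an empty collection.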

### Verdict & VerdictEngine

A **Verdict** decides whether a step succeeded. It can be:

- **Automated**: succeeded if a guard passes (e.g., `Verdict::Automated(Guard::TestsPass)`)

- **UserApproval**: wait for human confirmation with optional diff display

- **AllOf** / **AnyOf**: multiple verdicts that must all/any pass
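
As a rough illustration of how these verdict kinds could combine, the sketch below evaluates a verdict tree against a step's output. All names (`SketchVerdict`, `decide`, `non_empty_output`) are hypothetical. Automated checks are modeled as plain function pointers, and user approval as a callback supplied by the caller, which is an assumption about how a host application might wire it up.

```rust
/// Hypothetical model of a verdict; not Verdict's actual type.
pub enum SketchVerdict {
    /// Succeeds when the automated check passes.
    Automated(fn(&str) -> bool),
    /// Succeeds when the user approves the output.
    UserApproval,
    /// Succeeds when every inner verdict succeeds.
    AllOf(Vec<SketchVerdict>),
    /// Succeeds when at least one inner verdict succeeds.
    AnyOf(Vec<SketchVerdict>),
}

/// Example automated check: the output must not be empty.
pub fn non_empty_output(output: &str) -> bool {
    !output.is_empty()
}

/// Decides whether a step succeeded. `ask_user` stands in for whatever
/// mechanism the host application uses to collect approval.
pub fn decide(verdict: &SketchVerdict, output: &str, ask_user: &dyn Fn(&str) -> bool) -> bool {
    match verdict {
        SketchVerdict::Automated(check) => check(output),
        SketchVerdict::UserApproval => ask_user(output),
        SketchVerdict::AllOf(inner) => inner.iter().all(|v| decide(v, output, ask_user)),
        SketchVerdict::AnyOf(inner) => inner.iter().any(|v| decide(v, output, ask_user)),
    }
}
```

With `AllOf`, listing the automated check first means the user is only asked when the cheap check has already passed, because `all` stops at the first failure.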

### PipelineRunner

The **PipelineRunner** orchestrates pipeline execution. It:

- Tracks agent, tool, and skill registries

- Manages budget, audit logs, and traces

- Evaluates guards before/after each step

- Handles delegation recursively

- Supports hot-reload of agents/tools via plugins

### StepAction (Overview)

Actions determine what a step does:

- **LlmCall**: talk to an LLM

- **ToolCall**: call a registered tool

- **DelegateAgent**: recursively run another agent

- **SubPipeline**: inline execute a sub-pipeline

- **LoopUntil**: repeat action until guard passes (max iterations)

- **Custom**: arbitrary async Rust closure

- **UserInput**: prompt the user

- **UseSkill**: run/inject a reusable skill

- **RemoteAgent**: call a remote agent (distributed execution)

### AgentRegistry & ToolRegistry

**Registries** are central hubs:

- **AgentRegistry**: maps agent names to `Agent` objects (for delegation)

- **ToolRegistry**: maps tool names to `Tool` trait objects (built-in, MCP, local, external)

- **SkillRegistry**: maps skill names to `Skill` definitions

### AuditLog

Comprehensive logging of every event:

- Step execution start/end

- Guard evaluation pass/fail

- Tool calls with args & results

- Delegation paths

- Cost tracking

- User approvals

- All JSON-serializable for compliance
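
A minimal sketch of an append-only audit log shows how such events can be recorded and then queried. `SketchEvent` and `SketchAuditLog` are hypothetical names with a reduced set of event kinds, and JSON serialization is omitted to keep the example dependency-free.

```rust
/// Hypothetical subset of audit events.
#[derive(Debug, Clone, PartialEq)]
pub enum SketchEvent {
    StepStarted { step: String },
    GuardEvaluated { guard: String, passed: bool },
    ToolCalled { tool: String, cost_usd: f64 },
    UserApproved { step: String },
}

/// Append-only log: entries can be added and read, never edited.
#[derive(Default)]
pub struct SketchAuditLog {
    entries: Vec<SketchEvent>,
}

impl SketchAuditLog {
    /// Appends an event to the end of the log.
    pub fn record(&mut self, event: SketchEvent) {
        self.entries.push(event);
    }

    /// All recorded events, in the order they were recorded.
    pub fn entries(&self) -> &[SketchEvent] {
        &self.entries
    }

    /// Sum of the cost attached to every tool call.
    pub fn total_cost_usd(&self) -> f64 {
        self.entries
            .iter()
            .map(|e| match e {
                SketchEvent::ToolCalled { cost_usd, .. } => *cost_usd,
                _ => 0.0,
            })
            .sum()
    }

    /// Names of the guards that failed, in evaluation order.
    pub fn failed_guards(&self) -> Vec<&str> {
        self.entries
            .iter()
            .filter_map(|e| match e {
                SketchEvent::GuardEvaluated { guard, passed: false } => Some(guard.as_str()),
                _ => None,
            })
            .collect()
    }
}
```

Keeping the entries private and exposing only `record` and read accessors is one simple way to make the trace tamper-resistant inside a single process.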

### EvaluationSuite

Tests that verify agent quality:

- **EvaluationCase**: (input, expected output/schema/guard)

- **EvaluationRunner**: runs cases and scores results

- **Minimum score threshold** for promotion to production
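
The scoring and threshold logic can be sketched in a few lines. This is an illustrative model only: `SketchCase`, `score`, and `may_promote` are hypothetical names, the agent is reduced to a function from input to output, and a case passes only on an exact string match, whereas the real suite also supports schema and guard expectations.

```rust
/// Hypothetical evaluation case: an input and the exact output expected.
pub struct SketchCase {
    pub input: String,
    pub expected: String,
}

/// Fraction of cases the agent gets exactly right, in the range 0.0 to 1.0.
/// An empty suite scores 0.0 so that it can never justify a promotion.
pub fn score(cases: &[SketchCase], agent: &dyn Fn(&str) -> String) -> f64 {
    if cases.is_empty() {
        return 0.0;
    }
    let passed = cases
        .iter()
        .filter(|case| agent(&case.input) == case.expected)
        .count();
    passed as f64 / cases.len() as f64
}

/// Promotion gate: the score must reach the minimum threshold.
pub fn may_promote(score: f64, min_score: f64) -> bool {
    score >= min_score
}
```

Scoring an empty suite as 0.0 rather than 1.0 is a deliberate choice in this sketch: an agent with no evidence of quality should not clear the promotion gate.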

### SelfUpdateEngine

Enables agents to improve themselves:

- Reflect on past failures/costs

- Propose patches to pipeline/guards/tools

- Validate patches via compilation, tests, evaluation

- Require user approval (configurable)

- Version the agent on successful update
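
The validation sequence above can be modeled as a chain of gates in which any failure rejects the patch. The following sketch assumes the checks have already been run and only combines their results. `PatchChecks`, `PatchDecision`, and `review_patch` are hypothetical names, and the order of the gates is an assumption.

```rust
/// Results of the checks run against a proposed patch (hypothetical).
pub struct PatchChecks {
    pub compiles: bool,
    pub tests_pass: bool,
    pub eval_score: f64,
    pub user_approved: bool,
}

/// Final decision for a proposed patch.
#[derive(Debug, PartialEq)]
pub enum PatchDecision {
    Rejected(&'static str),
    Promoted { new_version: u32 },
}

/// Applies the gates in order and bumps the version only if all hold.
pub fn review_patch(
    current_version: u32,
    checks: &PatchChecks,
    min_score: f64,
    require_approval: bool,
) -> PatchDecision {
    if !checks.compiles {
        return PatchDecision::Rejected("patch does not compile");
    }
    if !checks.tests_pass {
        return PatchDecision::Rejected("tests fail");
    }
    if checks.eval_score < min_score {
        return PatchDecision::Rejected("evaluation score below threshold");
    }
    if require_approval && !checks.user_approved {
        return PatchDecision::Rejected("user approval missing");
    }
    PatchDecision::Promoted { new_version: current_version + 1 }
}
```

Putting the approval gate last means a human is only consulted about patches that have already passed every automated check.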

### Plugin System

Extend Verdict without recompiling:

- **Plugin trait**: load at runtime

- **PluginRegistry**: manage lifecycle

- **HotReloadHandle**: live update agents/tools

### MonitoringServer

HTTP + WebUI for monitoring:

- Real-time pipeline execution

- Cost dashboards

- Audit log viewer

- Agent health

- Listens on configurable port (default 8080)

---

## Guards Reference

| Guard | Purpose |
|-------|---------|
| `None` | Always passes |
| `ValidJson` | Output is valid JSON |
| `ValidRustSyntax` | Output is syntactically valid Rust |
| `ValidToml` / `ValidYaml` | Config file validation |
| `MatchesSchema(Value)` | JSON Schema validation |
| `Compiles` | Rust code compiles (`cargo check`) |
| `TestsPass` | Tests pass (auto-detected runner) |
| `TestsPassWith(TestRunner)` | Tests pass with explicit runner |
| `LintPass` | Linting passes |
| `FormatPass` | Code formatting correct |
| `FileExists(path)` | File exists |
| `FileContains { path, pattern }` | File contains regex pattern |
| `MaxTokens(n)` | Output ≤ n tokens (cl100k_base) |
| `MaxOutputBytes(n)` | Output ≤ n bytes |
| `MaxLines(n)` | Output ≤ n lines |
| `TimeoutSeconds(s)` | Command finishes within s seconds |
| `NonEmptyOutput` | Output is not empty |
| `NoSecretsInOutput` | No API keys or secrets detected |
| `NoPermissionEscalation` | No privilege escalation |
| `DiffTouchesAllowedPaths(vec)` | Modified files in allowlist |
| `DiffDoesNotTouchForbiddenPaths(vec)` | Modified files not in denylist |
| `StepPassed(name)` | Previous step with name passed |
| `UserApproved(name)` | User approved step with name |
| `AllOf(vec)` | All guards must pass |
| `AnyOf(vec)` | At least one guard must pass |
| `Not(box)` | Negate guard |

See `src/guard.rs` for the full list (50+ variants).

---

## StepAction Reference

| Variant | Purpose |
|---------|---------|
| `LlmCall { system, user, model }` | Call an LLM with system & user prompts |
| `ToolCall { tool, args }` | Call a registered tool by name |
| `DelegateAgent { agent, input, policy, ... }` | Recursively run another agent |
| `SubPipeline(pipeline)` | Inline execute a sub-pipeline |
| `LoopUntil { body, condition, max_iterations, ... }` | Repeat action until condition met |
| `UseSkill { skill, input, mode }` | Run or inject a reusable skill |
| `UserInput { prompt, schema }` | Prompt user for input |
| `Custom(fn)` | Call arbitrary async Rust closure |
| `RemoteAgent { url, agent, input, ... }` | Call a distributed agent via HTTP |

---

## Phase Roadmap

All 10 phases are **complete** ✅:

1. ✅ **Phase 1: Core Pipeline & Guards** — Basic execution, guard evaluation

2. ✅ **Phase 2: Tool Registry & Audit** — Tool trait, built-in tools, audit logging

3. ✅ **Phase 3: MCP Integration** — Model Context Protocol server support

4. ✅ **Phase 4: Agent Delegation** — AgentRegistry, delegation policy, recursive execution

5. ✅ **Phase 5: Skills** — SkillRegistry, reusable capabilities, built-in skills

6. ✅ **Phase 6: Built-in Agents** — 6 specialist agents (planner, coder, reviewer, debugger, reflector, orchestrator)

7. ✅ **Phase 7: Safety & Production** — Injection detection, secret detection, enhanced guards

8. ✅ **Phase 8: Self-Improvement** — EvaluationSuite, SelfUpdateEngine, agent versioning

9. ✅ **Phase 9: Advanced Execution** — Plugin system, hot-reload, remote agents, monitoring server

10. ✅ **Phase 10: Stub Completion** — Real LLM provider, HTTP tool, MCP JSON-RPC, TOML/YAML guard parsing

---

## Example: TDD Loop

Here's a real-world example using `LoopUntil` to implement test-driven development:

```rust

use verdict::prelude::*;

use serde_json::json;

let tdd_loop = StepAction::LoopUntil {

    body: Box::new(StepAction::SubPipeline(Pipeline {

        name: "tdd_iteration".into(),

        steps: vec![

            AgentStep {

                name: "write_or_fix_code".into(),

                guard_in: Guard::None,

                action: StepAction::LlmCall {

                    system: "Fix failing tests.".into(),

                    user: "Failing tests:\n{test_output}".into(),

                    model: None,

                },

                guard_out: Guard::ValidRustSyntax,

                verdict: Verdict::Automated(Guard::ValidRustSyntax),

                tools: ToolSet::Allow(vec!["fs.write".into()]),

                injection_protection: InjectionProtection::Strict,

                output_schema: None,

            },

            AgentStep {

                name: "run_tests".into(),

                guard_in: Guard::None,

                action: StepAction::ToolCall {

                    tool: "shell.cargo_test".into(),

                    args: json!({}),

                },

                guard_out: Guard::NonEmptyOutput,

                verdict: Verdict::Automated(Guard::NonEmptyOutput),

                tools: ToolSet::Allow(vec!["shell.cargo_test".into()]),

                injection_protection: InjectionProtection::Strict,

                output_schema: None,

            },

        ],

        on_failure: FailureMode::Abort,

        max_retries: 0,

    })),

    condition: Guard::TestsPass,

    max_iterations: 10,

    on_iteration_failure: IterationFailureMode::Retry,

};

```

This loop:

- Repeats up to 10 times

- Each iteration: (1) LLM writes/fixes code, (2) runs tests

- Exits when `Guard::TestsPass` succeeds

- On iteration failure, retries immediately

- Prevents infinite loops via `max_iterations`
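
The control flow behind this kind of loop can be sketched independently of the framework. The function below is a generic, standard-library-only model: `loop_until` and `LoopResult` are hypothetical names, the body is a plain closure that receives the iteration index, and `on_iteration_failure` handling is not modeled.

```rust
/// How a bounded loop ended in this simplified model.
#[derive(Debug, PartialEq)]
pub enum LoopResult<T> {
    /// The condition held after `iterations` runs of the body.
    Satisfied { output: T, iterations: u32 },
    /// The iteration cap was reached without the condition holding.
    Exhausted { last: Option<T>, iterations: u32 },
}

/// Runs `body` until `condition` accepts its output or the cap is hit.
/// `body` receives the zero-based iteration index.
pub fn loop_until<T>(
    mut body: impl FnMut(u32) -> T,
    condition: impl Fn(&T) -> bool,
    max_iterations: u32,
) -> LoopResult<T> {
    let mut last = None;
    for i in 0..max_iterations {
        let output = body(i);
        if condition(&output) {
            return LoopResult::Satisfied { output, iterations: i + 1 };
        }
        last = Some(output);
    }
    LoopResult::Exhausted { last, iterations: max_iterations }
}
```

Returning `Exhausted` as a distinct variant forces the caller to handle the "ran out of attempts" case explicitly instead of treating the last output as a success.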

---

## Built-in Agents

Verdict ships with 6 specialist agents (defined in `src/agents/`):

| Agent | Role | Default Tools |
|-------|------|----------------|
| **planner** | Breaks down tasks into steps | ReadOnly |
| **coder** | Implements code changes | ReadWrite |
| **reviewer** | Reviews code quality | ReadOnly |
| **debugger** | Fixes compilation/test failures | ReadWrite |
| **reflector** | Analyzes agent performance | ReadOnly |
| **orchestrator** | Coordinates multi-agent workflows | ReadOnly |

Each agent has its own pipeline, policy, and allowed tool scope. Agents can delegate to each other (with depth limits).
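
A depth limit combined with an allowlist can be checked before each delegation. The sketch below illustrates one way to do that. `SketchPolicy`, `DelegationError`, and `check_delegation` are hypothetical names, and the depth rule used here (the number of agents already on the call chain must stay below `max_depth`) is an assumption rather than Verdict's documented semantics.

```rust
/// Hypothetical delegation policy.
pub struct SketchPolicy {
    /// Maximum number of agents allowed on the call chain before delegating.
    pub max_depth: usize,
    /// Agents that may be delegated to.
    pub allowed_agents: Vec<String>,
}

/// Why a delegation was refused.
#[derive(Debug, PartialEq)]
pub enum DelegationError {
    DepthExceeded { max_depth: usize },
    AgentNotAllowed(String),
}

/// Checks whether the agent at the end of `chain` may delegate to `target`.
/// `chain` lists the agents already on the call stack, root first.
pub fn check_delegation(
    policy: &SketchPolicy,
    chain: &[&str],
    target: &str,
) -> Result<(), DelegationError> {
    if chain.len() >= policy.max_depth {
        return Err(DelegationError::DepthExceeded { max_depth: policy.max_depth });
    }
    if !policy.allowed_agents.iter().any(|a| a.as_str() == target) {
        return Err(DelegationError::AgentNotAllowed(target.to_string()));
    }
    Ok(())
}
```

Checking depth before the allowlist means a runaway delegation chain is cut off even when every agent involved is individually permitted.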

---

## Built-in Skills

Verdict includes 5 built-in skills (in `src/skills/builtin/`):

- **rust_debugging**: Fix Rust compile/test failures

- **code_review**: Review code for quality & security

- **api_design**: Design clean APIs

- **test_writing**: Generate comprehensive tests

- **refactoring**: Refactor code safely

Skills can be injected into LLM prompts or run as sub-pipelines.

---

## Testing

Run all tests:

```bash

cargo test

```

Run a specific phase:

```bash

cargo test --test phase1

cargo test --test phase2

# ... up to phase9

```

Each phase file tests the corresponding set of features in isolation.

---

## Contributing

Verdict is designed to be extended. You can:

1. Add new guards to `src/guard.rs`

2. Add new built-in tools to `src/tools/`

3. Register custom agents in `AgentRegistry`

4. Create custom skills in `src/skills/`

5. Write plugins implementing the `Plugin` trait

See `architecture.md` for detailed design decisions.

---

## License

MIT (see LICENSE file for details)

---

## Examples

Three standalone example projects demonstrate Verdict in action:

### [verdict-demo](https://github.com/eliasstepanik/verdict-demo)

A showcase binary with 9 subcommands, each demonstrating a different feature of the framework:

| Command | Demonstrates |
|---------|-------------|
| `pipeline` | Pipeline structure, guard enforcement, graceful LLM-absent failure |
| `agents` | AgentRegistry, built-in agent introspection |
| `guards` | All major guard types with real TOML/YAML/JSON/secrets parsing |
| `tools` | FunctionTool, ToolRegistry, ToolSet scoping |
| `audit` | AuditLog, InjectionScanner, SecretScanner |
| `eval` | EvaluationSuite with Custom closure evaluation |
| `budget` | BudgetTracker exhaustion, RateLimiter |
| `monitor` | MonitoringServer on `http://127.0.0.1:9001` |
| `live` | **Real 3-step LLM pipeline**: Haiku drafts → Sonnet refines → Opus critiques |

```bash

git clone https://github.com/eliasstepanik/verdict-demo

cd verdict-demo

cargo run -- guards    # no LLM needed

cargo run -- live      # requires an OpenAI-compatible endpoint

```

---

### [verdict-micro-agent](https://github.com/eliasstepanik/verdict-micro-agent)

A Micro Agent implementation: give it a natural-language function description and it generates Python code using a TDD loop (generate tests → write code → run → fix → repeat).

```bash

git clone https://github.com/eliasstepanik/verdict-micro-agent

cd verdict-micro-agent

cargo run -- "Write a Python function that checks if a number is prime"

```

The agent routes across three models based on task difficulty:

- **Claude Haiku** — fast first code attempt

- **Claude Sonnet** — fixes on iterations 1–2

- **Claude Opus** — deep debugging on iterations 3+

Loop exits as soon as all tests pass. Typical run: 1–2 iterations.
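
The routing rule described above reduces to a small function of the iteration count. This sketch assumes the first attempt is iteration 0. `ModelTier` and `model_for_iteration` are hypothetical names and are not taken from the verdict-micro-agent source.

```rust
/// The three model tiers named above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ModelTier {
    Haiku,
    Sonnet,
    Opus,
}

/// Picks a tier from the zero-based iteration count:
/// first attempt, then early fixes, then deep debugging.
pub fn model_for_iteration(iteration: u32) -> ModelTier {
    match iteration {
        0 => ModelTier::Haiku,
        1 | 2 => ModelTier::Sonnet,
        _ => ModelTier::Opus,
    }
}
```

Escalating only after cheaper attempts fail keeps the typical 1–2 iteration run on the less expensive tiers.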

---

### [verdict-code](https://github.com/eliasstepanik/verdict-code)

An interactive opencode-like CLI assistant that demonstrates all major verdict features in a single runnable project: `Pipeline`, `Guard`, `Verdict`, `ToolRegistry`, `SkillRegistry`, `FunctionTool`, `StepAction::ToolCall`, and `StepAction::UseSkill`.

Every user message runs through a real verdict pipeline. Slash commands let you exercise tools and skills directly:

| Command | Demonstrates |
|---------|-------------|
| `/tools` | `ToolRegistry`, `FunctionTool`, `StepAction::ToolCall`, `ToolSet::Allow` |
| `/skills` | `SkillRegistry`, built-in skills, `StepAction::UseSkill` |
| type "count words" | triggers a 4th `ToolCall` step automatically in the pipeline |

```bash

git clone https://github.com/eliasstepanik/verdict-code

cd verdict-code

# Edit BASE_URL and API_KEY in src/main.rs

cargo run

```

---

## References

- **Architecture**: Read `architecture.md` for the full design and extended examples

- **How-to guide**: Read `how_to.md` for a field-by-field reference of every `AgentStep` option

- **API Docs**: `cargo doc --open` to browse generated Rust docs

- **Tests**: See `tests/phase*.rs` for working examples of all features

---

**Built with Rust 🦀 | Designed for safety, auditability, and self-improvement.**