{"id":51119833,"url":"https://github.com/eliasstepanik/verdict","last_synced_at":"2026-06-25T01:01:12.666Z","repository":{"id":363902840,"uuid":"1265131632","full_name":"eliasstepanik/verdict","owner":"eliasstepanik","description":"Guarded agent runtime for Rust","archived":false,"fork":false,"pushed_at":"2026-06-10T21:00:13.000Z","size":270963,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-06-10T22:11:26.697Z","etag":null,"topics":["agent","ai-agents","ai-safety","autonomous-agents","guardrails","llm","llm-framework","mcp","pipeline","rust"],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eliasstepanik.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-10T13:47:42.000Z","updated_at":"2026-06-10T21:01:00.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/eliasstepanik/verdict","commit_stats":null,"previous_names":["eliasstepanik/verdict"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/eliasstepanik/verdict","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eliasstepanik%2Fverdict","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eliasstepanik%2Fverdict/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eliasstepanik%2Fverdict/releases","
manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eliasstepanik%2Fverdict/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/eliasstepanik","download_url":"https://codeload.github.com/eliasstepanik/verdict/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eliasstepanik%2Fverdict/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34755063,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-24T02:00:07.484Z","response_time":106,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","ai-agents","ai-safety","autonomous-agents","guardrails","llm","llm-framework","mcp","pipeline","rust"],"created_at":"2026-06-25T01:01:11.590Z","updated_at":"2026-06-25T01:01:12.646Z","avatar_url":"https://github.com/eliasstepanik.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"verdict\n\n**A Rust framework for building agents that actually complete their work through code-enforced structure, guarded execution, and composable pipelines.**\n\n![Social 
Preview](.github/social-preview.png)\n\n[![Rust](https://img.shields.io/badge/rust-1.70+-orange.svg)](https://www.rust-lang.org/)\n[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)\n[![Status](https://img.shields.io/badge/status-10%20phases%20complete-brightgreen.svg)]()\n\n## How It Works\n\n```\nUser Task Input\n      │\n      ▼\n┌─────────────────────────────────────────────────────┐\n│                     Pipeline                         │\n│                                                      │\n│  Step 1: Generate Code                               │\n│  ┌──────────────────────────────────────────────┐   │\n│  │ guard_in:  Guard::None  ✓                    │   │\n│  │ action:    LlmCall { \"write hello world\" }   │   │\n│  │ guard_out: Guard::ValidRustSyntax            │   │\n│  │ verdict:   Automated(Guard::ValidRustSyntax) │   │\n│  └──────────────────────────────────────────────┘   │\n│         │ ✅ Guard passes → proceed                  │\n│         ▼                                            │\n│  Step 2: Run Tests (LoopUntil)                       │\n│  ┌──────────────────────────────────────────────┐   │\n│  │ body:      ToolCall { \"cargo test\" }         │   │\n│  │ condition: Guard::TestsPass                  │   │\n│  │ max_iter:  10                                │   │\n│  │ on_fail:   DelegateAgent(\"debugger\")         │   │\n│  └──────────────────────────────────────────────┘   │\n│         │ ✅ Tests pass → proceed                    │\n│         ▼                                            │\n│  Step 3: User Review                                 │\n│  ┌──────────────────────────────────────────────┐   │\n│  │ action:  UserInput { \"Approve this diff?\" }  │   │\n│  │ verdict: UserApproval                        │   │\n│  └──────────────────────────────────────────────┘   │\n│         │ ✅ Approved → pipeline succeeds             │\n└─────────────────────────────────────────────────────┘\n      │\n      ▼\n  
PipelineResult { success: true, cost: $0.002, ... }\n  AuditLog [GuardPass, LlmCall, ToolCall, UserApproval]\n```\n\nEvery guard is **enforced in code** — not just hoped for in a prompt.\n\n---\n\n## Overview\n\nVerdict is a Rust framework designed for building autonomous agents that can be **trusted** to complete complex tasks through hard, verifiable guarantees—not soft prompts and hopes.\n\nTraditional agent frameworks are built around LLM calls + tool definitions. Verdict is different:\n\n- **Guards**: Hard conditions (not soft suggestions) that check preconditions, postconditions, and loop invariants\n- **Pipelines**: DAG structures of steps, each with its own guard-driven verdict and scoped tool access\n- **Verdicts**: Automated or user-approval gates that decide whether a step succeeded and should proceed\n- **Agents**: Reusable agent objects with their own pipelines, tools, skills, and policies—can delegate to each other\n- **Registry**: Central coordination for agents, tools (built-in, MCP, local Rust functions), and skills\n- **Budget tracking**: Cost control, token limits, and rate limiting built in from the start\n- **Evaluation**: Test-driven agent improvement via suites that automatically validate agent quality\n- **Self-improvement**: Agents can propose patches to themselves, but only after strict guards and user approval\n- **Audit logging**: Full trace of every step, tool call, decision, and cost for compliance and debugging\n\nVerdict runs on **9 phases of evolution**, each phase unlocking new capabilities:\n\n| Phase | Theme | Features |\n|-------|-------|----------|\n| **1** | Core Pipeline \u0026 Guards | Pipeline execution, Guard evaluation, basic Verdicts |\n| **2** | Tool Registry \u0026 Audit | Tool trait, built-in tools, audit logging, cost tracking |\n| **3** | MCP Integration | MCP server support, tool discovery, namespaced tool calls |\n| **4** | Agent Delegation | AgentRegistry, DelegateAgent action, delegation policy |\n| **5** | 
Skills | SkillRegistry, Skill definitions, UseSkill action, built-in skills |\n| **6** | Built-in Agents | 6 specialist agents (planner, coder, reviewer, debugger, reflector, orchestrator) |\n| **7** | Safety \u0026 Production | InjectionScanner, SecretScanner, enhanced guards, deployment patterns |\n| **8** | Self-Improvement | EvaluationSuite, SelfUpdateEngine, agent versioning \u0026 promotion |\n| **9** | Advanced Execution | Plugin system, HotReload, RemoteAgent, MonitoringServer, WebUI |\n\n---\n\n## Features\n\n### Core Execution\n- ✅ **Pipeline execution** with DAG support and parallel steps\n- ✅ **Guard-driven safety** (pre-conditions, post-conditions, loop invariants)\n- ✅ **Verdict gates** for automated or user-approval-based step progression\n- ✅ **Conditional branching** via `Guard` composition (`AllOf`, `AnyOf`, `Not`)\n- ✅ **Loop control** with `LoopUntil` and iteration failure modes\n\n### Agents \u0026 Delegation\n- ✅ **Agent registry** for centralized agent management\n- ✅ **Agent delegation** with depth control, allowlists, and policy inheritance\n- ✅ **Agent versioning** for self-improvement tracking\n- ✅ **Scoped tool inheritance** (agent → pipeline → step → skill)\n\n### Tools \u0026 Resources\n- ✅ **Tool registry** (built-in, MCP, local functions, CLI)\n- ✅ **MCP (Model Context Protocol)** server integration\n- ✅ **Local Rust function tools** via `FunctionTool`\n- ✅ **Tool scoping** (ReadOnly, ReadWrite, Allow-list, Deny-list, Intersection, Union)\n- ✅ **Tool audit logging** with full call tracing\n\n### Skills \u0026 Knowledge\n- ✅ **Skill registry** with reusable capabilities\n- ✅ **Built-in skills**: rust_debugging, code_review, api_design, test_writing, refactoring\n- ✅ **Skill mode selection** (PromptOnly, Pipeline, Auto)\n- ✅ **Skill examples \u0026 evaluation** for quality assurance\n\n### Safety \u0026 Control\n- ✅ **Budget tracking** (cost, tokens, rate limits)\n- ✅ **Injection detection** (prompt injection \u0026 secret 
detection)\n- ✅ **Permission management** (filesystem isolation, network policies)\n- ✅ **Workspace isolation** (temp dirs, sandboxing, per-task separation)\n- ✅ **Extensive guard library** (50+ guard types covering syntax, output, files, security, delegation)\n\n### Evaluation \u0026 Self-Improvement\n- ✅ **Evaluation suites** for testing agent quality\n- ✅ **Self-update engine** with patching \u0026 validation\n- ✅ **Automated improvement loops** with guard-gated promotion\n- ✅ **Cost-benefit analysis** for self-updates\n\n### Monitoring \u0026 Debugging\n- ✅ **Comprehensive audit logging** (JSON-serializable events)\n- ✅ **Pipeline tracing** with step results and timing\n- ✅ **Monitoring server** (HTTP + WebUI)\n- ✅ **Hot-reload support** for live agent updates\n\n---\n\n## Installation\n\nAdd to your `Cargo.toml`:\n\n```toml\n[dependencies]\nverdict = \"0.1\"\ntokio = { version = \"1\", features = [\"rt\", \"rt-multi-thread\", \"macros\"] }\nserde_json = \"1\"\n```\n\n---\n\n## Quick Start\n\nHere's a minimal single-step pipeline that asks an LLM to write code and checks the output with a guard:\n\n```rust\nuse verdict::prelude::*;\nuse serde_json::json;\n\n#[tokio::main]\nasync fn main() -\u003e Result\u003c(), Box\u003cdyn std::error::Error\u003e\u003e {\n    // Create a simple pipeline step\n    let coding_step = AgentStep {\n        name: \"write_code\".into(),\n        guard_in: Guard::None,\n        action: StepAction::LlmCall {\n            system: \"You are a code generator.\".into(),\n            user: \"Write a hello world function in Rust.\".into(),\n            model: None,\n        },\n        guard_out: Guard::ValidRustSyntax,\n        verdict: Verdict::Automated(Guard::ValidRustSyntax),\n        tools: ToolSet::ReadOnly,\n        injection_protection: InjectionProtection::Strict,\n        output_schema: None,\n    };\n\n    let pipeline = Pipeline {\n        name: \"simple_code_gen\".into(),\n        steps: vec![coding_step],\n        on_failure: FailureMode::Abort,\n 
       max_retries: 1,\n    };\n\n    // Create a runner and execute\n    let mut runner = PipelineRunner::new();\n    let result = runner.run(\u0026pipeline, json!({})).await?;\n\n    println!(\"Pipeline result: {:?}\", result);\n    println!(\"Output: {}\", result.output.raw);\n\n    Ok(())\n}\n```\n\nThis example:\n1. Defines a single `AgentStep` with an `LlmCall` action\n2. Sets up guards: input must pass `Guard::None` (always), output must be `Guard::ValidRustSyntax`\n3. Creates a `Pipeline` containing the step\n4. Runs it with a `PipelineRunner`\n5. Prints the result and its raw output\n\n---\n\n## Using a Real LLM Provider\n\nVerdict ships with a built-in OpenAI-compatible provider. Any endpoint that speaks the OpenAI chat completions API works — OpenAI, Anthropic via proxy, Ollama, LM Studio, etc.\n\n### From environment variables\n\n```rust\nuse verdict::prelude::*;\nuse std::sync::Arc;\n\n// Reads OPENAI_API_KEY (required), OPENAI_BASE_URL, OPENAI_MODEL from env\nlet client = LlmClient::from_env()?;\nlet mut runner = PipelineRunner::new().with_llm_client(Arc::new(client));\n```\n\n### Hardcoded provider\n\n```rust\nuse verdict::prelude::*;\nuse verdict::llm::OpenAiCompatibleProvider;\nuse std::sync::Arc;\n\nlet provider = OpenAiCompatibleProvider::new(\n    \"https://api.openai.com\".into(),  // base URL (without /v1)\n    \"sk-your-api-key\".into(),\n    \"gpt-4o\".into(),                  // default model\n);\nlet client = Arc::new(LlmClient::new(Arc::new(provider)));\nlet mut runner = PipelineRunner::new().with_llm_client(client);\n```\n\n### Per-step model routing\n\nEach `LlmCall` step can override the model — useful for routing easy tasks to a fast\ncheap model and hard tasks to a more capable one:\n\n```rust\nuse verdict::action::ProviderSpec;\n\nAgentStep {\n    action: StepAction::LlmCall {\n        system: \"You are an expert analyst.\".into(),\n        user: \"Analyse this in depth.\".into(),\n        model: Some(ProviderSpec {\n            model: \"claude-opus-4-7\".into(),\n        
    provider: \"openai-compatible\".into(),\n        }),\n    },\n    // ...\n}\n```\n\n---\n\n## Setting Up in Your Application\n\nHere's a complete step-by-step guide for using Verdict as a library in a real Rust project.\n\n### Step 1: Add to `Cargo.toml`\n\n```toml\n[dependencies]\nverdict = { path = \"./verdict\" }  # or from crates.io once published\ntokio = { version = \"1\", features = [\"rt\", \"rt-multi-thread\", \"macros\"] }\nserde_json = \"1\"\n```\n\n### Step 2: Create Your Main with Async Runtime\n\n```rust\nuse verdict::prelude::*;\nuse serde_json::json;\nuse std::sync::Arc;\n\n#[tokio::main]\nasync fn main() -\u003e Result\u003c(), Box\u003cdyn std::error::Error\u003e\u003e {\n    // Set up logging (optional)\n    env_logger::init();\n\n    // Create registries\n    let tool_registry = Arc::new(ToolRegistry::with_builtins());\n    let agent_registry = Arc::new(AgentRegistry::new());\n    let skill_registry = Arc::new(SkillRegistry::new());\n\n    // Register agents\n    agent_registry.register(coder_agent());\n    agent_registry.register(reviewer_agent());\n    agent_registry.register(debugger_agent());\n    agent_registry.register(planner_agent());\n    agent_registry.register(reflector_agent());\n\n    // Create a pipeline\n    let my_pipeline = Pipeline {\n        name: \"test_pipeline\".into(),\n        steps: vec![\n            AgentStep {\n                name: \"generate_code\".into(),\n                guard_in: Guard::None,\n                action: StepAction::LlmCall {\n                    system: \"You are a code generator.\".into(),\n                    user: \"Write a function that adds two numbers.\".into(),\n                    model: None,\n                },\n                guard_out: Guard::NonEmptyOutput,\n                verdict: Verdict::Automated(Guard::NonEmptyOutput),\n                tools: ToolSet::ReadOnly,\n                injection_protection: InjectionProtection::Strict,\n                output_schema: None,\n          
  },\n        ],\n        on_failure: FailureMode::Abort,\n        max_retries: 1,\n    };\n\n    // Create a runner\n    let mut runner = PipelineRunner::with_registries(\n        tool_registry.clone(),\n        agent_registry.clone(),\n    ).with_skill_registry(skill_registry);\n\n    // Get the planner agent\n    let agent = planner_agent();\n\n    // Run the pipeline\n    let result = runner.run(\n        \u0026my_pipeline,\n        \u0026agent,\n        json!({}),\n    ).await?;\n\n    // Inspect results\n    println!(\"Pipeline completed: {:?}\", result.success);\n    println!(\"Output: {}\", result.output.raw);\n    println!(\"Cost: ${:.4}\", result.cost);\n\n    // Access audit log\n    for entry in runner.audit_log.entries() {\n        println!(\"Audit: {:?}\", entry.event);\n    }\n\n    Ok(())\n}\n```\n\n### Step 3: Handle Pipeline Results\n\nThe `PipelineResult` contains:\n\n```rust\npub struct PipelineResult {\n    pub success: bool,\n    pub output: StepOutput,\n    pub cost: f64,\n    pub step_results: HashMap\u003cString, StepResult\u003e,\n}\n```\n\nCheck the result and take action:\n\n```rust\nmatch runner.run(\u0026pipeline, \u0026agent, input).await {\n    Ok(result) =\u003e {\n        if result.success {\n            println!(\"Pipeline succeeded. 
Output:\\n{}\", result.output.raw);\n        } else {\n            println!(\"Pipeline failed\");\n        }\n        println!(\"Total cost: ${:.4}\", result.cost);\n    }\n    Err(e) =\u003e {\n        eprintln!(\"Pipeline error: {:?}\", e);\n    }\n}\n```\n\n---\n\n## Running the Monitoring Web UI\n\nVerdict includes a built-in `MonitoringServer` that serves:\n- An HTML dashboard showing pipeline execution in real-time\n- A JSON API for audit log entries\n- A JSON API for trace data\n\n### Starting the Monitoring Server\n\nCreate the server and run it as a background task:\n\n```rust\nuse verdict::prelude::*;\nuse std::net::SocketAddr;\nuse std::sync::Arc;\n\n#[tokio::main]\nasync fn main() -\u003e Result\u003c(), Box\u003cdyn std::error::Error\u003e\u003e {\n    // ... your pipeline setup ...\n\n    let mut runner = PipelineRunner::new();\n    let agent = planner_agent();\n    let pipeline = my_pipeline();\n\n    // Run the pipeline and collect audit log\n    let result = runner.run(\u0026pipeline, \u0026agent, input).await?;\n\n    // Create the monitoring server\n    let audit_log = runner.audit_log.clone();  // AuditLog is cloneable\n    let trace = runner.trace.clone();           // PipelineTrace is cloneable\n\n    let monitoring_server = MonitoringServer::new(audit_log, trace);\n\n    // Spawn the server on a background task\n    let server_addr: SocketAddr = \"127.0.0.1:8080\".parse()?;\n    \n    tokio::spawn(async move {\n        if let Err(e) = monitoring_server.serve(server_addr).await {\n            eprintln!(\"Monitoring server error: {:?}\", e);\n        }\n    });\n\n    // Server is now listening. 
Open a browser:\n    println!(\"📊 Monitoring UI available at http://127.0.0.1:8080\");\n\n    // Let the server run while you do other work\n    tokio::time::sleep(tokio::time::Duration::from_secs(300)).await;\n\n    Ok(())\n}\n```\n\n### MonitoringServer Endpoints\n\n| Endpoint | Method | Purpose |\n|----------|--------|---------|\n| `/` | GET | HTML dashboard (interactive UI) |\n| `/api/entries` | GET | JSON array of audit log entries |\n| `/api/trace` | GET | JSON trace object with step timing |\n\n### Example: Full Integration with Monitoring\n\n```rust\nuse verdict::prelude::*;\nuse serde_json::json;\nuse std::net::SocketAddr;\nuse std::sync::Arc;\n\n#[tokio::main]\nasync fn main() -\u003e Result\u003c(), Box\u003cdyn std::error::Error\u003e\u003e {\n    // Create registries\n    let tool_registry = Arc::new(ToolRegistry::with_builtins());\n    let agent_registry = Arc::new(AgentRegistry::new());\n\n    // Register agents\n    agent_registry.register(coder_agent());\n    agent_registry.register(reviewer_agent());\n\n    // Create runner\n    let mut runner = PipelineRunner::with_registries(\n        tool_registry,\n        agent_registry,\n    );\n\n    // Start monitoring server in background\n    let audit_log = runner.audit_log.clone();\n    let trace = runner.trace.clone();\n\n    // Parse the address here so a bad address is reported by main via `?`\n    let addr: SocketAddr = \"127.0.0.1:8080\".parse()?;\n\n    tokio::spawn(async move {\n        let server = MonitoringServer::new(audit_log, trace);\n        if let Err(e) = server.serve(addr).await {\n            eprintln!(\"Monitoring server error: {:?}\", e);\n        }\n    });\n\n    println!(\"✅ Monitoring server started on http://127.0.0.1:8080\");\n\n    // Run your pipeline\n    let pipeline = my_pipeline();\n    let agent = coder_agent();\n    let input = json!({\"task\": \"implement feature X\"});\n\n    let result = runner.run(\u0026pipeline, \u0026agent, input).await?;\n\n    println!(\"Pipeline result: {:?}\", result.success);\n    println!(\"Check the dashboard at http://127.0.0.1:8080 for details\");\n\n    // Keep the server alive without blocking the async runtime\n    tokio::time::sleep(tokio::time::Duration::from_secs(600)).await;\n\n    Ok(())\n}\n```\n\n---\n\n## Core Concepts\n\n### Pipeline 
\u0026 AgentStep\nA **Pipeline** is a DAG of **AgentStep**s to be executed sequentially (or with controlled concurrency). Each step has:\n- **name**: unique identifier\n- **guard_in**: precondition (must pass before execution)\n- **action**: what to do (LlmCall, ToolCall, DelegateAgent, LoopUntil, Custom, UserInput, UseSkill, SubPipeline, RemoteAgent)\n- **guard_out**: postcondition (output must satisfy)\n- **verdict**: decides success (Automated or UserApproval)\n- **tools**: scoped tool allowlist\n- **injection_protection**: input sanitization level\n\n### Guard \u0026 GuardEngine\nA **Guard** is a verifiable condition. The **GuardEngine** evaluates guards against step context. Guards include:\n- Output validation: `ValidJson`, `ValidRustSyntax`, `MatchesSchema`, `MaxTokens`\n- File checks: `FileExists`, `FileContains`, `FormatPass`, `LintPass`, `Compiles`, `TestsPass`\n- Security: `NoSecretsInOutput`, `NoPermissionEscalation`, `DiffTouchesAllowedPaths`\n- Delegation: `MaxDelegationDepth`, `OnlyAllowedAgentsUsed`\n- Composition: `AllOf`, `AnyOf`, `Not`\n\n### Verdict \u0026 VerdictEngine\nA **Verdict** decides whether a step succeeded. It can be:\n- **Automated**: succeeded if a guard passes (e.g., `Verdict::Automated(Guard::TestsPass)`)\n- **UserApproval**: wait for human confirmation with optional diff display\n- **AllOf** / **AnyOf**: multiple verdicts that must all/any pass\n\n### PipelineRunner\nThe **PipelineRunner** orchestrates pipeline execution. 
It:\n- Tracks agent, tool, and skill registries\n- Manages budget, audit logs, and traces\n- Evaluates guards before/after each step\n- Handles delegation recursively\n- Supports hot-reload of agents/tools via plugins\n\n### StepAction (Overview)\nActions determine what a step does:\n- **LlmCall**: talk to an LLM\n- **ToolCall**: call a registered tool\n- **DelegateAgent**: recursively run another agent\n- **SubPipeline**: inline execute a sub-pipeline\n- **LoopUntil**: repeat action until guard passes (max iterations)\n- **Custom**: arbitrary async Rust closure\n- **UserInput**: prompt the user\n- **UseSkill**: run/inject a reusable skill\n- **RemoteAgent**: call a remote agent (distributed execution)\n\n### AgentRegistry \u0026 ToolRegistry\n**Registries** are central hubs:\n- **AgentRegistry**: maps agent names to `Agent` objects (for delegation)\n- **ToolRegistry**: maps tool names to `Tool` trait objects (built-in, MCP, local, external)\n- **SkillRegistry**: maps skill names to `Skill` definitions\n\n### AuditLog\nComprehensive logging of every event:\n- Step execution start/end\n- Guard evaluation pass/fail\n- Tool calls with args \u0026 results\n- Delegation paths\n- Cost tracking\n- User approvals\n- All JSON-serializable for compliance\n\n### EvaluationSuite\nTests that verify agent quality:\n- **EvaluationCase**: (input, expected output/schema/guard)\n- **EvaluationRunner**: runs cases and scores results\n- **Minimum score threshold** for promotion to production\n\n### SelfUpdateEngine\nEnables agents to improve themselves:\n- Reflect on past failures/costs\n- Propose patches to pipeline/guards/tools\n- Validate patches via compilation, tests, evaluation\n- Require user approval (configurable)\n- Version the agent on successful update\n\n### Plugin System\nExtend Verdict without recompiling:\n- **Plugin trait**: load at runtime\n- **PluginRegistry**: manage lifecycle\n- **HotReloadHandle**: live update agents/tools\n\n### MonitoringServer\nHTTP + WebUI 
for monitoring:\n- Real-time pipeline execution\n- Cost dashboards\n- Audit log viewer\n- Agent health\n- Listens on configurable port (default 8080)\n\n---\n\n## Guards Reference\n\n| Guard | Purpose |\n|-------|---------|\n| `None` | Always passes |\n| `ValidJson` | Output is valid JSON |\n| `ValidRustSyntax` | Output is syntactically valid Rust |\n| `ValidToml` / `ValidYaml` | Config file validation |\n| `MatchesSchema(Value)` | JSON Schema validation |\n| `Compiles` | Rust code compiles (`cargo check`) |\n| `TestsPass` | Tests pass (auto-detected runner) |\n| `TestsPassWith(TestRunner)` | Tests pass with explicit runner |\n| `LintPass` | Linting passes |\n| `FormatPass` | Code formatting correct |\n| `FileExists(path)` | File exists |\n| `FileContains { path, pattern }` | File contains regex pattern |\n| `MaxTokens(n)` | Output ≤ n tokens (cl100k_base) |\n| `MaxOutputBytes(n)` | Output ≤ n bytes |\n| `MaxLines(n)` | Output ≤ n lines |\n| `TimeoutSeconds(s)` | Command finishes within s seconds |\n| `NonEmptyOutput` | Output is not empty |\n| `NoSecretsInOutput` | No API keys or secrets detected |\n| `NoPermissionEscalation` | No privilege escalation |\n| `DiffTouchesAllowedPaths(vec)` | Modified files in allowlist |\n| `DiffDoesNotTouchForbiddenPaths(vec)` | Modified files not in denylist |\n| `StepPassed(name)` | Previous step with name passed |\n| `UserApproved(name)` | User approved step with name |\n| `AllOf(vec)` | All guards must pass |\n| `AnyOf(vec)` | Any guard must pass |\n| `Not(box)` | Negate guard |\n\nSee `src/guard.rs` for the full list (50+ variants).\n\n---\n\n## StepAction Reference\n\n| Variant | Purpose |\n|---------|---------|\n| `LlmCall { system, user, model }` | Call an LLM with system \u0026 user prompts |\n| `ToolCall { tool, args }` | Call a registered tool by name |\n| `DelegateAgent { agent, input, policy, ... 
}` | Recursively run another agent |\n| `SubPipeline(pipeline)` | Inline execute a sub-pipeline |\n| `LoopUntil { body, condition, max_iterations, ... }` | Repeat action until condition met |\n| `UseSkill { skill, input, mode }` | Run or inject a reusable skill |\n| `UserInput { prompt, schema }` | Prompt user for input |\n| `Custom(fn)` | Call arbitrary async Rust closure |\n| `RemoteAgent { url, agent, input, ... }` | Call a distributed agent via HTTP |\n\n---\n\n## Phase Roadmap\n\nAll 10 phases are **complete** ✅:\n\n1. ✅ **Phase 1: Core Pipeline \u0026 Guards** — Basic execution, guard evaluation\n2. ✅ **Phase 2: Tool Registry \u0026 Audit** — Tool trait, built-in tools, audit logging\n3. ✅ **Phase 3: MCP Integration** — Model Context Protocol server support\n4. ✅ **Phase 4: Agent Delegation** — AgentRegistry, delegation policy, recursive execution\n5. ✅ **Phase 5: Skills** — SkillRegistry, reusable capabilities, built-in skills\n6. ✅ **Phase 6: Built-in Agents** — 6 specialist agents (planner, coder, reviewer, debugger, reflector, orchestrator)\n7. ✅ **Phase 7: Safety \u0026 Production** — Injection detection, secret detection, enhanced guards\n8. ✅ **Phase 8: Self-Improvement** — EvaluationSuite, SelfUpdateEngine, agent versioning\n9. ✅ **Phase 9: Advanced Execution** — Plugin system, hot-reload, remote agents, monitoring server\n10. 
✅ **Phase 10: Stub Completion** — Real LLM provider, HTTP tool, MCP JSON-RPC, TOML/YAML guard parsing\n\n---\n\n## Example: TDD Loop\n\nHere's a real-world example using `LoopUntil` to implement test-driven development:\n\n```rust\nuse verdict::prelude::*;\nuse serde_json::json;\n\nlet tdd_loop = StepAction::LoopUntil {\n    body: Box::new(StepAction::SubPipeline(Pipeline {\n        name: \"tdd_iteration\".into(),\n        steps: vec![\n            AgentStep {\n                name: \"write_or_fix_code\".into(),\n                guard_in: Guard::None,\n                action: StepAction::LlmCall {\n                    system: \"Fix failing tests.\".into(),\n                    user: \"Failing tests:\\n{test_output}\".into(),\n                    model: None,\n                },\n                guard_out: Guard::ValidRustSyntax,\n                verdict: Verdict::Automated(Guard::ValidRustSyntax),\n                tools: ToolSet::Allow(vec![\"fs.write\".into()]),\n                injection_protection: InjectionProtection::Strict,\n                output_schema: None,\n            },\n            AgentStep {\n                name: \"run_tests\".into(),\n                guard_in: Guard::None,\n                action: StepAction::ToolCall {\n                    tool: \"shell.cargo_test\".into(),\n                    args: json!({}),\n                },\n                guard_out: Guard::NonEmptyOutput,\n                verdict: Verdict::Automated(Guard::NonEmptyOutput),\n                tools: ToolSet::Allow(vec![\"shell.cargo_test\".into()]),\n                injection_protection: InjectionProtection::Strict,\n                output_schema: None,\n            },\n        ],\n        on_failure: FailureMode::Abort,\n        max_retries: 0,\n    })),\n    condition: Guard::TestsPass,\n    max_iterations: 10,\n    on_iteration_failure: IterationFailureMode::Retry,\n};\n```\n\nThis loop:\n- Repeats up to 10 times\n- Each iteration: (1) LLM writes/fixes code, (2) runs 
tests\n- Exits when `Guard::TestsPass` succeeds\n- On iteration failure, retries immediately\n- Prevents infinite loops via `max_iterations`\n\n---\n\n## Built-in Agents\n\nVerdict ships with 6 specialist agents (defined in `src/agents/`):\n\n| Agent | Role | Default Tools |\n|-------|------|----------------|\n| **planner** | Breaks down tasks into steps | ReadOnly |\n| **coder** | Implements code changes | ReadWrite |\n| **reviewer** | Reviews code quality | ReadOnly |\n| **debugger** | Fixes compilation/test failures | ReadWrite |\n| **reflector** | Analyzes agent performance | ReadOnly |\n| **orchestrator** | Coordinates multi-agent workflows | ReadOnly |\n\nEach agent has its own pipeline, policy, and allowed tool scope. Agents can delegate to each other (with depth limits).\n\n---\n\n## Built-in Skills\n\nVerdict includes 5 built-in skills (in `src/skills/builtin/`):\n\n- **rust_debugging**: Fix Rust compile/test failures\n- **code_review**: Review code for quality \u0026 security\n- **api_design**: Design clean APIs\n- **test_writing**: Generate comprehensive tests\n- **refactoring**: Refactor code safely\n\nSkills can be injected into LLM prompts or run as sub-pipelines.\n\n---\n\n## Testing\n\nRun all tests:\n\n```bash\ncargo test\n```\n\nRun a specific phase:\n\n```bash\ncargo test --test phase1\ncargo test --test phase2\n# ... up to phase9\n```\n\nEach phase file tests the corresponding set of features in isolation.\n\n---\n\n## Contributing\n\nVerdict is designed to be extended. You can:\n\n1. Add new guards to `src/guard.rs`\n2. Add new built-in tools to `src/tools/`\n3. Register custom agents in `AgentRegistry`\n4. Create custom skills in `src/skills/`\n5. 
Write plugins implementing the `Plugin` trait\n\nSee the architecture document for detailed design decisions.\n\n---\n\n## License\n\nMIT (see LICENSE file for details)\n\n---\n\n## Examples\n\nThree standalone example projects demonstrate Verdict in action:\n\n### [verdict-demo](https://github.com/eliasstepanik/verdict-demo)\nA showcase binary with 9 subcommands, each demonstrating a different feature of the framework:\n\n| Command | Demonstrates |\n|---------|-------------|\n| `pipeline` | Pipeline structure, guard enforcement, graceful LLM-absent failure |\n| `agents` | AgentRegistry, built-in agent introspection |\n| `guards` | All major guard types with real TOML/YAML/JSON/secrets parsing |\n| `tools` | FunctionTool, ToolRegistry, ToolSet scoping |\n| `audit` | AuditLog, InjectionScanner, SecretScanner |\n| `eval` | EvaluationSuite with Custom closure evaluation |\n| `budget` | BudgetTracker exhaustion, RateLimiter |\n| `monitor` | MonitoringServer on `http://127.0.0.1:9001` |\n| `live` | **Real 3-step LLM pipeline** — Haiku drafts → Sonnet refines → Opus critiques |\n\n```bash\ngit clone https://github.com/eliasstepanik/verdict-demo\ncd verdict-demo\ncargo run -- guards    # no LLM needed\ncargo run -- live      # requires an OpenAI-compatible endpoint\n```\n\n---\n\n### [verdict-micro-agent](https://github.com/eliasstepanik/verdict-micro-agent)\nA Micro Agent implementation — give it a natural-language function description and it\ngenerates Python code using a TDD loop: generate tests → write code → run → fix → repeat.\n\n```bash\ngit clone https://github.com/eliasstepanik/verdict-micro-agent\ncd verdict-micro-agent\ncargo run -- \"Write a Python function that checks if a number is prime\"\n```\n\nThe agent routes across three models based on task difficulty:\n- **Claude Haiku** — fast first code attempt\n- **Claude Sonnet** — fixes on iterations 1–2\n- **Claude Opus** — deep debugging on iterations 3+\n\nThe loop exits as soon as all tests pass. 
Typical run: 1–2 iterations.\n\n---\n\n### [verdict-code](https://github.com/eliasstepanik/verdict-code)\nAn interactive opencode-like CLI assistant that demonstrates all major verdict features\nin a single runnable project: `Pipeline`, `Guard`, `Verdict`, `ToolRegistry`, `SkillRegistry`,\n`FunctionTool`, `StepAction::ToolCall`, and `StepAction::UseSkill`.\n\nEvery user message runs through a real verdict pipeline. Slash commands let you exercise\ntools and skills directly:\n\n| Command | Demonstrates |\n|---------|-------------|\n| `/tools` | `ToolRegistry`, `FunctionTool`, `StepAction::ToolCall`, `ToolSet::Allow` |\n| `/skills` | `SkillRegistry`, built-in skills, `StepAction::UseSkill` |\n| type \"count words\" | triggers a 4th `ToolCall` step automatically in the pipeline |\n\n```bash\ngit clone https://github.com/eliasstepanik/verdict-code\ncd verdict-code\n# Edit BASE_URL and API_KEY in src/main.rs\ncargo run\n```\n\n---\n\n## References\n\n- **Architecture**: Read `architecture.md` for the full design and extended examples\n- **How-to guide**: Read `how_to.md` for a field-by-field reference of every `AgentStep` option\n- **API Docs**: `cargo doc --open` to browse generated Rust docs\n- **Tests**: See `tests/phase*.rs` for working examples of all features\n\n---\n\n**Built with Rust 🦀 | Designed for safety, auditability, and self-improvement.**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feliasstepanik%2Fverdict","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feliasstepanik%2Fverdict","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feliasstepanik%2Fverdict/lists"}