{"id":46350290,"url":"https://github.com/mezmo/aura","last_synced_at":"2026-06-11T00:00:59.161Z","repository":{"id":342119310,"uuid":"1171124338","full_name":"mezmo/aura","owner":"mezmo","description":"A production-ready framework for composing AI agents from declarative TOML configuration, with MCP tool integration, RAG pipelines, and an OpenAI-compatible web API.","archived":false,"fork":false,"pushed_at":"2026-06-10T17:47:26.000Z","size":4581,"stargazers_count":81,"open_issues_count":47,"forks_count":17,"subscribers_count":3,"default_branch":"main","last_synced_at":"2026-06-10T19:07:26.485Z","etag":null,"topics":["ai-agent-framework","ai-agents","ai-sre","aiops","devops","harness","llm","llmops","mcp","model-context-protocol","observability","ollama","openai-api","opentelemetry","production-ready","rag","rust","sre","toml"],"latest_commit_sha":null,"homepage":"https://www.mezmo.com/aura","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mezmo.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":"CLA.md"}},"created_at":"2026-03-02T22:36:21.000Z","updated_at":"2026-06-10T13:24:45.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/mezmo/aura","commit_stats":null,"previous_names":["mezmo/aura"],"tags_count":42,"template":false,"template_full_name":null,"purl":"pkg:github/mezmo/aura","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/mezmo%2Faura","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mezmo%2Faura/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mezmo%2Faura/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mezmo%2Faura/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mezmo","download_url":"https://codeload.github.com/mezmo/aura/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mezmo%2Faura/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34175887,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-10T02:00:07.152Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agent-framework","ai-agents","ai-sre","aiops","devops","harness","llm","llmops","mcp","model-context-protocol","observability","ollama","openai-api","opentelemetry","production-ready","rag","rust","sre","toml"],"created_at":"2026-03-04T23:01:20.936Z","updated_at":"2026-06-11T00:00:59.143Z","avatar_url":"https://github.com/mezmo.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AURA\n\nAURA is an agentic harness that turns an LLM model into a reliable, autonomous service capable of executing real SRE work. 
AURA provides the guardrails, API servers, state management, authentication, streaming, error handling, and tool integrations necessary to run AI SRE agents safely in production.\n\nKey capabilities:\n\n- Declarative agent composition via TOML with multi-provider LLM support and multi-agent serving\n- Dynamic [MCP](https://modelcontextprotocol.io) tool discovery via HTTP streamable, SSE, and STDIO transports\n- Automatic schema sanitization for OpenAI function-calling compatibility\n- Vector search integration with Qdrant and AWS Bedrock Knowledge Base\n- Embeddable Rust core independent from configuration layer\n- Multi-agent orchestration with coordinator/worker architecture and DAG-based parallel execution\n- Dependency-aware multi-wave execution with plan/execute loops\n- [A2A protocol](https://github.com/a2a-protocol) support for agent-to-agent interoperability\n\n## Table of Contents\n\n- [Quick Start](#quick-start)\n- [Project Structure](#project-structure)\n- [Development Setup](#development-setup)\n- [Usage](#usage)\n  - [Web API Server](#web-api-server)\n  - [Client-Side Tools](#client-side-tools)\n- [Configuration](#configuration)\n  - [Multiple Agents](#multiple-agents)\n  - [Configuration Sections](#configuration-sections)\n  - [Orchestration](#orchestration)\n  - [Scratchpad (Context Window Management)](#scratchpad-context-window-management)\n  - [Ollama](#ollama)\n  - [Observability](#observability)\n- [Development and Testing](#development-and-testing)\n- [Testing](#testing)\n- [Documentation](#documentation)\n- [Architecture](#architecture)\n\n## Quick Start\n\n```bash\ncp .env.example .env            # set your LLM provider, model, and API key\ndocker compose up -d            # starts Aura (orchestrator mode) + LibreChat + Phoenix\ndocker exec -it aura ./aura-cli # chat with the orchestrator from your terminal\n```\n\nAura boots in **orchestrator mode**: a coordinator routes each request — answering simple ones directly and decomposing complex 
ones across specialized workers. The bundled `aura-cli` connects to the in-container server automatically and renders the coordinator's plan and worker activity as it streams.\n\nPrefer a browser? Open \u003chttp://localhost:3080\u003e to chat in LibreChat, or \u003chttp://localhost:6006\u003e to inspect traces in Phoenix.\n\n**[Full quickstart guide](docs/quickstart.md)** — provider setup (OpenAI, Anthropic, Ollama, llama-server), adding MCP tools, enabling vector search, serving multiple agents, and troubleshooting.\n\n### More Quickstarts\n\n- **[Orchestration — Math MCP](examples/quickstart-orchestration-math/README.md)** — Multi-agent orchestration with coordinator/worker architecture\n- **[Kubernetes SRE](examples/quickstart-k8s-sre/README.md)** — AI-powered SRE agent on KIND with Kubernetes and Prometheus MCP servers\n- **[Example Configs](examples/README.md)** — Minimal per-provider configs and complete agent compositions\n\n## Project Structure\n\n```text\naura/\n├── crates/\n│   ├── aura/                # Core library (agent builder + orchestration)\n│   ├── aura-cli/            # Interactive terminal client (HTTP + standalone modes)\n│   ├── aura-config/         # TOML parser and config loader\n│   ├── aura-events/         # Shared SSE event types\n│   ├── aura-test-utils/     # Shared testing utilities\n│   └── aura-web-server/     # OpenAI-compatible HTTP/SSE server\n├── compose/                 # Docker Compose (integration + orchestration overlays)\n├── configs/                 # E2E test and orchestration configurations\n├── deployment/              # Helm charts and K8s manifests\n├── docs/                    # Architecture and protocol documentation\n├── examples/                # Example and reference configurations\n├── scripts/                 # CI and utility scripts\n└── tests/                   # Integration test fixtures and helpers\n```\n\n## Development Setup\n\nFor building AURA from source without Docker.\n\n1. 
Install Rust if needed:\n   ```bash\n   curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh\n   ```\n2. Clone and configure:\n   ```bash\n   git clone https://github.com/mezmo/aura.git\n   cd aura\n   cp examples/reference.toml config.toml\n   ```\n3. Set required environment variables:\n   ```bash\n   export OPENAI_API_KEY=\"your-api-key\"\n   ```\n4. Build and run:\n   ```bash\n   cargo run --bin aura-web-server\n   ```\n\nSecurity: keep secrets in environment variables and reference them in TOML using `{{ env.VAR_NAME }}`.\n\n## Usage\n\n### Web API Server\n\nRun the web server:\n\n```bash\n# Default: reads config.toml\ncargo run --bin aura-web-server\n\n# Custom config file\nCONFIG_PATH=my-config.toml cargo run --bin aura-web-server\n\n# Config directory (serves multiple agents)\nCONFIG_PATH=configs/ cargo run --bin aura-web-server\n\n# Host/port override\nHOST=0.0.0.0 PORT=3000 cargo run --bin aura-web-server\n\n# Enable Aura custom SSE events\nAURA_CUSTOM_EVENTS=true cargo run --bin aura-web-server\n\n# Kitchen sink: several options combined\nCONFIG_PATH=configs/ \\\n  HOST=0.0.0.0 PORT=8080 \\\n  AURA_CUSTOM_EVENTS=true \\\n  AURA_EMIT_REASONING=true \\\n  TOOL_RESULT_MODE=aura \\\n  OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 \\\n  cargo run --bin aura-web-server -- --verbose\n```\n\nCore server options:\n\n| Option                       | Env Variable               | Default       | Description                         |\n| ---------------------------- | -------------------------- | ------------- | ----------------------------------- |\n| `--config`                   | `CONFIG_PATH`              | `config.toml` | Path to TOML config file or directory |\n| `--host`                     | `HOST`                     | `127.0.0.1`   | Bind host                           |\n| `--port`                     | `PORT`                     | `8080`        | Bind port                           |\n| `--server-url`               | `AURA_SERVER_URL`          | host/port     | Canonical public origin published in the 
A2A agent card (see below) |\n| `--streaming-timeout-secs`   | `STREAMING_TIMEOUT_SECS`   | `900`         | Max SSE request duration            |\n| `--first-chunk-timeout-secs` | `FIRST_CHUNK_TIMEOUT_SECS` | `30`          | Max time to first provider chunk    |\n| `--streaming-buffer-size`    | `STREAMING_BUFFER_SIZE`    | `400`         | SSE backpressure buffer             |\n| `--aura-custom-events`       | `AURA_CUSTOM_EVENTS`       | `false`       | Enable `aura.*` events              |\n| `--aura-emit-reasoning`      | `AURA_EMIT_REASONING`      | `false`       | Enable `aura.reasoning`             |\n| `--tool-result-mode`         | `TOOL_RESULT_MODE`         | `none`        | Tool result streaming: none, open-web-ui, aura |\n| `--tool-result-max-length`   | `TOOL_RESULT_MAX_LENGTH`   | `1000`        | Max chars before truncation (aura events) |\n| `--shutdown-timeout-secs`    | `SHUTDOWN_TIMEOUT_SECS`    | `30`          | Graceful shutdown window            |\n\nTool result modes:\n\n- `none`: spec-compliant; tool results appear only in model summary.\n- `open-web-ui`: tool results emitted through `tool_calls` for OpenWebUI compatibility.\n- `aura`: tool results emitted via `aura.tool_complete` events.\n\nAPI examples:\n\n```bash\n# Health\ncurl http://localhost:8080/health\n\n# List available models (agents)\ncurl http://localhost:8080/v1/models\n\n# OpenAI-compatible chat completion\ncurl -X POST http://localhost:8080/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}]}'\n\n# Select a specific agent by name or alias via the model field\ncurl -X POST http://localhost:8080/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model\": \"my-agent\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}]}'\n\n# Streaming response\ncurl -X POST http://localhost:8080/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d 
'{\"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}], \"stream\": true}'\n```\n\nSSE protocol details, event types, custom events, and client handling are documented in [docs/streaming-api-guide.md](docs/streaming-api-guide.md).\n\n#### A2A Protocol\n\n\u003e **Disabled by default.** A2A endpoints are only activated when the server is started with `--enable-a2a` (or `AURA_ENABLE_A2A=true`). Omitting the flag means no A2A routes are registered and the agent card is not served.\n\nAura exposes [A2A protocol](https://github.com/a2a-protocol) endpoints for agent-to-agent interoperability. This allows other A2A-compatible agents and clients to discover and interact with Aura agents using a standardized protocol.\n\n```bash\n# Agent card (capability discovery)\ncurl http://localhost:8080/.well-known/agent-card.json\n\n# Send a message via REST\ncurl -X POST http://localhost:8080/a2a/v1/message:send \\\n  -H \"Content-Type: application/json\" \\\n  -H \"A2A-Version: 1.0\" \\\n  -d '{\"message\": {\"messageId\": \"msg-001\", \"role\": \"ROLE_USER\", \"parts\": [{\"text\": \"Hello\"}]}}'\n\n# Send a message via JSON-RPC\ncurl -X POST http://localhost:8080/a2a/v1/rpc \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"jsonrpc\": \"2.0\", \"method\": \"SendMessage\", \"params\": {\"message\": {\"messageId\": \"msg-002\", \"role\": \"ROLE_USER\", \"parts\": [{\"text\": \"Hello\"}]}}, \"id\": 1}'\n```\n\n\u003e **Set `AURA_SERVER_URL` when running behind a proxy, load balancer, or in Kubernetes.** The agent card must advertise **absolute** endpoint URLs, and A2A clients use those URLs directly — a relative or wrong-host URL makes `message:send` fail even though the card itself loads. Aura builds the card's URLs from `AURA_SERVER_URL` (or `--server-url`); set it to the externally-reachable origin clients use (e.g. `https://aura.example.com`). 
When unset, it falls back to the bind host/port, which is only correct for direct local access.\n\nA2A endpoints, transport modes, the agent card URL, task lifecycle, and testing examples are documented in [docs/a2a-implementation.md](docs/a2a-implementation.md).\n\n### Client-Side Tools\n\n\u003e ---\n\u003e # **USE AT YOUR OWN RISK**\n\u003e ---\n\u003e\n\u003e **Setting `enable_client_tools = true` on an agent grants the LLM the ability to call tools that execute on the *client's* machine.** When clients (e.g. `aura-cli`) advertise tools like `Shell`, `Read`, or `Update`, the LLM can invoke them and the client will execute them with the privileges of the user running the client. This is functionally equivalent to giving the model a shell prompt on every connecting client.\n\u003e\n\u003e **The risks are real:**\n\u003e - **Prompt injection.** Anything the model reads — a file, an MCP tool output, a vector-store hit, a URL — can contain instructions that hijack the model into running destructive commands. The server cannot tell a legitimate request from an injected one.\n\u003e - **Hallucination.** The model can confidently call the wrong tool with the wrong arguments. There is no undo for a `Shell(\"rm -rf ...\")` invocation.\n\u003e - **No server-side sandbox.** The server only forwards tool calls; execution happens client-side with full host privileges. 
Whatever sandboxing exists is the client's responsibility.\n\u003e - **Per-agent permission filters reduce blast radius but are not a security boundary.** `client_tool_filter` controls which tools the model *can ask for*, not what they do once invoked.\n\u003e\n\u003e **Only enable on agents where:**\n\u003e - You trust the model, the provider, and every data source the model can read (configs, MCP servers, vector stores, web fetches).\n\u003e - You trust every client that will connect with `--enable-client-tools` and the user account it runs under.\n\u003e - You and your users accept that worst-case loss (deleted files, leaked credentials, modified source) is acceptable or recoverable.\n\u003e\n\u003e Disabled by default. Opting an agent in is your decision and your responsibility — and your users'.\n\u003e\n\u003e See [aura-cli's matching warning](crates/aura-cli/README.md#client-side-tools) for the client-side perspective.\n\n\u003e **Single-agent configurations only.** Client-side tools are not supported in orchestrated (multi-agent) configurations — when `[orchestration].enabled = true`, any `tools` array on the request is dropped with a warning. The reason: the passthrough mechanism requires terminating the user-facing SSE stream with `finish_reason: \"tool_calls\"`, which doesn't compose with the coordinator/worker pipeline. If you need local tools, use a single-agent config.\n\nThe server honors a `tools` array on incoming chat completion requests. Whether those tools are actually attached to the LLM is a **per-agent opt-in** in TOML — there is no server-wide flag. 
Tools that get attached are registered as **passthrough** tools: the LLM sees them alongside any server-side MCP tools and can call them, but instead of executing server-side, the stream terminates with `finish_reason: \"tool_calls\"` so the client can run the tool locally and submit the result back as a `role: \"tool\"` follow-up.\n\n```toml\n[agent]\nname = \"Assistant\"\nsystem_prompt = \"...\"\nenable_client_tools = true\nclient_tool_filter = [\"Read\", \"ListFiles\", \"Find*\"]   # optional; omitted/empty = all\n```\n\n`client_tool_filter` is a list of glob patterns matched against the request's `tools[].function.name`. An empty or omitted filter means all client tools are available. A request that supplies tools never reaches an agent that did not opt in.\n\n```bash\n# 1) Initial request advertising a client-side tool\ncurl -X POST http://localhost:8080/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"stream\": true,\n    \"messages\": [{\"role\": \"user\", \"content\": \"What time is it?\"}],\n    \"tools\": [{\n      \"type\": \"function\",\n      \"function\": {\n        \"name\": \"get_current_time\",\n        \"description\": \"Get the current time\",\n        \"parameters\": {\"type\": \"object\", \"properties\": {}}\n      }\n    }]\n  }'\n\n# 2) Stream ends with finish_reason: \"tool_calls\". The client executes the tool\n#    locally and submits the result back in a follow-up request:\ncurl -X POST http://localhost:8080/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"stream\": true,\n    \"tools\": [ ... same tools array ... 
],\n    \"messages\": [\n      {\"role\": \"user\", \"content\": \"What time is it?\"},\n      {\"role\": \"assistant\", \"content\": null, \"tool_calls\": [\n        {\"id\": \"call_abc\", \"type\": \"function\",\n         \"function\": {\"name\": \"get_current_time\", \"arguments\": \"{}\"}}\n      ]},\n      {\"role\": \"tool\", \"tool_call_id\": \"call_abc\", \"content\": \"2026-04-30T14:30:00Z\"}\n    ]\n  }'\n```\n\nWhen the loaded agent doesn't opt in (the default), any `tools` field on the request is silently dropped; the server runs MCP tools as usual but never asks the client to execute anything. Per-agent opt-in is the design — accepting client-supplied tool definitions means trusting the client to execute them, so it should be a deliberate config decision. See [aura-cli](crates/aura-cli/README.md#client-side-tools) for the matching client-side flag (`--enable-client-tools`) and how the two halves coordinate.\n\n## Configuration\n\n\u003cdetails\u003e\n\u003csummary\u003eRecent breaking changes\u003c/summary\u003e\n\n- **21 April 2026**: `[llm]` moved under `[agent.llm]`; workers may override via `[orchestration.worker.\u003cname\u003e.llm]`. See [migration guide](docs/breaking-changes/20260421-llm-under-agent.md).\n- **10 April 2026**: Several fields moved from `[agent]` to `[llm]`; Ollama params consolidated under `[llm.additional_params]`. See [migration guide](docs/breaking-changes/20260410-agent-llm-toml-configuration.md).\n\n\u003c/details\u003e\n\n`CONFIG_PATH` can point to a single TOML file or a directory of `.toml` files. When pointed at a directory, AURA loads every `.toml` file and serves each as a selectable agent. 
Clients choose an agent via the `model` field in chat completion requests — the same field that tools like LibreChat, OpenWebUI, and CLI clients use to present a model picker.\n\n### Multiple Agents\n\nTo serve multiple agents, create a directory with one TOML file per agent:\n\n```\nconfigs/\n├── research-assistant.toml\n├── devops-agent.toml\n└── code-reviewer.toml\n```\n\n```bash\nCONFIG_PATH=configs/ cargo run --bin aura-web-server\n```\n\nEach agent is identified by its `alias` (if set) or `name`. Clients discover available agents via `GET /v1/models` and select one by passing its identifier as the `model` field in requests. When no `model` is specified, the server resolves the agent via `DEFAULT_AGENT`, or automatically when only one config is loaded.\n\nThe `alias` field provides a stable, client-facing identifier that is independent of the agent's display name:\n\n```toml\n[agent]\nname = \"DevOps Assistant\"\nalias = \"devops\"             # clients send \"model\": \"devops\"\nsystem_prompt = \"You are a DevOps expert.\"\nmodel_owner = \"mezmo\"        # override owned_by in /v1/models (defaults to LLM provider)\n```\n\nAliases must be unique across all loaded configs. If two configs share the same `name` and neither has an alias, loading fails with a validation error.\n\n### Configuration Sections\n\nConfiguration sections:\n\n- `[agent]`: identity, system prompt, and runtime behavior.\n- `[agent.llm]`: provider and model configuration for the agent.\n- `[[vector_stores]]`: optional vector search configuration.\n- `[mcp]` and `[mcp.servers.*]`: MCP configuration, schema sanitization, and transports.\n\nSupported providers: OpenAI, Anthropic, Bedrock, Gemini, Ollama, and OpenRouter.\n\nSupported MCP transports:\n\n- `http_streamable` (recommended for production)\n- `sse`\n- `stdio` - launches a local child process per request. 
The [MCP specification](https://modelcontextprotocol.io/specification/2024-11-05/basic/transports) defines this transport for client-side sidecars, not for server deployments. If you need high concurrency, use `http_streamable`.\n\nSTDIO configuration uses `cmd` and `args`, both of which are lists. `cmd[0]` is the executable. Any additional elements in `cmd` are part of the command itself, such as the path of a script that an interpreter runs. `args` are passed to the spawned process separately:\n\n```toml\n# Binary with package arguments\n[mcp.servers.my_stdio]\ntransport = \"stdio\"\ncmd = [\"npx\"]\nargs = [\"-y\", \"@modelcontextprotocol/server-everything\"]\n\n# Script that needs an interpreter\n[mcp.servers.my_script]\ntransport = \"stdio\"\ncmd = [\"python3\", \"/opt/mcp-servers/weather.py\"]\nargs = [\"--verbose\"]\n\n# Direct binary\n[mcp.servers.my_binary]\ntransport = \"stdio\"\ncmd = [\"/usr/local/bin/mcp-server\"]\nargs = [\"--config\", \"/etc/mcp/config.json\"]\n```\n\nTool names are not namespaced by server. If two servers register a tool with the same name, the first one loaded wins silently ([#186](https://github.com/mezmo/aura/issues/186)).\n\n`headers_from_request` can forward incoming request headers to MCP servers for per-request auth.\n\n`turn_depth` controls how many tool-calling rounds can happen in a single turn. Higher values allow multi-step tool workflows before final response generation. The limit acts as a failsafe that keeps models from spinning in unbounded tool-call loops.\n\n`context_window` sets the agent's context window size in tokens; it is used to report usage percentages in `aura.session_info` streaming events.\n\nThe complete starter configuration is in [examples/reference.toml](examples/reference.toml). 
Minimal per-provider configs are in `examples/minimal/` and complete agent examples are in `examples/complete/`.\n\nMinimal example:\n\n```toml\n[agent]\nname = \"Assistant\"\nalias = \"my-assistant\"       # optional: stable client-facing identifier\nsystem_prompt = \"You are a helpful assistant.\"\nturn_depth = 2\n\n[agent.llm]\nprovider = \"openai\"\napi_key = \"{{ env.OPENAI_API_KEY }}\"\nmodel = \"gpt-5.2\"\ncontext_window = 128000\n\n[mcp.servers.my_server]\ntransport = \"http_streamable\"\nurl = \"http://localhost:8081/mcp\"\nheaders = { \"Authorization\" = \"Bearer {{ env.MCP_TOKEN }}\" }\n```\n\nValidate built-in config examples and tests:\n\n```bash\ncargo test -p aura-config\n```\n\nThis runs all config validation tests, including `test_all_shipped_configs_parse` which validates every `.toml` file in `configs/`, `examples/`, and `quickstart.toml`.\n\nTo validate your own config file, start the web server or CLI — both validate the config immediately and exit with a clear error if parsing fails, before binding to any port or entering the REPL:\n\n```bash\n# Validate via web server (exits on parse error before binding)\ncargo run -p aura-web-server -- --config your-config.toml\n\n# Validate via standalone CLI (exits on parse error before REPL)\ncargo run -p aura-cli --features standalone-cli -- --standalone --config your-config.toml\n```\n\n### Orchestration\n\nEnable orchestration mode in config:\n\n```toml\n# Top-level: shared by orchestration persistence and single-agent scratchpad\nmemory_dir = \"/tmp/orchestration-memory\"\n\n[orchestration]\nenabled = true\nmax_planning_cycles = 3\ntools_in_planning = \"summary\"\nallow_direct_answers = true\nallow_clarification = true\n\n[orchestration.worker.operations]\ndescription = \"Operational analysis and diagnostics\"\npreamble = \"You are an operations specialist.\"\nmcp_filter = [\"ops_*\"]\nvector_stores = []\n\n[orchestration.worker.knowledge]\ndescription = \"Documentation and procedures\"\npreamble = 
\"You are a knowledge specialist.\"\nmcp_filter = []\nvector_stores = [\"docs\"]\n```\n\nEach worker inherits `[agent.llm]` by default. To run a worker against a different model (cheaper, faster, bigger context, different provider), add a _complete_ LLM configuration at `[orchestration.worker.\u003cname\u003e.llm]`. Specify every required LLM field there, not only the fields you want to \"override\":\n\n```toml\n[orchestration.worker.formatting.llm]\nprovider = \"anthropic\"\napi_key = \"{{ env.ANTHROPIC_API_KEY }}\"\nmodel = \"claude-haiku-4-5-20251001\"\ncontext_window = 200000\n```\n\nThe worker's resolved `context_window` is the value reported in per-worker `aura.session_info` events.\n\nExecution loop:\n\n- `Plan`: coordinator decomposes the request into a task DAG.\n- `Execute`: dependency-ready tasks run in parallel waves on worker agents.\n- `Continue`: coordinator consolidates worker outputs and routes to a final response, replan, or clarification.\n\nWorkers run with isolated task context windows and filtered MCP/vector-store access based on each worker block.\n\nFor a fuller multi-worker example, see [configs/example-math-orchestration.toml](configs/example-math-orchestration.toml).\n\n#### Orchestration fields\n\n| Field | Type | Default | Description |\n|-------|------|---------|-------------|\n| `enabled` | bool | `false` | Enable orchestration mode |\n| `max_planning_cycles` | int | `3` | Maximum plan→execute→continue iterations |\n| `allow_direct_answers` | bool | `true` | Allow coordinator to answer simple queries directly |\n| `allow_clarification` | bool | `true` | Allow coordinator to ask for clarification |\n| `tools_in_planning` | string | `\"summary\"` | Tool visibility for coordinator: `\"none\"`, `\"summary\"` (names only), `\"full\"` (with descriptions) |\n| `max_plan_parse_retries` | int | `3` | Retries if coordinator produces unparseable plan JSON |\n| `max_tools_per_worker` | int | `10` | Cap on MCP tools exposed to 
each worker |\n| `duplicate_call_nudge_threshold` | int | `3` | Consecutive identical tool calls before appending guidance annotation |\n| `duplicate_call_block_threshold` | int | `5` | Consecutive identical tool calls before appending abort annotation and setting escalation flag |\n| `worker_system_prompt` | string | — | Optional global system prompt prepended to all workers |\n| `coordinator_vector_stores` | list | `[]` | Vector stores available to the coordinator agent |\n| `result_artifact_threshold` | int | `4000` | Character count above which worker results are saved as artifacts |\n| `result_summary_length` | int | `2000` | Max characters for artifact summaries passed to coordinator |\n| `timeouts.per_call_timeout_secs` | int | `0` | Per-tool-call timeout in seconds (0 = disabled) |\n\n#### Worker fields (`[orchestration.worker.\u003cname\u003e]`)\n\n| Field | Type | Default | Description |\n|-------|------|---------|-------------|\n| `description` | string | *required* | Short description shown to coordinator during planning |\n| `preamble` | string | *required* | System prompt for this worker |\n| `mcp_filter` | list | `[]` | Glob patterns selecting which MCP tools this worker can use |\n| `vector_stores` | list | `[]` | Named vector stores this worker has access to |\n| `turn_depth` | int | — | Per-worker tool-call depth limit (overrides `[agent].turn_depth`) |\n| `llm` | table | inherits `[agent.llm]` | Optional per-worker LLM override — different model (and other `[agent.llm]` fields) while reusing provider credentials |\n| `scratchpad` | table | inherits `[agent.scratchpad]` | Optional per-worker scratchpad config override |\n\n### Scratchpad (Context Window Management)\n\nMCP tools can return responses far larger than an LLM's context window — a single Kubernetes workload listing or log export can be tens of thousands of tokens. 
Without intervention, this fills the context and degrades reasoning quality.\n\nScratchpad solves this by intercepting large tool outputs and storing them on disk. The LLM gets a summary and eight read-only exploration tools (`head`, `slice`, `grep`, `schema`, `item_schema`, `get_in`, `iterate_over`, `read`) to selectively pull in only the data it needs.\n\nScratchpad works in both single-agent and orchestration modes. Configure at `[agent.scratchpad]` (applies to the single agent, or provides defaults for orchestration workers) and optionally override per worker at `[orchestration.worker.\u003cname\u003e.scratchpad]`. Set a top-level `memory_dir` for persistence:\n\n```toml\n# Top-level — required when scratchpad is enabled. Shared by single-agent\n# scratchpad and orchestration persistence.\nmemory_dir = \"/tmp/aura\"\n\n[agent.scratchpad]\nenabled = true\ncontext_safety_margin = 0.20          # 20% of context reserved for reasoning/output\nmax_extraction_tokens = 10_000        # cap per extraction tool call\nturn_depth_bonus = 6                  # extra ReAct turns when scratchpad is active\n\n[orchestration.worker.data-explorer.scratchpad]\n# Override just for this worker\nmax_extraction_tokens = 5_000\n```\n\n**Storage location**:\n- Single-agent: `{memory_dir}/scratchpad/`\n- Orchestration: `{memory_dir}/{run_id}/iteration-{n}/scratchpad/` (legacy `[orchestration.artifacts].memory_dir` still works as a fallback)\n\nPer-tool interception thresholds are configured at `[mcp.servers.\u003cserver\u003e.scratchpad]`. 
Keys are **glob patterns** (default threshold `5_120` if omitted) that are matched against tool names at interception time:\n\n```toml\n[mcp.servers.k8s-sre.scratchpad]\n\"*_list_*\"                  = { min_tokens = 512 }   # broad\n\"k8s_list_service_monitors\" = { min_tokens = 384 }   # specific override\n\"*\"                         = { min_tokens = 4096 }  # catch-all\n```\n\nWhen multiple patterns match the same tool, the **longest (most specific) pattern wins**; on length ties the smallest threshold wins. Token counting uses real BPE tokenization via `tiktoken-rs` — not byte/character heuristics — so `min_tokens` reflects actual model token cost.\n\n**Per-call extraction limit (`max_extraction_tokens`, default 10_000):** every exploration tool checks the size of its result before returning. If a single call would exceed this cap (or the cumulative `ContextBudget`), the tool returns a structured JSON error like `{\"error\": \"head_too_large\", \"estimated_tokens\": ..., \"suggestions\": [...]}` instead of the content. The LLM sees this as a successful tool result and retries with smaller params — each retry consumes a turn, which is why `turn_depth_bonus` exists.\n\nEach agent (single-agent or orchestration worker) gets a **fresh `ContextBudget`** scoped to that agent's effective LLM's `context_window`. LLM-reported per-turn token counts feed back into the budget as ground truth, so `remaining()` reflects actual context pressure (orchestration via `StreamItem::TurnUsage`, single-agent via the streaming hook). A per-agent `aura.scratchpad_usage` SSE event is emitted when the agent finishes — the same event name fires for both single-agent and worker contexts (it lives in the base `aura.*` namespace, not `aura.orchestrator.*`).\n\n### Ollama\n\nAURA supports Ollama, including fallback tool-call parsing for models that emit tool calls as text. 
Full setup, parameter guidance, and model caveats are in [docs/ollama-guide.md](docs/ollama-guide.md).\n\n### Observability\n\nOpenTelemetry support is enabled by default via the `otel` feature in both `aura` and `aura-web-server`. Configure your OTLP endpoint using standard environment variables (for example `OTEL_EXPORTER_OTLP_ENDPOINT`) to export traces.\n\nAURA emits spans using the [OpenInference](https://github.com/Arize-ai/openinference/tree/main/spec) semantic convention (`llm.*`, `tool.*`, `input.*`, `output.*`) rather than the `gen_ai.*` conventions. Any `gen_ai.*` attributes from underlying provider libraries (Rig.rs) are automatically translated to OpenInference equivalents at export time. This makes AURA traces natively compatible with [Phoenix](https://github.com/Arize-ai/phoenix) and other OpenInference-aware observability tools.\n\n## Development and Testing\n\nQuick commands:\n\n```bash\n# Full local quality checks\nmake ci\n\n# Individual checks\nmake fmt\nmake fmt-check\nmake test\nmake lint\n\n# Build targets\nmake build\n```\n\n## Testing\n\nWeb server integration tests live under `crates/aura-web-server/tests/`.\n\nRun integration workflows:\n\n```bash\n# Standard integration suites\nmake test-integration\n\n# Local integration run against locally started test infra\nmake test-integration-local\n\n# Orchestration-specific integration suites\nmake test-integration-orchestration\n\n# Local orchestration integration run\nmake test-integration-orchestration-local\n\n# SRE orchestration integration suites\nmake test-integration-sre-orchestration\n\n# Local SRE orchestration integration run\nmake test-integration-sre-orchestration-local\n```\n\nIntegration test feature flags (`crates/aura-web-server/Cargo.toml`):\n\n- Parent flag: `integration`\n- Suite flags: `integration-streaming`, `integration-header-forwarding`, `integration-mcp`, `integration-events`, `integration-cancellation`, `integration-progress`\n- Orchestration suite: 
`integration-orchestration` (separate from parent `integration`)\n- SRE orchestration suite: `integration-orchestration-sre` (requires k8s-sre-mcp server config)\n- Optional suite: `integration-vector` (requires external Qdrant setup)\n\nDetailed test guidance: [crates/aura-web-server/README.md](crates/aura-web-server/README.md).\n\n## Documentation\n\n- [docs/quickstart.md](docs/quickstart.md): getting started guide — setup, customization, architecture, and troubleshooting.\n- [CHANGELOG.md](CHANGELOG.md): release and version history.\n- [docs/streaming-api-guide.md](docs/streaming-api-guide.md): SSE protocol guide, event taxonomy, tool result modes, custom `aura.*` events, orchestration events, and client examples.\n- [docs/request-lifecycle.md](docs/request-lifecycle.md): request flow diagram, lifecycle, timeout, cancellation, and shutdown behavior.\n- [docs/ollama-guide.md](docs/ollama-guide.md): Ollama configuration, fallback tool parsing, and local model guidance.\n- [docs/rig-fork-changes.md](docs/rig-fork-changes.md): Rig fork changes, tool execution order, and rationale.\n- [docs/tracing-spans.md](docs/tracing-spans.md): OpenTelemetry span layout, OpenInference span kinds, and trace parenting for both single-agent and orchestration modes.\n- [docs/breaking-changes/20260421-llm-under-agent.md](docs/breaking-changes/20260421-llm-under-agent.md): breaking configuration changes from 21 April 2026 — `[llm]` moved under `[agent.llm]` and per-worker LLM overrides.\n- [docs/breaking-changes/20260410-agent-llm-toml-configuration.md](docs/breaking-changes/20260410-agent-llm-toml-configuration.md): breaking configuration changes from 10 April 2026 — field migrations from `[agent]` to `[llm]` and Ollama parameter consolidation.\n- [docs/a2a-implementation.md](docs/a2a-implementation.md): A2A protocol endpoints, transport modes (REST and JSON-RPC), task lifecycle, and testing examples.\n\n## Architecture\n\nAURA separates concerns across crates:\n\n- `aura`: runtime 
agent building, MCP integration, orchestration, and vector workflows.\n- `aura-config`: typed TOML parsing and validation.\n- `aura-events`: shared SSE event types (`AuraStreamEvent`, `OrchestrationStreamEvent`) — lightweight, no agent dependencies.\n- `aura-web-server`: OpenAI-compatible REST/SSE serving layer.\n- `aura-cli`: interactive terminal client with HTTP and standalone modes.\n\nThis separation means:\n\n- Embeddable core: use `aura` directly in any Rust application without config file dependencies.\n- Shared event types: `aura-events` can be consumed by any Rust client without pulling in the full agent stack.\n- Testable boundaries: each crate has focused responsibilities and clear interfaces.\n\nKey architectural characteristics:\n\n- Dynamic MCP tool discovery at runtime.\n- Automatic schema sanitization (anyOf, missing types, optional parameters) driven by OpenAI function-calling requirements — MCP tool schemas are transformed at discovery time to conform to OpenAI's strict subset of JSON Schema.\n- Header forwarding support (`headers_from_request`) for per-request MCP auth delegation.  
See [examples/reference.toml](examples/reference.toml) for a practical example.\n- Config-driven composition with embeddable Rust core.\n\nPrompt routing and execution model:\n\n- `build_streaming_agent()` routes requests based on `orchestration.enabled`.\n- Direct Mode (`orchestration.enabled = false`): single `Agent` handles the turn.\n- Orchestration Mode (`orchestration.enabled = true`): `Orchestrator` coordinates worker execution.\n- Both `Agent` and `Orchestrator` implement `StreamingAgent`, so they are interchangeable at the API boundary.\n\nOrchestrator components and loop:\n\n- Coordinator agent: plans task DAGs and consolidates worker outputs via continuation.\n- Worker agents: per-task instances with filtered MCP tools and vector stores.\n- Persistence/event layers: track plan state, task outcomes, and stream orchestration events.\n- Loop: Plan -\u003e Execute (dependency waves) -\u003e Continue (respond / plan again / clarify).\n\nRequest execution and cancellation flow are documented in [docs/request-lifecycle.md](docs/request-lifecycle.md).\n\n## License\n\nLicensed under the [Apache License, Version 2.0](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmezmo%2Faura","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmezmo%2Faura","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmezmo%2Faura/lists"}