{"id":49950267,"url":"https://github.com/jmagly/agentic-sandbox","last_synced_at":"2026-06-28T03:00:59.550Z","repository":{"id":351949747,"uuid":"1202486415","full_name":"jmagly/agentic-sandbox","owner":"jmagly","description":"Self-hostable runtime for persistent autonomous coding agents — KVM-isolated VMs (or rootless containers), A2A-protocol executor with signed AgentCard discovery, AIWG mission dispatch, web dashboard, virtiofs shared storage. Runs on your hardware; no hosted control plane.","archived":false,"fork":false,"pushed_at":"2026-06-27T20:49:55.000Z","size":6575,"stargazers_count":9,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-27T22:18:30.915Z","etag":null,"topics":["a2a-protocol","agent-card","agent-orchestration","agent-runtime","agentic-ai","ai-agents","aiwg","autonomous-agents","claude-code","grpc","hitl","jws-signing","kvm","qemu","rust","sandbox","self-hosted","virtiofs","vm-management","websocket"],"latest_commit_sha":null,"homepage":"https://github.com/jmagly/agentic-sandbox","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jmagly.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"docs/security/agent-transport-ca-backends.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-04-06T04:29:14.000Z","updated_at":"2026-06-27T20:50:02.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/jmagly/agentic-sandbox","commit_stats":null,"previous_names":["jmagly/agentic-sandbox"],"tags_count":50,"template":false,"template_full_name":null,"purl":"pkg:github/jmagly/agentic-sandbox","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jmagly%2Fagentic-sandbox","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jmagly%2Fagentic-sandbox/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jmagly%2Fagentic-sandbox/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jmagly%2Fagentic-sandbox/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jmagly","download_url":"https://codeload.github.com/jmagly/agentic-sandbox/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jmagly%2Fagentic-sandbox/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34875362,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-28T02:00:05.809Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["a2a-protocol","agent-card","agent-orchestration","agent-runtime","agentic-ai","ai-agents","aiwg","autonomous-agents","claude-code","grpc","hitl","jws-signing","kvm","qemu","rust","sandbox","self-hosted","virtiofs","vm-management","websocket"],"created_at":"2026-05-17T18:38:18.972Z","updated_at":"2026-06-28T03:00:59.543Z","avatar_url":"https://github.com/jmagly.png","language":"Rust","funding_links":[],"categories":["⚙️ Implementations \u0026 Libraries"],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n# Agentic Sandbox\n\n### Self-hostable runtime for persistent autonomous coding agents.\n\nKVM-isolated VMs (or rootless containers) for long-running agent sessions. Management server with gRPC, WebSocket, and HTTP interfaces. Web dashboard, CLI, and REST API. Runs on your hardware; no hosted control plane.\n\n```bash\ngit clone https://github.com/jmagly/agentic-sandbox.git\ncd agentic-sandbox \u0026\u0026 make build \u0026\u0026 cd management \u0026\u0026 ./dev.sh\n# open http://localhost:8122 → \"+ Create Instance\" → Container → Create → done\n```\n\n**New here?** Walk through [**Getting Started**](docs/getting-started.md) — prerequisite check, ~15 min to first running agent.\n\n[![License: AGPL-3.0](https://img.shields.io/badge/License-AGPL--3.0-blue.svg?style=flat-square)](LICENSE)\n[![Rust](https://img.shields.io/badge/Rust-1.75%2B-orange?style=flat-square\u0026logo=rust)](https://www.rust-lang.org)\n[![Platforms](https://img.shields.io/badge/Runtime-QEMU%2FKVM%20%7C%20Docker-purple?style=flat-square)](docs/ARCHITECTURE.md)\n[![gRPC](https://img.shields.io/badge/Protocol-gRPC%20%7C%20WebSocket%20%7C%20HTTP-green?style=flat-square)](docs/API.md)\n\n[**Features**](#features) · [**Quick Start**](#quick-start) · [**Architecture**](#architecture) · [**API**](#api-reference)\n\n\u003c/div\u003e\n\n---\n\n## Features\n\n- **Persistent sessions.** Each agent runs inside its own VM (or container) with a persistent gRPC link to the management server. Closing your terminal does not stop the agent.\n- **Hardware isolation.** Full KVM virtualization — each agent gets its own kernel. Rootless Docker is supported as a lighter-weight alternative.\n- **Shared storage with explicit namespaces.** virtiofs-backed `global` (read-only) and `inbox` (read-write per-agent) mounts.\n- **Live terminal observability.** Server streams every PTY chunk to the dashboard; server-side virtual terminal snapshots available via REST.\n- **Human-in-the-loop.** PTY heuristics detect `(y/n)` and similar pauses, file a HITL request, and inject your response back into stdin.\n- **Restart-safe.** Session reconciliation, crash-loop detection, and ephemeral per-VM secrets.\n- **Resource governance.** Declarative quotas and per-VM CPU/memory/disk limits.\n- **Conformance-tested protocol surface.** A dedicated harness exercises the task API on every push — fast stub checks plus live-agent tiers covering terminal states, HITL round-trips, and restart durability.\n\nSecurity-sensitive claims are tracked in the dated\n[Security Status](docs/security/security-status.md) page. In short: the current\npublic posture is local-first, KVM-capable, and transport-identity aware, with\ncredential and release-provenance claims intentionally qualified by evidence.\n\n---\n\n## Part of the AIWG Suite\n\n[![Part of the AIWG ecosystem](https://aiwg.io/assets/badges/aiwg-wordmark-dark.png)](https://aiwg.io)\n\nAgentic Sandbox is the runtime substrate for the [AIWG SDLC suite](https://aiwg.io). AIWG provides the agents, skills, and workflow scaffolding; Agentic Sandbox provides the isolated execution environment. Either can be used independently.\n\n---\n\n## Quick Start\n\n\u003e **Full walkthrough** — including prerequisite verification, build-time expectations, and troubleshooting — is in [docs/getting-started.md](docs/getting-started.md). The summary below assumes the prerequisites are already installed.\n\u003e\n\u003e **Prerequisites**: Linux host. For the **container path** (fastest): Rust 1.75+, `protoc`, Docker. For the **VM path** (full isolation): all of the above **plus** KVM (`egrep -c '(vmx|svm)' /proc/cpuinfo` \u003e 0), libvirt + QEMU (`apt install qemu-kvm libvirt-daemon-system`), and an Ubuntu 24.04 base image (`cd images/qemu \u0026\u0026 ./build-base-image.sh 24.04`).\n\nThe recommended path launches the **full system** — management server + dashboard. From the dashboard you can create VM or container instances, attach terminal panes, and watch live events without ever touching a shell. Power-user shortcuts for skipping the dashboard are below.\n\n### Install a release package\n\nFor Linux operators, tagged releases publish native packages plus a checksum-verifying installer:\n\n```bash\ncurl -fsSL https://github.com/jmagly/agentic-sandbox/releases/download/v\u003cversion\u003e/agentic-sandbox-install.sh \\\n  | bash -s -- --version v\u003cversion\u003e\n```\n\nRelease verification steps for checksums, package assets, container image\ndigests, optional signatures, SBOMs, and current SLSA status are in\n[docs/releases/verification.md](docs/releases/verification.md).\n\nThe package installs `agentic-mgmt`, `agentic-host-runtime-daemon`, `vm-event-bridge`, `agent-client`, `sandboxctl`, and the `agentic-sandbox` CLI alias under `/usr/bin`, with env templates in `/etc/agentic-sandbox/` and systemd units in `/lib/systemd/system/`. Direct package installs are also supported:\n\n```bash\nsudo apt-get install ./agentic-sandbox_\u003cversion\u003e-1_amd64.deb\nsudo dnf install ./agentic-sandbox-\u003cversion\u003e-1.x86_64.rpm\n```\n\n### Start the full system (recommended)\n\n```bash\n# 1. Build all three crates (management server, agent client, CLI)\nmake build      # or: ( cd management \u0026\u0026 cargo build --release ) \u0026\u0026 \\\n                #     ( cd agent-rs   \u0026\u0026 cargo build --release ) \u0026\u0026 \\\n                #     ( cd cli        \u0026\u0026 cargo build --release )\n\n# 2. Start the management server. Dashboard is at http://localhost:8122,\n#    WebSocket at ws://localhost:8121, plaintext gRPC at loopback :8120,\n#    and agent gRPC mTLS at :8123 for Docker-reachable agents.\ncd management \u0026\u0026 ./dev.sh\n\n# 3. Open the dashboard in a browser:\n#    http://localhost:8122\n```\n\nIn the dashboard:\n\n1. Click **+ Create Instance** in the sidebar header.\n2. Pick **Runtime**:\n   - **Container** — fast (~2s), backed by Docker. Choose an agent image from the dropdown (`agentic/claude:latest`, `codex`, `opencode`).\n   - **VM** — full hardware isolation, ~30s–10m to provision depending on loadout. Pick a loadout (`claude-only`, `full-suite`, `dual-review`, etc.).\n3. Name it (`agent-01`, `my-codex`, anything matching `[a-z0-9-]+`).\n4. Click **Create**. The instance appears in the sidebar with a `[VM]` or `[CT]` badge.\n5. Click the row → click **📺 Pane** to attach a terminal session.\n\nStop / Restart / Force off / Delete are all per-row buttons; the pane has a `⟳ Resync` button if the terminal ever drifts.\n\nContainer bootstrap uses a one-time HTTP enrollment URL first, then reconnects\nover gRPC mTLS with a SPIFFE client identity. If containers cannot reach\n`host.docker.internal:8122`, start dev mode with a Docker-reachable HTTP bind\nor override `AGENTIC_CONTAINER_BOOTSTRAP_ENROLLMENT_URL`.\n\n### Same flow from the CLI\n\nIf you'd rather not open a browser, the `sandboxctl` CLI (also installed as `agentic-sandbox`) does everything the dashboard does:\n\n```bash\n# After `make build`, install or symlink the binary:\nln -sf \"$(pwd)/cli/target/release/sandboxctl\" ~/.local/bin/\n\n# Configure a context pointing at the local management server (one-time)\nsandboxctl config set-context local --server http://localhost:8122\n\n# Spawn a container-runtime agent\nsandboxctl container create agent-01 --image agentic/claude:latest\n\n# Or a VM-runtime agent\nsandboxctl vm create agent-02 --loadout profiles/claude-only.yaml --agentshare --start\n\n# List instances\nsandboxctl agent list\n\n# Find a session on the agent, then attach (Ctrl-A d to detach)\nsandboxctl session list --agent agent-01\nsandboxctl session attach \u003csession-id\u003e --write\n\n# Submit a long-running task from a manifest file\ncat \u003e task.yaml \u003c\u003c'EOF'\nversion: \"1\"\nkind: Task\nmetadata:\n  id: \"\"\n  name: \"Refactor authentication\"\nrepository:\n  url: \"https://github.com/myorg/myapp.git\"\n  branch: \"main\"\nclaude:\n  prompt: \"Refactor the authentication module to use JWT refresh tokens\"\n  model: \"claude-sonnet-4-5-20250929\"\nlifecycle:\n  timeout: \"2h\"\nEOF\nsandboxctl task submit --file task.yaml --wait\n```\n\nRun `sandboxctl --help` for the full noun-first verb tree (agent / session / container / vm / task / hitl / loadout / storage / event / health / ops).\n\n### Advanced: skip the dashboard, provision a VM directly\n\nFor air-gapped boxes, scripted environments, or when you want a single VM without running the management server, drive the provisioner directly:\n\n```bash\n./images/qemu/provision-vm.sh agent-01 \\\n  --loadout profiles/claude-only.yaml \\\n  --agentshare \\\n  --start\n\n# The agent inside the VM will try to dial host.internal:8120 in a loop.\n# Start the management server first for normal gRPC/dashboard access.\n# Direct runtime SSH is a dev/break-glass bypass path; managed-profile SSH\n# moves through the gateway access model (ADR-029) — SSH certificate leases at\n# /api/v2/gateway/ssh/leases. See docs/API.md \"Gateway SSH Certificate Leases\".\n```\n\nUseful flags: `--profile basic` (minimal cloud-init), `--cpus 8 --memory 16G --disk 100G`, `--network-mode isolated|allowlist|full`. See [`images/qemu/README.md`](images/qemu/README.md) for the full reference.\n\n### Submit a task via REST\n\nIf you're scripting against the API directly:\n\n```bash\ncurl -X POST http://localhost:8122/api/v1/tasks \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"manifest\": {\n      \"version\": \"1\",\n      \"kind\": \"Task\",\n      \"metadata\": {\n        \"id\": \"\",\n        \"name\": \"Refactor authentication\"\n      },\n      \"repository\": {\n        \"url\": \"https://github.com/myorg/myapp.git\",\n        \"branch\": \"main\"\n      },\n      \"claude\": {\n        \"prompt\": \"Refactor the authentication module to use JWT refresh tokens\",\n        \"model\": \"claude-sonnet-4-5-20250929\"\n      },\n      \"lifecycle\": {\n        \"timeout\": \"2h\"\n      }\n    }\n  }'\n```\n\nFor the full provisioning, profile, and loadout reference, see [docs/LOADOUTS.md](docs/LOADOUTS.md) and the [Provisioning](#provisioning) section below.\n\n---\n\n## Architecture\n\n### Topology\n\n```\nHost\n├── agent-01 (KVM VM)   192.168.122.201\n│   ├── Claude Code\n│   ├── Rust toolchain\n│   └── agent-client → gRPC → Management Server\n├── agent-02 (KVM VM)   192.168.122.202\n│   └── agent-client → gRPC → Management Server\n└── Management Server   :8120 gRPC  :8121 WS  :8122 HTTP\n```\n\nEach agent runs in a QEMU/KVM virtual machine provisioned from a cloud-init manifest. VMs are first-class objects with independent CPU, memory, and disk quotas, isolated libvirt networking, and ephemeral per-VM secrets. Docker containers are supported as a lighter-weight alternative for faster iteration.\n\n### Management Server\n\nA Rust async server (Tokio, Tonic, Axum) that coordinates all connected agents:\n\n```\n┌─────────────────────────────────────────────────────────────┐\n│                  Management Server (Rust)                    │\n│                                                              │\n│  gRPC :8120          WebSocket :8121        HTTP :8122       │\n│  ┌──────────────┐    ┌───────────────┐    ┌──────────────┐  │\n│  │ AgentService │    │ WebSocketHub  │    │ HTTP API     │  │\n│  │ Connect()    │    │ terminal I/O  │    │ dashboard    │  │\n│  │ Exec()       │    │ metrics push  │    │ REST CRUD    │  │\n│  └──────────────┘    └───────────────┘    └──────────────┘  │\n│                                                              │\n│  AgentRegistry  CommandDispatcher  OutputAggregator          │\n│  HitlStore      ScreenRegistry     CrashLoopDetector         │\n│  TaskOrchestrator                  AiwgServeHandle           │\n└─────────────────────────────────────────────────────────────┘\n```\n\nAgent state — heartbeats, metrics, setup progress, loadout metadata — is tracked in-memory via `DashMap` and exposed through all three interfaces.\n\n### Task Orchestrator\n\nSubmit long-running AI tasks that get assigned to available VMs, monitored through completion, and stream their logs via SSE:\n\n```\nPENDING → STAGING → PROVISIONING → READY → RUNNING → COMPLETING → COMPLETED\n                                                  ↘                ↘\n                                               FAILED           CANCELLED\n```\n\nTasks receive a dedicated workspace in agentshare:\n\n```\n/srv/agentshare/\n├── tasks/{task_id}/manifest.yaml   # Task metadata\n├── inbox/{task_id}/                # Input files (read-only inside VM)\n└── outbox/{task_id}/               # Artifacts written by agent\n```\n\n### Agentshare Storage\n\nVMs get virtiofs-mounted shared storage with separate read-only and read-write namespaces:\n\n| Mount | VM Path | Mode | Purpose |\n|-------|---------|------|---------|\n| Global | `/mnt/global` (`~/global`) | Read-only | Shared tools, prompts, configs |\n| Inbox | `/mnt/inbox` (`~/inbox`) | Read-write | Task inputs, run logs, outputs |\n\nThe inbox layout provides structured access patterns — agents find their task workspace at `~/inbox/current/` without needing to know task IDs.\n\n### Human-in-the-Loop (HITL)\n\nThe management server monitors PTY output and automatically detects when an agent is waiting for human input. Detection runs after every output chunk through a scored heuristic that recognizes patterns like `(y/n)`, `[Y/n]`, `Human:`, `❯`, and explicit confirmation phrases.\n\n```\nAgent PTY output\n      │\n      ▼\nprompt_detector::detect_prompt()   ← scores output chunk\n      │\n  score ≥ 0.85\n      │\n      ▼\nHitlStore::create()                ← deduplicates per session\n      │\n      ├── REST: GET /api/v1/hitl          (operator polls)\n      ├── Dashboard: pending requests UI\n      └── AiwgServeHandle::emit()         (if aiwg serve wired in)\n                    │\n              operator responds\n                    │\n                    ▼\nPOST /api/v1/hitl/{id}/respond     ← injects text into PTY stdin\n```\n\nOne pending request per session at a time — duplicate detections are suppressed until the active request is resolved.\n\n### aiwg Serve Integration\n\nWhen `AIWG_SERVE_ENDPOINT` is set, the management server registers with an [aiwg serve](https://github.com/jmagly/aiwg/blob/main/docs/serve-guide.md) dashboard and streams live sandbox events over a persistent authenticated WebSocket. The integration reconnects with exponential backoff (1 s → 30 s) and never blocks server startup.\n\nThe sandbox additionally registers as an **AIWG executor** (per `executor.v1.md`), accepting mission dispatches via `POST /api/v1/sessions/:id/dispatch` and reporting the full `mission.*` lifecycle (assigned → started → completed/failed/aborted, with HITL and resumability) over a second WS at `/ws/executors/{id}`. Mission state persists across mgmt-server restarts in `\u003csecrets_dir\u003e/../missions.json`. Full integration spec: [`docs/aiwg-executor.md`](docs/aiwg-executor.md).\n\n| Event | Trigger |\n|-------|---------|\n| `agent.connected` | gRPC stream registered |\n| `agent.disconnected` | gRPC stream closed or timed out |\n| `agent.ready` | cloud-init provisioning complete |\n| `agent.provisioning` | loadout step progress |\n| `session.start` / `session.end` | PTY/exec session lifecycle |\n| `hitl.input_required` | HITL prompt detected |\n\n---\n\n## A Real Walkthrough\n\nWhat a typical autonomous coding task looks like end to end.\n\n### Provision\n\n```bash\n./images/qemu/provision-vm.sh agent-01 \\\n  --loadout profiles/claude-only.yaml \\\n  --agentshare \\\n  --start\n```\n\nVM boots, cloud-init runs the loadout manifest, agent-client registers via gRPC, status transitions `Starting → Provisioning → Ready`. If aiwg serve is configured, `agent.ready` fires.\n\n### Submit a Task\n\n```bash\ncurl -X POST http://localhost:8122/api/v1/tasks \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"manifest\": {\n      \"version\": \"1\",\n      \"kind\": \"Task\",\n      \"metadata\": {\n        \"id\": \"\",\n        \"name\": \"Refactor authentication\"\n      },\n      \"repository\": {\n        \"url\": \"https://github.com/myorg/myapp.git\",\n        \"branch\": \"main\"\n      },\n      \"claude\": {\n        \"prompt\": \"Refactor the authentication module to use JWT refresh tokens\",\n        \"model\": \"claude-sonnet-4-5-20250929\"\n      },\n      \"lifecycle\": {\n        \"timeout\": \"2h\"\n      }\n    }\n  }'\n```\n\nTask is assigned to `agent-01`, repository cloned into inbox, Claude Code launched inside the VM.\n\n### Monitor in Real Time\n\nOpen `http://localhost:8122` for the live terminal stream, or:\n\n```bash\ncurl http://localhost:8122/api/v1/tasks/{task_id}/logs\n```\n\n### Agent Pauses — HITL\n\nAn hour in, Claude Code hits an ambiguous refactor decision and prints a confirmation prompt. The dashboard shows a pending HITL request. Respond without opening a terminal:\n\n```bash\ncurl -X POST http://localhost:8122/api/v1/hitl/{hitl_id}/respond \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"response\": \"yes, update all callers\"}'\n```\n\nThe response text is injected into the agent's PTY stdin and the agent continues.\n\n### Collect Artifacts\n\n```bash\nls /srv/agentshare/outbox/{task_id}/\n# auth-module/  jwt-refresh.ts  test-results.json  SUMMARY.md\n```\n\n---\n\n## Provisioning\n\n### Profiles\n\nPre-built profiles for common setups:\n\n| Profile | Tools | Use Case |\n|---------|-------|----------|\n| `agentic-dev` | Python (uv), Node.js (fnm), Go, Rust, Claude Code, Aider, Docker, ripgrep, fd, jq | Full development environment |\n| `basic` | Basic utilities, dev/break-glass direct SSH | Minimal — custom setup via cloud-init |\n\n```bash\n./images/qemu/provision-vm.sh my-agent \\\n  --profile agentic-dev \\\n  --cpus 8 \\\n  --memory 16384 \\\n  --disk 100G \\\n  --agentshare \\\n  --start\n```\n\n### Loadout Manifests\n\nDeclarative YAML manifests for composable provisioning. Loadouts specify tools, runtimes, AI providers, and AIWG frameworks without modifying base profiles:\n\n```yaml\napiVersion: loadout/v1\nkind: loadout\nmetadata:\n  name: claude-only\nextends:\n  - layers/base-dev.yaml\n  - providers/claude-code.yaml\naiwg:\n  enabled: true\n  frameworks:\n    - name: all\n      providers: [claude]\n```\n\nSee [docs/LOADOUTS.md](docs/LOADOUTS.md) for the full manifest schema and available options.\n\n---\n\n## Task Orchestration\n\nSubmit tasks to agents via the REST API. The orchestrator assigns tasks to available VMs, manages the workspace, and tracks lifecycle state.\n\n```bash\n# Submit a task\ncurl -X POST http://localhost:8122/api/v1/tasks \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"manifest\": {\n      \"version\": \"1\",\n      \"kind\": \"Task\",\n      \"metadata\": {\n        \"id\": \"\",\n        \"name\": \"SQL injection audit\"\n      },\n      \"repository\": {\n        \"url\": \"https://github.com/myorg/myapp.git\",\n        \"branch\": \"main\"\n      },\n      \"claude\": {\n        \"prompt\": \"Audit the API for SQL injection vulnerabilities\",\n        \"model\": \"claude-sonnet-4-5-20250929\"\n      },\n      \"lifecycle\": {\n        \"timeout\": \"1h\"\n      }\n    }\n  }'\n\n# Check status\ncurl http://localhost:8122/api/v1/tasks/{task_id}\n\n# Stream logs (SSE)\ncurl http://localhost:8122/api/v1/tasks/{task_id}/logs\n\n# List artifacts\ncurl http://localhost:8122/api/v1/tasks/{task_id}/artifacts\n\n# List A2A task artifacts captured by messages:send\ncurl http://localhost:8122/agents/{instance_id}/v1/tasks/{task_id}/artifacts\n```\n\nSee [docs/task-orchestration-api.md](docs/task-orchestration-api.md) for full API details and [docs/task-run-lifecycle.md](docs/task-run-lifecycle.md) for the lifecycle state machine.\n\n---\n\n## Human-in-the-Loop (HITL)\n\nThe server monitors agent PTY output and automatically detects when an agent is waiting for human input. When detected, a HITL request is created and held until resolved.\n\n```bash\n# List pending requests\ncurl http://localhost:8122/api/v1/hitl\n\n# Respond — text is injected directly into the agent's PTY stdin\ncurl -X POST http://localhost:8122/api/v1/hitl/a3f1b2.../respond \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"response\": \"y\"}'\n```\n\nRequests are deduplicated per session — a second prompt won't fire while the first is pending. Once resolved, the slot opens again.\n\n---\n\n## VM Lifecycle\n\n```bash\n# Provision and start\n./images/qemu/provision-vm.sh agent-01 --profile agentic-dev --agentshare --start\n\n# Lifecycle management\nvirsh start agent-01          # start stopped VM\nvirsh shutdown agent-01       # graceful stop\nvirsh destroy agent-01        # force stop\n\n# Rebuild (preserves IP and config)\n./scripts/reprovision-vm.sh agent-01 --profile agentic-dev\n\n# Remove completely\n./scripts/destroy-vm.sh agent-01\n\n# Deploy updated agent binary to running VM\n./scripts/deploy-agent.sh agent-01 --debug\n```\n\nSee [docs/vm-lifecycle.md](docs/vm-lifecycle.md) for the state machine and [docs/LIFECYCLE.md](docs/LIFECYCLE.md) for the full operations reference.\n\n---\n\n## API Reference\n\n### Agents\n\n| Endpoint | Method | Description |\n|----------|--------|-------------|\n| `/api/v1/agents` | GET | List registered agents with metrics and loadout info |\n| `/api/v1/agents/{id}` | GET | Get agent details |\n| `/api/v1/agents/{id}` | DELETE | Remove agent |\n| `/api/v1/agents/{id}/start` | POST | Start agent VM |\n| `/api/v1/agents/{id}/stop` | POST | Stop agent VM |\n| `/api/v1/agents/{id}/destroy` | POST | Force destroy agent VM |\n| `/api/v1/agents/{id}/reprovision` | POST | Reprovision agent VM |\n\n### Tasks\n\n| Endpoint | Method | Description |\n|----------|--------|-------------|\n| `/api/v1/tasks` | GET | List tasks |\n| `/api/v1/tasks` | POST | Submit new task |\n| `/api/v1/tasks/{id}` | GET | Get task status and metadata |\n| `/api/v1/tasks/{id}` | DELETE | Cancel task |\n| `/api/v1/tasks/{id}/logs` | GET | Stream task logs (SSE) |\n| `/api/v1/tasks/{id}/artifacts` | GET | List task artifacts |\n| `/agents/{instance_id}/v1/tasks/{task_id}/artifacts` | GET | List persisted A2A task artifacts |\n| `/agents/{instance_id}/v1/tasks/{task_id}/artifacts/{artifact_id}` | GET | Return one persisted A2A task artifact |\n\n### VMs\n\n| Endpoint | Method | Description |\n|----------|--------|-------------|\n| `/api/v1/vms` | GET | List all VMs |\n| `/api/v1/vms` | POST | Create VM |\n| `/api/v1/vms/{name}` | GET | Get VM details |\n| `/api/v1/vms/{name}/start` | POST | Start VM |\n| `/api/v1/vms/{name}/stop` | POST | Graceful stop |\n| `/api/v1/vms/{name}/destroy` | POST | Force stop |\n| `/api/v1/vms/{name}` | DELETE | Delete VM |\n\n### Human-in-the-Loop\n\n| Endpoint | Method | Description |\n|----------|--------|-------------|\n| `/api/v1/hitl` | GET | List pending HITL requests |\n| `/api/v1/agents/{id}/hitl` | POST | Create HITL request for agent (returns 409 on duplicate) |\n| `/api/v1/hitl/{id}/respond` | POST | Submit response — injects text into PTY stdin |\n\n### Screen Observer\n\n| Endpoint | Method | Description |\n|----------|--------|-------------|\n| `/api/v1/sessions/{id}/screen` | GET | Current PTY screen snapshot (no WebSocket needed) |\n| `/ws/sessions/{id}/orchestrate` | WS | Live screen updates; defaults to observer/read-only. Add `?role=controller` to allow write/resize/signal frames. |\n\n### System\n\n| Endpoint | Method | Description |\n|----------|--------|-------------|\n| `/api/v1/secrets` | GET / POST / DELETE | Retired legacy shared-secret endpoint; use transport identity credentials |\n| `/api/v1/events` | GET | VM lifecycle event stream (SSE) |\n| `/healthz` | GET | Liveness probe |\n| `/readyz` | GET | Readiness probe |\n| `/metrics` | GET | Prometheus metrics |\n\n### gRPC (Port 8120)\n\n```protobuf\nservice AgentService {\n  rpc Connect(stream AgentMessage) returns (stream ManagementMessage);\n  rpc Exec(ExecRequest) returns (stream ExecOutput);\n}\n```\n\n### WebSocket (Port 8121)\n\nReal-time push of agent metrics, PTY output, session events, and task progress. Used by the dashboard and external monitoring clients.\n\n---\n\n## Configuration\n\n### Management Server\n\n| Variable | Default | Description |\n|----------|---------|-------------|\n| `LISTEN_ADDR` | `127.0.0.1:8120` | Plain gRPC listen address (WS = port+1, HTTP = port+2); use secure side channels such as UDS, vsock, or mTLS for agent identity |\n| `SECRETS_DIR` | `.run/secrets` | Directory containing management secrets, bootstrap enrollment tokens, and local mTLS CA material |\n| `RUST_LOG` | `info` | Log level: `trace`, `debug`, `info`, `warn`, `error` |\n| `LOG_FORMAT` | `pretty` | Log format: `pretty`, `json`, `compact` |\n| `HEARTBEAT_TIMEOUT` | `90` | Seconds before marking agent disconnected |\n| `METRICS_ENABLED` | `true` | Enable Prometheus metrics export |\n| `AIWG_SERVE_ENDPOINT` | — | aiwg serve base URL (integration disabled if unset) |\n| `AIWG_SERVE_NAME` | `agentic-sandbox` | Display name in aiwg serve dashboard |\n\n### Agent Client\n\n| Variable | Required | Description |\n|----------|----------|-------------|\n| `AGENT_ID` | Yes | Unique identifier for this agent |\n| `MANAGEMENT_SERVER` | Yes | Server address, e.g. `192.168.122.1:8120` |\n| `AGENT_TRANSPORT` | Secure transport | `auto` for mTLS-backed secure transport |\n| `AGENT_GRPC_TLS_CA` / `AGENT_GRPC_TLS_CERT` / `AGENT_GRPC_TLS_KEY` | Secure transport | Guest paths to gRPC mTLS client material |\n| `HEARTBEAT_INTERVAL` | No | Seconds between heartbeats (default: 30) |\n\nOverride settings in `management/.run/dev.env` without modifying environment.\n\n---\n\n## Monitoring\n\nThe management server exports Prometheus metrics at `/metrics`:\n\n```\nagentic_agents_connected         # Connected agent count\nagentic_agents_ready             # Ready agents\nagentic_tasks_running            # Active tasks\nagentic_tasks_completed_total    # Total completed tasks\nagentic_commands_total           # Commands dispatched\nagentic_commands_duration_ms     # Command execution latency (histogram)\n```\n\nSet up Prometheus and AlertManager:\n\n```bash\ncd scripts/prometheus \u0026\u0026 ./deploy.sh\n# Prometheus: http://localhost:9090\n# AlertManager: http://localhost:9093\n```\n\nSee [docs/monitoring.md](docs/monitoring.md) and [docs/observability/](docs/observability/) for alerting rules and dashboards.\n\n---\n\n## Development\n\n```bash\n# Full cycle: rebuild server + agent, deploy to all running VMs\n./scripts/dev-deploy-all.sh --debug\n\n# Deploy agent binary to a specific VM\n./scripts/deploy-agent.sh agent-01 --debug\n\n# Management server live-reload\ncd management \u0026\u0026 ./dev.sh\n\n# Unit tests\ncd management \u0026\u0026 cargo test\ncd agent-rs \u0026\u0026 cargo test\n```\n\n### Testing\n\nThe test surface is Rust-native end to end (the legacy pytest harness was\nretired in v2026.6.0). Tiers, fastest first:\n\n```bash\n# Unit tests — no external dependencies\ncd management \u0026\u0026 cargo test\ncd agent-rs \u0026\u0026 cargo test\n\n# Host-local Rust E2E — spins up an isolated management server per test\ncd management \u0026\u0026 AGENTIC_RUN_RUST_E2E=1 cargo test --test e2e_server_health -- --nocapture\n\n# VM-backed Rust E2E — requires KVM/libvirt and a provisioned base image\ncd management \u0026\u0026 AGENTIC_RUN_RUST_VM_E2E=1 cargo test --test e2e_resource_limits -- --nocapture\n\n# Full E2E lane (host + VM slices, with runner preflight) — what CI runs\n./scripts/run-e2e-tests.sh\n\n# Live-agent conformance tier — terminal states, HITL, adapter-command;\n# synthetic fixtures only\nscripts/test-live-agent-conformance.sh\n\n# Chaos tests\n./scripts/chaos/run-all.sh\n```\n\nE2E suites live in `management/tests/` (`e2e_server_health`,\n`e2e_agent_registration`, `e2e_command_dispatch`, `e2e_concurrent_agents`,\n`e2e_resource_limits`).\n\n### Directory Structure\n\n```\nagentic-sandbox/\n├── management/             # Management server (Rust)\n│   ├── src/\n│   │   ├── http/          # REST API handlers\n│   │   ├── orchestrator/  # Task orchestration engine\n│   │   ├── telemetry/     # Logging, metrics, tracing\n│   │   ├── ws/            # WebSocket hub and connections\n│   │   ├── hitl.rs        # HITL request store\n│   │   ├── aiwg_serve.rs  # Outbound aiwg serve integration\n│   │   ├── screen_state.rs # PTY screen observer\n│   │   ├── prompt_detector.rs # HITL prompt heuristics\n│   │   └── crash_loop.rs  # Crash loop detection\n│   └── ui/                # Embedded web dashboard\n├── agent-rs/              # Agent client (Rust)\n├── cli/                   # CLI tool — VM management\n├── proto/                 # gRPC protocol definitions\n├── images/qemu/           # VM provisioning scripts and loadout profiles\n├── scripts/               # Utility and deployment scripts\n├── configs/               # Security profiles (seccomp)\n├── docs/                  # Reference documentation\n└── tests/                 # Test data and E2E documentation\n```\n\n---\n\n## Documentation\n\n| Document | Description |\n|----------|-------------|\n| [Architecture](docs/ARCHITECTURE.md) | System design and component relationships |\n| [Positioning](docs/positioning.md) | Design axes and when this is (or isn't) a good fit |\n| [Security Status](docs/security/security-status.md) | Dated public security claim boundaries and evidence links |\n| [API Reference](docs/API.md) | Complete HTTP, gRPC, and WebSocket API |\n| [WebSocket Protocol](docs/ws-protocol.md) | Per-message reference: legacy agent-scoped + formal session-registry protocols |\n| [CLI Design](docs/cli-design.md) | `sandboxctl` operator/admin CLI taxonomy and acceptance criteria |\n| [Deployment Guide](docs/DEPLOYMENT.md) | Installation and production configuration |\n| [Operations Guide](docs/OPERATIONS.md) | Day-to-day operations and runbooks |\n| [Loadouts](docs/LOADOUTS.md) | Declarative VM provisioning manifests |\n| [Agentshare Storage](docs/agentshare.md) | virtiofs storage layout and usage |\n| [Task Orchestration](docs/task-orchestration-api.md) | Task API and lifecycle |\n| [Task Run Lifecycle](docs/task-run-lifecycle.md) | State machine and transitions |\n| [Session Reconciliation](docs/SESSION_RECONCILIATION.md) | Session recovery after restarts |\n| [VM Lifecycle](docs/vm-lifecycle.md) | VM state machine and management |\n| [Troubleshooting](docs/TROUBLESHOOTING.md) | Common issues and fixes |\n| [Monitoring](docs/monitoring.md) | Prometheus metrics and alerting |\n| [Observability](docs/observability/) | Full observability setup |\n| [Reliability](docs/reliability-README.md) | Reliability patterns and quickstart |\n\n---\n\n## Roadmap\n\n- [x] QEMU/KVM provisioning with cloud-init\n- [x] Management server (Rust/gRPC/WebSocket/HTTP)\n- [x] Agent client with registration, heartbeat, and metrics\n- [x] virtiofs shared storage (global/inbox)\n- [x] Web dashboard with live terminal access\n- [x] Task orchestration with artifact collection\n- [x] Claude Code integration\n- [x] `sandboxctl` operator/admin CLI ([design](docs/cli-design.md))\n- [x] Declarative loadout manifest system\n- [x] Prometheus metrics and AlertManager alerting\n- [x] Session reconciliation after server restart\n- [x] VM pooling and resource quotas\n- [x] PTY screen observer (server-side virtual terminal snapshots)\n- [x] Human-in-the-Loop detection and REST API\n- [x] aiwg serve outbound registration and event streaming\n- [x] Crash loop detection and alerting\n- [x] Docker runtime with rootless containers\n- [x] Rust-native E2E suite and conformance tiers (live-agent, restart durability)\n- [x] Self-healing CI lane (Docker daemon recovery, bounded E2E, stale-VM reaping)\n- [x] Authenticated agent transports — UDS / vsock / mTLS with SPIFFE\n  identity, bootstrap CSR enrollment, and local/remote CA backend boundary\n  ([accepted plan](https://github.com/jmagly/agentic-sandbox/blob/main/.aiwg/architecture/agent-transport-security-sad.md);\n  see [CA backend operations](docs/security/agent-transport-ca-backends.md))\n- [ ] Multi-host orchestration\n- [ ] Kubernetes operator\n\n---\n\n## License\n\nAGPL-3.0-only — see [LICENSE](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjmagly%2Fagentic-sandbox","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjmagly%2Fagentic-sandbox","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjmagly%2Fagentic-sandbox/lists"}