{"id":46640226,"url":"https://github.com/mostlydev/cllama","last_synced_at":"2026-05-10T04:38:56.745Z","repository":{"id":340923555,"uuid":"1168124527","full_name":"mostlydev/cllama","owner":"mostlydev","description":"The blood-brain barrier for autonomous agents. A context-aware LLM governance proxy that enforces credential starvation —  identity-verified, provider-routed, cost-tracked, and audit-logged.","archived":false,"fork":false,"pushed_at":"2026-04-26T22:01:47.000Z","size":2015,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-04-27T00:08:53.664Z","etag":null,"topics":["ai-agents","inference-api","inference-gateway","llm","llm-inference","llm-proxy"],"latest_commit_sha":null,"homepage":"https://clawdapus.dev/guide/cllama.html","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mostlydev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-27T03:18:27.000Z","updated_at":"2026-04-26T22:01:41.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/mostlydev/cllama","commit_stats":null,"previous_names":["mostlydev/cllama-passthrough","mostlydev/cllama"],"tags_count":12,"template":false,"template_full_name":null,"purl":"pkg:github/mostlydev/cllama","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mostlydev%2Fcllama","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/mostlydev%2Fcllama/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mostlydev%2Fcllama/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mostlydev%2Fcllama/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mostlydev","download_url":"https://codeload.github.com/mostlydev/cllama/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mostlydev%2Fcllama/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32416146,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T06:29:02.080Z","status":"ssl_error","status_checked_at":"2026-04-29T06:29:00.631Z","response_time":110,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","inference-api","inference-gateway","llm","llm-inference","llm-proxy"],"created_at":"2026-03-08T02:20:39.609Z","updated_at":"2026-05-10T04:38:56.738Z","avatar_url":"https://github.com/mostlydev.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ![cllama](docs/art/cllama-logo.png)\n\n**The blood-brain barrier for autonomous agents.**\n\n`cllama` is the reference implementation of the [cllama proxy standard](https://github.com/mostlydev/clawdapus/blob/master/docs/CLLAMA_SPEC.md) — a 
context-aware, bidirectional LLM governance proxy that enforces **credential starvation** on untrusted agent workloads.\n\nIt is a single Go binary with zero dependencies. 15 MB distroless image. Two ports: `:8080` for the OpenAI-compatible API, `:8081` for the operator dashboard. Every agent request is identity-verified, provider-routed, cost-tracked, and audit-logged — transparently. The agent never knows the proxy exists.\n\n```mermaid\nflowchart LR\n  A[Agent\u003cbr/\u003e\u003ci\u003ebearer token\u003c/i\u003e] --\u003e|request| P[cllama-passthrough\u003cbr/\u003e\u003cb\u003eidentity → route → swap key\u003c/b\u003e\u003cbr/\u003e\u003ci\u003eextract usage → record cost\u003c/i\u003e]\n  P --\u003e|real key| U[Provider\u003cbr/\u003e\u003ci\u003eOpenAI · Anthropic\u003cbr/\u003eOpenRouter · Google · Ollama\u003c/i\u003e]\n  U --\u003e|response| P\n  P --\u003e|response| A\n  P --- D[:8081 dashboard\u003cbr/\u003e\u003ci\u003eproviders · pod · costs · api\u003c/i\u003e]\n```\n\n---\n\n## The Architecture\n\nA `cllama` proxy sits between the runner (the agent's application code) and the LLM provider. In the [Clawdapus](https://github.com/mostlydev/clawdapus) architecture, agents are treated as untrusted workloads — containers that can think, but whose compute access is a privilege granted by the operator, not a right assumed by the process.\n\n**Credential starvation** is the enforcement mechanism. The agent container is provisioned with a unique bearer token (`\u003cagent-id\u003e:\u003c48-hex-secret\u003e`). The proxy holds the real provider API keys. Because the agent lacks the credentials to call providers directly, all inference *must* transit the proxy — even if a compromised agent tries to bypass its configured base URL.\n\nThe \"passthrough\" reference performs no cognitive mutation. It verifies identity, routes to the correct upstream, swaps credentials, streams the response, extracts token usage, and records cost. 
Future proxy types (`cllama-policy`) will add bidirectional interception — evaluating outbound prompts against the agent's behavioral contract, and amending or dropping inbound responses that drift from purpose.\n\n### Request lifecycle\n\n```\n1.  Agent sends POST /v1/chat/completions\n    Authorization: Bearer tiverton:a1b2c3d4e5f6...\n    {\"model\": \"anthropic/claude-sonnet-4\", \"messages\": [...]}\n\n2.  Identity resolution\n    Parse bearer token → load /claw/context/tiverton/metadata.json\n    Validate secret (constant-time comparison)\n\n3.  Provider routing\n    Split model on \"/\" → provider=anthropic, model=claude-sonnet-4\n    Look up provider config → base_url, auth scheme, real API key\n\n4.  Credential swap\n    Strip agent's bearer token\n    Inject real key (Bearer, X-Api-Key, or none — per provider)\n\n5.  Forward + stream\n    Proxy request to upstream, stream response back transparently\n\n6.  Cost extraction\n    Parse usage from response body (JSON or SSE stream)\n    Multiply by pricing table → record per (agent, provider, model)\n\n7.  Audit log\n    Emit structured JSON to stdout: timestamp, agent, model,\n    latency, status, tokens_in, tokens_out, cost_usd, intervention\n```\n\n---\n\n## Building\n\n```bash\n# Binary\ngo build -o cllama ./cmd/cllama\n\n# Docker (~15 MB distroless)\ndocker build -t ghcr.io/mostlydev/cllama:latest .\n```\n\nZero external dependencies. Go standard library only.\n\n---\n\n## Running\n\n```bash\n./cllama\n```\n\nOr with Docker:\n\n```bash\ndocker run -p 8080:8080 -p 8081:8081 \\\n  -e ANTHROPIC_API_KEY=sk-ant-... \\\n  -e OPENROUTER_API_KEY=sk-or-... \\\n  -e GEMINI_API_KEY=sk-gemini-... 
\\\n  -v ./context:/claw/context:ro \\\n  ghcr.io/mostlydev/cllama:latest\n```\n\n---\n\n## Configuration\n\n### Environment\n\n| Variable | Default | Purpose |\n|---|---|---|\n| `LISTEN_ADDR` | `:8080` | API server |\n| `UI_ADDR` | `:8081` | Operator dashboard |\n| `CLAW_CONTEXT_ROOT` | `/claw/context` | Per-agent context mount |\n| `CLAW_AUTH_DIR` | `/claw/auth` | Provider credentials |\n| `CLAW_POD` | | Pod name (dashboard display) |\n| `CLAW_SESSION_HISTORY_DIR` | `/claw/session-history` | Per-agent JSONL session history base dir. When set, cllama appends one entry per successful 2xx upstream completion to `\u003cdir\u003e/\u003cagent-id\u003e/history.jsonl`. |\n| `OPENAI_API_KEY` | | Provider key override |\n| `ANTHROPIC_API_KEY` | | Provider key override |\n| `OPENROUTER_API_KEY` | | Provider key override |\n| `GEMINI_API_KEY` | | Primary Google Gemini provider key override |\n| `GOOGLE_API_KEY` | | Lower-priority alias for the Google Gemini provider key |\n| `GOOGLE_BASE_URL` | | Override for Google's OpenAI-compatible base URL |\n| `AI_GATEWAY_API_KEY` | | Vercel AI Gateway provider key override |\n| `AI_GATEWAY_BASE_URL` | `https://ai-gateway.vercel.sh/v1` | Override for Vercel AI Gateway's OpenAI-compatible base URL |\n\nEnvironment variables override keys saved via the web UI.\n\nFor Vercel AI Gateway routing, declare models as `vercel/\u003cprovider\u003e/\u003cmodel\u003e`,\nfor example `vercel/anthropic/claude-sonnet-4.6`. cllama forwards the model\nsuffix to Vercel's OpenAI-compatible endpoint. 
The Anthropic `/v1/messages`\npath remains native Anthropic-only.\n\n### Agent context\n\nEach agent is a subdirectory under `CLAW_CONTEXT_ROOT`:\n\n```\n/claw/context/\n├── tiverton/\n│   ├── metadata.json     # bearer token, pod, service, type\n│   ├── AGENTS.md         # behavioral contract\n│   └── CLAWDAPUS.md      # infrastructure map\n├── westin/\n│   └── ...\n└── allen/\n    └── ...\n```\n\n`metadata.json`:\n```json\n{\n  \"token\": \"tiverton:a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6\",\n  \"pod\": \"trading-desk\",\n  \"service\": \"tiverton\",\n  \"type\": \"openclaw\"\n}\n```\n\nWhen orchestrated by Clawdapus, `claw up` generates all of this — tokens via `crypto/rand`, context from the pod manifest, provider keys injected only into the proxy env.\n\n### Provider registry\n\n`providers.json` in `CLAW_AUTH_DIR`:\n\n```json\n{\n  \"providers\": {\n    \"anthropic\": {\n      \"base_url\": \"https://api.anthropic.com/v1\",\n      \"api_key\": \"sk-ant-...\",\n      \"auth\": \"x-api-key\"\n    },\n    \"openrouter\": {\n      \"base_url\": \"https://openrouter.ai/api/v1\",\n      \"api_key\": \"sk-or-...\",\n      \"auth\": \"bearer\"\n    },\n    \"google\": {\n      \"base_url\": \"https://generativelanguage.googleapis.com/v1beta/openai\",\n      \"api_key\": \"sk-gemini-...\",\n      \"auth\": \"bearer\"\n    },\n    \"ollama\": {\n      \"base_url\": \"http://ollama:11434/v1\",\n      \"auth\": \"none\"\n    }\n  }\n}\n```\n\nAuth schemes: `bearer` (OpenAI, OpenRouter, Google), `x-api-key` (Anthropic), `none` (Ollama, local models).\n\n---\n\n## Operator Dashboard\n\nBuilt-in web UI on `:8081`. No JavaScript frameworks, no build step — Go templates compiled into the binary.\n\n| Page | Path | Function |\n|---|---|---|\n| Providers | `/` | Manage upstream provider configs. Routing diagram. Add/update/delete. |\n| Pod | `/pod` | Agent cards — type, request count, cost, models used. |\n| Costs | `/costs` | Real-time spend. 
Total banner, per-agent breakdown, nested model detail. |\n| Costs API | `/costs/api` | JSON. Pipe to Grafana, alerting, `jq`, or the Master Claw. |\n\nCost state is in-memory — resets on restart. Structured logs on stdout are the durable audit record.\n\n---\n\n## API Surface\n\n**`:8080` — Proxy API**\n\n| Method | Path | Description |\n|---|---|---|\n| `POST` | `/v1/chat/completions` | OpenAI-compatible chat completions |\n| `POST` | `/v1/messages` | Anthropic Messages API (native format) |\n| `GET` | `/health` | `{\"ok\": true}` |\n\nBoth endpoints support streaming. The Anthropic endpoint forwards `Anthropic-Version` and `Anthropic-Beta` headers and uses `X-Api-Key` authentication automatically.\n\n---\n\n## Audit Logging\n\nEvery request/response pair emits a structured JSON log line to stdout:\n\n```json\n{\n  \"ts\": \"2026-02-27T15:23:45Z\",\n  \"claw_id\": \"tiverton\",\n  \"type\": \"response\",\n  \"model\": \"anthropic/claude-sonnet-4\",\n  \"latency_ms\": 1250,\n  \"status_code\": 200,\n  \"tokens_in\": 100,\n  \"tokens_out\": 50,\n  \"cost_usd\": 0.0105,\n  \"intervention\": null\n}\n```\n\n`intervention` is always `null` in passthrough mode. Policy proxies will populate it with the rule that triggered an amendment, drop, or reroute — the raw material for drift scoring.\n\nThese logs feed `docker compose logs`, fleet telemetry pipelines, and `claw audit`.\n\n---\n\n## Session History\n\nWhen `CLAW_SESSION_HISTORY_DIR` is set, cllama writes a durable JSONL session history for each agent. This is separate from the structured audit logs emitted to stdout.\n\n### Layout\n\n```\n/claw/session-history/\n├── tiverton/\n│   └── history.jsonl\n├── westin/\n│   └── history.jsonl\n```\n\nOne file per agent. Each line is one entry, appended on every successful upstream completion (HTTP 2xx). 
Non-2xx responses are not recorded here — they appear only in the stdout audit log.\n\n### Entry fields\n\n| Field | Description |\n|---|---|\n| `version` | Schema version (`1`). |\n| `id` | Stable source-event ID for replay/deduplication. |\n| `ts` | RFC3339 timestamp of when the response was received. |\n| `claw_id` | Agent ID. |\n| `path` | Request path (e.g., `/v1/chat/completions`). |\n| `requested_model` | Model string as sent by the agent. |\n| `effective_provider` | Provider name after routing. |\n| `effective_model` | Model forwarded to the upstream. |\n| `status_code` | HTTP status code from upstream. |\n| `stream` | Whether the response was streamed (SSE). |\n| `request_original` | Request body as received from the agent. |\n| `request_effective` | Request body as forwarded to the upstream (after credential swap and model rewrite). |\n| `response` | `{format, json?, text?}` — `format` is `\"json\"` or `\"sse\"`. JSON responses include `json` (parsed body); SSE responses include `text` (raw event stream). |\n| `usage` | `{prompt_tokens, completion_tokens}` extracted from the response. |\n| `usage.reported_cost_usd` | Provider-reported cost in USD (float); omitted when empty (`omitempty`). |\n\n### Clawdapus wiring\n\nWhen orchestrated by Clawdapus, `claw up` automatically bind-mounts `.claw-session-history/` (relative to the pod file) into the cllama container at `/claw/session-history` whenever cllama is enabled for any service in the pod. No manual volume configuration is required.\n\nSession history is infrastructure-owned. Agents do not write to it and have no read API against it in Phase 1. The JSONL files are accessible to operators via the host filesystem for offline analysis and auditing.\n\n---\n\n## The cllama Standard\n\n`cllama` is an open standard for context-aware LLM governance proxies. Any OpenAI-compatible proxy image that can consume Clawdapus context can act as a governance layer. 
The [full specification](https://github.com/mostlydev/clawdapus/blob/master/docs/CLLAMA_SPEC.md) defines:\n\n- **Bidirectional interception** — outbound prompt evaluation, inbound response amendment\n- **Multi-agent identity** — single proxy serves an entire pod, resolving callers by bearer token\n- **Compute metering** — per-agent budgets, model downgrading, rate limiting\n- **Structured telemetry** — intervention logs for independent drift scoring\n\nThe passthrough reference implements the transport layer: identity, routing, cost tracking, audit logging. It establishes the plumbing that policy proxies build on.\n\n```mermaid\nflowchart LR\n  subgraph today[Today]\n    direction LR\n    R1[runner] --\u003e PT1[passthrough\u003cbr/\u003e\u003ci\u003eroute · meter · log\u003c/i\u003e] --\u003e P1[provider]\n  end\n\n  subgraph future[Future]\n    direction LR\n    R2[runner] --\u003e PO[policy\u003cbr/\u003e\u003ci\u003escope · gate · amend\u003c/i\u003e] --\u003e PT2[passthrough\u003cbr/\u003e\u003ci\u003eroute · meter · log\u003c/i\u003e] --\u003e P2[provider]\n  end\n```\n\n---\n\n## Part of Clawdapus\n\nThis proxy is one component in [Clawdapus](https://github.com/mostlydev/clawdapus) — infrastructure-layer governance for AI agent containers. Docker on Rails for Claws.\n\n```\nClawfile            extended Dockerfile → OCI image\nclaw-pod.yml        extended docker-compose → governed fleet\nclaw up             transpile, enforce, wire cllama, deploy\ncllama              credential starvation + cost accounting + audit trail\n```\n\nStandalone operation is fully supported. Set up the context directory, write a `providers.json`, point your agents at `:8080`, and the proxy does the rest.\n\n---\n\n## Roadmap\n\n| Feature | Description |\n|---|---|\n| **Budget enforcement** | Hard spend caps per agent. `429` when exceeded. The agent's budget is a configuration concern, not a prompt concern. |\n| **Model allowlisting** | Per-agent model ACLs from `metadata.json`. 
|\n| **Persistent cost state** | Survive restarts. Rebuild from audit logs or external store. |\n| **`cllama-policy`** | Bidirectional interception. Reads the behavioral contract. Makes allow/deny/amend decisions on live LLM traffic. The passthrough is the plumbing; the policy proxy is the brain. |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmostlydev%2Fcllama","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmostlydev%2Fcllama","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmostlydev%2Fcllama/lists"}