{"id":50728184,"url":"https://github.com/esanmohammad/anvil","last_synced_at":"2026-06-10T06:01:35.906Z","repository":{"id":352167640,"uuid":"1213612492","full_name":"esanmohammad/Anvil","owner":"esanmohammad","description":"Provider-agnostic AI dev pipeline: clarify → plan → build → review → PR across your repos, mixing LLM providers per stage with your own keys. No vendor lock-in, no markup.","archived":false,"fork":false,"pushed_at":"2026-06-10T04:38:52.000Z","size":14462,"stargazers_count":16,"open_issues_count":0,"forks_count":3,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-10T06:01:10.368Z","etag":null,"topics":["agentic-ai","ai","ai-agents","ai-pipeline","automation","byok","claude","cli","code-generation","developer-tools","generative-ai","llm","llmops","local-llm","mcp","model-context-protocol","nodejs","ollama","openai","typescript"],"latest_commit_sha":null,"homepage":"https://drive.google.com/file/d/1IHTQiLEQ4tulpdbUzI2iD5dmy82jBK-m/view?usp=sharing","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/esanmohammad.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-17T15:13:43.000Z","updated_at":"2026-06-10T04:39:01.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/esanmohammad/Anvil","commit_stats":null,"previous_names":["esanmohammad/anvil"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/esanmohammad
/Anvil","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/esanmohammad%2FAnvil","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/esanmohammad%2FAnvil/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/esanmohammad%2FAnvil/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/esanmohammad%2FAnvil/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/esanmohammad","download_url":"https://codeload.github.com/esanmohammad/Anvil/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/esanmohammad%2FAnvil/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34139182,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-10T02:00:07.152Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agentic-ai","ai","ai-agents","ai-pipeline","automation","byok","claude","cli","code-generation","developer-tools","generative-ai","llm","llmops","local-llm","mcp","model-context-protocol","nodejs","ollama","openai","typescript"],"created_at":"2026-06-10T06:00:29.512Z","updated_at":"2026-06-10T06:01:35.875Z","avatar_url":"https://github.com/esanmohammad.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv 
align=\"center\"\u003e\n\n\u003cbr /\u003e\n\n\u003cpicture\u003e\n  \u003cimg alt=\"Anvil\" src=\"https://img.shields.io/badge/A%20N%20V%20I%20L-1a1a1a?style=for-the-badge\u0026labelColor=1a1a1a\u0026color=1a1a1a\" height=\"80\" /\u003e\n\u003c/picture\u003e\n\n\u003cbr /\u003e\u003cbr /\u003e\n\n# The provider-agnostic AI development pipeline\n\n\u003ch3\u003e\n  \u003ci\u003eUse your own keys. Mix providers per stage. Pay per token, not per seat.\u003c/i\u003e\n\u003c/h3\u003e\n\n\u003cp\u003e\n  Anvil ships features end-to-end — clarify, plan, build, review, PR —\u003cbr /\u003e\n  across every repo in your project, on whatever model is cheapest for each stage.\u003cbr /\u003e\n  \u003cb\u003eNo vendor lock-in. No markup. No hosted plan.\u003c/b\u003e\n\u003c/p\u003e\n\n\u003cp\u003e\n  \u003csub\u003e\n  Anvil is an \u003cb\u003eopen-source, self-hosted AI coding agent\u003c/b\u003e — an end-to-end \u003cb\u003eLLM pipeline\u003c/b\u003e\n  (clarify → plan → build → review → PR) that runs on Claude, GPT, Gemini, and OpenRouter, or fully\n  local via Ollama \u0026amp; OpenCode. It speaks the \u003cb\u003eModel Context Protocol (MCP)\u003c/b\u003e and uses your own\n  API keys. 
Written in TypeScript.\n  \u003c/sub\u003e\n\u003c/p\u003e\n\n\u003cbr /\u003e\n\n\u003cp\u003e\n  \u003ca href=\"docs/getting-started.md\"\u003e\u003cimg src=\"https://img.shields.io/badge/-Get%20started-2563eb?style=for-the-badge\u0026logo=rocket\u0026logoColor=white\" alt=\"Get started\"\u003e\u003c/a\u003e\n  \u003ca href=\"#what-you-can-do-with-anvil\"\u003e\u003cimg src=\"https://img.shields.io/badge/-Features-1f2937?style=for-the-badge\" alt=\"Features\"\u003e\u003c/a\u003e\n  \u003ca href=\"#observability-opt-in\"\u003e\u003cimg src=\"https://img.shields.io/badge/-Observability-1f2937?style=for-the-badge\" alt=\"Observability\"\u003e\u003c/a\u003e\n  \u003ca href=\"examples/\"\u003e\u003cimg src=\"https://img.shields.io/badge/-Examples-1f2937?style=for-the-badge\" alt=\"Examples\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://drive.google.com/file/d/1IHTQiLEQ4tulpdbUzI2iD5dmy82jBK-m/view?usp=sharing\"\u003e\u003cimg src=\"https://img.shields.io/badge/-Demo-ef4444?style=for-the-badge\u0026logo=googledrive\u0026logoColor=white\" alt=\"Demo\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp\u003e\n  \u003cimg src=\"https://img.shields.io/badge/version-0.3.0-3b82f6.svg\" alt=\"Version 0.3.0\" /\u003e\n  \u003cimg src=\"https://img.shields.io/badge/license-MIT-3b82f6.svg\" alt=\"MIT\" /\u003e\n  \u003cimg src=\"https://img.shields.io/badge/node-%E2%89%A518-339933.svg\" alt=\"Node 18+\" /\u003e\n  \u003cimg src=\"https://img.shields.io/badge/TypeScript-5.8-3178c6.svg\" alt=\"TypeScript\" /\u003e\n  \u003cimg src=\"https://img.shields.io/badge/providers-8-a855f7.svg\" alt=\"8 providers\" /\u003e\n  \u003cimg src=\"https://img.shields.io/badge/local%20models-Ollama%20%2B%20OpenCode-22c55e.svg\" alt=\"Local models\" /\u003e\n  \u003cimg src=\"https://img.shields.io/badge/MVP%202-active-f97316.svg\" alt=\"MVP 2 active\" /\u003e\n\u003c/p\u003e\n\n\u003cbr /\u003e\n\n[![Anvil pipeline builder 
showcase](./assets/AI_Pipeline_Builder_Showcase.gif)](https://drive.google.com/file/d/1IHTQiLEQ4tulpdbUzI2iD5dmy82jBK-m/view?usp=sharing)\n\n\u003csub\u003e\u003ci\u003eDashboard preview \u0026mdash; pipeline orchestration, live agent activity, knowledge graph, cost ledger.\u003cbr /\u003e\n\u003cb\u003e\u003ca href=\"https://drive.google.com/file/d/1IHTQiLEQ4tulpdbUzI2iD5dmy82jBK-m/view?usp=sharing\"\u003eClick the gif to watch the full demo\u003c/a\u003e\u003c/b\u003e\u003c/i\u003e\u003c/sub\u003e\n\n\u003cbr /\u003e\u003cbr /\u003e\n\n\u003c/div\u003e\n\n---\n\n\u003cdiv align=\"center\"\u003e\n\n### **Plan on Claude. Build on Ollama. Review on GPT. Ship on a local model.**\n### One pipeline. Eight providers. Whatever's cheapest for each stage.\n\n\u003c/div\u003e\n\n---\n\n## Why teams pick Anvil\n\n\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd width=\"50%\" valign=\"top\"\u003e\n\n#### Mix providers within a single pipeline\nRouting is per-stage, not per-run. A single feature can flow through\nthree different providers without you lifting a finger. The pipeline\ndoesn't care which one ran which step.\n\n\u003c/td\u003e\n\u003ctd width=\"50%\" valign=\"top\"\u003e\n\n#### Cheap by design\nRouting-by-stage means premium models only show up where premium\nmodels actually matter. Read-only research and tight fix loops stay\non the free tier — *always*. Live cost ledger per call.\n\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"50%\" valign=\"top\"\u003e\n\n#### No vendor SDK lock-in\nEvery HTTP adapter is hand-rolled `fetch()`. No `@anthropic-ai/sdk`,\nno `openai` package, no LangChain, no Vercel AI SDK. Drop a model —\nyour code keeps compiling.\n\n\u003c/td\u003e\n\u003ctd width=\"50%\" valign=\"top\"\u003e\n\n#### Bring your own keys, or don't\nOllama works fully offline. OpenCode's $10/mo Zen subscription\nreplaces the entire local tier — no GPU required. 
Cloud is for the\nfew stages that warrant it.\n\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n---\n\n## Quick start\n\n```sh\n# 1. Install\nnpm install -g @esankhan3/anvil-cli\n\n# 2. Set up a project (interactive — answers a handful of questions)\nanvil init\n\n# 3. Open the dashboard and ship\nanvil dashboard\n```\n\nThat's the whole onboarding. `anvil init` creates `~/.anvil/`,\nseeds `models.yaml`, scaffolds your project's `factory.yaml`, and\nruns a health check. `anvil dashboard` boots the WebSocket\ncontrol plane and opens the UI.\n\n\u003e **First time?** The full walk-through — prerequisites, where to\n\u003e get provider keys, what `anvil init` will ask you, troubleshooting\n\u003e — lives in [`docs/getting-started.md`](docs/getting-started.md).\n\n---\n\n## Provider-agnostic by design\n\nEight providers ship in the box. One config file picks them per\nstage. Each adapter speaks the same streaming format, the same\n`UpstreamError` retry shape, the same per-call cost calculation.\n\n\u003cdiv align=\"center\"\u003e\n\n| Provider | Tier slot | Best for |\n|:---|:---:|:---|\n| **OpenCode** (Zen) | `local` | Hosted open-coding models, $10/mo flat — replaces GPU-heavy Ollama |\n| **Ollama** | `local` | Fully offline, your own GPU, embeddings + reranking |\n| **Claude** (CLI) | `cheap` / `premium` | Best-in-class reasoning, native tool use |\n| **OpenAI** | `cheap` / `premium` | GPT-5, o-series reasoning |\n| **Gemini** | `cheap` / `premium` | Long context, Gemini 2.5 Pro |\n| **OpenRouter** | any | Single key, hundreds of models |\n| **Google ADK** | `premium` | When you need ADK's runner semantics |\n| **Gemini CLI** | utility | Subprocess fallback |\n\n\u003c/div\u003e\n\n### One run, three providers, fourteen cents\n\nRouting is per-stage, not per-run. 
The same feature can flow\nthrough three providers without you lifting a finger:\n\n```\n  clarify     →  Ollama / OpenCode   local           ~ $0.00\n  plan        →  Claude Sonnet       deep analysis   ~ $0.05\n  build       →  Ollama / OpenCode   local           ~ $0.00\n  test        →  Ollama / OpenCode   local           ~ $0.00\n  validate    →  Claude Haiku        cheap + fast    ~ $0.01\n  review      →  Claude Sonnet       judgment-heavy  ~ $0.08\n  ship        →  Ollama / OpenCode   local           ~ $0.00\n                                                  ──────────\n                                                    ~ $0.14\n```\n\nIt's just YAML in `~/.anvil/stage-policy.yaml`. Premium models only\nappear where premium models actually matter. **Read-only research\nand the fix-retry loop are locked to the free tier — they cannot\nescalate, by design.** A typical run with Ollama or OpenCode burns\nsingle-digit dollars on cloud calls.\n\n### Auto-failover when a provider misbehaves\n\nIf a model returns a 429 or 5xx, hits a quota wall, or fails its liveness\nprobe mid-run, Anvil's chain-walker **burns it for the rest of the run** and\nfalls through to the next entry in the same tier — same provider or\ndifferent, your call. The pipeline doesn't pause, doesn't surface a\nstack trace to the user, and doesn't double-charge by retrying the same\nbroken model. Every fallback hop emits a routing event so you can see\nexactly which model was skipped and why.\n\n```\nclarify   →  adk:gemini-2.5-flash   ❌  (provider liveness fail)\n          ↪  opencode/kimi-k2.6     ✅  (next in chain, same tier)\nbuild     →  opencode/qwen3.5-plus  ❌  (429 — Alibaba upstream)\n          ↪  opencode/glm-5.1       ✅  (model burned for run, fallback proceeds)\n```\n\nTwo layers of detection: a **proactive** liveness probe at run start\n(Ollama `/api/tags`, env-var presence for cloud) and a **reactive**\nduck-typed `UpstreamError` check on every adapter call. The per-run
cap on retry attempts is configurable in `models.yaml` (`walker.max_attempts`).\n\n### One GPU, many models — exclusive slot serialization\n\nBig local models can't all share a single GPU at the same time.\nIf your `clarify` and `build` stages both want a heavy Ollama\nmodel, naive concurrency ends in an OOM. Anvil's\n`exclusive_slot: true` flag puts those models behind a\n**process-local FIFO queue** so only one exclusive model is ever\nGPU-resident at a time.\n\nThe queue does the dance for you:\n\n- **Hard eviction on switch.** Going from model A → model B\n  explicitly tells Ollama to release A's weights, then polls until\n  the GPU has actually freed them before letting B load, so the GPU\n  never briefly holds both.\n- **Intruder detection.** An out-of-band Ollama session on the host\n  (e.g. you ran `ollama run` in another terminal) gets evicted\n  before the next exclusive load — so two big models can't sneak\n  in side by side.\n- **Embeddings + rerankers bypass.** They're small enough to\n  co-reside, so they never touch the queue.\n- **Same-model calls are free.** Consecutive calls to the same id\n  skip the eviction step entirely. A stage fanning out across repos\n  pays the model-load cost once.\n\nMark a model exclusive in `~/.anvil/models.yaml`:\n\n```yaml\n- id: ollama/qwen2.5-coder:14b\n  provider: ollama\n  tier: local\n  vram_gb: 9\n  exclusive_slot: true        # mandatory for any VRAM-heavy local model\n```\n\nThis lets you run multiple big local models on the same machine without\nmanual sequencing or OOM kills, as long as each model fits in VRAM\non its own.\n\n### Cost ledger, live\n\nEvery adapter call attaches a real `gen_ai.usage.cost` attribute\ncomputed from a vendored LiteLLM pricing snapshot. The dashboard\nshows you per-call, per-stage, per-run spend in real time. The\nOpenTelemetry export carries the same numbers if you want them in\nLangfuse, Tempo, or Honeycomb.\n\n**No estimates. 
No surprises.**\n\n---\n\n## What you can do with Anvil\n\n\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd width=\"33%\" valign=\"top\"\u003e\n\n### Pipeline\nNine-stage feature pipeline, including clarify, plan, build, test,\nvalidate, review, and ship, fanned out across every repo in your project.\nPer-stage tool permissions, validate-fix retry loops, and chain-fallback\nacross models when a provider returns a 429.\n\n\u003c/td\u003e\n\u003ctd width=\"33%\" valign=\"top\"\u003e\n\n### Plan\nGenerates a structured markdown plan before any code is written.\nFiles touched, contracts crossed, risks flagged, cost estimated.\nPlan validators catch missing tests, missing rollback strategies,\nand wrong stage routing. The agent can't skip planning.\n\n\u003c/td\u003e\n\u003ctd width=\"33%\" valign=\"top\"\u003e\n\n### PR Review\nMulti-pass automated review with evidence gates, incident binding,\nKB context, scope matching, dismissal filtering, and a verifier\nthat runs the produced tests. Posts inline comments + a summary\nto GitHub.\n\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"33%\" valign=\"top\"\u003e\n\n### Memory\nLong-term project memory with five types — working, episodic,\nsemantic, procedural, profile. Auto-learners propose; a sleeptime\nratifier decides. Code-fact drift detection keeps memories honest\nwhen the underlying file changes.\n\n**📊 83.4% on LoCoMo** (full 1,540-question set,\nvectorize-io's [Agent Memory Benchmark](packages/memory-core/benchmark)) —\nhybrid BM25 + vector + graph retrieval, runs fully local.\n\n\u003c/td\u003e\n\u003ctd width=\"33%\" valign=\"top\"\u003e\n\n### Project\nMulti-repo first. One `factory.yaml` describes your repos,\nlanguages, build commands, and cross-repo connections. 
Ships\nwith templates for TypeScript, Go, Python, Rust, monorepos, and\nDjango + Celery.\n\n\u003c/td\u003e\n\u003ctd width=\"33%\" valign=\"top\"\u003e\n\n### Knowledge Base\nAST-aware chunking via tree-sitter, hybrid retrieval (vector +\nBM25 + graph + rerank), project graph with 14 cross-repo edge\nstrategies. The same engine the dashboard uses is also exposed as an\nMCP server for any client.\n\n\u003c/td\u003e\n\u003ctd width=\"33%\" valign=\"top\"\u003e\n\n### Settings\nProvider keys, model registry, stage policy, OTel endpoint — all\neditable in the dashboard UI. Writes to `~/.anvil/.env` with a\nstrict allowlist; no env-var injection.\n\n\u003c/td\u003e\n\u003ctd width=\"33%\" valign=\"top\"\u003e\n\n### Convention\nExtracts your codebase's real conventions — naming, imports,\ntests, error handling — formats them as living docs, and promotes\nrecurring violations into hard rules. The agent stops making the\nsame mistake twice.\n\n\u003c/td\u003e\n\u003ctd width=\"33%\" valign=\"top\"\u003e\n\n### History\nEvery run, replayable. Diffs, PR URLs, reviewer verdicts, cost\nbreakdown, model fallbacks taken. Resume any failed run from the\nfailed stage; roll back any shipped run with one click.\n\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"33%\" valign=\"top\"\u003e\n\n### Research\nRead-only investigation — *\"what does this service do?\"* or\n*\"why does this fail?\"* — that never escalates to premium models.\nStays free-tier no matter what, because read-only shouldn't cost\nmore.\n\n\u003c/td\u003e\n\u003ctd width=\"33%\" valign=\"top\"\u003e\n\n### Bug Fix\nTargeted fix workflow with a tight retry loop. Locked to the local +\ncheap tiers so a failing test doesn't burn premium tokens trying\nthe same thing five times.\n\n\u003c/td\u003e\n\u003ctd width=\"33%\" valign=\"top\"\u003e\n\n### Observability\nOpenTelemetry spans with GenAI semantic conventions. 
Plug in\nLangfuse, Tempo, Honeycomb, or anything OTLP-compatible. Off by\ndefault. Privacy-safe prompt redaction. Real per-call cost ledger.\n\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"33%\" valign=\"top\"\u003e\n\n### Auto-failover\nProvider goes down, hits a quota wall, or fails its liveness probe?\nThe chain-walker burns the model for the rest of the run and walks\nto the next entry in the same tier — proactive (liveness probe at\nrun start) plus reactive (`UpstreamError` duck-typing on every call).\nNo paused runs, no double charges.\n\n\u003c/td\u003e\n\u003ctd width=\"33%\" valign=\"top\"\u003e\n\n### Resume + rollback\nEvery run is checkpointed per stage. Resume from the failing stage\nwithout re-running the cheap stages before it. Roll back any shipped\nrun with one click — branch + PR delete, restored workspace, audit\nlog preserved.\n\n\u003c/td\u003e\n\u003ctd width=\"33%\" valign=\"top\"\u003e\n\n### MCP server\nAnvil's knowledge-base retriever ships as a standalone MCP server.\nUse it from Claude Code, Claude Desktop, Cursor, or any MCP client —\nsame hybrid retrieval (vector + BM25 + graph + rerank), same project\ngraph, no dashboard required.\n\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"33%\" valign=\"top\"\u003e\n\n### Durable execution \u0026nbsp;\u003csub\u003e\u003ci\u003enew in 0.3.0\u003c/i\u003e\u003c/sub\u003e\nPattern-2 durable execution. Every step and every side effect is\nrecorded to a SQLite event log at `~/.anvil/durable.db`. Kill the\ndashboard mid-run, relaunch, and the run picks up exactly where it\nleft off — recorded effects return their cached result, no double\nspawns, no re-asking the user the same question. `ctx.effect()`,\n`ctx.waitForSignal()`, deterministic clock and uuid.\n\n\u003c/td\u003e\n\u003ctd width=\"33%\" valign=\"top\"\u003e\n\n### Race arbitration \u0026nbsp;\u003csub\u003e\u003ci\u003enew in 0.3.0\u003c/i\u003e\u003c/sub\u003e\nMulti-process lease arbitration. 
Each run holds a TTL'd lease in\nthe durable store; the live process heartbeats it. Crash mid-run\nand a sibling process takes over automatically on boot: an orphan scan finds\nexpired leases, claims them, and the auto-resume queue replays\nthe workflow from the durable cursor. No manual intervention; no\nduplicated work.\n\n\u003c/td\u003e\n\u003ctd width=\"33%\" valign=\"top\"\u003e\n\n### Policy gates \u0026nbsp;\u003csub\u003e\u003ci\u003enew in 0.3.0\u003c/i\u003e\u003c/sub\u003e\nThe policy editor lives at `/policy` in the dashboard. Toggle pause\ngates per stage (plan, implement, test, ship), set auto-approve\nthresholds on risk + confidence, cap per-run and per-day cost,\nconfigure Q\u0026A budgets. Paused runs surface an in-app banner +\nmodal — approve, reject, modify the artifact, iterate with a\nnote, or rerun from a target stage.\n\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n---\n\n## Observability (opt-in)\n\nTelemetry is **off by default**. When you turn it on, every adapter\ncall emits an OpenTelemetry span with GenAI semantic conventions —\nprompt + completion tokens, cost, latency, model, provider, error\nclass. Plug in any OTLP-compatible backend.\n\n### Two switches, one env var each\n\n```sh\n# 1. Export to a real OTLP collector — Langfuse, Tempo, Honeycomb, …\necho 'OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:3000/api/public/otel/v1/traces' \u003e\u003e ~/.anvil/.env\necho 'OTEL_SERVICE_NAME=anvil-dashboard' \u003e\u003e ~/.anvil/.env\n\n# 2. 
Or dump spans to stderr — useful for debugging without a collector\necho 'ANVIL_OTEL_CONSOLE=1' \u003e\u003e ~/.anvil/.env\n```\n\nRestart the dashboard and traces start flowing.\n\n### Privacy + noise controls\n\n| Variable | Default | What it does |\n|:---|:---:|:---|\n| `ANVIL_OTEL_DISABLED` | unset | Hard kill-switch — set to `1` to disable everything |\n| `ANVIL_OTEL_RECORD_CONTENT` | `0` | Set to `1` to include prompt + completion text on spans (truncated to 8 KB per attribute) |\n| `OTEL_LOG_LEVEL` | `NONE` | Set to `ERROR` / `INFO` / `DEBUG` to surface SDK errors when debugging |\n| `ANVIL_OTEL_BATCH` | unset | Set to `1` to batch span exports (lower IO, slightly delayed arrival) |\n\nBy default, spans carry **structure but not content** — model, cost,\nlatency, error class, all attached. Prompts and completions stay on\ndisk only.\n\n### Quick local stack: Langfuse\n\nAnvil ships a tuned Langfuse compose file at\n[`infra/observability/`](infra/observability/) — Langfuse 3.x +\nPostgres + ClickHouse + Redis + MinIO, pre-wired for the OTLP HTTP\nendpoint Anvil exports to. No external clone needed:\n\n```sh\n# Spin up the bundled stack on http://localhost:3000\ndocker compose -f infra/observability/docker-compose.yml up -d\n\n# In ~/.anvil/.env (use the keys you create in the Langfuse UI;\n# Langfuse's OTLP endpoint expects HTTP Basic auth: base64 of public_key:secret_key)\nOTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:3000/api/public/otel/v1/traces\nOTEL_EXPORTER_OTLP_HEADERS=Authorization=Basic \u003cbase64 of pk-lf-...:sk-lf-...\u003e\nOTEL_SERVICE_NAME=anvil-dashboard\n```\n\nAnvil's dashboard auto-detects the local Langfuse on port 3000 — if\nit's running and you haven't set `OTEL_EXPORTER_OTLP_ENDPOINT`\nyourself, the dashboard wires it up automatically. 
Tear down with\n`docker compose -f infra/observability/docker-compose.yml down -v`.\n\n### What you'll see\n\n- One **`anvil.agent.session`** parent span per pipeline stage,\n  linking every adapter call and resume into a single trace.\n- A **`gen_ai.invoke`** child span per LLM call, with\n  `gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`,\n  `gen_ai.usage.output_tokens`, and a real **`gen_ai.usage.cost`**\n  in USD.\n- **`gen_ai.tool.\u003cname\u003e`** child spans for every tool call the agent\n  makes, closed when the matching `tool_result` arrives.\n- A **routing-decision** attribute group (`anvil.routing.*`) on the\n  invoke span so you can see why a particular model was picked, and\n  which models got burned mid-run.\n\nThe OTLP export carries the same numbers the dashboard's cost panel\nshows. One source of truth.\n\n---\n\n## How it all fits together\n\nAnvil is a TypeScript monorepo. Each package owns one concern; the\ndashboard ties them together.\n\n```\n                         ┌────────────────────────┐\n                         │     anvil dashboard    │  the control plane\n                         │   (React + WebSocket)  │\n                         └────────────┬───────────┘\n                                      │ orchestrates\n                                      ▼\n                  ┌───────────────────────────────────────┐\n                  │         pipeline-runner               │\n                  │  9-stage walker · per-repo fan-out    │\n                  │  validate-fix loop · chain-fallback   │\n                  └───┬──────┬──────┬──────────┬──────┬───┘\n                      │      │      │          │      │\n                      ▼      ▼      ▼          ▼      ▼\n                  ┌──────┐ ┌────┐ ┌──────┐ ┌──────┐ ┌──────────┐\n                  │agent-│ │core│ │knwldg│ │memory│ │convention│\n                  │ core │ │pipe│ │ core │ │ core │ │  -core   │\n                  └──┬───┘ └─┬──┘ └───┬──┘ └───┬──┘ 
└────┬─────┘\n                     │       │        │        │         │\n                     ▼       ▼        ▼        ▼         ▼\n                       ~/.anvil/  · models.yaml · stage-policy.yaml\n                                  · runs/\u003cid\u003e/  · features/\u003cslug\u003e/\n                                  · knowledge-base/\u003cproject\u003e/\n                                  · memories/  · conventions/\n```\n\nThree front ends ride on the same engine:\n\n- **`anvil` CLI** — `init`, `doctor`, `dashboard` (the front door)\n- **Dashboard** — full pipeline control with live agent activity\n- **`code-search-mcp`** — the standalone code-search product:\n  MCP server, the `code-search` CLI (`index` / `query` / `status` /\n  `reset` / `daemon` / `serve` / `mcp`), and the\n  `code-search-daemon` long-running indexer. Three bins, one\n  install. Works without the Anvil agent stack.\n\n### Per-package deep dives\n\n| Package | What it owns |\n|:---|:---|\n| [`@esankhan3/anvil-cli`](packages/cli/) | CLI entry point + bundled dashboard |\n| [`@anvil-dev/dashboard`](packages/dashboard/) | React UI + WebSocket pipeline orchestrator (private — bundled into the CLI tarball) |\n| [`@esankhan3/anvil-agent-core`](packages/agent-core/) | 8 LLM adapters, router, cost, OTel |\n| [`@esankhan3/anvil-core-pipeline`](packages/core-pipeline/) | Typed `Step\u003cI,O\u003e` graph + EventBus + hooks |\n| [`@esankhan3/anvil-knowledge-core`](packages/knowledge-core/) | AST chunks, graph, hybrid retrieval |\n| [`@esankhan3/anvil-memory-core`](packages/memory-core/) | Five-type memory, bi-temporal, drift detection |\n| [`@esankhan3/anvil-convention-core`](packages/convention-core/) | Convention extractor + promotion ledger |\n| [`@esankhan3/code-search-mcp`](packages/code-search-mcp/) | Standalone code-search: MCP server + `code-search` CLI + `code-search-daemon` |\n\n---\n\n## Configuration\n\nThree files run the show, all in `~/.anvil/`:\n\n| File | What it does 
|\n|:---|:---|\n| `.env` | Provider keys + observability switches |\n| `models.yaml` | The model registry — local, cheap, premium tiers |\n| `stage-policy.yaml` | Which tier handles which pipeline stage |\n\nWorking examples live in [`examples/anvil-home/`](examples/anvil-home/).\nBootstrap with:\n\n```sh\nmkdir -p ~/.anvil\ncp examples/anvil-home/.env.example      ~/.anvil/.env  \u0026\u0026 chmod 600 ~/.anvil/.env\ncp examples/anvil-home/models.yaml       ~/.anvil/models.yaml\ncp examples/anvil-home/stage-policy.yaml ~/.anvil/stage-policy.yaml\n```\n\n`anvil init` does the equivalent for `models.yaml` automatically.\n\n---\n\n## Project setup examples\n\nThree opinionated starters live in [`examples/`](examples/):\n\n- **[TypeScript monorepo](examples/typescript-monorepo/)** — Next.js\n  storefront + Express API, Postgres, Redis\n- **[Go microservices](examples/go-microservices/)** — multi-service\n  Go workspace\n- **[Python ML](examples/python-ml/)** — training + serving split\n\nCopy a `factory.yaml`, adjust paths, and run `anvil init` against your\nown workspace.\n\n---\n\n## Built with\n\nWe rely on the best of the open ecosystem:\n\n[`tree-sitter`](https://tree-sitter.github.io/) ·\n[`LanceDB`](https://lancedb.com/) ·\n[`graphology`](https://graphology.github.io/) ·\n[`OpenTelemetry`](https://opentelemetry.io/) ·\n[`Model Context Protocol`](https://modelcontextprotocol.io/) ·\n[`React`](https://react.dev/) ·\n[`Vite`](https://vitejs.dev/) ·\n[`commander`](https://github.com/tj/commander.js)\n\n---\n\n## Packages\n\nThe monorepo publishes a single user-facing CLI plus the building blocks\nit sits on top of. Every package below is published with **npm provenance**\n(sigstore attestation linking the tarball back to this repo); run\n`npm audit signatures` after installing to verify the chain.\n\n| Package | Purpose | npm |\n|---|---|---|\n| [**`@esankhan3/anvil-cli`**](packages/cli) | The user-facing CLI + bundled dashboard. 
| [![npm](https://img.shields.io/npm/v/@esankhan3/anvil-cli.svg?logo=npm\u0026label=\u0026color=cb3837)](https://www.npmjs.com/package/@esankhan3/anvil-cli) [![downloads](https://img.shields.io/npm/dm/@esankhan3/anvil-cli.svg?label=\u0026color=64748b)](https://www.npmjs.com/package/@esankhan3/anvil-cli) |\n| [`@esankhan3/anvil-agent-core`](packages/agent-core) | Shared LLM stack — unified `LanguageModel` interface, provider adapters, agent subprocess machinery, cost calc. | [![npm](https://img.shields.io/npm/v/@esankhan3/anvil-agent-core.svg?logo=npm\u0026label=\u0026color=cb3837)](https://www.npmjs.com/package/@esankhan3/anvil-agent-core) |\n| [`@esankhan3/anvil-knowledge-core`](packages/knowledge-core) | AST chunking, tree-sitter parsing, embeddings, LanceDB vector store, hybrid retrieval. | [![npm](https://img.shields.io/npm/v/@esankhan3/anvil-knowledge-core.svg?logo=npm\u0026label=\u0026color=cb3837)](https://www.npmjs.com/package/@esankhan3/anvil-knowledge-core) |\n| [`@esankhan3/anvil-memory-core`](packages/memory-core) | Long-term memory — five-type taxonomy, bi-temporal facts, drift detection, sleeptime ratification. **[83.4% on LoCoMo](packages/memory-core/benchmark).** | [![npm](https://img.shields.io/npm/v/@esankhan3/anvil-memory-core.svg?logo=npm\u0026label=\u0026color=cb3837)](https://www.npmjs.com/package/@esankhan3/anvil-memory-core) |\n| [`@esankhan3/anvil-convention-core`](packages/convention-core) | Convention extraction, rule engine, promotion ledger. | [![npm](https://img.shields.io/npm/v/@esankhan3/anvil-convention-core.svg?logo=npm\u0026label=\u0026color=cb3837)](https://www.npmjs.com/package/@esankhan3/anvil-convention-core) |\n| [`@esankhan3/anvil-core-pipeline`](packages/core-pipeline) | Typed `Step\u003cI,O\u003e` graph, `EventBus`, `StepRegistry`, lifecycle hooks. 
| [![npm](https://img.shields.io/npm/v/@esankhan3/anvil-core-pipeline.svg?logo=npm\u0026label=\u0026color=cb3837)](https://www.npmjs.com/package/@esankhan3/anvil-core-pipeline) |\n| [`@esankhan3/code-search-mcp`](packages/code-search-mcp) | Standalone code-search product — MCP server + `code-search` CLI + `code-search-daemon` (file-watcher + UDS JSON-RPC). | [![npm](https://img.shields.io/npm/v/@esankhan3/code-search-mcp.svg?logo=npm\u0026label=\u0026color=cb3837)](https://www.npmjs.com/package/@esankhan3/code-search-mcp) |\n\n\u003e The dashboard (`@anvil-dev/dashboard`) is bundled inside the CLI; it is\n\u003e not published as a standalone npm package.\n\n---\n\n## Status\n\n\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd valign=\"top\" width=\"50%\"\u003e\n\n**MVP 2 — Active**\n\nThe dashboard is the canonical interface. The CLI ships\n`init`, `doctor`, and `dashboard` today; more direct-scripting\ncommands are on deck.\n\n\u003c/td\u003e\n\u003ctd valign=\"top\" width=\"50%\"\u003e\n\n**Stable**\n\nPipeline orchestration · multi-provider routing · knowledge\nindexing · memory ratification · convention extraction ·\nPR review · OpenTelemetry · dashboard UI · **durable\nexecution (Pattern-2, v0.3.0)** · **multi-process race\narbitration (v0.3.0)** · **policy editor (v0.3.0)**.\n\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n**In flight:** richer plan validators · deeper RAG-eval ·\nadditional MCP tools · cost-policy enforcement (UI scaffolded —\nships in the next minor) · notification channels (Slack + email).\n\n---\n\n\u003cdiv align=\"center\"\u003e\n\n## License\n\n[MIT](LICENSE) — bring it to your team, fork it, ship it.\n\n\u003cbr /\u003e\n\n\u003csub\u003e\u003cb\u003eNo hosted plan. No telemetry sent to us.\u003cbr /\u003e\nYour code, your keys, your budget. 
That's the deal.\u003c/b\u003e\u003c/sub\u003e\n\n\u003cbr /\u003e\u003cbr /\u003e\n\n\u003csub\u003eBuilt for engineers who want their AI tools to \u003cb\u003erespect their stack and their wallet\u003c/b\u003e.\u003c/sub\u003e\n\n\u003cbr /\u003e\u003cbr /\u003e\n\n\u003csub\u003eCrafted by \u003ca href=\"https://github.com/esanmohammad\"\u003e\u003cb\u003eEsan Mohammad\u003c/b\u003e\u003c/a\u003e\u003c/sub\u003e\n\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fesanmohammad%2Fanvil","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fesanmohammad%2Fanvil","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fesanmohammad%2Fanvil/lists"}