{"id":50374862,"url":"https://github.com/scope-creep-labs/drift","last_synced_at":"2026-06-25T19:01:24.878Z","repository":{"id":361217227,"uuid":"1232103116","full_name":"Scope-Creep-Labs/drift","owner":"Scope-Creep-Labs","description":"Drift: Observe, Deploy, Respond. From a Prompt.","archived":false,"fork":false,"pushed_at":"2026-05-29T16:36:34.000Z","size":1527,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-29T18:08:27.787Z","etag":null,"topics":["agentic-ai","docker","docker-compose","fleet-management","llm","observability"],"latest_commit_sha":null,"homepage":"https://scopecreeplabs.com/blog/drift-observe-deploy-respond---from-a-prompt/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Scope-Creep-Labs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":"CLA.md"}},"created_at":"2026-05-07T15:36:02.000Z","updated_at":"2026-05-29T16:36:38.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/Scope-Creep-Labs/drift","commit_stats":null,"previous_names":["scope-creep-labs/drift"],"tags_count":22,"template":false,"template_full_name":null,"purl":"pkg:github/Scope-Creep-Labs/drift","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Scope-Creep-Labs%2Fdrift","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Scope-Creep-Labs%2Fdrift/tags","releases_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Scope-Creep-Labs%2Fdrift/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Scope-Creep-Labs%2Fdrift/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Scope-Creep-Labs","download_url":"https://codeload.github.com/Scope-Creep-Labs/drift/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Scope-Creep-Labs%2Fdrift/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33686018,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-30T02:00:06.278Z","response_time":92,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agentic-ai","docker","docker-compose","fleet-management","llm","observability"],"created_at":"2026-05-30T09:01:18.165Z","updated_at":"2026-06-25T19:01:24.811Z","avatar_url":"https://github.com/Scope-Creep-Labs.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Drift\n\n**Observe, deploy, respond. From a prompt.**\n\nDrift is a prompt-driven control plane for time-series systems and edge fleets. 
You ask questions or give instructions in plain language; an LLM agent picks the right tools, queries your VictoriaMetrics / Prometheus, runs statistical analysis, ships compose bundles to your devices, manages alert rules, and assembles a rich response — markdown, charts, tables, metric cards, timelines — that streams progressively into the UI as the work unfolds.\n\n```\nprompt → agent (tool use → metrics / fleet / alerts) → streaming render blocks → UI\n```\n\n## Observe\n\u003cimg width=\"1499\" height=\"736\" alt=\"image\" src=\"https://github.com/user-attachments/assets/f000ccbe-e8d6-4d7a-8c51-12089d492db8\" /\u003e\n\n## Query\n\u003cimg width=\"1492\" height=\"869\" alt=\"image\" src=\"https://github.com/user-attachments/assets/368b04c2-de48-43dd-9b5e-a432ed5430a2\" /\u003e\n\u003cimg width=\"1499\" height=\"867\" alt=\"image\" src=\"https://github.com/user-attachments/assets/f75a4f1b-0ea7-4059-951e-ca6a80c27d30\" /\u003e\n\n## Deploy\n\u003cimg width=\"926\" height=\"884\" alt=\"image\" src=\"https://github.com/user-attachments/assets/c79bc3c5-6152-472f-a51b-ce52404c9683\" /\u003e\n\nSee the [blog post](https://scopecreeplabs.com/blog/drift-observe-deploy-respond---from-a-prompt/) for more details.\n\n\u003e 📐 [ARCHITECTURE.md](./ARCHITECTURE.md) — data flow, dataRef pattern, agent loop, tool catalog, extension points, file reference.\n\u003e\n\u003e 🚀 [DEPLOY.md](./DEPLOY.md) — Drift Deploy: fleet management, compose-app delivery, scenarios.\n\u003e\n\u003e 🚨 [ALERTING.md](./ALERTING.md) — vmalert + Alertmanager + the agent's 14 alert tools, end-to-end workflows.\n\u003e\n\u003e 📦 [deploy/README.md](./deploy/README.md) — single-server bundle (VM stack + Drift CP + Caddy/TLS on one box) with a guided installer.\n\n---\n\n## Install\n\nThe fast path is the single-server bundle. 
One Linux host with Docker, one public domain, two minutes of prompts:\n\n```bash\nVERSION=v0.1.41\ncurl -L \"https://github.com/Scope-Creep-Labs/drift/releases/download/${VERSION}/drift-deploy-${VERSION#v}.tar.gz\" | tar -xz\ncd \"drift-deploy-${VERSION#v}\"\n./install.sh\n```\n\n`install.sh` pulls `ghcr.io/kidproquo/drift-agent:latest` and `drift-frontend:latest`, so a fresh install lands directly on the current image versions regardless of which bundle tag you used. See [deploy/README.md](./deploy/README.md) for the full operator walkthrough (DNS, prompts, day-2 ops) and [deploy/UPDATES.md](./deploy/UPDATES.md) for the bundle-vs-image-only release model.\n\nWant to hack on the code instead? See [Quickstart](#quickstart) below.\n\n---\n\n## What you can do\n\nThree pillars, all driven from the same chat. The agent uses ~30 tools across them; you don't pick the tools, you describe the goal.\n\n### 🔍 Observe — investigate what's happening\n\nAsk anything about your telemetry. The agent discovers what metrics exist, picks the right query, fetches the data (which never enters the LLM context — see [the dataRef pattern](./ARCHITECTURE.md#the-dataref-pattern)), runs statistics, and assembles a streamed response with charts, tables, summaries, and timelines.\n\n```text\n\u003e Which hosts are reporting metrics, and what jobs are scraping?\n\u003e Show CPU usage on the host over the last 15 minutes.\n\u003e Which containers are using the most memory right now?\n\u003e Look for anomalies in network traffic over the last hour.\n\u003e Compare p95 request latency between dev-cloud and edge devices last week.\n\u003e Plot disk I/O on jetson-002 every 5 seconds.        ← live chart\n\u003e Now change the refresh rate to 1s.                  ← mutates the same chart in place\n\u003e Pull the last 200 error lines from dev-hetzner.     
← log search via VictoriaLogs\n```\n\nOutputs: streaming markdown narration, Plotly charts, sortable tables, metric cards with sparkline trends, event timelines, live-refreshing charts.\n\n### 🚀 Deploy — manage your fleet\n\nDrift Deploy registers each device with a small edge agent that polls the control plane every 30s, applies whatever compose bundles you've assigned, and reports back. You drive the whole thing from the same prompt UI — devices, apps, revisions, tagging, deploy-by-tag, rollback. RBAC + per-group scoping keeps non-admins out of devices that aren't theirs. See [DEPLOY.md](./DEPLOY.md) for the full scenario catalog.\n\n```text\n\u003e List devices and their groups.\n\u003e Show what's deployed to home-pi4-001 right now.\n\u003e Tag pi-riffpod-001 with edge, client-z.\n\u003e Fork the reporter app as reporter-jetson.\n\u003e Save a new revision of reporter-jetson — here's the compose: \u003cpaste\u003e\n\u003e Deploy reporter-jetson v3 to all devices tagged edge AND client-z.\n\u003e Roll home-pi4-001 back to reporter v2.\n\u003e Pull last 50 lines of the edge agent on dev-hetzner.\n```\n\nOutputs: propose-then-apply diffs in markdown, deployment status timelines, retry/conflict surfaces, terminal-action blocks, archive downloads (`.tar.gz` / `.zip`) of any revision.\n\n### 🛎️ Respond — close the loop\n\nInvestigations end in action. From the same chat, manage vmalert rules and Alertmanager routing, silence noise during planned work, or jump straight into a host shell. The agent uses the same propose-then-apply pattern as deploys so you see exactly what will change before it does. 
See [ALERTING.md](./ALERTING.md) for the alert subsystem details.\n\n```text\n\u003e List firing alerts.\n\u003e Create an alert when CPU \u003e 90% for 5 minutes on any edge device.\n\u003e Silence anything from jetson-002 for 2 hours — I'm rebooting it.\n\u003e Wire up a webhook so critical alerts ping https://ntfy.sh/drift-alerts.\n\u003e Show the receivers configured in alertmanager.\n\u003e Open a terminal to home-pi4-001.                    ← xterm.js, one click in the UI\n```\n\nOutputs: propose-then-apply rule/receiver diffs, alert state timelines, terminal-action blocks, and the in-browser terminal modal (full pty, mux-friendly with `TERM=xterm-256color`, audited per session).\n\n---\n\n## Motivation\n\nI wanted to observe and deploy docker-compose stacks across a fleet of Linux hosts — homelab, edge, cloud, corp — from one place, conversationally. The constraints came first; the architecture fell out of them.\n\n- **No inbound ports on target devices.** Edge agents poll out to the control plane every 30s; nothing listens on the device side. Works behind NAT, firewalls, residential routers, and corp networks without hole punching, port forwards, or VPNs.\n- **No SSH after the first install.** Once the device is commissioned (one `curl | bash` over SSH), everything happens through the control plane (CP): deploys, updates, tag changes, log queries, and even shell access (in-browser via xterm.js, audited per session). The agent script self-updates from the CP via SHA comparison on each check-in — no per-device upgrade chore. Image-baseline changes are the one exception and remain a deliberate, infrequent per-device step.\n- **Queue-based deploys, not push.** Desired state lives on the CP. Targets can be offline when you make a change — when they come back, they converge. No imperative \"ssh-and-run\" model that breaks when half the fleet is asleep.\n- **Compose is the contract.** Apps are versioned bundles of plain files (`compose.yaml` + `.env` + configs). 
If `docker compose up` runs it on your laptop, Drift can ship it. Rollback is \"deploy revision v2\" — no proprietary packaging, no special tooling.\n- **Groups and tags for dynamic filtering.** Groups are the RBAC/multi-tenant boundary (each device belongs to exactly one group); tags are free-form operational labels (`edge`, `client-z`, `low-power`) that overlap freely. Match-all rollouts (`deploy to tags=[\"edge\",\"client-z\"]`) handle the cross-cutting cases that groups alone can't.\n- **Lean on the proven observability stack.** VictoriaMetrics + VictoriaLogs + vmalert + Alertmanager + Grafana + node-exporter + cAdvisor + Vector — lightweight, replaceable, no homegrown protocols. Drift builds the *interaction layer*, not another TSDB.\n- **PromQL as the query language.** The agent generates and runs PromQL; the operator never has to see it. Anything that speaks the Prometheus query API plugs in (VM, Prometheus, Thanos, Mimir).\n- **Tool calling to extend the agent, not fine-tuning.** A new capability is a function in `app/tools/*.py` plus a JSON schema. No retraining, no embeddings store, no RAG. Telemetry data flows through tools and stays out of the LLM context (the [dataRef pattern](./ARCHITECTURE.md#the-dataref-pattern)), so numpy/scipy do the actual computation and the model only orchestrates. This avoids the \"LLM hallucinated a p95\" failure mode and keeps token cost flat regardless of fleet size.\n- **Propose-then-apply for every mutation.** The LLM never silently changes state. Creating an alert rule, deploying a bundle, editing a route — each goes through a `propose_*` tool that surfaces the diff before `apply_*` runs. This is how you let an LLM touch production.\n- **Watch the investigation, not just the answer.** Tool calls, narration, intermediate charts, results — all painted progressively as the agent works. No 30-second blank wait followed by a wall of text. 
Trust comes from seeing how the result was reached.\n- **Self-hosted, self-owned.** One Caddy + the Drift CP + a TSDB on a single Linux box. Your devices, your data, your model key. No SaaS phone-home, no per-device subscription, no vendor lock-in.\n- **Bring-your-own model.** Claude Opus 4.7 is the default for its quality on agentic loops, but `MODEL=…` switches to Sonnet, Haiku, or any other Claude model, and the engine adapter pattern leaves room for other providers. The frontend doesn't know which model is running.\n- **RBAC + per-group scoping out of the box.** Three roles (`observe \u003c deploy \u003c admin`), per-user group membership scopes which devices a user can see/touch, separate registry credentials per group. Multi-tenant from day one rather than retrofitted.\n- **Host-CA injection for corp networks.** `install.sh` detects the host's combined CA bundle and propagates it to the agent plus every deployed app (mounted at the standard Debian + Alpine paths, plus `SSL_CERT_FILE` / `CURL_CA_BUNDLE` in env). Ship to devices sitting behind a TLS-intercepting corp proxy without per-app workarounds.\n\nThe same constraints rule out a lot of common shapes: no PaaS-style \"give us your code\", no per-device daemon you upgrade by hand, no log-aggregator-as-a-service, no \"let the LLM read all your data\" RAG, no listening sockets on target devices.\n\n---\n\n## What the LLM sees (and doesn't)\n\nThe agent operates on metadata — names, labels, summaries, configs by reference — not on raw secrets or raw bulk data. 
The boundary is enforced in code, not by prompting the model to behave.\n\n**What the LLM has access to:**\n\n- Names and metadata: metric / label / job names, device names + groups + tags + statuses, app / revision metadata, alert rule names + expressions + labels, receiver names + webhook URLs, session metadata.\n- File contents of compose bundles when explicitly fetched via `get_app_revision` — typically `${VAR}` references; the actual values come from device-side env.\n- Time-series *summaries* (n, mean, p50, p95, min, max, …) computed server-side from each query. Raw arrays stay server-side under a `prom://\u003cuuid\u003e` dataRef and are pushed straight to the UI via SSE (the [dataRef pattern](./ARCHITECTURE.md#the-dataref-pattern)).\n- Log lines returned by `query_logs` — the same content you'd see in `docker logs` on the device.\n\n**What the LLM never has access to:**\n\n- API keys (`ANTHROPIC_API_KEY`, etc.) and any other env-var credentials — env vars don't enter the prompt or the tool-result surface.\n- Drift's database password (`DRIFT_PG_PASSWORD`) and Fernet key (`DRIFT_SECRET_KEY`).\n- Auth secrets for the TSDB / vmalert / Alertmanager (`VM_BASIC_AUTH`, `VMALERT_BASIC_AUTH`, `ALERTMANAGER_BASIC_AUTH`, etc.) — tool handlers attach these directly to outbound `httpx` calls.\n- Registry credentials — encrypted at rest with `DRIFT_SECRET_KEY`, decrypted only per device check-in, shipped over TLS straight to the edge agent. Operators set them via a UI modal that bypasses the LLM entirely.\n- Alertmanager receiver secrets (bearer tokens, webhook auth) — the agent only calls `Path.exists()` on `am-secrets/*` filenames and emits a *path reference* (`bearer_token_file: /etc/alertmanager/secrets/\u003cname\u003e`). 
Alertmanager opens the file at notify time; the LLM never sees the bytes.\n- Raw time-series arrays — kept under server-side dataRefs, streamed to the UI out-of-band.\n- Web-terminal bytes — pty stdio flows agent ↔ edge over a dedicated WebSocket and never reaches the LLM.\n- User passwords — setting and verifying happen server-side via `passlib`; the LLM has no read path to the password column.\n\n**Three places where sensitive content briefly touches the chat surface:**\n\n- `create_user` / `reset_user_password` return a server-generated password ONCE in the tool response, which renders into the chat trace. Hand it to the user out-of-band and clear the investigation afterwards. The self-service \"change my password\" sidebar flow keeps the new password off the chat entirely.\n- `commission_device` returns a one-shot bootstrap token in the curl line it generates. The token is single-use — once a device claims it, it's exhausted — and acts as a device-commissioning credential, not a long-lived secret.\n- If you paste compose contents with literal secrets in `.env` into the prompt, the LLM sees what you typed. Use `${VAR}` references resolved on the device, or the registry-credentials modal for image-pull tokens — both keep secrets off the chat.\n\n---\n\n## What you get\n\n- **Frontend** — React + Material UI dark theme, Plotly time-series charts, real-time streaming UI that surfaces the agent's thinking and tool calls. Sidebar lists devices and apps in your groups; xterm.js opens a host shell in one click.\n- **Backend** — FastAPI agent powered by Claude Opus 4.7 (default; configurable via `MODEL`) with adaptive thinking, prompt caching, and ~30 tools across discovery / query / analysis / fleet / alerts / render-block emission.\n- **Multi-user RBAC** — login + cookie sessions, three roles (`observe` \u003c `deploy` \u003c `admin`), user-group scoping for devices, audit log for terminal sessions. 
Bootstrap an admin via env vars; manage the rest from chat or the admin API.\n- **Drift Deploy** — promote a compose bundle as an \"app\", push to one device or every device matching a tag set, watch the edge agents reconcile in real time. Per-group registry credentials, edge-agent self-update, retry budgets, conflict detection, host-CA injection for corp PKI.\n- **Live charts** — `make_live_chart` polls a server-side PromQL passthrough on a timer and diffs the chart in place with `Plotly.react`; mutating a live chart keeps its zoom and hover state.\n- **Compose stack** — slim Docker images for both services. Drift does not bring its own TSDB: point it at any Prometheus-compatible source via `VM_URL`. The bundled single-server install adds VictoriaMetrics, VictoriaLogs, vmalert, Alertmanager, Grafana, and Caddy/TLS.\n\n---\n\n## Prerequisites\n\n| Tool         | Version  | Why                                            |\n| ------------ | -------- | ---------------------------------------------- |\n| Docker       | ≥ 24.0   | Recommended path for running everything.       |\n| Docker Compose | ≥ 2.20 | Bundled with Docker Desktop.                   |\n| Node.js      | ≥ 20     | Local frontend dev (alternative to Docker).    |\n| Python       | ≥ 3.12   | Local backend dev (alternative to Docker).     |\n| Anthropic API key | — | Required for the agent to actually call the LLM. Get one at https://console.anthropic.com. |\n\nYou also need a **Prometheus-compatible time-series source** the agent can reach:\n\n- Your VictoriaMetrics (single-node or vmselect cluster) via `VM_URL`.\n- Any Prometheus-API-compatible store (Prometheus, Thanos, Grafana Mimir, etc.).\n\n\u003e On this host, a VM stack lives at `/root/setup/victoria/` (single-node VM on `:8428`, vmauth basic-auth proxy on `:8427`, Grafana on `:3000`) with a vmagent + cadvisor reporter at `/root/setup/victoria/reporter/`. 
`deploy/install.sh` walks you through pointing Drift at it via the public vmauth URL.\n\n---\n\n## Quickstart\n\nTwo paths; pick whichever fits.\n\n### Option A — single-server install (full stack)\n\nThe supported full-stack install path is `deploy/install.sh`. It generates `/var/lib/drift-cp/.env` with random secrets, renders config templates, brings up the compose stack from `deploy/docker-compose.yml`, and (optionally) issues TLS via a bundled Caddy. See [deploy/README.md](./deploy/README.md) for the walkthrough.\n\n```bash\ngit clone https://github.com/Scope-Creep-Labs/drift.git\ncd drift/deploy\nsudo ./install.sh\n```\n\nTry:\n\n- *\"Which hosts are reporting metrics, and what jobs are scraping?\"*\n- *\"Show CPU usage on the host over the last 15 minutes.\"*\n- *\"Which containers are using the most memory right now?\"*\n- *\"Look for anomalies in network traffic over the last hour.\"*\n\nFor a VM cluster (vmselect), set `VM_TENANT_PATH=/select/\u003caccountID\u003e/prometheus`. For auth, set `VM_BASIC_AUTH=user:pass` or `VM_BEARER_TOKEN=...`. Both go in `/var/lib/drift-cp/.env` after install.\n\n### Option B — local dev (no Docker)\n\nRun the backend in a venv and the frontend in Vite's dev server. Best for iterating on code.\n\n**Backend:**\n\n```bash\ncd drift-agent\npython3 -m venv .venv\nsource .venv/bin/activate\npip install -e .\ncp .env.example .env       # set ANTHROPIC_API_KEY + VM_URL (must be reachable from your machine)\nuvicorn app.main:app --reload --host 127.0.0.1 --port 8000\n```\n\n**Frontend (in another terminal, from the repo root):**\n\n```bash\nnpm install\ncat \u003e .env.local \u003c\u003cEOF\nVITE_ENGINE=agent\nVITE_AGENT_DEV_URL=http://localhost:8000\nEOF\nnpm run dev\n```\n\nOpen \u003chttp://localhost:5173\u003e. Vite's dev proxy forwards `/api/*` to the backend, so the same code path works in dev as in Docker.\n\nTo iterate on the UI without spending API credits or running a backend, set `VITE_ENGINE=mock` instead. 
The frontend ships 5 hard-coded scenarios with synthetic data.\n\n---\n\n## Configuration\n\nFor a **full-stack install**, env vars live in `/var/lib/drift-cp/.env` — generated by `deploy/install.sh` and read by `docker compose` (including the in-app **Software Updates → Apply** path). Edit that file, then trigger an apply for the changes to take effect.\n\nFor **local dev**:\n\n- **`drift-agent/.env`** — read by uvicorn (backend).\n- **`.env.local`** at the repo root — read by Vite (frontend; `VITE_*` only).\n\n### Agent env vars\n\n| Variable             | Required | Default                                  | Notes                                      |\n| -------------------- | -------- | ---------------------------------------- | ------------------------------------------ |\n| `ANTHROPIC_API_KEY`  | yes      | —                                        | Claude API key.                            |\n| `VM_URL`             | yes      | —                                        | Base URL of your VictoriaMetrics / Prometheus. |\n| `VM_TENANT_PATH`     | no       | `\"\"`                                     | `/select/\u003cid\u003e/prometheus` for vmselect; empty for single-node. |\n| `VM_BASIC_AUTH`      | no       | `\"\"`                                     | `user:pass`. Sent as `Authorization: Basic`. |\n| `VM_BEARER_TOKEN`    | no       | `\"\"`                                     | Sent as `Authorization: Bearer \u003ctoken\u003e`.    |\n| `MODEL`              | no       | `claude-opus-4-7`                        | Any current Claude model ID.               |\n| `EFFORT`             | no       | `high`                                   | `low / medium / high / xhigh / max`.       |\n| `MAX_TOKENS`         | no       | `64000`                                  | Per-iteration `max_tokens`.                |\n| `ALLOWED_ORIGINS`    | no       | `http://localhost:5173,http://127.0.0.1:5173` | Comma-separated CORS allowlist for the FastAPI app. 
`install.sh` sets this to the frontend origin. |\n\n### Frontend env vars\n\n| Variable              | Default               | Notes                                                   |\n| --------------------- | --------------------- | ------------------------------------------------------- |\n| `VITE_ENGINE`         | `mock`                | `agent` for the real backend, `mock` for synthetic.     |\n| `VITE_API_BASE`       | `/api`                | Base URL the AgentAdapter POSTs to.                     |\n| `VITE_AGENT_DEV_URL`  | `http://localhost:8000` | Where Vite's dev proxy forwards `/api/*`. Dev only.   |\n\nThe frontend image is built with `VITE_ENGINE=agent` and `VITE_API_BASE=/api` (via build args in the Dockerfile). Override at build time with `--build-arg VITE_ENGINE=mock` if you want the UI without the backend.\n\n---\n\n## Project structure\n\nTop level:\n\n```\ndrift/\n├── README.md                  this file\n├── ARCHITECTURE.md            deep dive: data flow, agent loop, dataRef pattern, tool catalog\n├── ALERTING.md                vmalert + Alertmanager subsystem; alert/silence/receiver tools\n├── DEPLOY.md                  Drift Deploy user guide; deploy/commission/migrate scenarios\n├── deploy/                    install.sh + docker-compose.yml + config templates for the full stack\n├── Dockerfile                 frontend: alpine node builder + nginx alpine runtime\n├── nginx.conf                 SPA + SSE-friendly /api proxy\n├── package.json               frontend dependencies\n├── tsconfig.json\n├── vite.config.ts             dev proxy /api → VITE_AGENT_DEV_URL\n├── index.html\n├── src/                       React frontend\n├── drift-agent/               Python backend (FastAPI + agent + tools)\n└── spec/                      original product specs (reference only)\n```\n\nFor a full file-by-file breakdown, see [ARCHITECTURE.md → File reference](./ARCHITECTURE.md#file-reference).\n\n---\n\n## Common dev tasks\n\n### Add a new tool the agent 
can call\n\nEdit one of `drift-agent/app/tools/{metrics,analysis,emit}.py`:\n\n1. Define an `async def my_tool(ctx, args)` returning a JSON-serializable dict.\n2. Add an entry to that file's `*_TOOLS` list (JSON Schema describing inputs).\n3. Register the handler in `*_HANDLERS`.\n\nThe agent picks up the new tool once the module is re-imported (automatically under `uvicorn --reload` in dev; restart the agent otherwise), because the system prompt and tools list are rebuilt from the registries on import.\n\nSee [ARCHITECTURE.md → Extension points](./ARCHITECTURE.md#extension-points).\n\n### Add a new render block type\n\n1. Add the variant to `src/types/blocks.ts` and `drift-agent/app/schemas.py`.\n2. Write a React component under `src/components/blocks/`.\n3. Register it in `src/components/blocks/BlockRenderer.tsx`.\n4. Add an emit tool in `drift-agent/app/tools/emit.py`.\n\n### Iterate on the UI without burning tokens\n\nSet `VITE_ENGINE=mock` in `.env.local`. The Mock adapter synthesizes a fake event stream from 5 hard-coded scenarios (`gateway-17 instability`, `fleet thermal`, `dispatch optimization`, `v2.8 regression`, `latency correlation`). The streaming UI behaves the same as with the real agent.\n\n### Type-check and build\n\n```bash\n# Frontend\nnpx tsc --noEmit\nnpm run build\n\n# Backend\ncd drift-agent \u0026\u0026 .venv/bin/python -c \"from app.main import app; print('OK')\"\n```\n\nThere are no automated tests yet — verification is manual end-to-end via the UI.\n\n### Switch to a different LLM model\n\nSet `MODEL=claude-sonnet-4-6` (or any current Claude ID) in `drift-agent/.env` for local dev, or in `/var/lib/drift-cp/.env` for a full-stack install. Adjust `EFFORT` for the cost/quality balance you want, then restart the agent.\n\nTo use a different LLM provider entirely, refactor `drift-agent/app/agent.py:run_agent`. The SSE protocol stays the same, so no frontend changes are needed.\n\n---\n\n## Troubleshooting\n\n**Agent fails to start with \"1 validation error for Settings: anthropic_api_key\".**\nYou haven't set `ANTHROPIC_API_KEY` in `.env`. 
The Settings class requires it.\n\n**Agent starts but `/investigate` returns an error: \"anthropic_api_error: ...\".**\nEither the key is invalid, the model ID is wrong, or you've hit a rate limit. Check the agent's logs (`docker compose logs drift-agent` or the uvicorn terminal).\n\n**Agent runs but every tool call fails with HTTP timeout / connection refused.**\nThe agent can't reach `VM_URL`. From inside the agent container, run `docker compose exec drift-agent sh -c 'curl -s \"$VM_URL/api/v1/labels\"'` (the single quotes make `$VM_URL` expand inside the container rather than in your host shell). Common causes:\n- VM is on the docker host but `VM_URL=http://localhost:8428`. Containers can't see the host's `localhost`. Use `http://host.docker.internal:8428` with an `extra_hosts: [\"host.docker.internal:host-gateway\"]` mapping on the agent service, or attach the Drift containers to the VM stack's docker network.\n- Auth required but `VM_BASIC_AUTH` / `VM_BEARER_TOKEN` not set (e.g. you're hitting `vmauth` on `:8427`, not `vm:8428`).\n- A firewall is blocking the connection, or Tailscale isn't connected.\n- vmselect cluster but you forgot `VM_TENANT_PATH=/select/0/prometheus`.\n\n**Agent fetches data but charts in the UI show \"Chart data is no longer in cache\".**\nYou most likely reloaded the page. The dataRegistry is in-memory only, so re-run the prompt to refetch.\n\n**`cache_read_input_tokens` shows 0 across consecutive turns.**\nSomething invalidated the prompt cache prefix. Look in `drift-agent/app/agent.py` for non-deterministic content in `SYSTEM_PROMPT` or the tools list (timestamps, UUIDs, varying tool order). The prefix must be byte-stable across calls. Cache entries also expire after five minutes without a hit (the default lifetime), so a long pause between turns produces the same symptom.\n\n**\"Failed to load chart data: dataRef not found: prom://...\"**\nThe agent emitted a chart referencing a ref that wasn't pushed via a `data` event. Check the agent logs; this usually means an emit tool fired before the underlying `query_range` succeeded. 
File a bug.\n\n**Vite dev server won't start with a port-in-use error.**\nRun `lsof -ti:5173 | xargs kill` to free the port.\n\n**Docker build fails on `npm ci`.**\nDelete `node_modules/` locally before building (Docker's COPY may have picked up a partial install). Listing `node_modules` in `.dockerignore` keeps it out of the build context entirely.\n\n**Frontend serves but `/api/*` returns 502 in nginx.**\nThe agent container probably isn't healthy. `docker compose ps` should show it `running` and `healthy`. If it doesn't, check `docker compose logs drift-agent`.\n\n**Agent runs slowly / takes 30+ seconds.**\nThis is normal for complex investigations: `claude-opus-4-7` with `effort=high` is thorough. Set `EFFORT=medium` if you need faster, less exhaustive responses.\n\n---\n\n## Notes\n\n- **Persistence**: investigation history is in `localStorage` under key `drift.investigations.v2`. Chart trace data is in-memory only — see [ARCHITECTURE.md → The dataRef pattern](./ARCHITECTURE.md#the-dataref-pattern). User auth, devices, apps, registry creds, and terminal session metadata live in Postgres (the `drift-postgres` service in compose). Token usage is reported as metrics into VictoriaMetrics so the sidebar's per-user \"$X used\" survives drift-agent restarts.\n- **Agent loop cap**: the loop is bounded at 20 LLM iterations; most investigations finish in 4–8.\n- **No automated tests yet.** Verification is manual via the UI.\n- **Bootstrap admin**: set `DRIFT_ADMIN_USERNAME` + `DRIFT_ADMIN_PASSWORD` in `.env` for first-run admin creation. Subsequent users are created from chat or the admin API.\n\n---\n\n## License\n\nDrift is licensed under the [Apache License 2.0](./LICENSE). Copyright 2026 Scope Creep Labs LLC.\n\n## Contributing\n\nContributions are welcome — bug fixes, features, docs, edge-agent ports to new platforms. See **[CONTRIBUTING.md](./CONTRIBUTING.md)** for the development setup and PR guidelines.\n\nAll contributors must sign the **[Individual Contributor License Agreement](./CLA.md)**. 
Our CLA Assistant bot posts a one-click signing link on your first pull request; sign once and it covers every future PR. The CLA permits Scope Creep Labs LLC to relicense future versions of the project under different terms — Apache 2.0 on existing releases is permanent.\n\nFor security reports, please email **support@scopecreeplabs.com** rather than opening a public issue.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscope-creep-labs%2Fdrift","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fscope-creep-labs%2Fdrift","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscope-creep-labs%2Fdrift/lists"}