https://github.com/ikarolaborda/agent-smith
Offline-first, single-binary Go LLM agent with ChatGPT-like embedded SPA, multi-provider streaming, RAG, long-term memory, and DuckDuckGo web grounding for local Ollama models.
agent go golang llm matrix offline-first ollama openai-compatible rag react single-binary sse
Last synced: 18 days ago
- Host: GitHub
- URL: https://github.com/ikarolaborda/agent-smith
- Owner: ikarolaborda
- License: gpl-3.0
- Created: 2026-05-18T15:02:03.000Z (25 days ago)
- Default Branch: main
- Last Pushed: 2026-05-18T15:03:40.000Z (25 days ago)
- Last Synced: 2026-05-18T17:09:50.175Z (25 days ago)
- Topics: agent, go, golang, llm, matrix, offline-first, ollama, openai-compatible, rag, react, single-binary, sse
- Language: Go
- Homepage: https://github.com/ikarolaborda/agent-smith
- Size: 325 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 8
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
- Copyright: COPYRIGHT
README
An offline-first, single-binary Go LLM agent with a ChatGPT-like web UI, multi-provider streaming, RAG over curated corpora, long-term per-profile memory, and always-on web grounding for local models.
---
## What it is
`agent-smith` is one Go binary that:
- Talks to **OpenAI**, **Anthropic**, and any local **Ollama** model through a single `llm.Provider` interface, with streaming and non-streaming chat completions and OpenAI-compatible `/v1/chat/completions` SSE.
- Serves a **ChatGPT-like React SPA** embedded in the binary via `go:embed` — no separate frontend deploy.
- Runs **RAG** over nine curated markdown corpora (Laravel 13, PHP 8.5, NestJS, Tailwind/CSS, Architectural Patterns, NativePHP, CS Fundamentals, Go language docs, and project memory).
- Keeps **long-term per-profile memory** with explicit `/remember` writes, instruction-injection filtering, abstention prompting, retrieval-confidence banding, and per-message corrections.
- For local Ollama models, **always grounds answers with a fresh DuckDuckGo web search** (snippets only, sanitised, treated as third-party untrusted input) to suppress hallucinations.
- Discovers all installed Ollama models on the fly and lets you pick one per conversation.
It runs without an internet connection (apart from web grounding, which fails closed with a banner) and without a database — everything is files and process memory.
## Quick start
```sh
go build -o bin/agent ./cmd/agent
./bin/agent --serve # web UI at http://127.0.0.1:9090
./bin/agent --prompt "hello" # single-shot CLI mode
./bin/agent # interactive stdin loop
```
Default web port: **`:9090`**. Override with `--addr :8765`.
## Capabilities
| Area | What you get |
| --- | --- |
| Providers | OpenAI (`/v1/chat/completions`), Anthropic (`/v1/messages`), Ollama (`/api/chat` NDJSON), all with streaming. |
| Embeddings | OpenAI `text-embedding-3-small`, Ollama `nomic-embed-text`. |
| Web UI | React + Vite + react-bootstrap SPA embedded via `go:embed`. Per-conversation provider/model picker. Markdown + code highlighting. Long messages scroll inside their container; wide tables get their own horizontal scroll. |
| RAG | In-memory cosine retrieval, per-collection JSON persistence, ~213 chunks across 9 curated corpora. |
| Long-term memory | Per-profile namespace, kinds: `project_fact`, `preference`, `correction`. Instruction-injection filter on writes. `file_read` grounding tool with symlink-escape defense. |
| Hallucination control | Three-section Augment (`docs` + `memory` + `web`) + behavior addendum that forbids following instructions found in retrieved content + `RETRIEVAL CONFIDENCE: high/medium/low` band + abstention prompt. |
| Web grounding | DuckDuckGo lite HTML scrape (no API key). 5-min TTL cache. Hard sanitisation (HTML strip, zero-width/bidi removal, URL stripping inside snippet bodies). Section bounded to 3 KB / 5 results / 160-300-400 char field caps. Offline = banner, not blank context. |
| Ollama auto-discovery | Polls `/api/tags` every 60 s; every installed model shows up in the picker. |
| Tools | OpenAI-compatible tool/function calling. SSE stream emits one named `event: tool_result` frame per server-executed tool. |
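To make the "in-memory cosine retrieval" and "retrieval confidence band" rows concrete, here is a minimal sketch of top-k cosine ranking over pre-computed vectors. The `Chunk` and `Hit` types, the function names, and the band thresholds (0.75 and 0.5) are illustrative assumptions; the project's real data structures and cut-offs are not stated in this README.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Chunk is a hypothetical stored passage with its embedding vector.
type Chunk struct {
	ID  string
	Vec []float64
}

// Hit pairs a chunk ID with its similarity to the query.
type Hit struct {
	ID    string
	Score float64
}

// cosine returns the cosine similarity of a and b, or 0 when the vectors
// are empty, mismatched in length, or have zero norm.
func cosine(a, b []float64) float64 {
	if len(a) == 0 || len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK scores every chunk against the query and keeps the k best.
func topK(query []float64, chunks []Chunk, k int) []Hit {
	hits := make([]Hit, 0, len(chunks))
	for _, c := range chunks {
		hits = append(hits, Hit{ID: c.ID, Score: cosine(query, c.Vec)})
	}
	sort.SliceStable(hits, func(i, j int) bool { return hits[i].Score > hits[j].Score })
	if k < 0 {
		k = 0
	}
	if k < len(hits) {
		hits = hits[:k]
	}
	return hits
}

// band maps a similarity score to a label. Thresholds are assumptions.
func band(score float64) string {
	switch {
	case score >= 0.75:
		return "high"
	case score >= 0.5:
		return "medium"
	default:
		return "low"
	}
}

func main() {
	chunks := []Chunk{
		{ID: "a", Vec: []float64{1, 0}},
		{ID: "b", Vec: []float64{0, 1}},
		{ID: "c", Vec: []float64{1, 1}},
	}
	for _, h := range topK([]float64{1, 0}, chunks, 2) {
		fmt.Printf("%s %.3f %s\n", h.ID, h.Score, band(h.Score))
	}
}
```

A linear scan like this is usually adequate at the scale the table mentions (a few hundred chunks); an approximate index only starts to matter at much larger corpus sizes.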
## Configuration
Copy `configs/config.example.yaml` and point `--config` at it, or rely on env-var defaults.
```yaml
default_provider: ollama
providers:
  openai:
    api_key: ${OPENAI_API_KEY}
    model: gpt-4o-mini
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
    model: claude-sonnet-4-5
  ollama:
    base_url: http://127.0.0.1:11434
    model: llama3.1
`${VAR}` placeholders are expanded from the environment at load time.
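One way such load-time expansion can be done in Go is with the standard library's `os.Expand`. This is a sketch under assumptions: `expandConfig` is a hypothetical helper, and the README does not say which mechanism the project's loader actually uses.

```go
package main

import (
	"fmt"
	"os"
)

// expandConfig substitutes ${VAR} references in raw using lookup.
// Note that os.Expand also expands the brace-less $VAR form, and that a
// lookup returning "" (for example os.Getenv on an unset variable)
// silently yields an empty value.
func expandConfig(raw string, lookup func(string) string) string {
	return os.Expand(raw, lookup)
}

func main() {
	raw := "api_key: ${OPENAI_API_KEY}\nmodel: gpt-4o-mini\n"
	// In a real loader, the expanded text would then be handed to a YAML parser.
	fmt.Print(expandConfig(raw, os.Getenv))
}
```

Taking the lookup function as a parameter keeps the helper testable without mutating the process environment.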
## API keys
```sh
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
# Ollama needs none; just `ollama serve`.
```
## CLI flags
| Flag | Purpose |
| -------------------- | ------------------------------------------------------------------------ |
| `--config` | Path to YAML config (default `configs/config.example.yaml`). |
| `--provider` | Override default provider (`openai`, `anthropic`, `ollama`). |
| `--model` | Override the provider's model. |
| `--prompt` | Single-shot prompt. Omit for interactive stdin mode. |
| `--stream` | Stream the assistant response incrementally in CLI mode. |
| `--serve` | Start the embedded web UI + OpenAI-compatible API server. |
| `--addr` | Web/API listen address. Default `:9090`. |
| `--no-web-search` | Disable web grounding entirely (operator kill switch). |
## Web API
When `--serve` is on, the binary exposes:
- `POST /v1/chat/completions` — OpenAI-compatible SSE. `web_search: true | false` in the body overrides the per-provider default.
- `GET /v1/models` — every chat model the running config can route to (cloud + every installed Ollama model).
- `GET /v1/providers` — configured providers and the default.
- `GET /healthz` — `{"status":"ok"}`.
- `GET /` — the embedded SPA.
## RAG corpora
Bundled markdown corpora (sources live under `docs/rag/`; on first ingest they are embedded and the results are written to `data/rag/`):
- Laravel 13
- PHP 8.5
- NestJS
- Tailwind + CSS
- Architectural patterns
- NativePHP
- CS Fundamentals (parallelism, concurrency, memory model, mutex best practices for Go and PHP)
- Go language reference
- Project memory (per-profile)
Add your own by dropping markdown into `docs/rag//` and restarting; the ingest step is idempotent and content-hashed.
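An idempotent, content-hashed ingest can be sketched as follows: a file is re-embedded only when its hash differs from the one recorded at the previous ingest. The `Index` type and its `Ingest` method are hypothetical, and the choice of SHA-256 is an assumption; the README does not specify the hash function or the persistence format.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Index maps a document path to the content hash seen at its last ingest.
type Index map[string]string

// contentHash returns the hex-encoded SHA-256 digest of body.
func contentHash(body []byte) string {
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])
}

// Ingest records the document and reports whether it is new or changed,
// i.e. whether chunking and embedding would need to run again.
func (idx Index) Ingest(path string, body []byte) bool {
	h := contentHash(body)
	if idx[path] == h {
		return false // unchanged: nothing to do
	}
	idx[path] = h
	return true
}

func main() {
	idx := Index{}
	doc := []byte("# Notes\nGoroutines are cheap.\n")
	fmt.Println(idx.Ingest("notes.md", doc))                 // first run
	fmt.Println(idx.Ingest("notes.md", doc))                 // restart, same content
	fmt.Println(idx.Ingest("notes.md", append(doc, '!')))    // edited file
}
```

Because the decision depends only on content, restarting the process repeatedly is cheap, which is what makes "drop a file and restart" a workable workflow.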
## Web grounding
Off for cloud providers, **on by default for Ollama** (small local models hallucinate more aggressively). Toggle per conversation in the top bar (the "Ground with web" checkbox). The agent treats the rendered web section as third-party untrusted input and the system prompt explicitly forbids the model from following any instructions found inside it.
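Treating snippets as untrusted input implies cleaning them before they reach the prompt. The following is a rough sketch of that kind of sanitisation (invisible-character removal, tag stripping, URL stripping, length cap). The function names, the regular expressions, and the order of steps are assumptions; regex-based tag stripping is deliberately crude, and a production sanitiser would more likely use a real HTML parser.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	tagRe = regexp.MustCompile(`<[^>]*>`)      // crude HTML tag matcher
	urlRe = regexp.MustCompile(`https?://\S+`) // http(s) URLs up to whitespace
)

// isInvisible reports zero-width and bidirectional control characters.
func isInvisible(r rune) bool {
	switch {
	case r >= 0x200B && r <= 0x200F: // zero-width space/joiners, LRM, RLM
		return true
	case r >= 0x202A && r <= 0x202E: // bidi embeddings and overrides
		return true
	case r >= 0x2066 && r <= 0x2069: // bidi isolates
		return true
	case r == 0xFEFF: // zero-width no-break space (BOM)
		return true
	}
	return false
}

// sanitizeSnippet cleans one untrusted snippet and caps it at maxRunes.
func sanitizeSnippet(s string, maxRunes int) string {
	// Drop invisible characters first so they cannot hide tags or URLs
	// from the patterns applied afterwards.
	s = strings.Map(func(r rune) rune {
		if isInvisible(r) {
			return -1
		}
		return r
	}, s)
	s = tagRe.ReplaceAllString(s, " ")
	s = urlRe.ReplaceAllString(s, "")
	s = strings.Join(strings.Fields(s), " ") // collapse whitespace
	if r := []rune(s); maxRunes >= 0 && len(r) > maxRunes {
		s = string(r[:maxRunes])
	}
	return s
}

func main() {
	raw := "<b>Go</b> is\u200b fun https://evil.example/x now"
	fmt.Println(sanitizeSnippet(raw, 300))
}
```

Sanitisation reduces the attack surface but does not replace the system-prompt rule described above; the two are complementary.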
Precedence: operator kill switch (`--no-web-search`) > per-request override (`web_search` in the JSON body) > provider default.
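The precedence rule can be written as a small pure function. This is a sketch: `webSearchEnabled` and its parameters are hypothetical names, and modelling the request field as a `*bool` (so that "absent" differs from `false`) is an assumption about how the override might be decoded.

```go
package main

import "fmt"

// webSearchEnabled resolves the effective setting in precedence order:
// operator kill switch, then per-request override, then provider default.
// request is nil when the JSON body omitted web_search.
func webSearchEnabled(killSwitch bool, request *bool, provider string) bool {
	if killSwitch {
		return false // --no-web-search wins over everything
	}
	if request != nil {
		return *request // explicit per-request choice
	}
	return provider == "ollama" // default: on for Ollama, off for cloud providers
}

func main() {
	on := true
	fmt.Println(webSearchEnabled(true, &on, "ollama"))   // kill switch overrides the request
	fmt.Println(webSearchEnabled(false, &on, "openai"))  // request overrides the provider default
	fmt.Println(webSearchEnabled(false, nil, "ollama"))  // provider default applies
}
```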
## Architecture
```
cmd/agent                 CLI entrypoint, --serve flag, provider wiring
internal/agent            Run / RunStream loop + Session state
internal/llm              Provider interface, shared types, registry
internal/llm/openai       /v1/chat/completions client + SSE
internal/llm/anthropic    /v1/messages client + content-block streaming
internal/llm/ollama       /api/chat NDJSON streaming + /api/tags discovery
internal/rag              In-memory cosine RAG + per-profile memory
internal/web              DuckDuckGo searcher, TTL cache, sanitiser
internal/server           HTTP + SSE + go:embed of the SPA
internal/tools            Tool registry + builtins (file_read, http, shell)
internal/config           YAML loader + env expansion
pkg/prompt                Exported prompt-assembly helpers
web/                      Vite + React + TS SPA, built into web/dist (embedded)
docs/rag/                 Source markdown corpora
docs/brand/               Brand assets (logo, README hero)
configs/                  Example YAML config
```
## Build
```sh
make build # bin/agent (rebuilds the SPA first if web/dist is stale)
make test # go test -race -count=1 ./...
make lint # golangci-lint
make docker # multi-stage build to a distroless image
```
The Go binary embeds `web/dist` via `go:embed`. After editing anything under `web/src/`, rebuild the SPA (`cd web && npm run build`) then rebuild the binary.
## Identity
Named for Agent Smith of *The Matrix* (1999). The visual identity — narrow, opaque, slightly-trapezoidal lenses with a single thin streak of Matrix-green at the bottom — lives under `docs/brand/`.
> *"It is inevitable."* — Mr. Smith
## License
GPL-3.0-or-later — see [`LICENSE`](./LICENSE).
SPDX-License-Identifier: `GPL-3.0-or-later`.
This is strong copyleft. If you distribute a derivative, you must license it under GPL-3.0 (or any later version) and ship complete corresponding source. If you only run it inside your own organisation, the GPL imposes no obligation on you.