https://github.com/ikarolaborda/agent-smith

Offline-first, single-binary Go LLM agent with ChatGPT-like embedded SPA, multi-provider streaming, RAG, long-term memory, and DuckDuckGo web grounding for local Ollama models.
https://github.com/ikarolaborda/agent-smith

agent go golang llm matrix offline-first ollama openai-compatible rag react single-binary sse

Last synced: 18 days ago
JSON representation

Offline-first, single-binary Go LLM agent with ChatGPT-like embedded SPA, multi-provider streaming, RAG, long-term memory, and DuckDuckGo web grounding for local Ollama models.

Host: GitHub
URL: https://github.com/ikarolaborda/agent-smith
Owner: ikarolaborda
License: gpl-3.0
Created: 2026-05-18T15:02:03.000Z (25 days ago)
Default Branch: main
Last Pushed: 2026-05-18T15:03:40.000Z (25 days ago)
Last Synced: 2026-05-18T17:09:50.175Z (25 days ago)
Topics: agent, go, golang, llm, matrix, offline-first, ollama, openai-compatible, rag, react, single-binary, sse
Language: Go
Homepage: https://github.com/ikarolaborda/agent-smith
Size: 325 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 8
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
- Copyright: COPYRIGHT

Awesome Lists containing this project

README

An offline-first, single-binary Go LLM agent with a ChatGPT-like web UI, multi-provider streaming, RAG over curated corpora, long-term per-profile memory, and always-on web grounding for local models.

---

## What it is

`agent-smith` is one Go binary that:

- Talks to **OpenAI**, **Anthropic**, and any local **Ollama** model through a single `llm.Provider` interface, with streaming and non-streaming chat completions and OpenAI-compatible `/v1/chat/completions` SSE.
- Serves a **ChatGPT-like React SPA** embedded in the binary via `go:embed` — no separate frontend deploy.
- Runs **RAG** over nine curated markdown corpora (Laravel 13, PHP 8.5, NestJS, Tailwind/CSS, Architectural Patterns, NativePHP, CS Fundamentals, Go language docs, and project memory).
- Keeps **long-term per-profile memory** with explicit `/remember` writes, instruction-injection filtering, abstention prompting, retrieval-confidence banding, and per-message corrections.
- For local Ollama models, **always grounds answers with a fresh DuckDuckGo web search** (snippets only, sanitised, treated as third-party untrusted input) to suppress hallucinations.
- Discovers all installed Ollama models on the fly and lets you pick one per conversation.

It runs without an internet connection (apart from web grounding, which fails closed with a banner) and without a database — everything is files and process memory.

## Quick start

```sh
go build -o bin/agent ./cmd/agent
./bin/agent --serve # web UI at http://127.0.0.1:9090
./bin/agent --prompt "hello" # single-shot CLI mode
./bin/agent # interactive stdin loop
```

Default web port: **`:9090`**. Override with `--addr :8765`.

## Capabilities

| Area | What you get |
| --- | --- |
| Providers | OpenAI (`/v1/chat/completions`), Anthropic (`/v1/messages`), Ollama (`/api/chat` NDJSON), all with streaming. |
| Embeddings | OpenAI `text-embedding-3-small`, Ollama `nomic-embed-text`. |
| Web UI | React + Vite + react-bootstrap SPA embedded via `go:embed`. Per-conversation provider/model picker. Markdown + code highlighting. Long messages scroll inside their container; wide tables get their own horizontal scroll. |
| RAG | In-memory cosine retrieval, per-collection JSON persistence, ~213 chunks across 9 curated corpora. |
| Long-term memory | Per-profile namespace, kinds: `project_fact`, `preference`, `correction`. Instruction-injection filter on writes. `file_read` grounding tool with symlink-escape defense. |
| Hallucination control | Three-section Augment (`docs` + `memory` + `web`) + behavior addendum that forbids following instructions found in retrieved content + `RETRIEVAL CONFIDENCE: high/medium/low` band + abstention prompt. |
| Web grounding | DuckDuckGo lite HTML scrape (no API key). 5-min TTL cache. Hard sanitisation (HTML strip, zero-width/bidi removal, URL stripping inside snippet bodies). Section bounded to 3 KB / 5 results / 160-300-400 char field caps. Offline = banner, not blank context. |
| Ollama auto-discovery | Polls `/api/tags` every 60 s; every installed model shows up in the picker. |
| Tools | OpenAI-compatible tool/function calling. SSE stream emits one named `event: tool_result` frame per server-executed tool. |

## Configuration

Copy `configs/config.example.yaml` and point `--config` at it, or rely on env-var defaults.

```yaml
default_provider: ollama
providers:
openai:
api_key: ${OPENAI_API_KEY}
model: gpt-4o-mini
anthropic:
api_key: ${ANTHROPIC_API_KEY}
model: claude-sonnet-4-5
ollama:
base_url: http://127.0.0.1:11434
model: llama3.1
```

`${VAR}` placeholders are expanded from the environment at load time.

## API keys

```sh
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
# Ollama needs none; just `ollama serve`.
```

## CLI flags

| Flag | Purpose |
| -------------------- | ------------------------------------------------------------------------ |
| `--config` | Path to YAML config (default `configs/config.example.yaml`). |
| `--provider` | Override default provider (`openai`, `anthropic`, `ollama`). |
| `--model` | Override the provider's model. |
| `--prompt` | Single-shot prompt. Omit for interactive stdin mode. |
| `--stream` | Stream the assistant response incrementally in CLI mode. |
| `--serve` | Start the embedded web UI + OpenAI-compatible API server. |
| `--addr` | Web/API listen address. Default `:9090`. |
| `--no-web-search` | Disable web grounding entirely (operator kill switch). |

## Web API

When `--serve` is on, the binary exposes:

- `POST /v1/chat/completions` — OpenAI-compatible SSE. `web_search: true | false` in the body overrides the per-provider default.
- `GET /v1/models` — every chat model the running config can route to (cloud + every installed Ollama model).
- `GET /v1/providers` — configured providers and the default.
- `GET /healthz` — `{"status":"ok"}`.
- `GET /` — the embedded SPA.

## RAG corpora

Ship-included markdown corpora (under `docs/rag/`, embedded at first ingest into `data/rag/`):

- Laravel 13
- PHP 8.5
- NestJS
- Tailwind + CSS
- Architectural patterns
- NativePHP
- CS Fundamentals (parallelism, concurrency, memory model, mutex best practices for Go and PHP)
- Go language reference
- Project memory (per-profile)

Add your own by dropping markdown into `docs/rag//` and restarting; the ingest step is idempotent and content-hashed.

## Web grounding

Off for cloud providers, **on by default for Ollama** (small local models hallucinate more aggressively). Toggle per conversation in the top bar (the "Ground with web" checkbox). The agent treats the rendered web section as third-party untrusted input and the system prompt explicitly forbids the model from following any instructions found inside it.

Precedence: operator kill switch (`--no-web-search`) > per-request override (`web_search` in the JSON body) > provider default.

## Architecture

```
cmd/agent CLI entrypoint, --serve flag, provider wiring
internal/agent Run / RunStream loop + Session state
internal/llm Provider interface, shared types, registry
internal/llm/openai /v1/chat/completions client + SSE
internal/llm/anthropic /v1/messages client + content-block streaming
internal/llm/ollama /api/chat NDJSON streaming + /api/tags discovery
internal/rag In-memory cosine RAG + per-profile memory
internal/web DuckDuckGo searcher, TTL cache, sanitiser
internal/server HTTP + SSE + go:embed of the SPA
internal/tools Tool registry + builtins (file_read, http, shell)
internal/config YAML loader + env expansion
pkg/prompt Exported prompt-assembly helpers
web/ Vite + React + TS SPA, built into web/dist (embedded)
docs/rag/ Source markdown corpora
docs/brand/ Brand assets (logo, README hero)
configs/ Example YAML config
```

## Build

```sh
make build # bin/agent (rebuilds the SPA first if web/dist is stale)
make test # go test -race -count=1 ./...
make lint # golangci-lint
make docker # multi-stage build to a distroless image
```

The Go binary embeds `web/dist` via `go:embed`. After editing anything under `web/src/`, rebuild the SPA (`cd web && npm run build`) then rebuild the binary.

## Identity

Named for Agent Smith of *The Matrix* (1999). The visual identity — narrow, opaque, slightly-trapezoidal lenses with a single thin streak of Matrix-green at the bottom — lives under `docs/brand/`.

> *"It is inevitable."* — Mr. Smith

## License

GPL-3.0-or-later — see [`LICENSE`](./LICENSE).
SPDX-License-Identifier: `GPL-3.0-or-later`.

This is strong copyleft. If you distribute a derivative, you must license it under GPL-3.0 (or any later version) and ship complete corresponding source. If you only run it inside your own organisation, the GPL imposes no obligation on you.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/ikarolaborda/agent-smith

Awesome Lists containing this project

README