https://github.com/hallelx2/omni
A self-improving agent harness for open models — engine, adapters, tools, third-brain, self-improvement, surfaces
https://github.com/hallelx2/omni
Last synced: 19 days ago
JSON representation
A self-improving agent harness for open models — engine, adapters, tools, third-brain, self-improvement, surfaces
- Host: GitHub
- URL: https://github.com/hallelx2/omni
- Owner: hallelx2
- Created: 2026-05-14T21:12:51.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-05-25T23:48:48.000Z (23 days ago)
- Last Synced: 2026-05-26T01:25:17.247Z (23 days ago)
- Language: TypeScript
- Size: 557 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
```
██████╗ ███╗ ███╗███╗ ██╗██╗
██╔═══██╗████╗ ████║████╗ ██║██║
██║ ██║██╔████╔██║██╔██╗ ██║██║
██║ ██║██║╚██╔╝██║██║╚██╗██║██║
╚██████╔╝██║ ╚═╝ ██║██║ ╚████║██║
╚═════╝ ╚═╝ ╚═╝╚═╝ ╚═══╝╚═╝
```
### A self-improving agent harness for open models
*Brain, hands, and super legs for any LLM — frontier or local.*
[](https://github.com/hallelx2/omni/actions/workflows/ci.yml)
[](#license)
[](https://bun.sh)
[](https://www.typescriptlang.org/)
[](#status)
[](#status)
[**Quick start**](#-quick-start) •
[**Docs**](./docs/architecture.md) •
[**Authoring guides**](#-extending-omni) •
[**Roadmap**](#-roadmap--honest-debt)
---
Omni gives **any** language model — frontier or open, big or small — a body to act through, a memory to learn from, and an evolving sense of how to use itself. It was designed for the open-model wave (MiMo, Qwen, GLM, DeepSeek, Kimi, Llama via Ollama) and works just as well with Claude, GPT, and Gemini.
The thesis is simple: **weaker models become useful when the harness around them is strong.** Instead of asking a 7B model to plan, execute, and reflect on its own, Omni layers in a planner, a critic, and a memory; probes the model on first contact; and adapts its prompts, tools, and loop strategy to fit. The result is an agent that punches above the model's weight class.
```
┌─ third brain ────────────┐ ┌─ hands ─────────────────┐ ┌─ super legs ────────────┐
│ Planner decomposes │ │ bash, read/write, │ │ Probe capabilities │
│ Critic reviews │ + │ edit (find/replace), │ + │ Adapt prompts to model │
│ Memory recalls │ │ multi_edit, glob, │ │ Trace every session │
│ │ │ grep, web_fetch, MCP │ │ Evolve prompt variants │
└──────────────────────────┘ └─────────────────────────┘ └─────────────────────────┘
│
▼
┌─────────────────────────┐
│ Engine (the loop) │
│ AsyncIterable │
└─────────────────────────┘
│
┌──────────────────┬──────────────┴──────────────┬─────────────────┐
▼ ▼ ▼ ▼
CLI Server Web VS Code
(readline) (HTTP + WS) (browser) (extension)
```
---
## ✨ Why Omni
Most agent frameworks assume a frontier model. Run them on a local 7B and they crumble — the model hallucinates tool names, drops formatting, goes in circles. The convenient answer is to wait for open models to catch up. Omni takes the other route:
> The model is interchangeable. The harness *is* the agent.
If the harness compensates intelligently for what the model can't do — by probing capabilities, choosing the right system prompt, decomposing tasks, criticising results, and learning across sessions — then a 7B running on your laptop can do real work.
---
## 🎯 Features
### 🧠 Third Brain
- **Planner** decomposes user tasks before execution
- **Critic** reviews assistant turns and tool results
- **Memory** persists facts, preferences, and skills across sessions
### ✋ Hands
- `bash` (cross-platform: bash/pwsh, ANSI-stripped, 256KB cap)
- `read_file`, `write_file` with size limits and slicing
- `edit` (find/replace, ambiguity-safe) + `multi_edit` (atomic)
- `glob` (Bun.Glob, mtime-sorted) + `grep` (rg-accelerated)
- `web_fetch` (HTML→markdown via turndown)
- **MCP client** for any Model Context Protocol server
### 🦵 Super Legs (self-improvement)
- **Probe** classifies a model on first contact (~600 tokens)
- **Adapt** picks system prompt, ReAct fallback, iteration budget
- **Traces** persist every session as JSONL + SQLite
- **Evolve** mutates prompt variants and ranks by trace fitness
### 🔌 Surfaces
- Interactive **CLI** (readline, slash commands, sessions)
- **HTTP + WebSocket server** with permission forwarding
- **Browser client** (minimal vanilla HTML/TS)
- **Tauri desktop app** (React UI + bundled engine sidecar); minimal browser client
### 🤖 Subagents & Modes
- **Prebuilt subagents** — `explore`, `test`, `critique` ship in-repo as `AGENT.md` definitions, each a sandboxed child engine with its own model, tool subset, and **enforced** permission rules (regex allow/deny per tool — e.g. `test` may run `bun test` but not `rm -rf`, and write only `*.test.ts`). Override or add your own in `~/.omni/agents//AGENT.md` or workspace `.omni/agents/`.
- **Parallel dispatch** — the agent calls a subagent directly as a tool, or fans several out at once with `dispatch_agents` (bounded concurrency, partial-failure isolation, abort propagation) and collects every result when they finish.
- **Plan / Auto / Build modes** — `plan` = read-only tools + the **Planner** first; `build` = all tools + the **Critic** after + permission prompts; `auto` = all tools with prompts auto-allowed (unattended — safety guards still apply). Switch with `/plan` · `/auto` · `/build` · `/mode`; an opt-in classifier can pick the mode per turn, and the agent can request plan→build via `request_build_mode` (human-gated).
- **Per-role models** — the planner, critic, and each subagent may run on a different model than the main agent (e.g. plan on Claude, execute on a local model), defaulting to the main model.
### Engine guarantees
True streaming • abort propagation through model + tools • loop detection by tool-call signature • bounded retries on retryable errors • parallel tool calls with interleaved event streams • session snapshot/restore preserving identity • cost tracking when per-1k rates are known • 22 discriminated `EngineEvent` variants — the only public observation channel.
---
## 🚀 Install
**One-liner** (downloads the prebuilt binary, installs to `~/.omni/bin`, wires PATH):
```bash
# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/hallelx2/omni/main/install.sh | bash
```
```powershell
# Windows (PowerShell)
irm https://raw.githubusercontent.com/hallelx2/omni/main/install.ps1 | iex
```
Then drop your key in `~/.omni/.env` and run `omni`:
```bash
echo 'MIMO_API_KEY=tp-...' >> ~/.omni/.env
omni # full TUI
omni --plain # plain readline REPL (pipes / CI)
```
With `MIMO_API_KEY` present, Omni auto-selects the MiMo adapter — no
`OMNI_ADAPTER` needed.
### Build & install from source
```bash
git clone https://github.com/hallelx2/omni.git && cd omni
bun install
bun run setup # build host binary → ~/.omni/bin/omni + wire PATH
```
Or just build without installing:
```bash
bun run build # → packages/cli/dist/omni[.exe] (host platform)
bun run build:all # all platforms (win/linux/mac × x64/arm64)
```
### Run from source (no build)
```bash
bun run dev # interactive TUI
OMNI_ADAPTER=mimo bun run dev
bun run dev --plain # readline REPL
bun test # test suite
bun run typecheck # all packages
```
### Publish a release
**GitHub release (binaries for the `curl | bash` installers):**
```bash
gh auth login
bun run release v0.1.0 # build:all + create GitHub release with binaries
```
This uploads assets named `omni--` so the install one-liners
above can fetch them.
**npm release (all platforms, via CI):** push a tag and the
[`Release (npm)`](./.github/workflows/release.yml) workflow builds a native
binary on a runner per OS, then publishes the `omni-harness` launcher plus
each `omni-harness-` package. Cross-compiling can't be done from one
host (opentui ships per-platform native modules), so the matrix is required.
```bash
git tag v0.1.0-beta.1 && git push origin v0.1.0-beta.1 # → npm tag "beta"
git tag v0.1.0 && git push origin v0.1.0 # → npm tag "latest"
# or run the workflow manually (Actions → Release (npm) → Run workflow)
```
Testers then install with:
```bash
npm i -g omni-harness@beta # or @latest
omni
```
**One-time setup:** add an npm automation token as the repo secret
`NPM_TOKEN` (Settings → Secrets → Actions). The Linux-arm64 leg uses the
`ubuntu-24.04-arm` runner (GitHub-hosted arm64; available on public repos).
---
## ⚙ Configuration: `~/.omni/`
Omni keeps per-user state in `~/.omni/`:
```
~/.omni/
├── config.json # default adapter, model, provider keys, UI prefs
├── db.sqlite # sessions, messages, events, audit, profiles, variants
├── traces/ # one JSONL file per session run
├── agents/ # custom subagent defs (AGENT.md) — override the shipped ones
├── memory.json # long-term memory entries
└── settings.json # surface-specific settings (theme, etc.)
```
Every path is env-overridable (`OMNI_HOME`, `OMNI_DB`, `OMNI_TRACES`, `OMNI_MEMORY`, `OMNI_CONFIG`). The CLI's `/paths` command shows what resolved.
Example ~/.omni/config.json
```json
{
"adapter": "mimo",
"model": "mimo-v2.5-pro",
"maxIterations": 12,
"enableReActFallback": true,
"providers": {
"mimo": {
"apiKey": "tp-...",
"baseURL": "https://token-plan-sgp.xiaomimimo.com/v1"
},
"anthropic": { "apiKey": "sk-ant-..." }
},
"permissions": {
"mode": "ask",
"denyDestructive": true
},
"ui": { "theme": "dark", "showThinking": true },
"storage": { "tracesEnabled": true },
"modes": { "default": "build" },
"agents": {
"planner": { "model": "anthropic:claude-sonnet-4-5" },
"critic": { "enabled": true, "autoRetry": false }
}
}
```
**Precedence for every value:** explicit argument **>** env var **>** config file **>** built-in default.
---
## 🔌 Provider matrix
| Adapter | Endpoint | Env var | Notes |
|---|---|---|---|
| `mimo` | `https://token-plan-sgp.xiaomimimo.com/v1` | `MIMO_API_KEY` | Lowercase model ids (`mimo-v2.5-pro`); reasoning content auto-roundtripped |
| `mimo-anthropic` | `/anthropic/v1` | `MIMO_API_KEY` | Same key, Anthropic protocol |
| `ollama` | `http://localhost:11434/v1` | *(none)* | Any tag Ollama serves locally |
| `anthropic` | api.anthropic.com | `ANTHROPIC_API_KEY` | Extended thinking supported |
| `openai` | api.openai.com | `OPENAI_API_KEY` | gpt-4o, gpt-4o-mini, o1, o-series |
| `google` | generativelanguage.googleapis.com | `GOOGLE_API_KEY` | gemini-2.0-flash, 1.5-pro |
| `mock` | *(none)* | *(none)* | Scripted; for tests and offline dev |
All non-mock adapters go through **Vercel AI SDK 6**. Adding a new provider is roughly 80 lines — see [`docs/authoring-an-adapter.md`](./docs/authoring-an-adapter.md).
---
## 📚 Slash commands (in the CLI)
```
/help list commands
/paths show resolved ~/.omni/ paths
/usage cumulative token usage and cost
/session current session ID
/model active model
/mode show or switch run mode (/mode plan | auto | build)
/plan switch to plan mode (read-only + planner)
/auto switch to auto mode (full tools, no permission prompts)
/build switch to build mode (full tools + critic)
/skill pin/unpin a skill (/skill )
/history compact view of conversation so far
/quit exit (also: /exit)
```
---
## 🧪 What "self-improving" actually means
Three concrete mechanisms, in increasing autonomy:
**1. Adaptive prompts.** On first contact with a model, Omni runs `probeModel` — a small battery of cheap prompts (~600 tokens) that classify the model (native tool calls?, instruction-following?, verbosity?). `adapt(profile)` maps that to a strategy (which system prompt, ReAct fallback, iteration budget). **This is wired into startup** and cached per model in `~/.omni/db.sqlite`.
**2. Session traces.** Every run writes a JSONL trace to `~/.omni/traces/` plus rows to SQLite. `scoreTrace` ranks completed sessions; `replayTrace` + `checkTrace` re-run a trace against invariants — a library for regression-testing agent behavior.
**3. Prompt evolution** *(experimental)*. A genetic variant pool (`tournamentSelect`, `mutatePrompt`) is built and unit-tested, but **not** wired into the default loop — it ships as a programmatic API, not a v1 feature.
---
## 🏗 Architecture
The engine is a closed-loop controller:
```
┌─────────────┐ tool call ┌──────────────┐
│ Model │──────────────▶│ Engine │
│ (adapter) │ │ - validate │
│ │◀──────────────│ - permission │
└─────────────┘ result │ - execute │
│ - feed back │
└──────┬───────┘
│ events
▼
┌──────────────────┐
│ EngineEvent │
│ AsyncIterable │
└──────────────────┘
```
See [docs/architecture.md](./docs/architecture.md) for the full event taxonomy (22 types), lifecycle guarantees, and per-subsystem internals.
---
## 📦 Packages
| Package | Purpose |
|---|---|
| [`@omni/core`](./packages/core) | Engine loop, types, context, permissions, validator, tokenizer, paths/config |
| [`@omni/adapters`](./packages/adapters) | Vercel AI SDK adapters — openai-compatible (MiMo, Ollama…), Anthropic, OpenAI, Google + cost helper |
| [`@omni/tools`](./packages/tools) | `bash`, `read_file`, `write_file`, `edit`, `multi_edit`, `glob`, `grep`, `web_fetch`, MCP client |
| [`@omni/improve`](./packages/improve) | Planner, Critic, Memory, Probe, Adapt, FileTracer, replay, prompt evolution |
| [`@omni/storage`](./packages/storage) | `bun:sqlite` with versioned migrations + 7 repositories |
| [`@omni/cli`](./packages/cli) | Interactive terminal — slash commands, permission prompts, session persistence |
| [`@omni/server`](./packages/server) | HTTP + WebSocket server with WS-bridged permission requests |
| [`@omni/web`](./packages/web) | Minimal browser client |
| [`@omni/desktop`](./packages/desktop) | Tauri desktop app — React UI + bundled engine sidecar |
| [`@omni/vscode`](./packages/vscode) | VS Code extension *(unfinished — not in this release)* |
| [`@omni/cli-driver`](./packages/cli-driver) | Smoke-test driver |
---
## 🛠 Extending Omni
- **[Authoring a tool](./docs/authoring-a-tool.md)** — write a tool the model can use (contract, schemas, progress events, anti-patterns)
- **[Authoring an adapter](./docs/authoring-an-adapter.md)** — plug in a new model provider (translation utilities, provider-specific gotchas)
- **[Architecture](./docs/architecture.md)** — engine internals, event taxonomy, lifecycle
- **API reference** — generated with `bun run --cwd packages/core docs` (TypeDoc)
A minimal tool looks like this:
```ts
import { z } from "zod"
import type { Tool, ToolContext } from "@omni/core"
export const shout: Tool<{ text: string }, { result: string }> = {
name: "shout",
description: "Return the input in upper case.",
permission: "auto",
schema: z.object({ text: z.string() }),
async execute(args, ctx: ToolContext) {
return { result: args.text.toUpperCase() }
},
}
```
---
## 📊 Status
| | |
|---|---|
| **Tests** | 571 passing across 65 files |
| **Packages** | 11, all typecheck clean |
| **Source** | ~24,000 lines of TypeScript |
| **Verified live** | MiMo-V2.5-Pro driven end-to-end on a test project (read → edit → run tests, self-corrected via the verifier loop); prompt caching measured **−80%** tokens/cost on a repeat task |
| **Surfaces** | CLI (TUI + plain REPL), HTTP/WS server, web client, **Tauri desktop app** |
| **Not in this release** | VS Code extension (kept private); prompt-evolution loop (experimental API) |
---
## 🗺 Roadmap & honest debt
What's solid (click to expand)
| Aspect | Done |
|---|---|
| **1. Engine** | streaming, abort, loops, retries, parallel tools, snapshot, tracer hook, 4 fuzz-style property tests |
| **2. Types & API** | TSDoc on every public symbol, JSONSchema7 for tool params, tiered exports, TypeDoc generates clean docs |
| **3. Adapters** | 7 providers via Vercel AI SDK 6, reasoning_content roundtrip, cost computation, fake-fetch e2e tests |
| **4. Tools** | 8 built-ins + MCP (in-memory + real stdio tested), cross-platform shell with ANSI strip, path safety |
| **5. Context** | tiktoken tokenizer, **summarize-by-default compaction** (older turns compacted, not dropped), tool-result chunking, prompt-cache token surfacing |
| **6. Permissions** | gate types + audit + rule patterns; **destructive-bash denied by default**, opt-in workspace confinement, mode-aware (auto) gate |
| **7. Third brain** | Planner, Critic, keyword + embedding (`VectorMemory`) memory — all unit-tested and wired |
| **8. Self-improvement** | probe + adapt **wired into startup**; FileTracer on every run; scoreTrace/replay library; variant pool experimental |
| **9. Storage** | bun:sqlite, versioned migrations, 8 repos, FK cascades |
| **10. CLI** | TUI + plain REPL, slash commands, **plan/auto/build modes** + intent classifier, **long-term memory**, session persistence |
| **11. Surfaces** | server WS with permission forwarding tested 3 ways; web client with permission UI |
| **12. Testing** | typecheck script, CI workflow, trace replay, 4 property tests |
| **13. Documentation** | architecture + 2 author guides + .env.example + TypeDoc |
What's still imperfect (the honest list)
- **Engine** — the executor could be split out of `engine.ts`; no property test proving aborted runs never emit `tool.result`
- **Adapters** — rate-limit retry untested live; Google adapter not run end-to-end; Anthropic extended thinking not e2e-verified
- **Tools** — `grep`'s ripgrep path is exercised only when `rg` is installed; `web_fetch` markdown untested on complex layouts
- **Permissions** — destructive-bash deny + opt-in workspace confinement (`restrictToWorkspace`) + `allowlistGate` ship, but the bash path-confinement is heuristic, not a true per-command sandbox
- **Self-improvement** — probe/adapt are wired; the prompt-**evolution** loop is built + tested but intentionally **not** wired for v1 (experimental API)
- **Storage** — no backup/restore command; forward-only migrations
- **CLI** — slash arg parsing is space-split (no quoted strings); no `/sessions` continue command yet
- **Desktop** — ships per-OS installers via CI, but no auto-update yet; the **VS Code** extension is unfinished and excluded from this release
- **Prompt caching** — surfaced + billed for OpenAI-compatible (incl. MiMo) and Anthropic; cached-token accounting on other providers depends on what their API reports
- **Testing** — no load test; no chaos/fuzz on the adapter translation layer
- **Docs** — no `examples/` directory; no FAQ yet
---
## 🤝 Contributing
Issues and PRs welcome. The workflow is straightforward:
```bash
bun install
bun test # ensure baseline is green
# ... make changes
bun run typecheck # all packages
bun test
```
A few conventions: tools belong in `@omni/tools` and follow the `Tool` contract; new model providers belong in `@omni/adapters` and use the AI SDK translation helpers; never throw from a permission gate (return deny); every public symbol gets a TSDoc comment.
---
## 🙏 Acknowledgments
Omni stands on the shoulders of:
- [**Vercel AI SDK**](https://github.com/vercel/ai) — provider translation, streaming, tool calling
- [**Bun**](https://bun.sh) — runtime, package manager, SQLite, test runner, all of it
- [**opencode**](https://github.com/sst/opencode) — studied for monorepo structure and the `reasoning_content` roundtrip pattern
- [**Model Context Protocol**](https://modelcontextprotocol.io) — the standard for extending agents with external tools
- [**ripgrep**](https://github.com/BurntSushi/ripgrep) — accelerates `grep` when present
- [**Xiaomi MiMo**](https://platform.xiaomimimo.com) — the model that proved this approach on a real open weights stack
---
## 📄 License
[MIT](./LICENSE) © Halleluyah Oludele
Built with care. Designed to outlast whatever model comes next.