https://github.com/noamsto/agent-smith
An Agent that propagates improvements into other agents β mines session history + audits freshness, then PRs fixes.
https://github.com/noamsto/agent-smith
ai-agents claude-code developer-tools duckdb llm meta-agent nix self-improving
Last synced: 6 days ago
JSON representation
An Agent that propagates improvements into other agents β mines session history + audits freshness, then PRs fixes.
- Host: GitHub
- URL: https://github.com/noamsto/agent-smith
- Owner: noamsto
- License: mit
- Created: 2026-06-01T12:01:57.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-06-22T06:49:33.000Z (10 days ago)
- Last Synced: 2026-06-22T07:23:51.985Z (10 days ago)
- Topics: ai-agents, claude-code, developer-tools, duckdb, llm, meta-agent, nix, self-improving
- Language: Go
- Size: 455 KB
- Stars: 3
- Watchers: 0
- Forks: 0
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# πΆοΈ agent-smith
**An Agent whose only purpose is propagating improvements into other agents.**
*It patrols the agents-matrix, finds the glitches where reality stuttered, and rewrites the rules so the dΓ©jΓ vu stops happening.*
[](docs/HANDOFF.md)
[](#the-two-ways-it-sees)
[](https://docs.claude.com/en/docs/claude-code)
[](https://nixos.org)
[](LICENSE)
---
> *"Never send a human to do a machine's job."* β Agent Smith
>
> You keep telling your agents the same thing. *Read skeletons, not whole files.
> That flag was renamed. Don't retry the failing command.* They nod, and three
> sessions later they do it again. That repetition **is** the glitch in the
> matrix β and agent-smith exists to patch it at the source: the instructions
> themselves.
agent-smith reads how your Claude Code agents *actually behaved* across hundreds
of past sessions, checks whether what their instructions *claim* is still true in
the world, and opens pull requests that fix the agent β not the symptom.
It edits the things that steer agents: subagent definitions, skills, `CLAUDE.md`
files, slash commands. Then it writes down **why**, and later checks whether the
glitch actually stopped recurring.
---
## πΆοΈ The two ways it sees
agent-smith runs two intelligence tracks into one mind. One looks *backward* at
what happened; the other looks *outward* at what's still true.
```mermaid
flowchart TD
subgraph A["π TRACK A Β· corpus mining β what did the agents do wrong?"]
direction LR
sh["session history
~hundreds of .jsonl"] -->|"duckdb Β· jq Β· cheap"| ext["extractor"] --> inc(["incidents"])
end
subgraph B["π TRACK B Β· freshness audit (planned) β is what they claim still true?"]
direction LR
art["the artifacts
tools Β· flags Β· APIs Β· URLs"] --> claim["claim
extractor"] -->|"web Β· docs Β· context7"| exp["explorers"] --> ver(["claim verdicts"])
end
inc --> AN
ver --> AN
AN{{"π§ ANALYST Β· Opus
clusters glitches Β· diagnoses the fix Β· writes the reason"}}
AN -->|"proposals + reason logs"| AP["π€ APPLIER
finds the repo that owns the artifact β opens a PR there"]
AP -->|"after the PR merges"| DV["πΆοΈ DΓJΓ-VU
re-mines later sessions β did the glitch stop?"]
classDef track fill:#0d1117,stroke:#30363d,color:#8b949e;
classDef node fill:#0d1117,stroke:#00ff41,color:#c9d1d9;
classDef brain fill:#0d2818,stroke:#00ff41,color:#00ff41;
class sh,ext,inc,art,claim,exp,ver,AP,DV node;
class AN brain;
class A,B track;
```
**Track A β corpus mining.** Pure SQL (DuckDB) over your `.jsonl` session logs.
No model, no token cost, runs over everything. It hunts four kinds of glitch:
| Signal | The tell |
|--------|----------|
| **tool_error** | a tool call came back as an error |
| **retry** | the identical call re-issued within a few turns β and the earlier attempt **failed** (intentional successful re-runs don't count) |
| **user_correction** | you said *"no"*, *"actuallyβ¦"*, *"revert that"* β or interrupted |
| **inefficiency** | unbounded whole-file reads where a skeleton would do |
*Repeated guidance* β the same glitch across **β₯3 distinct sessions** β is the
analyst's clustering threshold, applied on top of every signal: a pattern, not a
fluke. Big clusters are fed to the Oracle as a **session-stratified sample**
(breadth across sessions before depth) with truthful totals, so even a
2,000-incident cluster fits in one diagnosis.
**Track B β freshness audit** *(planned)*. The backward look can't catch a rule that was right
when written and rotted since. So agent-smith reads the artifacts, extracts every
external claim β a tool name, a CLI flag, a library API, a URL, a "best practice" β
and **fans out one explorer per claim** to check it against the live world
(`context7`, web search, changelogs). `changed` and `dead` claims become fixes.
The design bet: **the extractor is dumb and cheap, the analyst is smart and narrow.**
Cost scales with the number of glitches, not the size of your history.
---
## What it actually changes
Every fix is one of six moves. The interesting one is the last.
| Fix | When | What it does |
|-----|------|--------------|
| **add** | no guidance exists | write the missing rule |
| **strengthen** | the rule exists but gets ignored | raise it, sharpen it, make it imperative |
| **fix-stale** | a flag/API/file the rule names has changed | correct the reference |
| **remove** | the guidance contradicts itself or causes the glitch | cut it |
| **escalate-out-of-instructions** | a *prose* rule keeps failing no matter how loud | stop asking nicely β propose a **hook** |
| **skip** | the current harness/system prompt already enforces it | decline β redundant instructions are their own glitch source |
That last move is the whole philosophy: when a rule is reliably ignored, the
answer isn't a louder rule, it's **defining the error out of existence** β
converting a suggestion the model can rationalize past into deterministic
enforcement the harness runs. agent-smith is allowed to propose that.
Nothing lands silently. Every change ships as a **draft pull request** against
whichever repo owns the artifact β gated by a deterministic **preflight** (title
lint, exactly one commit over `origin/`, no files beyond what the editor
reported) and a subagent **verify** pass β with a **reason log** entry: the
diagnosis, the evidence, the expected effect. You merge; nothing merges itself.
Later, `dΓ©jΓ -vu` re-mines and records whether the glitch rate actually dropped.
*Cause, effect, receipts.*
---
## π¦ Install
agent-smith is a **Claude Code plugin** (this repo doubles as its own
single-plugin marketplace):
```
/plugin marketplace add noamsto/agent-smith
/plugin install agent-smith@agent-smith
```
Then run it:
```
/agent-smith:run # the whole loop, autonomously β draft PRs
/agent-smith:mine # extractor β clusters
/agent-smith:propose # Oracle per cluster β proposals (review-only)
/agent-smith:apply [] # editor β verify β draft PR
/agent-smith:status # where things stand
```
First run bootstraps everything: the `extractor`/`analyst`/`applier` binaries
(and the `duckdb` CLI, if you don't have one) download automatically for your
OS/arch into `~/.cache/agent-smith/bin`. The only assumptions are `git` and an
authenticated `gh`. Binaries already on PATH β e.g. nix-installed β are used
as-is, never downloaded over.
Declarative install (settings.json)
```json
{
"extraKnownMarketplaces": {
"agent-smith": { "source": { "source": "github", "repo": "noamsto/agent-smith" } }
},
"enabledPlugins": { "agent-smith@agent-smith": true }
}
```
Nix / Home Manager (engine on PATH)
Add the flake as an input and import the Home Manager module; it puts the
`extractor`/`analyst`/`applier` binaries on PATH with their deps (duckdb, git,
gh) already wrapped, so the plugin never has to download them:
```nix
# flake.nix
inputs.agent-smith.url = "github:noamsto/agent-smith";
# home.nix (or any Home Manager module)
imports = [ inputs.agent-smith.homeManagerModules.default ];
programs.agent-smith.enable = true;
```
Enable the plugin itself via the `settings.json` keys above.
## π§ Develop
```bash
nix develop # devshell: go, duckdb, jq, git, gh, goreleaser
go test ./... # full suite
nix build .#default # β result/bin/{extractor,analyst,applier}
goreleaser release --snapshot --clean # local release dry-run β dist/
```
---
## Status
> *"It is inevitable."*
**Phase 1 is live.** The loop β extractor β analyst (Oracle) β applier β is built,
tested, and proven end-to-end on a real corpus. Fittingly, the acceptance-bar
glitch itself (agents ignoring the skeleton-first reading rule: 147 incidents
across 87 sessions) came out the other side as the **first real pull request** β
proposing a PreToolUse hook, the `escalate-out-of-instructions` move, exactly as
designed.
π **Design:** [`docs/specs/2026-06-01-agent-smith-design.md`](docs/specs/2026-06-01-agent-smith-design.md) Β· **Working state:** [`docs/HANDOFF.md`](docs/HANDOFF.md)
**Roadmap**
- β
**Phase 1 β MVP.** Track A β analyst β draft PR + reason logs; the
`/agent-smith` plugin; pre-PR preflight. Acceptance bar met β and shipped as a
real PR.
- β **Next.** Declarative install wiring ([#3](https://github.com/noamsto/agent-smith/issues/3)) Β·
HTML status dashboard ([#2](https://github.com/noamsto/agent-smith/issues/2)) Β·
**Track B** freshness audit.
- π **Phase 2 β the loop.** `dΓ©jΓ -vu` trend validation; scheduled runs;
auto-commit for self-owned artifacts. The self-improving flywheel.
- πͺ **Phase 3 β the hook.** Inline capture so future mining gets even cheaper.
---
πΆοΈ
*There is no spoon. There is only the diff.*