https://github.com/boshu2/agentops
The missing DevOps layer for coding agents. Give it a goal, it ships validated code and gets smarter.
https://github.com/boshu2/agentops
ai-agents claude-code claude-code-plugins claude-marketplace codex codex-plugin cursor devops opencode-plugin vibe-coding
Last synced: 2 days ago
JSON representation
The missing DevOps layer for coding agents. Give it a goal, it ships validated code and gets smarter.
- Host: GitHub
- URL: https://github.com/boshu2/agentops
- Owner: boshu2
- License: other
- Created: 2025-11-05T19:18:56.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-02-16T21:12:38.000Z (23 days ago)
- Last Synced: 2026-02-16T21:24:02.643Z (23 days ago)
- Topics: ai-agents, claude-code, claude-code-plugins, claude-marketplace, codex, codex-plugin, cursor, devops, opencode-plugin, vibe-coding
- Language: Go
- Homepage: https://github.com/boshu2/12-factor-agentops
- Size: 30.1 MB
- Stars: 55
- Watchers: 1
- Forks: 6
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Codeowners: .github/CODEOWNERS
- Security: SECURITY.md
- Agents: AGENTS.md
Awesome Lists containing this project
README
# AgentOps
[](https://github.com/boshu2/agentops/actions/workflows/validate.yml)
[](https://github.com/boshu2/agentops/actions/workflows/nightly.yml)
### The local DevOps layer for coding agents.
AgentOps is the operating system around your coding agent: it tracks the work, validates the plan and code, and feeds what was learned into the next session.
[How It Works](#how-it-works) · [Install](#install) · [See It Work](#see-it-work) · [Skills](#skills) · [CLI](#the-ao-cli) · [FAQ](#faq) · [Newcomer Guide](docs/newcomer-guide.md)
---
## Why AgentOps Exists
Most coding-agent tools improve the session. AgentOps improves the repo around the session.
Without extra machinery, coding agents usually:
- start each task with partial memory
- rely on review culture instead of explicit gates
- execute work without durable issue state
- leave behind chat logs instead of reusable artifacts
AgentOps adds the missing layer:
- **Repo-native memory** via `.agents/` artifacts plus `ao` retrieval and injection
- **Validation before merge** with `/pre-mortem`, `/vibe`, and `/council`
- **Tracked execution** with beads issues, dependency waves, worktrees, and `/crank`
- **Compounding operation** where every cycle leaves behind artifacts the next one can use
---
## What Makes It Different
| If you want... | Most tools give you... | AgentOps gives you... |
|----------------|------------------------|-----------------------|
| Better prompting | A workflow or command pack | A repo-level operating model |
| Better review | CI or human review after the fact | `/pre-mortem` before build and `/vibe` before commit |
| Better memory | Notes or ad hoc docs | Retrieval, freshness weighting, and injection from `.agents/` |
| Better execution | Untracked agent runs | Issues, waves, worktrees, and audit trails |
---
## Install
```bash
# Claude Code (recommended): marketplace + plugin install
claude plugin marketplace add boshu2/agentops
claude plugin install agentops@agentops-marketplace
# Codex CLI (0.110.0+ native plugin; installs the compatibility skills bundle in ~/.agents/skills, enables the plugin, then open a fresh Codex session)
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-codex.sh | bash
# OpenCode
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-opencode.sh | bash
# Other Skills-compatible agents (agent-specific, install only what you need)
# Example (Cursor):
npx skills@latest add boshu2/agentops --cursor -g
```
### Install ao CLI (optional)
Skills work standalone — no CLI required. The `ao` CLI is what unlocks the full repo-native layer: knowledge extraction, retrieval and injection, maturity scoring, goals, and control-plane style workflows.
#### Homebrew (recommended)
```bash
brew tap boshu2/agentops https://github.com/boshu2/homebrew-agentops
brew install agentops
which ao
ao version
```
Or install via [release binaries](https://github.com/boshu2/agentops/releases) or [build from source](cli/README.md).
Then type `/quickstart` in your agent chat.
---
## How It Works
AgentOps is built around five commands. They are not just prompts. They create tracked state, validation artifacts, and reusable knowledge.
| Step | Command | What Happens | Calls Internally |
|------|---------|--------------|------------------|
| 1 | `/research` | Explore the codebase and load prior repo knowledge | — |
| 2 | `/plan` | Break work into tracked issues with dependency waves | `/beads` |
| 3 | `/pre-mortem` | Challenge the plan before code exists | `/council` |
| 4 | `/crank` | Execute unblocked work in waves with validation and commits | `/implement`, `/vibe` |
| 5 | `/post-mortem` | Validate what shipped and extract reusable learnings | `/vibe`, `/retro` |
`/rpi` chains all five. `/evolve` loops `/rpi` overnight with fitness-gated regression checks.
### What those commands actually add
- `/research` keeps the agent from starting cold
- `/plan` turns work into a graph instead of a chat todo list
- `/pre-mortem` catches failure modes before implementation
- `/crank` makes multi-step work executable in dependency order
- `/post-mortem` turns one finished task into better future context
This is why AgentOps feels different in practice: the output is not just code. It is code + state + memory + gates.
| Pattern | Chain | When |
|---------|-------|------|
| **Quick fix** | `/implement` | One issue, clear scope |
| **Validated fix** | `/implement` → `/vibe` | One issue, want confidence |
| **Planned epic** | `/plan` → `/pre-mortem` → `/crank` → `/post-mortem` | Multi-issue, structured |
| **Full pipeline** | `/rpi` (chains all above) | End-to-end, autonomous |
| **Evolve loop** | `/evolve` (chains `/rpi` repeatedly) | Fitness-scored improvement |
| **PR contribution** | `/pr-research` → `/pr-plan` → `/pr-implement` → `/pr-validate` → `/pr-prep` | External repo |
| **Knowledge query** | `/knowledge` → `/research` (if gaps) | Understanding before building |
| **Standalone review** | `/council validate ` | Ad-hoc multi-judge review |
### The three systems underneath it
- **Execution system**: `/plan`, beads, worktrees, `/crank`, `/evolve`
- **Validation system**: `/pre-mortem`, `/vibe`, `/council`
- **Memory system**: `.agents/`, `ao lookup`, `ao forge`, `/retro`, `/knowledge`
That is the real architecture. Not just a skill pack. A local operating layer around the agent.
### Why the memory matters
This is the compounding part, but now in mechanical terms. Your agent validates a PR, the decisions and patterns are written to `.agents/`, and the next relevant task starts with that context already loaded:
```text
> /research "retry backoff strategies"
[lookup] 3 prior learnings found (freshness-weighted):
- Token bucket with Redis (established, high confidence)
- Rate limit at middleware layer, not per-handler (pattern)
- /login endpoint was missing rate limiting (decision)
[research] Found prior art in your codebase + retrieved context
Recommends: exponential backoff with jitter, reuse existing Redis client
```
Session 5 did not start from scratch. It started with scored, retrieved, repo-specific context. Stale insights [decay automatically](docs/the-science.md). Useful ones keep getting cited, reinforced, and reused.
### Why engineers buy in
- **Local-only** — no telemetry, no cloud, no accounts. Nothing phones home.
- **Auditable** — plans, verdicts, learnings, and patterns are plain files on disk.
- **Multi-runtime** — Claude Code, Codex CLI, Cursor, OpenCode.
- **Harder to drift** — tracked issues and validation gates mean the repo is less dependent on agent mood or memory.
Everything is [open source](cli/) — audit it yourself.
---
The ao CLI — powers the knowledge flywheel
See [The ao CLI](#the-ao-cli) for full reference.
OpenCode — plugin + skills
Installs 7 hooks (tool enrichment, audit logging, compaction resilience) and symlinks all skills. Restart OpenCode after install. Details: [.opencode/INSTALL.md](.opencode/INSTALL.md)
Configuration — environment variables
All optional. AgentOps works out of the box with no configuration.
**Council / validation:**
| Variable | Default | What it does |
|----------|---------|-------------|
| `COUNCIL_TIMEOUT` | 120 | Judge timeout in seconds |
| `COUNCIL_CLAUDE_MODEL` | sonnet | Claude model for judges (`opus` for high-stakes) |
| `COUNCIL_CODEX_MODEL` | (user's Codex default) | Override Codex model for `--mixed` |
| `COUNCIL_EXPLORER_MODEL` | sonnet | Model for explorer sub-agents |
| `COUNCIL_EXPLORER_TIMEOUT` | 60 | Explorer timeout in seconds |
| `COUNCIL_R2_TIMEOUT` | 90 | Debate round 2 timeout in seconds |
**Hooks:**
| Variable | Default | What it does |
|----------|---------|-------------|
| `AGENTOPS_HOOKS_DISABLED` | 0 | `1` to disable all hooks (kill switch) |
| `AGENTOPS_SESSION_START_DISABLED` | 0 | `1` to disable session-start hook |
| `AGENTOPS_STARTUP_CONTEXT_MODE` | `lean` | Startup mode: `lean` (default, auto-shrinks), `manual`, `legacy` |
| `AGENTOPS_AUTO_PRUNE` | 1 | `0` to disable automatic empty-learning pruning |
| `AGENTOPS_EVICTION_DISABLED` | 0 | `1` to disable knowledge eviction |
| `AGENTOPS_GITIGNORE_AUTO` | 1 | `0` to skip auto-adding `.agents/` to `.gitignore` |
Full reference with examples and precedence rules: [docs/ENV-VARS.md](docs/ENV-VARS.md)
**What AgentOps touches:**
| What | Where | Reversible? |
|------|-------|:-----------:|
| Skills | Global skill homes (`~/.agents/skills` for Codex/OpenAI-documented installs, plus compatibility caches outside your repo; for Claude Code: `~/.claude/skills/`) | `rm -rf ~/.claude/skills/ ~/.agents/skills/ ~/.codex/skills/ ~/.codex/plugins/cache/agentops-marketplace/agentops/` |
| Knowledge artifacts | `.agents/` in your repo (git-ignored by default) | `rm -rf .agents/` |
| Hook registration | `.claude/settings.json` | Delete entries from `.claude/settings.json` |
Nothing modifies your source code.
Troubleshooting: [docs/troubleshooting.md](docs/troubleshooting.md)
---
## See It Work
Start simple. Each level builds on the last.
**1. Dip your toe** — one skill, one command:
```text
> /council validate this PR
[council] 3 judges spawned (independent, no anchoring)
[judge-1] PASS — token bucket implementation correct
[judge-2] WARN — rate limiting missing on /login endpoint
[judge-3] PASS — Redis integration follows middleware pattern
Consensus: WARN — add rate limiting to /login before shipping
```
**2. Full pipeline** — research through post-mortem, one command:
```text
> /rpi "add retry backoff to rate limiter"
[research] Found 3 prior learnings on rate limiting (injected)
[plan] 2 issues, 1 wave → epic ag-0058
[pre-mortem] Council validates plan → PASS (knew about Redis choice)
[crank] Parallel agents: Wave 1 ██ 2/2
[vibe] Council validates code → PASS
[post-mortem] 2 new learnings → .agents/
[flywheel] Next: /rpi "add circuit breaker to external API calls"
```
**3. The endgame** — `/evolve`: define goals, walk away, come back to a better codebase.
Every skill composes into one loop: measure the worst goal gap, run `/rpi` to fix it, validate nothing regressed, extract what was learned, repeat. Goals give it intent. Regression gates give it safety. The knowledge flywheel means cycle 50 knows what cycle 1 learned.
```text
> /evolve
[evolve] GOALS.md: 18 gates loaded, score 77.0% (14/18 passing)
[cycle-1] Worst: wiring-closure (weight 6) + 3 more
/rpi "Fix failing goals" → score 93.3% (25/28) ✓
── the agent naturally organizes into phases ──
[cycle-2-35] Coverage blitz: 17 packages from ~85% → ~97% avg
Table-driven tests, edge cases, error paths
[cycle-38-59] Benchmarks added to all 15 internal packages
[cycle-60-95] Complexity annihilation: zero functions >= 8
(was dozens >= 20 — extracted helpers, tested independently)
[cycle-96-116] Modernization: sentinel errors, exhaustive switches,
Go 1.26-compatible idioms (slices, cmp.Or, range-over-int)
[teardown] 203 files changed, 20K+ lines, 116 cycles
All tests pass. Go vet clean. Avg coverage 97%.
/post-mortem → 33 learnings extracted
Ready for next /evolve — the floor is now the ceiling.
```
That ran overnight — ~7 hours, unattended. Regression gates auto-reverted anything that broke a passing goal. Severity-based selection naturally produced the right order: build a safety net (tests), use it to refactor aggressively (complexity), then polish. Nobody told it that.
More examples — swarm, session continuity, different workflows
**Parallelize anything** with `/swarm`:
```text
> /swarm "research auth patterns, brainstorm rate limiting improvements"
[swarm] 3 agents spawned — each gets fresh context
[agent-1] /research auth — found JWT + session patterns, 2 prior learnings
[agent-2] /research rate-limiting — found token bucket, middleware pattern
[agent-3] /brainstorm improvements — 4 approaches ranked
[swarm] Complete — artifacts in .agents/
```
**Session continuity across compaction or restart:**
```text
> /handoff
[handoff] Saved: 3 open issues, current branch, next action
Continuation prompt written to .agents/handoffs/
--- next session ---
> /recover
[recover] Found in-progress epic ag-0058 (2/5 issues closed)
Branch: feature/rate-limiter
Next: /implement ag-0058.3
```
**Different developers, different setups:**
| Workflow | Commands | What happens |
|----------|----------|-------------|
| **PR reviewer** | `/council validate this PR` | One command, actionable feedback, no setup |
| **Team lead** | `/research` → `/plan` → `/council validate` | Compose skills manually, stay in control |
| **Solo dev** | `/rpi "add user auth"` | Research through post-mortem, walk away |
| **Platform team** | `/swarm` + `/evolve` | Parallel pipelines + fitness-scored improvement loop |
Not sure which skill to run? See the [Skill Router](docs/SKILL-ROUTER.md).
---
## Skills
Every skill works alone. Compose them however you want.
**Judgment** — the foundation everything validates against:
| Skill | What it does |
|-------|-------------|
| `/council` | Independent judges (Claude + Codex) debate, surface disagreement, converge. `--preset=security-audit`, `--perspectives`, `--debate` for adversarial review |
| `/vibe` | Code quality review — complexity analysis + council. Gates on 0 CRITICAL findings. |
| `/pre-mortem` | Validate plans before implementation — council simulates failures |
| `/post-mortem` | Wrap up completed work — council validates + retro extracts learnings |
What /vibe checks
Semantic (does code match spec?) · Security (injection, auth bypass, secrets) · Quality (smells, dead code, magic numbers) · Architecture (layer violations, coupling, god classes) · Complexity (CC > 10, deep nesting) · Performance (N+1, resource leaks) · Slop (AI hallucinations, cargo cult code) · Accessibility (ARIA, keyboard nav)
1+ CRITICAL findings blocks until fixed.
**Execution** — research, plan, build, ship:
| Skill | What it does |
|-------|-------------|
| `/research` | Deep codebase exploration — produces structured findings |
| `/plan` | Decompose a goal into trackable issues with dependency waves |
| `/implement` | Full lifecycle for one task — research, plan, build, validate, learn |
| `/crank` | Parallel agents in dependency-ordered waves, fresh context per worker |
| `/swarm` | Parallelize any skill — run research, brainstorms, implementations in parallel |
| `/rpi` | Full pipeline: discovery (research + plan + pre-mortem) → implementation (crank) → validation (vibe + post-mortem) |
| `/evolve` | The endgame: measure goals, fix the worst gap, regression-gate everything, learn, repeat overnight |
**Knowledge** — the flywheel that makes sessions compound:
| Skill | What it does |
|-------|-------------|
| `/knowledge` | Query learnings, patterns, and decisions across `.agents/` |
| `/retro` | Manually capture a decision, pattern, or lesson |
| `/retro` | Extract learnings from completed work |
| `/flywheel` | Monitor knowledge health — velocity, staleness, pool depths |
**Supporting skills:**
| | |
|---|---|
| **Onboarding** | `/quickstart`, `/using-agentops` |
| **Session** | `/handoff`, `/recover`, `/status` |
| **Traceability** | `/trace`, `/provenance` |
| **Product** | `/product`, `/goals`, `/release`, `/readme`, `/doc` |
| **Utility** | `/brainstorm`, `/bug-hunt`, `/complexity` |
Full reference: [docs/SKILLS.md](docs/SKILLS.md)
Cross-runtime orchestration — mix Claude, Codex, OpenCode
AgentOps orchestrates across runtimes. Claude can lead a team of Codex workers. Codex judges can review Claude's output.
| Spawning Backend | How it works | Best for |
|-----------------|-------------|----------|
| **Native teams** | `TeamCreate` + `SendMessage` — built into Claude Code | Tight coordination, debate |
| **Background tasks** | `Task(run_in_background=true)` — last-resort fallback | When no team APIs available |
| **Codex sub-agents** | `/codex-team` — Claude orchestrates Codex workers | Cross-vendor validation |
Custom agents — why AgentOps ships its own
AgentOps includes two small, purpose-built agents that fill gaps between Claude Code's built-in agent types:
| Agent | Model | Can do | Can't do |
|-------|-------|--------|----------|
| `agentops:researcher` | haiku | Read, search, **run commands** (`gocyclo`, `go test`, etc.) | Write or edit files |
| `agentops:code-reviewer` | sonnet | Read, search, run `git diff`, produce structured findings | Write or edit files |
**The gap they fill:** Claude Code's built-in `Explore` agent can search code but can't run commands. Its `general-purpose` agent can do everything but uses the primary model (expensive) and has full write access. The custom agents sit in between — read-only discipline with command execution, at lower cost.
| Need | Best agent |
|------|-----------|
| Find a file or function | `Explore` (fastest) |
| Explore + run analysis tools | `agentops:researcher` (haiku, read-only + Bash) |
| Make changes to files | `general-purpose` (full access) |
| Review code after changes | `agentops:code-reviewer` (sonnet, structured review) |
Skills spawn these agents automatically — you don't pick them manually. `/research` uses the researcher, `/vibe` uses the code-reviewer, `/crank` uses general-purpose for workers.
---
## Deep Dive
### The Knowledge Ledger
`.agents/` is an append-only ledger with cache-like semantics. Nothing gets overwritten — every learning, council verdict, pattern, and decision is a new dated file. Freshness decay prunes what's stale. The cycle:
```
Session N ends
→ ao forge: mine transcript for learnings, decisions, patterns
→ ao notebook update: merge insights into MEMORY.md
→ ao memory sync: sync to repo-root MEMORY.md (cross-runtime)
→ ao maturity --expire: mark stale artifacts (freshness decay ~17%/week)
→ ao maturity --evict: archive what's decayed past threshold
→ ao feedback-loop: citation-to-utility feedback (MemRL)
Session N+1 starts
→ ao lookup (on demand): score artifacts by recency + utility
├── Local .agents/ learnings & patterns (1.0x weight)
├── Global ~/.agents/ cross-repo knowledge (0.8x weight)
├── Work-scoped boost: active issue gets 1.5x (--bead)
├── Predecessor handoff: what the last session was doing (--predecessor)
└── Trim to ~1000 tokens — lightweight, not encyclopedic
→ Agent starts where the last one left off
```
**Injection philosophy: sessions compound instead of reset.** Each session starts with a small, curated packet — not a data dump. Three tiers, descending priority: local `.agents/` → global `~/.agents/` → legacy `~/.claude/patterns/`. If the task needs deeper context, the agent searches `.agents/` on demand.
Write once, score by freshness, inject the best, prune the rest. The [formal model](docs/the-science.md) is cache eviction with freshness decay and limits-to-growth controls.
```
/rpi "goal"
│
├── /research → /plan → /pre-mortem → /crank → /vibe
│
▼
/post-mortem
├── validates what shipped
├── extracts learnings → .agents/
└── suggests next /rpi command ────┐
│
/rpi "next goal" ◄──────────────────┘
```
The post-mortem analyzes each learning, asks "what process would this improve?", and writes improvement proposals. It hands you a ready-to-copy `/rpi` command. Paste it, walk away.
Phase details — what each step does
1. **`/research`** — Explores your codebase. Produces a research artifact with findings and recommendations.
2. **`/plan`** — Decomposes the goal into issues with dependency waves. Derives scope boundaries and conformance checks. Creates a [beads](https://github.com/steveyegge/beads) epic (git-native issue tracking).
3. **`/pre-mortem`** — Judges simulate failures before you write code, including a spec-completeness judge. FAIL? Re-plan with feedback (max 3 retries).
4. **`/crank`** — Spawns parallel agents in dependency-ordered waves. Each worker gets fresh context. Lead validates and commits. Runs until every issue is closed. `--test-first` for spec-first TDD.
5. **`/vibe`** — Judges validate the code. FAIL? Re-crank with failure context and re-vibe (max 3).
6. **`/post-mortem`** — Council validates the implementation. Retro extracts learnings. **Suggests the next `/rpi` command.**
`/rpi "goal"` runs all six end to end. Use `--interactive` for human gates at research and plan.
Phased RPI — fresh context per phase for larger goals
`ao rpi phased "goal"` runs each phase in its own session — no context bleed between phases. Use `/rpi` when context fits in one session. Use `ao rpi phased` when you need phase-level resume control. For autonomous control-plane operation, use the canonical path `ao rpi loop --supervisor`. See [The ao CLI](#the-ao-cli) for examples.
Parallel RPI — run N epics concurrently in isolated worktrees
`ao rpi parallel` runs multiple epics at the same time, each in its own git worktree. Every epic gets a full 3-phase RPI lifecycle (discovery → implementation → validation) with zero cross-contamination, then merges back sequentially.
```
ao rpi parallel --manifest epics.json # Named epics with merge order
ao rpi parallel "add auth" "add logging" # Inline goals (auto-named)
ao rpi parallel --no-merge --manifest m.json # Leave worktrees for manual review
```
```
ao rpi parallel
│
┌───────────────┼───────────────┐
▼ ▼ ▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│ worktree │ │ worktree │ │ worktree │
│ epic/A │ │ epic/B │ │ epic/C │
├───────────┤ ├───────────┤ ├───────────┤
│ 1 discover│ │ 1 discover│ │ 1 discover│
│ 2 build │ │ 2 build │ │ 2 build │
│ 3 validate│ │ 3 validate│ │ 3 validate│
└─────┬─────┘ └─────┬─────┘ └─────┬─────┘
└───────────────┼───────────────┘
▼
merge A → B → C (in order)
│
gate script (CI)
```
Each phase spawns a fresh Claude session — no context bleed. Worktree isolation means parallel epics can touch the same files without conflicts. The merge order is configurable (manifest `merge_order` or `--merge-order` flag) so dependency-heavy epics land first.
Setting up /evolve — GOALS.md and the fitness loop
Bootstrap with `ao goals init` — it interviews you about your repo and generates mechanically verifiable goals. Or write them by hand:
```markdown
# GOALS.md
## test-pass-rate
- **check:** `make test`
- **weight:** 10
All tests pass.
## code-complexity
- **check:** `gocyclo -over 15 ./...`
- **weight:** 6
No function exceeds cyclomatic complexity 15.
```
Migrating from GOALS.yaml? Run `ao goals migrate --to-md`. Manage goals with `ao goals steer add/remove/prioritize` and prune stale ones with `ao goals prune`.
`/evolve` measures them, picks the worst gap by weight, runs `/rpi` to fix it, re-measures ALL goals (regressed commits auto-revert), and loops. It commits locally — you control when to push. Kill switch: `echo "stop" > ~/.config/evolve/KILL`
**Built for overnight runs.** Cycle state lives on disk, not in LLM memory — survives context compaction. Every cycle writes to `cycle-history.jsonl` with verified writes, a regression gate that refuses to proceed without a valid fitness snapshot, and a watchdog heartbeat for external monitoring. If anything breaks the tracking invariant, the loop stops rather than continuing ungated. See `skills/SKILL-TIERS.md` for the two-tier execution model that keeps the orchestrator visible while workers fork.
Maintain over time: `/goals` shows pass/fail status, `/goals prune` finds stale or broken checks.
References — science, systems theory, prior art
Built on [Darr 1995](docs/the-science.md) (decay rates), Sweller 1988 (cognitive load), [Liu et al. 2023](docs/the-science.md) (lost-in-the-middle), [MemRL 2025](https://arxiv.org/abs/2601.03192) (RL for memory).
AgentOps concentrates on the high-leverage end of [Meadows' hierarchy](https://en.wikipedia.org/wiki/Twelve_leverage_points): information flows (#6), rules (#5), self-organization (#4), goals (#3). The bet: changing the loop beats tuning the output.
Deep dive: [docs/how-it-works.md](docs/how-it-works.md) — Brownian Ratchet, Ralph Wiggum Pattern, agent backends, hooks, context windowing.
---
## The `ao` CLI
Skills work standalone — no CLI required. The `ao` CLI adds the knowledge flywheel (extract, inject, decay, maturity) and terminal-based RPI that runs without an active chat session. Each phase gets its own fresh context window, so large goals don't hit context limits.
```bash
ao seed # Plant AgentOps in any repo (auto-detects project type)
ao rpi loop --supervisor --max-cycles 1 # Canonical autonomous cycle (policy-gated landing)
ao rpi loop --supervisor "fix auth bug" # Single explicit-goal supervised cycle
ao rpi phased --from=implementation "ag-058" # Resume a specific phased run at build phase
ao rpi parallel --manifest epics.json # Run N epics concurrently in isolated worktrees
ao rpi status --watch # Monitor active/terminal runs
```
Walk away, come back to committed code + extracted learnings.
Recovery and hygiene
Supervisor determinism contract: task failures mark queue entries failed, infrastructure failures leave queue entries retryable, and `ao rpi cancel` ignores stale supervisor lease metadata. Pair with `ao rpi cleanup --all --prune-worktrees --prune-branches` for full hygiene.
```bash
ao search "query" # Search knowledge across files and chat history
ao lookup --query "topic" # Retrieve specific knowledge artifacts by ID or relevance
ao notebook update # Merge latest session insights into MEMORY.md
ao memory sync # Sync session history to MEMORY.md (cross-runtime: Codex, OpenCode)
ao context assemble # Build 5-section context briefing for a task
ao feedback-loop # Close the MemRL feedback loop (citation → utility → maturity)
ao metrics health # Flywheel health: sigma, rho, delta, escape velocity
ao dedup # Detect near-duplicate learnings (--merge for auto-resolution)
ao contradict # Detect potentially contradictory learnings
ao demo # Interactive demo
```
`ao search` (built on [CASS](https://github.com/Dicklesworthstone/coding_agent_session_search)) indexes every chat session from every runtime — Claude Code, Codex, Cursor, OpenCode, anything that writes to `.agents/ao/sessions/`. Extraction is best-effort; indexing is unconditional.
Second Brain + Obsidian vault — semantic search over all your sessions
`.agents/` is a plain-text directory. Open it directly as an Obsidian vault — every learning, council verdict, research artifact, plan, and session transcript is instantly browsable and linkable.
For semantic search, pair it with [Smart Connections](https://github.com/brianpetro/obsidian-smart-connections):
- **Local embeddings** — runs on CPU, no API key. GPU-accelerated with llama.cpp for large vaults.
- **Smart Connections MCP** — expose your vault through the MCP server so agents do semantic retrieval directly, without leaving the chat.
Agentic search (`ao search` or the MCP) handles most retrieval. Open Obsidian when you want to explore, build a knowledge map, or trace a decision back through its history. Either way, nothing is lost — every session is indexed.
Full reference: [CLI Commands](cli/docs/COMMANDS.md)
---
## Architecture
Five pillars, one recursive shape. The same pattern — lead decomposes work, workers execute atomically, validation gates lock progress, next wave begins — repeats at every scale:
```
/implement ── one worker, one issue, one verify cycle
└── /crank ── waves of /implement (FIRE loop)
└── /rpi ── research → plan → crank → validate → learn
└── /evolve ── fitness-gated /rpi cycles
```
Each level treats the one below as a black box: spec in, validated result out. Workers get fresh context per wave ([Ralph Wiggum Pattern](https://ghuntley.com/ralph/)), never commit (lead-only), and communicate through the filesystem — not accumulated chat context. Parallel execution works because each unit of work is **atomic**: no shared mutable state with concurrent workers.
**Two-tier execution model.** Skills follow a strict rule: *the orchestrator never forks; the workers it spawns always fork.* Orchestrators (`/evolve`, `/rpi`, `/crank`, `/vibe`, `/post-mortem`) stay in the main session so you can see progress and intervene. Worker spawners (`/council`, `/codex-team`) fork into subagents where results merge back via the filesystem. This was learned the hard way — orchestrators that forked became invisible, losing cycle-by-cycle visibility during long runs. See [`SKILL-TIERS.md`](skills/SKILL-TIERS.md) for the full classification.
Validation is mechanical, not advisory. [Multi-model councils](docs/ARCHITECTURE.md#pillar-2-brownian-ratchet) judge before and after implementation. The [knowledge flywheel](docs/ARCHITECTURE.md#pillar-4-knowledge-flywheel) extracts learnings, scores them, and re-injects them at session start so each cycle compounds.
Full treatment: [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) — all five pillars, operational invariants, component overview.
---
## How AgentOps Fits With Other Tools
These are fellow experiments in making coding agents work. Use pieces from any of them.
| Alternative | What it does well | Where AgentOps focuses differently |
|-------------|-------------------|-------------------------------------|
| **[GSD](https://github.com/glittercowboy/get-shit-done)** | Clean subagent spawning, fights context rot | Cross-session memory (GSD keeps context fresh *within* a session; AgentOps carries knowledge *between* sessions) |
| **[Compound Engineer](https://github.com/EveryInc/compound-engineering-plugin)** | Knowledge compounding, structured loop | Multi-model councils and validation gates — independent judges debating before and after code ships |
[Detailed comparisons →](docs/comparisons/)
---
## FAQ
[docs/FAQ.md](docs/FAQ.md) — comparisons, limitations, subagent nesting, PRODUCT.md, uninstall.
---
Built on — Ralph Wiggum, Multiclaude, beads, CASS, MemRL
[Ralph Wiggum](https://ghuntley.com/ralph/) (fresh context per agent) · [Multiclaude](https://github.com/dlorenc/multiclaude) (validation gates) · [beads](https://github.com/steveyegge/beads) (git-native issues) · [CASS](https://github.com/Dicklesworthstone/coding_agent_session_search) (session search) · [MemRL](https://arxiv.org/abs/2601.03192) (cross-session memory)
## Contributing
Issue tracking — Beads / bd
Git-native issues in `.beads/`. `bd onboard` (setup) · `bd ready` (find work) · `bd show ` · `bd close ` · `bd vc status` (optional Dolt state check; JSONL auto-sync is automatic). More: [AGENTS.md](AGENTS.md)
See [CONTRIBUTING.md](docs/CONTRIBUTING.md). If AgentOps helped you ship something, post in [Discussions](https://github.com/boshu2/agentops?tab=discussions).
## License
Apache-2.0 · [Docs](docs/INDEX.md) · [How It Works](docs/how-it-works.md) · [FAQ](docs/FAQ.md) · [Glossary](docs/GLOSSARY.md) · [Architecture](docs/ARCHITECTURE.md) · [Configuration](docs/ENV-VARS.md) · [CLI Reference](cli/docs/COMMANDS.md) · [Changelog](docs/CHANGELOG.md)