https://github.com/jagmarques/company-skill
Fable 5 / Mythos style orchestration: a multi-agent company that verifies its own work and cannot stop until every criterion passes.
agent-orchestration agentic-workflow ai-agents autonomous-agents claude-code claude-fable claude-mythos claude-skills fable llm-orchestration model-agnostic multi-agent mythos stop-hook
- Host: GitHub
- URL: https://github.com/jagmarques/company-skill
- Owner: jagmarques
- License: mit
- Created: 2026-04-07T08:54:21.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-06-14T13:12:41.000Z (15 days ago)
- Last Synced: 2026-06-14T13:21:09.173Z (15 days ago)
- Topics: agent-orchestration, agentic-workflow, ai-agents, autonomous-agents, claude-code, claude-fable, claude-mythos, claude-skills, fable, llm-orchestration, model-agnostic, multi-agent, mythos, stop-hook
- Language: JavaScript
- Homepage:
- Size: 2.36 MB
- Stars: 3
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
# /company
**The agent company that can't stop until the work is verified done**
Your agent stops when it feels done. This makes it stop only when the work is actually done.
company-skill is a Claude Code skill for multi-agent orchestration. It runs your agents as a company with a verification gate, so a `/company` run cannot stop until every acceptance criterion is reproduced.
## What it is
You give it a goal. It splits the work across sub-agents, runs them in dependency waves, and keeps going across cycles until the result is proven instead of just claimed. It runs unattended. Each cycle re-runs every check, a reviewer attacks anything marked passing, and a stop guard blocks exit while any criterion still fails. Sub-agents get scoped delegation contracts rather than the whole problem. A digest writer compresses each finished cycle so the orchestrator stays under its context window, and work is split across models to keep cost down. Nothing is done until it is verified.
## Who it's for
Reach for it when "looks done" is not good enough and you want the loop to finish the job on its own:
- Build a REST API and prove the tests pass before it stops.
- Run an overnight refactor that won't quit half-done.
- Ship a feature where every acceptance criterion is reproduced, not asserted.
- Hand off a long task and come back to verified output instead of a half-finished draft.
Compared to a single agent or a plain prompt loop, the difference is the gate. A single agent stops when it decides it is finished. A company of agents keeps running, re-checking its own claims each cycle, until the evidence is there. It does not invent work outside the goal, and it tells you plainly when it is blocked.
Live dashboard: a tokens box and a separate cost box (cache savings, hit rate, and cycles are shown alongside the dollars), a horizontal cycles-and-memory row, a context gauge, an agent table, a delegation tree, and a criteria checklist. It auto-starts with every `/company` run.
```bash
npx company-skill install
```
```
/company "Build a REST API for user management with tests"
```
Optionally define your team first in `COMPANY.md` (skip it and a minimal company is created):
```markdown
## Engineering
- Backend Lead, API design and database architecture
- Frontend Dev, React components and state management
```
## How it works
Every criterion starts failing. Workers run in dependency waves under delegation contracts. At the end of each cycle, the Internal Reviewer re-runs every VERIFY-WITH command and the Devil's Advocate attacks everything marked passing. The stop guard physically blocks exit until every criterion has `passes: true` with reproduced evidence. Once done, `STATUS.md` and a `playbook.md` update are written for the next session.
```mermaid
flowchart TD
GOAL --> THINK
THINK --> EXECUTE["EXECUTE (parallel waves)"]
EXECUTE --> VERIFY
VERIFY -->|all criteria pass| DONE["Done (STATUS.md + playbook)"]
VERIFY -->|not done| COMPRESS
COMPRESS --> NEXT["THINK (next cycle)"]
NEXT --> EXECUTE
```
**Roles:** CEO orchestrator, Internal Reviewer, Devil's Advocate, Digest Writer. The orchestrator reads `COMPANY.md`, activates only the roles the goal needs, and writes delegation contracts in dependency order. Workers append FINDING + SOURCE lines to findings files. The Digest Writer compresses each finished cycle into the next cycle's briefing so the orchestrator never carries raw worker output in its own context.
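The "dependency waves" idea can be sketched as a layered topological sort: each wave holds the tasks whose dependencies have all finished, so everything in a wave can run in parallel. This is a minimal illustration, not the skill's actual scheduler; `planWaves` and the task ids are hypothetical.

```javascript
// Group tasks into dependency waves (Kahn-style layering).
// `tasks` maps a hypothetical task id to the ids it depends on.
function planWaves(tasks) {
  const remaining = new Map(
    Object.entries(tasks).map(([id, deps]) => [id, new Set(deps)])
  );
  const waves = [];
  while (remaining.size > 0) {
    // A task is ready once every dependency has been placed in an earlier wave.
    const ready = [...remaining.keys()].filter(
      (id) => remaining.get(id).size === 0
    );
    if (ready.length === 0) {
      // Nothing can run: the leftover tasks depend on each other or on unknown ids.
      throw new Error(
        "cyclic or unresolved dependency among: " + [...remaining.keys()].join(", ")
      );
    }
    for (const id of ready) remaining.delete(id);
    for (const deps of remaining.values()) {
      for (const id of ready) deps.delete(id);
    }
    waves.push(ready);
  }
  return waves;
}

const waves = planWaves({
  "api-schema": [],
  "db-migrations": [],
  "user-endpoints": ["api-schema", "db-migrations"],
  "endpoint-tests": ["user-endpoints"],
});
console.log(waves);
```

Here the two independent tasks land in the first wave, the endpoints in the second, and the tests in the third. A cycle leaves no task ready, which is why the sketch throws instead of looping forever.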
## Dashboard
The dashboard starts automatically when you run `/company` and prints its URL in the cycle banner and the Claude Code status line. Each session gets its own port (7000-7999, derived from the session id). Open it in any browser.
```
http://127.0.0.1:7421 <- your session's link, printed at startup and in the status bar
```
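One way to derive a stable per-session port is to hash the session id into the 7000-7999 range. The sketch below assumes an FNV-1a hash, which is an illustrative choice and not necessarily what `dashboard.js` does; `portForSession` is a hypothetical name. It also honours the `COMPANY_DASHBOARD_PORT` override.

```javascript
// Map a session id to a stable port in 7000-7999.
// Assumption: FNV-1a (32-bit) as the hash; the real script may derive the port differently.
function portForSession(sessionId, override = process.env.COMPANY_DASHBOARD_PORT) {
  if (override) return Number(override); // an explicit override wins
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < sessionId.length; i++) {
    hash ^= sessionId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep unsigned 32-bit
  }
  return 7000 + (hash % 1000);
}

console.log(portForSession("example-session-id", ""));
```

The same id always yields the same port, so a restarted dashboard comes back at the same URL. Two sessions can still collide on a port, so a real implementation would need a fallback for that case.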
A per-session identity header sits at the top: the project, the session id, the active model, and an "All projects" link to the cross-project roll-up. The project name is read from your `.company` directory, so two companies launched from the same folder stay labelled apart.
Every block is a card with the same tile look, so cost, cycles, agents, and criteria all read the same way. What you see, panel by panel:
**Cost and usage** - two stacked bands so dollars never interleave with token counts. The Tokens band shows the volumes (input, output, cache-write, this session) as plain counts. The Cost band shows every dollar figure together (today, this session, and "Saved by cheaper models vs all-opus"), with the list-price "not billed" note attached only to the dollars. The dollars come from ccusage at public list prices and are notional on a subscription plan, which the card says plainly. A model-policy toggle sits on the savings tile, reusing the same Apple-style pill as the auto-restart control: off is adaptive (cheaper models run sub-tasks for more savings), on forces the best model everywhere. It writes `.company/MODEL_POLICY` (`TIERED` or `FORCE_BEST`), the file the orchestrator reads at the start of each cycle, so the change applies next cycle.
**Context fill** - the live fill percentage, computed with the same formula the context-guard uses. The gauge marks the restart threshold (default 50%), so you can see the gate before it fires. Next to it sits an Apple-style auto-restart pill, locked on: the restart block is always enforced and the toggle cannot turn it off.
**Delegation tree** - SVG tree of orchestrator, department leads, and workers, showing only the agents running right now. COMPANY.md is the source pool and the activated roster decides which departments are eligible, but a role node paints only when a currently-active agent maps to it, and a department appears only when at least one of its roles is live, so a finished or stale agent leaves no node behind. The orchestrator (CEO) root always shows. Long role names wrap inside their node instead of spilling out. Click any node to expand its current task and status. Zoom only with the pill-shaped +/- buttons, or use the expand button to blow the tree up to fill the screen and the contract button to bring it back. Drag to pan. A refresh resets the tree to its default view. Zero external JS libraries.
**Cycles and memory** - a savings card that shows cycle and memory counts plus the cache-read volume and the dollars saved by prompt caching, marked approximate. The model-tiering saving lives in the Cost band above, next to the policy toggle.
**Active agents** - centered live table of every agent the orchestrator has spawned this session, with model, status, and token count.
**Criteria** - compact progress view with a click-to-expand toggle for the full pass/fail list and reproduced evidence.
**All projects** - the "All projects" link near the session header opens `/all`, a cross-project roll-up. It reads `~/.claude/company-dashboards.json`, a small index every dashboard writes itself into, and shows aggregate cost, tokens, and cache reuse grouped by project. It lists only dashboards seen in the last 5 minutes, costs come from ccusage and are notional on a subscription plan, and a session whose usage row is missing shows `?` rather than zero. Each row links back to that session's own dashboard. A dashboard whose owning session has ended drops itself from the index, so `/all` never lists a stale one.
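The roll-up rules above (a 5-minute freshness window, grouping by project, `?` for a missing usage row) could look roughly like this. The entry fields (`project`, `sessionId`, `port`, `lastSeen`) and the `costUsd` usage field are assumptions for illustration; the real index schema in `company-dashboards.json` may differ.

```javascript
const FRESH_WINDOW_MS = 5 * 60 * 1000; // only dashboards seen in the last 5 minutes

// Group fresh dashboard entries by project. All field names are hypothetical.
function rollUp(entries, usageBySession, now = Date.now()) {
  const projects = new Map();
  for (const entry of entries) {
    if (now - entry.lastSeen > FRESH_WINDOW_MS) continue; // stale: skip
    const usage = usageBySession[entry.sessionId];
    const rows = projects.get(entry.project) ?? [];
    rows.push({
      sessionId: entry.sessionId,
      url: `http://127.0.0.1:${entry.port}`, // link back to the session's dashboard
      cost: usage ? usage.costUsd.toFixed(2) : "?", // unknown is not the same as zero
    });
    projects.set(entry.project, rows);
  }
  return projects;
}
```

Showing `?` rather than `0.00` keeps a missing usage row from reading as a free session.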
The dashboard binds 127.0.0.1 only, reads local files, and sends nothing anywhere. Override the port with `COMPANY_DASHBOARD_PORT`.
**You get this automatically.** Installing company-skill (npm or the install script) copies `dashboard.js` and `statusline.js` alongside the rest of the skill. Running `/company` then starts the dashboard for you and prints its localhost URL at startup and in the status line. No extra setup.
### Status line
`scripts/statusline.js` appends a labelled link to the per-session dashboard so the URL is one glance away on every turn. The appended segment is `📊 company dashboard ` followed by the session URL.
With no status line of your own configured, it also renders the model and context fill. The context fill draws as a progress bar using Claude's native `used_percentage`, so the number matches the session indicator. The full line reads:
```
Opus 4.8 (1M context) | [████░░░░░░] 25% | 📊 company dashboard http://127.0.0.1:7421
```
If you already have a status line, the setup step stores your prior command in `.company/statusline-base.json` and runs it first. Your line leads, the script appends only the link, and it does not re-print model or context since your own line owns those:
```
| 📊 company dashboard http://127.0.0.1:7421
```
## Unattended auto-restart loop
`scripts/company-autoloop.js` runs `/company` unattended across many sessions. It restarts into a fresh context automatically when a work session approaches the context threshold, so a long goal keeps going without a human pasting the restart prompt each time.
A script is needed because a fully unattended fresh-context restart cannot be done with Claude Code native features alone. Hooks cannot run `/clear` or start a fresh turn, a Stop-hook block keeps the same context, and auto-compaction only summarizes. The supervisor is the external driver that owns the restart decision. This is the validated finding.
Each turn runs a headless `claude -p` session while the supervisor watches the session context fill. At the threshold it drives `/company restart` to emit the `NEXT.md` continuation, then launches a fresh session seeded from it.
```bash
node scripts/company-autoloop.js --max-turns 100 "/company GOAL: "
```
Key flags and env:
- `--project-dir ` the project the run targets (default: current dir)
- `--company-dir ` override the `.company` dir (default: `.company` inside the project dir)
- `--max-turns ` hard cap on work turns across all sessions (default: 100)
- `--restart-timeout-secs ` max wait for the restart markers (default: 420)
- `COMPANY_CONTEXT_THRESHOLD` fill fraction or percent that triggers a restart (default: 0.50, set a low value for testing)
It runs with `--permission-mode bypassPermissions` for the autonomy an unattended loop needs. The threshold is configurable, so you can drop it low to exercise the restart path during testing.
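Since `COMPANY_CONTEXT_THRESHOLD` accepts either a fraction or a percent, a parser has to decide which one it was given. The sketch below assumes that values above 1 are percents and that unusable input falls back to the default; both rules are assumptions, and `parseThreshold` and `shouldRestart` are hypothetical names, not the script's own.

```javascript
// Normalise a threshold given as a fraction ("0.5") or a percent ("50", "50%").
// Assumption: anything above 1 is a percent; invalid input falls back to the default.
function parseThreshold(raw, fallback = 0.5) {
  if (raw === undefined || raw === "") return fallback;
  const value = Number(String(raw).trim().replace(/%$/, ""));
  if (!Number.isFinite(value) || value <= 0) return fallback;
  const fraction = value > 1 ? value / 100 : value;
  return fraction > 1 ? fallback : fraction; // reject values above 100%
}

// True once the context fill reaches the threshold.
function shouldRestart(usedTokens, contextWindow, threshold) {
  return usedTokens / contextWindow >= threshold;
}

const threshold = parseThreshold(process.env.COMPANY_CONTEXT_THRESHOLD);
console.log(shouldRestart(120000, 200000, threshold));
```

Under this rule a bare `1` is ambiguous (1% or 100%); the sketch reads it as the fraction 1.0.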
## Cost and quality
Multi-agent orchestration buys quality with tokens. /company's answer to the token cost: spend strong-model tokens only where they buy quality, and report the bill every cycle.
**Tiered model delegation** - each delegation contract carries a `MODEL: cheap|mid|strong` tag. The orchestrator maps the tag to a model at spawn time. Effort scales with both ROI and stakes: high-stakes or high-value work gets a heavier spawn. Override every sub-agent with `CLAUDE_CODE_SUBAGENT_MODEL` at launch, or write `FORCE_BEST` into `.company/MODEL_POLICY` mid-run.
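A rough sketch of how a tier tag might resolve to a model under the two overrides. The model ids are placeholders, `readPolicy` and `resolveModel` are hypothetical names, and the precedence (environment override first, then `FORCE_BEST`, then the tier) is an assumption rather than documented behaviour.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Placeholder model ids: substitute whatever each tier maps to in your setup.
const TIER_TO_MODEL = { cheap: "model-cheap", mid: "model-mid", strong: "model-strong" };

// Read .company/MODEL_POLICY. Assumption: a missing or unknown value means TIERED.
function readPolicy(companyDir) {
  try {
    const text = fs.readFileSync(path.join(companyDir, "MODEL_POLICY"), "utf8").trim();
    return text === "FORCE_BEST" ? "FORCE_BEST" : "TIERED";
  } catch {
    return "TIERED";
  }
}

// Resolve a contract's MODEL tag to a concrete model id.
function resolveModel(tier, policy, env = process.env) {
  if (env.CLAUDE_CODE_SUBAGENT_MODEL) return env.CLAUDE_CODE_SUBAGENT_MODEL;
  if (!Object.hasOwn(TIER_TO_MODEL, tier)) throw new Error(`invalid MODEL tier: ${tier}`);
  return policy === "FORCE_BEST" ? TIER_TO_MODEL.strong : TIER_TO_MODEL[tier];
}

console.log(resolveModel("cheap", readPolicy(".company")));
```

Re-reading the policy file at the start of each cycle is what lets a mid-run edit take effect on the next cycle without restarting anything.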
**Per-cycle cost reporting** - every cycle produces a `COST:` line in the briefing and a `cycles/cycle-{N}-cost.json` artifact.
**Prompt caching** - agent prompts are laid out stable-first so repeated spawns hit a shared cache prefix.
**Fable 5 / adaptive thinking** - on models that support adaptive thinking (Fable 5 and later), the orchestrator and verify layers run with thinking enabled. No `budget_tokens` param - reflection depth is model-controlled.
## Key features
**Stop guard** - blocks session exit until every criterion has `passes: true` and reproduced evidence. Malformed state blocks rather than fails open. Deleting a hard criterion blocks instead of unlocking. [42-check test](tests/stop-guard.test.js).
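The fail-closed behaviour can be illustrated with a small decision function: anything that is not provably complete blocks the stop. This is a sketch only. The array layout and the `evidence` field are assumptions about `criteria.json`, `stopDecision` is a hypothetical name, and the `criteria.lock` check against deleted criteria is not modelled here.

```javascript
// Decide whether a session may stop. Only `passes` comes from the README;
// the array layout and the `evidence` field are assumed for illustration.
function stopDecision(criteriaJson) {
  let criteria;
  try {
    criteria = JSON.parse(criteriaJson);
  } catch {
    // Malformed state blocks instead of failing open.
    return { allow: false, reason: "criteria state is malformed" };
  }
  if (!Array.isArray(criteria) || criteria.length === 0) {
    return { allow: false, reason: "no criteria found" };
  }
  const unverified = criteria.filter(
    (c) =>
      !(c && c.passes === true && typeof c.evidence === "string" && c.evidence.trim() !== "")
  );
  if (unverified.length > 0) {
    return { allow: false, reason: `${unverified.length} criteria not verified` };
  }
  return { allow: true, reason: "all criteria pass with evidence" };
}

console.log(stopDecision('[{"id":"tests","passes":true,"evidence":"npm test: 42 passing"}]'));
```

Note the strict `=== true`: a truthy string such as `"yes"` does not count as passing in this sketch, and neither does a pass with empty evidence.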
**Context-fill guard** - a second Stop hook forces `/company restart` once context reaches the threshold (default 50%). Reads the model id from the transcript to detect the context window. The restart block is always on and the dashboard shows it as a locked Apple-style pill toggle. [63-check test](tests/context-guard.test.js).
**Delegation contracts** - a task does not exist without a filled contract. `check-contracts.js` rejects missing fields, vacuous VERIFY-WITH commands, invalid MODEL tiers, and cyclic dependencies. [29-check test](tests/check-contracts.test.js).
**Multi-level verification** - the Internal Reviewer re-runs every VERIFY-WITH command independently. The Devil's Advocate attacks everything marked passing. For criteria tagged `stakes: "high"` in `criteria.json` (irreversible action, security surface, or public-facing claim), the critic runs in three fresh contexts with distinct lenses - correctness, security, reproducibility - and unanimous ACCEPT is required. Normal criteria keep the single critic. The completeness probe enumerates every surface the GOAL names and auto-rejects any unchecked one.
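The unanimity rule for high-stakes criteria reduces to a simple aggregation, sketched below. The `verdicts` shape (lens name mapped to `"ACCEPT"` or `"REJECT"`), the `"default"` lens for normal criteria, and the function name are assumptions for illustration.

```javascript
const HIGH_STAKES_LENSES = ["correctness", "security", "reproducibility"];

// Combine critic verdicts for one criterion.
// `verdicts` maps a lens name to "ACCEPT" or "REJECT" (hypothetical shape).
function criticVerdict(criterion, verdicts) {
  const lenses = criterion.stakes === "high" ? HIGH_STAKES_LENSES : ["default"];
  // A missing verdict is not "ACCEPT", so an unrun lens rejects.
  return lenses.every((lens) => verdicts[lens] === "ACCEPT") ? "ACCEPT" : "REJECT";
}

console.log(
  criticVerdict(
    { stakes: "high" },
    { correctness: "ACCEPT", security: "REJECT", reproducibility: "ACCEPT" }
  )
);
```

A single dissenting or missing lens is enough to reject, which matches the "unanimous ACCEPT is required" rule for high-stakes criteria.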
**Design judge-panel** - for criteria tagged `kind: design`, the lead may emit up to three contracts from materially different angles plus one synthesis contract, reserved for genuine design forks.
**Git isolation** - workers never push to main and never merge. Every code change lands as a draft PR. The merge gate is yours.
**Pre-push secret scan** - workers run `scripts/secret-scan.js` before any `git push`. Exit 1 blocks the push.
**Codebase graph** - on repos with >200 tracked files, `scripts/codegraph.js` builds a commit-keyed ranked symbol map into `.company/codegraph/` for lead prompts.
**Status-line link** - `scripts/statusline.js` appends a labelled `📊 company dashboard ` segment to the Claude Code status bar, enforced idempotently on every `/company` run. The link resolves in any session through a cwd-independent global registry, with an `/all` fallback. It chains any status line you already have and never re-prints model or context when one is present. See [Status line](#status-line).
## Commands
```
/company "Build X" Run until X is done
/company Run using COMPANY.md priorities
/company restart Emit a verified continuation prompt for a fresh session
/company:status Show last status
/company:resume Continue from last session
```
## What gets created
State lives in `./.company/` (relocate with `COMPANY_DIR`):
```
.company/
GOAL.md criteria.json criteria.lock
playbook.md active-roster.md active-tasks.md
STATUS.md OWNER MODEL_POLICY
CANCEL (persistent human exit)
cycles/ per-cycle briefing, contracts, review, cost
{dept}/ per-employee findings, persist across sessions
codegraph/ commit-keyed symbol map (large repos only)
```
## Examples
[`startup.md`](examples/startup.md), [`research-lab.md`](examples/research-lab.md), [`dev-team.md`](examples/dev-team.md).
## Contributing
```bash
bash scripts/check.sh
```
CI runs the same script on every pull request. Pull requests welcome. Every change lands as a draft PR.
## License
MIT