https://github.com/pathcosmos/codex-on-claude
Bridge that lets Claude Code call OpenAI Codex CLI as an MCP sub-agent. Installable Skills/Agent/Plugin + continuous improvement loop.
https://github.com/pathcosmos/codex-on-claude
Last synced: 21 days ago
JSON representation
Bridge that lets Claude Code call OpenAI Codex CLI as an MCP sub-agent. Installable Skills/Agent/Plugin + continuous improvement loop.
- Host: GitHub
- URL: https://github.com/pathcosmos/codex-on-claude
- Owner: pathcosmos
- License: mit
- Created: 2026-05-19T17:56:47.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-05-19T18:53:11.000Z (about 1 month ago)
- Last Synced: 2026-05-19T21:57:31.381Z (about 1 month ago)
- Language: JavaScript
- Size: 78.1 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
README
# codex-on-claude
> Call OpenAI **Codex CLI** as a sub-agent from inside Claude Code.
> MCP transport + curated Skills + an optional isolated review Agent + a continuous-improvement loop with optional PostToolUse hook auto-logging + a persistent thread catalog โ installed by one CLI.
[](https://www.npmjs.com/package/codex-on-claude)
[](LICENSE)
> ๐ฐ๐ท Korean mirror: [README.ko.md](README.ko.md)
---
## TL;DR
```sh
# Prerequisites already installed: Codex CLI + Claude Code, both logged in.
codex --version && claude --version
# Canonical: install or update + reconfigure in one shot
npx --yes codex-on-claude@latest
```
The installer:
1. Runs preflight โ Node 18.17+, `codex`, `claude`, `codex doctor`, `claude auth status`, and the `codex` MCP server's `Connected` status.
2. Walks you through **eight questions** with arrow-key navigation (โ/โ, Space to toggle, Enter to confirm). v0.5.0 adds question ยง7 (`usageMode`) โ Codex invocation policy (`none / synergy / auto / max`). v0.5.1 adds question ยง8 (`cerberus`) โ opt-in 3-head planning consensus mode (default `off`). Existing installs migrate silently (`usageMode=synergy`, `cerberus=off`).
3. Shows a final review screen with `Apply / Edit again / Cancel`.
4. Copies the selected Skills (and optional Agent / hooks / PreToolUse gate when `usageMode=none`) under `~/.claude/`.
Run the same command any time to update + reconfigure โ if prior state is found, the installer prints a `vPREV โ vCURR` banner and walks you through previous answers as defaults. Or use `codex-on-claude reconfigure` explicitly.
---
## Why this exists
The bare wiring (Claude Code โ Codex CLI via MCP) is a one-line `claude mcp add`, but day-to-day use needs more:
- A **Skill** so you don't re-type the same options every call.
- An optional **Agent** so a 30KB Codex response doesn't eat the main context.
- A **continuous-improvement loop** that watches your usage and proposes smarter defaults.
- A **persistent thread catalog** so Codex sessions you started yesterday don't vanish into a "Session not found" error.
`codex-on-claude` installs the whole layered stack, and lets you turn each layer on/off.
---
## Prerequisites
| Item | Minimum | Check |
|---|---|---|
| Node.js | 18.17 | `node --version` |
| Claude Code (`claude`) | 2.1.x | `claude --version` |
| Codex CLI (`codex`) | 0.13.x | `codex --version`, `codex doctor` |
| `codex` MCP server | `Connected` | `claude mcp get codex` |
| zsh / bash | โ | the installer uses one of these |
Both CLIs must be logged in.
```sh
claude auth status --text
codex doctor --summary
claude mcp get codex # Status: โ Connected
codex-on-claude doctor # runs all of the above in one shot
```
---
## Install
### A. via npx (recommended)
```sh
npx --yes codex-on-claude@latest # always pulls latest + auto-reconfigures if state exists
```
The `--yes` flag skips the "install package" confirmation; `@latest` cache-busts npx so you never hit a stale `_npx//` cache. The bare form `npx codex-on-claude` also works but is more prone to corrupted-cache `ENOENT` errors (see [Troubleshooting](#troubleshooting)).
### B. global install
```sh
npm install -g codex-on-claude
codex-on-claude
```
### C. from source
```sh
git clone https://github.com/pathcosmos/codex-on-claude.git
cd codex-on-claude
node install/install.mjs
```
### D. non-interactive (CI / scripted)
```sh
codex-on-claude \
--patterns=review,followup,fix,routine \
--context-policy=mixed \
--improvement-loop=manual \
--threads=basic \
--usage-mode=synergy \
--auto-tier2-llm-probe=on \
--subscription-claude=max --subscription-codex=pro \
--codex-model-primary=gpt-5.5 --codex-reasoning-primary=xhigh \
--reviewer-model-primary=opus --reviewer-reasoning-primary=xhigh \
--yes
```
The `--subscription-*` flags declare what you actually have access to. The `--*-model-primary` / `--*-reasoning-primary` flags pick the model + effort level you want to run by default. **Fallback is auto-locked** to each tier's base โ if you exhaust quota on the primary, Skill prose retries once on the fallback (gpt-5/medium for Codex, sonnet/medium for the Claude reviewer agent at default tiers). Pass `--codex-model-fallback=...` etc. if you really need to override the lock; the installer warns and re-snaps to base.
`--share-scope=...` is accepted but **deprecated since v0.3 and ignored** (a one-line notice prints). The plugin-bundle workflow was removed in 0.3.0; if you need team distribution, share this GitHub repo and have everyone run the installer.
---
## Updating codex-on-claude
- **npx users** โ one command does both. `npx --yes codex-on-claude@latest` fetches the newest release **and** auto-runs reconfigure when prior state is found (you'll see a `vPREV โ vCURR` banner). No need to chain a separate `reconfigure` call.
- **Global install** โ bump via npm, then re-run:
```sh
npm install -g codex-on-claude@latest # `npm update -g` also works
hash -r # zsh/bash: refresh the command hash so the new binary is picked up
codex-on-claude # auto-reconfigures because prior state is detected
```
If `codex-on-claude` still says "command not found" after `npm update -g`, that's a stale shell hash (see [Troubleshooting](#troubleshooting)).
- **Re-run reconfigure after updating.** Whether via npx or global, your previous answers come pre-selected, you can change any of the four options via arrow keys, and the final review screen shows exactly what's about to be saved.
- **Roll back a release** if needed:
```sh
npm install -g codex-on-claude@0.3.1
codex-on-claude reconfigure
```
State files are forward/backward-compatible across patch releases; alias migrations (e.g. `on-demand โ manual`) are one-way, and an older binary will just ignore unknown keys.
- **What gets touched by `npm update` itself?** Only the CLI binary. Files under `~/.claude/skills/codex-*`, `~/.claude/agents/codex-reviewer.md`, `~/.claude/settings.json` (hook entries), and `~/.claude/codex-on-claude/` are modified **only** when you run `reconfigure`, `uninstall`, or another explicit subcommand.
- Deprecated versions on npm carry a one-line `Use codex-on-claude@^X.Y.Z` pointer that `npm install` will surface.
---
## The install questions
Each answer is also a CLI flag for non-interactive use. Reconfigure later with `codex-on-claude reconfigure` โ your previous answers come back pre-selected.
> v0.4.1 raises the count from 4 to 6 (subscription + primary model / reasoning per side). All new questions live below the original four; the originals still work the same way.
>
> **v0.5.0** adds question ยง7 (`usageMode`) โ Codex invocation policy. Existing installs migrate silently to `synergy` (no prompt); explicit `reconfigure` surfaces the new question.
### 1. patterns (multi-select)
| Choice | Skills installed |
|---|---|
| One-shot read-only review | `codex-review` |
| Long multi-turn session | `codex-followup`, `codex-resume` |
| Workspace-write delegation | `codex-fix` (with strict file allowlist guardrails) |
| Templated repeating job | `codex-routine` |
Flag: `--patterns=review,followup,fix,routine`
### 2. contextPolicy (single)
| Choice | Effect |
|---|---|
| `direct` | Skills only โ Codex responses land in the main context |
| `summarize` | Adds `codex-reviewer` subagent; Skills route via the agent so only the summary reaches main |
| `mixed` (recommended) | Both โ Skills decide per call |
Flag: `--context-policy=direct|summarize|mixed`
### 3. improvementLoop (single)
| Choice | Behavior |
|---|---|
| `off` | No `codex-analyze` / `codex-improve` / `codex-log` Skills, no hooks |
| `manual` (default) | Skill prose triggers `codex-on-claude log`; no OS hook (was: `on-demand`, kept as alias) |
| `auto-on-skill` | Adds two PostToolUse hooks in `~/.claude/settings.json` so every `mcp__codex__codex` / `codex-reply` call is auto-logged โ no Skill-prose dependence |
| `periodic` | `auto-on-skill` plus cron/loop-friendly periodic analysis guidance |
Flag: `--improvement-loop=off|manual|auto-on-skill|periodic` (alias: `on-demand โ manual`)
Logging is **metadata only** โ prompt / response bodies are never written to disk. The hooks we install carry a `_coc.marker = "codex-on-claude:auto-log"` sentinel so `uninstall` / `reconfigure --improvement-loop=manual` remove only the entries we added (hook-level filter as of 0.3.4 โ user hooks that happen to coexist in the same group are preserved).
**Runtime config drift (v0.3.4+)**: if you manually edit `~/.claude/codex-on-claude/config.json` to set `improvementLoop` to `off` or `manual` without re-running `reconfigure`, the installed hook now respects the new value on the next call (no more silent telemetry from a stale hook). Corrupt or missing config fail-opens so existing installs don't break unexpectedly.
**Hook payload extraction (v0.3.4+)**: `threadId` is now correctly extracted from Claude Code 2.1.x's PostToolUse payload (where `tool_response` is a JSON-encoded string), and `elapsedMs` is populated from `duration_ms`. Previously every auto-hook log entry had `threadId: null`, which silently broke analyzer rules that join logs to threads.
### 4. threads (single, v0.2+)
Persistent thread catalog at `~/.claude/codex-on-claude/threads/.json`.
| Choice | Captured per thread |
|---|---|
| `off` | nothing |
| `basic` (default when improvement loop is on) | threadId + title + tags + originatingSkill + lastUsed + turnCount |
| `full` | above plus `goal / outcome / decision / note` summaries, `incidents` (issue/resolution/outcome), and `fallbackStrategy` (`auto-resume / ask / new`) |
Flag: `--threads=off|basic|full`
External transport: **none**. Bodies of prompts / responses are never stored โ only short text you (or a Skill) explicitly write via `threads outcome | decision | note | incident`.
### 5. subscription tiers (v0.4.1)
You declare which Claude + Codex subscription tier you actually have. The installer uses this **only** to constrain the model + reasoning choices in questions 6 below โ it never sends anything to Anthropic/OpenAI to verify.
| Side | Tiers offered (highest โ lowest) | Default |
|---|---|---|
| Claude | `enterprise / team / max / pro / free` | `max` |
| Codex | `team / pro / plus / free` | `pro` |
Flags: `--subscription-claude=...` and `--subscription-codex=...`
A declared tier that doesn't match reality is fine โ but expect the **frequent-fallback analyzer rule** to fire as your primary tier exhausts quota repeatedly. Reconfigure to a lower tier (and lower primary reasoning) when that happens.
### 6. model / reasoning โ primary + fallback (v0.4.1)
For each side (Codex calls, Claude reviewer subagent) you pick:
- **Primary**: the model + reasoning effort you want by default
- **Fallback**: **locked** to the subscription tier's base โ used automatically when Skill prose detects a `rate_limit_exceeded` / `quota` / `429` / `usage_limit_reached` error
| Side | Primary flags |
|---|---|
| Codex | `--codex-model-primary=gpt-5.5 --codex-reasoning-primary=xhigh` |
| Reviewer (Claude subagent) | `--reviewer-model-primary=opus --reviewer-reasoning-primary=xhigh` |
Reasoning vocabulary (shared by Claude `--effort` and Codex `model_reasoning_effort`): `low / medium / high / xhigh / max`.
How fallback fires:
- **Codex side**: each `mcp__codex__codex` call's Skill prose ends with a "if you see rate_limit/quota, retry **once** with the locked fallback" block.
- **Reviewer subagent route**: the primary `codex-reviewer` agent emits the sentinel line `CODEX_QUOTA_FALLBACK_NEEDED`; `/codex-review` Skill prose re-launches via `codex-reviewer-fallback`.
- **`claude -p` direct callers** (cron / bench): pass `--fallback-model ` โ Claude's CLI handles it natively without prose. Note: `--effort` only applies to primary; Claude doesn't have a `--fallback-effort` flag.
You can pass `--codex-model-fallback=...` etc. but values that don't match the matrix base are ignored with a warning โ the lock is intentional. To "unlock" the fallback, raise your subscription tier (which raises the base).
Fallback events accrue in the usage log (`outcome=fallback`, `errorKind=quota|rate_limit|...`). `codex-on-claude analyze` surfaces a `ruleFrequentFallback` candidate at โฅ5 events in the window โ your hint to upgrade or de-tune the primary.
### 7. usage-mode โ Codex invocation policy (v0.5.0)
How aggressively Claude consults Codex. Pick one of four:
| Mode | One-line definition | Default? | Behavior summary |
|---|---|---|---|
| `none` | Codex calls blocked at PreToolUse gate | โ | ฮฑ-only. Privacy / cost-sensitive use. The PreToolUse hook denies `mcp__codex__codex` with a structured reason. |
| `synergy` | Follow v9 guidance (Quick-Ref 3-Q tree + R1โR6 recipes) | **default for new installs + silent migration default** | 90% sweet spot. Codex fires only when matched recipes apply (adversarial review, sweet-spot partial-fail, hard reasoning). |
| `auto` | Heuristic signal detection + optional Tier 2 LLM probe | for power users | Tier 1: heuristic regex over prompt. Tier 2 (`--auto-tier2-llm-probe=on`): $0.01โ0.02 Codex meta-classifier on ambiguous tasks. |
| `max` | Quality-first bounded automation | for critical tasks | R1 default ON; R5 always probe; ฮณ hot-swap auto on P5 catastrophe. **Hard DO-NOT rules still enforced** โ max โ override. |
Flags: `--usage-mode=none|synergy|auto|max` and (auto-only) `--auto-tier2-llm-probe=on|off`.
**Hard guardrails (every mode)** โ surfaced as `{{guardrail*}}` placeholders in every SKILL.md:
- `chainJsonTrap`: hard-block (avoid Chain-JSON Trap, route to R6 Format-Safe Handoff)
- `subagentStrict`: hard-block (don't ask subagent for strict-JSON when adversarial output is also needed)
- `turnBurn`: 3-turn-stop (stop multi-turn followups at turn 3 unless new context arrived)
- `ceilingNoUpside`: warn-and-skip (when ฮฑ is โ100% already, ฮฒ can only equal it)
Detailed spec + decision tree: [`docs/usage-mode-config.md`](docs/usage-mode-config.md). Quick decision card: [`docs/guidance-quick-ref.md`](docs/guidance-quick-ref.md). Recipe details: [`docs/synergy-playbook.md`](docs/synergy-playbook.md).
Existing installs (pre-0.5.0) migrate silently to `synergy` on the next `npx --yes codex-on-claude@latest` โ no prompt, no behavior change. Run `codex-on-claude reconfigure` if you want to surface the new question.
#### v0.5.0 hardening (post-L6 review โ 14 follow-up fixes)
Beyond the headline 4-mode policy, v0.5.0 ships a comprehensive set of fixes from a Codex peer review pass + adversarial security review. User-visible changes:
- **PreToolUse gate is race-free.** The hook command embeds `--enforce-mode=none` at install time so the gate decision never depends on a separate `config.json` read. Mode toggles do not race in-flight calls.
- **Gate covers Bash CLI bypasses.** `mode=none` blocks not only `mcp__codex__codex` MCP calls but also any `Bash` invocation whose command contains `codex exec`, `codex-on-claude threads resume`, `npx ...codex...`, or path-qualified codex binaries. Wildcard regex (`mcp__codex__.*`) handles current + future MCP tool variants. Case-insensitive matching.
- **Gate fails CLOSED for Codex-shaped tools on corrupt state.** When `config.json` is unreadable or the hook payload is malformed AND the tool looks Codex-shaped, the gate denies (was: fail-OPEN). Non-Codex tools still fail-open to avoid breaking unrelated calls.
- **Gate state reconciles automatically.** Every reconfigure inspects `~/.claude/settings.json` itself (not just `config.json`'s claim) and removes orphan gate entries from manual edits / stale state.
- **Hook decision JSON emitted in BOTH legacy and new shapes.** `{decision, reason}` + `hookSpecificOutput.permissionDecision` so the gate works against current Claude Code 2.1.x and any future version that drops legacy support. Hard-denies additionally `exit(2)` with stderr backstop.
- **State directories are 0700.** Logs (which may capture prompt sizes / thread IDs) + thread catalog + improvements are now restricted to user-only access. Best-effort on filesystems that ignore chmod.
- **Atomic config writes.** `config.json`, `settings.json`, and thread catalog files are written via `temp+rename`; concurrent readers (analyzer, status, gate) never see a torn JSON document.
- **CLI parsing hardening.** `--usage-mode max` (space-separated, common mistake) now errors with a hint to use `--usage-mode=max`. Use `=` to assign flag values; positional args are rejected against an explicit known-subcommand allowlist.
- **Silent-fill info line.** Upgrading users see `usageMode: silent default 'synergy' applied for upgrade` on auto-detected reconfigures (npx upgrade path). Explicit `reconfigure` still surfaces the ยง7 prompt.
- **`auto` mode classifier sharper.** Tier 1 heuristics now handle: chain step variants (numbered / lettered / first-then-finally / inline "Step N:"); mixed adversarial+style prompts ("find security bugs and naming issues" keeps adversarial signal); unquoted field-list syntax ("Return fields: status, risk, file, line"); "table" only with output-intent context; plural forms ("race conditions", "failing tests"); excluded "edge cases" alone from TDD signal.
- **Every log row carries `usageMode`.** `usage-*.jsonl` entries include the active mode at log time, powering the `ruleUsageModeDrift` analyzer rule.
- **`ruleUsageModeDrift` filters by `config.updatedAt`.** Logs from before the last reconfigure no longer trigger false-positive drift right after a mode switch.
Full details: [`docs/release-notes-0.5.0.md`](docs/release-notes-0.5.0.md) + [`docs/security-review-0.5.0.md`](docs/security-review-0.5.0.md).
**Final verification (post 5 Codex peer review cycles)**: **168 automated test cases** (112 unit + 37 integration + 17 installer-flow + 2 regression with 7 internal cases) all PASS. npm tarball **115 kB / 32 files** (99% reduction from initial build via explicit `files` allowlist + `.npmignore`). **28 total fixes** applied pre-ship across 5 review cycles (F1-F8 โ G1-G7 โ H1-H7 โ B1-B6 โ H1-H4 Bash precision โ M1-M3 โ A1-A4 โ see [CHANGELOG.md](CHANGELOG.md) for the full breakdown). Run all tests yourself: `bash install/fixtures/v05/installer-flow/run-flow.sh && node install/fixtures/v05/unit/run-units.mjs && ...`.
---
## Cerberus mode (v0.5.5, opt-in)
When the planning step matters more than execution speed, enable Cerberus Head mode:
```sh
codex-on-claude reconfigure --cerberus=on --yes
# Then in a Claude Code session:
/cerberus head ""
```
**How it works.** `/cerberus head` spawns three independent planners in parallel โ `cerberus-h1-claude-only` (no Codex), `cerberus-h2-codex-only` (delegates fully to Codex CLI), `cerberus-h3-synergy` (Claude + Codex consultation) โ then merges their plans via a deterministic algorithm. Topics are extracted from each plan's markdown sections (Decision / Reasons / Risks / Steps), grouped by Jaccard similarity over Porter-stemmed tokens, and classified into one of four cases: case 1 (3-head merge), case 2 (3-head tournament with h3 tie-break + head-weight ladder), case 3 (2-head agreement + 1 missing), case 4 (single-head, validated / disputed / minority / missing buckets). Output is a single consensus plan plus an `agreement_score` (0โ1) and `label` (low / moderate / high). **Cost**: ~2โ3ร a single planner (typically 20โ50k tokens total). **Plan-only** โ execute and verify phases ship later.
**Hardening since v0.5.1.**
| Version | Change | Why it matters |
|---------|--------|----------------|
| v0.5.1 | Initial 3-head + deterministic merge | PoC |
| v0.5.2 | Porter Stemmer + nonce challenge + case-4 conservative partial credit | Paraphrase absorption (`deterministic`/`deterministically` โ same stem) and orchestration-layer guard against fake plans (each head plan must end with `cerberus-nonce: ` matching init's nonce; consensus rejects mismatches) |
| v0.5.3 | Polarity tracking (`detectPolarity` blocks `use cache` (+) merging with `do not use cache` (-)) + empty-token guard (no false-merge on Korean / single-char bullets) + minority dissent bucket + Porter `isV(s, -1)` base case fix | Closes 4 critical bugs surfaced by the n=2 self-review |
| v0.5.4 | Disputed render fix โ `*(disputed โ opposing polarity)*` for case-4 polarity-split entries (no more `*(lost to ?)*`) | MEDIUM #2 surfaced in Opus 4.7 re-test: 6 user-facing `?` outputs across 4 runs |
| v0.5.5 | Decision multiplier soft-curve (case-2 โ 1.2ร, was binary cliff to 1.0ร) + contrast conjunction polarity (`but/however/although/despite/except` with `but also/even/too` false-positive guard) + `cerberusConfig` seed/reader (dead path in server fixed) + HU-33 plan-level lock | MEDIUM #1 + #3 + LOW + HU-33 from v0.5.4 plan Phase 2 |
**Test coverage.** 102 cerberus-specific unit + integration tests (consensus 15 + install 13 + config 12 + stemming 14 + stemming-adversarial 7 + nonce 10 + score-formula 10 + v053-fixes 19 + e2e 4).
**Per-machine tuning.** When `cerberus=on`, the installer seeds `choices.cerberusConfig: {}` in `~/.claude/codex-on-claude/config.json`. Override any of `headWeights / jaccardGroupThreshold / bodyMergeThreshold / decisionMultiplier / decisionPartialMultiplier / costCapTokens` to taste โ the server reads these on every consensus call and merges with `DEFAULTS`. Empty seed keeps shipped defaults.
**Known backlog (v0.5.6+).** Cerberus **Full mode** (FU-01~10 + FC-02~05, 14 PENDING-IMPL scenarios: 3-worktree execute + verify + iteration loop) โ separate minor release. Embedding-based similarity to bridge Porter Stemmer paraphrase miss (opt-in LLM-judge) โ v0.5.6+ candidate.
Disable any time with `codex-on-claude reconfigure --cerberus=off --yes`. Spec: [`docs/cerberus-mode-spec.md`](docs/cerberus-mode-spec.md) (Draft 5). PoC walkthrough: [`docs/cerberus-poc-2026-05-22.md`](docs/cerberus-poc-2026-05-22.md). Re-test results: [`docs/test-execution-results-cerberus-v0.5.3-opus47.md`](docs/test-execution-results-cerberus-v0.5.3-opus47.md).
---
## Operational guidance (v0.5.0)
The v5โv9 analysis series in [`docs/`](docs/) captures **771 runs across 217 scenarios** (~\$100โ115 in measured external cost). Key references for everyday decisions:
| Read this when | File |
|---|---|
| 30-second decision card for one task | [`docs/guidance-quick-ref.md`](docs/guidance-quick-ref.md) |
| Detailed comparison (Claude ฮฑ / Codex ฮณ / ฮฒ orchestration) | [`docs/guidance-comparison.md`](docs/guidance-comparison.md) |
| Practical recipes (R1 Adversarial / R2 Self-review / R3 reasoning=high / R4 ฮณ hot-swap / R5 cheap ฮฒ trial / R6 Format-Safe Handoff) | [`docs/synergy-playbook.md`](docs/synergy-playbook.md) |
| Mode configuration spec (what every mode does, per-skill activation matrix) | [`docs/usage-mode-config.md`](docs/usage-mode-config.md) |
| Cross-domain validation results (post-v8 sweep) | [`docs/v8-validation-sweep.md`](docs/v8-validation-sweep.md) |
| Why these recommendations changed in v9 | [`docs/v9-adjustment-spec.md`](docs/v9-adjustment-spec.md) |
---
## What you get
### Skills under `~/.claude/skills/codex-*/SKILL.md`
- `codex-review` โ one-shot read-only review of the current diff / files
- `codex-followup` โ continue the same thread via `mcp__codex__codex-reply`
- `codex-resume` โ CLI fallback when MCP session is lost (auto-detects silent-new-session)
- `codex-fix` โ file edits under a strict allowlist (workspace-write)
- `codex-routine` โ templated recurring job
- `codex-analyze` โ analyze the usage log and surface improvement candidates *(improvementLoop โ off)*
- `codex-improve` โ apply / reject specific candidates *(improvementLoop โ off)*
- `codex-log` โ append usage log entries *(improvementLoop โ off; redundant when hooks handle it)*
- `codex-threads` โ browse / annotate / resume the persistent catalog *(threads โ off)*
### Agent under `~/.claude/agents/codex-reviewer.md`
Isolates large Codex responses and returns only a 4-line summary to the main session. Installed when `contextPolicy โ {summarize, mixed}`.
### Plugin bundle
Deprecated since 0.3.0. Existing 0.2.x bundles are left in place โ uninstall does **not** remove them; a one-line `Manual: rm -rf โฆ` hint prints.
---
## Operating flow
```
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Claude Code session โ
โ โ
โ User invokes /codex-review (or natural language match) โ
โ โ โ
โ โผ โ
โ Skill follows its MUST procedure โ mcp__codex__codex โ
โ โ โ
โ โผ โ
โ โโโโ MCP transport (codex mcp-server) โโโโโโโโโโโโโโโโโโโ โ
โ โ โ Codex model call โ โ
โ โ โ threadId + content โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ โ
โ โผ โ
โ Response lands in main (or in codex-reviewer subagent) โ
โ โ โ
โ โผ โ
โ Skill writes catalog entry + log line โ
โ (improvementLoop=auto-on-skill: PostToolUse hook does logging)โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Periodically โ or on demand:
codex-on-claude analyze
โ
โผ
reads logs/threads, ranks improvement candidates
โ
โผ
/codex-improve walks the user through Apply / Reject / Skip
โ
โผ
decisions persisted under ~/.claude/codex-on-claude/improvements/
```
---
## CLI reference
```
codex-on-claude Install interactively (preflight included)
codex-on-claude reconfigure Reconfigure (previous answers as defaults; review/edit/cancel at the end)
codex-on-claude doctor Run preflight only (Node/codex/claude/MCP checks)
codex-on-claude status Show install state
codex-on-claude uninstall Remove installed components (Skills, Agent, hooks)
codex-on-claude analyze Analyze usage log and surface improvement candidates
codex-on-claude suggest Record an apply/reject decision for a candidate
codex-on-claude log Append a usage log entry (use --from-stdin for hook payloads)
codex-on-claude threads ... Persistent thread catalog (see below)
codex-on-claude help Print help
```
Common combinations:
```sh
codex-on-claude status
codex-on-claude reconfigure --improvement-loop=auto-on-skill --yes # turn on hook logging
codex-on-claude reconfigure --threads=full --yes # enable full thread context
codex-on-claude analyze --days=7 --format=markdown --save # write a snapshot report
codex-on-claude suggest --apply=2 # adopt candidate #2
codex-on-claude suggest --reject=2 --reason="intentional pattern"
codex-on-claude uninstall
```
---
## `threads` subcommand reference (v0.2+)
```sh
codex-on-claude threads list [--status=active|resolved|archived] [--tag=...] [--since=7d] [--skill=...]
codex-on-claude threads latest [--format=id|json] # deterministic latest thread (great for shell automation)
codex-on-claude threads show
codex-on-claude threads new --title="..." --tags=a,b --skill=codex-review --cwd="$PWD" --sandbox=read-only
codex-on-claude threads goal "Verify naming consistency"
codex-on-claude threads outcome "Codex flagged 3 issues"
codex-on-claude threads decision "Adopt suggestion 2"
codex-on-claude threads note "arbitrary memo"
codex-on-claude threads incident --issue=session-not-found --resolution="codex exec resume" --outcome=recovered
codex-on-claude threads tag --add=critical --remove=draft
codex-on-claude threads status resolved
codex-on-claude threads fallback auto-resume
codex-on-claude threads search "react refactor"
codex-on-claude threads resume "follow-up prompt"
codex-on-claude threads remove
```
`threads resume "prompt"` reads the stored `fallbackStrategy`:
- `auto-resume` โ calls `codex exec resume` and **automatically detects** silent-new-session (when codex returns a different `thread_id`), filing an incident on the original thread and registering the new one as a bifurcation.
- `ask` โ prints the thread metadata + recent summaries so the user picks the next step.
- `new` โ guides starting a fresh `mcp__codex__codex` call with the prior summaries as a preamble.
---
## Verification
### 1. Preflight + MCP
```sh
codex-on-claude doctor
```
Expected: `Node.js โฆ`, `Codex CLI โฆ`, `Claude Code โฆ`, `codex doctor: ok`, `Claude auth: Login method: โฆ`, `codex MCP server: โ Connected`.
### 2. End-to-end MCP call (a fresh `claude -p`)
```sh
claude -p --model haiku --verbose --output-format stream-json \
--permission-mode dontAsk \
--allowedTools=mcp__codex__codex \
"Use the Codex MCP tool to ask Codex: Return exactly OK. Then return Codex's exact answer."
```
### 3. Skill in a real session
Start a fresh Claude Code session and type:
```
/codex-review evaluate this README in two sentences.
```
Codex should answer; the response ends with a deterministic `Thread: ` line (per the Skill's MUST procedure).
### 4. Analyze cycle
```sh
codex-on-claude log --skill=codex-review --sandbox=read-only --outcome=ok \
--prompt-chars=120 --response-chars=600 --elapsed-ms=2100
codex-on-claude analyze --days=1
```
---
## Operational guidance
- **Default sandbox is `read-only`.** Switch to `workspace-write` only when files genuinely need edits (`/codex-fix` enforces an allowlist). `danger-full-access` is not used in this workflow.
- **Pass context explicitly.** Claude and Codex don't share hidden context โ when you want Codex to know about a goal / file / constraint, name it in the prompt.
- **threadId discipline.** The `threadId` itself is a public-safe identifier. Within the same MCP server process, prefer `/codex-followup`; on `Session not found`, switch to `/codex-resume` or `codex-on-claude threads resume`.
- **401 from Claude.** If `claude auth status` looks fine but the model returns `401 Invalid authentication credentials`, re-run `claude auth login --claudeai --email `.
- **Shrinking install.** Running `reconfigure` with fewer `--patterns` removes the unselected Skills automatically. Thread catalog data and improvement decisions are kept even when their Skills are removed.
---
## Privacy policy
- All logs / catalog / improvement decisions stay **on this machine**. No outbound network calls beyond the Codex/Claude APIs themselves.
- Log entries contain only metadata: timestamp, Skill name, sandbox mode, prompt/response **lengths**, threadId, outcome code.
- Prompt / response **bodies are never written to disk.**
- `~/.claude/codex-on-claude/` is created with restrictive permissions.
- Opt out at any time: `codex-on-claude reconfigure --improvement-loop=off` (also removes hook entries) and `codex-on-claude reconfigure --threads=off`.
- Full wipe: `codex-on-claude uninstall` or `rm -rf ~/.claude/codex-on-claude/`.
---
## Directory layout
```
codex-on-claude/
โโโ README.md (this file)
โโโ README.ko.md Korean mirror
โโโ CHANGELOG.md
โโโ LICENSE
โโโ package.json npm metadata (bin: codex-on-claude)
โโโ install/
โ โโโ install.mjs main installer / reconfigurer / analyze CLI
โ โโโ analyze.mjs usage log analyzer + improvement candidate rules
โ โโโ threads.mjs persistent thread catalog CRUD
โ โโโ hooks.mjs ~/.claude/settings.json PostToolUse merger
โ โโโ manifest.json option โ component mapping
โ โโโ components/
โ โโโ agents/codex-reviewer.md
โ โโโ skills/codex-{review,followup,resume,fix,routine,analyze,improve,log,threads}/SKILL.md
โโโ docs/
โโโ implementation-log.md English summary
โโโ test-report-2026-05-20.md English summary
โโโ ko/ Korean originals (history)
```
After install (on a user machine):
```
~/.claude/
โโโ skills/codex-*/SKILL.md (per selected patterns + improvement loop + threads)
โโโ agents/codex-reviewer.md (contextPolicy โ {summarize, mixed})
โโโ settings.json (PostToolUse hooks merged in when improvementLoop โ {auto-on-skill, periodic})
โโโ codex-on-claude/
โโโ config.json selections from last install/reconfigure
โโโ logs/usage-YYYY-MM-DD.jsonl
โโโ reports/ analyze --save outputs
โโโ improvements/ apply/reject decisions
โโโ threads/.json + index.json persistent thread catalog
```
---
## Troubleshooting
### `codex-on-claude: command not found` right after `npm update -g` / `npm install -g`
Stale shell command hash โ npm replaced the binary file on disk, but your current zsh/bash session is still caching the old path. Fix:
```sh
hash -r # zsh / bash: clear cached command lookups
# or: rehash # zsh-specific
# or open a new terminal
codex-on-claude --version
```
`codex-on-claude doctor` (v0.3.3+) auto-detects this case and prints the same hint.
### `npx codex-on-claude` fails with `ENOENT โฆ _npx//package.json`
Corrupted npx per-package cache, usually from a previously interrupted run. Force a fresh download:
```sh
npx --yes codex-on-claude@latest # `@latest` cache-busts the per-package directory
# still broken? nuke npx's cache entirely:
rm -rf ~/.npm/_npx
```
The canonical install command in this README (`npx --yes codex-on-claude@latest`) avoids this class of failure because `@latest` forces npx to revalidate the package version on every run.
### `claude mcp get codex` reports "Failed to connect"
Often just a sandboxed shell. Re-run in a normal terminal:
```sh
claude mcp get codex
claude mcp list
```
### `Session not found for thread_id`
The MCP server restarted. Fallback:
```sh
codex exec resume --skip-git-repo-check --json ""
```
Or, inside Claude Code: `/codex-resume`. The `codex-on-claude threads resume "..."` subcommand wraps this and auto-detects the silent-new-session case (where Codex returns a different `thread_id` than requested).
### `401 Invalid authentication credentials`
Even when `claude auth status` looks healthy:
```sh
claude auth login --claudeai --email
```
### Skill not detected in a new session
Restart Claude Code. Skills are loaded once at session start; install / reconfigure / uninstall only affect future sessions (with the rare exception of hot-load when the harness happens to refresh).
### Reconfigure removed something I wanted
Re-run with the full set:
```sh
codex-on-claude reconfigure
# step through with arrow keys, the previous answers are pre-selected
```
The final review screen lets you `Edit again` before saving.
### `codex-on-claude threads latest --format=id` includes the CLI banner / ANSI escape codes
Fixed in v0.3.4 โ the top-level `codex-on-claude vX.Y.Z` banner now goes to **stderr**, so stdout for `--format=id|json` data subcommands is clean:
```sh
codex-on-claude threads latest --format=id # stdout: bare UUID (or empty)
codex-on-claude threads latest --format=id 2>/dev/null # silences banner entirely
```
If you're stuck on โค 0.3.3 and need to scrape stdout, strip ANSI + grep UUIDs:
```sh
codex-on-claude threads latest --format=id \
| sed 's/\x1b\[[0-9;]*m//g' \
| grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' \
| head -1
```
### Hook keeps logging after I set `improvementLoop=off` in `config.json` (without reconfigure)
Fixed in v0.3.4 โ the auto-on-skill hook (`codex-on-claude log --from-stdin`) now reads `~/.claude/codex-on-claude/config.json` on each call and silently no-ops when `improvementLoop` is `off` or `manual`. No `reconfigure` step required for the gate to take effect.
If your hook command path is stale (still points at the npm-cached binary from before the fix), reinstall + reconfigure to refresh:
```sh
npm install -g codex-on-claude@latest
codex-on-claude reconfigure --yes
```
Manual `/codex-log` invocations are unaffected by the guard โ explicit logging always works.
### Mixed-ownership PostToolUse group lost my user hook on uninstall (โค 0.3.3)
Fixed in v0.3.4. If you manually nested a `_coc.marker` entry inside a group also containing your own hook, the previous code removed the whole group via `.some()` semantics. The new `stripOursFromGroups` operates at the hook level โ your hooks are preserved, only marked entries are removed. The installer never produces mixed groups itself; this protects you only against manual `settings.json` edits.
---
## Contributing / license
- Issues / PRs welcome:
- License: MIT
---
## History
- [`CHANGELOG.md`](CHANGELOG.md) โ release notes
- [`docs/implementation-log.md`](docs/implementation-log.md) โ English summary of the original 2026-05-19 build session ([Korean original](docs/ko/implementation-log-2026-05-19.md))
- [`docs/test-report-2026-05-20.md`](docs/test-report-2026-05-20.md) โ English summary of the 2026-05-20 verification run ([Korean original](docs/ko/test-report-2026-05-20.md))
- [`docs/test-scenarios-codex-calls.md`](docs/test-scenarios-codex-calls.md) โ 0.3.3-era comprehensive test scenario document (~43 scenarios ร 8 groups covering MCP transport, 9 Skills, codex-reviewer Agent, 14 threads subcommands, hooks, analyzer rules, full E2E). Each scenario has Given/Setup/Command/Expected/Notes plus deterministic fixtures at `install/fixtures/analyze-rules/`.
- [`docs/test-execution-results-2026-05-20.md`](docs/test-execution-results-2026-05-20.md) โ execution log of the 0.3.4 cycle: Phase A (hooks, 10/10 PASS), Phase B (resume + SILENT_NEW_SESSION, 7/7 PASS after malformed-id correction), Codex-call ledger (27 calls across the session), bug discovery via meta-audit, synergy measurement (findings/$, findings/min, PRE/POST ratio).