https://github.com/nadimtuhin/claude-code-self-critic

Self-critique and stuck-detection hooks for Claude Code agents. Deterministic fact-gating catches hallucinated claims before a turn completes.
https://github.com/nadimtuhin/claude-code-self-critic

agent-safety agentic-coding ai-agent claude claude-code claude-hooks guardrails hooks llm self-critique

Last synced: 2 days ago
JSON representation

Self-critique and stuck-detection hooks for Claude Code agents. Deterministic fact-gating catches hallucinated claims before a turn completes.

Host: GitHub
URL: https://github.com/nadimtuhin/claude-code-self-critic
Owner: nadimtuhin
License: mit
Created: 2026-06-20T09:27:31.000Z (7 days ago)
Default Branch: main
Last Pushed: 2026-06-20T11:03:03.000Z (7 days ago)
Last Synced: 2026-06-20T13:05:09.649Z (7 days ago)
Topics: agent-safety, agentic-coding, ai-agent, claude, claude-code, claude-hooks, guardrails, hooks, llm, self-critique
Language: JavaScript
Size: 40 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# Self-Critic Hooks for Claude Code

**Catch your AI agent lying to you — before the turn ends.**

Two zero-dependency hooks that give Claude Code real-time self-critique and stuck-detection. Deterministic fact-gating extracts evidence, applies rule-based checks, and only escalates to a cheap LLM call to suppress false positives. The LLM never authors the critique — it only votes.

---

## The Problem

You've seen it. Your agent says "all tests pass" — but you check, and they don't. It claims "the build succeeds" — but there's a compile error. It repeats the same failing command 5 times in a row, convinced this time will be different.

Agents hallucinate success. They get stuck in loops. And there's no safety net.

## The Solution

Two hooks that run at the right moments:

**Stop Hook (Self-Critic)** — Runs at the end of each turn. Extracts evidence from the transcript (did it actually run tests? did it read the file it claims verified?). Applies deterministic rule-based gates. If claims are unbacked by evidence, it blocks the turn and tells the agent exactly what's missing. Escalates to a Haiku veto only to suppress false positives — Haiku never writes the critique.

**PreToolUse Stuck-Detector** — Runs before every Bash, Edit, or Write. Detects when the same flag-normalized command or file target is repeated 3+ times. Sends a mid-turn nudge to course-correct before the turn spirals. Window resets each turn.

---

## Quick Start

**One-liner (no clone needed):**

```bash
curl -fsSL https://raw.githubusercontent.com/nadimtuhin/claude-code-self-critic/main/install.sh | bash
```

**Or clone and run:**

```bash
git clone https://github.com/nadimtuhin/claude-code-self-critic
cd claude-code-self-critic
./install.sh
```

That's it. The installer runs the full test suite in `/tmp` first. If tests pass, it deploys hooks and auto-merges them into `~/.claude/settings.json`. If tests fail, it aborts without touching anything.

Requires `node >=20` and `jq`.

```bash
./install.sh # test in /tmp, then install/update
./install.sh --test-only # just run tests, don't deploy
./install.sh --uninstall # remove hooks cleanly
```

---

## Why This Approach

| Approach | Problem |
|----------|---------|
| Trust the agent | Agents hallucinate success claims |
| Pure LLM critic | Expensive, slow, unreliable, writes its own opinions |
| Pure rules | Rigid, can't suppress false positives |
| **This: Deterministic-first + LLM veto** | **Cheap, reliable, LLM only votes to suppress** |

The key insight: don't let the LLM write the critique. Extract evidence deterministically, apply rules, build a block reason from unbacked claims. Then — and only then — let a cheap Haiku call vote "is this a false positive?" The LLM is a tiebreaker, not the author.

---

## Architecture

```
src/
fact-gate.mjs # Evidence extraction + rule-based gates (pure, testable)
critic-core.mjs # Fact-gate decision logic + episode tracking (pure)
stuck-core.mjs # Repeat detection + escalation logic (pure)
evidence.mjs # Test-run detection, command parsing (I/O edge)
state.mjs # File-based state persistence (I/O edge)
stop-hook.mjs # Stop hook entry point (thin wiring)
pretool-hook.mjs # PreToolUse hook entry point (thin wiring)
test/
*.test.mjs # 40 tests, zero dependencies
fixtures/ # Real transcript fixtures
```

Pure cores have no I/O — 100% testable. I/O lives at the edges. Thin hooks wire them together. This is a deliberate separation: the decision logic is deterministic and fast; the LLM is isolated to a single veto call with a hard timeout.

## Key Design Decisions

- **Deterministic fact-gating first** — Extract evidence (ran tests? read files?) before any LLM call
- **Haiku veto, not Haiku critic** — LLM only suppresses false positives, never authors the critique
- **Episode cap of 1** — Prevents infinite critique loops (critique-the-critique-the-critique)
- **Fail-open everywhere** — Any hook error allows the turn. A bug degrades to "un-critiqued," never a wrongful block
- **Flag-normalized Bash matching** — `npm test -- --grep foo` and `npm test --grep foo` are the same target

## Hardcoded Defaults

No config file. Tuned defaults baked in:
- `model=haiku` — Fast, cheap, good for veto
- `MAX_ROUNDS=1` — Episode cap (prevent infinite loops)
- `STUCK_THRESHOLD=3` — Nudge after 3 repeats
- `MIN_CRITIQUE_CHARS=80` — Turns shorter than this are trivial, skipped
- `CRITIC_TIMEOUT=60s` — Haiku veto wall-clock timeout

## Test

```bash
node --test
```

40 tests across fact-gate, evidence, critic-core, stuck-core, and state. Zero dependencies.

## Fail-Open Guarantee

Any hook error (file I/O, JSON parse, LLM timeout) allows the turn to proceed. Hooks never block due to their own bug — only due to detected agent issues.

## Known Limitations

- **Per-turn latency:** A turn that trips the fact-gate incurs one synchronous ~7-10s Haiku call. Only tripped turns pay this; clean turns are free.
- **Stuck-detector is coarse:** Matches flag-normalized Bash target or exact file_path. Editing the same file 3x in one turn can produce a spurious nudge (non-blocking, window resets each turn).
- **Transcript boundary:** Evidence scanning scopes to the current turn. On transcripts with no string-content user prompt (some sub-agent/resume shapes), it conservatively reports `ranTests=false`.

## License

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/nadimtuhin/claude-code-self-critic

Awesome Lists containing this project

README