AI code review pipeline that turns agent-written PRs into merge-ready patches
https://github.com/humanbased-ai/crosscheck

A Humanbased project, built with crosscheck.

# crosscheck


crosscheck watch: live pipeline view

**Your agents ship fast. Crosscheck makes sure they ship right.**

AI coding agents create PRs faster than review habits can absorb. The failure mode isn't broken builds; it's *early victory*: patches that pass CI, look complete, and still hide regressions, brittle edge cases, or half-finished fixes.

Crosscheck adds an independent Review → Fix → Recheck loop. One agent writes the patch. Another reviews it. Findings go back to the author to repair. The result gets rechecked before merge. The PR moves toward genuinely merge-ready, not just "looks green."

No new hosted service. No per-review API bill. Crosscheck runs through the `claude` and `codex` CLIs you already have: your existing subscriptions, your machine or server.

Built by [Humanbased](https://github.com/humanbased-ai). Read the field report: [What 295 Agentic PRs Taught Us About Code Review](https://blog.humanbased.ai/posts/agentic-pr-quality-crosscheck/), which analyzes 295 agentic PRs and includes real Crosscheck logs.

## Why crosscheck?

**Agent velocity without lowering the merge bar.**

- **Independent eyes, not self-review**: route Claude-authored PRs to Codex and vice versa. Self-review is exactly where early-victory failures hide.
- **Review → Fix → Recheck, not just comments**: findings return to the author agent for repair; a clean recheck follows before merge. PRs move forward, not sideways.
- **No new vendor**: runs through the `claude` and `codex` CLIs you already pay for. No per-review bill, no extra trust surface.
- **Configurable for any team size**: review-only mode, review + fix, or the full loop. Personal laptop via `watch`, always-on server via `serve`, one-shot via `review`.

## Who uses crosscheck

| Persona | Problem | How crosscheck helps |
|---|---|---|
| **Solo agentic builder** | Same agent that wrote the code may self-approve incomplete work | Independent reviewer from a different vendor, on your machine |
| **Technical founder** | AI PRs look done before delivering stable value | Closes the loop: review finding → agent fix → clean recheck |
| **Engineering lead** | Agent use is hard to supervise or standardize | Configurable workflow, review-only mode, and a visible PR audit trail |
| **OSS maintainer** | Review bandwidth is scarce; comments must be actionable | One-shot `crosscheck review` posts concrete findings directly on the PR |

### Common workflows

```bash
# Catch regressions before merging a solo PR
crosscheck run

# Continuous review on every incoming agent PR
crosscheck serve # always-on team server

# Review stale PRs from a queue
crosscheck scan && crosscheck kickass

# One-shot review of a specific PR
crosscheck review

# Loop until the agent produces an approved patch
crosscheck run --crazy
```

---

## Quick start

### First useful review in 10 minutes

Start with one low-risk PR before turning on continuous watch mode. You only need GitHub CLI plus one authenticated reviewer CLI.

```bash
# 1. Install crosscheck
npm install -g @humanbased/crosscheck

# 2. Authenticate GitHub
brew install gh && gh auth login

# 3. Authenticate one reviewer
npm install -g @openai/codex && codex login --device-auth
# or:
npm install -g @anthropic-ai/claude-code && claude

# 4. Check your setup
crosscheck status

# 5. Review the public fixture PR
crosscheck review https://github.com/humanbased-ai/crosscheck-proof-fixture/pull/1 --reviewer codex
```

This fixture PR intentionally contains a realistic agentic-code regression, so you can see whether Crosscheck produces a useful review before pointing it at your own repo. Use `--reviewer claude` if Claude Code is the authenticated reviewer. After the fixture review works, swap in one low-risk PR from your repo, then run `crosscheck onboard` to configure repos, workflow mode, and continuous monitoring.

### Continuous mode

```bash
# 1. Install crosscheck and the agent CLIs
npm install -g @humanbased/crosscheck
npm install -g @anthropic-ai/claude-code && claude # Claude Pro/Max subscription
npm install -g @openai/codex && codex login --device-auth # ChatGPT Plus/Pro subscription
brew install gh && gh auth login # GitHub CLI

# 2. Guided setup: repos, review mode, workflow pipeline
crosscheck onboard

# 3. Start watching
crosscheck watch # personal laptop
crosscheck serve # always-on team server
```

---

## Commands

```bash
crosscheck onboard # guided setup: pick repos, mode, and pipeline
crosscheck watch # personal use: tunnel + webhook + listening on your laptop
crosscheck serve # team use: fixed port, register webhook once
crosscheck review # review one or more PRs (comma lists, ranges, cross-repo)
crosscheck run # run the full workflow: review → (fix → recheck) × max_rounds
crosscheck recheck|fix|resolve # force one workflow step on one or more PRs
crosscheck scan # show open PR workflow state across monitored repos
crosscheck detect-step # explain the next workflow step for one PR
crosscheck kickass # advance stale PRs from an interactive operator queue
crosscheck init # check prerequisites, write starter config
crosscheck status # auth state, config summary, CLI versions
```

**Operator queue (scan + kickass)**

`crosscheck scan` tracks two independent dimensions per PR:

| Workflow stage (`reviewState`) | Meaning | Next action |
|---|---|---|
| `NEEDS_REVIEW` | No crosscheck review for current HEAD | review |
| `NEEDS_FIX` | Reviewed, fix requested | fix |
| `NEEDS_RECHECK` | Fix committed, recheck pending | recheck |
| `APPROVED` | Reviewed and approved | merge |

| Verdict (`verdict`) | Meaning |
|---|---|
| `UNREVIEWED` | No review found |
| `APPROVE` | AI approved |
| `NEEDS_WORK` | AI requested changes |
| `BLOCK` | AI hard-blocked merge |

`BLOCK` and `NEEDS_WORK` both map to the `NEEDS_FIX` stage: same next action, but the `verdict` field preserves severity so operators can prioritise.
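
The two dimensions can be pictured as a pair of lookups. The following is an illustrative Python sketch, not Crosscheck's own code: the state and verdict names come from the tables above, while `stage_for`, `prioritise`, the `fix_pending_recheck` flag, and the `SEVERITY` ordering are hypothetical names introduced here.

```python
# Illustrative sketch only; names other than the states/verdicts are hypothetical.
VERDICT_TO_STAGE = {
    "UNREVIEWED": "NEEDS_REVIEW",
    "APPROVE": "APPROVED",
    "NEEDS_WORK": "NEEDS_FIX",
    "BLOCK": "NEEDS_FIX",
}

NEXT_ACTION = {
    "NEEDS_REVIEW": "review",
    "NEEDS_FIX": "fix",
    "NEEDS_RECHECK": "recheck",
    "APPROVED": "merge",
}

# Assumed ordering, used only to sort an operator queue by severity.
SEVERITY = {"BLOCK": 0, "NEEDS_WORK": 1, "UNREVIEWED": 2, "APPROVE": 3}


def stage_for(verdict: str, fix_pending_recheck: bool = False) -> str:
    """Map a verdict to a workflow stage; a committed fix moves it to recheck."""
    if fix_pending_recheck and verdict in ("NEEDS_WORK", "BLOCK"):
        return "NEEDS_RECHECK"
    return VERDICT_TO_STAGE[verdict]


def prioritise(prs: list) -> list:
    """Sort PR records so the most severe verdicts come first."""
    return sorted(prs, key=lambda pr: SEVERITY[pr["verdict"]])
```

Both `BLOCK` and `NEEDS_WORK` resolve to the same `fix` action here, but sorting on the verdict keeps hard blocks at the top of the queue.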

**How workflow steps are counted**

Crosscheck reconstructs PR workflow state from visible artifacts:

| Evidence | Counts as |
|---|---|
| Review or recheck comment carrying a crosscheck annotation (for example `type=review`) | completed `review` / `recheck` step |
| Fix or conflict-resolve comment carrying the matching crosscheck annotation | completed `fix` / `conflict-resolve` step |
| PR commit trailer, such as `Crosscheck-Step: fix` | completed step declared by that trailer |

Commit trailers are accepted as operator-declared workflow state. In practice, a PR author may command Claude, Codex, or another agent to apply a fix outside a standalone Crosscheck post; if the resulting PR commit carries `Crosscheck-Step: fix`, Crosscheck counts it as fix evidence.

That evidence only advances the next step to `recheck` when the fix commit is the current PR HEAD. If another commit lands after the fix evidence, Crosscheck starts a fresh review round so the newer code is reviewed normally. This prevents an old fix trailer from marking later changes as ready for recheck.
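
The HEAD rule can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Crosscheck's implementation: it looks only at commit trailers (comment evidence is ignored), and `declared_step` and `next_step_after_fix` are hypothetical names.

```python
import re

# Matches a trailer line such as "Crosscheck-Step: fix" in a commit message.
TRAILER = re.compile(r"^Crosscheck-Step:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def declared_step(message: str):
    """Return the step named by a Crosscheck-Step trailer, or None."""
    match = TRAILER.search(message)
    return match.group(1) if match else None


def next_step_after_fix(commit_messages: list) -> str:
    """Decide the next step from PR commit messages, oldest first.

    The last message is treated as the current PR HEAD. A fix trailer only
    advances to "recheck" when it sits on HEAD; any later commit sends the
    PR back to a fresh "review" round.
    """
    if commit_messages and declared_step(commit_messages[-1]) == "fix":
        return "recheck"
    return "review"
```

For example, a history ending in a commit with `Crosscheck-Step: fix` yields `recheck`, while the same history with one more commit on top yields `review`.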

```bash
crosscheck scan [--tidy] [--stale-after <duration>] [--force] [--json]
crosscheck kickass [--dry-run] [--stale-after <duration>] [--force]
```

`crosscheck review --reviewer`, `crosscheck run --reviewer`, `crosscheck run --fixer`, and `crosscheck run --vendor` accept vendor aliases:
- Claude: `claude`, `claude-code`, `cc`, `anthropic`
- Codex: `codex`, `openai`
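
As a rough illustration of how such aliases could resolve, and how cross-vendor routing picks the opposite reviewer, here is a minimal Python sketch. The alias lists are taken from above; `normalize_vendor` and `cross_vendor_reviewer` are hypothetical helper names, and exact-match lookup is an assumption.

```python
# Alias table copied from the list above; the helper names are hypothetical.
ALIASES = {
    "claude": "claude",
    "claude-code": "claude",
    "cc": "claude",
    "anthropic": "claude",
    "codex": "codex",
    "openai": "codex",
}


def normalize_vendor(name: str) -> str:
    """Resolve a vendor alias to its canonical name (exact match assumed)."""
    if name not in ALIASES:
        raise ValueError(f"unknown vendor alias: {name!r}")
    return ALIASES[name]


def cross_vendor_reviewer(author_vendor: str) -> str:
    """Pick the other vendor as reviewer, as in cross-vendor mode."""
    return "codex" if normalize_vendor(author_vendor) == "claude" else "claude"
```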

**Continuous improvement** *(experimental)*

```bash
crosscheck diagnose # surface failure patterns from review logs
crosscheck optimize [--apply] # rewrite reviewer instructions based on diagnose output
crosscheck impact [--money] # time saved, issues caught, code quality trends
crosscheck issue # draft and file a bug report from recent error logs
```

---

### `crosscheck onboard`

Interactive setup wizard. Picks repos/orgs to monitor, selects single-vendor or cross-vendor mode, configures the review pipeline, and writes `~/.crosscheck/config.yml` and `workflow.yml`.

```bash
crosscheck onboard # guided setup
crosscheck onboard --personal # skip persona prompt, go straight to personal mode
crosscheck onboard --team # skip persona prompt, go straight to team mode
crosscheck onboard -y # accept all defaults non-interactively
```

---

### `crosscheck watch`

Personal mode. Starts an SSH tunnel (localhost.run), registers GitHub webhooks, and listens for PR events. Everything self-cleans on Ctrl+C.

```bash
crosscheck watch
crosscheck watch --no-backtrace # skip startup scan for unreviewed open PRs
crosscheck watch --reconfigure # re-run deployment setup before starting
```

---

### `crosscheck serve`

Team mode. Binds to a fixed port: register the webhook once, cover the whole team. Designed for a Mac mini or home server.

```bash
crosscheck serve
crosscheck serve --no-backtrace # skip startup scan
crosscheck serve --personal # personal scope this session only
crosscheck serve --reconfigure # re-run deployment setup
```

---

### `crosscheck review <pr>`

Reviews one or more PRs. Clones, checks out, reviews, and posts the comment. The PR argument accepts the [multi-PR spec syntax](#multi-pr-syntax); multiple PRs are reviewed concurrently.

```bash
crosscheck review https://github.com/org/repo/pull/42
crosscheck review --reviewer claude # force Claude regardless of detection
crosscheck review --reviewer codex # force Codex regardless of detection
crosscheck review --reviewer cc # alias for Claude
crosscheck review --reviewer openai # alias for Codex
crosscheck review .../pull/245,255 # review several PRs at once
crosscheck review .../pull/245-256 # review an inclusive range
```

---

### `crosscheck run <pr>`

Runs the full configured workflow against one or more PRs: review → (fix → recheck) × `max_rounds`. Loops autonomously through fix→recheck cycles up to the `max_rounds` value configured in `workflow.yml` (default: 1). Use `--crazy` or `--half-crazy` to loop until approved or unblocked, ignoring `max_rounds`. The PR argument accepts the [multi-PR spec syntax](#multi-pr-syntax); multiple PRs run concurrently (one agent per PR by default).

```bash
crosscheck run
crosscheck run --steps review # only the review step
crosscheck run --steps fix,recheck # skip initial review
crosscheck run --reviewer claude # force review/recheck agent
crosscheck run --fixer claude # force fix agent
crosscheck run --vendor claude # force review/recheck/fix agent
crosscheck run --dry-run # review without posting or fixing
crosscheck run --crazy # 🔥🔥 loop until APPROVE
crosscheck run --half-crazy # 🔥 loop until not BLOCK
crosscheck run --timeout 10m # custom reviewer timeout
crosscheck run .../pull/245,255 # several PRs, concurrently
crosscheck run .../pull/245-256 --concurrent 3 # range, max 3 agents in parallel
```

---

### `crosscheck recheck` / `fix` / `resolve <pr>`

Force a single workflow step against one or more PRs, bypassing next-step auto-detection. Each is sugar for `crosscheck run --steps <step>` and accepts the same [multi-PR spec syntax](#multi-pr-syntax) and `--concurrent` / `--sequential` / `--stagger` flags as `run`.

| Command | Forces the step |
|---|---|
| `crosscheck recheck <pr>` | `recheck`: re-evaluate against the latest review |
| `crosscheck fix <pr>` | `fix`: apply fixes for the latest review |
| `crosscheck resolve <pr>` | `conflict-resolve`: resolve merge conflicts (Claude only) |

```bash
crosscheck recheck https://github.com/org/repo/pull/42
crosscheck fix .../pull/245,255 --fixer claude
crosscheck resolve .../pull/245-256 --vendor claude
```

When the step is absent from the active `workflow.yml`, `recheck` and `conflict-resolve` are synthesized with built-in defaults so the command still runs.

---

### Multi-PR syntax

`run`, `review`, `recheck`, `fix`, and `resolve` accept a single PR URL or a **spec** that expands to many PRs: a comma-separated list of full URLs, bare numbers, and `N-M` ranges:

```bash
.../pull/245,255 # two PRs in the same repo
.../pull/245-256 # an inclusive range
.../repo/pull/245,https://github.com/o/other/pull/3 # across repos
```

The first token must be a full URL (a bare number inherits the most recent repo). Duplicates are de-duplicated and a spec expands to at most 100 PRs. Multiple PRs run concurrently by default; control parallelism with `--concurrent <n>`, `--sequential`, or `--stagger <value>`.
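
To make the expansion rules concrete, here is a minimal Python sketch of a spec parser. It is an illustration built from the rules above, not Crosscheck's parser: `expand_spec` is a hypothetical name, and raising an error when the 100-PR cap is exceeded is an assumption (the README does not say whether the real tool rejects or truncates).

```python
import re

URL_RE = re.compile(r"^(https://github\.com/[^/]+/[^/]+)/pull/(\d+)(?:-(\d+))?$")
BARE_RE = re.compile(r"^(\d+)(?:-(\d+))?$")
MAX_PRS = 100  # cap stated in the README


def expand_spec(spec: str) -> list:
    """Expand a comma-separated PR spec into a de-duplicated list of PR URLs."""
    repo = None
    seen = {}  # dict keeps insertion order, giving ordered de-duplication
    for token in (part.strip() for part in spec.split(",")):
        url = URL_RE.match(token)
        bare = BARE_RE.match(token)
        if url:
            repo, start, end = url.group(1), url.group(2), url.group(3)
        elif bare and repo is not None:
            # A bare number or range inherits the most recent repo.
            start, end = bare.group(1), bare.group(2)
        else:
            raise ValueError(f"cannot resolve {token!r}: first token must be a full URL")
        low, high = int(start), int(end or start)
        if high < low:
            raise ValueError(f"empty range in {token!r}")
        for number in range(low, high + 1):
            seen[f"{repo}/pull/{number}"] = None
            if len(seen) > MAX_PRS:
                raise ValueError(f"spec expands to more than {MAX_PRS} PRs")
    return list(seen)
```

With this sketch, `expand_spec("https://github.com/o/r/pull/245,255")` returns the two URLs for PRs 245 and 255 in `o/r`, and a spec that starts with a bare number is rejected.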

---

### `crosscheck scan`

Scans every open PR in the configured monitor scope and reports where each one is in the crosscheck workflow. Results are cached for 60 seconds.

States: `NEEDS_REVIEW` · `NEEDS_FIX` · `BLOCK` · `NEEDS_RECHECK` · `APPROVE`

```bash
crosscheck scan # all open PRs, grouped stale/not-stale
crosscheck scan --tidy # stale actionable rows only
crosscheck scan --stale-after 4h # custom staleness threshold (default 24h)
crosscheck scan --force # bypass cache
crosscheck scan --json # machine-readable output
```
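
The staleness threshold can be thought of as a simple age comparison. The Python sketch below is illustrative only: it assumes durations are a whole number followed by a single unit letter (the README shows `4h`, `2h`, and `10m`; the `s` and `d` units are assumptions), and that staleness is measured from a PR's last activity timestamp. `parse_duration` and `is_stale` are hypothetical names.

```python
import re
from datetime import datetime, timedelta, timezone

# Unit letters are an assumption beyond the "h" and "m" examples in the README.
UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Parse strings such as "4h" or "10m" into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if not match:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**{UNITS[match.group(2)]: int(match.group(1))})


def is_stale(last_activity: datetime, now: datetime, stale_after: str = "24h") -> bool:
    """A PR is stale once its last activity is at least `stale_after` old."""
    return now - last_activity >= parse_duration(stale_after)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(is_stale(now - timedelta(hours=5), now, "4h"))
```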

---

### `crosscheck detect-step`

Explains the workflow history for one PR and prints the next step Crosscheck would run. Use this when a PR has mixed evidence from comments, Crosscheck commits, or ad hoc agent commits with `Crosscheck-Step` trailers.

```bash
crosscheck detect-step
crosscheck detect-step --json
```

---

### `crosscheck kickass`

Selects stale PRs from the operator queue and advances them: runs `scan` first, presents a multi-select picker, shows a preflight summary, then executes after confirmation.

```bash
crosscheck kickass # interactive operator queue
crosscheck kickass --dry-run # preflight only, no mutations
crosscheck kickass --stale-after 2h # tighter staleness threshold
crosscheck kickass --force # bypass scan cache before picking
crosscheck kickass --crazy # 🔥🔥 auto loop until APPROVE
crosscheck kickass --half-crazy # 🔥 auto loop until not BLOCK
```

Actions: `NEEDS_REVIEW → CR` · `NEEDS_FIX/BLOCK → Fix` · `NEEDS_RECHECK → Recheck` · `APPROVE → Merge`

**`kickass` + `watch` combo**

For the best recovery experience when a batch of PRs is stuck (timed out, stopped before `watch` was running), run both commands together. Each plays a distinct role:

- `kickass` kicks each stuck PR **one step at a time**: it uses `detect-step` to read live PR history and dispatches only the next needed step (review, fix, or recheck).
- `watch` owns **all continuation**: it listens for the webhooks each completed step produces and runs the full remaining pipeline from there.

```
crosscheck kickass
└─ ck run --trigger kickass   (one step; detect-step finds where to start)
   └─ detect-step → "review"    run review only  → posts comment
   └─ detect-step → "fix"       run fix only     → pushes commit
   └─ detect-step → "recheck"   run recheck only → posts verdict

crosscheck watch
├─ issue_comment (type=review) → pick up fix step automatically
└─ synchronize (fix commit)    → pick up recheck step automatically
```

> **Note:** `crosscheck run <pr>` invoked directly runs the **full remaining pipeline** from the detected starting step. The one-step behaviour above applies only when kickass dispatches it with `--trigger kickass`.

Start `watch` first, then run `kickass` in a second terminal:

```bash
# terminal 1
crosscheck watch

# terminal 2
crosscheck scan --force # refresh PR state
crosscheck kickass
```

> **How the review→fix bridge works:** after `kickass` posts a review comment, GitHub fires an `issue_comment` webhook (not a `pull_request` event). `watch` subscribes to `issue_comment` and, when it sees a crosscheck `type=review` annotation on an open PR, fetches the current PR head and runs the fix step automatically — no new commit required to wake it up. (Introduced in [#193](https://github.com/Motivation-Labs/crosscheck/pull/193).)
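
The bridge can be pictured as a small event router. The Python sketch below is an illustration, not Crosscheck's handler. The payload fields (`action`, `issue.pull_request`, `issue.state`, `comment.body`) follow GitHub's webhook payloads; everything else is assumed: `has_review_annotation` is a stand-in substring check because the real annotation format is not shown here, and `head_is_fix_commit` is a hypothetical flag the caller would compute from the PR head.

```python
def has_review_annotation(body: str) -> bool:
    """Stand-in check; the real crosscheck annotation format is not shown here."""
    return "type=review" in body


def step_for_event(event: str, payload: dict, head_is_fix_commit: bool = False):
    """Return the workflow step a webhook should trigger, or None to ignore it.

    `event` is the GitHub event name (the X-GitHub-Event header value).
    """
    if event == "issue_comment":
        issue = payload.get("issue", {})
        body = payload.get("comment", {}).get("body", "")
        # Issue comments fire for plain issues too; PRs carry a "pull_request" key.
        is_open_pr = "pull_request" in issue and issue.get("state") == "open"
        if payload.get("action") == "created" and is_open_pr and has_review_annotation(body):
            return "fix"
        return None
    if event == "pull_request" and payload.get("action") == "synchronize":
        # Assumption: a non-fix push starts a fresh review round.
        return "recheck" if head_is_fix_commit else "review"
    return None
```

The key point is the first branch: a review comment alone is enough to trigger the fix step, with no new commit needed.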

**Autonomous loop modes**

`--crazy` and `--half-crazy` turn `run` and `kickass` into autonomous fix→recheck loops that keep going until the verdict improves — no manual re-runs needed.

| Flag | Stops when | Max rounds | Timeout |
|---|---|---|---|
| `--crazy` 🔥🔥 | verdict = `APPROVE` | ∞ | none |
| `--half-crazy` 🔥 | verdict ≠ `BLOCK` | ∞ | none |

Both flags disable all reviewer subprocess timeout constraints, so long fixes on large PRs won't be cut short. Use `--timeout <duration>` (e.g. `--timeout 10m`) without these flags to set a custom cap.

```bash
# Run full workflow and keep looping until approved
crosscheck run --crazy

# Advance every stale PR until it's no longer blocked
crosscheck kickass --half-crazy

# Custom timeout without looping
crosscheck run --timeout 10m
```
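
The stop conditions in the table can be sketched as one loop. This Python sketch is a simplified model under stated assumptions, not Crosscheck's scheduler: `run_workflow` is a hypothetical name, and `review` and `fix_and_recheck` are caller-supplied callables standing in for the real agent steps.

```python
def run_workflow(review, fix_and_recheck, max_rounds: int = 1, mode=None):
    """Model review → (fix → recheck) × max_rounds with optional loop modes.

    review() returns the first verdict; fix_and_recheck(verdict) returns the
    verdict after one fix + recheck round. mode is None, "crazy", or
    "half-crazy". Returns (final_verdict, rounds_used).
    """
    def done(verdict: str) -> bool:
        if mode == "half-crazy":
            return verdict != "BLOCK"
        return verdict == "APPROVE"

    verdict = review()
    rounds = 0
    while not done(verdict):
        # Only the default mode honours max_rounds; loop modes ignore it.
        if mode is None and rounds >= max_rounds:
            break
        verdict = fix_and_recheck(verdict)
        rounds += 1
    return verdict, rounds
```

With a PR that goes `BLOCK` → `NEEDS_WORK` → `APPROVE`, the default mode with `max_rounds=1` stops at `NEEDS_WORK`, `half-crazy` also stops there because the PR is no longer blocked, and `crazy` continues to `APPROVE`.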

---

## Configuration

Crosscheck uses `~/.crosscheck/config.yml` by default. If that file exists, it wins over `./crosscheck.config.yml` unless you pass `--config ./crosscheck.config.yml`.
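
The precedence can be summarised in a short Python sketch. It is illustrative only: `resolve_config_path` is a hypothetical name, and falling back to the local file when the global one is absent is an inference from the sentence above rather than documented behaviour.

```python
from pathlib import Path
from typing import Optional


def resolve_config_path(explicit: Optional[str], home: Path, cwd: Path) -> Optional[Path]:
    """Pick the config file: --config first, then the global file, then the local one."""
    if explicit:
        return Path(explicit)
    global_config = home / ".crosscheck" / "config.yml"
    if global_config.exists():
        return global_config
    # Assumed fallback: use the project-local file only when no global file exists.
    local_config = cwd / "crosscheck.config.yml"
    return local_config if local_config.exists() else None
```

In practice this means a project-local `crosscheck.config.yml` is silently shadowed on machines that already have a global config, unless `--config` points at it.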

### Review depth (`quality.tier`)

```yaml
# crosscheck.config.yml
quality:
  tier: balanced # fast | balanced | thorough
```

| Tier | Claude model | Codex model | Latency |
|---|---|---|---|
| `fast` | Haiku | default | ~10s |
| `balanced` | Sonnet (default) | default | ~30s |
| `thorough` | Opus | default | ~60s |

### Pipeline (`workflow.yml`)

```yaml
steps:
  - name: review
    type: review
    reviewer: auto # auto | claude | codex | origin

  - name: fix
    type: fix
    reviewer: origin
    when: review.verdict != 'APPROVE'

  - name: recheck
    type: recheck
    reviewer: auto
    when: fix.applied_count > 0
```

### Config snapshot

```yaml
# ~/.crosscheck/config.yml
orgs:
  - your-org

routing:
  allowed_authors:
    - your-github-login

mode: cross-vendor # cross-vendor | single-vendor

vendors:
  claude:
    enabled: true
  codex:
    enabled: true

quality:
  tier: balanced

clone_protocol: ssh # ssh (default) | https
```

Full reference: [get-started.md](./get-started.md)

---

## Requirements

| | Minimum |
|---|---|
| Node.js | 18+ |
| Claude Code CLI | latest: `npm install -g @anthropic-ai/claude-code` |
| Codex CLI | latest: `npm install -g @openai/codex` |
| GitHub CLI | 2.65+: `brew install gh` |

`GITHUB_TOKEN` is derived automatically from `gh auth login`. No manual export needed.

---

## Documentation

| | |
|---|---|
| **[get-started.md](./get-started.md)** | Full setup guide: prerequisites, all flags, complete config reference, FAQ |
| **[What 295 Agentic PRs Taught Us About Code Review](https://blog.humanbased.ai/posts/agentic-pr-quality-crosscheck/)** | Humanbased field report on agentic PR quality, review routing, and why Crosscheck exists |
| **[docs/fixture-pr.md](./docs/fixture-pr.md)** | Safe public fixture PR for the first Crosscheck review |
| **[crosscheck.config.example.yml](./crosscheck.config.example.yml)** | Annotated config with every option |
| **[CHANGELOG.md](./CHANGELOG.md)** | Release notes |

---

## Contributing

Issues and PRs welcome at [github.com/humanbased-ai/crosscheck](https://github.com/humanbased-ai/crosscheck).

---

## License

[MIT](./LICENSE). Copyright (c) 2025–2026 Humanbased PTE LTD.