https://github.com/sukhrobnurali/pr-review-agent
Multi-agent PR reviewer on LangGraph — four specialists (security, performance, tests, quality) review in parallel. Self-hostable, model-agnostic. GitHub Action + CLI + Python library.
https://github.com/sukhrobnurali/pr-review-agent
code-review github-action langchain langgraph llm multi-agent pr-review python
Last synced: 4 days ago
JSON representation
Multi-agent PR reviewer on LangGraph — four specialists (security, performance, tests, quality) review in parallel. Self-hostable, model-agnostic. GitHub Action + CLI + Python library.
- Host: GitHub
- URL: https://github.com/sukhrobnurali/pr-review-agent
- Owner: sukhrobnurali
- License: apache-2.0
- Created: 2026-05-09T22:38:06.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-05-10T00:20:50.000Z (about 2 months ago)
- Last Synced: 2026-06-20T11:32:40.096Z (8 days ago)
- Topics: code-review, github-action, langchain, langgraph, llm, multi-agent, pr-review, python
- Language: Python
- Homepage: https://github.com/marketplace/actions/multi-agent-pr-reviewer
- Size: 360 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Agents: docs/agents.md
Awesome Lists containing this project
README
# pr-review-agent
> Multi-agent pull-request reviewer built on LangGraph. Specialist agents (security, performance, tests, quality) review a PR in parallel; a supervisor synthesizes a structured, severity-ranked review comment.
Ships as a **GitHub Action**, a **CLI**, and a **Python library**. Model-agnostic — runs against OpenAI, Anthropic, or any OpenAI-compatible endpoint (Ollama, vLLM, internal gateway). Open-source, self-hostable, no data egress.
[](https://github.com/marketplace/actions/multi-agent-pr-reviewer)
[](https://github.com/sukhrobnurali/pr-review-agent/actions/workflows/ci.yml)
[](LICENSE)
[](pyproject.toml)
## How it works
```
diff + PR metadata
│
▼
┌───────────────────┐
│ supervisor │ picks specialists from
│ (path + keyword │ enabled registry; quality
│ heuristics) │ + tests always run
└────────┬──────────┘
│ selected
▼
┌───────────┴───────────┐
│ parallel fan-out │
▼ ▼ ▼ ▼
Quality Tests Perf Security each returns findings + cost
└─────┴─────┴─────┘
│
▼
┌───────────────────┐
│ dedup + rank by │
│ severity │
└────────┬──────────┘
│
▼
┌───────────────────┐
│ compose markdown │
│ + post (or │ idempotent via
│ update) comment │ marker
└───────────────────┘
```
Specialist scoping is enforced in the prompt — the security agent doesn't comment on naming, the quality agent doesn't comment on N+1 queries. Parallel fan-out keeps the wall-clock floor at one slow call, not four.
## Quick start
### Drop into a repo as a GitHub Action
```yaml
# .github/workflows/pr-review.yml
name: PR Review
on:
pull_request:
types: [opened, synchronize, reopened]
permissions:
contents: read
pull-requests: write
jobs:
review:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: sukhrobnurali/pr-review-agent@v0.1.0
with:
provider: openai
model: gpt-4o-mini
api_key: ${{ secrets.OPENAI_API_KEY }}
```
That's it — the bot reviews every PR open / sync, posts a single comment, updates it on subsequent pushes.
### CLI
```bash
pip install pr-review-agent
export OPENAI_API_KEY=sk-...
export GITHUB_TOKEN=ghp-...
pr-review-agent review acme/widgets#42 # post the review
pr-review-agent review acme/widgets#42 --dry-run # print, don't post
pr-review-agent review --diff-file change.patch # local, no GitHub call
```
More recipes in [`examples/cli-usage.md`](examples/cli-usage.md).
### Library
```python
import asyncio
from pr_review_agent import review_pr
async def main():
out = await review_pr("acme/widgets#42")
print(out.final_comment)
print(f"cost: ${out.cost_usd:.4f}")
asyncio.run(main())
```
Full example: [`examples/library-usage.py`](examples/library-usage.py).
## Sample output
The agent reviewing a real PR on this repo — `gpt-4o-mini`, **15 findings, run
cost $0.0089**, posted by `github-actions[bot]` on PR open:


Live demo PR: [`#1` (deliberately bad, do not merge)](https://github.com/sukhrobnurali/pr-review-agent/pull/1).
Raw Markdown of an earlier fixture run is at [`scripts/dev/last_review.md`](scripts/dev/last_review.md)
if you want to see exactly what the composer emits.
Specialist scoping is enforced in the prompts — security flags injection,
quality flags nesting, tests flags missing coverage; none of them step on each
other. The comment updates idempotently via the ``
HTML marker, so re-runs on the same PR overwrite instead of duplicating.
## Why another PR reviewer
| | |
|---|---|
| **CodeRabbit / Greptile / Bito / Qodo** | Closed SaaS. Code leaves your network. Most enterprises can't ship to a third party. |
| **Generic "GPT review" Actions** | Single-prompt, one specialist's worth of attention spread across every concern. Output reads as filler. |
| **GitHub Copilot review** | Microsoft-hosted, opinionated, can't be steered. |
| **Claude Code / Cursor** | Interactive, editor-side. Don't run unattended on PRs. |
| **`pr-review-agent`** | Open-source, self-hosted, multi-agent, model-agnostic. Works with local models for air-gapped use. |
The architecture is the differentiator. Four specialists in parallel produce far better signal than one prompt asked to do everything — and the prompts are flat Markdown files you can iterate on without touching code.
## Cost & latency
Per-review cost depends on the diff size and which models the supervisor selects. The hard ceiling is enforced in CI: a 500-line synthetic PR fanned out across all four specialists must come in under **$0.50** and **90 seconds**, parametrized over OpenAI / Anthropic / Ollama — see [`tests/integration/test_cost_under_cap.py`](tests/integration/test_cost_under_cap.py).
In practice the cost is dominated by input tokens (the diff itself), so a small PR on `gpt-4o-mini` is typically a fraction of a cent. The composer renders the actual cost in the comment footer of every review, so the number is never hidden.
Wall-clock latency is bounded by the slowest single specialist call rather than the sum, because the four agents fan out in parallel — see [architecture.md → graph](docs/architecture.md#graph) and [`tests/integration/test_parallel_fanout.py`](tests/integration/test_parallel_fanout.py).
## Configuration
Drop a `.pr-review.yml` at your repo root:
```yaml
provider: openai
model: gpt-4o-mini
agents:
enabled: [quality, tests, performance, security]
models:
security: gpt-4o # use a smarter model where false positives hurt most
severity_threshold: medium
include_paths: ["**/*"]
exclude_paths: ["vendor/**", "**/*.generated.*"]
```
Full schema: [`docs/configuration.md`](docs/configuration.md).
## Documentation
| | |
|---|---|
| [`docs/architecture.md`](docs/architecture.md) | how the graph, supervisor, agents, cache, and reporter fit together |
| [`docs/agents.md`](docs/agents.md) | per-specialist scope, severity calibration, supervisor selection rules |
| [`docs/configuration.md`](docs/configuration.md) | full `.pr-review.yml` reference, provider switching, path filters |
| [`docs/prompts.md`](docs/prompts.md) | prompt format, output schema, iteration recipe |
| [`docs/benchmarks.md`](docs/benchmarks.md) | eval methodology and v0.1.0 baseline numbers |
| [`docs/landscape.md`](docs/landscape.md) | competitor survey and where this project fits |
## Hacking
```bash
git clone https://github.com/sukhrobnurali/pr-review-agent
cd pr-review-agent
uv sync --extra dev
uv run pytest # full suite (incl. eval benchmark)
uv run pytest -m "not eval" # fast loop
uv run pytest -m eval -s # benchmark with per-fixture report
uv run ruff check
uv run mypy src/
```
Adding a new specialist takes ~6 lines of Python plus a prompt — see [`docs/agents.md` → Adding a specialist](docs/agents.md#adding-a-specialist).
## License
[Apache 2.0](LICENSE). Patent grant matters for code-touching tools.