https://github.com/orenlab/codeclone
Deterministic structural code quality analysis for Python with baseline-aware governance, canonical reporting, and an optional MCP interface for agents and IDEs.
agentic-development ast baseline ci clone-detection code-quality dead-code deterministic-reporting mcp python quality-gates software-metrics static-analysis
- Host: GitHub
- URL: https://github.com/orenlab/codeclone
- Owner: orenlab
- License: mpl-2.0
- Created: 2026-01-15T17:13:40.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-04-02T15:54:30.000Z (about 2 months ago)
- Last Synced: 2026-04-03T01:42:57.648Z (about 2 months ago)
- Topics: agentic-development, ast, baseline, ci, clone-detection, code-quality, dead-code, deterministic-reporting, mcp, python, quality-gates, software-metrics, static-analysis
- Language: Python
- Homepage: https://orenlab.github.io/codeclone/
- Size: 2.51 MB
- Stars: 5
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: SECURITY.md
- Agents: AGENTS.md
Structural code quality analysis for Python
---
CodeClone provides deterministic structural code quality analysis for Python.
It detects architectural duplication, computes quality metrics, and enforces CI gates — all with **baseline-aware
governance** that separates **known** technical debt from **new** regressions.
An optional MCP interface exposes the same canonical analysis pipeline to AI agents and IDEs.
Docs: [orenlab.github.io/codeclone](https://orenlab.github.io/codeclone/) ·
Live sample report:
[orenlab.github.io/codeclone/examples/report/](https://orenlab.github.io/codeclone/examples/report/)
> [!NOTE]
> This README and docs site track the in-development `v2.0.x` line from `main`.
> For the latest stable CodeClone documentation (`v1.4.4`), see the
> [`v1.4.4` README](https://github.com/orenlab/codeclone/blob/v1.4.4/README.md)
> and the
> [`v1.4.4` docs tree](https://github.com/orenlab/codeclone/tree/v1.4.4/docs).
## Features
- **Clone detection** — function (CFG fingerprint), block (statement windows), and segment (report-only) clones
- **Structural findings** — duplicated branch families, clone guard/exit divergence and clone-cohort drift (report-only)
- **Quality metrics** — cyclomatic complexity, coupling (`CBO`), cohesion (`LCOM4`), dependency cycles, dead code,
health score, and report-only `Overloaded Modules` profiling
- **Baseline governance** — separates accepted **legacy** debt from **new regressions** and lets CI fail **only** on
what changed
- **Reports** — interactive HTML, deterministic JSON/TXT plus Markdown and SARIF projections from one canonical report
- **MCP server** — optional read-only interface for AI agents and IDEs, designed as a budget-aware, guided control
  surface for agentic development
- **VS Code extension** — preview native client for CodeClone MCP with triage-first structural review
- **Native client surfaces** — preview Claude Desktop bundle and Codex plugin over the same canonical MCP contract
- **CI-first** — deterministic output, stable ordering, exit code contract, pre-commit support
- **Fast** — incremental caching, parallel processing, warm-run optimization, and reproducible benchmark coverage
## Quick Start
```bash
uv tool install codeclone # use --pre for beta
codeclone . # analyze
codeclone . --html # HTML report
codeclone . --html --open-html-report # open in browser
codeclone . --json --md --sarif --text # all formats
codeclone . --ci # CI mode
```
More examples
```bash
# timestamped report snapshots
codeclone . --html --json --timestamped-report-paths
# changed-scope gating against git diff
codeclone . --changed-only --diff-against main
# shorthand: diff source for changed-scope review
codeclone . --paths-from-git-diff HEAD~1
```
Run without install
```bash
uvx codeclone@latest .
```
## CI Integration
```bash
# 1. Generate baseline (commit to repo)
codeclone . --update-baseline
# 2. Add to CI pipeline
codeclone . --ci
```
What --ci enables
The `--ci` preset is equivalent to `--fail-on-new --no-color --quiet`.
When a trusted metrics baseline is loaded, CI mode also enables
`--fail-on-new-metrics`.
### GitHub Action
CodeClone also ships a composite GitHub Action for PR and CI workflows:
```yaml
- uses: orenlab/codeclone/.github/actions/codeclone@main
  with:
    fail-on-new: "true"
    sarif: "true"
    pr-comment: "true"
```
It can:
- run baseline-aware gating
- generate JSON and SARIF reports
- upload SARIF to GitHub Code Scanning
- post or update a PR summary comment
Action docs:
[.github/actions/codeclone/README.md](https://github.com/orenlab/codeclone/blob/main/.github/actions/codeclone/README.md)
### Quality Gates
```bash
# Metrics thresholds
codeclone . --fail-complexity 20 --fail-coupling 10 --fail-cohesion 4 --fail-health 60
# Structural policies
codeclone . --fail-cycles --fail-dead-code
# Regression detection vs baseline
codeclone . --fail-on-new-metrics
```
### Pre-commit
```yaml
repos:
  - repo: local
    hooks:
      - id: codeclone
        name: CodeClone
        entry: codeclone
        language: system
        pass_filenames: false
        args: [ ".", "--ci" ]
        types: [ python ]
```
## MCP Server
Optional read-only MCP server for AI agents and IDE clients.
21 tools + 10 resources — never mutates source, baselines, or repo state.
```bash
uv tool install --pre "codeclone[mcp]" # or: uv pip install --pre "codeclone[mcp]"
codeclone-mcp --transport stdio # local (Claude Code, Codex, Copilot, Gemini CLI)
codeclone-mcp --transport streamable-http # remote / HTTP-only clients
```
Docs:
[MCP usage guide](https://orenlab.github.io/codeclone/mcp/)
·
[MCP interface contract](https://orenlab.github.io/codeclone/book/20-mcp-interface/)
### Native Client Surfaces
| Surface | Location | Purpose |
|---------------------------|------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------|
| **VS Code extension** | [VS Code Marketplace](https://marketplace.visualstudio.com/items?itemName=orenlab.codeclone) | Triage-first structural review in the editor |
| **Claude Desktop bundle** | [`extensions/claude-desktop-codeclone/`](https://github.com/orenlab/codeclone/tree/main/extensions/claude-desktop-codeclone) | Local `.mcpb` install with pre-loaded instructions |
| **Codex plugin** | [`plugins/codeclone/`](https://github.com/orenlab/codeclone/tree/main/plugins/codeclone) | Native discovery, two skills, and MCP definition |
All three are thin wrappers over the same `codeclone-mcp` contract — no second analysis engine.
## Configuration
CodeClone can load project-level configuration from `pyproject.toml`:
```toml
[tool.codeclone]
min_loc = 10
min_stmt = 6
baseline = "codeclone.baseline.json"
skip_metrics = false
quiet = false
html_out = ".cache/codeclone/report.html"
json_out = ".cache/codeclone/report.json"
md_out = ".cache/codeclone/report.md"
sarif_out = ".cache/codeclone/report.sarif"
text_out = ".cache/codeclone/report.txt"
block_min_loc = 20
block_min_stmt = 8
segment_min_loc = 20
segment_min_stmt = 10
```
Precedence: CLI flags > `pyproject.toml` > built-in defaults.
## Baseline Workflow
Baselines capture the current clone and metrics state. Once committed, they become the reference point for CI.
- Clones are classified as **NEW** (not in baseline) or **KNOWN** (accepted debt)
- `--update-baseline` writes both clone and metrics snapshots
- Trust is verified via `generator`, `fingerprint_version`, and `payload_sha256`
- In `--ci` mode, an untrusted baseline is a contract error (exit 2)
Full contract: [Baseline contract](https://orenlab.github.io/codeclone/book/06-baseline/)
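A minimal sketch of the NEW/KNOWN split and the trust check described above. The baseline layout (a `payload` holding a `clones` list of fingerprints), the expected `generator` and `fingerprint_version` values, and hashing `payload_sha256` over sorted-key compact JSON are all assumptions made for illustration; the real schema and hashing rules are defined in the baseline contract:

```python
import hashlib
import json
from typing import Any

# Hypothetical expected values; the real ones come from the baseline contract.
EXPECTED_GENERATOR = "codeclone"
EXPECTED_FINGERPRINT_VERSION = "1"


def payload_digest(payload: Any) -> str:
    """SHA-256 over an assumed canonical JSON encoding (sorted keys, no spaces)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_trusted(baseline: dict[str, Any]) -> bool:
    """Trust requires matching generator, fingerprint version, and payload hash."""
    return (
        baseline.get("generator") == EXPECTED_GENERATOR
        and baseline.get("fingerprint_version") == EXPECTED_FINGERPRINT_VERSION
        and baseline.get("payload_sha256") == payload_digest(baseline.get("payload"))
    )


def classify(current: set[str], baseline: dict[str, Any]) -> dict[str, set[str]]:
    """Split current clone fingerprints into NEW (not in baseline) and KNOWN."""
    known = set(baseline["payload"]["clones"])
    return {"new": current - known, "known": current & known}
```

Any edit to the payload without recomputing the hash makes `is_trusted` return `False`; per the list above, CodeClone treats an untrusted baseline in `--ci` mode as a contract error (exit 2).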
## Exit Codes
| Code | Meaning |
|------|-------------------------------------------------------------------------------|
| `0` | Success |
| `2` | Contract error — untrusted baseline, invalid config, unreadable sources in CI |
| `3` | Gating failure — new clones or metric threshold exceeded |
| `5` | Internal error |
Contract errors (`2`) take precedence over gating failures (`3`).
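The precedence rule can be modeled as a small helper, for example in a custom CI wrapper. `ExitCode`, `final_exit_code`, and `run_ci_gate` are hypothetical names; only the numeric codes and the `codeclone . --ci` invocation come from this README:

```python
import subprocess
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    CONTRACT_ERROR = 2
    GATING_FAILURE = 3
    INTERNAL_ERROR = 5


def final_exit_code(*, contract_error: bool, gating_failure: bool) -> ExitCode:
    """Model the documented rule: contract errors win over gating failures."""
    if contract_error:
        return ExitCode.CONTRACT_ERROR  # checked first, so it takes precedence
    if gating_failure:
        return ExitCode.GATING_FAILURE
    return ExitCode.SUCCESS


def run_ci_gate(root: str = ".") -> ExitCode:
    """Run `codeclone <root> --ci` (requires codeclone on PATH).

    Raises ValueError if the process returns a code not listed in the table.
    """
    result = subprocess.run(["codeclone", root, "--ci"], check=False)
    return ExitCode(result.returncode)
```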
## Reports
| Format | Flag | Default path |
|----------|-----------|---------------------------------|
| HTML | `--html` | `.cache/codeclone/report.html` |
| JSON | `--json` | `.cache/codeclone/report.json` |
| Markdown | `--md` | `.cache/codeclone/report.md` |
| SARIF | `--sarif` | `.cache/codeclone/report.sarif` |
| Text | `--text` | `.cache/codeclone/report.txt` |
All report formats are rendered from one canonical JSON report document.
- `--open-html-report` opens the generated HTML report in the default browser and requires `--html`.
- `--timestamped-report-paths` appends a UTC timestamp to default report filenames for bare report flags such as
`--html` or `--json`. Explicit report paths are not rewritten.
The docs site also includes live example HTML/JSON/SARIF reports generated from the current `codeclone` repository.
Structural findings include:
- `duplicated_branches`
- `clone_guard_exit_divergence`
- `clone_cohort_drift`
### Inline Suppressions
CodeClone keeps dead-code detection deterministic and static by default. When a symbol is intentionally
invoked through runtime dynamics (for example framework callbacks, plugin loading, or reflection), suppress
the known false positive explicitly at the declaration site:
```python
# codeclone: ignore[dead-code]
def handle_exception(exc: Exception) -> None:
    ...


class Middleware:  # codeclone: ignore[dead-code]
    ...
```
Dynamic/runtime false positives are resolved via explicit inline suppressions, not via broad heuristics.
Canonical JSON report shape (v2.3)
```json
{
  "report_schema_version": "2.3",
  "meta": {
    "codeclone_version": "2.0.0b4",
    "project_name": "...",
    "scan_root": ".",
    "report_mode": "full",
    "analysis_thresholds": {
      "design_findings": {
        "...": "..."
      }
    },
    "baseline": {
      "...": "..."
    },
    "cache": {
      "...": "..."
    },
    "metrics_baseline": {
      "...": "..."
    },
    "runtime": {
      "analysis_started_at_utc": "...",
      "report_generated_at_utc": "..."
    }
  },
  "inventory": {
    "files": {
      "...": "..."
    },
    "code": {
      "...": "..."
    },
    "file_registry": {
      "encoding": "relative_path",
      "items": []
    }
  },
  "findings": {
    "summary": {
      "...": "..."
    },
    "groups": {
      "clones": {
        "functions": [],
        "blocks": [],
        "segments": []
      },
      "structural": {
        "groups": []
      },
      "dead_code": {
        "groups": []
      },
      "design": {
        "groups": []
      }
    }
  },
  "metrics": {
    "summary": {},
    "families": {}
  },
  "derived": {
    "suggestions": [],
    "overview": {
      "families": {},
      "top_risks": [],
      "source_scope_breakdown": {},
      "health_snapshot": {},
      "directory_hotspots": {}
    },
    "hotlists": {
      "most_actionable_ids": [],
      "highest_spread_ids": [],
      "production_hotspot_ids": [],
      "test_fixture_hotspot_ids": []
    }
  },
  "integrity": {
    "canonicalization": {
      "version": "1",
      "scope": "canonical_only"
    },
    "digest": {
      "algorithm": "sha256",
      "verified": true,
      "value": "..."
    }
  }
}
```
Canonical contract: [Report contract](https://orenlab.github.io/codeclone/book/08-report/) and
[Dead-code contract](https://orenlab.github.io/codeclone/book/16-dead-code-contract/)
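Because every format is projected from the canonical JSON document, scripts can read `report.json` directly. A small sketch that counts finding groups per family, relying only on the `findings.groups` layout of the v2.3 shape above; `summarize_report` is a hypothetical helper, not part of CodeClone:

```python
import json
from pathlib import Path


def summarize_report(path: Path) -> dict[str, int]:
    """Count finding groups per family in a canonical JSON report (v2.3 shape)."""
    report = json.loads(path.read_text(encoding="utf-8"))
    groups = report["findings"]["groups"]
    # Clone groups are split by kind: functions, blocks, segments.
    counts = {f"clones.{kind}": len(items) for kind, items in groups["clones"].items()}
    for family in ("structural", "dead_code", "design"):
        counts[family] = len(groups[family]["groups"])
    return counts
```

After `codeclone . --json`, calling `summarize_report(Path(".cache/codeclone/report.json"))` reads the report from its default path.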
## How It Works
1. **Parse** — Python source to AST
2. **Normalize** — canonical structure (robust to renaming, formatting)
3. **CFG** — per-function control flow graph
4. **Fingerprint** — stable hash computation
5. **Group** — function, block, and segment clone groups
6. **Metrics** — complexity, coupling, cohesion, dependencies, dead code, health
7. **Gate** — baseline comparison, threshold checks
Architecture: [Architecture narrative](https://orenlab.github.io/codeclone/architecture/) ·
CFG semantics: [CFG semantics](https://orenlab.github.io/codeclone/cfg/)
## Documentation
| Topic | Link |
|----------------------------|-----------------------------------------------------------------------------------------------------|
| Contract book (start here) | [Contracts and guarantees](https://orenlab.github.io/codeclone/book/00-intro/) |
| Exit codes | [Exit codes and failure policy](https://orenlab.github.io/codeclone/book/03-contracts-exit-codes/) |
| Configuration | [Config and defaults](https://orenlab.github.io/codeclone/book/04-config-and-defaults/) |
| Baseline contract | [Baseline contract](https://orenlab.github.io/codeclone/book/06-baseline/) |
| Cache contract | [Cache contract](https://orenlab.github.io/codeclone/book/07-cache/) |
| Report contract | [Report contract](https://orenlab.github.io/codeclone/book/08-report/) |
| Metrics & quality gates | [Metrics and quality gates](https://orenlab.github.io/codeclone/book/15-metrics-and-quality-gates/) |
| Dead code | [Dead-code contract](https://orenlab.github.io/codeclone/book/16-dead-code-contract/) |
| Docker benchmark contract | [Benchmarking contract](https://orenlab.github.io/codeclone/book/18-benchmarking/) |
| Determinism | [Determinism policy](https://orenlab.github.io/codeclone/book/12-determinism/) |
## Benchmarking Notes
Reproducible Docker Benchmark
```bash
./benchmarks/run_docker_benchmark.sh
```
The wrapper builds `benchmarks/Dockerfile`, runs isolated container benchmarks, and writes results to
`.cache/benchmarks/codeclone-benchmark.json`.
Use environment overrides to pin the benchmark envelope:
```bash
CPUSET=0 CPUS=1.0 MEMORY=2g RUNS=16 WARMUPS=4 \
./benchmarks/run_docker_benchmark.sh
```
Performance claims are backed by the reproducible benchmark workflow documented
in [Benchmarking contract](https://orenlab.github.io/codeclone/book/18-benchmarking/).
## License
- **Code:** MPL-2.0
- **Documentation:** MIT
Versions released before this change remain under their original license terms.
## Links
- **Issues:**
- **PyPI:**
- **Licenses:** [MPL-2.0](LICENSE) · [MIT docs](LICENSE-docs)