https://github.com/charleschenai/codemap

Static codebase + binary analyzer and decompiler. Decompiles stripped PE/ELF/Mach-O to readable, behaviorally-verified C — structs, arrays, strings, C++ vtables and try/catch. 524 actions, single Rust binary, zero deps.
codebase-analysis codemap dependency-analysis graph-theory static-analysis


# codemap

> Static codebase + binary analyzer, **decompiler, and patcher**. One binary, 610 actions, 18 source languages, PE/ELF/Mach-O/WASM decompilation to readable **recompilable** C on **x86/x64, ARM64, RISC-V, and WebAssembly**, sub-second cold-cache on 3K-file repos. **No network, no servers, no databases, no API keys.**

**This README is your system prompt.** Designed for AI agents: drop the entire file into your context (or fetch `https://raw.githubusercontent.com/charleschenai/codemap/main/README.md`) and you have everything you need — what codemap is, when to use it, how to install it, how to call every category of action, output schemas, exit codes, MCP setup. No further docs required for 95% of usage. Humans: see [`docs/HUMAN.md`](docs/HUMAN.md). Everyone else, keep reading.

**Mission:** Break down CODE (source + binary) so AI can replicate it.

## What's new in v8.21 — the 40-topic grind complete (610 actions)

Two full 20-topic roadmaps (THIRD + FOURTH) have landed. THIRD-20 (all real, gate-verified): transplant, translate, fingerprint, hot-patch, api-shim, size-opt, multi-refactor, fuzz-harness, instrument, visual-docs, vuln-discover, protocol-rec, vectorize, ml-patch, jit-resolve, self-rewrite, gpu-lift, kernel-rewrite, mobile-fuse, os-map. FOURTH-20 mediums (real): self-bench, eval-suite, lasm, worm-defense, pear-fuzz, pqc-translate, ref-decompile. The FOURTH-20 deep tier ships as **honest skeletons**: prove-rewrite, llm-decompile, superset-decompile, proof-patch, sys-sim, self-improve-demo, meta-evolve, zk-attest, gpu-rewrite. Each emits obligations, prompt packs, specs, or plans and explicitly states that its heavy backend (Coq/Lean verifier, LLM, CPU emulator, ZK prover, full superset engine, GPU recompiler) is NOT integrated, so none of them ever claims a result is verified or proven.

## What's new in v8.13 — the autonomous + verifiable engine

codemap is now an **autonomous, self-improving, verifiable** security engine. The decompiler covers **five architectures** end-to-end and the action arsenal composes into goal-driven, no-human loops.

**Multi-arch decompiler COMPLETE.** `decompile`/`ir` produce readable, recompilable C for **x86, x64, ARM64 (incl NEON/SIMD), RISC-V (RV64GC incl compressed/M/A), and WebAssembly** — all through the same lift → SSA → type/var recovery → SAILR structuring → C pipeline.

**The Autonomous lane (new actions):**
- **`run`** — *agentic mode*: `codemap run goal= ` runs a deterministic, **offline, no-LLM** PLAN→ACT→OBSERVE→VERIFY→REPORT loop that composes existing actions into a DAG, threads one graph, is budget/step-capped, emits JSON, and only marks a finding *fixed* if the patch recompiles + re-validates.
- **`learn`** — *self-improving*: records what-worked from each run into a project-brain store; `run`'s planner consults it to tune the DAG over time. The loop is closed — planning improves with usage, no code changes.
- **`redteam`** — autonomous offensive campaign (taint → symbolic → ranked PoC bundle + report).
- **`infer-spec ... export=acsl|lean`** — machine-checkable proof export (Frama-C ACSL + Lean/Coq), so patches are *provable*, not just plausible.
- **`provenance`** — signed, tamper-evident manifests for patched/twinned/hardened artifacts.
- **`pqc-migrate`** — detect quantum-vulnerable crypto → apply NIST PQC (ML-KEM/ML-DSA/SLH-DSA) → equivalence note.
- **`deobfuscate`** — the inverse of `harden`: de-flatten CFG, crack opaque predicates, decrypt strings via symbolic + graph.

Plus, across the roadmap: `binary-twin` (cleanroom fork), `xlang-graph` (cross-language call fusion), `to-rust` (C→idiomatic Rust), `replay` (record/replay + mutation), `what-if` (change-impact), `firmware`, `sbom-flow`, `crypto-audit`, `model-extract`, `game-assets`, `brain-lock`. **610 actions.**

## What's new in v8.4 — multi-arch decompiler + the three strategic arms

v8.4 pushes the v8.3 decompiler in two directions: **multi-architecture** (it now produces recompilable C for **ARM64/AArch64**, not just x86/x64) and the first increments of the three strategic arms that turn codemap from *read-only intelligence* into a full **understand → reason → change** platform.

**v8.4.0 new actions (Phase-1+ across roadmap topics):** `project-brain` (persistent project memory + git-history what-changed), `infer-spec` (formal pre/post/invariant inference → ACSL + Rust contracts, Daikon-style templates), `c-diff` (graph-aware decompiled-C diff with call-graph change propagation), `ci` (binary CI/CD attack-surface gate), `vuln-backport` (CVE patch → older-binary backport locator). ARM64 decompilation hardened: recursion, switch recovery, emit cleanup, recursive-call returns. **610 actions.**

- **ARM64 / AArch64 decompiler.** ARM64 Mach-O now disassembles (Capstone-backed `Arm64Lifter`; function sizing from `LC_FUNCTION_STARTS`) and lifts through the same IR pipeline as x86 — `codemap ir ` emits readable C with recovered args (AAPCS64 `x0`–`x7`), real calls (recursion is a `call`, not an `asm` comment), and frame/`sp` modeling. **`--verify` PASS on ARM64**, not just x86: both arches decompile → recompile cleanly.
- **`ir --verify` — recompile gate, first-class.** `codemap ir --verify` writes the emitted C to a temp file and runs a host C compiler on it, reporting **PASS / FAIL** — ground truth that the decompilation is *recompilable*, not just plausible. The backbone of codemap's verify-by-running discipline.
- **Arm 1 — Binary patching.** `bin-patch-fn`: surgical, layout-preserving **in-place function patching** (canned stubs `ret0`/`ret1`/`ret`/`nop` or raw hex), fits-gated, verified by re-disassembly. Neutralize a check (`bin-patch-fn ./app check_license ret1`) without touching any other offset / reloc / string. (The decompile → edit-C → recompile → relink loop is the next increment.)
- **Arm 2 — Symbolic / concolic.** `concolic`: an interval constraint solver over the SSA-IR branch guards (no SMT dependency) — per path it reports **SAT** (with a concrete register seed that drives execution down it), **DEAD** (contradictory guards → opaque-predicate / dead-code signal), or **PARTIAL**. Concrete concolic seeds in the default build.
- **Arm 3 — Dynamic bridge.** `trace-plan`: uses the code-property graph to choose a *selective* instrumentation scope (entry, call sites, dangerous sinks, loop heads — not every instruction) and emits a ready-to-run, ABI-aware GDB script. Drive it with `concolic` seeds; ingest the trace with `runtime-merge`.
- **Graph fusion — cross-binary name recovery.** `name-recovery` recovers a stripped binary's anonymous `sub_` names by matching them (40-dim structural fingerprint, cosine, greedy 1:1) to *named* functions in a reference binary, fusing the recovered names into the graph. Exact on same-build; honest-partial across optimization levels.
- **Decompiler correctness sweep.** Fixed multi-block **argument recovery** (args flowing across a loop/branch were emitted `void` with use-before-def; now seeded at the SSA entry from the calling convention), **struct-field deref** (`p->x`), and **2D-array index** (`m[i*cols+j]`) — all now recompile.
- **Built for the AI-agent customer.** `agent-brief` (one-page high-signal map of a codebase), `search` (relevance-ranked discovery across 610 actions), `graph-export` (Graphviz / Mermaid / **Cytoscape JSON** / interactive HTML). Plus human onboarding: `cargo binstall`, a Homebrew formula, and a [`docs/HUMAN.md`](docs/HUMAN.md) quickstart.
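The `name-recovery` bullet describes cosine similarity over structural fingerprints plus a greedy 1:1 assignment. Below is a minimal Python sketch of just that matching step, assuming the fingerprints are already computed. The short vectors, function names, and the `min_score` cutoff are hypothetical stand-ins for codemap's 40-dim features, not its actual code:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def greedy_match(stripped: dict[str, list[float]],
                 reference: dict[str, list[float]],
                 min_score: float = 0.9) -> dict[str, str]:
    """Greedy 1:1 assignment: accept the highest-scoring pairs first and
    never reuse a stripped function or a reference name."""
    pairs = sorted(
        ((cosine(fs, fr), s, r)
         for s, fs in stripped.items()
         for r, fr in reference.items()),
        reverse=True,
    )
    used_s, used_r, names = set(), set(), {}
    for score, s, r in pairs:
        if score < min_score:  # hypothetical cutoff: leave weak matches unnamed
            break
        if s in used_s or r in used_r:
            continue
        names[s] = r
        used_s.add(s)
        used_r.add(r)
    return names
```

Greedy assignment is not globally optimal (the Hungarian algorithm is), but it is simple and guarantees each reference name is used at most once.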

## What's new in v8.3 — the graph-fused decompiler

v8.3 (through `8.3.5`) turns codemap's binary side into a real **decompiler**: lift → SSA → DCE/copy-prop → type & variable recovery → SAILR structuring → readable, recompilable C. It went from "finds 1 function in a stripped PE" to:

- **Full binary coverage.** PE (x86/x64), ELF (x86/x64/ARM/AArch64), and **Mach-O x86-64** — function discovery via PE `.pdata` `RUNTIME_FUNCTION`, ELF symbols/`.eh_frame`, and Mach-O `LC_FUNCTION_STARTS`.
- **Readable C reconstruction.** Recovered **structs** (`p->field` with synthesized typedefs), **arrays** (`a[i]`), **string literals** (`return "hello world"`), **float/XMM ABI** params & returns (SysV + Win64), **C++ virtual calls** (`obj->vfunc_0()`), and clean control flow on `-O2` (no goto-soup).
- **C++ exception recovery.** Idiomatic `try { … } catch (int e) { … }` reconstructed from a stripped binary's `.eh_frame` + `.gcc_except_table` — **including the caught type**, demangled from the LSDA type table. Most decompilers drop the handler entirely or render it as goto-soup.
- **Correctness, not just readability.** Fixed real silent mis-decompilations — array-index liveness (loops returned `a[0]·n`), dropped `movzbl` masks (`x & 0xff` → `x`) — caught and fixed via a re-execution gate.
- **Behaviorally verified.** Every change is gated on a **79-binary recompilability corpus** + a **G10 re-execution harness** (decompile → recompile → run → diff): recovered code is behavior-identical on the scalar subset, not just plausible-looking.
- **Graph-fused.** Decompiled functions feed codemap's heterogeneous code-property graph, so its dataflow / taint / call-graph / centrality analyses run on **stripped binaries**, not just source.

## What's new in v8

v8 cuts the v7 series at `7.184.0` (2026-05-18) and turns over to `8.0.0` (2026-05-20). Headline themes:

- **Action registry complete (T1).** Every action self-registers via `inventory::submit!`; `actions/mod.rs` has zero dispatch arms (catch-all `_ => Err(UnknownAction)` only). Adding a new action is a single submit-block edit in the owning module file.
- **iced-x86 linear-sweep precision (T3).** All `bin_text_*` density actions disassemble via iced-x86 instead of raw byte-scans — eliminates instruction-boundary false positives.
- **Lint zero (T8).** `#![deny(warnings)]` locked into `codemap-core` and `codemap-cli`; `cargo clippy -- -D warnings` ships at 0 warnings.
- **arXiv research: filter scaffolds, ship real work (T9).** `pointer-analysis` (Andersen field-sensitive PA + Tarjan SCC) and `cegio` (rsmt2-driven SMT) shipped with real implementations. `bin-taint` shipped Phase A (CFG, intra/inter-procedural taint, PLT-resolved source/sink, pathfinding, stripped-binary fallback). **16 items removed in v8.2.0 cleanup:** 13 skeleton scaffolds (`symex-concolic`, `loop-polyhedral`, `detect-memory-corruption`, `neural-decompile`, `side-channel-detect`, `symex-speculative`, `gpu-analyze`, `semantic-slice`, `synthesize`, `abstract-interp`, `bin-search`, `patch-binary`, `natural-query`) + 3 failed experiments (`meta-path-ppr` proof +0.0000 lift, `rfmoe` 3/8 FAIL, `ising-landscape` proof pending) — all 59–145 LOC with no proof reports or integration tests.
- **16 Phase F actions multi-corpus replicated:** `transfer-entropy`, `hebbian-coupling`, `kl-drift`, `network-motifs`, `code-entropy`, `criticality-soc`, `fatigue-crack`, `bio-physarum`, `preferential-attachment`, `small-world`, `phase-transitions`, `lyapunov-tracker`, `universality-class`, `lattice-evidence`, `control-theory-pid-ci-cd`, `codemap-mcp`.

**610 actions** registered (full index in `docs/ACTION_CATALOG.md`; generated from the registry by `gen-action-docs` and gated by `tests/single_source_of_truth.rs`). **235** `bin-*` parsers, **18** source-language tree-sitter parsers, **338/338** lib tests, **90/90** CI verdict-gate baseline, **0** clippy warnings.

---

## When to reach for codemap

| Problem | Codemap action | Why codemap (vs alternatives) |
|---|---|---|
| "What does this codebase do?" | `summary --dir ` | Cross-file structural overview in one call. Beats reading files. |
| "Find unused functions / dead code" | `dead-functions --dir ` | Call-graph reachability across modules. grep can't do this. |
| "Who calls function X?" | `callers --dir X` | True call graph (AST-aware), not a string match. |
| "What does function X depend on (transitively)?" | `trace --dir X` | Walks the dep graph. grep would only find direct refs. |
| "What changed between two commits?" | `diff --dir ` | Semantic diff, not line diff. |
| "Find security issues" | `audit --dir ` | Composite of taint + secret-scan + dep-tree + dead-deps. |
| "Where would a tainted input flow?" | `taint --dir --source --sink ` | Path-sensitive, sanitizer-aware, alias-aware, cross-procedural. |
| "Reverse-engineer a binary" | `bin-info ` | PE/ELF/Mach-O parser. capa + YARA + signsrch + PEiD rules built in. |
| "Find cross-language coupling" | `cross-lang --dir ` | Imports/calls that cross language boundaries. |

## When NOT to reach for codemap

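- **Editing source files**: codemap never edits your source tree (binary-patching actions such as `bin-patch-fn` only modify the binary you point them at). Use Edit/Write directly.
- **Running code**: codemap doesn't build or execute your project (`ir --verify` only compiles the C it emits). Use bash.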
- **Live process state**: codemap is static. Use `ps`, `lsof`, `ss`.
- **Single-file grep**: if you know the file, `grep` is faster.
- **String search across few files**: if N<5 files, just `grep`.

---
## Install

### From release (recommended)

Download the tarball for your platform and extract the binary:

```bash
# Asset names embed the release version; set VERSION to match the latest release's assets.
VERSION=v8.2.0
mkdir -p ~/.local/bin

# Linux x86_64
curl -fsSL "https://github.com/charleschenai/codemap/releases/latest/download/codemap-${VERSION}-x86_64-linux.tar.gz" -o codemap.tar.gz
tar xzf codemap.tar.gz -C ~/.local/bin/
chmod +x ~/.local/bin/codemap

# Linux aarch64
curl -fsSL "https://github.com/charleschenai/codemap/releases/latest/download/codemap-${VERSION}-aarch64-linux.tar.gz" -o codemap.tar.gz
tar xzf codemap.tar.gz -C ~/.local/bin/
chmod +x ~/.local/bin/codemap

# macOS: no prebuilt tarball is listed here; build from source (below).
```

Add `$HOME/.local/bin` to your `PATH` in `~/.bashrc` or `~/.zshrc`:

```bash
export PATH="$HOME/.local/bin:$PATH"
```

For a system-wide install (`/usr/local/bin/codemap`), copy the binary you extracted above:

```bash
sudo cp ~/.local/bin/codemap /usr/local/bin/
sudo chmod +x /usr/local/bin/codemap
```

### From source

```bash
git clone https://github.com/charleschenai/codemap && cd codemap
cargo build --release -p codemap-cli
mkdir -p ~/.local/bin
cp target/release/codemap ~/.local/bin/codemap
chmod +x ~/.local/bin/codemap
```

## Verify

```
codemap --version-detail
```

Prints:
```
codemap 8.2.0
git:
built:
host: /
```

If the reported version is older than expected, repeat the install steps above with the newer release.

---

## How to call any action

Universal shape:
```
codemap [TARGET...] --dir [--json] [--quiet] [other-flags]
```

| Flag | Purpose |
|---|---|
| `--dir ` | **Required.** Repo/dir to scan. Repeatable for multi-repo. |
| `--json` | Output JSON (parseable). Default is text (human-readable). |
| `--quiet` | Suppress scan/cache status messages on stderr. |
| `--no-cache` | Force re-scan, ignore `.codemap/cache.bincode`. |
| `--include-path ` | C/C++ include search path. |
| `--watch [SECS]` | Re-run every `SECS` seconds. |

For agents: **always use `--json` and `--quiet`** unless you specifically want text output.

## Discover actions

```
codemap --help # full action list
codemap --help # action-specific flags
```

---

## Action categories

610 actions (a curated subset advertised in `--help`, 235 fine-grained `bin-*` parsers, plus the rest) grouped by purpose. Full catalog at [`docs/ACTION_CATALOG.md`](docs/ACTION_CATALOG.md). High-level groups:

| Category | Action count | Examples |
|---|---|---|
| **Analysis** | ~20 | `summary`, `stats`, `trace`, `callers`, `hotspots`, `layers`, `health`, `decorators` |
| **Code intelligence** | ~30 | `complexity`, `import-cost`, `churn`, `api-diff`, `clones`, `entry-points`, `dead-functions` |
| **Dataflow / security** | ~16 | `data-flow`, `taint`, `bin-taint`, `slice`, `trace-value`, `sinks`, `secret-scan`, `audit`, `dep-tree` |
| **Graph theory** | ~40 | `pagerank`, `hubs`, `bridges`, `centrality` (17 measures), `community` (Leiden), `bellman-ford` |
| **Binary / RE** | ~235 | `elf-info`, `pe-imports`, `macho-info`, `bin-anti-debug`, `bin-disasm`, `bin-strings`, `bin-relocs` |
| **Schemas** | ~10 | `proto-schema`, `openapi-schema`, `graphql-schema`, `sql-extract`, `dbf-schema` |
| **Supply chain** | ~10 | `osv-scan`, `sbom-diff`, `license-check`, `cve-scan` |
| **Config-as-code** | ~10 | `k8s-scan`, `iac-scan`, `dockerfile-scan`, `ci-scan`, `oci-scan` |
| **ML / AI** | ~10 | `gguf-info`, `safetensors-info`, `onnx-info`, `cuda-info`, `pyc-info` |
| **LSP bridge** | ~5 | `lsp-symbols`, `lsp-references`, `lsp-calls`, `lsp-diagnostics`, `lsp-types` |
| **Web** | ~5 | `web-sitemap`, `js-api-extract` (HAR/HTML input required) |
| **Cross-language** | ~5 | `lang-bridges`, `gpu-functions`, `monkey-patches` |
| **Composite** | ~10 | `audit`, `compare`, `validate`, `changeset`, `handoff`, `pipeline` |
| **arXiv-derived** | 2 | `pointer-analysis` (Andersen PA), `cegio` (SMT optimizer) |

---

## Output schema

All `--json` outputs follow:
```
{
"ok": ,
"action": "",
"dir": "",
"result": ,
"stats": { "files_scanned": N, "duration_ms": M, "cache_hits": K }
}
```

`result` shape varies per action. Action-specific schemas in [`docs/SCHEMAS.md`](docs/SCHEMAS.md).

## Exit codes

| Code | Meaning | Agent response |
|---|---|---|
| 0 | Success | Parse `--json` output |
| 1 | Usage error (bad flag, missing --dir) | Re-read `--help`, fix args, retry |
| 2 | I/O error (path not found, no read perm) | Verify path, retry |
| 101 | Panic | **Do not retry.** File a bug at https://github.com/charleschenai/codemap/issues |

Other non-zero codes: action-specific. See ` --help`.
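An agent wrapper might encode the table above as a lookup so retry logic stays explicit. A minimal sketch; the response labels are hypothetical names, while the codes and advice are exactly those in the table:

```python
# Exit codes from the table above; anything else is action-specific.
RESPONSES = {
    0: "parse",        # success: parse the --json output
    1: "fix-args",     # usage error: re-read --help, fix args, retry
    2: "check-path",   # I/O error: verify the path, retry
    101: "report-bug", # panic: do not retry, file an issue
}


def next_step(returncode: int) -> str:
    """Map a codemap exit code to the response the table recommends."""
    return RESPONSES.get(returncode, "action-specific")


def should_retry(returncode: int) -> bool:
    # Retrying only makes sense after fixing the args (1) or the path (2).
    return returncode in (1, 2)
```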

## AI agent usage guide

codemap is designed for AI agents as its primary customer. Below is the canonical walkthrough for integrating codemap into agent workflows.

### Why use codemap instead of grep/read?

| Scenario | grep / raw edits | codemap |
|---|---|---|
| "What does this codebase do?" | Read every file sequentially | `summary` — structural overview in one call |
| "Find dead / unused code" | Manual reachability tracing | `dead-functions` — true call-graph reachability |
| "Who calls function X?" | String match across files | `callers` — AST-aware call graph |
| "What does function X depend on?" | Direct import grep | `trace` — transitive dep graph walk |
| "What changed between two commits?" | Line-level diff | `diff` — semantic diff (AST-aware) |
| "Find security issues" | YARA / pattern match | `audit` — composite: taint + secret-scan + dep-tree + dead-deps |
| "Where does tainted input flow?" | No tool | `taint` — path-sensitive, sanitizer-aware, cross-procedural |
| "Analyze a compiled binary" | `strings` + `hexdump` + manual | `bin-info` + `bin-taint` — PE/ELF/Mach-O parsers + taint analysis |
| "Graph metrics on code" | Custom scripts | 610 built-in actions (graph theory, entropy, ML, physics-inspired) |

codemap's analysis is **read-only** (binary-patching actions such as `bin-patch-fn` are the exception) and uses **no network**, **no servers**, **no databases**, and **no API keys**. It scans your local filesystem, builds ASTs, CFGs, and graphs in memory, and returns structured JSON.

### Canonical call pattern

Every action follows this pattern:

```bash
codemap [TARGET] --dir --json --quiet [OPTIONS]
```

| Flag | Purpose |
|---|---|
| `--json` | JSON output (machine-readable) |
| `--quiet` | Suppress progress bars and logs |
| `--dir` | Directory to analyze (required) |

**Output schema** (for actions that return results):

```json
{
  "ok": true,
  "result": { ... },
  "metrics": {
    "time_ms": 42,
    "files_scanned": 1501,
    "edges": 100219
  }
}
```

On failure:

```json
{
  "ok": false,
  "error": "error message"
}
```
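For agents calling codemap from Python rather than the shell, a small helper can unwrap the envelope. This sketch relies only on the `ok`, `result`, and `error` fields shown above and ignores the timing/stats block; `CodemapError` and `unwrap` are hypothetical names:

```python
import json
from typing import Any


class CodemapError(RuntimeError):
    """Raised when the envelope reports ok=false or is not valid JSON."""


def unwrap(stdout: str) -> Any:
    """Return the `result` payload of a codemap --json envelope, or raise."""
    try:
        envelope = json.loads(stdout)
    except json.JSONDecodeError as exc:
        raise CodemapError(f"not JSON: {exc}") from exc
    if not isinstance(envelope, dict):
        raise CodemapError("envelope is not a JSON object")
    if envelope.get("ok") is True:
        return envelope.get("result")
    raise CodemapError(envelope.get("error", "unknown error"))
```

In practice you would pass it `subprocess.run([...], capture_output=True, text=True).stdout`.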

Exit codes:
- `0`: success
- non-zero: failure; check the `--json` output's `error` field (see **Exit codes** above for `1`, `2`, and `101`)

### Worked examples

**Example 1: "What does this repo do?"**

```bash
codemap summary --json --quiet --dir ./project
# → Cross-file structural overview: top-level modules, key dependencies, entry points
```

**Example 2: "Find unused functions"**

```bash
codemap dead-functions --json --quiet --dir ./project
# → Functions with zero callers across the module graph. Includes call-chain depth.
```

**Example 3: "Security audit"**

```bash
codemap audit --json --quiet --dir ./project
# → Composite: taint analysis + secret detection + dependency tree + dead deps
# Returns findings ranked by confidence with source→sink paths
```

**Example 4: "Taint analysis — find injection paths"**

```bash
codemap taint --json --quiet --dir ./project --source read --sink system
# → Path-sensitive taint from `read` to `system` with confidence scoring
# Reports ranked source→sink paths with alias resolution
```

**Example 5: "Binary analysis — what is this executable?"**

```bash
codemap bin-info --json --quiet ./target/release/my-binary
# → PE/ELF/Mach-O parser: sections, imports, exports, symbols,
# capa-rules detection, YARA signatures, anti-debug indicators
```

### MCP: the recommended adoption path

For agents that use MCP-compatible clients (Claude Code, Cursor, Windsurf), add codemap as an MCP tool server. All 610 actions become available as MCP tools with proper input schemas. Register the server script (`docs/codemap-mcp-server.py` in the repo) in your client's MCP config; the example targets `~/.claude/settings.json`, and the file location differs per client:

```json
{
  "mcpServers": {
    "codemap": {
      "command": "python3",
      "args": ["/path/to/codemap/docs/codemap-mcp-server.py"]
    }
  }
}
```

This is the recommended path because:
1. **No CLI parsing needed** — tools have structured input schemas
2. **Self-documenting** — `tools/list` returns every action name, description, and schema
3. **Executable via JSON-RPC** — `tools/call` with `{name, arguments}` dispatches any action
4. **Zero config for AI** — the agent discovers capabilities automatically
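Under the hood, an MCP client exchanges JSON-RPC 2.0 messages, one per line on the stdio transport. A sketch of the two requests named above; the `summary` tool name and its `{"dir": ...}` argument are assumptions, so read the real input schema from `tools/list` first. A real client must also complete the MCP `initialize` handshake before sending either request:

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def tools_list() -> str:
    """Serialize a `tools/list` request as one newline-terminated line."""
    return json.dumps({"jsonrpc": "2.0", "id": next(_ids), "method": "tools/list"}) + "\n"


def tools_call(name: str, arguments: dict) -> str:
    """Serialize a `tools/call` request for the named tool."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request) + "\n"
```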

Set `CODEMAP_BIN` if your codemap binary is not on PATH:

```bash
export CODEMAP_BIN=~/.local/bin/codemap
```

### Environment variables

| Variable | Purpose | Default |
|---|---|---|
| `CODEMAP_BIN` | Path to codemap binary | `codemap` (from PATH) |
| `CODEMAP_CACHE` | Custom cache location | `.codemap/cache.bincode` (next to scanned dir) |

### Error handling

Always check `--json` output for error details:

```bash
# Run an action (summary here) and branch on the JSON envelope's "ok" field.
result=$(codemap summary --json --quiet --dir ./project)
if printf '%s' "$result" | python3 -c 'import sys, json; sys.exit(0 if json.load(sys.stdin).get("ok") else 1)'; then
  echo "Success: $(printf '%s' "$result" | python3 -c 'import sys, json; print(json.load(sys.stdin)["result"])')"
else
  echo "Error: $(printf '%s' "$result" | python3 -c 'import sys, json; print(json.load(sys.stdin).get("error", "unknown error"))')"
fi
```

### Performance notes

- Cold cache: sub-second on repos up to 3K files
- Warm cache: near-instant (reads `.codemap/cache.bincode`)
- Large repos (10K+ files): 5-30 seconds for full analysis
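- All analysis is in-memory. Analysis actions write nothing to disk except the cache file (`ir --verify` also writes a temp C file, and binary-patching actions such as `bin-patch-fn` modify their target).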
- No network calls during analysis.

---
## Recipes — when the agent has a specific job to do

Each recipe: **what the action does** → **command** → **sample output** → **when to use it**.

For the complete flat list of action names see [`docs/ACTION_CATALOG.md`](docs/ACTION_CATALOG.md).

---

### Codebase understanding (first-look on an unknown repo)

#### `summary` — one-page structural overview
Reports file count, languages, entry points, top modules, dispatch density. Single-call onboarding.
```
$ codemap summary --dir ./my-repo --json --quiet
{"ok":true,"result":{"files":2824,"languages":["rust","python","typescript"],
"entry_points":["src/main.rs","src/lib.rs"],"top_modules":["analysis","insights","cpg"]}}
```
**Use when:** new repo, "tell me what this does" before diving deeper.

#### `stats` — quantitative metrics
Per-language LOC + file counts, function/class density, fan-in/fan-out distribution.
```
$ codemap stats --dir ./my-repo --json --quiet
{"ok":true,"result":{"rust":{"files":341,"loc":89432,"fns":2104},"python":{"files":52,"loc":4108}}}
```
**Use when:** comparing repos by size, reporting metrics, sanity-checking parse coverage.

#### `layers` — architectural layer detection
Infers boundaries (web / service / data / infra) from import patterns + naming conventions.
```
$ codemap layers --dir ./my-repo --json --quiet
{"ok":true,"result":{"layers":[{"name":"web","modules":["routes","handlers"]},
{"name":"data","modules":["models","repo"]}],"violations":[...]}}
```
**Use when:** validating that "web shouldn't import from data" type architectural rules hold.

#### `hotspots` — files with most churn × complexity
Surfaces "danger zone" code (high git churn + high cyclomatic complexity).
```
$ codemap hotspots --dir ./my-repo --json --quiet --top 10
{"ok":true,"result":{"hotspots":[{"file":"src/parser.rs","churn":48,"complexity":92,"score":4416}]}}
```
**Use when:** prioritizing refactor work, finding "where bugs live."
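The sample score is exactly churn × complexity (48 × 92 = 4416), so a plain product is consistent with it. Assuming that formula, the ranking reduces to a few lines of Python (the two input maps are hypothetical, e.g. built from `churn` and `complexity` output):

```python
def hotspots(churn: dict[str, int], complexity: dict[str, int], top: int = 10) -> list[dict]:
    """Rank files by churn x complexity; a file missing from either map scores 0."""
    scored = [
        {
            "file": f,
            "churn": churn.get(f, 0),
            "complexity": complexity.get(f, 0),
            "score": churn.get(f, 0) * complexity.get(f, 0),
        }
        for f in set(churn) | set(complexity)
    ]
    # Highest score first; ties broken by path for deterministic output.
    scored.sort(key=lambda h: (-h["score"], h["file"]))
    return scored[:top]
```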

#### `entry-points` — public API surface
Lists exported functions/classes that other code can call from outside.
```
$ codemap entry-points --dir ./my-repo --json --quiet
{"ok":true,"result":{"entries":[{"name":"create_user","file":"api/users.rs","kind":"public_fn"}]}}
```
**Use when:** API documentation, understanding what's a stable contract.

#### `health` — overall quality summary
Composite: dead code % + clippy/lint count + circular deps + missing tests. Single "is this repo healthy?" score.
```
$ codemap health --dir ./my-repo --json --quiet
{"ok":true,"result":{"score":78,"dead_code_pct":3.2,"circular_deps":2,"missing_tests":["api/users.rs::delete"]}}
```
**Use when:** quick "should we touch this codebase or not" gut-check.

---

### Code quality & cleanup

#### `dead-functions` — unreachable code
Functions never called by any other function in the workspace.
```
$ codemap dead-functions --dir ./my-repo --json --quiet
{"ok":true,"result":{"dead":[{"file":"src/old.rs","function":"legacy_helper","line":42}]}}
```
**Use when:** cleanup PR, removing tech debt. **Caveat:** public entry points have no in-repo callers, so they can appear here even though they are intentionally exposed; cross-check with `entry-points` before deleting anything.

#### `dead-files` — files imported nowhere
Files no other file imports / uses.
```
$ codemap dead-files --dir ./my-repo --json --quiet
{"ok":true,"result":{"dead_files":["src/experimental/old_impl.rs","tools/debug.py"]}}
```
**Use when:** dead-import cleanup.

#### `dead-deps` — declared deps never imported
Packages in `Cargo.toml`/`package.json`/`pyproject.toml` that no source file imports.
```
$ codemap dead-deps --dir ./my-repo --json --quiet
{"ok":true,"result":{"dead":["serde_json (Cargo.toml)","lodash (package.json)"]}}
```
**Use when:** dep cleanup, reducing build time + attack surface.

#### `complexity` — cyclomatic complexity per function
McCabe complexity (branches+1). Catches "this function should be split."
```
$ codemap complexity --dir ./my-repo --json --quiet --top 10
{"ok":true,"result":{"top":[{"fn":"parse_expression","file":"parser.rs","cyclomatic":34,"lines":280}]}}
```
**Use when:** finding refactor candidates, code review automation.
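To make "branches + 1" concrete, here is a sketch for Python sources only, using the standard `ast` module. It counts `if`/`elif`, loops, conditional expressions, `except` handlers, comprehension loops and filters, and extra `and`/`or` operands. codemap's own per-language rules may count differently, and this sketch folds nested functions into their parent:

```python
import ast

# Statements/expressions that each add one decision point.
_BRANCHES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic(func: ast.AST) -> int:
    """McCabe complexity: decision points + 1."""
    score = 1
    for node in ast.walk(func):
        if isinstance(node, _BRANCHES):
            score += 1  # `elif` is a nested ast.If, so it is counted here too
        elif isinstance(node, ast.comprehension):
            score += 1 + len(node.ifs)  # the loop plus each `if` filter
        elif isinstance(node, ast.BoolOp):
            score += len(node.values) - 1  # `a and b and c` adds two paths
    return score


def complexity_report(source: str) -> list[dict]:
    """Per-function complexity for a Python module, most complex first."""
    tree = ast.parse(source)
    rows = [
        {"fn": n.name, "cyclomatic": cyclomatic(n)}
        for n in ast.walk(tree)
        if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    return sorted(rows, key=lambda r: -r["cyclomatic"])
```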

#### `churn` — git change frequency per file
Commits-touching-file count over a window.
```
$ codemap churn --dir ./my-repo --json --quiet --top 10
{"ok":true,"result":{"top":[{"file":"src/parser.rs","commits":78,"authors":12}]}}
```
**Use when:** combined with complexity for hotspots, ownership analysis.
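Churn can be approximated from `git log` alone. A sketch, assuming a `--since` window (the 90-day default is arbitrary, and author counts are left out). With an empty `--format=`, git prints only each commit's changed paths, so counting non-blank lines gives commits per file:

```python
import subprocess
from collections import Counter


def churn_from_log(log_text: str) -> Counter:
    """Count commits touching each file in `git log --name-only --format=` output.

    Each non-blank line is one (commit, path) pair; blank lines separate commits.
    """
    return Counter(line for line in log_text.splitlines() if line.strip())


def churn(repo: str, since: str = "90 days ago") -> Counter:
    """Run git in `repo` and return per-file commit counts within the window."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--name-only", "--format=", f"--since={since}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return churn_from_log(out)
```

`churn(".").most_common(10)` then gives a top-10 list similar in spirit to the sample above.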

#### `clones` — duplicated code blocks
Detects near-identical token sequences across files (copy-paste detection).
```
$ codemap clones --dir ./my-repo --json --quiet --min-tokens 50
{"ok":true,"result":{"clones":[{"size":120,"locations":[["a.rs:14","b.rs:22"]],"similarity":0.94}]}}
```
**Use when:** finding extraction candidates for shared functions.
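This README doesn't document codemap's exact clone algorithm. A common technique for "near-identical token sequences" is Jaccard similarity over k-token shingles; the sketch below shows that swapped-in approach, with a crude regex tokenizer and `k = 5` as assumptions:

```python
import re


def tokens(code: str) -> list[str]:
    # Crude lexer: identifiers/numbers, or single punctuation characters.
    return re.findall(r"\w+|[^\w\s]", code)


def shingles(toks: list[str], k: int = 5) -> set[tuple[str, ...]]:
    """All contiguous k-token windows."""
    return {tuple(toks[i:i + k]) for i in range(len(toks) - k + 1)}


def similarity(a: str, b: str, k: int = 5) -> float:
    """Jaccard similarity of shingle sets: 1.0 for identical token streams."""
    sa, sb = shingles(tokens(a), k), shingles(tokens(b), k)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0
```

Production clone detectors usually normalize identifiers before shingling so that renamed copies still score 1.0.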

#### `circular` — circular import detection
Reports module cycles (a → b → c → a).
```
$ codemap circular --dir ./my-repo --json --quiet
{"ok":true,"result":{"cycles":[["src/a.rs","src/b.rs","src/a.rs"]]}}
```
**Use when:** untangling architecture before a refactor.
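Cycle detection is a depth-first search that watches for back edges to a module still on the DFS stack. A sketch over a hypothetical `imports` map (module → imported modules) that returns cycles in the same shape as the sample above. It reports one cycle per back edge rather than every elementary cycle, and it recurses, so very deep graphs would need an iterative version:

```python
def find_cycles(imports: dict[str, list[str]]) -> list[list[str]]:
    """Return one cycle (closed path, first module repeated at the end) per back edge."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the DFS stack / finished
    color: dict[str, int] = {}
    stack: list[str] = []
    cycles: list[list[str]] = []

    def visit(m: str) -> None:
        color[m] = GREY
        stack.append(m)
        for dep in imports.get(m, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # Back edge: dep is on the stack, so stack[dep..m] + dep is a cycle.
                cycles.append(stack[stack.index(dep):] + [dep])
            elif state == WHITE:
                visit(dep)
        stack.pop()
        color[m] = BLACK

    for m in imports:
        if color.get(m, WHITE) == WHITE:
            visit(m)
    return cycles
```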

---

### Impact tracing & change analysis

#### `trace` — transitive callees (what does X depend on?)
Walks the call graph forward from a function/symbol, returns full dep tree.
```
$ codemap trace --dir ./my-repo --json --quiet RecalcInvoiceTotals
{"ok":true,"result":{"node":"RecalcInvoiceTotals","calls":[
{"name":"ship_chg_sum","file":"backend/invoices.go:120","depth":1},
{"name":"format_money","file":"util/money.go:8","depth":2}]}}
```
**Use when:** impact analysis before changing a function, generating context for an LLM.
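`trace` and `callers` are forward and reverse walks of the same graph. A breadth-first sketch over a hypothetical adjacency map that reports each reachable function once, at its shortest call depth. Reversing the edges turns it into a callers query:

```python
from collections import deque


def trace(calls: dict[str, list[str]], root: str) -> list[dict]:
    """BFS from `root`; each reachable function is reported once, at its minimum depth."""
    depth = {root: 0}
    order: list[dict] = []
    queue = deque([root])
    while queue:
        fn = queue.popleft()
        for callee in calls.get(fn, []):
            if callee not in depth:  # also stops recursion/cycles from looping
                depth[callee] = depth[fn] + 1
                order.append({"name": callee, "depth": depth[callee]})
                queue.append(callee)
    return order


def reverse(calls: dict[str, list[str]]) -> dict[str, list[str]]:
    """Flip caller -> callee edges so that trace() answers "who calls X?"."""
    rev: dict[str, list[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            rev.setdefault(callee, []).append(caller)
    return rev
```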

#### `callers` — transitive callers (who calls X?)
Reverse of `trace`. Returns the function's call sites + their callers.
```
$ codemap callers --dir ./my-repo --json --quiet validate_user
{"ok":true,"result":{"callers":[{"caller":"login","file":"auth.py:88","depth":1}]}}
```
**Use when:** "if I change this signature, what breaks?"

#### `blast-radius` — affected entities from a change
Combines callers + dataflow + tests touched. Most pessimistic estimate.
```
$ codemap blast-radius --dir ./my-repo --json --quiet --target User.id
{"ok":true,"result":{"functions":42,"tests":7,"endpoints":3,"db_columns":2}}
```
**Use when:** "what's the size of changing this thing?"

#### `diff` — semantic diff between two refs
Function-level diff: added, removed, signature-changed, body-changed.
```
$ codemap diff --dir ./my-repo --json --quiet HEAD~5 HEAD
{"ok":true,"result":{"added":["validate_email"],"removed":["old_validator"],
"signature_changed":[{"fn":"create","before":"(name)","after":"(name,email)"}]}}
```
**Use when:** generating PR descriptions, understanding code review scope.
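Assume each revision has already been parsed into `name -> (signature, body)`. That input is hypothetical here; codemap builds it with its own parsers. Under that assumption, classifying the function-level changes is set arithmetic plus a whitespace-insensitive body hash:

```python
import hashlib


def fingerprint(body: str) -> str:
    # Collapse whitespace so pure reformatting does not count as a body change.
    return hashlib.sha256(" ".join(body.split()).encode()).hexdigest()


def semantic_diff(before: dict[str, tuple[str, str]],
                  after: dict[str, tuple[str, str]]) -> dict:
    """Function-level diff of two `name -> (signature, body)` maps."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    sig_changed, body_changed = [], []
    for name in sorted(before.keys() & after.keys()):
        (sig_b, body_b), (sig_a, body_a) = before[name], after[name]
        if sig_b != sig_a:
            sig_changed.append({"fn": name, "before": sig_b, "after": sig_a})
        elif fingerprint(body_b) != fingerprint(body_a):
            body_changed.append(name)
    return {"added": added, "removed": removed,
            "signature_changed": sig_changed, "body_changed": body_changed}
```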

#### `api-diff` — breaking-change classifier
Like `diff` but specifically flags BREAKING vs additive changes to public API.
```
$ codemap api-diff --dir ./my-repo --json --quiet HEAD~5 HEAD
{"ok":true,"result":{"breaking":[
{"kind":"removed","fn":"OldAPI::v1_login"},
{"kind":"signature_change","fn":"create_user","before":"(name)","after":"(name,email)"}]}}
```
**Use when:** versioning decisions (semver minor vs major), CHANGELOG generation.

#### `diff-impact` — functions affected by a commit range
Maps the diff to every transitively-affected caller.
```
$ codemap diff-impact --dir ./my-repo --json --quiet HEAD~5 HEAD
{"ok":true,"result":{"impacted_fns":127,"impacted_files":34,"high_risk":["payment::charge"]}}
```
**Use when:** deciding test scope for a PR.

#### `churn-vs-complexity` (via `hotspots`) — see Codebase understanding above

---

### Data flow & security

#### `audit` — composite security report
Runs taint + secret-scan + dead-deps + dep-tree + license-check in one pass.
```
$ codemap audit --dir ./my-repo --json --quiet
{"ok":true,"result":{"findings":[
{"kind":"secret","file":".env.sample","line":3,"pattern":"AWS_KEY"},
{"kind":"taint","source":"req.body","sink":"db.execute","path":[...]},
{"kind":"dep-vuln","package":"lodash","version":"4.17.20","cve":"CVE-2021-23337"}]}}
```
**Use when:** first-pass security review of an unfamiliar repo.

#### `taint` — path-sensitive taint flow
Tracks tainted values from source(s) to sink(s). Sanitizer-aware, alias-aware (e.g. `safe = sanitize(x)`), cross-procedural (parses wrapper bodies to detect hidden sanitizers).
```
$ codemap taint --dir ./my-repo --json --quiet --source 'req.query' --sink 'db.execute'
{"ok":true,"result":{"paths":[{"source":"req.query.id","sink":"db.execute(sql)",
"hops":["params.id","userId","query"],"sanitized":false}]}}
```
**Use when:** SQLi/XSS/SSRF detection, "is user input reaching this sink?"
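At its core, taint tracking is reachability from sources to sinks over value-flow edges, with sanitizers cutting the path. The sketch below shows only that core, flow-insensitively. It is not path-sensitive or alias-aware like the action described above, and the `flows` map is a hypothetical hand-built input:

```python
from collections import deque


def taint_paths(flows: dict[str, list[str]], sources: set[str],
                sinks: set[str], sanitizers: set[str]) -> list[list[str]]:
    """BFS over value-flow edges (x -> y: x's value reaches y).

    Nodes in `sanitizers` are never entered, so sanitized flows are cut.
    Returns one shortest path per (source, reachable sink).
    """
    paths = []
    for src in sorted(sources):
        parent: dict[str, str | None] = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node in sinks:
                path, cur = [], node
                while cur is not None:  # walk parents back to the source
                    path.append(cur)
                    cur = parent[cur]
                paths.append(path[::-1])
                continue
            for nxt in flows.get(node, []):
                if nxt not in parent and nxt not in sanitizers:
                    parent[nxt] = node
                    queue.append(nxt)
    return paths
```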

#### `slice` — backward program slice
Given a target variable/sink, return only the code that influences it.
```
$ codemap slice --dir ./my-repo --json --quiet --var 'password' --file auth.py
{"ok":true,"result":{"slice_lines":[12,15,22,30,42],"file":"auth.py"}}
```
**Use when:** narrowing what to read when chasing a bug.

#### `sinks` — list all dangerous sinks
Enumerates every `db.execute`, `eval`, `exec`, `Runtime.exec`, `subprocess` call with `shell=True`, `innerHTML=`, etc.
```
$ codemap sinks --dir ./my-repo --json --quiet
{"ok":true,"result":{"sinks":[{"kind":"sql","file":"api/users.rs","line":88,"expr":"db.execute(query)"}]}}
```
**Use when:** building taint queries, audit checklist generation.

#### `secret-scan` — credentials in source
20+ patterns (AWS key, GitHub PAT, Slack token, Stripe live key, private keys, JWT, DB conn strings, etc.). Redacted output.
```
$ codemap secret-scan --dir ./my-repo --json --quiet
{"ok":true,"result":{"findings":[{"file":".env.sample","line":3,"kind":"aws_access_key","masked":"AKIA****REDACTED"}]}}
```
**Use when:** pre-commit hook, pre-publish audit.
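
AWS access key IDs illustrate one such pattern. They are commonly matched as `AKIA` followed by 16 uppercase letters or digits. The sketch below hand-rolls a matcher for that single pattern, with a token-boundary check and masking. The masking format is an assumption and need not match codemap's `masked` field.

```rust
/// Finds candidate AWS access key IDs: "AKIA" followed by 16 characters
/// from [A-Z0-9], not embedded in a longer alphanumeric token.
/// Returns (byte offset, masked value) pairs. The masking format is illustrative.
fn find_aws_keys(text: &str) -> Vec<(usize, String)> {
    let bytes = text.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i + 20 <= bytes.len() {
        let cand = &bytes[i..i + 20];
        let body_ok = cand[4..]
            .iter()
            .all(|b| b.is_ascii_uppercase() || b.is_ascii_digit());
        // Boundaries on both sides avoid matching inside longer tokens.
        let left_ok = i == 0 || !bytes[i - 1].is_ascii_alphanumeric();
        let right_ok = i + 20 == bytes.len() || !bytes[i + 20].is_ascii_alphanumeric();
        if cand.starts_with(b"AKIA") && body_ok && left_ok && right_ok {
            // Keep only the well-known prefix; never echo the secret part.
            out.push((i, format!("AKIA{}", "*".repeat(16))));
            i += 20;
        } else {
            i += 1;
        }
    }
    out
}
```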

#### `data-flow` — value origin tracing
Where does this variable's value come from? (def-use chain)
```
$ codemap data-flow --dir ./my-repo --json --quiet --target 'user_id'
{"ok":true,"result":{"origins":[{"file":"auth.py:88","expr":"req.cookies['session']"}]}}
```
**Use when:** "where does this magic value come from?"

#### `api-surface` — every exported HTTP endpoint
Detects Flask/Express/Axum/FastAPI/Spring/Rocket route handlers. Lists path + method + handler.
```
$ codemap api-surface --dir ./my-repo --json --quiet
{"ok":true,"result":{"endpoints":[{"method":"POST","path":"/users","handler":"create_user","auth_required":false}]}}
```
**Use when:** generating OpenAPI from existing code, finding unauthenticated endpoints.

---

### Graph algorithms (heterogeneous-graph queries)

These run on codemap's internal call graph + import graph + AST graph.

#### `pagerank` — most-important nodes
NetworkX-style PageRank. A high score means many incoming references, especially from other high-scoring nodes.
```
$ codemap pagerank --dir ./my-repo --json --quiet --top 10
{"ok":true,"result":{"ranked":[{"fn":"handle_request","score":0.082}]}}
```
**Use when:** finding "load-bearing" functions, prioritizing code review.
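
For reference, PageRank by power iteration looks roughly like this. The sketch assumes an adjacency list where `out[i]` lists the nodes that `i` references. It uses a fixed iteration count rather than a convergence tolerance, and it spreads the rank of dangling nodes (no out-edges) uniformly, which is the NetworkX default.

```rust
/// PageRank by power iteration with damping factor `d`
/// (0.85 is NetworkX's default `alpha`).
/// `out[i]` lists the nodes that node i points to (e.g. "i calls j").
fn pagerank(out: &[Vec<usize>], d: f64, iters: usize) -> Vec<f64> {
    let n = out.len();
    let mut rank = vec![1.0 / n as f64; n];
    for _ in 0..iters {
        // Rank held by dangling nodes is redistributed uniformly.
        let dangling: f64 = (0..n).filter(|&i| out[i].is_empty()).map(|i| rank[i]).sum();
        let base = (1.0 - d) / n as f64 + d * dangling / n as f64;
        let mut next = vec![base; n];
        for (i, targets) in out.iter().enumerate() {
            if targets.is_empty() {
                continue;
            }
            // Each node splits its damped rank evenly across its out-edges.
            let share = d * rank[i] / targets.len() as f64;
            for &j in targets {
                next[j] += share;
            }
        }
        rank = next;
    }
    rank
}
```

Because every unit of rank is either passed along an edge or redistributed, the scores keep summing to 1.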

#### `hubs` — high-out-degree nodes
Functions/modules that depend on many others. Different from PageRank (which is about incoming).
```
$ codemap hubs --dir ./my-repo --json --quiet
{"ok":true,"result":{"hubs":[{"fn":"orchestrator","out_degree":47}]}}
```
**Use when:** finding god-objects, refactor targets.

#### `bridges` — single-edge cut points
Edges whose removal splits the graph into more connected components (graph-theoretic bridges). Each one is the only link between the two sides it joins.
```
$ codemap bridges --dir ./my-repo --json --quiet
{"ok":true,"result":{"bridges":[{"from":"auth","to":"db","modules":["auth.rs","db.rs"]}]}}
```
**Use when:** identifying single points of failure in module coupling.
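
Bridges are defined on undirected graphs, so this sketch treats module coupling as undirected. That is an assumption about how the graph is prepared, not a statement about codemap's internals. Tarjan's DFS low-link method finds all bridges in linear time. Edge `(u, v)` is a bridge exactly when no back edge from `v`'s DFS subtree reaches `u` or an ancestor of `u`, i.e. when `low[v] > disc[u]`.

```rust
/// Finds bridges in an undirected graph given as an edge list over nodes 0..n.
/// Uses Tarjan's DFS low-link: edge (u, v) is a bridge iff low[v] > disc[u].
/// Parallel edges are handled by skipping only the exact edge we arrived on.
fn bridges(n: usize, edges: &[(usize, usize)]) -> Vec<(usize, usize)> {
    let mut adj = vec![Vec::new(); n];
    for (id, &(a, b)) in edges.iter().enumerate() {
        adj[a].push((b, id));
        adj[b].push((a, id));
    }
    let mut disc = vec![usize::MAX; n]; // usize::MAX = not yet visited
    let mut low = vec![0; n];
    let mut timer = 0;
    let mut out = Vec::new();
    for s in 0..n {
        if disc[s] == usize::MAX {
            dfs(s, usize::MAX, &adj, &mut disc, &mut low, &mut timer, &mut out);
        }
    }
    out
}

fn dfs(
    u: usize,
    parent_edge: usize,
    adj: &[Vec<(usize, usize)>],
    disc: &mut [usize],
    low: &mut [usize],
    timer: &mut usize,
    out: &mut Vec<(usize, usize)>,
) {
    disc[u] = *timer;
    low[u] = *timer;
    *timer += 1;
    for &(v, id) in &adj[u] {
        if id == parent_edge {
            continue;
        }
        if disc[v] == usize::MAX {
            // Tree edge: recurse, then pull up the child's low-link.
            dfs(v, id, adj, disc, low, timer, out);
            low[u] = low[u].min(low[v]);
            if low[v] > disc[u] {
                out.push((u.min(v), u.max(v)));
            }
        } else {
            // Back edge: u can reach an earlier-discovered node.
            low[u] = low[u].min(disc[v]);
        }
    }
}
```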

#### `centrality` (17 measures) — broker / connector detection
Each measure is its own action: `betweenness`, `eigenvector`, `katz`, `closeness`, `harmonic`, `load`, `structural-holes` (brokers), `voterank`, etc. All follow the NetworkX definitions.
```
$ codemap betweenness --dir ./my-repo --json --quiet --top 5
{"ok":true,"result":{"top":[{"node":"db_session","betweenness":0.34}]}}
```
**Use when:** finding modules that connect otherwise-separate subsystems.
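
Betweenness counts how many shortest paths between other node pairs pass through a node. Below is a sketch of Brandes' algorithm for an unweighted, directed graph. It returns raw (unnormalized) scores. NetworkX's `betweenness_centrality` normalizes by default, so values like the `0.34` above are not directly comparable with this sketch's output.

```rust
use std::collections::VecDeque;

/// Brandes' algorithm: unnormalized betweenness centrality on an
/// unweighted directed graph. `adj[u]` lists the successors of u.
fn betweenness(adj: &[Vec<usize>]) -> Vec<f64> {
    let n = adj.len();
    let mut cb = vec![0.0; n];
    for s in 0..n {
        // Phase 1: BFS from s, counting shortest paths (sigma) and recording predecessors.
        let mut stack = Vec::new();
        let mut preds: Vec<Vec<usize>> = vec![Vec::new(); n];
        let mut sigma = vec![0.0f64; n];
        let mut dist = vec![-1i64; n];
        sigma[s] = 1.0;
        dist[s] = 0;
        let mut q = VecDeque::new();
        q.push_back(s);
        while let Some(v) = q.pop_front() {
            stack.push(v);
            for &w in &adj[v] {
                if dist[w] < 0 {
                    dist[w] = dist[v] + 1;
                    q.push_back(w);
                }
                if dist[w] == dist[v] + 1 {
                    sigma[w] += sigma[v];
                    preds[w].push(v);
                }
            }
        }
        // Phase 2: accumulate dependencies in order of non-increasing distance.
        let mut delta = vec![0.0f64; n];
        while let Some(w) = stack.pop() {
            for &v in &preds[w] {
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w]);
            }
            if w != s {
                cb[w] += delta[w];
            }
        }
    }
    cb
}
```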

#### `clusters` — community detection (Leiden default)
Partitions the graph into densely-connected sub-communities.
```
$ codemap clusters --dir ./my-repo --json --quiet leiden
{"ok":true,"result":{"clusters":[{"id":0,"size":34,"members":["auth.rs","users.rs"]}]}}
```
**Use when:** discovering implicit module boundaries.

#### `paths` — shortest path between two nodes
Returns the chain of imports/calls connecting source → target.
```
$ codemap paths --dir ./my-repo --json --quiet user_input db_write
{"ok":true,"result":{"path":["user_input","sanitize","query_builder","db_write"],"length":4}}
```
**Use when:** "how does X reach Y?"
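
On an unweighted graph, the shortest chain comes from a breadth-first search plus predecessor links. Here is a sketch over a hypothetical string-keyed adjacency map, not codemap's internal graph type.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Unweighted shortest path by breadth-first search.
/// Returns the node sequence from `src` to `dst`, or None if unreachable.
fn shortest_path<'a>(
    edges: &HashMap<&'a str, Vec<&'a str>>,
    src: &'a str,
    dst: &'a str,
) -> Option<Vec<&'a str>> {
    let mut prev: HashMap<&'a str, &'a str> = HashMap::new();
    let mut seen: HashSet<&'a str> = HashSet::new();
    let mut queue = VecDeque::new();
    seen.insert(src);
    queue.push_back(src);
    while let Some(u) = queue.pop_front() {
        if u == dst {
            // Rebuild the path by following predecessor links back to src.
            let mut path = vec![dst];
            let mut cur = dst;
            while let Some(&p) = prev.get(cur) {
                path.push(p);
                cur = p;
            }
            path.reverse();
            return Some(path);
        }
        if let Some(vs) = edges.get(u) {
            for &v in vs {
                // First visit in BFS order is along a shortest path.
                if seen.insert(v) {
                    prev.insert(v, u);
                    queue.push_back(v);
                }
            }
        }
    }
    None
}
```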

#### `subgraph` — extract a focused subgraph
Returns nodes within N hops of a target. Useful before deep analysis.
```
$ codemap subgraph --dir ./my-repo --json --quiet --target login --depth 2
{"ok":true,"result":{"nodes":[...],"edges":[...]}}
```
**Use when:** narrowing scope before more expensive analysis.

#### `bellman-ford ` / `astar ` / `floyd-warshall` / etc.
Classical shortest-path algorithms exposed for graph queries. See [`docs/ACTION_CATALOG.md`](docs/ACTION_CATALOG.md) for the full list.
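
Of these, Bellman-Ford is the one that tolerates negative edge weights. Here is a sketch over a directed edge list `(from, to, weight)`. It adds an extra relaxation pass to detect reachable negative cycles. Node ids and weights are illustrative, not codemap's graph encoding.

```rust
/// Bellman-Ford single-source shortest paths over a directed edge list.
/// Returns None if a negative cycle is reachable from `src`; otherwise
/// Some(dist), where dist[v] is None for nodes unreachable from `src`.
fn bellman_ford(n: usize, edges: &[(usize, usize, i64)], src: usize) -> Option<Vec<Option<i64>>> {
    let mut dist: Vec<Option<i64>> = vec![None; n];
    dist[src] = Some(0);
    // n - 1 rounds of relaxation suffice for simple shortest paths.
    for _ in 1..n {
        let mut changed = false;
        for &(u, v, w) in edges {
            if let Some(du) = dist[u] {
                if dist[v].map_or(true, |dv| du + w < dv) {
                    dist[v] = Some(du + w);
                    changed = true;
                }
            }
        }
        if !changed {
            break; // Converged early.
        }
    }
    // One more pass: any further improvement means a reachable negative cycle.
    for &(u, v, w) in edges {
        if let Some(du) = dist[u] {
            if dist[v].map_or(true, |dv| du + w < dv) {
                return None;
            }
        }
    }
    Some(dist)
}
```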

---

### Binary analysis & reverse engineering

#### Decompiler (`ir` / `decompile`) — full lift → SSA → simplify → type-recovery → variable-recovery → calling-convention → SAILR structuring → C++ RTTI → readable-C emit pipeline

**This is a real decompiler.** A 14-stage pipeline reconstructs expressions, variables, types, and `if` / `while` / `switch` syntax (including jump tables, computed branches, and string-literal returns) from compiled binaries. Emitted C is gcc-recompilable. It also covers cross-binary type propagation, RTTI, stack slots, and confidence scores.

Current verification status:
- Full G10 fidelity (10/10).
- 79/79 protected-binary decompilation tests pass (bugbins-verify + reexec_harness). This includes special-case recovery of `switch_dispatch` (`const char*` return, a "zero".."seven"/"unknown" map, `a1` as the scrutinee, correct default VA). The current 79/79 state supersedes earlier ~48/60 notes.
- Mach-O x86-64 support: function discovery via `LC_FUNCTION_STARTS` + symtab + sections, feeding iced-x86 and the IR.

Fix history (see CHANGELOG and `docs/COMMIT_LEDGER.md`):
- G10 fixes and the Job 3 consolidation.
- GAP3-6: F-4 `-O2` dangling `goto`/`continue`, C++ vcalls via RTTI, XMM/float ABI and libc-extern recompilation fix, Mach-O x86-64 thin + FAT.
- GAP7 Part A: array element type from access width (`int32_t*` for 4-byte loads).
- GAP8: struct field recovery (`ptr->field_0xN`, with synthesized typedefs for recompilation).
- GAP9: no more `rsp`/`rbp`/`rbx`/`r12`-`r15` "lifter gap = 0" noise declarations in every function. Frame uses are elided to 0.

**Known limitation (gap 11, deferred):** array indexing inside loops can decompile with an incorrect, use-before-def index, for example a ghost register instead of the loop counter `v`. The element type is correct, but the recompiled output is behaviorally wrong: `sum` may return `a[0]*n` instead of the expected 10. Root cause: copy propagation drops the index register's definition when that register is reused inside the loop.

Remaining gaps are documented in [`docs/DECOMPILER.md`](docs/DECOMPILER.md). Next direction: improving decompiler quality based on issues users actually report, such as those raised against Ghidra.

```bash
# Decompile a single function (full pipeline)
codemap ir [ | ]

# Decompile entry point
codemap ir

# Batch call-tree walk with structural hints
codemap decompile [max-depth=N] [max-children=N] [deep]
```

**Pipeline stages:**
1. **Lift** — iced-x86 decode → IRCFG (three-address IR with explicit BitWidth)
2. **SSA construction** — Cytron et al. (1991): iterated-dominance-frontier phi placement + pre-order DFS renaming
3. **Simplify** — 42 peephole rules (Miasm / angr reference-FIRST): constant folding, identity elimination, SSA-aware simplification, signed-div-by-power-of-2, ROL/ROR detection, byte-swap, etc.
4. **Calling-convention recovery** — SysV AMD64 ABI: populate Call.args from rdi/rsi/rdx/rcx/r8/r9
5. **Dead-code elimination** — backward dataflow liveness (~80% flag computations pruned)
6. **Copy/constant propagation** — 4 alternating iterations of copy-prop + simplify + DCE
7. **Dead-block removal** — reachability from entry; prunes linker padding
8. **Block coalescing** — merge linear Goto-chains
9. **SAILR structuring** — CFG + IRCFG → C-shaped AST (Sequence / IfThen / IfThenElse / While / For / Switch / Call / Goto)
10. **Variable recovery** — classifies variables: Register, Stack, Memory, Temporary, Constant
11. **Type inference** — Phase 2 seeded from widths + Mem-loads/Stores; iterated-meet solver infers Int / Pointer / struct types
12. **Stack-slot analysis** — rsp-relative offsets for `*(rsp_N)` → `stack[]`
13. **C++ RTTI analysis** — vtable references → class declarations (base classes, virtual methods, fields)
14. **C emission** — structured AST → readable C source with type annotations, stack-slot names, symbol resolution
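
Stage 2's phi placement can be sketched for a single variable. Seed a worklist with the blocks that define the variable, then place a phi in every block of their dominance frontiers. Repeat, because each phi is itself a new definition. The sketch assumes dominance frontiers are already computed, as `df[b]` under a hypothetical block-index encoding. It omits the renaming pass and is not codemap's IRCFG code.

```rust
use std::collections::{BTreeSet, VecDeque};

/// Minimal phi placement for one variable (iterated dominance frontier,
/// after Cytron et al., 1991). `df[b]` is the dominance frontier of block b;
/// `def_blocks` are the blocks that assign the variable.
/// Returns the blocks that need a phi for it.
fn place_phis(df: &[Vec<usize>], def_blocks: &[usize]) -> BTreeSet<usize> {
    let mut has_phi = BTreeSet::new();
    let mut ever_on_worklist: BTreeSet<usize> = def_blocks.iter().copied().collect();
    let mut worklist: VecDeque<usize> = def_blocks.iter().copied().collect();
    while let Some(b) = worklist.pop_front() {
        for &y in &df[b] {
            if has_phi.insert(y) {
                // The new phi defines the variable in y, so y's frontier may need phis too.
                if ever_on_worklist.insert(y) {
                    worklist.push_back(y);
                }
            }
        }
    }
    has_phi
}
```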

**Differentiators:**
- **Cross-binary type/name propagation** — types from one binary's RTTI flow into another's
- **Graph-as-validator** — heterogeneous code graph cross-checks decompilation output
- **Recompilable-C target** — structured, typed, symbol-resolved C suitable for recompilation

**Example output:**
```text
=== codemap ir ===
Binary: ./target/release/codemap
Format: ELF64 (64-bit, arch=x64)
Function: main @ 0x401000 (234 bytes, 78 insns)
CFG blocks: 12
CFG edges: 18 (pre-enrich) → 18 (post-enrich)
Jump tables: 0 resolved indirect-JMPs
SSA phis: 3 inserted
Variables: 45 total (12 reg, 20 stack[-0x10..+0x18], 10 mem, 3 const, 0 tmp)
Types: 30 bound (15 int, 10 ptr, 3 top, 2 bot, 0 other)
CC args: 5 call sites populated (SysV AMD64)
DCE removed: 62 dead stmts (pre-prop) + 8 (post-prop)
Copy-prop: 15 stmts inlined
Dead blocks: 2 removed (unreachable)
Coalesced: 4 blocks merged

--- structured AST ---
Sequence {
  Let { rbp_0 = rbp }
  Let { rsp_0 = (rsp - 0x10) }
  IfThen {
    Cond: (rax_0 == 0)
    Then: Sequence { Call { printf("usage\n") } }
  }
  While {
    Cond: (argc_0 > 0)
    Body: Sequence { ... }
  }
  Ret { rax_0 }
}

--- C-shaped output ---
int main(int argc, char *argv[]) {
    uint64_t rbp_0 = rbp;
    uint64_t rsp_0 = (rsp - 0x10);

    if (rax_0 == 0) {
        printf("usage\n");
    }

    while (argc_0 > 0) {
        // ... loop body ...
        argc_0 = argc_0 - 1;
    }

    return rax_0;
}
```

**Use when:** binary reverse engineering, understanding compiled code, patch generation, static analysis of binaries. See [`docs/DECOMPILER.md`](docs/DECOMPILER.md) for full pipeline reference.

---

#### `bin-info` / `elf-info` / `macho-info` / `pe-info` — binary fingerprint
Format detection, arch, sections, strip state, language hints (Rust/Go/C++), anti-debug rules, packer detection.
```
$ codemap bin-info /usr/local/bin/codemap --json --quiet
{"ok":true,"result":{"format":"ELF64","arch":"aarch64","rust":true,"strip":false,
"sections":34,"anti_debug":[],"packed":false}}
```
**Use when:** triage step 1 — "what is this binary?"

#### `pe-imports` / `pe-exports` — Windows PE import/export tables
Lists every DLL imported + every function exported.
```
$ codemap pe-imports ./sample.exe --json --quiet
{"ok":true,"result":{"imports":[{"dll":"kernel32.dll","functions":["VirtualAlloc","CreateProcessA"]}]}}
```
**Use when:** static behavioral profiling — what APIs does this binary depend on?

#### `pe-strings` / `bin-strings` — string extraction
ASCII and UTF-16LE strings, entropy-filtered.
```
$ codemap pe-strings ./sample.exe --json --quiet --min-len 8
{"ok":true,"result":{"strings":["http://c2.example.com","cmd.exe /c"]}}
```
**Use when:** triaging unknown binaries — strings often reveal C2 URLs, command lines, paths.
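
Here is a stripped-down sketch of two extraction passes, with a minimum length. The first collects ASCII runs. The second collects UTF-16LE runs, meaning printable ASCII code units each followed by a zero byte. It omits the entropy filter and non-ASCII UTF-16 ranges, so treat it as an illustration rather than a reimplementation of `bin-strings`.

```rust
/// Extracts printable-ASCII runs and UTF-16LE runs (printable ASCII byte
/// followed by 0x00) of at least `min_len` characters.
fn extract_strings(data: &[u8], min_len: usize) -> Vec<String> {
    let printable = |b: u8| b >= 0x20 && b < 0x7f;
    let mut out = Vec::new();

    // Pass 1: single-byte ASCII runs.
    let mut run = String::new();
    for &b in data {
        if printable(b) {
            run.push(b as char);
        } else {
            if run.len() >= min_len {
                out.push(run.clone());
            }
            run.clear();
        }
    }
    if run.len() >= min_len {
        out.push(run.clone());
    }

    // Pass 2: UTF-16LE runs, i.e. (printable byte, 0x00) pairs.
    run.clear();
    let mut i = 0;
    while i + 1 < data.len() {
        if printable(data[i]) && data[i + 1] == 0 {
            run.push(data[i] as char);
            i += 2;
        } else {
            if run.len() >= min_len {
                out.push(run.clone());
            }
            run.clear();
            i += 1; // Resynchronize one byte at a time.
        }
    }
    if run.len() >= min_len {
        out.push(run);
    }
    out
}
```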

#### `binary-diff` — semantic binary diff
Functions added / removed / modified between two builds.
```
$ codemap binary-diff --json --quiet --left v1.exe --right v2.exe
{"ok":true,"result":{"added":["new_handler"],"removed":["legacy_proc"],"modified":["main"]}}
```
**Use when:** patch analysis, regression hunting in firmware.

#### `dotnet-meta` — .NET assembly metadata
For PE files that carry CLI (.NET) metadata: reads the metadata streams and lists types and methods.
```
$ codemap dotnet-meta ./sample.dll --json --quiet
{"ok":true,"result":{"assembly":"Sample.Dll","types":["Foo","Bar"],"methods_count":42}}
```
**Use when:** analyzing .NET malware or .NET 3rd-party libs.

#### `java-class` — JVM class file
Constant pool, method signatures, bytecode summaries.

#### `wasm-info` — WebAssembly module
Imports, exports, function table, memory layout.

---

### Schemas & config-as-code

#### `openapi-schema` / `graphql-schema` / `proto-schema` — extract API schemas
Parses spec files and reports endpoints/types/operations.
```
$ codemap openapi-schema --dir ./api --json --quiet
{"ok":true,"result":{"paths":[{"method":"GET","path":"/users","operationId":"listUsers"}]}}
```
**Use when:** generating client code, checking spec consistency.

#### `k8s-scan` — Kubernetes CIS audit (16 rules)
Checks privileged containers, hostNetwork, missing resource limits, etc.
```
$ codemap k8s-scan --dir ./k8s/ --json --quiet
{"ok":true,"result":{"findings":[{"rule":"K8S-001","resource":"Deployment/api","severity":"high","msg":"privileged=true"}]}}
```
**Use when:** auditing manifests before apply.

#### `iac-scan` — Terraform/CloudFormation/Pulumi audit (12 rules)
```
$ codemap iac-scan --dir ./infra/ --json --quiet
{"ok":true,"result":{"findings":[{"rule":"IAC-007","file":"main.tf","msg":"S3 bucket public-read ACL"}]}}
```

#### `dockerfile-scan` — Dockerfile audit (10 rules)
```
$ codemap dockerfile-scan --dir ./ --json --quiet
{"ok":true,"result":{"findings":[{"rule":"DKR-002","msg":"running as root","line":18}]}}
```

#### `ci-scan` — CI/CD pipeline audit (37 rules across 6 ecosystems)
GitHub Actions, GitLab CI, Jenkinsfile, CircleCI, Azure Pipelines, Travis. Catches injection, unpinned actions, secret literals, `pull_request_target` misuse.
```
$ codemap ci-scan --dir ./.github/ --json --quiet
{"ok":true,"result":{"findings":[{"rule":"GH-003","file":"deploy.yml","msg":"unpinned action ref"}]}}
```

#### `oci-scan` — OCI image / docker save tarball audit
Per-layer manifest, layer-resident secrets (11 patterns), licenses, file/dir/symlink counts.
```
$ codemap oci-scan --dir ./image.tar --json --quiet --mode all
{"ok":true,"result":{"layers":[...],"secrets":[...],"licenses":[...]}}
```

#### `sql-extract` — SQL DDL/DML extraction
Pulls SQL out of source code or .sql files. Schema + queries.
```
$ codemap sql-extract --dir ./my-repo --json --quiet
{"ok":true,"result":{"tables":[{"name":"users","columns":[...]}],"queries":[...]}}
```

---

### Supply chain

#### `osv-scan` — match deps against OSV.dev advisories (offline)
Semver-range-aware.
```
$ codemap osv-scan --dir ./my-repo --json --quiet
{"ok":true,"result":{"vulns":[{"package":"lodash","version":"4.17.20","cve":"CVE-2021-23337"}]}}
```
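
OSV advisories express affected versions as ranges of events, typically an `introduced` version and an optional `fixed` version. Here is a simplified sketch of matching one SEMVER range. It assumes plain `MAJOR.MINOR.PATCH` strings and deliberately ignores pre-release ordering. It is not codemap's matcher.

```rust
/// Parses "MAJOR.MINOR.PATCH"; missing minor/patch default to 0.
/// Pre-release and build suffixes are stripped (a simplification).
fn parse_semver(v: &str) -> Option<(u64, u64, u64)> {
    let core = v.split(|c: char| c == '-' || c == '+').next()?;
    let mut parts = core.split('.').map(|p| p.parse::<u64>().ok());
    let major = parts.next()??;
    let minor = parts.next().unwrap_or(Some(0))?;
    let patch = parts.next().unwrap_or(Some(0))?;
    Some((major, minor, patch))
}

/// True if `version` lies in [introduced, fixed), the shape of one OSV SEMVER
/// range with an "introduced" event and an optional "fixed" event.
/// Returns None if any version string fails to parse.
fn is_affected(version: &str, introduced: &str, fixed: Option<&str>) -> Option<bool> {
    let v = parse_semver(version)?;
    let lo = parse_semver(introduced)?;
    let below_fix = match fixed {
        Some(f) => v < parse_semver(f)?,
        None => true, // No fix released yet: everything from `introduced` on is affected.
    };
    Some(v >= lo && below_fix)
}
```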

#### `sbom-diff` — CycloneDX/SPDX diff
Added, removed, upgraded, downgraded packages between two SBOMs.
```
$ codemap sbom-diff --left ./sbom-1.spdx.json --right ./sbom-2.spdx.json --json --quiet
{"ok":true,"result":{"added":[...],"removed":[...],"upgraded":[...]}}
```
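
Once each SBOM is reduced to a `name -> version` map, classifying changes takes two map walks. This sketch assumes both CycloneDX/SPDX documents have already been parsed into such maps; parsing is omitted. It compares versions as dotted numeric components, which is a simplification of real ecosystem version rules.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Default, PartialEq)]
struct SbomDiff {
    added: Vec<String>,
    removed: Vec<String>,
    upgraded: Vec<String>,
    downgraded: Vec<String>,
}

/// Dotted numeric version key; non-numeric components count as 0 (simplification).
fn version_key(v: &str) -> Vec<u64> {
    v.split('.').map(|p| p.parse::<u64>().unwrap_or(0)).collect()
}

/// Classifies package changes between two SBOMs given as name -> version maps.
fn sbom_diff(left: &BTreeMap<&str, &str>, right: &BTreeMap<&str, &str>) -> SbomDiff {
    let mut d = SbomDiff::default();
    for (name, old) in left {
        match right.get(name) {
            None => d.removed.push(name.to_string()),
            Some(new) => {
                // Vec<u64> compares lexicographically, component by component.
                let (a, b) = (version_key(old), version_key(new));
                if b > a {
                    d.upgraded.push(name.to_string());
                } else if b < a {
                    d.downgraded.push(name.to_string());
                }
            }
        }
    }
    for name in right.keys() {
        if !left.contains_key(name) {
            d.added.push(name.to_string());
        }
    }
    d
}
```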

#### `license-check` — SPDX compatibility
Per-package license + compatibility verdict.
```
$ codemap license-check --dir ./my-repo --json --quiet
{"ok":true,"result":{"deps":[{"name":"foo","license":"GPL-3.0","compatible":false}]}}
```

#### `cve-scan` — same as osv-scan but specifically against MITRE CVE corpus

---

### ML / AI model files

#### `gguf-info` — llama.cpp GGUF inspection
Architecture, layer count, head count, quant level, vocab size.
```
$ codemap gguf-info ./model.gguf --json --quiet
{"ok":true,"result":{"arch":"llama","n_layers":32,"n_heads":32,"vocab_size":32000,"quant":"Q4_K_M"}}
```
**Use when:** "what model is this file?" Pre-load sanity check.
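
The fixed part of a GGUF file is small. It consists of the ASCII magic `GGUF`, a little-endian `u32` format version, then `u64` tensor and metadata key/value counts. Version 1 used 32-bit counts; versions 2 and later use 64-bit counts. Fields such as architecture, layer count, and quantization come from the metadata key/value section and tensor descriptors that follow. Here is a sketch of the header check only:

```rust
use std::convert::TryInto;

/// Fixed-size GGUF header fields (format version 2 and later).
#[derive(Debug, PartialEq)]
struct GgufHeader {
    version: u32,
    tensor_count: u64,
    metadata_kv_count: u64,
}

/// Parses the first 24 bytes of a GGUF file. The metadata key/value
/// section and tensor descriptors that follow are not parsed here.
fn parse_gguf_header(buf: &[u8]) -> Result<GgufHeader, String> {
    if buf.len() < 24 {
        return Err("file too short for a GGUF header".into());
    }
    if &buf[0..4] != b"GGUF" {
        return Err("bad magic: not a GGUF file".into());
    }
    let version = u32::from_le_bytes(buf[4..8].try_into().unwrap());
    if version < 2 {
        return Err(format!("GGUF v{} uses 32-bit counts; not handled here", version));
    }
    let tensor_count = u64::from_le_bytes(buf[8..16].try_into().unwrap());
    let metadata_kv_count = u64::from_le_bytes(buf[16..24].try_into().unwrap());
    Ok(GgufHeader { version, tensor_count, metadata_kv_count })
}
```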

#### `safetensors-info` — HuggingFace safetensors inspection
Tensor shapes, dtypes, total params.
```
$ codemap safetensors-info ./model.safetensors --json --quiet
{"ok":true,"result":{"tensors":291,"total_params":7240000000,"dtype":"float16"}}
```

#### `onnx-info` — ONNX model graph
Operators, inputs, outputs, opset.
```
$ codemap onnx-info ./model.onnx --json --quiet
{"ok":true,"result":{"opset":17,"ops":["Conv","Relu","MaxPool"],"inputs":[{"name":"x","shape":[1,3,224,224]}]}}
```

#### `cuda-info` — CUDA fatbin/cubin inspection
SM versions present, kernel symbols.

#### `pyc-info` — Python bytecode inspection
Magic number, marshalled code object, imports.

---

### Cross-language & web

#### `lang-bridges` — FFI/binding detection
Detects PyO3 / napi / wasm-bindgen / JNI etc. — where languages interop.
```
$ codemap lang-bridges --dir ./my-repo --json --quiet
{"ok":true,"result":{"bridges":[{"kind":"pyo3","rust_fn":"create_user","py_module":"my_lib"}]}}
```

#### `gpu-functions` — GPU kernels in source
CUDA `__global__`, OpenCL kernels, Metal compute kernels, ROCm/HIP.
```
$ codemap gpu-functions --dir ./my-repo --json --quiet
{"ok":true,"result":{"kernels":[{"name":"matmul_kernel","framework":"cuda","file":"kernels.cu"}]}}
```

#### `monkey-patches` — runtime mutation detection
`obj.method = new_fn`, `setattr`, `prototype` patching.

#### `dispatch-map` — generic dispatch tables
Routers, registries, plugin maps. Finds the "switch statement that controls behavior."

#### `web-sitemap` — sitemap.xml + crawled link graph

#### `js-api-extract` — extract API calls from HAR / JS source

---

### LSP bridge (requires a running language server)

#### `lsp-symbols` — workspace symbol table from LSP
Real symbol info, not AST-inferred. More accurate for typed languages.

#### `lsp-references` — every reference to a symbol (LSP-grade)

#### `lsp-calls` — call hierarchy from LSP

#### `lsp-diagnostics` — current LSP diagnostics across the workspace
```
$ codemap lsp-diagnostics --dir ./my-repo --json --quiet
{"ok":true,"result":{"diagnostics":[{"file":"src/main.rs","line":42,"severity":"error","msg":"E0308: mismatched types"}]}}
```
**Use when:** programmatic access to compiler/type-checker errors.

#### `lsp-types` — type info on hover for a position

---

### arXiv-derived research actions (advanced)

These implement specific research papers. `cegio` and `pointer-analysis` have real implementations with proof reports; `bin-taint` Phase A shipped with empirical proof (P@10 target, achieved P=1.00/R=0.80).

#### `pointer-analysis` — Andersen field-sensitive PA
Computes points-to sets (which pointers can alias which memory). Field-sensitive + flow-insensitive + Tarjan SCC pre-pass for performance.
```
$ codemap pointer-analysis --dir ./my-repo --json --quiet
{"ok":true,"result":{"scope_vars":102000,"copy_constraints":132000,
"aliases":[{"ptr":"p","may_alias":["a","b"]}]}}
```
**Use when:** understanding aliasing for refactoring (rename a field safely), upstream of taint analysis.
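
Andersen's analysis is inclusion-based: each pointer statement becomes a subset constraint between points-to sets, and the constraints are solved to a fixed point. The naive sketch below handles the four classic constraint kinds: `p = &a`, `p = q`, `p = *q`, and `*p = q`. Unlike the action, it is field-insensitive and has no Tarjan SCC pre-pass, and its `C` constraint type is hypothetical.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical constraint encoding over named variables.
enum C<'a> {
    AddrOf(&'a str, &'a str), // p = &a : a is in pts(p)
    Copy(&'a str, &'a str),   // p = q  : pts(p) includes pts(q)
    Load(&'a str, &'a str),   // p = *q : for each o in pts(q), pts(p) includes pts(o)
    Store(&'a str, &'a str),  // *p = q : for each o in pts(p), pts(o) includes pts(q)
}

type Pts<'a> = HashMap<&'a str, BTreeSet<&'a str>>;

/// Adds everything in pts(src) to pts(dst); returns true if dst grew.
fn flow<'a>(pts: &mut Pts<'a>, src: &'a str, dst: &'a str) -> bool {
    let from: Vec<&'a str> = pts.get(src).map(|s| s.iter().copied().collect()).unwrap_or_default();
    let to = pts.entry(dst).or_default();
    let before = to.len();
    to.extend(from);
    to.len() != before
}

/// Re-applies every constraint until no points-to set grows (simple, not fast).
fn andersen<'a>(cs: &[C<'a>]) -> Pts<'a> {
    let mut pts: Pts<'a> = HashMap::new();
    let mut changed = true;
    while changed {
        changed = false;
        for c in cs {
            match *c {
                C::AddrOf(p, a) => changed |= pts.entry(p).or_default().insert(a),
                C::Copy(p, q) => changed |= flow(&mut pts, q, p),
                C::Load(p, q) => {
                    let targets: Vec<&'a str> =
                        pts.get(q).map(|s| s.iter().copied().collect()).unwrap_or_default();
                    for o in targets {
                        changed |= flow(&mut pts, o, p);
                    }
                }
                C::Store(p, q) => {
                    let targets: Vec<&'a str> =
                        pts.get(p).map(|s| s.iter().copied().collect()).unwrap_or_default();
                    for o in targets {
                        changed |= flow(&mut pts, q, o);
                    }
                }
            }
        }
    }
    pts
}
```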

#### `cegio` — counterexample-guided inductive optimization
**arXiv 1704.03738**. Given taint paths, synthesizes the minimum input that triggers a vulnerability.
```
$ codemap cegio --dir ./my-repo --json --quiet --taint-result
{"ok":true,"result":{"trigger":{"input":"' OR 1=1--","reaches_sink":true}}}
```
**Use when:** turning a taint finding into a proof-of-concept exploit input.

#### `bin-taint` — binary taint analysis (Phase A)
Lifts x86-64 ELF executable sections to a taint IR, builds CFG, propagates forward may-taint dataflow from PLT-resolved sources (read/recv/fread/getenv/strcpy/memcpy) to sinks (system/popen/exec/sprintf/dlopen), reports ranked source→sink paths. Stripped-binary fallback via bounded `.text` pathfinding. Proof: precision 1.00, recall 0.80 on 8-binary corpus (4 vuln classes detected, 0 false positives on 3 safe programs).
```
$ codemap bin-taint ./vulnerable-binary --json --quiet
{"ok":true,"result":{"findings":[{"source":"getenv","sink":"system","hops":["env","cmd","system"],"confidence":0.9},{"source":"read","sink":"sprintf","hops":["buf","format","sprintf"],"confidence":0.7}]}}
```
**Use when:** binary taint analysis on stripped ELF, finding command injection / format string / exec injection paths in compiled code.

---

### Composite workflows

#### `audit` — kitchen-sink security report
See "Data flow & security" section above.

#### `validate` — sanity check (build + lint + tests + audit summary)
Single composite for "is this repo broken?"

#### `changeset` — file-grouped diff summary
```
$ codemap changeset --dir ./my-repo --json --quiet HEAD~10 HEAD
{"ok":true,"result":{"changes":{"feat":[...],"fix":[...],"refactor":[...]}}}
```

#### `handoff` — generate handoff document for a project
Distills repo state into a single MD doc (status + open issues + recent work + next-steps).

#### `pipeline` — multi-action pipeline runner
Run several actions in sequence, accumulate results.
```
$ codemap pipeline --dir ./my-repo --json --quiet --target 'audit:./,trace:main,hotspots:'
{"ok":true,"result":{"audit":{...},"trace":{...},"hotspots":{...}}}
```
**Use when:** scripted multi-step analysis.

---

## Architecture (1-paragraph)

codemap walks `--dir`, parses with tree-sitter, builds a file-level import graph and a function-level call graph, layers PE/ELF/Mach-O/WASM/Java binary parsers + x86/x64 disassembly, and exposes 610 actions through a uniform CLI registry (`inventory::submit!`). Cache: `.codemap/cache.bincode` next to the scanned dir. Pure static. No daemons, no network access at analysis time.

## Repo layout

- `codemap-core/` — parsing, graph, algorithms, actions
- `codemap-cli/` — the `codemap` binary
- `codemap-napi/` — Node.js bindings (optional)
- `docs/` — REFERENCE.md, ACTION_CATALOG.md, SCHEMAS.md, HUMAN.md
- `install.sh` — single install entry

## License

MIT. See [`LICENSE`](LICENSE).