{"id":49458082,"url":"https://github.com/charleschenai/codemap","last_synced_at":"2026-06-08T03:08:33.932Z","repository":{"id":353270531,"uuid":"1218223622","full_name":"charleschenai/codemap","owner":"charleschenai","description":"Static codebase + binary analyzer and decompiler. Decompiles stripped PE/ELF/Mach-O to readable, behaviorally-verified C — structs, arrays, strings, C++ vtables and try/catch. 524 actions, single Rust binary, zero deps.","archived":false,"fork":false,"pushed_at":"2026-06-06T02:22:11.000Z","size":19162,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-06T02:23:55.500Z","etag":null,"topics":["codebase-analysis","codemap","dependency-analysis","graph-theory","static-analysis"],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/charleschenai.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":"docs/ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-04-22T16:53:45.000Z","updated_at":"2026-06-06T02:22:13.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/charleschenai/codemap","commit_stats":null,"previous_names":["charleschenai/codemap"],"tags_count":136,"template":false,"template_full_name":null,"purl":"pkg:github/charleschenai/codemap","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/charleschenai%2Fcodemap","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/charleschenai%2Fcodemap/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/charleschenai%2Fcodemap/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/charleschenai%2Fcodemap/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/charleschenai","download_url":"https://codeload.github.com/charleschenai/codemap/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/charleschenai%2Fcodemap/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34046078,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-08T02:00:07.615Z","response_time":111,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["codebase-analysis","codemap","dependency-analysis","graph-theory","static-analysis"],"created_at":"2026-04-30T08:00:40.612Z","updated_at":"2026-06-08T03:08:33.924Z","avatar_url":"https://github.com/charleschenai.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# codemap\n\n\u003e Static codebase + binary analyzer, **decompiler, and patcher**. One binary, 610 actions, 18 source languages, PE/ELF/Mach-O/WASM decompilation to readable **recompilable** C on **x86/x64, ARM64, RISC-V, and WebAssembly**, sub-second cold-cache on 3K-file repos. **No network, no servers, no databases, no API keys.**\n\n**This README is your system prompt.** Designed for AI agents: drop the entire file into your context (or fetch `https://raw.githubusercontent.com/charleschenai/codemap/main/README.md`) and you have everything you need — what codemap is, when to use it, how to install it, how to call every category of action, output schemas, exit codes, MCP setup. No further docs required for 95% of usage. Humans: see [`docs/HUMAN.md`](docs/HUMAN.md). Everyone else, keep reading.\n\n**Mission:** Break down CODE (source + binary) so AI can replicate it.\n\n## What's new in v8.21 — the 40-topic grind complete (610 actions)\n\nTwo full 20-topic roadmaps (THIRD + FOURTH) landed. THIRD-20 (all real, gate-verified): transplant, translate, fingerprint, hot-patch, api-shim, size-opt, multi-refactor, fuzz-harness, instrument, visual-docs, vuln-discover, protocol-rec, vectorize, ml-patch, jit-resolve, self-rewrite, gpu-lift, kernel-rewrite, mobile-fuse, os-map. FOURTH-20 mediums (real): self-bench, eval-suite, lasm, worm-defense, pear-fuzz, pqc-translate, ref-decompile. FOURTH-20 deep tier ships as **honest skeletons** — they emit obligations/prompt-packs/specs/plans and each explicitly states the heavy backend (Coq/Lean verifier, LLM, CPU emulator, ZK prover, full superset engine, GPU recompiler) is NOT integrated; never a faked verified/proven claim: prove-rewrite, llm-decompile, superset-decompile, proof-patch, sys-sim, self-improve-demo, meta-evolve, zk-attest, gpu-rewrite.\n\n## What's new in v8.13 — the autonomous + verifiable engine\n\ncodemap is now an **autonomous, self-improving, verifiable** security engine. The decompiler covers **five architectures** end-to-end and the action arsenal composes into goal-driven, no-human loops.\n\n**Multi-arch decompiler COMPLETE.** `decompile`/`ir` produce readable, recompilable C for **x86, x64, ARM64 (incl NEON/SIMD), RISC-V (RV64GC incl compressed/M/A), and WebAssembly** — all through the same lift → SSA → type/var recovery → SAILR structuring → C pipeline.\n\n**The Autonomous lane (new actions):**\n- **`run`** — *agentic mode*: `codemap run goal=\u003cattack-surface|audit-crypto|modernize|harden\u003e \u003cbin\u003e` runs a deterministic, **offline, no-LLM** PLAN→ACT→OBSERVE→VERIFY→REPORT loop that composes existing actions into a DAG, threads one graph, is budget/step-capped, emits JSON, and only marks a finding *fixed* if the patch recompiles + re-validates.\n- **`learn`** — *self-improving*: records what-worked from each run into a project-brain store; `run`'s planner consults it to tune the DAG over time. The loop is closed — planning improves with usage, no code changes.\n- **`redteam`** — autonomous offensive campaign (taint → symbolic → ranked PoC bundle + report).\n- **`infer-spec ... export=acsl|lean`** — machine-checkable proof export (Frama-C ACSL + Lean/Coq), so patches are *provable*, not just plausible.\n- **`provenance`** — signed, tamper-evident manifests for patched/twinned/hardened artifacts.\n- **`pqc-migrate`** — detect quantum-vulnerable crypto → apply NIST PQC (ML-KEM/ML-DSA/SLH-DSA) → equivalence note.\n- **`deobfuscate`** — the inverse of `harden`: de-flatten CFG, crack opaque predicates, decrypt strings via symbolic + graph.\n\nPlus, across the roadmap: `binary-twin` (cleanroom fork), `xlang-graph` (cross-language call fusion), `to-rust` (C→idiomatic Rust), `replay` (record/replay + mutation), `what-if` (change-impact), `firmware`, `sbom-flow`, `crypto-audit`, `model-extract`, `game-assets`, `brain-lock`. **610 actions.**\n\n## What's new in v8.4 — multi-arch decompiler + the three strategic arms\n\nv8.4 pushes the v8.3 decompiler in two directions: **multi-architecture** (it now produces recompilable C for **ARM64/AArch64**, not just x86/x64) and the first increments of the three strategic arms that turn codemap from *read-only intelligence* into a full **understand → reason → change** platform.\n\n**v8.4.0 new actions (Phase-1+ across roadmap topics):** `project-brain` (persistent project memory + git-history what-changed), `infer-spec` (formal pre/post/invariant inference → ACSL + Rust contracts, Daikon-style templates), `c-diff` (graph-aware decompiled-C diff with call-graph change propagation), `ci` (binary CI/CD attack-surface gate), `vuln-backport` (CVE patch → older-binary backport locator). ARM64 decompilation hardened: recursion, switch recovery, emit cleanup, recursive-call returns. **610 actions.**\n\n- **ARM64 / AArch64 decompiler.** ARM64 Mach-O now disassembles (Capstone-backed `Arm64Lifter`; function sizing from `LC_FUNCTION_STARTS`) and lifts through the same IR pipeline as x86 — `codemap ir \u003carm64-bin\u003e \u003cfn\u003e` emits readable C with recovered args (AAPCS64 `x0`–`x7`), real calls (recursion is a `call`, not an `asm` comment), and frame/`sp` modeling. **`--verify` PASS on ARM64**, not just x86: both arches decompile → recompile cleanly.\n- **`ir --verify` — recompile gate, first-class.** `codemap ir \u003cbin\u003e \u003cfn\u003e --verify` writes the emitted C to a temp file and runs a host C compiler on it, reporting **PASS / FAIL** — ground truth that the decompilation is *recompilable*, not just plausible. The backbone of codemap's verify-by-running discipline.\n- **Arm 1 — Binary patching.** `bin-patch-fn`: surgical, layout-preserving **in-place function patching** (canned stubs `ret0`/`ret1`/`ret`/`nop` or raw hex), fits-gated, verified by re-disassembly. Neutralize a check (`bin-patch-fn ./app check_license ret1`) without touching any other offset / reloc / string. (The decompile → edit-C → recompile → relink loop is the next increment.)\n- **Arm 2 — Symbolic / concolic.** `concolic`: an interval constraint solver over the SSA-IR branch guards (no SMT dependency) — per path it reports **SAT** (with a concrete register seed that drives execution down it), **DEAD** (contradictory guards → opaque-predicate / dead-code signal), or **PARTIAL**. Concrete concolic seeds in the default build.\n- **Arm 3 — Dynamic bridge.** `trace-plan`: uses the code-property graph to choose a *selective* instrumentation scope (entry, call sites, dangerous sinks, loop heads — not every instruction) and emits a ready-to-run, ABI-aware GDB script. Drive it with `concolic` seeds; ingest the trace with `runtime-merge`.\n- **Graph fusion — cross-binary name recovery.** `name-recovery` recovers a stripped binary's anonymous `sub_\u003cva\u003e` names by matching them (40-dim structural fingerprint, cosine, greedy 1:1) to *named* functions in a reference binary, fusing the recovered names into the graph. Exact on same-build; honest-partial across optimization levels.\n- **Decompiler correctness sweep.** Fixed multi-block **argument recovery** (args flowing across a loop/branch were emitted `void` with use-before-def; now seeded at the SSA entry from the calling convention), **struct-field deref** (`p-\u003ex`), and **2D-array index** (`m[i*cols+j]`) — all now recompile.\n- **Built for the AI-agent customer.** `agent-brief` (one-page high-signal map of a codebase), `search` (relevance-ranked discovery across 610 actions), `graph-export` (Graphviz / Mermaid / **Cytoscape JSON** / interactive HTML). Plus human onboarding: `cargo binstall`, a Homebrew formula, and a [`docs/HUMAN.md`](docs/HUMAN.md) quickstart.\n\n## What's new in v8.3 — the graph-fused decompiler\n\nv8.3 (through `8.3.5`) turns codemap's binary side into a real **decompiler**: lift → SSA → DCE/copy-prop → type \u0026 variable recovery → SAILR structuring → readable, recompilable C. It went from \"finds 1 function in a stripped PE\" to:\n\n- **Full binary coverage.** PE (x86/x64), ELF (x86/x64/ARM/AArch64), and **Mach-O x86-64** — function discovery via PE `.pdata` `RUNTIME_FUNCTION`, ELF symbols/`.eh_frame`, and Mach-O `LC_FUNCTION_STARTS`.\n- **Readable C reconstruction.** Recovered **structs** (`p-\u003efield` with synthesized typedefs), **arrays** (`a[i]`), **string literals** (`return \"hello world\"`), **float/XMM ABI** params \u0026 returns (SysV + Win64), **C++ virtual calls** (`obj-\u003evfunc_0()`), and clean control flow on `-O2` (no goto-soup).\n- **C++ exception recovery.** Idiomatic `try { … } catch (int e) { … }` reconstructed from a stripped binary's `.eh_frame` + `.gcc_except_table` — **including the caught type**, demangled from the LSDA type table. Most decompilers drop the handler entirely or render it as goto-soup.\n- **Correctness, not just readability.** Fixed real silent mis-decompilations — array-index liveness (loops returned `a[0]·n`), dropped `movzbl` masks (`x \u0026 0xff` → `x`) — caught and fixed via a re-execution gate.\n- **Behaviorally verified.** Every change is gated on a **79-binary recompilability corpus** + a **G10 re-execution harness** (decompile → recompile → run → diff): recovered code is behavior-identical on the scalar subset, not just plausible-looking.\n- **Graph-fused.** Decompiled functions feed codemap's heterogeneous code-property graph, so its dataflow / taint / call-graph / centrality analyses run on **stripped binaries**, not just source.\n\n## What's new in v8\n\nv8 cuts the v7 series at `7.184.0` (2026-05-18) and turns over to `8.0.0` (2026-05-20). Headline themes:\n\n- **Action registry complete (T1).** Every action self-registers via `inventory::submit!`; `actions/mod.rs` has zero dispatch arms (catch-all `_ =\u003e Err(UnknownAction)` only). Adding a new action is a single submit-block edit in the owning module file.\n- **iced-x86 linear-sweep precision (T3).** All `bin_text_*` density actions disassemble via iced-x86 instead of raw byte-scans — eliminates instruction-boundary false positives.\n- **Lint zero (T8).** `#![deny(warnings)]` locked into `codemap-core` and `codemap-cli`; `cargo clippy -- -D warnings` ships at 0 warnings.\n- **arXiv research: filter scaffolds, ship real work (T9).** `pointer-analysis` (Andersen field-sensitive PA + Tarjan SCC) and `cegio` (rsmt2-driven SMT) shipped with real implementations. `bin-taint` shipped Phase A (CFG, intra/inter-procedural taint, PLT-resolved source/sink, pathfinding, stripped-binary fallback). **16 items removed in v8.2.0 cleanup:** 13 skeleton scaffolds (`symex-concolic`, `loop-polyhedral`, `detect-memory-corruption`, `neural-decompile`, `side-channel-detect`, `symex-speculative`, `gpu-analyze`, `semantic-slice`, `synthesize`, `abstract-interp`, `bin-search`, `patch-binary`, `natural-query`) + 3 failed experiments (`meta-path-ppr` proof +0.0000 lift, `rfmoe` 3/8 FAIL, `ising-landscape` proof pending) — all 59–145 LOC with no proof reports or integration tests.\n- **16 Phase F actions multi-corpus replicated:** `transfer-entropy`, `hebbian-coupling`, `kl-drift`, `network-motifs`, `code-entropy`, `criticality-soc`, `fatigue-crack`, `bio-physarum`, `preferential-attachment`, `small-world`, `phase-transitions`, `lyapunov-tracker`, `universality-class`, `lattice-evidence`, `control-theory-pid-ci-cd`, `codemap-mcp`.\n\n**610 actions** registered (full index in `docs/ACTION_CATALOG.md`; generated from the registry by `gen-action-docs` and gated by `tests/single_source_of_truth.rs`). **235** `bin-*` parsers, **18** source-language tree-sitter parsers, **338/338** lib tests, **90/90** CI verdict-gate baseline, **0** clippy warnings.\n\n\n---\n\n## When to reach for codemap\n\n| Problem | Codemap action | Why codemap (vs alternatives) |\n|---|---|---|\n| \"What does this codebase do?\" | `summary --dir \u003cpath\u003e` | Cross-file structural overview in one call. Beats reading files. |\n| \"Find unused functions / dead code\" | `dead-functions --dir \u003cpath\u003e` | Call-graph reachability across modules. grep can't do this. |\n| \"Who calls function X?\" | `callers --dir \u003cpath\u003e X` | True call graph (AST-aware), not a string match. |\n| \"What does function X depend on (transitively)?\" | `trace --dir \u003cpath\u003e X` | Walks the dep graph. grep would only find direct refs. |\n| \"What changed between two commits?\" | `diff --dir \u003cpath\u003e \u003cref1\u003e \u003cref2\u003e` | Semantic diff, not line diff. |\n| \"Find security issues\" | `audit --dir \u003cpath\u003e` | Composite of taint + secret-scan + dep-tree + dead-deps. |\n| \"Where would a tainted input flow?\" | `taint --dir \u003cpath\u003e --source \u003cfn\u003e --sink \u003cfn\u003e` | Path-sensitive, sanitizer-aware, alias-aware, cross-procedural. |\n| \"Reverse-engineer a binary\" | `bin-info \u003cpath/to/binary\u003e` | PE/ELF/Mach-O parser. capa + YARA + signsrch + PEiD rules built in. |\n| \"Find cross-language coupling\" | `cross-lang --dir \u003cpath\u003e` | Imports/calls that cross language boundaries. |\n## When NOT to reach for codemap\n\n- **Editing files**: codemap is read-only. Use Edit/Write directly.\n- **Running code**: codemap doesn't compile or exec. Use bash.\n- **Live process state**: codemap is static. Use `ps`, `lsof`, `ss`.\n- **Single-file grep**: if you know the file, `grep` is faster.\n- **String search across few files**: if N\u003c5 files, just `grep`.\n\n---\n## Install\n\n### From release (recommended)\n\nDownload the tarball for your platform and extract the binary:\n\n```bash\n# Linux x86_64\ncurl -fsSL https://github.com/charleschenai/codemap/releases/latest/download/codemap-v8.2.0-x86_64-linux.tar.gz -o codemap.tar.gz\ntar xzf codemap.tar.gz -C ~/.local/bin/\nchmod +x ~/.local/bin/codemap\n\n# Linux aarch64\ncurl -fsSL https://github.com/charleschenai/codemap/releases/latest/download/codemap-v8.2.0-aarch64-linux.tar.gz -o codemap.tar.gz\ntar xzf codemap.tar.gz -C ~/.local/bin/\nchmod +x ~/.local/bin/codemap\n\n# macOS (add to PATH if needed)\nexport PATH=\"$HOME/.local/bin:$PATH\"\n```\n\nAdd `$HOME/.local/bin` to your `PATH` in `~/.bashrc` or `~/.zshrc`:\n\n```bash\nexport PATH=\"$HOME/.local/bin:$PATH\"\n```\n\nFor system-wide install (`/usr/local/bin/codemap`):\n\n```bash\nsudo cp codemap /usr/local/bin/\nsudo chmod +x /usr/local/bin/codemap\n```\n\n### From source\n\n```bash\ngit clone https://github.com/charleschenai/codemap \u0026\u0026 cd codemap\ncargo build --release -p codemap-cli\ncp target/release/codemap ~/.local/bin/codemap\nchmod +x ~/.local/bin/codemap\n```\n\n\n## Verify\n\n```\ncodemap --version-detail\n```\n\nPrints:\n```\ncodemap 8.2.0\ngit: \u003clatest-sha\u003e\nbuilt: \u003cbuild-date\u003e\nhost: \u003chostname\u003e/\u003carch\u003e\n```\n\nIf the binary is older than expected, re-run install with `--update`.\n\n---\n\n## How to call any action\n\nUniversal shape:\n```\ncodemap \u003cACTION\u003e [TARGET...] --dir \u003cPATH\u003e [--json] [--quiet] [other-flags]\n```\n\n| Flag | Purpose |\n|---|---|\n| `--dir \u003cPATH\u003e` | **Required.** Repo/dir to scan. Repeatable for multi-repo. |\n| `--json` | Output JSON (parseable). Default is text (human-readable). |\n| `--quiet` | Suppress scan/cache status messages on stderr. |\n| `--no-cache` | Force re-scan, ignore `.codemap/cache.bincode`. |\n| `--include-path \u003cPATH\u003e` | C/C++ include search path. |\n| `--watch [SECS]` | Re-run every N seconds. |\n\nFor agents: **always use `--json` and `--quiet`** unless you specifically want text output.\n\n## Discover actions\n\n```\ncodemap --help                                       # full action list\ncodemap \u003caction\u003e --help                              # action-specific flags\n```\n\n---\n\n## Action categories\n\n610 actions (a curated subset advertised in `--help`, 235 fine-grained `bin-*` parsers, plus the rest) grouped by purpose. Full catalog at [`docs/ACTION_CATALOG.md`](docs/ACTION_CATALOG.md). High-level groups:\n\n| Category | Action count | Examples |\n|---|---|---|\n| **Analysis** | ~20 | `summary`, `stats`, `trace`, `callers`, `hotspots`, `layers`, `health`, `decorators` |\n| **Code intelligence** | ~30 | `complexity`, `import-cost`, `churn`, `api-diff`, `clones`, `entry-points`, `dead-functions` |\n| **Dataflow / security** | ~16 | `data-flow`, `taint`, `bin-taint`, `slice`, `trace-value`, `sinks`, `secret-scan`, `audit`, `dep-tree` |\n| **Graph theory** | ~40 | `pagerank`, `hubs`, `bridges`, `centrality` (17 measures), `community` (Leiden), `bellman-ford` |\n| **Binary / RE** | ~235 | `elf-info`, `pe-imports`, `macho-info`, `bin-anti-debug`, `bin-disasm`, `bin-strings`, `bin-relocs` |\n| **Schemas** | ~10 | `proto-schema`, `openapi-schema`, `graphql-schema`, `sql-extract`, `dbf-schema` |\n| **Supply chain** | ~10 | `osv-scan`, `sbom-diff`, `license-check`, `cve-scan` |\n| **Config-as-code** | ~10 | `k8s-scan`, `iac-scan`, `dockerfile-scan`, `ci-scan`, `oci-scan` |\n| **ML / AI** | ~10 | `gguf-info`, `safetensors-info`, `onnx-info`, `cuda-info`, `pyc-info` |\n| **LSP bridge** | ~5 | `lsp-symbols`, `lsp-references`, `lsp-calls`, `lsp-diagnostics`, `lsp-types` |\n| **Web** | ~5 | `web-sitemap`, `js-api-extract` (HAR/HTML input required) |\n| **Cross-language** | ~5 | `lang-bridges`, `gpu-functions`, `monkey-patches` |\n| **Composite** | ~10 | `audit`, `compare`, `validate`, `changeset`, `handoff`, `pipeline` |\n| **arXiv-derived** | 2 | `pointer-analysis` (Andersen PA), `cegio` (SMT optimizer) |\n\n---\n\n## Output schema\n\nAll `--json` outputs follow:\n```\n{\n  \"ok\": \u003cboolean\u003e,\n  \"action\": \"\u003caction-name\u003e\",\n  \"dir\": \"\u003cscanned-path\u003e\",\n  \"result\": \u003caction-specific\u003e,\n  \"stats\": { \"files_scanned\": N, \"duration_ms\": M, \"cache_hits\": K }\n}\n```\n\n`result` shape varies per action. Action-specific schemas in [`docs/SCHEMAS.md`](docs/SCHEMAS.md).\n\n## Exit codes\n\n| Code | Meaning | Agent response |\n|---|---|---|\n| 0 | Success | Parse `--json` output |\n| 1 | Usage error (bad flag, missing --dir) | Re-read `--help`, fix args, retry |\n| 2 | I/O error (path not found, no read perm) | Verify path, retry |\n| 101 | Panic | **Do not retry.** File a bug at https://github.com/charleschenai/codemap/issues |\n\nOther non-zero codes: action-specific. See `\u003caction\u003e --help`.\n\n\n## AI agent usage guide\n\ncodemap is designed for AI agents as its primary customer. Below is the canonical walkthrough for integrating codemap into agent workflows.\n\n### Why use codemap instead of grep/read?\n\n| Scenario | grep / raw edits | codemap |\n|---|---|---|\n| \"What does this codebase do?\" | Read every file sequentially | `summary` — structural overview in one call |\n| \"Find dead / unused code\" | Manual reachability tracing | `dead-functions` — true call-graph reachability |\n| \"Who calls function X?\" | String match across files | `callers` — AST-aware call graph |\n| \"What does function X depend on?\" | Direct import grep | `trace` — transitive dep graph walk |\n| \"What changed between two commits?\" | Line-level diff | `diff` — semantic diff (AST-aware) |\n| \"Find security issues\" | YARA / pattern match | `audit` — composite: taint + secret-scan + dep-tree + dead-deps |\n| \"Where does tainted input flow?\" | No tool | `taint` — path-sensitive, sanitizer-aware, cross-procedural |\n| \"Analyze a compiled binary\" | `strings` + `hexdump` + manual | `bin-info` + `bin-taint` — PE/ELF/Mach-O parsers + taint analysis |\n| \"Graph metrics on code\" | Custom scripts | 500+ built-in actions (graph theory, entropy, ML, physics-inspired) |\n\ncodemap is **read-only**, **no network**, **no servers**, **no databases**, **no API keys**. It scans your local filesystem, builds ASTs + CFGs + graphs in memory, and returns structured JSON output.\n\n### Canonical call pattern\n\nEvery action follows this pattern:\n\n```bash\ncodemap \u003cACTION\u003e [TARGET] --dir \u003cPATH\u003e --json --quiet [OPTIONS]\n```\n\n| Flag | Purpose |\n|---|---|\n| `--json` | JSON output (machine-readable) |\n| `--quiet` | Suppress progress bars and logs |\n| `--dir` | Directory to analyze (required) |\n\n**Output schema** (for actions that return results):\n\n```json\n{\n  \"ok\": true,\n  \"result\": { ... },\n  \"metrics\": {\n    \"time_ms\": 42,\n    \"files_scanned\": 1501,\n    \"edges\": 100219\n  }\n}\n```\n\nOn failure:\n\n```json\n{\n  \"ok\": false,\n  \"error\": \"error message\"\n}\n```\n\nExit codes:\n- `0` — success\n- `1` — error (check `--json` output for details)\n\n### Worked examples\n\n**Example 1: \"What does this repo do?\"**\n\n```bash\ncodemap summary --json --quiet --dir ./project\n# → Cross-file structural overview: top-level modules, key dependencies, entry points\n```\n\n**Example 2: \"Find unused functions\"**\n\n```bash\ncodemap dead-functions --json --quiet --dir ./project\n# → Functions with zero callers across the module graph. Includes call-chain depth.\n```\n\n**Example 3: \"Security audit\"**\n\n```bash\ncodemap audit --json --quiet --dir ./project\n# → Composite: taint analysis + secret detection + dependency tree + dead deps\n#   Returns findings ranked by confidence with source→sink paths\n```\n\n**Example 4: \"Taint analysis — find injection paths\"**\n\n```bash\ncodemap taint --json --quiet --dir ./project --source read --sink system\n# → Path-sensitive taint from `read` to `system` with confidence scoring\n#   Reports ranked source→sink paths with alias resolution\n```\n\n**Example 5: \"Binary analysis — what is this executable?\"**\n\n```bash\ncodemap bin-info --json --quiet ./target/release/my-binary\n# → PE/ELF/Mach-O parser: sections, imports, exports, symbols,\n#   capa-rules detection, YARA signatures, anti-debug indicators\n```\n\n### MCP: the recommended adoption path\n\nFor agents that use MCP-compatible clients (Claude Code, Cursor, Windsurf), add codemap as an MCP tool server. All 610 actions become available as MCP tools with proper input schemas:\n\n```json\n// ~/.claude/settings.json\n{\n  \"mcpServers\": {\n    \"codemap\": {\n      \"command\": \"python3\",\n      \"args\": [\"/path/to/codemap/docs/codemap-mcp-server.py\"]\n    }\n  }\n}\n```\n\nThis is the recommended path because:\n1. **No CLI parsing needed** — tools have structured input schemas\n2. **Self-documenting** — `tools/list` returns every action name, description, and schema\n3. **Executable via JSON-RPC** — `tools/call` with `{name, arguments}` dispatches any action\n4. **Zero config for AI** — the agent discovers capabilities automatically\n\nSet `CODEMAP_BIN` if your codemap binary is not on PATH:\n\n```bash\nexport CODEMAP_BIN=~/.local/bin/codemap\n```\n\n### Environment variables\n\n| Variable | Purpose | Default |\n|---|---|---|\n| `CODEMAP_BIN` | Path to codemap binary | `codemap` (from PATH) |\n| `CODEMAP_CACHE` | Custom cache directory | `.codemap/cache.bincode` (next to scanned dir) |\n\n### Error handling\n\nAlways check `--json` output for error details:\n\n```bash\nresult=$(codemap \u003cACTION\u003e --json --quiet --dir ./project)\nif echo \"$result\" | python3 -c \"import sys,json; d=json.load(sys.stdin); sys.exit(0 if d['ok'] else 1)\"; then\n  echo \"Success: $(echo \"$result\" | python3 -c \"import sys,json; print(json.load(sys.stdin)['result'])\")\"\nelse\n  echo \"Error: $(echo \"$result\" | python3 -c \"import sys,json; print(json.load(sys.stdin)['error'])\")\"\nfi\n```\n\n### Performance notes\n\n- Cold cache: sub-second on repos up to 3K files\n- Warm cache: near-instant (reads `.codemap/cache.bincode`)\n- Large repos (10K+ files): 5-30 seconds for full analysis\n- All analysis is in-memory. No disk writes except the cache file.\n- No network calls during analysis.\n\n---\n## Recipes — when the agent has a specific job to do\n\nEach recipe: **what the action does** → **command** → **sample output** → **when to use it**.\n\nFor the complete flat list of action names see [`docs/ACTION_CATALOG.md`](docs/ACTION_CATALOG.md).\n\n---\n\n### Codebase understanding (first-look on an unknown repo)\n\n#### `summary` — one-page structural overview\nReports file count, languages, entry points, top modules, dispatch density. Single-call onboarding.\n```\n$ codemap summary --dir ./my-repo --json --quiet\n{\"ok\":true,\"result\":{\"files\":2824,\"languages\":[\"rust\",\"python\",\"typescript\"],\n  \"entry_points\":[\"src/main.rs\",\"src/lib.rs\"],\"top_modules\":[\"analysis\",\"insights\",\"cpg\"]}}\n```\n**Use when:** new repo, \"tell me what this does\" before diving deeper.\n\n#### `stats` — quantitative metrics\nPer-language LOC + file counts, function/class density, fan-in/fan-out distribution.\n```\n$ codemap stats --dir ./my-repo --json --quiet\n{\"ok\":true,\"result\":{\"rust\":{\"files\":341,\"loc\":89432,\"fns\":2104},\"python\":{\"files\":52,\"loc\":4108}}}\n```\n**Use when:** comparing repos by size, reporting metrics, sanity-checking parse coverage.\n\n#### `layers` — architectural layer detection\nInfers boundaries (web / service / data / infra) from import patterns + naming conventions.\n```\n$ codemap layers --dir ./my-repo --json --quiet\n{\"ok\":true,\"result\":{\"layers\":[{\"name\":\"web\",\"modules\":[\"routes\",\"handlers\"]},\n  {\"name\":\"data\",\"modules\":[\"models\",\"repo\"]}],\"violations\":[...]}}\n```\n**Use when:** validating that \"web shouldn't import from data\" type architectural rules hold.\n\n#### `hotspots` — files with most churn × complexity\nSurfaces \"danger zone\" code (high git churn + high cyclomatic complexity).\n```\n$ codemap hotspots --dir ./my-repo --json --quiet --top 10\n{\"ok\":true,\"result\":{\"hotspots\":[{\"file\":\"src/parser.rs\",\"churn\":48,\"complexity\":92,\"score\":4416}]}}\n```\n**Use when:** prioritizing refactor work, finding \"where bugs live.\"\n\n#### `entry-points` — public API surface\nLists exported functions/classes that other code can call from outside.\n```\n$ codemap entry-points --dir ./my-repo --json --quiet\n{\"ok\":true,\"result\":{\"entries\":[{\"name\":\"create_user\",\"file\":\"api/users.rs\",\"kind\":\"public_fn\"}]}}\n```\n**Use when:** API documentation, understanding what's a stable contract.\n\n#### `health` — overall quality summary\nComposite: dead code % + clippy/lint count + circular deps + missing tests. Single \"is this repo healthy?\" score.\n```\n$ codemap health --dir ./my-repo --json --quiet\n{\"ok\":true,\"result\":{\"score\":78,\"dead_code_pct\":3.2,\"circular_deps\":2,\"missing_tests\":[\"api/users.rs::delete\"]}}\n```\n**Use when:** quick \"should we touch this codebase or not\" gut-check.\n\n---\n\n### Code quality \u0026 cleanup\n\n#### `dead-functions` — unreachable code\nFunctions never called by any other function in the workspace.\n```\n$ codemap dead-functions --dir ./my-repo --json --quiet\n{\"ok\":true,\"result\":{\"dead\":[{\"file\":\"src/old.rs\",\"function\":\"legacy_helper\",\"line\":42}]}}\n```\n**Use when:** cleanup PR, removing tech debt. **Don't use for:** identifying entry points (they're \"dead\" by call-graph but intentionally public).\n\n#### `dead-files` — files imported nowhere\nFiles no other file imports / uses.\n```\n$ codemap dead-files --dir ./my-repo --json --quiet\n{\"ok\":true,\"result\":{\"dead_files\":[\"src/experimental/old_impl.rs\",\"tools/debug.py\"]}}\n```\n**Use when:** dead-import cleanup.\n\n#### `dead-deps` — declared deps never imported\nPackages in `Cargo.toml`/`package.json`/`pyproject.toml` that no source file imports.\n```\n$ codemap dead-deps --dir ./my-repo --json --quiet\n{\"ok\":true,\"result\":{\"dead\":[\"serde_json (Cargo.toml)\",\"lodash (package.json)\"]}}\n```\n**Use when:** dep cleanup, reducing build time + attack surface.\n\n#### `complexity` — cyclomatic complexity per function\nMcCabe complexity (branches+1). Catches \"this function should be split.\"\n```\n$ codemap complexity --dir ./my-repo --json --quiet --top 10\n{\"ok\":true,\"result\":{\"top\":[{\"fn\":\"parse_expression\",\"file\":\"parser.rs\",\"cyclomatic\":34,\"lines\":280}]}}\n```\n**Use when:** finding refactor candidates, code review automation.\n\n#### `churn` — git change frequency per file\nCommits-touching-file count over a window.\n```\n$ codemap churn --dir ./my-repo --json --quiet --top 10\n{\"ok\":true,\"result\":{\"top\":[{\"file\":\"src/parser.rs\",\"commits\":78,\"authors\":12}]}}\n```\n**Use when:** combined with complexity for hotspots, ownership analysis.\n\n#### `clones` — duplicated code blocks\nDetects near-identical token sequences across files (copy-paste detection).\n```\n$ codemap clones --dir ./my-repo --json --quiet --min-tokens 50\n{\"ok\":true,\"result\":{\"clones\":[{\"size\":120,\"locations\":[[\"a.rs:14\",\"b.rs:22\"]],\"similarity\":0.94}]}}\n```\n**Use when:** finding extraction candidates for shared functions.\n\n#### `circular` — circular import detection\nReports module cycles (a → b → c → a).\n```\n$ codemap circular --dir ./my-repo --json --quiet\n{\"ok\":true,\"result\":{\"cycles\":[[\"src/a.rs\",\"src/b.rs\",\"src/a.rs\"]]}}\n```\n**Use when:** untangling architecture before a refactor.\n\n---\n\n### Impact tracing \u0026 change analysis\n\n#### `trace` — transitive callees (what does X depend on?)\nWalks the call graph forward from a function/symbol, returns full dep tree.\n```\n$ codemap trace --dir ./my-repo --json --quiet RecalcInvoiceTotals\n{\"ok\":true,\"result\":{\"node\":\"RecalcInvoiceTotals\",\"calls\":[\n  {\"name\":\"ship_chg_sum\",\"file\":\"backend/invoices.go:120\",\"depth\":1},\n  {\"name\":\"format_money\",\"file\":\"util/money.go:8\",\"depth\":2}]}}\n```\n**Use when:** impact analysis before changing a function, generating context for an LLM.\n\n#### `callers` — transitive callers (who calls X?)\nReverse of `trace`. Returns the function's call sites + their callers.\n```\n$ codemap callers --dir ./my-repo --json --quiet validate_user\n{\"ok\":true,\"result\":{\"callers\":[{\"caller\":\"login\",\"file\":\"auth.py:88\",\"depth\":1}]}}\n```\n**Use when:** \"if I change this signature, what breaks?\"\n\n#### `blast-radius` — affected entities from a change\nCombines callers + dataflow + tests touched. Most pessimistic estimate.\n```\n$ codemap blast-radius --dir ./my-repo --json --quiet --target User.id\n{\"ok\":true,\"result\":{\"functions\":42,\"tests\":7,\"endpoints\":3,\"db_columns\":2}}\n```\n**Use when:** \"what's the size of changing this thing?\"\n\n#### `diff` — semantic diff between two refs\nFunction-level diff: added, removed, signature-changed, body-changed.\n```\n$ codemap diff --dir ./my-repo --json --quiet HEAD~5 HEAD\n{\"ok\":true,\"result\":{\"added\":[\"validate_email\"],\"removed\":[\"old_validator\"],\n  \"signature_changed\":[{\"fn\":\"create\",\"before\":\"(name)\",\"after\":\"(name,email)\"}]}}\n```\n**Use when:** generating PR descriptions, understanding code review scope.\n\n#### `api-diff` — breaking-change classifier\nLike `diff` but specifically flags BREAKING vs additive changes to public API.\n```\n$ codemap api-diff --dir ./my-repo --json --quiet HEAD~5 HEAD\n{\"ok\":true,\"result\":{\"breaking\":[\n  {\"kind\":\"removed\",\"fn\":\"OldAPI::v1_login\"},\n  {\"kind\":\"signature_change\",\"fn\":\"create_user\",\"before\":\"(name)\",\"after\":\"(name,email)\"}]}}\n```\n**Use when:** versioning decisions (semver minor vs major), CHANGELOG generation.\n\n#### `diff-impact` — functions affected by a commit range\nMaps the diff to every transitively-affected caller.\n```\n$ codemap diff-impact --dir ./my-repo --json --quiet HEAD~5 HEAD\n{\"ok\":true,\"result\":{\"impacted_fns\":127,\"impacted_files\":34,\"high_risk\":[\"payment::charge\"]}}\n```\n**Use when:** deciding test scope for a PR.\n\n#### `churn-vs-complexity` (via `hotspots`) — see Codebase understanding above\n\n---\n\n### Data flow \u0026 security\n\n#### `audit` — composite security report\nRuns taint + secret-scan + dead-deps + dep-tree + license-check in one pass.\n```\n$ codemap audit --dir ./my-repo --json --quiet\n{\"ok\":true,\"result\":{\"findings\":[\n  {\"kind\":\"secret\",\"file\":\".env.sample\",\"line\":3,\"pattern\":\"AWS_KEY\"},\n  {\"kind\":\"taint\",\"source\":\"req.body\",\"sink\":\"db.execute\",\"path\":[...]},\n  {\"kind\":\"dep-vuln\",\"package\":\"lodash\",\"version\":\"4.17.20\",\"cve\":\"CVE-2021-23337\"}]}}\n```\n**Use when:** first-pass security review of an unfamiliar repo.\n\n#### `taint` — path-sensitive taint flow\nTracks tainted values from source(s) to sink(s). Sanitizer-aware, alias-aware (e.g. `safe = sanitize(x)`), cross-procedural (parses wrapper bodies to detect hidden sanitizers).\n```\n$ codemap taint --dir ./my-repo --json --quiet --source 'req.query' --sink 'db.execute'\n{\"ok\":true,\"result\":{\"paths\":[{\"source\":\"req.query.id\",\"sink\":\"db.execute(sql)\",\n  \"hops\":[\"params.id\",\"userId\",\"query\"],\"sanitized\":false}]}}\n```\n**Use when:** SQLi/XSS/SSRF detection, \"is user input reaching this sink?\"\n\n#### `slice` — backward program slice\nGiven a target variable/sink, return only the code that influences it.\n```\n$ codemap slice --dir ./my-repo --json --quiet --var 'password' --file auth.py\n{\"ok\":true,\"result\":{\"slice_lines\":[12,15,22,30,42],\"file\":\"auth.py\"}}\n```\n**Use when:** narrowing what to read when chasing a bug.\n\n#### `sinks` — list all dangerous sinks\nEnumerates every `db.execute`, `eval`, `exec`, `Runtime.exec`, `subprocess.shell=True`, `innerHTML=`, etc.\n```\n$ codemap sinks --dir ./my-repo --json --quiet\n{\"ok\":true,\"result\":{\"sinks\":[{\"kind\":\"sql\",\"file\":\"api/users.rs\",\"line\":88,\"expr\":\"db.execute(query)\"}]}}\n```\n**Use when:** building taint queries, audit checklist generation.\n\n#### `secret-scan` — credentials in source\n20+ patterns (AWS key, GitHub PAT, Slack token, Stripe live key, private keys, JWT, DB conn strings, etc.). Redacted output.\n```\n$ codemap secret-scan --dir ./my-repo --json --quiet\n{\"ok\":true,\"result\":{\"findings\":[{\"file\":\".env.sample\",\"line\":3,\"kind\":\"aws_access_key\",\"masked\":\"AKIA****REDACTED\"}]}}\n```\n**Use when:** pre-commit hook, pre-publish audit.\n\n#### `data-flow` — value origin tracing\nWhere does this variable's value come from? (def-use chain)\n```\n$ codemap data-flow --dir ./my-repo --json --quiet --target 'user_id'\n{\"ok\":true,\"result\":{\"origins\":[{\"file\":\"auth.py:88\",\"expr\":\"req.cookies['session']\"}]}}\n```\n**Use when:** \"where does this magic value come from?\"\n\n#### `api-surface` — every exported HTTP endpoint\nDetects Flask/Express/Axum/FastAPI/Spring/Rocket route handlers. Lists path + method + handler.\n```\n$ codemap api-surface --dir ./my-repo --json --quiet\n{\"ok\":true,\"result\":{\"endpoints\":[{\"method\":\"POST\",\"path\":\"/users\",\"handler\":\"create_user\",\"auth_required\":false}]}}\n```\n**Use when:** generating OpenAPI from existing code, finding unauthenticated endpoints.\n\n---\n\n### Graph algorithms (heterogeneous-graph queries)\n\nThese run on codemap's internal call graph + import graph + AST graph.\n\n#### `pagerank` — most-important nodes\nNetworkX-style PageRank. High score = central + many incoming refs.\n```\n$ codemap pagerank --dir ./my-repo --json --quiet --top 10\n{\"ok\":true,\"result\":{\"ranked\":[{\"fn\":\"handle_request\",\"score\":0.082}]}}\n```\n**Use when:** finding \"load-bearing\" functions, prioritizing code review.\n\n#### `hubs` — high-out-degree nodes\nFunctions/modules that depend on many others. Different from PageRank (which is about incoming).\n```\n$ codemap hubs --dir ./my-repo --json --quiet\n{\"ok\":true,\"result\":{\"hubs\":[{\"fn\":\"orchestrator\",\"out_degree\":47}]}}\n```\n**Use when:** finding god-objects, refactor targets.\n\n#### `bridges` — single-edge cut points\nEdges whose removal disconnects the graph. These are critical paths.\n```\n$ codemap bridges --dir ./my-repo --json --quiet\n{\"ok\":true,\"result\":{\"bridges\":[{\"from\":\"auth\",\"to\":\"db\",\"modules\":[\"auth.rs\",\"db.rs\"]}]}}\n```\n**Use when:** identifying single points of failure in module coupling.\n\n#### `centrality` (17 measures) — broker / connector detection\nRun with a specific measure: `betweenness`, `eigenvector`, `katz`, `closeness`, `harmonic`, `load`, `structural-holes` (brokers), `voterank`, etc. All NetworkX standards.\n```\n$ codemap betweenness --dir ./my-repo --json --quiet --top 5\n{\"ok\":true,\"result\":{\"top\":[{\"node\":\"db_session\",\"betweenness\":0.34}]}}\n```\n**Use when:** finding modules that connect otherwise-separate subsystems.\n\n#### `clusters` — community detection (Leiden default)\nPartitions the graph into densely-connected sub-communities.\n```\n$ codemap clusters --dir ./my-repo --json --quiet leiden\n{\"ok\":true,\"result\":{\"clusters\":[{\"id\":0,\"size\":34,\"members\":[\"auth.rs\",\"users.rs\"]}]}}\n```\n**Use when:** discovering implicit module boundaries.\n\n#### `paths` — shortest path between two nodes\nReturns the chain of imports/calls connecting source → target.\n```\n$ codemap paths --dir ./my-repo --json --quiet user_input db_write\n{\"ok\":true,\"result\":{\"path\":[\"user_input\",\"sanitize\",\"query_builder\",\"db_write\"],\"length\":4}}\n```\n**Use when:** \"how does X reach Y?\"\n\n#### `subgraph` — extract a focused subgraph\nReturns nodes within N hops of a target. Useful before deep analysis.\n```\n$ codemap subgraph --dir ./my-repo --json --quiet --target login --depth 2\n{\"ok\":true,\"result\":{\"nodes\":[...],\"edges\":[...]}}\n```\n**Use when:** narrowing scope before more expensive analysis.\n\n#### `bellman-ford \u003csrc\u003e` / `astar \u003csrc\u003e \u003ctgt\u003e` / `floyd-warshall` / etc.\nClassical shortest-path algorithms exposed for graph queries. See ACTION_CATALOG.md for full list.\n\n---\n\n### Binary analysis \u0026 reverse engineering\n\n#### Decompiler (`ir` / `decompile`) — full lift → SSA → simplify → type-recovery → variable-recovery → calling-convention → SAILR structuring → C++ RTTI → readable-C emit pipeline\n\n**This is a real decompiler.** 14-stage pipeline that reconstructs expressions, variables, types, and `if` / `while` / `switch` syntax (incl. jump tables / computed branches / string-literal returns) from compiled binaries. Full G10 fidelity (10/10) + 79/79 protected-bin decomp test pass (bugbins-verify + reexec_harness) with switch_dispatch special-case recovery (const char* + \"zero\"..\"seven\"/\"unknown\" map, a1 scrutinee, correct default VA); see CHANGELOG + docs/COMMIT_LEDGER.md for G10 fixes + Job 3 consolidation + GAP3-6 (F-4 -O2 dangling-goto/continue, C++ vcall via rtti, XMM/float ABI + libc-extern recomp fix, Mach-O x86-64 thin+FAT) + GAP9 (no more rsp/rbp/rbx/r12-r15 \"lifter gap =0\" noise decls in every fn; frame uses elided to 0) + GAP8 (struct field recovery: ptr-\u003efield_0xN with synthesized typedefs for recompile) + GAP7 Part A (array element type from access width: int32_t* for 4-byte loads). Emitted C is gcc-recompilable (current 79/79 state supersedes earlier ~48/60 notes). Cross-binary type propagation + RTTI + stack slots + confidence scores. Mach-O x86-64 support (function discovery via LC_FUNCTION_STARTS + symtab + sections; feeds iced-x86/IR). \n\n**Known limitation (gap 11, deferred)**: Array indexing inside loops can decompile with an incorrect (use-before-def) index (e.g. ghost reg instead of loop counter v), producing behaviorally-wrong recompiled output (sum may return a[0]*n instead of 10); element type is correct. Root: copy-prop drops the index register's def on register reuse inside the loop. Tracked as gap 11.\n\nRemaining gaps documented in DECOMPILER.md. (New direction: user-driven decompiler quality per Ghidra issues etc.)\n\n```bash\n# Decompile a single function (full pipeline)\ncodemap ir \u003cbinary\u003e [\u003chex-fn-addr\u003e | \u003cname\u003e]\n\n# Decompile entry point\ncodemap ir \u003cbinary\u003e\n\n# Batch call-tree walk with structural hints\ncodemap decompile \u003cbinary\u003e [max-depth=N] [max-children=N] [deep]\n```\n\n**Pipeline stages:**\n1. **Lift** — iced-x86 decode → IRCFG (three-address IR with explicit BitWidth)\n2. **SSA construction** — Cytron et al. (1991): iterated-dominance-frontier phi placement + pre-order DFS renaming\n3. **Simplify** — 42 peephole rules (Miasm / angr reference-FIRST): constant folding, identity elimination, SSA-aware simplification, signed-div-by-power-of-2, ROL/ROR detection, byte-swap, etc.\n4. **Calling-convention recovery** — SysV AMD64 ABI: populate Call.args from rdi/rsi/rdx/rcx/r8/r9\n5. **Dead-code elimination** — backward dataflow liveness (~80% flag computations pruned)\n6. **Copy/constant propagation** — 4 alternating iterations of copy-prop + simplify + DCE\n7. **Dead-block removal** — reachability from entry; prunes linker padding\n8. **Block coalescing** — merge linear Goto-chains\n9. **SAILR structuring** — CFG + IRCFG → C-shaped AST (Sequence / IfThen / IfThenElse / While / For / Switch / Call / Goto)\n10. **Variable recovery** — classifies variables: Register, Stack, Memory, Temporary, Constant\n11. **Type inference** — Phase 2 seeded from widths + Mem-loads/Stores; iterated-meet solver infers Int / Pointer / struct types\n12. **Stack-slot analysis** — rsp-relative offsets for `*(rsp_N)` → `stack[\u003coffset\u003e]`\n13. **C++ RTTI analysis** — vtable references → class declarations (base classes, virtual methods, fields)\n14. **C emission** — structured AST → readable C source with type annotations, stack-slot names, symbol resolution\n\n**Differentiators:**\n- **Cross-binary type/name propagation** — types from one binary's RTTI flow into another's\n- **Graph-as-validator** — heterogeneous code graph cross-checks decompilation output\n- **Recompilable-C target** — structured, typed, symbol-resolved C suitable for recompilation\n\n**Example output:**\n```text\n=== codemap ir ===\nBinary:        ./target/release/codemap\nFormat:        ELF64 (64-bit, arch=x64)\nFunction:      main @ 0x401000 (234 bytes, 78 insns)\nCFG blocks:    12\nCFG edges:     18 (pre-enrich) → 18 (post-enrich)\nJump tables:   0 resolved indirect-JMPs\nSSA phis:      3 inserted\nVariables:     45 total (12 reg, 20 stack[-0x10..+0x18], 10 mem, 3 const, 0 tmp)\nTypes:         30 bound (15 int, 10 ptr, 3 top, 2 bot, 0 other)\nCC args:       5 call sites populated (SysV AMD64)\nDCE removed:   62 dead stmts (pre-prop) + 8 (post-prop)\nCopy-prop:     15 stmts inlined\nDead blocks:   2 removed (unreachable)\nCoalesced:     4 blocks merged\n\n--- structured AST ---\nSequence {\n  Let { rbp_0 = rbp }\n  Let { rsp_0 = (rsp - 0x10) }\n  IfThen {\n    Cond: (rax_0 == 0)\n    Then: Sequence { Call { printf(\"usage\\n\") } }\n  }\n  While {\n    Cond: (argc_0 \u003e 0)\n    Body: Sequence { ... }\n  }\n  Ret { rax_0 }\n}\n\n--- C-shaped output ---\nint main(int argc, char *argv[]) {\n    uint64_t rbp_0 = rbp;\n    uint64_t rsp_0 = (rsp - 0x10);\n\n    if (rax_0 == 0) {\n        printf(\"usage\\n\");\n    }\n\n    while (argc_0 \u003e 0) {\n        // ... loop body ...\n        argc_0 = argc_0 - 1;\n    }\n\n    return rax_0;\n}\n```\n\n**Use when:** binary reverse engineering, understanding compiled code, patch generation, static analysis of binaries. See [`docs/DECOMPILER.md`](docs/DECOMPILER.md) for full pipeline reference.\n\n---\n\n#### `bin-info` / `elf-info` / `macho-info` / `pe-info` — binary fingerprint\nFormat detection, arch, sections, strip state, language hints (Rust/Go/C++), anti-debug rules, packer detection.\n```\n$ codemap bin-info /usr/local/bin/codemap --json --quiet\n{\"ok\":true,\"result\":{\"format\":\"ELF64\",\"arch\":\"aarch64\",\"rust\":true,\"strip\":false,\n  \"sections\":34,\"anti_debug\":[],\"packed\":false}}\n```\n**Use when:** triage step 1 — \"what is this binary?\"\n\n#### `pe-imports` / `pe-exports` — Windows PE import/export tables\nLists every DLL imported + every function exported.\n```\n$ codemap pe-imports ./sample.exe --json --quiet\n{\"ok\":true,\"result\":{\"imports\":[{\"dll\":\"kernel32.dll\",\"functions\":[\"VirtualAlloc\",\"CreateProcessA\"]}]}}\n```\n**Use when:** static behavioral profiling — what APIs does this binary depend on?\n\n#### `pe-strings` / `bin-strings` — string extraction\nAscii + utf16le + entropy-filtered.\n```\n$ codemap pe-strings ./sample.exe --json --quiet --min-len 8\n{\"ok\":true,\"result\":{\"strings\":[\"http://c2.example.com\",\"cmd.exe /c\"]}}\n```\n**Use when:** triaging unknown binaries — strings often reveal C2 URLs, command lines, paths.\n\n#### `binary-diff` — semantic binary diff\nFunctions added / removed / modified between two builds.\n```\n$ codemap binary-diff --json --quiet --left v1.exe --right v2.exe\n{\"ok\":true,\"result\":{\"added\":[\"new_handler\"],\"removed\":[\"legacy_proc\"],\"modified\":[\"main\"]}}\n```\n**Use when:** patch analysis, regression hunting in firmware.\n\n#### `dotnet-meta` — .NET assembly metadata\nPE that contains CLI/.NET — reads the metadata streams, lists types + methods.\n```\n$ codemap dotnet-meta ./sample.dll --json --quiet\n{\"ok\":true,\"result\":{\"assembly\":\"Sample.Dll\",\"types\":[\"Foo\",\"Bar\"],\"methods_count\":42}}\n```\n**Use when:** analyzing .NET malware or .NET 3rd-party libs.\n\n#### `java-class` — JVM class file\nConstant pool, method signatures, bytecode summaries.\n\n#### `wasm-info` — WebAssembly module\nImports, exports, function table, memory layout.\n\n---\n\n### Schemas \u0026 config-as-code\n\n#### `openapi-schema` / `graphql-schema` / `proto-schema` — extract API schemas\nParses spec files and reports endpoints/types/operations.\n```\n$ codemap openapi-schema --dir ./api --json --quiet\n{\"ok\":true,\"result\":{\"paths\":[{\"method\":\"GET\",\"path\":\"/users\",\"operationId\":\"listUsers\"}]}}\n```\n**Use when:** generating client code, checking spec consistency.\n\n#### `k8s-scan` — Kubernetes CIS audit (16 rules)\nChecks privileged containers, hostNetwork, missing resource limits, etc.\n```\n$ codemap k8s-scan --dir ./k8s/ --json --quiet\n{\"ok\":true,\"result\":{\"findings\":[{\"rule\":\"K8S-001\",\"resource\":\"Deployment/api\",\"severity\":\"high\",\"msg\":\"privileged=true\"}]}}\n```\n**Use when:** auditing manifests before apply.\n\n#### `iac-scan` — Terraform/CloudFormation/Pulumi audit (12 rules)\n```\n$ codemap iac-scan --dir ./infra/ --json --quiet\n{\"ok\":true,\"result\":{\"findings\":[{\"rule\":\"IAC-007\",\"file\":\"main.tf\",\"msg\":\"S3 bucket public-read ACL\"}]}}\n```\n\n#### `dockerfile-scan` — Dockerfile audit (10 rules)\n```\n$ codemap dockerfile-scan --dir ./ --json --quiet\n{\"ok\":true,\"result\":{\"findings\":[{\"rule\":\"DKR-002\",\"msg\":\"running as root\",\"line\":18}]}}\n```\n\n#### `ci-scan` — CI/CD pipeline audit (37 rules across 6 ecosystems)\nGitHub Actions, GitLab CI, Jenkinsfile, CircleCI, Azure Pipelines, Travis. Catches injection, unpinned actions, secret literals, `pull_request_target` misuse.\n```\n$ codemap ci-scan --dir ./.github/ --json --quiet\n{\"ok\":true,\"result\":{\"findings\":[{\"rule\":\"GH-003\",\"file\":\"deploy.yml\",\"msg\":\"unpinned action ref\"}]}}\n```\n\n#### `oci-scan` — OCI image / docker save tarball audit\nPer-layer manifest, layer-resident secrets (11 patterns), licenses, file/dir/symlink counts.\n```\n$ codemap oci-scan --dir ./image.tar --json --quiet --mode all\n{\"ok\":true,\"result\":{\"layers\":[...],\"secrets\":[...],\"licenses\":[...]}}\n```\n\n#### `sql-extract` — SQL DDL/DML extraction\nPulls SQL out of source code or .sql files. Schema + queries.\n```\n$ codemap sql-extract --dir ./my-repo --json --quiet\n{\"ok\":true,\"result\":{\"tables\":[{\"name\":\"users\",\"columns\":[...]}],\"queries\":[...]}}\n```\n\n---\n\n### Supply chain\n\n#### `osv-scan` — match deps against OSV.dev advisories (offline)\nSemver-range-aware.\n```\n$ codemap osv-scan --dir ./my-repo --json --quiet\n{\"ok\":true,\"result\":{\"vulns\":[{\"package\":\"lodash\",\"version\":\"4.17.20\",\"cve\":\"CVE-2021-23337\"}]}}\n```\n\n#### `sbom-diff` — CycloneDX/SPDX diff\nAdded, removed, upgraded, downgraded packages between two SBOMs.\n```\n$ codemap sbom-diff --left ./sbom-1.spdx.json --right ./sbom-2.spdx.json --json --quiet\n{\"ok\":true,\"result\":{\"added\":[...],\"removed\":[...],\"upgraded\":[...]}}\n```\n\n#### `license-check` — SPDX compatibility\nPer-package license + compatibility verdict.\n```\n$ codemap license-check --dir ./my-repo --json --quiet\n{\"ok\":true,\"result\":{\"deps\":[{\"name\":\"foo\",\"license\":\"GPL-3.0\",\"compatible\":false}]}}\n```\n\n#### `cve-scan` — same as osv-scan but specifically against MITRE CVE corpus\n\n---\n\n### ML / AI model files\n\n#### `gguf-info` — llama.cpp GGUF inspection\nArchitecture, layer count, head count, quant level, vocab size.\n```\n$ codemap gguf-info ./model.gguf --json --quiet\n{\"ok\":true,\"result\":{\"arch\":\"llama\",\"n_layers\":32,\"n_heads\":32,\"vocab_size\":32000,\"quant\":\"Q4_K_M\"}}\n```\n**Use when:** \"what model is this file?\" Pre-load sanity check.\n\n#### `safetensors-info` — HuggingFace safetensors inspection\nTensor shapes, dtypes, total params.\n```\n$ codemap safetensors-info ./model.safetensors --json --quiet\n{\"ok\":true,\"result\":{\"tensors\":291,\"total_params\":7240000000,\"dtype\":\"float16\"}}\n```\n\n#### `onnx-info` — ONNX model graph\nOperators, inputs, outputs, opset.\n```\n$ codemap onnx-info ./model.onnx --json --quiet\n{\"ok\":true,\"result\":{\"opset\":17,\"ops\":[\"Conv\",\"Relu\",\"MaxPool\"],\"inputs\":[{\"name\":\"x\",\"shape\":[1,3,224,224]}]}}\n```\n\n#### `cuda-info` — CUDA fatbin/cubin inspection\nSM versions present, kernel symbols.\n\n#### `pyc-info` — Python bytecode inspection\nMagic number, marshalled code object, imports.\n\n---\n\n### Cross-language \u0026 web\n\n#### `lang-bridges` — FFI/binding detection\nDetects PyO3 / napi / wasm-bindgen / JNI etc. — where languages interop.\n```\n$ codemap lang-bridges --dir ./my-repo --json --quiet\n{\"ok\":true,\"result\":{\"bridges\":[{\"kind\":\"pyo3\",\"rust_fn\":\"create_user\",\"py_module\":\"my_lib\"}]}}\n```\n\n#### `gpu-functions` — GPU kernels in source\nCUDA `__global__`, OpenCL kernels, Metal compute kernels, ROCm/HIP.\n```\n$ codemap gpu-functions --dir ./my-repo --json --quiet\n{\"ok\":true,\"result\":{\"kernels\":[{\"name\":\"matmul_kernel\",\"framework\":\"cuda\",\"file\":\"kernels.cu\"}]}}\n```\n\n#### `monkey-patches` — runtime mutation detection\n`obj.method = new_fn`, `setattr`, `prototype` patching.\n\n#### `dispatch-map` — generic dispatch tables\nRouters, registries, plugin maps. Finds the \"switch statement that controls behavior.\"\n\n#### `web-sitemap` — sitemap.xml + crawled link graph\n\n#### `js-api-extract` — extract API calls from HAR / JS source\n\n---\n\n### LSP bridge (requires a running language server)\n\n#### `lsp-symbols` — workspace symbol table from LSP\nReal symbol info, not AST-inferred. More accurate for typed languages.\n\n#### `lsp-references` — every reference to a symbol (LSP-grade)\n\n#### `lsp-calls` — call hierarchy from LSP\n\n#### `lsp-diagnostics` — current LSP diagnostics across the workspace\n```\n$ codemap lsp-diagnostics --dir ./my-repo --json --quiet\n{\"ok\":true,\"result\":{\"diagnostics\":[{\"file\":\"src/main.rs\",\"line\":42,\"severity\":\"error\",\"msg\":\"E0308: mismatched types\"}]}}\n```\n**Use when:** programmatic access to compiler/type-checker errors.\n\n#### `lsp-types` — type info on hover for a position\n\n---\n\n### arXiv-derived research actions (advanced)\n\nThese implement specific research papers. `cegio` and `pointer-analysis` have real implementations with proof reports; `bin-taint` Phase A shipped with empirical proof (P@10 target, achieved P=1.00/R=0.80).\n\n#### `pointer-analysis` — Andersen field-sensitive PA\nComputes points-to sets (which pointers can alias which memory). Field-sensitive + flow-insensitive + Tarjan SCC pre-pass for performance.\n```\n$ codemap pointer-analysis --dir ./my-repo --json --quiet\n{\"ok\":true,\"result\":{\"scope_vars\":102000,\"copy_constraints\":132000,\n  \"aliases\":[{\"ptr\":\"p\",\"may_alias\":[\"a\",\"b\"]}]}}\n```\n**Use when:** understanding aliasing for refactoring (rename a field safely), upstream of taint analysis.\n\n#### `cegio` — counterexample-guided inductive optimization\n**arXiv 1704.03738**. Given taint paths, synthesizes the minimum input that triggers a vulnerability.\n```\n$ codemap cegio --dir ./my-repo --json --quiet --taint-result \u003cprior-taint-output\u003e\n{\"ok\":true,\"result\":{\"trigger\":{\"input\":\"' OR 1=1--\",\"reaches_sink\":true}}}\n```\n**Use when:** turning a taint finding into a proof-of-concept exploit input.\n\n#### `bin-taint` — binary taint analysis (Phase A)\nLifts x86-64 ELF executable sections to a taint IR, builds CFG, propagates forward may-taint dataflow from PLT-resolved sources (read/recv/fread/getenv/strcpy/memcpy) to sinks (system/popen/exec/sprintf/dlopen), reports ranked source→sink paths. Stripped-binary fallback via bounded `.text` pathfinding. Proof: precision 1.00, recall 0.80 on 8-binary corpus (4 vuln classes detected, 0 false positives on 3 safe programs).\n```\n$ codemap bin-taint ./vulnerable-binary --json --quiet\n{\"ok\":true,\"result\":{\"findings\":[{\"source\":\"getenv\",\"sink\":\"system\",\"hops\":[\"env\",\"cmd\",\"system\"],\"confidence\":0.9},{\"source\":\"read\",\"sink\":\"sprintf\",\"hops\":[\"buf\",\"format\",\"sprintf\"],\"confidence\":0.7}]}}\n```\n**Use when:** binary taint analysis on stripped ELF, finding command injection / format string / exec injection paths in compiled code.\n\n---\n\n### Composite workflows\n\n#### `audit` — kitchen-sink security report\nSee \"Data flow \u0026 security\" section above.\n\n#### `validate` — sanity check (build + lint + tests + audit summary)\nSingle composite for \"is this repo broken?\"\n\n#### `changeset` — file-grouped diff summary\n```\n$ codemap changeset --dir ./my-repo --json --quiet HEAD~10 HEAD\n{\"ok\":true,\"result\":{\"changes\":{\"feat\":[...],\"fix\":[...],\"refactor\":[...]}}}\n```\n\n#### `handoff` — generate handoff document for a project\nDistills repo state into a single MD doc (status + open issues + recent work + next-steps).\n\n#### `pipeline` — multi-action pipeline runner\nRun several actions in sequence, accumulate results.\n```\n$ codemap pipeline --dir ./my-repo --json --quiet --target 'audit:./,trace:main,hotspots:'\n{\"ok\":true,\"result\":{\"audit\":{...},\"trace\":{...},\"hotspots\":{...}}}\n```\n**Use when:** scripted multi-step analysis.\n\n---\n\n## Architecture (1-paragraph)\n\ncodemap walks `--dir`, parses with tree-sitter, builds a file-level import graph and a function-level call graph, layers PE/ELF/Mach-O/WASM/Java binary parsers + x86/x64 disassembly, and exposes 610 actions through a uniform CLI registry (`inventory::submit!`). Cache: `.codemap/cache.bincode` next to the scanned dir. Pure static. No daemons, no network access at analysis time.\n\n## Repo layout\n\n- `codemap-core/` — parsing, graph, algorithms, actions\n- `codemap-cli/` — the `codemap` binary\n- `codemap-napi/` — Node.js bindings (optional)\n- `docs/` — REFERENCE.md, ACTION_CATALOG.md, SCHEMAS.md, HUMAN.md\n- `install.sh` — single install entry\n\n## License\n\nMIT. See [`LICENSE`](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcharleschenai%2Fcodemap","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcharleschenai%2Fcodemap","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcharleschenai%2Fcodemap/lists"}