https://github.com/tombaldwin/candor-rust

A type-aware capability/effect checker for Rust (dylint lint): what does each function actually touch — network, fs, db, …? Honest about what it can't see.

# candor

Beaky, the candor canary

**Enforce the capability and architectural boundaries that AI-generated code silently crosses — as a
CI gate you can trust.** candor is a Rust capability/effect checker built as a
[dylint](https://github.com/trailofbits/dylint) lint (the reference implementation of
[candor-spec](https://github.com/tombaldwin/candor-spec)). It knows which functions reach the network,
filesystem, a database, a subprocess, the clock, or the environment — *transitively, across crates* —
and turns invariants like *"this layer stays pure," "this service may only talk to Stripe," "the
domain layer must not depend on infra"* into rules that **fail the PR** when an edit breaks them.

**Site:** [candor.poly.io](https://candor.poly.io) — the measured case in five minutes: the
exhibits, the pre-registered evals, and the prove-it-on-your-own-repo path.

**Why this matters for AI-assisted development.** An agent's characteristic failure isn't a typo —
it's a locally-reasonable edit that crosses a boundary it never sees. It adds a feature in
`pricing.rs`, and the simplest way to get the data calls something that, three hops and one crate
away, opens a socket or hits the database. The file looks clean; a quick review looks clean. candor's
transitive, cross-crate analysis catches the crossing and the gate blocks it. This is the one failure
mode that gets *worse*, not better, as agents write more code faster. In a
[pre-registered trial](eval/bet2/RESULTS.md), when the locally-simplest edit crossed a pure boundary,
candor took the shipped-violation rate from **80% to 0%**.

**A gate is only worth trusting if it never lies.** candor's contract is that it *never* silently
reports a function pure when it actually reaches an effect: anything it can't resolve becomes
`Unknown` (a sound over-approximation), never a false "clean." That contract is held by an adversarial
**soundness fuzzer in CI** that threads a known effect through every way Rust can hide a call — operator
overloads, `?`, `.await`, dynamic dispatch, closures and callbacks, macros, cross-crate boundaries —
and fails the build if any reachable function comes back pure. So when candor certifies a boundary
clean, you can act on it. `cargo candor policy` is the gate itself: forbidden effects, network host
allowlists, and layer-dependency rules (AS-EFF-006/008/009), enforced across a whole workspace.

**It maps, too.** The same analysis answers "what does this function transitively touch?" and "who
reaches `Net`?" instantly from a cached report — a cheap blast-radius tool for an agent or a human in
unfamiliar code. Handing an editing agent the *non-local* delta of its own change is real value too (a
[pilot](EVAL.md) had the agent report the full propagation **100% of the time vs 7% without it**) — but
as models get better at local call-graph tracing, candor's durable edge is the part a model *can't* do
for itself: hold the whole effect graph and **block the PR**.
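
As a rough illustration of the blast-radius idea, the sketch below walks a reversed call graph to collect every transitive caller of a function. The `CallGraph` type and the `transitive_callers` name are hypothetical; candor's own queries read a cached report whose format is not shown here, so treat this as the shape of the question rather than the implementation.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Hypothetical call graph: caller name -> names of the functions it calls.
type CallGraph = HashMap<&'static str, Vec<&'static str>>;

/// Collect every function that can reach `target` through one or more calls.
fn transitive_callers(graph: &CallGraph, target: &str) -> BTreeSet<String> {
    // Invert the edges once so the walk can go from callee to caller.
    let mut callers_of: HashMap<&str, Vec<&str>> = HashMap::new();
    for (caller, callees) in graph {
        for callee in callees {
            callers_of.entry(*callee).or_default().push(*caller);
        }
    }

    let mut seen = BTreeSet::new();
    let mut queue = VecDeque::from([target.to_string()]);
    while let Some(current) = queue.pop_front() {
        for caller in callers_of.get(current.as_str()).into_iter().flatten() {
            // `insert` returns false for a caller already visited, which also ends cycles.
            if seen.insert(caller.to_string()) {
                queue.push_back(caller.to_string());
            }
        }
    }
    seen
}
```

Asking for the callers of a function that performs `Db` or `Net` gives the set of functions an edit to it could affect, which is the question `cargo candor impact` is described as answering.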

### Get an agent using it — one paste, from nothing

Give your coding agent (Claude Code, Cursor, …) this:

```text
Read https://github.com/tombaldwin/candor-rust/blob/main/AGENTS.md and follow it to map this repo's effects.
```

[AGENTS.md](AGENTS.md) is self-contained — it installs candor, runs it on this project, and explains
the report and the trust rule (`inferred` is authoritative; `unresolved`/`Unknown` → read the
source). Single source of truth for agents.

### Claude Code: see it work, automatically

The paste above asks the agent to use candor — but you can't *see* whether it did. The
[Claude Code integration](integrations/claude-code/) gives you a deterministic, un-fakeable
**receipt** in your transcript whenever your Rust changes — function count, effect breakdown, a
freshness hash, and a coverage warning when a dependency looks effectful but isn't calibrated:

```text
candor · 143 fns · 54 Db, 16 Net, 27 Fs · 0 unresolved · fresh @8c4c9053 · coverage ✓
```

A `Stop` hook auto-refreshes it on every turn that touches Rust (silent otherwise); `/candor` shows
it on demand. Install: `integrations/claude-code/install.sh` from your project — it installs thin
stubs that delegate to this clone, so `cargo candor update` refreshes the engine, the scripts, and
`AGENTS.md` together (every receipt is stamped with the engine commit, so they can't silently
desync). See its [README](integrations/claude-code/README.md) for the trust model and honest limits.

**Opt-in edit-time self-review.** Set `CANDOR_REVIEW=1` (in `.candor/config`) and the Stop hook does
more than inform the human: when the agent's edits give a function a *new* effect vs your committed
baseline, it hands that delta *back to the agent* as a self-review checkpoint — "your edits gave
`foo` a new `Net` (which propagates to its callers); intended?". Each effect prompts once, it never
loops, and it's off by default. This is the difference between candor *informing* an agent and
*changing what it does* — see [BACKLOG.md](BACKLOG.md) P0.

**MCP server.** [`integrations/mcp/`](integrations/mcp/) exposes candor's instant queries
(`candor_effects` / `candor_where` / `candor_callers` / `candor_diff`) as native MCP tools, so an
agent calls candor reflexively — in one cheap call instead of grepping and reading source. Pair it
with `cargo candor watch` so every call serves from a fresh report.

**Where it earns its keep — and where it doesn't.** candor is sharpest as a **CI gate** enforcing
capability and architectural boundaries transitively — fail the PR that makes a parser open a socket,
or routes the domain layer into infra ([CI guardrail](#ci-guardrail-lowest-friction-adoption)), no
token-threading or rewrite required — and as the *non-local* delta handed to an editing agent (above).
Its value is **conditional**, stated honestly: it shows up when a codebase *has* boundaries worth
defending and an edit *would* cross one. If the code already affords a clean seam, a strong model
routes around the problem and candor is redundant; the same eval showed exactly that. Its limits are
deliberate, too. It is not a security boundary ([SECURITY.md](SECURITY.md)), and it is not a
codebase-quality grade (effect counts are domain-dependent, so there is no "candor score" to chase).
This repo is Rust-only; the JVM (Java/Kotlin/Scala/Groovy) has
[candor-java](https://github.com/tombaldwin/candor-java), with the same spec, the same report shape, and the same gate.
The *sound* backend needs nightly, and the zero-install [stable scanner](#two-backends-stable-scanner-zero-friction-vs-the-nightly-lint-soundness)
is best-effort and under-reports. Residual unsoundness (generic dispatch over non-local traits
assumed to honour its bound) is marked by `Unknown` and coverage warnings and listed under
[Known limitations](#known-limitations). candor is a sharp, narrow, trustworthy instrument, not a quality platform.

*Humans:* [Quick start](#quick-start-humans) · *Detail:* [what it detects](#what-it-detects) ·
[PRINCIPLES](PRINCIPLES.md) · [CRITIQUE](CRITIQUE.md)

## Layout

| Path | What |
|---|---|
| `src/lib.rs` | the entire lint — classifier, per-function call-graph fixpoint, the three modes |
| `crates/candor-classify` | the effect classifier (`crate × path → effect`) — pure string logic, no `rustc`; the one source of truth the lint **and** the stable scanner both call |
| `crates/candor-scan` | the **stable-Rust** backend: a `syn`-based scanner that produces the same report JSON on stock `cargo`, no nightly/dylint (see below) |
| `crates/candor-report` | the report types + parsing, shared by every backend and the CLI (no `rustc_private`) |
| `crates/candor-query` | `cargo-candor`'s read-only queries (`audit`/`show`/`where`/`callers`/`map`/`diff`/`whatif`/`rewire`/`containment`/`reachable`/`path`/`impact`) as one typed binary |
| `cargo-candor` | the CLI wrapper — thin bash that orchestrates the backend (`cargo dylint` or `candor-scan`) and dispatches queries to `candor-query` |
| `sample/` | a small crate written in the capability discipline, for trying conformance mode |
| `rust-toolchain` | pins the nightly the lint links against (`rustc-dev`) |

## Setup

```sh
cargo install cargo-dylint dylint-link # once per machine
./install.sh # build + install — then `cargo candor` works in any project
```

`install.sh` is one-shot and idempotent: it builds the lint (rustup auto-fetches the pinned nightly +
`rustc-dev` from `rust-toolchain` — you never manage the toolchain by hand), stashes the dylib +
the `candor-query` binary under `~/.candor` (a stable home that survives a `cargo clean` in this
clone), and symlinks `cargo-candor` into `~/.cargo/bin` so `cargo candor …` resolves everywhere. Re-run
it (or `cargo candor setup`) any time to refresh; `cargo candor update` pulls + rebuilds + refreshes.
The pinned nightly is inherent to dylint (it links rustc internals) and runs only for the lint — it
does not touch your projects' toolchains.

### Two backends: stable scanner (zero-friction) vs the nightly lint (soundness)

candor produces the **same report JSON** two ways, and every read-only query (`show`/`where`/`callers`/`map`)
reads either one identically:

```sh
cargo candor scan # STABLE: a syntactic scan on stock `cargo` — no nightly, no dylint, no rustc-dev
cargo candor audit # NIGHTLY lint: the full rustc-backed analysis with the soundness contract
```

The stable scanner is the friction-killer, and it's a **one-line install** — no clone, no nightly:

```sh
cargo install candor-scan # https://crates.io/crates/candor-scan
candor-scan . # writes .candor/report..scan.json (or --json to stdout)
# (a workspace root: one report per member, same prefix)
```

It walks the crate's `.rs` files, parses them with [`syn`](https://docs.rs/syn), resolves `use`-aliased
call paths, and classifies them through the **same [`candor-classify`](crates/candor-classify) the lint
uses** — one source of truth, so the two backends can't drift on what counts as an effect. It needs
nothing but a stable toolchain, so it runs anywhere `cargo` does (CI without a nightly, a locked-down
box). Within a full candor install it's also `cargo candor scan`.

**Stable by default — you don't pick the backend.** The read-only queries (`show`/`where`/`callers`/`map`/
`audit`) and the [Claude Code receipt](integrations/claude-code/) prefer the nightly lint when it's
installed (for the soundness contract) and **automatically fall back to the stable scanner when it
isn't** — so candor works with zero install on any machine, and the receipt says `· stable backend` when
it's using the syntactic path. Enforcement (`guard`/`policy`/`snapshot`/`diff`) still requires the lint:
blocking a PR needs the soundness guarantee.

The trade is **precision, stated honestly**. The scanner is syntactic, so it sees what's *written*, not
what the compiler *resolves*. It catches path-qualified effect calls (`std::fs::read`, `Command::new`,
`reqwest::Client::execute`), `use`-aliases, intra-crate transitive propagation, and **local-trait
dispatch** (a `&dyn Store`/`impl Store`/`S: Store` receiver resolves to the trait's local implementors —
syntactic CHA, bounded like the JVM engine's — or reads honest `Unknown` when the trait has no visible
impl or too many). For dependencies, the receipt **names what the classifier can't see** (the
κ-coverage ledger: `κ doesn't know N dependencies this code calls into…`) and `--deps` **closes it**:
scan the whole `Cargo.lock` tree once (unbuilt registry sources, ~0.23s/dep measured) and the root
scan chains over the reports — effects cross every crate boundary without the classifier knowing the
crates (spec §2 report chaining). It **misses** —
*silently*, without emitting `Unknown` — effects reached only through a method call on a non-path-qualified
receiver, dispatch through an EXTERNAL trait, closures/fn-pointers, macros, and cross-crate propagation by
stable identity. So on resolution-heavy code it **under-reports** relative to the lint. Use `scan` for
zero-friction triage and CI on stable; use the nightly lint when you need the soundness contract
(`Unknown` over-approximation, conformance, the policy/guard gates).

By default `scan` reports only the crate's **library/binary** source — it skips `tests/`, `benches/`,
`examples/`, `build.rs`, and `#[cfg(test)]` modules, so the report is what the *crate* does, not what its
harness does (`--include-tests` keeps them). A [calibration on 35 real crates](eval/calibration/CALIBRATION.md)
found that with this in place the scanner has **no false positives in library code** — every effect it
reports is real; its only errors are under-reports through FFI, method dispatch, and macros (the lint's
job). E.g. it correctly catches chrono reading `/etc/localtime`+`$TZ` and `which` resolving `$PATH`,
and honestly shows `Net: 0` on reqwest (whose socket I/O hides behind hyper's resolved method calls).
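
To make the precision trade concrete, here is a minimal sketch of the same file read written two ways. The function names are hypothetical. Going by the lists above, the first form (a path-qualified call) is the kind the scanner is said to catch, and the second (a call through a function pointer) is the kind it is said to miss; how candor actually reports each is not verified here.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Path-qualified call: the effectful path `fs::read_to_string` is written at the
/// call site, so a syntactic scan has something to classify.
fn load_direct(path: &Path) -> io::Result<String> {
    fs::read_to_string(path)
}

/// The same read hidden behind a function pointer. Nothing in this body names
/// `std::fs`, so a scan of this function alone has no effectful path to match.
fn load_via_pointer(
    read: fn(&Path) -> io::Result<String>,
    path: &Path,
) -> io::Result<String> {
    read(path)
}

/// Hypothetical helper that runs both variants on the same file for comparison.
fn load_both(path: &Path) -> io::Result<(String, String)> {
    let direct = load_direct(path)?;
    let hidden = load_via_pointer(load_direct, path)?;
    Ok((direct, hidden))
}
```

Both functions return the same bytes at runtime. Only a resolver that knows what `read` points at can tell that `load_via_pointer` touches the filesystem, which is the gap the nightly lint is described as covering, either by resolving the call or by reporting `Unknown`.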

## Quick start (humans)

After `install.sh`, use the wrapper from any Rust project (it self-heals — rebuilding the dylib if it
ever goes missing):

```sh
cargo candor scan # STABLE backend: produce the report on stock cargo (no nightly)
cargo candor audit # at-a-glance effect profile of the whole project (nightly lint)
cargo candor audit --all # the full per-function lint (spans in context)
cargo candor snapshot .candor/baseline # write a JSON report
cargo candor guard .candor/baseline # fail on functions that gained an effect
cargo candor diff .candor/baseline # describe the per-function effect delta (--json)
cargo candor watch # keep the report fresh in the background → instant `diff`
cargo candor show my_function # a function's effects, instant (read from the report)
cargo candor where Net # which functions perform an effect, instant
cargo candor callers my_function # which functions call this one, instant (who depends on it)
cargo candor explain my_function # trace WHY a function has each effect (the call path)
cargo candor containment [baseline] # effect-leakage diagnostic; with a baseline, a CI ratchet
cargo candor reachable # what the program does at runtime (union over entry points)
cargo candor path my_fn Net # the call chain by which a fn comes to perform an effect
cargo candor impact my_fn # blast radius: transitive callers + downstream entry points
cargo candor policy .candor/policy # enforce effect boundaries (deny/pure rules)
cargo candor risk # heuristic: effects on caller-derived input (advisory)
cargo candor strict my_module # conformance, scoped to a module
cargo candor no-ambient my_module # flag direct ambient-authority use
```

`cargo candor audit` aggregates the project's crates into a one-screen profile — how many
functions perform each effect, which make calls candor can't resolve, any uncalibrated
dependencies, and the functions with the broadest reach into the outside world:

```text
candor @62a9383
143 effectful functions · 7 pgman.Executable · 136 pgman.Rlib

effects 56 Db · 53 Clock · 47 Log · 37 Env · 27 Fs · 23 Exec · 21 Clipboard · 18 Net

broadest effect surface
app::App::run { Clipboard Clock Db Env Exec Fs Log Net }
main { Clipboard Clock Db Env Exec Fs Log Net }
run_batch { Clock Db Env Exec Fs Log Net }

```

`cargo candor policy` is candor's **architecture-as-code** layer, and the part that earns its keep as
models get better at local reasoning. A model advises; only a tool, holding the whole effect graph,
*blocks the PR*. It guards against the failure mode AI agents hit most (editing one function without
seeing the whole effect graph) and does it at a scale nobody holds in their head: one command snapshots
every crate in the workspace, then enforces with the siblings loaded, so a boundary catches a violation
whose cause lives in *another crate*. A policy file declares invariants, and candor flags any violation,
including *transitive* ones:

```text
# .candor/policy
deny Net Db Fs domain # the domain layer must reach no I/O — even through a helper
pure parse # parsing must be side-effect-free
deny Exec # nothing may spawn a subprocess
allow Net in billing api.stripe.com # billing may reach the network — but ONLY Stripe
allow Exec in build git # the build layer may run subprocesses — but ONLY git
allow Db in billing ledger.* # billing may touch the database — but ONLY the ledger schema
forbid domain -> infra # the domain layer must not depend on infrastructure
```

```text
[AS-EFF-006] `domain::checkout` performs { Db }, forbidden by policy (scope `domain`): `deny Net Db Fs domain`
[AS-EFF-008] `billing::record_activity` reaches { metrics.growthtracker.io:443 } outside the allowlist, forbidden by policy (scope `billing`): `allow Net in billing api.stripe.com`
[AS-EFF-009] `domain::checkout` reaches into a forbidden layer (via `infra::db::save`), violating policy: `forbid domain -> infra`
```

Three boundary kinds, all checked *transitively* so they catch what a local diff hides:

- **`deny` / `pure`** (AS-EFF-006) — *what* a layer may do. `checkout` need not touch the database
directly; candor catches it reaching `Db` through any callee.
- **`allow <Effect> in <scope> …`** (AS-EFF-008) — *which values* an effect may reach: `Net`
hosts ("billing may only talk to Stripe"), `Exec` commands ("build may only run git"), `Fs` paths
("config may only read /etc/app"). A supply-chain boundary a model can't self-check, because the
literal is buried in a transitive, often **cross-crate**, callee (matched per-effect: host by name,
command by basename, path by prefix).
- **`forbid <layer> -> <layer>`** (AS-EFF-009) — *who* a layer may depend on. The domain layer must not reach
into infra, even through a chain of helpers.

`cargo candor policy` enforces all three across a whole workspace in one command — it snapshots every
crate, then enforces with the siblings loaded, so an effect or endpoint that lives in a *shared crate*
still gets caught at the boundary that forbids it. See [examples/candor-policy](examples/candor-policy)
and [eval/bet3](eval/bet3/RESULTS.md).
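
A `deny` rule can be pictured as a filter over per-function effect sets. The sketch below assumes a hypothetical in-memory `Report` that already holds each function's *transitive* effects, which is what makes the check transitive; the `Deny` struct, `in_scope`, and `violations` are illustrative names, not candor's API.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical in-memory view of a report: function path -> transitive effect names.
type Report = HashMap<String, BTreeSet<String>>;

/// A `deny` rule: effects that no function inside `scope` may reach.
struct Deny {
    effects: BTreeSet<String>,
    scope: String,
}

/// True when `function` is the scope module itself or sits below it (`domain::checkout`).
fn in_scope(function: &str, scope: &str) -> bool {
    function == scope || function.starts_with(&format!("{scope}::"))
}

/// Return `(function, forbidden effects it reaches)` for every violation, sorted by name.
fn violations(report: &Report, rule: &Deny) -> Vec<(String, Vec<String>)> {
    let mut found: Vec<(String, Vec<String>)> = report
        .iter()
        .filter(|(name, _)| in_scope(name, &rule.scope))
        .filter_map(|(name, effects)| {
            let hit: Vec<String> = effects.intersection(&rule.effects).cloned().collect();
            (!hit.is_empty()).then(|| (name.clone(), hit))
        })
        .collect();
    found.sort();
    found
}
```

With `deny Net Db Fs domain` and a report in which `domain::checkout` carries `Db` (inherited from `infra::db::save`), only `domain::checkout` is returned: `infra::db::save` has the same effect but sits outside the scope.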

`cargo candor risk` is an **advisory, heuristic** nudge toward the injection class — an effect whose
argument derives from a function parameter (`fs::read(format!("/var/cache/{key}"))`, `Command::new(name)`):

```text
[AS-EFF-007] `read_user_file` performs { Fs } on caller-derived input (an injection surface — …)
```

It is *not* sound taint analysis: a syntactic, intra-procedural check that over- and under-flags
(it misses flow through struct fields and across functions, and flags a parameter that's actually
validated). Use it to find surfaces worth reviewing — never as a gate.
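
For a sense of what such a surface looks like in source, the sketch below pairs the flagged pattern with a guarded variant. `cache_path` and `read_user_file_checked` are hypothetical names, and the path handling assumes a Unix-style filesystem. Because the check is described as syntactic, the guarded variant may still be flagged; that is the over-flagging case mentioned above.

```rust
use std::io;
use std::path::{Component, Path, PathBuf};

/// The pattern the heuristic is described as flagging: the path handed to an `Fs`
/// call is built straight from a parameter.
fn read_user_file(key: &str) -> io::Result<Vec<u8>> {
    std::fs::read(format!("/var/cache/{key}"))
}

/// A hypothetical guard: accept only a single plain file name, so `..`, absolute
/// paths, and nested paths are rejected before any file is touched.
fn cache_path(key: &str) -> Option<PathBuf> {
    let mut parts = Path::new(key).components();
    match (parts.next(), parts.next()) {
        (Some(Component::Normal(name)), None) => Some(Path::new("/var/cache").join(name)),
        _ => None,
    }
}

/// Still an `Fs` effect on caller-derived input, but bounded by the guard above.
fn read_user_file_checked(key: &str) -> io::Result<Vec<u8>> {
    let path = cache_path(key)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "invalid cache key"))?;
    std::fs::read(path)
}
```

The advisory output is a prompt to look at `read_user_file` and decide whether a guard like `cache_path` is needed; it says nothing about whether one is already in place.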

### `cargo candor containment` — an architecture signal that isn't a "score"

Raw effect *counts* are domain-dependent (a database app has lots of `Db` — not a defect), so there is
no single "candor score". But the **dispersion** of a boundary effect across layers *is*
domain-independent: `Db` all in one data layer is well-architected; `Db` smeared across `model`,
`actions`, *and* `dao` is leaky — regardless of how much DB it does. `containment` measures that, per
boundary effect (`Db`/`Net`/`Exec`/`Fs`/`Ipc`); `Log`/`Clock` are ambient (reported, not scored). Layers
are inferred from the module after the common crate root, no config:

```text
effect contained layers owner ← leaked into
Db 55% 3 conn (11) ← query:7, app:2 # pgman: DB is mostly in conn/query…
```

(Run on `pgman` — whose DB is *meant* to live in `conn` + `query` — candor independently found that
boundary **and** its one documented exception, the `app:2` leak.) Given a baseline prefix it's a
**ratchet** — gate on getting worse, note getting better:

```text
[containment] a boundary effect leaked into a layer it wasn't in: ← exit 1, fail the PR
Db → actions
✓ improved — a boundary effect left a layer: ← informational
Db ⊘ legacy
```

`cargo candor containment` for the diagnostic, `cargo candor containment .candor/baseline` for the gate.
Deliberately a **diagnostic + trend gate, not a single grade** — the absolute level is domain-dependent
and gameable, but "did a boundary effect leak into a new layer?" is a real, enforceable quality signal.
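
One way to read the sample row is as a share: `conn` holds 11 of the 20 `Db` functions, which is 55%. The sketch below computes that figure from per-layer counts. The formula is inferred from the sample output, not taken from candor's source, and the `Containment` struct and `containment` function are hypothetical.

```rust
use std::collections::BTreeMap;

/// Containment of one effect: the owning layer, its share, and where the rest went.
#[derive(Debug, PartialEq)]
struct Containment {
    owner: String,
    contained_pct: u32,
    leaked_into: Vec<(String, usize)>,
}

/// `per_layer` maps a layer name to the number of functions performing the effect there.
/// Returns `None` when no function performs the effect at all.
fn containment(per_layer: &BTreeMap<String, usize>) -> Option<Containment> {
    let total: usize = per_layer.values().sum();
    if total == 0 {
        return None;
    }
    // Assumption: the owner is simply the layer with the most effectful functions.
    let (owner, owner_count) = per_layer.iter().max_by_key(|(_, count)| **count)?;
    let mut leaked_into: Vec<(String, usize)> = per_layer
        .iter()
        .filter(|(layer, count)| *layer != owner && **count > 0)
        .map(|(layer, count)| (layer.clone(), *count))
        .collect();
    // Largest leak first, matching the order shown in the sample row.
    leaked_into.sort_by(|a, b| b.1.cmp(&a.1));
    Some(Containment {
        owner: owner.clone(),
        contained_pct: (*owner_count * 100 / total) as u32,
        leaked_into,
    })
}
```

Under this reading, the ratchet compares the set of layers in `leaked_into` between the baseline and the current run, and fails only when a new layer appears.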

## All modes (explicit invocation)

From any Rust project root, with `LINT` set to the dylib's absolute path:

```sh
# AUDIT (default): every function's transitive effect set. No code changes needed.
cargo dylint --lib-path "$LINT"

# JSON: machine-readable report, one file per crate+type: ...json
CANDOR_JSON=/tmp/report cargo dylint --lib-path "$LINT"

# CONFORMANCE: enforce inferred ⊆ declared.
CANDOR_STRICT=1 cargo dylint --lib-path "$LINT" # whole crate
CANDOR_STRICT=mymod::sub cargo dylint --lib-path "$LINT" # one module (incremental adoption)

# ENFORCEMENT (cap-std-aligned): flag any DIRECT reach for ambient authority.
CANDOR_NO_AMBIENT=mymod cargo dylint --lib-path "$LINT" # AS-EFF-004 per direct ambient call

# REGRESSION GUARD: fail if any function gained an effect since a saved snapshot.
CANDOR_JSON=.candor/baseline cargo dylint --lib-path "$LINT" # 1. snapshot (commit it)
CANDOR_BASELINE=.candor/baseline cargo dylint --lib-path "$LINT" # 2. in CI: AS-EFF-005 on regressions

# Flags that combine with any mode:
CANDOR_CONFIG=candor.rules cargo dylint --lib-path "$LINT" # extra classifier rules
CANDOR_PARANOID=1 cargo dylint --lib-path "$LINT" # treat generic trait dispatch as Unknown
```

Or register it in a project's `Cargo.toml` so plain `cargo dylint` finds it — by local path,
or **straight from git with no clone** (dylint fetches and builds it against candor's pinned
toolchain). This is dylint's equivalent of a dependency; dylint loads libraries only from `git` or
`path` sources, not crates.io, so candor is **not** (and need not be) published there.

```toml
[workspace.metadata.dylint]
# clone-free — pin a tag/rev for reproducibility:
libraries = [{ git = "https://github.com/tombaldwin/candor-rust", tag = "v0.3.0" }]
# …or a local checkout (use this instead of the line above; a TOML table cannot define `libraries` twice):
# libraries = [{ path = "/abs/path/to/candor" }]
```

## What it detects

candor answers two questions about a codebase:

1. **What effects does each function perform?** — network (AWS SDK, `reqwest`/`ureq`/`isahc`, raw
`std`/`tokio` sockets), databases (`sqlx`/`rusqlite`/`postgres`/…), local IPC (Unix sockets),
filesystem, process spawn, env, clock, randomness, logging, clipboard — including effects
inherited transitively through the functions it calls.
2. **Are the signatures honest?** — once you thread capability tokens (or use cap-std) through a
module, it flags any function performing an effect it doesn't declare.

It resolves every call's `DefId` and classifies the crate/path it lands in. That type resolution is
the point: a bare `.send()` is meaningless syntactically, but the resolved method tells us it belongs
to `aws_sdk_*` → a network effect.

## The capability discipline (conformance mode)

A function declares the effects it may perform by taking the matching **capability token** as a
parameter (`&Fs`, `&Env`, …). Tokens are unforgeable — a private field means they can only be
*received*, never constructed outside their defining module — and are minted once at the entry
point. See `sample/src/main.rs` for the pattern. The checker then flags:

- **AS-EFF-001** — a function performs an effect it does not declare.
- **AS-EFF-002** — a function declares a capability it never uses.
- **AS-EFF-003** — a function makes a call candor cannot resolve (dynamic dispatch, fn-pointer, or
callback through `impl Fn`), so its effect set is not provably complete and cannot be certified.
- **AS-EFF-004** (`CANDOR_NO_AMBIENT`) — a function reaches for *ambient authority* directly
(`std::fs`, `std::net`, `std::env`, `std::process`, the clock, …) instead of receiving a
capability. This is the cap-std-aligned, *enforceable* alternative to the advisory tokens: it
fires even on functions that hold a token, because holding `&Fs` doesn't stop you calling
`std::fs`. The fix is to route the call through an injected capability (e.g. a cap-std handle).
- **AS-EFF-005** (`CANDOR_BASELINE`) — an existing function *gained* an effect it didn't have in a
saved snapshot. The lowest-friction adoption path: no token threading, no rewrite — just catch the
PR that makes a previously-pure function start doing network/disk/etc. I/O. (New functions are not
flagged; they're reviewed as new code.)

Adopt incrementally: scope `CANDOR_STRICT` / `CANDOR_NO_AMBIENT` to one module, fix until it reports
zero, then move to the next.
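
The token pattern can be sketched in a few lines. This is not the code in `sample/src/main.rs`, which is not reproduced here; the module name `caps`, the `mint` function, and the example functions are all illustrative.

```rust
/// Capability tokens in the style described above. All names are illustrative.
mod caps {
    /// Grants filesystem access. The private field blocks construction outside `caps`.
    pub struct Fs {
        _private: (),
    }

    /// Grants environment access.
    pub struct Env {
        _private: (),
    }

    /// Mint every token; intended to be called once, from the entry point.
    pub fn mint() -> (Fs, Env) {
        (Fs { _private: () }, Env { _private: () })
    }
}

/// Declares `Env`: the signature admits that this function reads the environment.
fn home_dir(_env: &caps::Env) -> Option<String> {
    std::env::var("HOME").ok()
}

/// Declares `Fs`, and passes `Env` on because a callee needs it.
fn config_exists(fs: &caps::Fs, env: &caps::Env) -> bool {
    let _ = fs; // the token is evidence of permission, not a file handle
    home_dir(env)
        .map(|home| std::path::Path::new(&home).join(".config").exists())
        .unwrap_or(false)
}

/// No token parameters: by the discipline, this function must stay pure.
fn parse_port(raw: &str) -> Option<u16> {
    raw.trim().parse().ok()
}
```

An entry point would call `caps::mint()` once and pass `&fs` and `&env` down. Writing `caps::Fs { _private: () }` outside the `caps` module does not compile, which is what "received, never constructed" means in practice. Nothing stops `parse_port` from calling `std::fs` directly, though; that is the gap AS-EFF-001 and AS-EFF-004 are described as reporting.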

### Or use real capabilities: cap-std

candor recognises [cap-std](https://github.com/bytecodealliance/cap-std) capability *types* as
declarations and its operations as the matching effect. A function that takes a `&Dir` and reads
through it (`dir.read_to_string(..)`) is conformant — its declared `Fs` matches its inferred `Fs` —
while a sibling that reaches for ambient `std::fs` is flagged. Unlike candor's own advisory tokens,
cap-std capabilities are unforgeable and compile-enforced; candor just makes the effect surface
*visible* on top. See `sample-capstd/`. Mapped today: `Dir`→Fs, `Pool`/`TcpStream`→Net,
`SystemClock`→Clock, `UnixStream`→Ipc.

## CI guardrail (lowest-friction adoption)

You don't have to adopt the capability discipline to get value. The cheapest win is the regression
guard: snapshot the effect report, commit it, and fail CI when a function's effect surface grows.

```sh
# once, on a known-good commit — then `git add .candor/`
CANDOR_JSON=.candor/baseline cargo dylint --lib-path "$LINT"

# in CI: fail only on AS-EFF-005 (a function gained an effect) — see examples/candor-guard.yml
out=$(CANDOR_BASELINE=.candor/baseline cargo dylint --lib-path "$LINT" 2>&1); echo "$out"
echo "$out" | grep -q AS-EFF-005 && { echo "effect surface grew"; exit 1; } || true
```

Now a PR that makes a parser suddenly open a socket, or a render function start reading the
filesystem, fails review automatically — no tokens, no rewrite. Refresh the baseline deliberately
(re-run the snapshot command) when a new effect is intended. This is equally useful to a human
reviewer and to an AI agent reviewing a diff.
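
The comparison behind the guard is a per-function set difference. The sketch below assumes a hypothetical `Snapshot` map rather than candor's report format, and follows the AS-EFF-005 rule as stated: only functions present in both snapshots are checked.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical snapshot shape: function path -> effect names.
type Snapshot = BTreeMap<String, BTreeSet<String>>;

/// List `(function, gained effects)` for functions present in both snapshots.
/// Functions that exist only in `current` are skipped, mirroring the rule that
/// new functions are reviewed as new code rather than flagged.
fn gained_effects(baseline: &Snapshot, current: &Snapshot) -> Vec<(String, Vec<String>)> {
    let mut regressions = Vec::new();
    for (name, now) in current {
        if let Some(before) = baseline.get(name) {
            let gained: Vec<String> = now.difference(before).cloned().collect();
            if !gained.is_empty() {
                regressions.push((name.clone(), gained));
            }
        }
    }
    regressions
}
```

A function that *loses* an effect produces no entry, so the guard only ever fails on growth.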

## How well does it actually help an agent? (the honest version)

A controlled pilot ([EVAL.md](EVAL.md)) pitted a JSON-only agent against a source-only one on the
same scoping task. The JSON was ~3× cheaper and ~6.5× faster — *and* it surfaced a real lesson: the
source-only agent was more **accurate** in one spot, because candor had silently misclassified some
`reqwest` HTTP calls (a classifier gap, since fixed). So: the report is cheap and genuinely useful,
but **only as correct as its classifier** — which is exactly why `Unknown`/`unresolved` exists, and
why an agent should treat flagged-uncertain functions as "go read the source," not "trust me."

## Unresolved calls (honest soundness)

A call candor cannot trace to a concrete callee — `dyn Trait` dispatch, a function pointer, a
closure reached through a generic `impl Fn` parameter — could perform *any* effect. candor records
these as an **`Unknown`** effect rather than silently assuming purity. You'll see `Unknown` in audit
output and the JSON `unresolved` flag; in conformance mode it raises AS-EFF-003. (Measured cost of
*not* doing this: on a real ~8k-line codebase, 22% of functions make at least one unresolved call.)

Residual gap: statically-dispatched **generic** trait calls (`t.method()` where `t: T: Trait`) are
assumed to honour their bound rather than marked `Unknown` — otherwise every `.clone()` /
`.to_string()` / iterator adaptor would drown the report. See `CRITIQUE.md`.
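
The interaction between transitive propagation and `Unknown` can be shown with a small fixpoint. This is a sketch of the general technique under simplifying assumptions (a closed set of named functions, three effects), not candor's analysis; `Function`, `infer`, and the field names are hypothetical.

```rust
use std::collections::{BTreeSet, HashMap};

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Effect {
    Net,
    Fs,
    Unknown,
}

/// Hypothetical per-function facts gathered before propagation.
struct Function {
    direct: Vec<Effect>,       // effects performed in the body itself
    calls: Vec<&'static str>,  // callees resolved to a concrete function
    has_unresolved_call: bool, // e.g. `dyn Trait` dispatch or a fn pointer
}

/// Propagate effects from callees to callers until nothing changes.
fn infer(functions: &HashMap<&'static str, Function>) -> HashMap<&'static str, BTreeSet<Effect>> {
    // Seed each function with its direct effects, plus `Unknown` for unresolved calls.
    let mut effects: HashMap<&'static str, BTreeSet<Effect>> = functions
        .iter()
        .map(|(name, f)| {
            let mut set: BTreeSet<Effect> = f.direct.iter().copied().collect();
            if f.has_unresolved_call {
                set.insert(Effect::Unknown);
            }
            (*name, set)
        })
        .collect();

    let mut changed = true;
    while changed {
        changed = false;
        for (name, f) in functions {
            for callee in &f.calls {
                // A callee missing from the map is treated as unresolved, never as pure.
                let inherited = effects
                    .get(callee)
                    .cloned()
                    .unwrap_or_else(|| BTreeSet::from([Effect::Unknown]));
                let own = effects.get_mut(name).expect("seeded above");
                for effect in inherited {
                    changed |= own.insert(effect);
                }
            }
        }
    }
    effects
}
```

The sets only ever grow and the effect vocabulary is finite, so the loop terminates. A function ends up with an empty set only if neither it nor anything it reaches performs an effect or makes an unresolved call.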

## Extending the classifier

`classify()` in `src/lib.rs` is a curated table mapping crates/paths to effects. To recognise your
own effectful crates without rebuilding, point `CANDOR_CONFIG` at a rules file with one rule per line:
the effect, then `crate` or `path`, then the crate name or path to match:

```text
# project effect rules
Net crate reqwest
Fs path mycrate::storage::
```

Match the actual I/O boundary, not the whole crate — e.g. only `.send()` for an SDK, only
`Command`/`Child` for `std::process` — or you will over-report.
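
To show how such rules could be applied, the sketch below parses the two-rule example above and matches resolved paths against it. It is a guess at the semantics based only on that example: `crate` is assumed to match the first path segment and `path` is assumed to be a prefix match. `Rule`, `Matcher`, `parse_rules`, and `matching_effects` are hypothetical names.

```rust
/// How a rule selects calls: by crate name or by path prefix (assumed semantics).
#[derive(Debug, PartialEq)]
enum Matcher {
    Crate(String),
    Path(String),
}

#[derive(Debug, PartialEq)]
struct Rule {
    effect: String,
    matcher: Matcher,
}

/// Parse rules text, skipping blank lines and `#` comments.
/// On failure, returns the 1-based number of the first malformed line.
fn parse_rules(text: &str) -> Result<Vec<Rule>, usize> {
    let mut rules = Vec::new();
    for (index, line) in text.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let fields: Vec<&str> = line.split_whitespace().collect();
        let matcher = match fields.as_slice() {
            [_, "crate", name] => Matcher::Crate(name.to_string()),
            [_, "path", prefix] => Matcher::Path(prefix.to_string()),
            _ => return Err(index + 1),
        };
        rules.push(Rule { effect: fields[0].to_string(), matcher });
    }
    Ok(rules)
}

/// Effects whose rule matches a resolved path such as `mycrate::storage::put`.
fn matching_effects<'a>(rules: &'a [Rule], path: &str) -> Vec<&'a str> {
    let krate = path.split("::").next().unwrap_or(path);
    rules
        .iter()
        .filter(|rule| match &rule.matcher {
            Matcher::Crate(name) => name.as_str() == krate,
            Matcher::Path(prefix) => path.starts_with(prefix.as_str()),
        })
        .map(|rule| rule.effect.as_str())
        .collect()
}
```

Under these assumptions `Net crate reqwest` tags every path in `reqwest`, including pure builders, which illustrates the over-reporting the advice above warns about; a `path` rule scoped to the call that performs the I/O is narrower.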

## Known limitations

- **Dynamic dispatch / fn-pointers / callbacks** can't be resolved to a concrete callee. These are
surfaced honestly as `Unknown` (→ AS-EFF-003) rather than silently dropped, but candor still can't
tell you *which* effects hide behind them. Exception: `dyn` over conventionally-pure std traits
(`Display`, `Debug`, `Error`, `ToString`, `Clone`, …) is treated as pure, not `Unknown` —
otherwise ubiquitous patterns like `dyn Error` formatting would flood reports with false positives.
- **Generic static dispatch** (`t.method()` for `t: T: Trait`) is assumed to honour its bound — a
deliberate residual unsoundness to keep the report readable (see `CRITIQUE.md`).
- **Advisory, not enforced**: a `&Fs` token doesn't actually gate `std::fs`; candor only reports.
For real enforcement use [cap-std](https://github.com/bytecodealliance/cap-std).
- **Macro-generated consts/statics are skipped** (to drop noise like tracing's per-log-site
`__CALLSITE` statics). Macro-generated *functions* (an `async_trait` method, a derive-impl method,
a decl-macro fn) **are** analyzed and reported — the earlier blanket skip was a real under-report,
fixed and held by a fuzzer lane (`macro_call` / macro-defined sinks).
- **Capabilities must be direct parameters.** `declared_caps` recognizes a capability (`&Fs`, a
cap-std `&Dir`) only as a top-level parameter. A capability reached *through* a struct field
(`fn f(ctx: &AppContext)` where `ctx` holds the `Dir`) is not counted as declared — that function
would be flagged in strict mode despite holding the capability.
- **Generic static dispatch over non-local traits** is assumed to honour its bound (CHA only sees
through *local* traits); `CANDOR_PARANOID` flags the rest at the cost of noise.
- Logging via macros is deduped per function but counts every function that logs.

## Documentation

- **[candor-spec](https://github.com/tombaldwin/candor-spec)** — the language-agnostic spec candor
implements (effect vocabulary, report schema, trust contract). Shared by the
[JVM engine](https://github.com/tombaldwin/candor-java) and the from-spec-alone
[TS engine](https://github.com/tombaldwin/candor-ts); a CI conformance suite holds all three to the
same answers.
- **[AGENTS.md](AGENTS.md)** — self-contained instructions for an AI agent (install → run → read).
- **[PRINCIPLES.md](PRINCIPLES.md)** — the ideas candor (and its development) are built on.
- **[CRITIQUE.md](CRITIQUE.md)** — an honest, critical self-assessment + comparison to prior art
(Cackle, cap-std, the Rust effects initiative).
- **[EVAL.md](EVAL.md)** — a controlled pilot of whether the report actually helps an AI agent.
- **[BACKLOG.md](BACKLOG.md)** — what's done, what's deferred, and the concrete reason for each.
- **[CONTRIBUTING.md](CONTRIBUTING.md)** — build/test, and how to teach the classifier a new crate.
- **[SECURITY.md](SECURITY.md)** — why candor is *not* a security boundary, and how to report a
false-negative (the bug class that matters most).

## Tests

`cargo test --workspace` runs unit tests over the *classifier* precision rules (e.g. `std::net::TcpStream`
is `Net` but `std::net::SocketAddr` is not) plus a load smoke-test, and the `candor-report` /
`candor-query` tooling tests (report parsing/discovery, the query commands). The **stateful core** (call-graph
fixpoint, CHA, conformance) isn't unit-tested — it needs the dylint harness, which has no bless
support — so it's covered instead by the `sample/`+`sample-capstd/` crates and a CI *behavioural*
check that asserts real audit output (so a "candor emits nothing" regression fails CI). The lint also
fails *gracefully* (never an ICE) on expressions outside a typechecked body.

The **soundness contract** — "never silently pure" — is its own gate, not a hope. An adversarial
fuzzer ([`soundness/`](soundness/)) generates compilable crates that thread a *known* effect from a
leaf up through a random chain of call forms, and asserts every reachable function is reported with
that effect or `Unknown` (a pure/omitted function is the bug it hunts). The fuzzer lanes cover the
forms that have historically hidden a call — direct/closure/generic/boxed callbacks, `dyn` and
arbitrary-self dispatch, UFCS, overloaded operators, `?`, `.await`, macros, implicit `Drop`, opaque
`impl Trait` returns, and the cross-crate boundary — each *teeth-verified* (reverting the fix makes
the lane fail). Two more lanes run each program under `strace` and confirm candor's static
prediction over-approximates the
effects the kernel actually observed — ground truth that trusts nothing about how the test was generated.
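
The two assertions described above reduce to small set checks. The sketch below is not the code in `soundness/`; it assumes a hypothetical report shape (function name to effect names) and hypothetical function names, and shows only the oracle, not the crate generation or the `strace` run.

```rust
use std::collections::{BTreeSet, HashMap};

/// Functions on a generated call chain that were reported with neither the planted
/// effect nor `Unknown`. A non-empty result is the bug the fuzzer hunts.
fn silently_pure<'a>(
    chain: &[&'a str],
    planted: &str,
    report: &HashMap<&str, BTreeSet<&str>>,
) -> Vec<&'a str> {
    chain
        .iter()
        .copied()
        .filter(|function| match report.get(*function) {
            Some(effects) => !effects.contains(planted) && !effects.contains("Unknown"),
            None => true, // an omitted function counts as silently pure
        })
        .collect()
}

/// The runtime check as described: every effect observed while the program ran
/// must appear in the static prediction.
fn over_approximates(predicted: &BTreeSet<&str>, observed: &BTreeSet<&str>) -> bool {
    observed.is_subset(predicted)
}
```

A lane passes when `silently_pure` returns an empty list for every generated chain. Extra predicted effects are acceptable in both checks; a missing one is not.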

## Status

Prototype. Validated on a real ~8k-line codebase (the `ebman` AWS Elastic Beanstalk TUI):
audit tagged ~445 functions; a leaf module was converted to the capability discipline and brought to
zero conformance violations while still building on stable.

candor also **guards itself**: CI runs candor over candor against `.candor/baseline`. Its effectful
surface — five functions in the lint (config / baseline / cross-report reads + the report write, all
`Env`/`Fs`), plus `candor-report`'s `report_files` (`Fs`) and the build script (`Exec`/`Fs`) — can't
gain a *new* effect unnoticed. Note the guard's scope, honestly: per AS-EFF-005's design it flags
*regressions in existing functions*, not brand-new functions (those are reviewed as new code), so a
newly-added effectful function wouldn't trip it. Refresh with `cargo candor snapshot .candor/baseline`
when a new effect is intended.

## License

Dual-licensed under [MIT](LICENSE-MIT) or [Apache-2.0](LICENSE-APACHE), at your option.