{"id":50682290,"url":"https://github.com/FlorianBruniaux/ctxharness","last_synced_at":"2026-06-25T18:00:58.225Z","repository":{"id":355836419,"uuid":"1228904236","full_name":"FlorianBruniaux/ctxharness","owner":"FlorianBruniaux","description":"Catch stale versions, broken paths, and missing scripts in your AI instruction files (CLAUDE.md, rules)","archived":false,"fork":false,"pushed_at":"2026-06-03T05:50:02.000Z","size":388,"stargazers_count":13,"open_issues_count":0,"forks_count":3,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-03T07:29:42.425Z","etag":null,"topics":["ai","claude-code","cli","context-engineering","developer-tools","documentation","drift-detection","github-action","testing","typescript"],"latest_commit_sha":null,"homepage":"https://ctxharness.bruniaux.com","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FlorianBruniaux.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-04T13:53:19.000Z","updated_at":"2026-06-03T05:50:07.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/FlorianBruniaux/ctxharness","commit_stats":null,"previous_names":["florianbruniaux/ctxharness"],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/FlorianBruniaux/ctxharness","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FlorianBruniaux%2Fctxharness","tags_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories/FlorianBruniaux%2Fctxharness/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FlorianBruniaux%2Fctxharness/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FlorianBruniaux%2Fctxharness/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FlorianBruniaux","download_url":"https://codeload.github.com/FlorianBruniaux/ctxharness/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FlorianBruniaux%2Fctxharness/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34786231,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-25T02:00:05.521Z","response_time":101,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","claude-code","cli","context-engineering","developer-tools","documentation","drift-detection","github-action","testing","typescript"],"created_at":"2026-06-08T20:00:23.317Z","updated_at":"2026-06-25T18:00:58.213Z","avatar_url":"https://github.com/FlorianBruniaux.png","language":"TypeScript","funding_links":[],"categories":["TypeScript"],"sub_categories":[],"readme":"# ctxharness\n\n**AI documentation drift detection for teams using Claude Code, Cursor, Copilot, and any agent-driven workflow.**\n\nYour `CLAUDE.md` says the auth config lives at 
`src/config/auth.ts`. That file moved to `src/modules/auth/config.ts` six months ago. Your agent tries to import from a path that no longer exists, silently, on every session.\n\nOr: `CLAUDE.md` says `npm run typecheck`. The script was renamed to `npm run type-check` during a cleanup sprint. The agent runs a command that doesn't exist.\n\nctxharness catches this before it reaches your agents.\n\n```bash\nnpx ctxharness scan CLAUDE.md   # zero-config — detect drift instantly\n# or, with full config:\nnpx ctxharness init              # scaffold .ctxharness.yml\nnpx ctxharness run               # check all assertions\n```\n\n## Why put facts in CLAUDE.md at all?\n\nAgents work from the context window. Pointing an agent to `package.json` works, but it means reading that file on every session, and many facts can't be read from a single source file: architectural patterns, team conventions, file locations, which ORM you're using and why. ctxharness is for facts you've already decided to state explicitly. 
It keeps those statements accurate.\n\n## What it checks\n\n**L1 — Fact drift**: file existence, npm scripts, versions, counts, regex captures — any extractable fact from your codebase vs what your AI docs claim.\n\n**L2 — Instruction quality**: vague language that degrades agent reliability (\"be careful\", \"use your judgment\"), positive/negative instruction ratio, multi-file coherence, token budget.\n\n**L3 — Context assembly**: hook validation, skill loading, rule glob validity, coverage ratio.\n\n**No migration needed.** Works on your existing CLAUDE.md, AGENTS.md, and .cursorrules files as-is.\n\n## Install\n\n**npm/pnpm (Node.js projects):**\n\n```bash\nnpm install -g ctxharness\n# or\npnpm add -D ctxharness\n```\n\n\u003e **Standalone binary** for Python, Go, Rust and other non-Node projects: planned for v1.0.\n\n## Quick start\n\n```bash\nctxharness init          # creates .ctxharness.yml\nctxharness doctor        # full health check with L1/L2/L3 breakdown\n```\n\nExample output:\n\n```\nAI Context Test — 5 assertions\n\nfact                    expected       mentions  status\n────────────────────────────────────────────────────────────────────────\nauth-config-path        true                  1  ✗ 1 mismatch\ntypecheck-script        true                  1  ✗ 1 mismatch\nnode-version            22.14.0               1  ✓ 1/1 pass\nno-vague-language       check                 2  ✓ 2/2 pass\ninstruction-balance     check                 2  ✓ 2/2 pass\n────────────────────────────────────────────────────────────────────────\n\nMismatches\n────────────────────────────────────────────────────────────────────────\nauth-config-path        CLAUDE.md:18   true        false\ntypecheck-script        CLAUDE.md:34   true        false\n────────────────────────────────────────────────────────────────────────\n✗ 2 mismatch(es) — update the file(s) listed above\n```\n\n## Configuration\n\n`.ctxharness.yml` — minimal starter (one assertion per 
layer):\n\n```yaml\nversion: 1\n\nfiles:\n  include:\n    - 'CLAUDE.md'\n    - 'AGENTS.md'\n    - '.cursorrules'\n  exclude:\n    - 'node_modules/**'\n\nassertions:\n  # L1 — fact drift: auth config path still exists\n  - id: auth-config-path\n    extractor: fileExists\n    extractorArgs:\n      path: src/modules/auth/config.ts\n    scanner: literalInMd\n\n  # L1 — fact drift: typecheck script matches package.json\n  - id: typecheck-script\n    extractor: packageScript\n    extractorArgs:\n      script: type-check\n    scanner: literalInMd\n\n  # L2 — instruction quality: no vague language\n  - id: no-vague-language\n    extractor: constant\n    extractorArgs:\n      value: check\n    scanner: vaguenessPattern\n\n  # L3 — context assembly: hooks are valid\n  - id: hook-validity\n    extractor: constant\n    extractorArgs:\n      value: check\n    scanner: hookValidity\n```\n\n\u003cdetails\u003e\n\u003csummary\u003eAdvanced config — allowlist, scopeFiles, multi-version assertions\u003c/summary\u003e\n\n```yaml\nassertions:\n  # allowlist: skip known-intentional mismatches in specific files\n  - id: next-version\n    extractor: packageJson\n    extractorArgs:\n      package: next\n    scanner: inlineRegex\n    scannerArgs:\n      pattern: 'Next\\.js\\s+v?(\\d+(?:\\.\\d+(?:\\.\\d+)?)?)'\n    allowlist:\n      - CHANGELOG.md   # version history file — intentional old values\n\n  # scopeFiles: restrict an assertion to a subset of files\n  - id: instruction-balance\n    extractor: constant\n    extractorArgs:\n      value: check\n    scanner: negativeConstraintDensity\n    scannerArgs:\n      minRatio: 2.0\n    scopeFiles:\n      include:\n        - 'CLAUDE.md'\n        - 'AGENTS.md'\n      exclude:\n        - '.cursorrules'   # constraint-only file by design\n```\n\n\u003c/details\u003e\n\n## Extractors\n\nRead ground truth from your codebase. 
Common ones: `fileExists`, `packageScript`, `packageJson`, `nvmrc`, `gitStaleness`, `prismaModelList`, `goMod`, `cargoToml`.\n\n\u003cdetails\u003e\n\u003csummary\u003eFull extractor list (20)\u003c/summary\u003e\n\n| Name | What it reads | Args |\n|------|--------------|------|\n| `packageJson` | `dependencies`/`devDependencies` version | `package: string` |\n| `packageManager` | `packageManager` field (strips corepack hash) | — |\n| `nvmrc` | `.nvmrc` file | — |\n| `fileExists` | Whether a path exists (`\"true\"`/`\"false\"`) | `path: string` |\n| `regexScan` | Capture group from any file | `path`, `pattern`, `group?` |\n| `countMatches` | Count of pattern matches in a file | `path`, `pattern` |\n| `constant` | Fixed value (placeholder for quality scanners) | `value: string` |\n| `prismaModel` | Count of `model X {}` blocks in a Prisma schema | `path: string` |\n| `prismaModelList` | JSON array of model names from a Prisma schema | `path: string` |\n| `prismaEnum` | Count of values in a named Prisma enum | `path: string`, `enum: string` |\n| `trpcRouter` | Count of router entries in a tRPC root file | `path: string` |\n| `trpcRouterList` | JSON array of router names from a tRPC root file | `path: string` |\n| `gitStaleness` | Commits since a file was last changed (0 = up-to-date) | `path: string` |\n| `packageEngines` | Node/runtime version from `package.json` `engines` field (strips `\u003e=` operators) | `field?: string` (default `\"node\"`) |\n| `tsconfigPaths` | Count of path aliases in `tsconfig.json` `compilerOptions.paths` (JSONC-aware) | `path?: string` (default `\"tsconfig.json\"`) |\n| `pyprojectToml` | Version from `pyproject.toml` — Poetry and PEP 621 formats | `package?: string`, `field?: string` |\n| `requirementsTxt` | Package version from `requirements.txt` | `package: string`, `path?: string` |\n| `cargoToml` | Version from `Cargo.toml` — own version or dependency; supports Cargo workspaces (`[workspace.package]`, `[workspace.dependencies]`) | 
`package?: string`, `field?: string` |\n| `goMod` | Module version from `go.mod` | `module: string` |\n| `packageScript` | Returns `\"true\"`/`\"false\"` if a named npm script exists in `package.json` | `script` (required), `file` (optional, default `\"package.json\"`) |\n\nVersion normalization: `v22` matches `22.14.0` — partial mentions are valid.\n\n\u003c/details\u003e\n\n## Scanners\n\nFind and validate content in your AI doc files. Common ones: `inlineRegex`, `literalInMd`, `vaguenessPattern`, `hookValidity`, `coverageRatio`, `freshnessScore`.\n\n\u003cdetails\u003e\n\u003csummary\u003eFull scanner list (15)\u003c/summary\u003e\n\n### Drift scanners (compare against extractor value)\n\n| Name | What it scans | Args |\n|------|--------------|------|\n| `inlineRegex` | All lines matching a regex | `pattern`, `flags?` |\n| `codeBlockRegex` | Lines inside fenced code blocks only | `pattern`, `lang?`, `flags?` |\n| `yamlField` | YAML front matter or inline YAML | `field` (dot-path) |\n| `jsonField` | Inline JSON blocks | `field` (dot-path) |\n| `literalInMd` | Literal string presence | `literal` |\n| `pathReference` | File path reference | `path` |\n\n### Quality scanners (no extractor value needed, use `constant`)\n\n| Name | What it detects | Args |\n|------|----------------|------|\n| `vaguenessPattern` | Vague instructions (\"be careful\", \"as needed\", \"use your judgment\"…) | `patterns?: string[]` |\n| `negativeConstraintDensity` | Positive/negative instruction ratio below threshold | `minRatio?: number` (default 1.0) |\n| `contextBudget` | File token footprint — fails if estimated tokens exceed threshold | `maxTokens?: number` (default 3000), `followImports?: boolean` (follows `@file.md` chains up to depth 3) |\n| `ruleGlobValidity` | Claude Code rules file — checks for YAML frontmatter and optional `paths:` field | `requirePaths?: boolean` (default false) |\n| `hookValidity` | **Standalone.** Resolves `.claude/settings.json` from project root and 
validates each hook entry | — |\n| `backtickEntityPresence` | Checks that `` `entity` `` appears as inline code in the doc | `entity: string` |\n| `skillValidity` | **Standalone.** Globs `.claude/skills/**/*.md` from project root — validates YAML frontmatter has `name:` and `description:` | `requireDescription?: boolean` (default `true`) |\n| `freshnessScore` | **Standalone.** Interprets commit count from `gitStaleness` — returns pass/warn/fail based on thresholds | `warnAfter?: number` (default 30), `failAfter?: number` (default 100) |\n| `coverageRatio` | Checks what fraction of a JSON array (from `prismaModelList`/`trpcRouterList`) appears in the doc | `minRatio?: number` (default 0.8), `valueAllowlist?: string[]` |\n\n**Standalone scanners** (`hookValidity`, `skillValidity`, `freshnessScore`) bypass `files.include` and resolve their own targets from the project root. You do not add their paths to `files.include` — they run once regardless of how many files are in scope.\n\n`vaguenessPattern` accepts custom patterns via `scannerArgs.patterns` (array of regex strings).\n\n`contextBudget` estimates tokens as `chars ÷ 4`. Designed to run over `.claude/rules/**/*.md` or `CLAUDE.md` to catch bloated always-on context files. With `followImports: true`, it resolves `@file.md` references recursively up to depth 3 and includes their token footprint in the total.\n\n`ruleGlobValidity` is designed to run over `.claude/rules/**/*.md`. By default it fails if a rules file has no YAML frontmatter (meaning it loads at every session with no scoping). Set `requirePaths: true` to also fail if the frontmatter lacks a `paths:` field.\n\n`freshnessScore` works with `gitStaleness` extractor. 
`gitStaleness` returns the commit count since the file was last changed; `freshnessScore` compares it to your thresholds:\n\n```yaml\n- id: claude-md-freshness\n  extractor: gitStaleness\n  extractorArgs:\n    path: CLAUDE.md\n  scanner: freshnessScore\n  scannerArgs:\n    warnAfter: 20   # ⚠ warn if \u003e20 commits since last edit\n    failAfter: 50   # ✗ fail if \u003e50 commits\n```\n\n`coverageRatio` checks that a fraction of your actual entities (models, routers…) are mentioned in the docs — useful when you can't document everything but want to enforce a minimum:\n\n```yaml\n- id: prisma-model-coverage\n  extractor: prismaModelList\n  extractorArgs:\n    path: src/server/db/prisma/schema.prisma\n  scanner: coverageRatio\n  scannerArgs:\n    minRatio: 0.5          # at least 50% of models mentioned\n  valueAllowlist:\n    - MigrationVersion     # internal model, not required in CLAUDE.md\n```\n\n\u003c/details\u003e\n\n## CLI\n\n```bash\nctxharness run       # run all assertions, exit 1 on drift\nctxharness check     # alias for run --format text\nctxharness scan      # scan a markdown file for verifiable claims without a config file\nctxharness score     # run assertions and report a 0-100 health score with grade (S/A/B/C/D/F)\nctxharness trend     # show cross-run drift score history — sparkline, direction, per-run table\nctxharness populate  # scan declared files and suggest new assertions for uncovered claims\nctxharness snapshot  # save a quality snapshot to .ctxharness/snapshots/\nctxharness diff      # compare against latest snapshot — exit 1 on score regression\nctxharness fix       # auto-fix version drift — dry-run by default, --apply writes files\nctxharness doctor    # comprehensive health check with L1/L2/L3 breakdown and remediation advice\nctxharness init      # scaffold .ctxharness.yml\n```\n\n`ctxharness init --hooks` also installs Husky post-merge / post-checkout hook scripts alongside the config.\n\nOptions:\n\n```\n-c, --config 
\u003cpath\u003e    Config file path (default: .ctxharness.yml)\n-f, --format \u003cfmt\u003e     Output format: text | json | gha (default: text)\n-r, --root \u003cdir\u003e       Project root (default: cwd)\n-w, --watch            Re-run on file changes (run command only)\n```\n\n`ctxharness fix` finds every line where a version stated in a doc differs from the version an assertion expects, and shows the change it would make. Pass `--apply` to write the files:\n\n```\n$ ctxharness fix\nCLAUDE.md:13  prisma-version  7.5 → 7.7.0\nCLAUDE.md:42  next-version    15.2.0 → 15.3.1\n\nRun ctxharness fix --apply to write changes.\n```\n\n### Zero-config scan\n\nBefore setting up a full `.ctxharness.yml`, you can scan any AI instruction file for verifiable claims:\n\n```bash\nnpx ctxharness scan CLAUDE.md\n```\n\nThis detects file paths, npm scripts, and version numbers mentioned in the file and checks each one against your codebase. Paths and scripts come first because they are the claims most likely to silently break agent behavior when they drift.\n\n```\nScanning CLAUDE.md...\n\n  src/config/auth     fileExists     ✗ path not found (moved to src/modules/auth/config.ts?)\n  npm run typecheck   packageScript  ✗ script not found in package.json\n  Node.js 22.14.0     version        ✓ matches .nvmrc\n\n2 issues found. Run with --suggest-config to generate .ctxharness.yml.\n```\n\nSince v0.4.2, `scan` follows `@file.md` includes (Claude/Gemini/Cursor convention) up to depth 3. 
Claims in included files are detected and verified — drift in `@agents.md` referenced from `CLAUDE.md` is no longer invisible.\n\n```bash\nnpx ctxharness scan CLAUDE.md --suggest-config   # generate a starter .ctxharness.yml\nnpx ctxharness scan CLAUDE.md --exit-zero        # warn without blocking (hooks / CI)\n```\n\nThe detector filters out common false positives: Claude Code slash commands (`/plan`, `/ship`), URL route patterns (`/api/chunk`, `/about/`), and template placeholders (`{slug}`, `[owner]`).\n\n**scan vs run:** `scan` is for discovery — zero-config, always informational. `run` is for enforcement — requires `.ctxharness.yml`, exits 1 on drift.\n\n**Husky hook** — the `post-merge` template automatically picks the right mode:\n\n```bash\n# .husky/post-merge (generated by ctxharness init --hooks)\nif [ -f \".ctxharness.yml\" ]; then\n  ctxharness check          # blocking — full enforcement\nelse\n  ctxharness scan --exit-zero  # informational — zero-config discovery\nfi\n```\n\n### Snapshot workflow\n\nTrack quality over time and block regressions in CI:\n\n```bash\n# Save baseline after initial setup\nctxharness snapshot\n\n# In CI: compare against the committed baseline\nctxharness diff     # exit 1 if score dropped\n```\n\nSnapshots are saved to `.ctxharness/snapshots/` with a timestamp. Commit the latest snapshot file to use `diff` in CI.\n\n### Trend history\n\nEvery `run`, `check`, `score`, and `doctor` execution auto-records a drift score to `~/.ctxharness/history.jsonl`. 
The `trend` command shows your score trajectory over time:\n\n```bash\nctxharness trend\n\nTrend — myproject (8 runs)\n\n  Sparkline   ▃▅▆▇▇▇██\n  Direction   ↑ improving  (+14 pts over 8 runs)\n  Avg Score   91/100\n\n  Date                    Score      G      Pass   Fail   Time\n  ──────────────────────────────────────────────────────────────\n  May 06, 14:23:01       100/100    S      5      0      42ms\n  May 06, 11:45:22        98/100    S      5      0      38ms\n  May 05, 09:12:08        87/100    B      4      1      55ms\n  ...\n```\n\nDirection is computed by comparing the average of the first third of runs against the last third: `improving` (delta \u003e 3 pts), `worsening` (delta \u003c -3 pts), or `flat`.\n\n```bash\nctxharness trend --all          # all projects\nctxharness trend --limit 50     # last 50 runs (default: 20)\nctxharness trend --project api  # specific project name\n```\n\nIn CI, use `--no-trend` to skip recording — useful when you only want trend data from your main branch, not every PR run:\n\n```bash\nctxharness run --no-trend\n```\n\n### populate\n\nScan your already-declared files for verifiable claims (semver, paths, scripts) and suggest assertions for any claims not yet in your config.\n\n```bash\n# Dry-run (default): preview what would be added\nctxharness populate\n\n# Write changes to .ctxharness.yml\nctxharness populate --apply\n```\n\nTypical workflow: run `ctxharness init` once to bootstrap the config, then run `ctxharness populate --apply` any time you update your AI docs and want to capture new claims.\n\n### warn status\n\nAssertions can return three states: `pass`, `warn`, or `fail`. 
`warn` is counted as 0.5 in the score — useful for staleness checks where you want an early signal without blocking CI:\n\n```\n⚠ 1 warn   — claude-md-freshness: 35 commits since last edit\n```\n\n`ctxharness doctor` categorizes all issues by layer, shows a per-layer score, and suggests next actions:\n\n```\nctxharness doctor\n\nL1  Doc Drift           ██████████  100/100\nL2  Instruction Quality ████████░░   80/100\nL3  Context Assembly    ██████░░░░   60/100\n\nScore: 80/100  Grade: B\n\nIssues:\n  L2  no-vague-language   AGENTS.md:14   vague pattern found: \"be careful\"\n  L3  hook-validity       .claude/settings.json   hook entry has empty matcher\n```\n\n## Plugin API\n\nRegister custom extractors and scanners programmatically:\n\n```typescript\nimport { definePlugin, loadPlugin } from '@florianbruniaux/ctxharness-core'\n\nconst myPlugin = definePlugin({\n  extractors: [\n    {\n      name: 'myExtractor',\n      fn: (root, args) =\u003e {\n        // read ground truth from codebase, return a string\n        return '1.2.3'\n      },\n    },\n  ],\n  scanners: [\n    {\n      name: 'myScanner',\n      fn: (filePath, expectedValue, args) =\u003e {\n        // check the file, return ScanResult[]\n        return [{ status: 'pass', line: 0, actual: expectedValue }]\n      },\n    },\n  ],\n})\n\nloadPlugin(myPlugin)\n```\n\nUse `extractor: myExtractor` and `scanner: myScanner` in your `.ctxharness.yml` like any built-in.\n\n## Stack presets\n\nReady-to-use config templates for common stacks:\n\n| Preset | Path | Covers |\n|--------|------|--------|\n| T3 (Next.js + Prisma + tRPC) | `templates/presets/t3.yml` | node, next, typescript, prisma, trpc versions + model/router counts |\n| Next.js App Router | `templates/presets/next-app-router.yml` | node, next, typescript, react versions + quality assertions |\n| Python | `templates/presets/python.yml` | python version, pyproject.toml deps + quality assertions |\n| Go | `templates/presets/go.yml` | go toolchain version, go.mod 
deps + quality assertions |\n| Rust | `templates/presets/rust.yml` | crate version, Cargo.toml deps + quality assertions (workspace-aware) |\n\nCopy a preset as your `.ctxharness.yml` starting point:\n\n```bash\ncp node_modules/ctxharness/templates/presets/t3.yml .ctxharness.yml\n```\n\n## CI integration\n\nGitHub Actions:\n\n```yaml\n- name: Check AI doc drift\n  uses: FlorianBruniaux/ctxharness@v0.4\n  with:\n    config: .ctxharness.yml\n    format: gha\n```\n\nOr copy `templates/ci/github-actions.yml` for a full workflow. GitLab CI and CircleCI templates are at `templates/ci/gitlab-ci.yml` and `templates/ci/circleci.yml`.\n\nHusky (post-merge, post-checkout): copy from `templates/husky/`.\n\n### `.claude/settings.json` vs `settings.local.json`\n\nClaude Code follows a two-file convention for project settings:\n\n- **`settings.json`** — committed to the repo. Contains config that should work for anyone who clones the project: hook definitions, permission rules, shared assertions. Use relative paths for hook commands (`.claude/hooks/my-hook.sh`, not `/Users/yourname/...`).\n- **`settings.local.json`** — gitignored. Contains machine-specific or personal overrides: your own keybindings, local path overrides, personal MCP servers.\n\n`hookValidity` validates `settings.json`. If a hook command contains an absolute path (e.g. `/Users/yourname/...`), it returns `status: warn` — those paths break for every other contributor.\n\nThe same layering applies to `CLAUDE.md` files: project-level goes in the repo root (committed), personal goes in `~/.claude/CLAUDE.md` (your machine only). 
[Claude Code memory docs](https://docs.anthropic.com/en/docs/claude-code/memory).\n\n## Ecosystem Positioning\n\n```\n                         FACTUAL ACCURACY\n                                ▲\n                                │\n                                │              ★ ctxharness\n                                │         \"Are the claims still true?\"\n                                │          paths · scripts · versions\n                                │\n  ──────────────────────────────┼──────────────────────────────────► RUNTIME\n  [vigiles]                     │                                    VERIFICATION\n  TS spec → CLAUDE.md           │\n                                │\n  [AgentLint]                   │     [cclint / cursor-doctor]\n  structural linter             │     syntax \u0026 format checks\n                                │\n  [Ruler / rulesync]            │\n  rule distribution             │\n                         STATIC / STRUCTURAL\n```\n\nctxharness does not compete with any of these tools. It validates what they write, compile, or distribute.\n\n## Taxonomy\n\nctxharness covers three layers of context engineering testing:\n\n| Layer | What |\n|-------|------|\n| **L1 Doc Drift** | Facts in AI docs vs code reality — file existence, npm scripts, versions, counts, regex captures |\n| **L2 Instruction Quality** | Vague language, positive/negative ratio, token budget, multi-file coherence |\n| **L3 Context Assembly** | Hook validation, skill loading, rule glob validity, coverage ratio |\n\nL4 (agent behavior eval) is out of scope — use [Promptfoo](https://promptfoo.dev) or [Braintrust](https://braintrust.dev) for that.\n\n**ctxharness vs Promptfoo**: Promptfoo evals what your agent *says* (output quality). ctxharness evals what your agent *reads* (input freshness). They're complementary, not competing.\n\n## Further Reading\n\nThe problem ctxharness addresses is well-documented. 
These are the sources worth reading.\n\n**Context engineering — why accuracy matters**\n\n- [Context Engineering](https://simonwillison.net/2025/Jun/27/context-engineering/) — Simon Willison, June 2025. Why \"context engineering\" is a better term than \"prompt engineering\" and what it means in practice.\n- [The Rise of Context Engineering](https://www.langchain.com/blog/the-rise-of-context-engineering) — LangChain, June 2025. *\"Most of the time when an agent is not performing reliably, the underlying cause is that the appropriate context has not been communicated to the model.\"*\n- [Context Engineering for Large Codebases](https://packmind.com/context-engineering-ai-coding/context-engineering-large-codebases/) — Packmind, April 2026. Documents \"context drift\" — stale instruction files referencing deprecated frameworks cause agents to silently generate code using wrong patterns.\n\n**What stale context does to LLMs**\n\n- [Contextual Drag: How Errors in the Context Affect LLM Reasoning](https://arxiv.org/abs/2602.04288) — arXiv, Feb 2026. Wrong context causes 10-20% performance drops across 11 models and 8 reasoning tasks. Self-refinement makes it worse, not better.\n- [Knowledge Conflicts for LLMs: A Survey](https://arxiv.org/abs/2403.08319) — EMNLP 2024. Temporal knowledge conflicts (outdated context vs. model knowledge) are a primary source of factually wrong outputs. LLMs may generate code using deprecated function signatures from older library versions.\n- [Lost in the Middle](https://arxiv.org/abs/2307.03172) — Stanford / ACL 2024. Relevant information placed in the middle of long contexts is systematically under-weighted by LLMs. Instruction files that accumulate stale content push critical facts into this dead zone.\n- [Your Agent's Context Is a Junk Drawer](https://www.augmentcode.com/blog/your-agents-context-is-a-junk-drawer) — Augment Code, Feb 2026. 
Documents \"context collapse\" — agents forget earlier constraints when context grows stale and unmanaged.\n\n**The CLAUDE.md / AGENTS.md problem specifically**\n\n- [Writing a Good CLAUDE.md](https://www.humanlayer.dev/blog/writing-a-good-claude-md) — HumanLayer, Nov 2025. *\"Don't include code snippets — they will become out-of-date quickly.\"* Direct practitioner warning on content drift.\n- [New Research Reassesses the Value of AGENTS.md Files](https://www.infoq.com/news/2026/03/agents-context-file-value-review/) — InfoQ, March 2026. ETH Zurich study: LLM-generated context files reduce task success by 3% on average and increase inference costs by 20%+. Authors recommend limiting instructions to non-inferable details — exactly the facts ctxharness verifies.\n- [When AGENTS.md Backfires](https://notchrisgroves.com/when-agents-md-backfires/) — Feb 2026. Only 14.5% of agent context files include security instructions. LLM-generated files reduced task success in 5 of 8 evaluation settings.\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FFlorianBruniaux%2Fctxharness","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FFlorianBruniaux%2Fctxharness","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FFlorianBruniaux%2Fctxharness/lists"}