{"id":49068333,"url":"https://github.com/thejefflarson/soundcheck","last_synced_at":"2026-06-10T11:00:41.623Z","repository":{"id":340914267,"uuid":"1168152494","full_name":"thejefflarson/soundcheck","owner":"thejefflarson","description":"Simple security reviews for AI agents","archived":false,"fork":false,"pushed_at":"2026-06-10T05:37:57.000Z","size":852,"stargazers_count":18,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-10T07:22:07.791Z","etag":null,"topics":["llms","security","skills"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thejefflarson.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-27T04:14:27.000Z","updated_at":"2026-06-10T05:38:47.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/thejefflarson/soundcheck","commit_stats":null,"previous_names":["thejefflarson/soundcheck"],"tags_count":25,"template":false,"template_full_name":null,"purl":"pkg:github/thejefflarson/soundcheck","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thejefflarson%2Fsoundcheck","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thejefflarson%2Fsoundcheck/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thejefflarson%2Fsoundcheck/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/thejefflarson%2Fsoundcheck/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thejefflarson","download_url":"https://codeload.github.com/thejefflarson/soundcheck/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thejefflarson%2Fsoundcheck/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34149132,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-10T02:00:07.152Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llms","security","skills"],"created_at":"2026-04-20T06:11:06.684Z","updated_at":"2026-06-10T11:00:41.613Z","avatar_url":"https://github.com/thejefflarson.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Soundcheck\n\nAutomated security checks for Claude Code. 
52 skills covering injection,\nauthentication, cryptography, access control, LLM-specific threats, and\nmore — drawn from OWASP, CWE, and real-world vulnerability patterns.\nWhen Claude writes vulnerable code, the matching skill fires, flags\nthe issue, rewrites the offending block, and hands the turn back to\nClaude.\n\n---\n\n## Install\n\n```bash\nclaude plugin marketplace add thejefflarson/soundcheck\nclaude plugin install soundcheck\n```\n\nAll 52 skills are active in every Claude Code session after install. To\ntry without installing (current session only):\n\n```bash\nclaude --plugin-dir /path/to/soundcheck\n```\n\n---\n\n## How you'll use it\n\n### Automatic reviews after each edit\n\nSoundcheck reviews every diff the moment Claude finishes writing it.\nWhen something looks risky, Claude reads the findings on the next turn\nand either fixes the code or pushes back — no manual `/security-review`,\nno PR comment cycle. You catch issues while the context is still fresh,\nnot the next day.\n\nA one-shot haiku triage decides whether the diff warrants a full\nreview, so most turns cost ~$0.003. Only diffs that plausibly introduce\na vulnerability trigger the full `pr-review` — a few cents when it\nfires.\n\nEnabled by default. To disable, export `SOUNDCHECK_AUTO_REVIEW=false`\nin your shell before launching Claude Code. 
See\n[docs/auto-review.md](docs/auto-review.md) for the staged flow, full\ncost table, and limitations.\n\n### On demand: three review modes for existing code\n\nWhen you want to scan existing code — a PR diff, a whole repo before a\nrelease, or a deep audit before shipping — reach for one of these:\n\n| Mode | When | Time | Cost | Catches |\n|---|---|---|---|---|\n| **`/pr-review`** | Every pull request, in CI | ≤1 min | a few cents | Critical/High OWASP in the diff |\n| **`/security-review`** | Nightly CI or monthly audit | ~20 min | ~$4 | All severities, whole repo, attack chains |\n| **`/contract-review`** | Pre-release or after big refactor | ~30 min | ~$10–20 | Bugs where a function does less than callers assume |\n\nRule of thumb: gate every PR on `pr-review`, schedule `security-review`\nnightly or weekly, and add `contract-review` on a slower cadence once\nthe obvious bugs are out of the way.\n\n#### `pr-review` — the CI gate\n\n\u003e **Security note:** `pr-review` passes untrusted repository content into an LLM\n\u003e context. Prompt-injection mitigations are instruction-level only — a crafted\n\u003e file in the PR could influence the model's output. Treat a clean gate result as\n\u003e \"no obvious Critical/High findings,\" not as a guarantee of correctness. 
Do not\n\u003e use `pr-review` output as the sole gate for high-stakes merges; pair it with\n\u003e human review for security-sensitive changes.\n\nUse the [Soundcheck GitHub Action](https://github.com/thejefflarson/soundcheck-action):\n\n```yaml\nname: Security Review\non: [pull_request]\n# contents:write is only required when autofix is enabled (apply-rewrites: 'true').\n# For read-only review, downgrade to contents:read.\n# Do NOT trigger on fork PRs — GITHUB_TOKEN from a fork cannot write back to the\n# base repo, and untrusted fork code runs with write permissions.\npermissions:\n  contents: write        # needed only for autofix commits; use contents:read otherwise\n  pull-requests: write   # needed to post the findings comment\njobs:\n  review:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v4\n        with:\n          fetch-depth: 0\n      - uses: thejefflarson/soundcheck-action@v1\n        with:\n          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}\n          github-token: ${{ secrets.GITHUB_TOKEN }}\n```\n\nThe action posts a severity-ranked findings table to the PR. Auto-fix\n(committing LLM-generated changes back to the branch) stays off by\ndefault; opt in with `apply-rewrites: 'true'`. Before turning it on in\nCI, gate the resulting commits behind branch-protection rules and human\nreview — the action ships no approval gate of its own. To preview the\nchanges without committing, run the script locally:\n\n```bash\npython scripts/security-review-action.py --repo-dir . --diff-base main\n```\n\nThis dry-run prints findings without writing any files.\n\n#### `security-review` — full repo audit\n\nIn a Claude Code session:\n\n```\n/security-review\n```\n\nOr from a checkout:\n\n```bash\npython scripts/security-review-action.py --repo-dir . 
--full-repo --model sonnet\n```\n\n#### `contract-review` — deep audit for subtler bugs\n\nFor caller/callee invariant gaps — bugs where two functions each look\nfine alone but break together (an auth helper named like an identity\ncheck but matching only by name; a \"verified\" predicate that fails open\non null input).\n\nIn a Claude Code session:\n\n```\n/contract-review\n```\n\nOr headless from a checkout:\n\n```bash\npython scripts/contract-review.py --repo-dir . --model opus\n```\n\nBoth produce the same findings table. Treat each finding as\n**hypothesis-grade** — read the code and write a PoC before filing. See\n[`docs/contract-review.md`](docs/contract-review.md) for hit-rate\nnumbers, the comparison against `security-review`, and known limitations.\n\n---\n\n## Does it actually work?\n\nShort answer: yes — with caveats per mode.\n\n### Head-to-head against bare Claude (auto-invoking skills)\n\nWe test against 130 deliberately broken fixtures — Flask login routes\nwith hardcoded passwords, SQL queries built from string concatenation,\nfile uploads without size limits. Each fixture carries a checklist of\nwhat a thorough review should catch and fix. Claude reviews each fixture\ntwice — once with Soundcheck loaded, once with a generic \"be a security\nreviewer\" prompt — and a judge model scores both against the checklist.\n*Full pass* means Claude satisfies every checklist item.\n\n| Model | With Soundcheck | Plain Claude | Lift |\n|---|---|---|---|\n| Haiku | **77%** full pass | 40% | +37 pts |\n| Sonnet | **90%** full pass | 58% | +32 pts |\n\nWhen the two reviews disagree, Soundcheck wins 6 of 7 times. The lift is\nsimilar across model tiers — loading the plugin raises baseline quality\neverywhere we tested.\n\nStatistical detail: Wilcoxon signed-rank on per-fixture score\ndifferences, p \u003c 1e-6 on haiku, p \u003c 1e-4 on sonnet. 
Methodology in\n[`docs/smoke-test-methodology.md`](docs/smoke-test-methodology.md).\n\n### External validation\n\nOur own fixtures could bias the result toward patterns we had in mind.\nTwo independent checks:\n\n- **[SecurityEval](https://github.com/s2e-lab/SecurityEval)** — 104\n  vulnerable Python samples published by academic researchers with\n  ground-truth CWE labels. Both arms hit **100% detection and 100% fix**\n  here. The snippets are obvious enough that bare Claude already catches\n  everything, so this benchmark doesn't discriminate further — but it\n  confirms Soundcheck doesn't break anything on code it wasn't designed\n  against.\n- **Real OWASP projects** — 13 vulnerable files pinned from OWASP Juice\n  Shop (TypeScript) and OWASP PyGoat (Python). Soundcheck caught all 13\n  and fully fixed **12 of 13 (92%)**. The one miss — open-redirect in\n  Juice Shop's `redirect.ts` — Soundcheck flagged but only partially\n  patched.\n\n### Review times (full-repo pipeline)\n\n`scripts/benchmark-eval.py` against three open-source projects at pinned\ncommits:\n\n| Repo | Language | Haiku | Sonnet |\n|---|---|---|---|\n| [redash](https://github.com/getredash/redash) | Python | **6.1 min** | 17.3 min |\n| [cal.com](https://github.com/calcom/cal.com) | TypeScript | **8.7 min** | 25.7 min |\n| [vaultwarden](https://github.com/dani-garcia/vaultwarden) | Rust | **5.7 min** | 14.7 min |\n\nFor very large monorepos (we ran haiku against gitea in 2 minutes\ndiff-scoped), use the PR-scoped `--diff-base` flow.\n\nIn practice:\n\n- **Haiku handles PR gates** — typical diff-scoped reviews finish in\n  1-2 minutes.\n- **Sonnet handles nightly or monthly deep scans** — higher-quality\n  findings, but 3-4× the per-call time.\n- **Soundcheck adds no latency over bare Claude.** On the SecurityEval\n  paired run, plugin-loaded reviews ran *faster* than bare (15.2s median\n  vs 17.7s) — the focused skill prompt converges sooner.\n\n### Where it's weakest\n\nTwo honest 
caveats:\n\n1. **Memory safety (kernels, codecs, crypto-lib internals).** Soundcheck's\n   `memory-api-misuse` and `crypto-library-misuse` skills catch local\n   patterns — unchecked `malloc`, AEAD nonce reuse — but Soundcheck does\n   not trace whole-program lifetimes. ASAN/UBSAN/Valgrind, fuzzers\n   (libFuzzer, OSS-Fuzz), and static analyzers (`clang-tidy`, CodeQL) own\n   that territory. Run them alongside Soundcheck on C/C++ codebases.\n2. **`contract-review` produces hypotheses, not CVEs.** When we built\n   PoCs for the findings, a meaningful share turned out to be false\n   positives — the findings described real code patterns, but the\n   exploit chain broke at a downstream check the audit never traced.\n   Investigate each finding before filing. Numbers in\n   [`docs/contract-review.md`](docs/contract-review.md).\n\nReproduce any of this with:\n- `python scripts/smoke-test-skills.py` — paired, ~2h on haiku\n- `python scripts/benchmark-securityeval.py --with-bare`\n- `python scripts/benchmark-realworld.py`\n- `python scripts/benchmark-contract-review.py` — ~$60, ~2h\n\n---\n\n## Configuration\n\n### Cost control\n\n`security-review` and `contract-review` burn API tokens. Cap the spend\nwith `--max-budget-usd` to prevent runaway costs in CI:\n\n```bash\n# Cap full-repo scan at $5\npython scripts/security-review-action.py --repo-dir . --full-repo \\\n  --model sonnet --max-budget-usd 5\n\n# Cap contract review at $15\npython scripts/contract-review.py --repo-dir . --model opus \\\n  --max-budget-usd 15\n```\n\nWhen a run hits the cap, the script exits non-zero and prints a partial\nfindings table. 
The default cap sits at **$20** for `security-review`\nand **$30** for `contract-review` — always pin an explicit budget in CI\nto rule out surprises.\n\n### Optional: reinforce triggers in your `CLAUDE.md`\n\nTo apply triggers across every project (not just ones with the plugin\nloaded), add this to `~/.claude/CLAUDE.md`:\n\n```markdown\n## Security\n\nWhen writing code, always invoke the soundcheck plugin skills for any\ncode involving: authentication, authorization, cryptography, SQL/shell/\ntemplate construction, error handling, logging, deserialization, LLM\nAPI calls, or agent workflows.\n```\n\n### Non-Anthropic providers (Bedrock, Vertex)\n\nSoundcheck shells out to `claude -p --model \u003cX\u003e`, so any model string\nthe `claude` CLI accepts works. For Bedrock or Vertex the model alone\nisn't enough — set the provider-selection env vars first:\n\n```bash\n# Bedrock\nexport CLAUDE_CODE_USE_BEDROCK=1\nexport AWS_REGION=us-east-1\n# plus AWS credentials (SSO, IAM role, or access keys)\n\npython scripts/security-review-action.py --repo-dir . --diff-base main \\\n  --model arn:aws:bedrock:us-east-1:...:application-inference-profile/...\n```\n\n```bash\n# Vertex\nexport CLAUDE_CODE_USE_VERTEX=1\nexport CLOUD_ML_REGION=us-east5\nexport ANTHROPIC_VERTEX_PROJECT_ID=your-gcp-project\n```\n\nWithout those env vars, `claude -p --model \u003cARN\u003e` hangs or fails\nsilently and Soundcheck surfaces a timeout. The script runs a short\npreflight call before the real review to catch this fast.\n\n---\n\n## Trigger reference\n\nThe 52 skills auto-invoke based on patterns in their `description`\nfrontmatter. 
You do *not* have to ask Claude to run them — they fire\nwhenever the code Claude is about to write matches a trigger.\n\n| Code pattern | Skill invoked | OWASP |\n|---|---|---|\n| Authorization checks, resource ownership, IDOR, SSRF | `broken-access-control` | A01:2025 |\n| Server config, CORS, debug flags, security headers, secrets | `security-misconfiguration` | A02:2025 |\n| `npm install`, `pip install`, dependency manifests, CI/CD pipelines | `supply-chain` | A03:2025 |\n| Encryption, password hashing, random token generation, TLS config | `cryptographic-failures` | A04:2025 |\n| SQL queries, shell commands, templates with user input, `eval`, ORM raw queries | `injection` | A05:2025 |\n| Rate limiting, login flows, business logic, multi-step workflows | `insecure-design` | A06:2025 |\n| Login, sessions, JWT, password storage, MFA, API key management | `authentication-failures` | A07:2025 |\n| Deserialization, pickle/yaml load, software update verification, CI artifacts | `integrity-failures` | A08:2025 |\n| Logging, audit trails, error handlers that log, security event recording | `logging-failures` | A09:2025 |\n| Error handlers, try/catch, API error responses, exception propagation | `exceptional-conditions` | A10:2025 |\n| LLM prompt construction with user input, RAG pipelines, system prompts | `prompt-injection` | LLM01:2025 |\n| Rendering LLM output to UI, executing LLM-generated code, downstream LLM output use | `insecure-output-handling` | LLM02:2025 |\n| Fine-tuning pipelines, dataset ingestion, training data from external sources | `training-data-poisoning` | LLM03:2025 |\n| LLM input limits, inference backends, chatbot request handling, token budgets | `model-dos` | LLM04:2025 |\n| Loading pre-trained models, model registries, third-party LLM providers | `llm-supply-chain` | LLM05:2025 |\n| Sending PII/secrets to LLM, system prompts with sensitive data, LLM memory | `sensitive-disclosure` | LLM06:2025 |\n| LLM tool definitions, function schemas, 
plugin access controls | `insecure-plugin-design` | LLM07:2025 |\n| Autonomous agents, LLM-triggered write/delete/send actions, multi-step pipelines | `excessive-agency` | LLM08:2025 |\n| Displaying LLM output as fact, LLM-driven consequential decisions, no human review | `overreliance` | LLM09:2025 |\n| Inference API endpoints, model access controls, rate limiting on model serving | `model-theft` | LLM10:2025 |\n| MCP server definitions, tool schemas, tool handlers with file/shell/network access | `mcp-security` | LLM07:2025 |\n| OAuth2/OIDC flows, JWT validation, redirect URI handling, token endpoints | `oauth-implementation` | A07:2025 |\n| RAG pipelines, vector store ingestion, external document retrieval for LLM context | `rag-security` | LLM01:2025 |\n| Implementation plans for features, APIs, or components touching user data or auth | `threat-model` | A06:2025 |\n| Storing credentials/tokens/PII to local files, prefs stores, SQLite, or temp dirs | `insecure-local-storage` | A02:2025 |\n| URL scheme handlers, exported Android activities, IPC sockets, XPC service handlers | `ipc-security` | A01:2025 |\n| Agent-to-agent calls, subagent spawning, multi-agent pipelines | `multi-agent-trust` | LLM08:2025 |\n| User-supplied strings to LLM with Unicode control chars, homoglyphs, RTL override | `token-smuggling` | LLM01:2025 |\n| ORM create/update from raw request body, spread/merge without field allowlist | `mass-assignment` | API3:2023 |\n| HTML forms with POST/PUT/DELETE, session cookies, CSRF middleware config | `csrf` | A01:2025 |\n| File upload handlers, multipart form data, user-supplied filenames | `file-upload` | A04:2025 |\n| HTTP requests to user-supplied URLs, webhook callbacks, URL preview features | `ssrf` | A01:2025 |\n| File open/read/write with paths from user input, static file serving by name | `path-traversal` | A01:2025 |\n| Third-party API calls, external response parsing, webhook/callback integration | `unsafe-api-consumption` | API10:2023 |\n| 
Regular expressions on user input, input validation patterns | `redos` | CWE-1333 |\n| Check-then-act on shared state, balance updates without locks, TOCTOU | `race-condition` | CWE-362 |\n| Redirect to URL from request params, login \"return to\" URLs | `open-redirect` | CWE-601 |\n| JS/TS deep merge, Object.assign, lodash merge with user input | `prototype-pollution` | CWE-1321 |\n| API keys, passwords, tokens as string literals in source | `hardcoded-secrets` | CWE-798 |\n| GraphQL schemas without depth limits, introspection in production | `graphql-security` | CWE-400 |\n| MongoDB/NoSQL queries with user input, operator injection | `nosql-injection` | CWE-943 |\n| User input in HTTP response headers, CRLF injection | `header-injection` | CWE-113 |\n| `malloc`/`free`/`pthread_*` misuse, missing return checks, double-free, fd leak across exec | `memory-api-misuse` | CWE-690 |\n| AEAD nonce reuse, ECDSA k reuse, length-extension, padding-oracle exception leakage | `crypto-library-misuse` | CWE-323 |\n| SUID drop, PATH/IFS/LD_* trust, temp-file races, symlink-follow on /tmp | `privilege-handling` | CWE-271 |\n| Lock held across IO/await, lock ordering, atomic memory order, double-checked locking | `concurrency-correctness` | CWE-833 |\n| Untrusted integer flows into length/index/auth comparison without bounds | `numeric-trust-boundary` | CWE-190 |\n\n### On-demand commands\n\n| Command | What it does |\n|---|---|\n| `/pr-review` | Fast Critical/High gate over changed files — usually run by the [GitHub Action](https://github.com/thejefflarson/soundcheck-action), not by hand |\n| `/security-review` | Full OWASP sweep — subagent pipeline with threat model, hotspot mapping, parallel auditors, design review, attack-chain analysis |\n| `/contract-review` | Deep audit reading each public function alongside its callers, flagging caller/callee invariant gaps |\n\n---\n\n## Contributing\n\n1. Read `CLAUDE.md` for dev conventions\n2. 
Copy `docs/skill-template.md` to `.claude/skills/\u003cname\u003e/SKILL.md`\n3. Fill in all fields — no TODO placeholders\n4. Add a test case to `docs/test-cases/\u003cname\u003e.\u003cext\u003e`\n5. Run the static validator — must pass:\n   ```bash\n   python scripts/validate-skills.py --skill \u003cname\u003e\n   ```\n6. Run the smoke test to confirm Claude detects the vulnerability:\n   ```bash\n   python scripts/smoke-test-skills.py --skill \u003cname\u003e --verbose\n   ```\n\nSkills must be under 600 words, include CWE references, and have a\nconcrete runnable code rewrite in `Fix immediately` (or `Procedure`\nfor analysis/orchestrator skills). Test cases should cover multiple\nlanguages where the vulnerable API differs. Audit status in\n`docs/test-case-audit.md`.\n\n## Nominating a new threat\n\nThe threat landscape moves faster than OWASP's publication cycle. To\nnominate an emerging threat:\n\n1. Open a GitHub Issue using the\n   **[Threat Nomination](.github/ISSUE_TEMPLATE/threat-nomination.md)**\n   template\n2. Include at least one real-world source (CVE, writeup, or incident)\n3. Paste a short code snippet showing the vulnerable pattern — if you\n   can't show code, the threat may not be detectable yet\n\nWe auto-label nominations `threat-candidate` and review them quarterly.\nThe backlog lives in [`docs/threat-radar.md`](docs/threat-radar.md),\nwhich tracks 14+ threats across `watching`, `candidate`, `in-progress`,\nand `shipped` tiers.\n\n---\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthejefflarson%2Fsoundcheck","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthejefflarson%2Fsoundcheck","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthejefflarson%2Fsoundcheck/lists"}