{"id":48646566,"url":"https://github.com/santhsecurity/keyhog","last_synced_at":"2026-05-27T01:03:17.887Z","repository":{"id":345734367,"uuid":"1187126779","full_name":"santhsecurity/keyhog","owner":"santhsecurity","description":"The fastest, most accurate secret scanner. 896 detectors, Hyperscan SIMD, GPU acceleration, 96% recall. Built in Rust.","archived":false,"fork":false,"pushed_at":"2026-05-17T22:30:36.000Z","size":12329,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-17T23:42:07.189Z","etag":null,"topics":["api-keys","credentials","devsecops","git","gpu","hyperscan","pre-commit","rust","secret-detection","secret-scanner","security","simd"],"latest_commit_sha":null,"homepage":"https://santh.dev","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/santhsecurity.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":"audit.toml","citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-20T11:21:21.000Z","updated_at":"2026-05-17T22:30:40.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/santhsecurity/keyhog","commit_stats":null,"previous_names":["santhsecurity/keyhog"],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/santhsecurity/keyhog","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/santhsecurity%2Fkeyhog","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/santhsecurity%2Fkeyhog/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/santhsecurity%2Fkeyhog/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/santhsecurity%2Fkeyhog/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/santhsecurity","download_url":"https://codeload.github.com/santhsecurity/keyhog/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/santhsecurity%2Fkeyhog/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33389229,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-23T04:15:53.637Z","status":"ssl_error","status_checked_at":"2026-05-23T04:15:53.242Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api-keys","credentials","devsecops","git","gpu","hyperscan","pre-commit","rust","secret-detection","secret-scanner","security","simd"],"created_at":"2026-04-10T05:04:10.728Z","updated_at":"2026-05-27T01:03:17.842Z","avatar_url":"https://github.com/santhsecurity.png","language":"Rust","funding_links":[],"categories":["Dependency intelligence"],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003eKeyHog\u003c/h1\u003e\n\n\u003ch3 align=\"center\"\u003eThe fastest, most accurate secret scanner. 
Built in Rust.\u003c/h3\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://crates.io/crates/keyhog\"\u003e\u003cimg src=\"https://img.shields.io/crates/v/keyhog?style=flat-square\u0026color=D93025\" alt=\"crates.io\" /\u003e\u003c/a\u003e\n  \u003ca href=\"LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/badge/license-MIT-blue?style=flat-square\" alt=\"MIT\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/santhsecurity/keyhog/actions\"\u003e\u003cimg src=\"https://img.shields.io/github/actions/workflow/status/santhsecurity/keyhog/ci.yml?style=flat-square\u0026label=CI\" alt=\"CI\" /\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n![KeyHog Demo](keyhog-demo.gif)\n\n---\n\n**KeyHog** scans source trees, git history, Docker images, S3 buckets, and\nrunning systems for leaked credentials. It compiles **891 service-specific\ndetectors** into a single Hyperscan NFA database, decodes nested encodings\nbefore matching, calibrates confidence per detector via Bayesian\nBeta(α,β) feedback, and routes every scan to the fastest hardware backend\npresent:\n\n| Backend | When | How |\n|---|---|---|\n| `gpu-zero-copy` | discrete GPU + ≥256 MiB scan | vyre AC automaton on GPU cores; cudagrep NVMe → VRAM DMA |\n| `simd-regex` | AVX-512 / AVX2 / NEON + Hyperscan | parallel multi-pattern NFA at ~500 MB/s |\n| `cpu-fallback` | no SIMD, no GPU | Aho-Corasick prefix + Rust `regex` extraction |\n\nBackend selection is automatic. On startup:\n\n```\nKeyHog v0.5.17 | 16 cores | SIMD: AVX-512 | Hyperscan | 891 detectors\n```\n\n📘 **Full documentation:** [`/site/`](./site/) — install, scan, output formats,\nCI integration, detector catalog (all 891 filterable), architecture,\nlockdown mode, FAQ. 
Build with `python3 site/build.py`; host on GitHub Pages.\n\n---\n\n## Install\n\n```bash\n# Recommended: SIMD + ML + decode + multiline + verification\ncargo install keyhog\n\n# With GPU acceleration (NVMe → VRAM DMA, multi-GB monorepo scans)\ncargo install keyhog --features gpu\n\n# From source (latest main)\ngit clone https://github.com/santhsecurity/keyhog.git\ncd keyhog \u0026\u0026 cargo install --path crates/cli\n```\n\nWorks on **Linux**, **macOS** (Intel + Apple Silicon), **Windows**. Zero\nconfiguration — `keyhog scan .` works out of the box.\n\nPrebuilt binaries attached to each [release](https://github.com/santhsecurity/keyhog/releases).\nDocker: `docker run --rm -v \"$PWD:/src\" santhsecurity/keyhog scan /src`.\n\n## Quickstart\n\n```bash\nkeyhog scan .                                          # scan a directory\nkeyhog scan --git-staged                               # pre-commit: only staged blobs\nkeyhog scan --git-diff main                            # files changed since base ref\nkeyhog scan --git-history .                            # every commit, every branch\nkeyhog scan --docker-image registry/app:v1             # Docker image layers\nkeyhog scan --s3-bucket logs-prod --s3-prefix /        # S3 objects (--s3-endpoint for non-AWS)\nkeyhog scan --github-org acme --github-token \"$GH_PAT\" # every repo in a GitHub org (PAT required)\nkeyhog scan-system --space 50G                         # walk every drive, every git history\n```\n\nFilter, format, gate:\n\n```bash\nkeyhog scan . --severity high                  # info | low | medium | high | critical\nkeyhog scan . --min-confidence 0.5             # raise the ML floor\nkeyhog scan . --format sarif -o keyhog.sarif   # GitHub code scanning\nkeyhog scan . --verify                         # live-verify against vendor APIs\nkeyhog scan . --baseline .keyhog-baseline.json # only NEW findings vs snapshot\nkeyhog scan . --fast                           # pre-commit speed (skip ML + decode)\nkeyhog scan . 
--deep                           # max detection depth\nkeyhog scan . --incremental                    # BLAKE3 Merkle skip → 10–100× CI loop\n```\n\nExit codes: `0` clean, `1` findings above the severity floor, `2` error\n(bad path, unreadable file, unsupported flag), `3` `detectors --audit`\nflagged a quality issue, `4` `backend --self-test` failed, `10` live\ncredentials found (requires `--verify`), `11` scanner thread panicked\nmid-scan (state is unreliable, re-run before trusting). Matches\n`keyhog --help`.\n\n## What it catches\n\n891 service-specific detectors with checksum / companion validation:\n\n- **Cloud providers** — AWS (access key + secret + STS verification),\n  Azure (subscription key, storage account key, SAS), GCP (service account,\n  API key), Cloudflare, Heroku, Vercel, Supabase.\n- **Payment processors** — Stripe, Braintree, Razorpay, Paddle, Plaid,\n  Square, PayPal — all with companion-required validation (a Braintree\n  private key without its public counterpart never fires).\n- **Source forges** — GitHub PATs (with CRC32 checksum), GitLab tokens,\n  Bitbucket app passwords, npm tokens (with checksum), Gitea / Forgejo\n  / Codeberg.\n- **Auth / SSO** — Okta, Auth0, Clerk, JumpCloud, Kinde.\n- **Comms** — Slack, Discord, Twilio, SendGrid, Postmark, Mailgun,\n  Resend, Loops.\n- **AI / ML** — OpenAI (sk-/sk-proj-), Anthropic, Google AI Studio,\n  Cohere, Mistral, HuggingFace, Replicate.\n- **Databases** — Postgres connection strings, MongoDB Atlas, Supabase\n  service-role, PlanetScale, Neon, Turso, MySQL, Redis URLs.\n- **Generic + entropy fallback** — `API_KEY=\u003chigh-entropy-blob\u003e` catches\n  credentials with no named detector, gated by per-context entropy\n  thresholds + ML scoring.\n- **Cryptographic material** — RSA / EC / SSH private keys, PGP private\n  blocks, JWT signing secrets.\n\nEach detector ships as a [TOML file](./detectors/) (data, not code):\nservice metadata, regex patterns, keywords, companion 
fields,\nverification handler. Adding a new detector is 5–10 lines of TOML;\nthe [contributor guide](./CONTRIBUTING.md) walks through it.\n\nBrowse the full catalog at [`/site/detectors.html`](./site/detectors.html) —\nloads all 891 with severity + service + keyword filter.\n\n## Why higher recall, fewer false positives\n\n- **Decode-through scanning.** Kubernetes `Secret` manifests, JWT payloads,\n  base64-wrapped envs, helm values, docker-config `auth:` blobs — the\n  structured preprocessor decodes them in place and feeds every\n  downstream detector the plaintext, so detectors don't each need to\n  re-implement decoding.\n- **Multiline reassembly.** `\"sk-proj-\" + \\` continuation in JavaScript,\n  YAML multi-line strings, Makefile backslash-continuation, Helm /\n  Jinja templated outputs — all reassembled before regex matching.\n- **Companion-required validation.** AWS access key without its 40-char\n  secret? Skipped. Twilio API key without its auth token? Skipped.\n  Two-out-of-two signals are required for the high-noise detectors,\n  cutting the canonical `git log -G ghp_` false-positive cluster.\n- **Confidence scoring.** Every finding carries a `[0.0, 1.0]` score\n  derived from Shannon entropy, surrounding context, companion match,\n  checksum (GitHub CRC32, npm, Slack), and a small ML classifier\n  (~30k params). 
Default threshold `0.3` filters low-quality matches\n  without hiding real secrets.\n- **Bayesian per-detector calibration.** `keyhog calibrate --fp generic-api-key`\n  feeds a Beta(α,β) posterior that damps detectors that fire wrongly in\n  your codebase, sharpening over time without manual rule tuning.\n\n## Performance\n\nMeasured head-to-head against the major scanners on the same corpora:\n\n| | KeyHog | Gitleaks | BetterLeaks | TruffleHog | Titus |\n|---|---|---|---|---|---|\n| **Recall** \u003csmall\u003e(25-secret synthetic benchmark)\u003c/small\u003e | **96 %** | 72 % | 72 % | 28 % | 32 % |\n| **Recall** \u003csmall\u003e(15k SecretBench-medium, realistic wrappers)\u003c/small\u003e | **69 %** | 41 % | 48 % | 22 % | 25 % |\n| **Precision** \u003csmall\u003e(SecretBench-medium)\u003c/small\u003e | **90 %** | 87 % | 81 % | 73 % | 19 % |\n| **False positives** \u003csmall\u003e(Django, 0 real secrets)\u003c/small\u003e | **1** | 0 | 0 | 0 | 17 481 |\n| **Speed** \u003csmall\u003e(Django 86 MB)\u003c/small\u003e | **0.5 s** | 0.3 s | 0.2 s | 1.4 s | 2.3 s |\n| **Speed** \u003csmall\u003e(Kubernetes 397 MB)\u003c/small\u003e | **1.1 s** | — | — | — | 3.5 s |\n| **Speed** \u003csmall\u003e(large monorepo, 4.2 GB)\u003c/small\u003e | **2.5 s** | — | — | — | 252 s |\n\nKeyHog finds **33 % more real secrets** than the next-best tool while\nmaintaining near-zero false positives. The two recall numbers are both\nreal: 96 % on a tight synthetic test set, 69 % on a 15k-fixture\nadversarial corpus that wraps secrets in realistic env-var\ndistributions. 
Real codebases land between the two depending on how\nmuch CI / k8s / structured config they contain.\n\nReproduce: `cargo bench --bench scan_throughput` or run\n`./tools/secretbench/scoring/leaderboard.py --corpus \u003cpath\u003e` against\nyour own fixtures.\n\n## CI integration\n\n### GitHub Actions\n\n```yaml\n- uses: santhsecurity/keyhog/.github/actions/keyhog@v0.5.17\n  with:\n    path: .\n    severity: high       # info | low | medium | high | critical\n    format: sarif        # SARIF auto-uploads to GitHub code scanning\n    baseline: .keyhog-baseline.json   # block only NEW findings\n```\n\nAuto-downloads a prebuilt binary; falls back to `cargo build` when no\nrelease asset matches the host triple. SARIF carries CWE-798 + OWASP\nA07:2021 taxa on every finding.\n\nOther CIs (GitLab, CircleCI, Drone, BuildKite, Jenkins), pre-commit\nrecipes, Husky / lefthook, and the full SARIF schema:\n[`site/ci.html`](./site/ci.html) and [`docs/DROP_IN_USAGE.md`](docs/DROP_IN_USAGE.md).\n\n### Pre-commit hook\n\n```bash\nkeyhog hook install                    # writes .git/hooks/pre-commit\nkeyhog hook install --no-daemon        # ~1 s slower per commit\n```\n\nOr via the `pre-commit` framework:\n\n```yaml\nrepos:\n  - repo: https://github.com/santhsecurity/keyhog\n    rev: v0.5.17\n    hooks:\n      - id: keyhog\n```\n\n## Daemon mode (105× faster re-scan)\n\nEvery keyhog invocation pays a ~3 s cold start to compile 891 detectors\ninto Hyperscan. Run keyhog as a daemon and that cost is paid once per\nhost — every subsequent scan is **~7 ms**:\n\n```bash\nkeyhog daemon start                    # Unix socket on $XDG_RUNTIME_DIR\nkeyhog scan --stdin --daemon \u003c .env    # 7 ms instead of 740 ms\nkeyhog daemon status\nkeyhog daemon stop\n```\n\nUse it in pre-commit hooks, IDE save handlers, or any per-commit CI\nloop. 
systemd / launchd unit examples in\n[`site/daemon.html`](./site/daemon.html).\n\nWatch-mode for IDEs:\n\n```bash\nkeyhog watch ./src                     # inotify/FSEvents/RDCW; sub-100 ms per save\n```\n\n## System-wide credential triage\n\n```bash\nsudo keyhog scan-system --space 50G                  # default 50 GiB ceiling\nsudo keyhog scan-system --space 1T --include-network # also scan NFS / SMB\nsudo keyhog scan-system --space 10G --no-git-history # skip historical blobs\n```\n\nEnumerates every mounted drive (skipping pseudo-FS like `/proc`,\n`/sys`, `tmpfs`, `nsfs`, `fuse.snapfuse`), auto-discovers every `.git`\n(worktrees + bare repos + submodules), and runs the full scan +\ngit-history pipeline. Honors a hard `--space \u003cbytes\u003e` ceiling and\nexits 1 on findings. Built for incident-response triage, M\u0026A\ninheritance audits, and quarterly developer-laptop sweeps.\n\n## Lockdown mode (security-critical embeddings)\n\nFor deployments where keyhog runs **on the same machine that holds the\nsecrets** (e.g. paired with [EnvSeal](https://github.com/santhsecurity/envseal))\nand there is no trusted boundary between the scanner and the\ncredentials it inspects:\n\n```bash\nkeyhog scan . --lockdown\n```\n\nEnforces:\n\n- `mlockall(MCL_CURRENT|MCL_FUTURE)` on Linux — credentials never page\n  to swap.\n- `PR_SET_DUMPABLE = 0` (always on, even outside lockdown) — disables\n  core dumps, ptrace, `/proc/\u003cpid\u003e/mem` reads. 
macOS gets\n  `PT_DENY_ATTACH`.\n- Refuses to run if `~/.cache/keyhog/*` exists, refuses\n  `--incremental` writes, refuses `--verify`, refuses\n  `--show-secrets`, refuses to start if kernel `coredump_filter`\n  would dump anonymous pages.\n\nThe always-on hardening (everything except mlock + cache refusal) is\napplied to every keyhog invocation — even without `--lockdown` a\nkeyhog binary can't be coredumped or ptraced.\n\n## Library API\n\n```rust\nuse keyhog_core::{Chunk, ChunkMetadata};\nuse keyhog_scanner::CompiledScanner;\n\nlet detectors = keyhog_core::embedded_detectors();   // 891 built-in\nlet scanner = CompiledScanner::compile(detectors)?;\n\nlet findings = scanner.scan(\u0026Chunk {\n    data: \"TOKEN=sk_live_EXAMPLE…\".into(),\n    metadata: ChunkMetadata::default(),\n});\n```\n\nMix shipped + custom detectors by concatenating before compile. The\nscanner is `Send + Sync`; share one across rayon workers. Streaming\nsource helpers in `keyhog-sources` (file-system, git, stdin, Docker,\nS3, GitHub org). Live verification in `keyhog-verifier`.\n\nFull API surface + stability policy: [`site/api.html`](./site/api.html).\n\n## Configuration\n\nPer-repo defaults via `.keyhog.toml`:\n\n```toml\n[scan]\nseverity = \"high\"\nmin_confidence = 0.5\nexclude = [\"**/test/fixtures/**\", \"vendor/\"]\n\n[allowlist]\nfile = \".keyhogignore\"\nrequire_reason = true\nrequire_approved_by = true\nmax_expires_days = 180\n\n[detector.generic-api-key]\nenabled = false                # noisy detector? turn it off\n\n[lockdown]\nrequire = true                 # refuse to run without --lockdown\n```\n\nPrecedence (later overrides earlier): compiled defaults → system →\nuser → repo → env → CLI flags. 
Full reference:\n[`site/config.html`](./site/config.html).\n\nAllowlist a known leak with a hash, path glob, or detector id — plus\noptional `reason` / `expires` / `approved_by` governance metadata:\n\n```\n# .keyhogignore — gitignore-style shorthand\n*.log\nnode_modules/\n9d6060e21ef8d5daec9cfe4a44b1b1bc9792246bfad28210edaaa1782a8a676a\n\n# Explicit form with governance\nhash:9f86d081…    ; reason=\"rotated 2026-04-25\" ; expires=2026-07-01 ; approved_by=\"security@acme\"\ndetector:demo-token\npath:**/fixtures/*.env\n```\n\nEntries past `expires` are dropped on load, and a WARN is logged.\n\n## Architecture\n\n```\ncrates/\n  core/       Detector loading, finding types, reporting (text/JSON/SARIF), allowlists\n  scanner/    Hardware routing, Hyperscan, GPU, decode-through, entropy, ML, multiline\n  sources/    File system, git (staged/diff/history), stdin, Docker, S3, GitHub org, web\n  verifier/   Live credential verification against ~80 service APIs\n  cli/        CLI binary, daemon, watch, baselines, calibrate, hook installer\ndetectors/    891 TOML files (data, not code)\nsite/         Documentation site (17 pages, GitHub-Pages-ready)\ntools/        SecretBench mirror + scoring + leaderboard harness\n```\n\nTwo-phase coalesced scan:\n\n1. **Phase 1** — Hyperscan NFA on raw bytes, parallel across all files\n   via rayon. 95 %+ of files have no hits and skip Phase 2 entirely.\n2. **Phase 2** — full extraction on hits only: regex capture groups,\n   companion matching, checksum validation, entropy gating, ML\n   confidence + Bayesian damping.\n\nResult: a multi-GB monorepo scans in seconds. 
Determinism is part of\nthe contract — same input → same output, byte-exact, every time.\n\nFull architecture writeup, hardware routing matrix, profiling tips:\n[`site/architecture.html`](./site/architecture.html) and\n[`site/performance.html`](./site/performance.html).\n\n## Other useful subcommands\n\n```bash\nkeyhog detectors --search aws --verbose      # list / inspect detectors\nkeyhog explain aws-access-key                # spec, regex, severity, rotation guide\nkeyhog diff before.json after.json           # NEW / RESOLVED / UNCHANGED for CI gates\nkeyhog calibrate --tp aws-access-key         # record a true positive\nkeyhog calibrate --fp generic-api-key        # record a false positive\nkeyhog calibrate --show                      # posterior-mean bar chart per detector\nkeyhog backend                               # detected hardware + routing matrix\nkeyhog completion zsh                        # shell completions (bash/zsh/fish/powershell/elvish)\n```\n\n## Contributing\n\n- **New detector?** Drop a TOML in [`detectors/`](./detectors/), open a\n  PR. The contributor guide ([`CONTRIBUTING.md`](./CONTRIBUTING.md))\n  has the schema and a worked example.\n- **Bug / missed secret / false positive?** File an issue with the\n  redacted credential shape and detector id; each report becomes a\n  permanent test fixture under\n  [`tests/contracts/`](./crates/scanner/tests/contracts/).\n- **Security issue in keyhog itself?** Don't open a public issue —\n  email `security@santh.dev` (PGP key on the org page).\n\n[Changelog](./CHANGELOG.md). [Open issues](https://github.com/santhsecurity/keyhog/issues).\n\n## Credits\n\nkeyhog stands on prior secret-scanning work. 
Ideas borrowed from:\n\n- [trufflehog](https://github.com/trufflesecurity/trufflehog) — detector breadth + verification semantics\n- **betterleaks** — entropy/keyword fusion and false-positive suppression\n- **titus** — scanning ergonomics and severity calibration\n\nThanks to these projects and their contributors.\n\n## License\n\nMIT. Use commercially, embed, fork, sell a hosted version. The\ndetector TOMLs are also MIT — adding one is a 5-line PR with zero\nlegal friction.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsanthsecurity%2Fkeyhog","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsanthsecurity%2Fkeyhog","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsanthsecurity%2Fkeyhog/lists"}