{"id":50613098,"url":"https://github.com/conduktor/kafka-security-scanner","last_synced_at":"2026-06-06T05:31:09.508Z","repository":{"id":356764888,"uuid":"1233981780","full_name":"conduktor/kafka-security-scanner","owner":"conduktor","description":"Audit Apache Kafka clusters against a YAML-driven catalogue of security and reliability controls. Maps findings to PCI-DSS, SOC2, NIST 800-53, ISO 27001. Outputs SARIF, HTML, CSV, PDF, JSON.","archived":false,"fork":false,"pushed_at":"2026-05-09T15:51:55.000Z","size":46,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-09T17:40:53.532Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/conduktor.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-09T15:42:40.000Z","updated_at":"2026-05-09T15:51:59.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/conduktor/kafka-security-scanner","commit_stats":null,"previous_names":["conduktor/kafka-security-scanner"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/conduktor/kafka-security-scanner","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/conduktor%2Fkafka-security-scanner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/conduktor%2Fkafka-security-sc
anner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/conduktor%2Fkafka-security-scanner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/conduktor%2Fkafka-security-scanner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/conduktor","download_url":"https://codeload.github.com/conduktor/kafka-security-scanner/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/conduktor%2Fkafka-security-scanner/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33971106,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-06T02:00:07.033Z","response_time":107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-06T05:31:08.570Z","updated_at":"2026-06-06T05:31:09.497Z","avatar_url":"https://github.com/conduktor.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# kafka-security-scanner\n\n**Status: work in progress.** Catalogue, collectors and report formats are still moving. APIs and exit codes may change between commits.\n\nSite: **https://kafka-security-scanner.dev/**\n\nScan a Kafka cluster against a catalogue of security and reliability controls. 
Works against Apache Kafka and anything that speaks the protocol.\n\nGive it a bootstrap server (plus credentials if the cluster needs them) and a principal that can `Describe` brokers, topics, and ACLs. You get back a graded report. Hook it into CI to fail PRs that introduce regressions.\n\n## What you get\n\n```\n$ kafka-security-scanner --bootstrap broker.prod:9092 --policy enterprise \\\n    --collectors adminclient,filesystem,tls,siem,alerts,connect,schemaregistry,docs,jmx \\\n    --kafka-config-dir /etc/kafka --docs-dir ./governance \\\n    --prometheus-url http://prom:9090 --connect-url http://connect:8083 \\\n    --schema-registry-url http://sr:8081 --jmx-host-port broker.prod:9999\n\n=== Kafka Security Scanner ===\nBootstrap: broker.prod:9092\nPolicy:    enterprise-default.yaml  (138 controls)\nKafka flavor: vanilla  (hostname:broker.prod)\n\nCollecting cluster data... brokers=3 topics=42 acls=17\nEvaluating 138 controls...\n\n  Score: 72/100  |  Pass: 94  |  Fail: 19  |  N/A: 25  |  Pass Rate: 83%\n\n  Top findings:\n    critical  KAFKA-NET-004  PLAINTEXT listener detected\n    critical  KAFKA-ENC-001  Inter-broker communication is not encrypted\n    high      KAFKA-ACL-006  Default allow policy — unauthenticated users can access resources\n    high      KAFKA-AUDIT-010  Audit log layout does not capture principal and/or client identity\n    medium    KAFKA-MON-005  Consumer lag is not monitored\n    ...\n\nWrote: reports/report.json   reports/report.sarif   reports/report.html\n       reports/report.csv    reports/report.pdf\n```\n\nThe same run also writes the HTML control-center report, with in-scope/out-of-scope tabs, readiness and theme filters, search, and per-control evidence for auditors.\n\n![Kafka Security Scanner HTML control-center report](docs/report-control-center.png)\n\nIt also writes a SARIF file for GitHub Code Scanning, a CSV the auditors can filter by `pci_dss`/`soc2`/`iso27001`, and a PDF with a cover page if someone has to sign it 
off.\n\n## Why a policy engine\n\nMost scanners in this space ship a fixed list of checks compiled into the binary. When your auditor asks \"show me every control that maps to PCI-DSS 4.1,\" you read source code.\n\nHere, controls are data. Each one is a YAML entry with a condition, a severity, a remediation, and the regulations it covers. Want a stricter prod policy and a permissive dev one? Two files. Need to know which controls satisfy a given clause? It's already in the finding's `compliance` block. None of it needs a rebuild.\n\nThe control catalogue, and its mappings to CWE, NIST 800-53, PCI-DSS 4.0, SOC2, and ISO 27001, lives in [`conduktor/kafka-security-controls`](https://github.com/conduktor/kafka-security-controls). That's where the regulation discussion happens. This repo runs the result.\n\n## What the scanner actually sees\n\nThe scanner refuses to lie. Every control evaluates to a real boolean against collected data OR is explicitly covered by a managed-service contract. There is no \"attestation\" status — silent placeholder controls (`condition: \"true\"` with no escape hatch) are rejected at policy load.\n\nStatus is one of: `pass`, `fail`, `na` (required collector unavailable), `covered_by_flavor` (managed-service SLA), `error` (CEL eval failure — never happens in steady state).\n\nThe split between automatic and N/A shifts as you enable more collectors. The reference catalogue has 138 controls; on a single-broker plaintext cluster with only `--collectors=adminclient`, ~16 pass / ~41 fail / ~63 N/A. Wire in the cloud-native collectors (`--cc-api-key`, `--aws-region`, `--aiven-token`, ...) and the N/A bucket shrinks toward zero.\n\n## Collectors\n\nEach collector populates a slice of the cluster snapshot. Controls declare `requires: [...]` (or `requires_per_mode:` for ZK/KRaft-branched checks) and the engine returns `na` when a required collector isn't running. 
No collector → no silent pass.\n\nCollectors run concurrently on virtual threads — adding a slow probe doesn't block the cheap ones.\n\n### Core (cluster + host)\n\n| Collector       | Flag(s)                                                       | What it sees                                                                                          |\n|-----------------|---------------------------------------------------------------|------------------------------------------------------------------------------------------------------|\n| `adminclient`   | enabled by default                                            | Broker configs (incl. `config_int` numeric mirror), topic configs, ACLs, KRaft state, system topics  |\n| `filesystem`    | `--kafka-config-dir /etc/kafka`                               | server.properties, log4j layout pattern parser, retention proof, /proc/mounts cryptsetup probe       |\n| `jmx`           | `--jmx-host-port host:9999[,host2:9999]`                      | Multi-target broker MBeans: URP, OfflinePartitions, RequestHandlerIdle, GC, FD                       |\n| `tls`           | `--collectors=...,tls`                                        | TLS handshake to bootstrap host, leaf cert chain, expiry, key size, SAN, cipher                      |\n| `process`       | `--collectors=...,process` (Linux)                            | /proc/\u003cpid\u003e/cmdline + limits: JVM flags, heap, GC, ulimits, Kafka version                            |\n| `consumerjmx`   | `--consumer-jmx-host-ports host:1099,...`                     | consumer-fetch-manager-metrics records-lag-max per client                                            |\n| `streams`       | `--streams-jmx-host-ports`, `--streams-state-dir`             | Streams app JMX + state.dir POSIX audit                                                              |\n\n### Ecosystem (REST APIs)\n\n| Collector         | Flag(s)                                                                | What 
it sees                                                                              |\n|-------------------|------------------------------------------------------------------------|------------------------------------------------------------------------------------------|\n| `connect`         | `--connect-url http://host:8083`                                       | Per-connector config: transforms, MM2 security, DLQ, REST auth posture                   |\n| `schemaregistry`  | `--schema-registry-url http://host:8081`                               | Per-subject schema (annotations: `@encrypt` / `@tokenized` / `@owner`); write-auth probes require `--allow-active-probes` |\n| `restproxy`       | `--rest-proxy-url http://host:8082`                                    | REST Proxy auth posture                                                                  |\n| `alerts`          | `--prometheus-url http://prom:9090`                                    | Prometheus rule scan: auth-failure / ACL-change / quota-breach / anomaly / consumer-lag  |\n| `siem`            | `--collectors=...,siem`                                                | Local process + 127.0.0.1 port probe for vector / fluentd / filebeat / splunkforwarder   |\n| `zk`              | `--zk-admin-host-port host:2181`                                       | ZK 4lw probe — sensitive_commands_leaked (dump/envi/wchs/...)                           |\n| `docs`            | `--docs-dir ./governance`                                              | Governance artefact presence + age (key-rotation-log, admin-principals, ...)            
|\n| `cis`             | `--cis-report ./cis.json`                                              | cis-cat / kube-bench / inspec JSON ingest → pass_ratio + failed_ids                     |\n\n### Cloud-native (vendor APIs)\n\n| Collector          | Auto-activated on flavor | Flag(s)                                                                                       | What it sees                                                                                |\n|--------------------|--------------------------|----------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------|\n| `confluentcloud`   | `*.confluent.cloud`      | `--cc-api-key`, `--cc-api-secret`, `--cc-cluster-id` (env CC_API_KEY/CC_API_SECRET/CC_CLUSTER_ID) | api.confluent.cloud + Metrics API auth; cluster spec (dedicated/enterprise, private network, BYOK) |\n| `awsmsk`           | `*.kafka*.amazonaws.com` | `--aws-region`, `--aws-msk-cluster-arn` (default cred chain: AWS_PROFILE / IRSA / IMDSv2)    | MSK cluster spec, EC2 SG ingress, CloudWatch URP/OfflinePartitionsCount                    |\n| `aiven`            | `*.aivencloud.com`       | `--aiven-token`, `--aiven-project`, `--aiven-service` (env AIVEN_TOKEN)                      | api.aiven.io auth; service spec (plan, ip_filter, kafka_authentication_methods)            |\n| `rpcloud`          | `*.cloud.redpanda.com`   | `--rp-token`, `--rp-cluster-id` (env RP_TOKEN)                                               | api.redpanda.com auth; cluster spec (connection_type, region, type)                         |\n| `azure`            | `*.servicebus.windows.net` | `--azure-token`, `--azure-subscription-id`, `--azure-resource-group`, `--azure-namespace`  | management.azure.com auth; namespace spec (TLS version, public network access, private endpoints) |\n| `gcp`              | n/a (creds-driven)       | `--gcp-token`, 
`--gcp-project` (env GCP_TOKEN/GCP_PROJECT; obtain via `gcloud auth print-access-token`) | compute.googleapis.com firewall scan: any 0.0.0.0/0 ingress on broker ports               |\n| `k8s`              | n/a                      | `--k8s-namespace ns` (uses local `kubectl`)                                                   | NetworkPolicy + kafka-pod selectors + default-deny detection                                 |\n\n### Derived\n\n| Collector | Flag    | What it does                                                                                    |\n|-----------|---------|------------------------------------------------------------------------------------------------|\n| `kms`     | always  | Walks broker / connect / fs configs for `${provider:path}` placeholders; classifies file/env vs external (vault/aws/gcp/azure) |\n\n### Cross-validation\n\nThe same fact can be checked by multiple collectors. Example: TLS posture on the inter-broker listener is reported by AdminClient (`security.inter.broker.protocol`) AND by the TLS collector's handshake (`tls.handshake_ok` + `tls.protocol`). 
Controls can `\u0026\u0026` both sides, so config drift between what the broker thinks it's serving and what it actually serves becomes visible.\n\nThe cloud-native cross-validation is the killer use case: AdminClient says encryption-in-transit is on, AwsMskCollector confirms it via the MSK API, FilesystemCollector confirms /proc/mounts has dm-crypt — drift in any one trips the AND.\n\n### Adding a collector\n\nImplement `io.kafkascanner.collectors.Collector`:\n\n```java\npublic final class CloudIamCollector implements Collector {\n    public String name() { return \"cloud-iam\"; }\n    public boolean isAvailable(CollectorContext c) { return c.cloudCreds() != null; }\n    public Map\u003cString, Object\u003e collect(CollectorContext c) {\n        return Map.of(\"cloud_iam\", iamSnapshot(c.cloudCreds()));\n    }\n}\n```\n\nWire it in `Main.java` behind a `--collectors=cloud-iam` flag, expose `cloud_iam` to CEL in `PolicyEngine`, and write controls with `requires: [cloud-iam]`. PRs welcome.\n\n### Flavors\n\nAuto-detected from the first hostname in `--bootstrap`:\n\n| Pattern                              | Flavor              |\n|--------------------------------------|---------------------|\n| `*.confluent.cloud`                  | `confluent-cloud`   |\n| `*.kafka.\u003cregion\u003e.amazonaws.com`     | `aws-msk`           |\n| `*.aivencloud.com`                   | `aiven`             |\n| `*.cloud.redpanda.com`               | `redpanda-cloud`    |\n| `*.servicebus.windows.net`           | `azure-eventhubs`   |\n| `*.warpstream.com`                   | `warpstream`        |\n| `*.conduktor.io` / `.cloud`          | `conduktor-gateway` |\n| anything else                        | `vanilla`           |\n\nOverride with `--kafka-flavor confluent-cloud` if your hostname doesn't match (private DNS, on-prem with a vanity name, etc.). 
Flavor is included in every finding's evidence and at the top of the report.\n\n## Quick start\n\nSelf-hosted Kafka, broad coverage:\n\n```bash\n./install.sh\nkafka-security-scanner \\\n  --bootstrap localhost:9092 \\\n  --policy enterprise \\\n  --collectors adminclient,filesystem,tls,siem,docs \\\n  --kafka-config-dir /etc/kafka \\\n  --docs-dir ./governance \\\n  --format terminal,json,sarif,html\n```\n\nWith SASL:\n\n```bash\nkafka-security-scanner \\\n  --bootstrap broker:9092 \\\n  --security-protocol SASL_PLAINTEXT \\\n  --sasl-mechanism SCRAM-SHA-512 \\\n  --sasl-username admin \\\n  --sasl-password \"$KAFKA_PASSWORD\" \\\n  --policy enterprise\n```\n\nFor production clusters, prefer a real Kafka client properties file when you\nneed truststores, keystores, mTLS, OAuth callback handlers, or custom client\nsettings:\n\n```bash\nkafka-security-scanner \\\n  --bootstrap broker:9093 \\\n  --kafka-client-config ./client.properties \\\n  --policy enterprise\n```\n\nThe scanner is non-mutating by default. Probes that may write to a target\nsystem, such as Schema Registry anonymous-write verification, only run when\n`--allow-active-probes` is passed in a controlled environment.\n\n**Confluent Cloud:**\n\n```bash\nexport CC_API_KEY=...; export CC_API_SECRET=...\nkafka-security-scanner \\\n  --bootstrap pkc-XXXXX.us-east-1.aws.confluent.cloud:9092 \\\n  --collectors adminclient,confluentcloud,connect,schemaregistry \\\n  --cc-cluster-id lkc-XXXXX \\\n  --connect-url https://api.confluent.cloud --schema-registry-url ...\n```\n\n**AWS MSK (uses default AWS credential chain):**\n\n```bash\nkafka-security-scanner \\\n  --bootstrap b-1.cluster.kafka.us-east-1.amazonaws.com:9098 \\\n  --collectors adminclient,awsmsk \\\n  --aws-region us-east-1 \\\n  --security-protocol SASL_SSL --sasl-mechanism AWS_MSK_IAM\n```\n\n**Aiven / Redpanda Cloud / Azure EventHubs / GCP** — same pattern, see the table above. 
Each collector self-activates when the bootstrap host matches the flavor's pattern OR when its credential flags are passed.\n\n**Kubernetes-deployed Kafka (Strimzi or otherwise):**\n\n```bash\nkafka-security-scanner \\\n  --bootstrap kafka.kafka:9092 \\\n  --collectors adminclient,k8s \\\n  --k8s-namespace kafka\n```\n\n**With a CIS hardening report:**\n\n```bash\ncis-cat-pro --benchmark \"CIS Apache Kafka Benchmark\" --output cis-out.json\nkafka-security-scanner -b localhost:9092 \\\n  --collectors adminclient,cis \\\n  --cis-report cis-out.json\n```\n\nExit codes are picked for CI gates:\n\n- `0` — clean below the `--fail-on` threshold (default `high`)\n- `1` — findings at or above the threshold (block the merge)\n- `2` — scan itself failed (cluster unreachable, broken policy)\n\n## Reports\n\n| Format     | Audience                                         |\n|------------|--------------------------------------------------|\n| `terminal` | Engineer running the scan                        |\n| `json`     | Pipelines, dashboards, anything downstream       |\n| `sarif`    | GitHub Code Scanning, Defender, any SAST tool    |\n| `html`     | Stakeholders skimming for the red items          |\n| `csv`      | Auditors filtering by control ID or framework    |\n| `pdf`      | Sign-off document with cover page and signatures |\n\nPass any combination via `--format`.\n\nJSON, HTML, and CSV reports include `control_results` for every control, not\nonly failures. 
Each result includes redacted evidence, collector availability,\nthe evaluated condition, flavor coverage proof when relevant, and compliance\nmappings.\n\n## Policies\n\nBuilt-in:\n\n- `enterprise` → `policies/enterprise-default.yaml`, full 138-control catalogue\n- `community`, `baseline` → `policies/test-minimal-valid.yaml`, 12-control smoke test\n- Or pass a path to your own YAML\n\nWhat a control looks like:\n\n```yaml\n- id: SEC-001\n  title: Broker TLS encryption is enabled\n  severity: critical\n  category: security\n  condition: brokers.all(b, b.listeners.all(l, l.protocol in ['SSL', 'SASL_SSL']))\n  message: One or more listeners use PLAINTEXT\n  remediation: Configure listeners with SSL:// or SASL_SSL://\n  compliance:\n    pci_dss: [\"3.4\", \"4.1\"]\n    soc2: [\"CC6.1\"]\n```\n\nConditions are CEL expressions evaluated by [cel-java](https://github.com/google/cel-java) over the cluster snapshot (`brokers`, `topics`, `acls`, `cluster`). Adding a check means editing YAML; no Java involved.\n\nIf you want a new control in the shared catalogue, the PR goes to [`conduktor/kafka-security-controls`](https://github.com/conduktor/kafka-security-controls). The YAML here is the projection.\n\n## CI integration\n\n```yaml\n- run: kafka-security-scanner --bootstrap $KAFKA --format sarif --out reports --fail-on high\n- uses: github/codeql-action/upload-sarif@v3\n  with:\n    sarif_file: reports/report.sarif\n```\n\n## Test matrix\n\n`docker-compose.test-matrix.yaml` ships six broker variants: Apache Kafka 3.9 and 4.2 in PLAINTEXT, SASL_PLAINTEXT, and ACL flavours, plus two Kafka-API-compatible alternatives. The script boots each, scans it, and asserts the expected fail count per variant. 
Handy when you start tweaking policies and want to know what changed.\n\n```bash\nscripts/test-all-variants.sh\nscripts/test-all-variants.sh kafka-42-sasl\n```\n\n## Build\n\nJava 25 (preview) and Gradle 8.12+, or use the Gradle wrapper.\n\n```bash\ngradle build           # compile, linters, tests\ngradle installDist     # produces build/install/kafka-security-scanner/\ngradle test            # JUnit + Testcontainers; SKIP_INTEGRATION_TESTS=1 to skip\n```\n\n## License\n\nApache 2.0. See [LICENSE](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconduktor%2Fkafka-security-scanner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fconduktor%2Fkafka-security-scanner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconduktor%2Fkafka-security-scanner/lists"}