{"id":47666013,"url":"https://github.com/ababic/dumpling","last_synced_at":"2026-05-03T16:02:29.132Z","repository":{"id":344384128,"uuid":"1176749896","full_name":"ababic/dumpling","owner":"ababic","description":"Fast, flexible, powerful static data anonymisation for SQL dumps","archived":false,"fork":false,"pushed_at":"2026-05-02T17:04:54.000Z","size":164,"stargazers_count":0,"open_issues_count":3,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-02T17:08:18.871Z","etag":null,"topics":["anonymisation","cli","data-analysis","data-science","pii","pii-redaction","postgres","privacy","rust","rust-lang","scrubber","scrubbing","security","tooling"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ababic.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-03-09T10:38:10.000Z","updated_at":"2026-05-02T17:06:06.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ababic/dumpling","commit_stats":null,"previous_names":["ababic/dumpling"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/ababic/dumpling","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ababic%2Fdumpling","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ababic%2Fdumpling/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/ababic%2Fdumpling/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ababic%2Fdumpling/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ababic","download_url":"https://codeload.github.com/ababic/dumpling/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ababic%2Fdumpling/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32575115,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-03T06:36:36.687Z","status":"ssl_error","status_checked_at":"2026-05-03T06:36:09.306Z","response_time":103,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anonymisation","cli","data-analysis","data-science","pii","pii-redaction","postgres","privacy","rust","rust-lang","scrubber","scrubbing","security","tooling"],"created_at":"2026-04-02T11:57:27.140Z","updated_at":"2026-05-03T16:02:29.122Z","avatar_url":"https://github.com/ababic.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/logo.svg\" width=\"140\" height=\"140\" alt=\"Dumpling logo: a dumpling with steam\" /\u003e\n\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003eDumpling\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eSanitize SQL dumps before they go 
anywhere.\u003c/strong\u003e\u003cbr /\u003e\n  Turn huge \u003ccode\u003epg_dump\u003c/code\u003e / SQLite / SQL Server exports into shareable, test-friendly snapshots — no DB connection, no secrets left by accident.\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://pypi.org/project/dumpling-cli/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/v/dumpling-cli.svg\" alt=\"PyPI version\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://pypi.org/project/dumpling-cli/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/pyversions/dumpling-cli.svg\" alt=\"Python versions\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://pypi.org/project/dumpling-cli/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/l/dumpling-cli.svg\" alt=\"PyPI license\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/ababic/dumpling/actions/workflows/tests.yml\"\u003e\u003cimg src=\"https://github.com/ababic/dumpling/actions/workflows/tests.yml/badge.svg\" alt=\"Tests\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/ababic/dumpling/actions/workflows/ci.yml\"\u003e\u003cimg src=\"https://github.com/ababic/dumpling/actions/workflows/ci.yml/badge.svg\" alt=\"Lint\" /\u003e\u003c/a\u003e\n  \u003cimg src=\"https://img.shields.io/badge/rust-stable-orange?logo=rust\" alt=\"Rust stable\" /\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://ababic.github.io/dumpling/\"\u003e\u003cstrong\u003eDocumentation\u003c/strong\u003e\u003c/a\u003e\n  \u0026nbsp;·\u0026nbsp;\n  \u003ca href=\"https://github.com/ababic/dumpling\"\u003e\u003cstrong\u003eGitHub\u003c/strong\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003csub\u003e\u003cem\u003eDisclaimer: This project is entirely vibe-coded, but with strong human guidance, review, and attention to quality and safety.\u003c/em\u003e\u003c/sub\u003e\n\u003c/p\u003e\n\n---\n\n**Dumpling** reads plain-text SQL dumps (PostgreSQL `pg_dump`, SQLite `.dump`, SQL 
Server / MSSQL scripts) and rewrites sensitive columns using rules you define in TOML. Everything runs offline on files — ideal for CI, staging share-outs, and compliance-minded workflows.\n\n## Why Dumpling?\n\n- **Offline by design** — works on dump files only; nothing connects to your database.\n- **Streams giant files** — line-by-line processing keeps multi‑GB dumps reasonable on modest hardware.\n- **Fails loud, not silent** — missing config exits non‑zero and lists where Dumpling looked; use `--allow-noop` only when you mean it.\n- **Stable pseudonyms** — optional domain mappings keep the same source value as the same fake value across tables (foreign keys stay consistent).\n- **Pipeline-ready** — `--check`, strict coverage, JSON reports, and residual PII scans fit pre-merge gates and release automation.\n- **Configure once** — `.dumplingconf` or `[tool.dumpling]` in `pyproject.toml`; install via **Rust** (`cargo`) or **`pip install dumpling-cli`**.\n\n---\n\n## Install\n\n### Rust (from source)\n\n```bash\ncargo build --release\n./target/release/dumpling --help\n```\n\n### Python / pip (`dumpling-cli`)\n\nDumpling is also published as a pip-installable CLI package:\n\n```bash\npip install dumpling-cli\n```\n\nOr install from local source (requires [maturin](https://www.maturin.rs/) as the PEP 517 build backend):\n\n```bash\npip install .\n```\n\nAfter installation, the CLI command is the same:\n\n```bash\ndumpling --help\n```\n\n---\n\n## Usage\n\n```bash\ndumpling -i dump.sql -o sanitized.sql           # read from file, write to file\ndumpling -i dump.sql --in-place                 # overwrite the input file (atomic swap)\ncat dump.sql | dumpling \u003e sanitized.sql         # stream from stdin to stdout\ndumpling -i dump.sql -c .dumplingconf           # use explicit config path\ndumpling --check -i dump.sql                    # exit 1 if changes would occur, no output\ndumpling --stats -i dump.sql -o out.sql         # print summary to stderr\ndumpling --report report.json 
-i dump.sql       # write detailed JSON report of changes/drops\ndumpling --strict-coverage --report report.json -i dump.sql --check  # fail on uncovered sensitive columns\ndumpling --scan-output --report report.json -i dump.sql               # scan transformed output for residual PII-like patterns\ndumpling --scan-output --fail-on-findings --report report.json -i dump.sql --check  # fail if scan thresholds are exceeded\ndumpling --include-table '^public\\.' -i dump.sql -o out.sql\ndumpling --exclude-table '^audit\\.' -i dump.sql -o out.sql\ndumpling --allow-ext dmp -i data.dmp            # restrict processing to specific extensions\ndumpling --allow-noop -i dump.sql -o out.sql    # explicitly allow no-op when config is missing\ndumpling --format sqlite -i data.db.sql -o out.sql  # process a SQLite .dump file\ndumpling --format mssql  -i backup.sql -o out.sql   # process a SQL Server plain-SQL dump\ndumpling --security-profile hardened -i dump.sql -o sanitized.sql  # hardened CSPRNG + HMAC mode\ndumpling lint-policy                          # lint the anonymization policy config\ndumpling lint-policy --config .dumplingconf   # lint with explicit config path\n```\n\nConfiguration is loaded in this order:\n\n1. `--config \u003cpath\u003e` if provided\n2. `.dumplingconf` in the current directory\n3. `pyproject.toml` `[tool.dumpling]` section\n\nIf no configuration is found, Dumpling fails closed by default and exits non-zero.\nThe error output lists every checked location. Use `--allow-noop` to explicitly\npermit no-op behavior.\n\n---\n\n## Configuration (TOML)\n\nBoth `.dumplingconf` and `[tool.dumpling]` inside `pyproject.toml` use the same schema:\n\n```toml\n# Optional global salt for strategies that support it (e.g.
hash)\n# Prefer env-backed secret references over plaintext.\nsalt = \"${DUMPLING_GLOBAL_SALT}\"\n\n# Rules are keyed by either \"table\" or \"schema.table\"\n[rules.\"public.users\"]\nemail = { strategy = \"email\", domain = \"customer_identity\", unique_within_domain = true }\nname  = { strategy = \"name\", locale = \"de_de\" }   # German-locale name\nssn   = { strategy = \"hash\", salt = \"${env:DUMPLING_USERS_SSN_SALT}\", as_string = true }   # SHA-256 of original (salted)\nage   = { strategy = \"int_range\", min = 18, max = 90 }\n\n[rules.\"orders\"]\ncredit_card = { strategy = \"redact\", as_string = true }\n\n# Optional explicit sensitive columns policy list (for strict coverage)\n[sensitive_columns]\n\"public.users\" = [\"employee_number\", \"tax_id\"]\n\n[output_scan]\n# optional allowlist; if omitted, all built-in categories are enabled\nenabled_categories = [\"email\", \"ssn\", \"pan\", \"token\"]\ndefault_threshold = 0\ndefault_severity = \"high\"\nfail_on_severity = \"low\"\nsample_limit_per_category = 5\n\n[output_scan.thresholds]\nemail = 0\nssn = 0\npan = 0\ntoken = 0\n\n[output_scan.severities]\nemail = \"medium\"\nssn = \"high\"\npan = \"critical\"\ntoken = \"high\"\n```\n\n### Anonymization strategies\n\n| Strategy | Description |\n|---|---|\n| `null` | Set field to SQL `NULL` |\n| `redact` | Replace with `REDACTED` (string) |\n| `uuid` | Random UUIDv4-like string |\n| `hash` | SHA-256 hex of original value; supports per-column `salt` and global `salt` |\n| `email` | Safe email address (same generator as `faker = \"internet::SafeEmail\"`); supports `locale` |\n| `name` | Full name (same as `faker = \"name::Name\"`); supports `locale` |\n| `first_name` | First name (same as `faker = \"name::FirstName\"`); supports `locale` |\n| `last_name` | Last name (same as `faker = \"name::LastName\"`); supports `locale` |\n| `phone` | Locale-aware fake phone number (configurable via `locale`); defaults to English format |\n| `faker` | Values from the Rust 
[`fake`](https://crates.io/crates/fake) crate ([docs.rs](https://docs.rs/fake/latest/fake/), [`faker` modules](https://docs.rs/fake/latest/fake/faker/index.html)), chosen by a **string identifier** only (`faker = \"module::Type\"`, e.g. `internet::SafeEmail`). Config is **data only**: nothing from TOML is compiled or executed as Rust at runtime. Use `locale` for locale-aware generators; optional `min`/`max`, `length`, `format` as documented. Unsupported targets fail at config load. New generators require a **new Dumpling release** (or your own fork), not config-side code. |\n| `int_range` | Random integer in `[min, max]` |\n| `string` | Random alphanumeric string (`length = 12` by default) |\n| `date_fuzz` | Shifts a date by a random number of days in `[min_days, max_days]` (defaults: `-30..30`) |\n| `time_fuzz` | Shifts a time-of-day by a random number of seconds in `[min_seconds, max_seconds]` with 24h wraparound (defaults: `-300..300`) |\n| `datetime_fuzz` | Shifts a timestamp/timestamptz by a random number of seconds in `[min_seconds, max_seconds]` (defaults: `-86400..86400`) |\n\n**`faker` reference (upstream `fake` crate):** Dumpling’s `faker = \"module::Type\"` strings mirror the Rust [`fake`](https://crates.io/crates/fake) crate’s [`faker`](https://docs.rs/fake/latest/fake/faker/index.html) module layout. 
Use these when picking or extending generators:\n\n- [docs.rs — `fake` crate root](https://docs.rs/fake/latest/fake/) (overview, `Fake` / `Dummy` traits, locales)\n- [docs.rs — `fake::faker` module index](https://docs.rs/fake/latest/fake/faker/index.html) (per-domain submodules: `address`, `internet`, `name`, …)\n- [GitHub — `cksac/fake-rs`](https://github.com/cksac/fake-rs) (source, README with the CLI’s generator name list)\n\n### Secret references\n\nDumpling resolves secret references in string config fields so plaintext salts/keys\nnever need to be committed to version control.\n\n| Syntax | Description |\n|---|---|\n| `${ENV_VAR}` | Value of environment variable `ENV_VAR` |\n| `${env:ENV_VAR}` | Value of environment variable `ENV_VAR` (explicit provider prefix) |\n| `${file:/path/to/secret}` | Contents of a file (trailing newlines stripped); works with Docker Swarm secrets, Kubernetes mounted secrets, and Vault Agent injected files |\n\n- Missing env references and unreadable/empty files fail fast with a non-zero startup error that includes the config path.\n- Plaintext `salt` values still work for backwards compatibility, but Dumpling prints a startup warning because plaintext secrets are insecure.\n\n```toml\n# .dumplingconf — keep salts out of source control\nsalt = \"${DUMPLING_GLOBAL_SALT}\"\n\n[rules.\"public.users\"]\nssn   = { strategy = \"hash\", salt = \"${env:DUMPLING_USERS_SSN_SALT}\" }\nemail = { strategy = \"hash\", salt = \"${file:/run/secrets/dumpling_email_salt}\" }\n```\n\n```bash\n# Local dev\nexport DUMPLING_GLOBAL_SALT='local-dev-salt'\nexport DUMPLING_USERS_SSN_SALT='users-ssn-salt'\ndumpling --input dump.sql --check\n\n# CI (injected from your secret store)\nexport DUMPLING_GLOBAL_SALT=\"$CI_DUMPLING_GLOBAL_SALT\"\nexport DUMPLING_USERS_SSN_SALT=\"$CI_DUMPLING_USERS_SSN_SALT\"\ndumpling --input dump.sql --check --strict-coverage --report coverage.json\n\n# Docker / Kubernetes (file-mounted secrets)\n# salt = 
\"${file:/run/secrets/dumpling_hmac_key}\" in .dumplingconf\n# secret mounted at /run/secrets/dumpling_hmac_key by the orchestrator\ndumpling --security-profile hardened --input dump.sql --check\n```\n\n### Common column options\n\n- `as_string`: if true, forces the anonymized value to be rendered as a quoted SQL string literal. By default Dumpling preserves the original quoting where possible.\n- `domain`: deterministic mapping domain. When set, the same source value always maps to the same pseudonym inside that domain (across tables/columns). **SQL `NULL` inputs are always preserved as `NULL`** — a null FK reference has no source value to map, so no pseudonym is fabricated.\n- `unique_within_domain`: when true, different source values are assigned unique pseudonyms within the configured `domain`. NULL values are unaffected and always remain NULL.\n- `min_days` / `max_days`: used by `date_fuzz`.\n- `min_seconds` / `max_seconds`: used by `time_fuzz` and `datetime_fuzz`.\n- `locale`: selects the language/regional format for `email`, `name`, `first_name`, `last_name`, `faker`, and `phone`. Supported values: `en`, `fr_fr`, `de_de`, `it_it`, `pt_br`, `pt_pt`, `ar_sa`, `zh_cn`, `zh_tw`, `ja_jp`, `cy_gb`. Defaults to `en` when not specified.\n- `faker`: required when `strategy = \"faker\"`. A plain string `\"module::Type\"` (case-insensitive) that maps to a **built-in** generator compiled into Dumpling—not arbitrary Rust or expressions. Names follow [`fake::faker`](https://docs.rs/fake/latest/fake/faker/index.html) (e.g. 
`internet::SafeEmail` → `faker::internet::SafeEmail` in the crate).\n- `format`: used with `faker = \"number::NumberWithFormat\"`; pattern uses `#` (0–9) and `^` (1–9) per the [`fake` crate docs](https://docs.rs/fake/latest/fake/).\n\n\u003e **Note:** `table_options` are no longer supported; use explicit `rules` and optional `column_cases`.\n\n---\n\n## Strict coverage\n\n`--strict-coverage` enforces that all detected sensitive columns have an explicit anonymization rule.\n\nSensitive columns are detected via:\n- Built-in column-name heuristics (the same patterns used by auto-detection).\n- Explicit lists under `[sensitive_columns]`.\n\nA column is considered **covered** only when it has an explicit `rules` entry or at least one `column_cases` entry. When strict coverage fails, Dumpling exits non-zero and reports the uncovered columns.\n\n### Coverage reporting\n\nWhen `--report \u003cfile\u003e` is used, the JSON output includes:\n\n- `sensitive_columns_detected`\n- `sensitive_columns_covered`\n- `sensitive_columns_uncovered`\n- `deterministic_mapping_domains` (columns configured with deterministic domain mapping)\n- `output_scan` (when `--scan-output` is enabled), including category counts and sample locations\n\n### CI gate pattern\n\n```bash\ndumpling --input dump.sql --check --strict-coverage --report coverage.json\n```\n\nThis command exits non-zero if:\n- Data changes/drops are detected (`--check` semantics), or\n- Strict coverage finds uncovered sensitive columns.\n\n---\n\n## Residual PII scan\n\n```bash\ndumpling \\\n  --input dump.sql \\\n  --check \\\n  --scan-output \\\n  --fail-on-findings \\\n  --report scan-report.json\n```\n\n`--scan-output` scans the transformed output for built-in detector categories:\n\n- `email`: email-address-like strings\n- `ssn`: U.S. 
SSN-like values\n- `pan`: payment-card-like numbers (Luhn validated)\n- `token`: common secret/token formats (JWT, AWS access key IDs, GitHub PAT prefixes, etc.)\n\nWhen `--fail-on-findings` is set, Dumpling exits non-zero if any configured category exceeds its threshold and meets the configured severity gate.\n\n---\n\n## Input format\n\nDumpling processes plain-text SQL dump files from multiple sources. Use `--format` to select the dialect (default: `postgres`).\n\n### PostgreSQL (`--format postgres`)\n\nProduced by `pg_dump --format=plain`. Handles:\n\n- `INSERT INTO schema.table (col1, col2, ...) VALUES (...), (...), ...;`\n- `COPY schema.table (col1, col2, ...) FROM stdin; ... \\.` (tab-delimited with `\\N` as NULL)\n- `\"double-quoted\"` identifiers\n- `''`-escaped string literals\n\nBinary, custom, and directory formats from `pg_dump` are not parsed directly — Dumpling’s SQL pipeline expects plain text. Use either:\n\n- **`pg_dump --format=plain`** when you control capture, or\n- **`dumpling --dump-decode`** with `--input` set to a **custom-format** (`.dump`) or **directory-format** folder: Dumpling runs `pg_restore -f -` and streams the resulting SQL (same as a manual `pg_restore` “script” output, no database required). Requires PostgreSQL client tools on `PATH` (`pg_restore`), or set `--pg-restore-path`. Use `--dump-decode-arg` to pass extra flags (e.g. `--no-owner --no-acl`). **By default** the archive is removed after a fully successful run; pass **`--dump-decode-keep-input`** to retain it. **`--check`** requires **`--dump-decode-keep-input`** so the archive still exists if changes would be detected.\n\nExample (e.g. after `heroku pg:backups:download`):\n\n```bash\ndumpling --dump-decode -i latest.dump -c .dumplingconf -o anonymized.sql\n```\n\n### SQLite (`--format sqlite`)\n\nProduced by the SQLite CLI `.dump` command or equivalent. Handles:\n\n- Standard `INSERT INTO table (col1, ...) VALUES (...);`\n- `INSERT OR REPLACE INTO table (...) 
VALUES (...);`\n- `INSERT OR IGNORE INTO table (...) VALUES (...);`\n- `\"double-quoted\"` identifiers\n- `''`-escaped string literals\n\nThe `OR REPLACE` / `OR IGNORE` variant keyword is preserved verbatim in the output.\n\n### SQL Server / MSSQL (`--format mssql`)\n\nProduced by SSMS \"Script Table as → INSERT To\", `mssql-scripter`, or similar tools. Handles:\n\n- `INSERT INTO [schema].[table] ([col1], [col2], ...) VALUES (...), ...;`\n- `[bracket]`-quoted identifiers (stripped to unquoted names in output)\n- `N'...'` Unicode string literals (the `N` prefix is transparently discarded; value is preserved)\n- `nvarchar(n)` and `nchar(n)` column-length declarations (used to truncate generated values)\n- `''`-escaped string literals\n\n---\n\n## Row filtering\n\nYou can retain or delete rows for specific tables using explicit predicate lists.\n\n- If `retain` is non-empty, a row is kept only if it matches at least one predicate.\n- Regardless of `retain`, a row is dropped if it matches any predicate in `delete`.\n\nSupported predicate operators:\n\n| Operator | Description |\n|---|---|\n| `eq` / `neq` | String compare (case-insensitive if `case_insensitive = true`) |\n| `in` / `not_in` | List of values (string compare) |\n| `like` / `ilike` | SQL-like patterns (`%` and `_`) |\n| `regex` / `iregex` | Rust regex (`iregex` is case-insensitive) |\n| `lt` / `lte` / `gt` / `gte` | Numeric compare (values parsed as numbers) |\n| `is_null` / `not_null` | No value needed |\n\nPredicates can target nested JSON values using dot notation (`payload.profile.tier`) or Django-style notation (`payload__profile__tier`). 
For JSON arrays, path segments are evaluated against each element, so list-of-dicts structures can be matched naturally.\n\n### JSON path list targeting\n\nJSON list/array traversal is automatic once a path segment resolves to an array.\n\n- **All elements in an array**: use the next field name directly.\n  - `payload.items.kind` or `payload__items__kind`\n  - Matches/rewrites `kind` for every object in `items`.\n- **Specific array index**: use a numeric segment.\n  - `payload.items.0.kind` or `payload__items__0__kind`\n  - Targets only the first element.\n- **Nested arrays**: combine field and index segments as needed.\n  - `payload.groups.members.email`\n  - `payload.groups.1.members.0.email`\n\nThis path behavior is shared by both `row_filters` predicates and JSON-path anonymization rules in `[rules]`.\n\n```toml\n[row_filters.\"public.users\"]\nretain = [\n  { column = \"country\", op = \"eq\",  value = \"US\" },\n  { column = \"email\",   op = \"ilike\", value = \"%@myco.com\" },\n  { column = \"profile.flags.plan\", op = \"eq\", value = \"gold\" }\n]\ndelete = [\n  { column = \"is_admin\", op = \"eq\", value = \"true\" },\n  { column = \"email\",    op = \"ilike\", value = \"%@example.com\" },\n  { column = \"devices__platform\", op = \"eq\", value = \"android\" }\n]\n```\n\nRow filtering works for both `INSERT ... VALUES (...)` and `COPY ... FROM stdin` rows.\n\n---\n\n## Conditional per-column cases\n\nDefine default strategies in `rules.\"\u003ctable\u003e\"` and add ordered per-column cases in `column_cases.\"\u003ctable\u003e\".\"\u003ccolumn\u003e\"`. 
For each row and column, Dumpling applies the first matching case; if none match, it falls back to the default from `rules`.\n\n```toml\n[rules.\"public.users\"]\nemail = { strategy = \"hash\", as_string = true }   # default\nname  = { strategy = \"name\" }\n\n[[column_cases.\"public.users\".email]]\nwhen.any = [{ column = \"is_admin\", op = \"eq\", value = \"true\" }]\nstrategy = { strategy = \"redact\", as_string = true }\n\n[[column_cases.\"public.users\".email]]\nwhen.any = [{ column = \"country\", op = \"in\", values = [\"DE\",\"FR\",\"GB\"] }]\nstrategy = { strategy = \"hash\", salt = \"eu-salt\", as_string = true }\n```\n\n- `when.any` is OR, `when.all` is AND; you can use either or both. If both are empty, the case matches unconditionally.\n- First-match-wins per column; there is no merge or fallthrough.\n- Row filtering (`row_filters`) is evaluated before cases; deleted rows are not transformed.\n\n---\n\n## Hardened security profile\n\nFor adversarial risk environments — where an internal or external actor may have partial auxiliary data — use `--security-profile hardened`:\n\n```bash\ndumpling --security-profile hardened -i dump.sql -o sanitized.sql\n```\n\n### What changes in hardened mode\n\n| Aspect | Standard | Hardened |\n|---|---|---|\n| Random generation | xorshift64\\* seeded from system time | OS CSPRNG (`getrandom`) — non-predictable |\n| `hash` strategy | SHA-256(salt \\|\\| input) | HMAC-SHA-256(key=salt, data=input) |\n| Deterministic domain byte stream | SHA-256 CTR-mode | HMAC-SHA-256 CTR-mode |\n| Report `security_profile` field | `\"standard\"` | `\"hardened\"` |\n| `--seed` / `DUMPLING_SEED` | Seeds the PRNG | Ignored (warning emitted) |\n\n### Why this matters\n\n- **Non-predictable output**: xorshift64\\* is seeded from system time, which is guessable. The OS CSPRNG cannot be predicted from timing alone.\n- **Proper keyed hashing**: `SHA-256(key || data)` is vulnerable to length-extension attacks and weak as a MAC. 
HMAC-SHA-256 uses the salt as a genuine cryptographic key, providing provable PRF security.\n- **Domain separation**: HMAC construction ensures outputs from one salt/key cannot be confused with another.\n\n### Key management guidance\n\nConfigure a per-environment secret via an env-backed reference to prevent key leakage:\n\n```toml\n# .dumplingconf\nsalt = \"${DUMPLING_HMAC_KEY}\"\n\n[rules.\"public.users\"]\nssn = { strategy = \"hash\", as_string = true }\nemail = { strategy = \"email\", domain = \"users\" }\n```\n\n```bash\nexport DUMPLING_HMAC_KEY=\"$(openssl rand -base64 32)\"\ndumpling --security-profile hardened -i dump.sql -o sanitized.sql\n```\n\n**Key rotation**: Changing `DUMPLING_HMAC_KEY` will produce entirely different pseudonyms for all salted/domain-mapped columns. If you rely on referential consistency across separately-processed dumps (e.g., snapshots over time), keep the same key or re-anonymize all related dumps together. Rotate keys when:\n- A key may have been compromised.\n- You intentionally want to break prior referential linkability.\n\n### Report metadata\n\nThe JSON report always includes the active security profile:\n\n```json\n{\n  \"security_profile\": \"hardened\",\n  \"total_rows_processed\": 1000,\n  ...\n}\n```\n\n---\n\n## Policy linting\n\nThe `lint-policy` subcommand statically analyses your configuration and flags common issues before they affect a production pipeline.\n\n```bash\ndumpling lint-policy                          # auto-discover config\ndumpling lint-policy --config .dumplingconf   # explicit config path\n```\n\n| Check | Severity | Description |\n|---|---|---|\n| `empty-rules-table` | warning | A `[rules]` entry has no column rules |\n| `empty-column-cases-table` | warning | A `[column_cases]` entry has no column cases |\n| `unsalted-hash` | warning | `hash` strategy used without any salt — reversible for low-entropy inputs |\n| `inconsistent-domain-strategy` | error | Same domain name used with different 
strategies — breaks referential integrity |\n| `uncovered-sensitive-column` | error | A column in `[sensitive_columns]` has no matching rule or case |\n\nExits `0` if no violations are found, `1` if any violations exist. Plug it into CI as a pre-merge gate:\n\n```yaml\n- run: ./target/release/dumpling lint-policy\n```\n\nSee the [CI guardrails documentation](docs/src/ci-guardrails.md) for full pipeline recipes including strict-coverage enforcement, residual PII scan gating, and report diffing.\n\n---\n\n## Notes\n\n- This is a streaming transformer; memory usage stays small even for large dumps.\n- For CI/CD and production-like workflows, prefer the default fail-closed mode and avoid `--allow-noop` unless a no-op run is intentional.\n- For best results, configure strategies compatible with column data types. If you hash an integer column, Dumpling will render a string; most databases can coerce this, but explicit `as_string = false` may help in some cases.\n- For length-restricted text columns (`varchar(n)`, `character varying(n)`, `char(n)`, `character(n)`), Dumpling reads `CREATE TABLE` definitions and truncates generated text values to fit within the declared limit.\n- Deterministic anonymization for tests: pass `--seed \u003cu64\u003e` or set env `DUMPLING_SEED` to make fuzz strategies reproducible across runs. 
Note: `--seed` has no effect in `--security-profile hardened`.\n- Domain mappings (`domain = \"...\"`) are deterministic by source value + domain (+ optional salt), so referential joins stay stable across tables within the same dump.\n\n---\n\n## Full documentation\n\nDetailed docs, including the configuration reference and release process, are available at the project's [GitHub Pages site](https://ababic.github.io/dumpling/) (built from `docs/src/`).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fababic%2Fdumpling","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fababic%2Fdumpling","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fababic%2Fdumpling/lists"}