https://github.com/mathu97/dbveil
A local read-only, PII-redacting proxy that lets AI agents query your database safely.
https://github.com/mathu97/dbveil
claude-code pii pii-redaction postgresql safety sql
Last synced: about 5 hours ago
JSON representation
A local read-only, PII-redacting proxy that lets AI agents query your database safely.
- Host: GitHub
- URL: https://github.com/mathu97/dbveil
- Owner: mathu97
- License: mit
- Created: 2026-06-12T01:27:48.000Z (1 day ago)
- Default Branch: main
- Last Pushed: 2026-06-12T02:37:05.000Z (1 day ago)
- Last Synced: 2026-06-12T04:15:15.794Z (1 day ago)
- Topics: claude-code, pii, pii-redaction, postgresql, safety, sql
- Language: Python
- Homepage:
- Size: 25.4 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# veil
**A local read-only, PII-redacting proxy that lets AI agents query your database safely.**
Point Claude Code (or any MCP client) at `veil` instead of your database. Every query is
forced through three deterministic guarantees before a single row reaches the model:
1. **Read-only guard** — the query is parsed with Postgres's real grammar (`libpg_query`).
Only `SELECT` / `SHOW` / `EXPLAIN` survive. Writes, DDL, multi-statements, data-modifying
CTEs, `SELECT INTO`, and row locks are rejected *before execution* — not by asking the model
nicely, by refusing to run them.
2. **PII redaction** — results are scrubbed before they leave your machine: deterministic
column rules + always-on regex for structured PII (emails, phones, cards, SSNs), with an
optional NER/LLM backstop for free-text.
3. **Audit** — every query and verdict is appended to a log you can tail live in a TUI.
A guarded chokepoint in front of the DB, shrunk to a single open-source command with zero
infrastructure to stand up.
```
Claude Code ──MCP──▶ veil ──READ ONLY txn──▶ your database
│
├─ guard: parse → allow SELECT only
├─ redact: column rules + regex + (optional) NER/LLM
└─ audit: veil-audit.jsonl
```
## Why
You want an agent to act as a data analyst over real tables — "compare what we drafted vs what
was actually sent" — without (a) risking a destructive query or (b) shipping customer PII to a
model provider. Handing an agent raw DB credentials and hoping it only writes `SELECT` is not a
control. `veil` makes the unsafe paths impossible at the layer the agent can't talk its way past.
## Install
```bash
pip install dbveil # or: uv pip install dbveil
# optional extras:
pip install 'dbveil[ner]' # Presidio NER backstop for names/addresses
pip install 'dbveil[llm]' # local-LLM (Ollama) redaction
```
## Quickstart
```bash
veil init # interactive: pick a secret source (1Password / URL / env) + auto-detect PII
veil doctor # verify guard, connectivity, and that READ ONLY actually blocks writes
veil test-query "SELECT email, created_at FROM users LIMIT 5" # try it without an agent
veil up # run the MCP proxy on stdio (what Claude Code connects to)
```
Try a write to see the guard refuse it:
```bash
veil test-query "DELETE FROM users"
# BLOCKED — write or DDL operation detected: DELETE
```
### Connect Claude Code
```bash
claude mcp add veil -- veil up
```
or commit a `.mcp.json` so your whole team gets it:
```json
{
"mcpServers": {
"veil": { "command": "veil", "args": ["up"], "env": { "VEIL_CONFIG": "veil.yaml" } }
}
}
```
Now the agent has these tools — `query`, `list_tables`, `describe_table`, `list_databases` —
and physically cannot write or see raw PII.
## Databases & secret resolution
You never put a DSN (or a secret) directly in the file. Each database's `url` is a *reference*
resolved at connect time, chosen by its scheme:
| Reference | Resolved by |
|---|---|
| `op://vault/item/field` | **1Password** CLI (`op read`) — uses your existing 1Password auth; `OP_SERVICE_ACCOUNT_TOKEN` in CI |
| `env://VAR_NAME` | environment variable |
| `${VAR}` | inline env expansion |
| `postgresql://…` | a literal DSN |
Configure one or many named instances; the agent picks one per query (`query(sql, database="prod")`),
or the CLI with `--db`:
```yaml
databases:
staging: { url: "op://Engineering/veil-staging-db/dsn" }
prod: { url: "op://Engineering/veil-prod-db/dsn" }
default: staging
```
```bash
veil instances list # show configured DBs and how each resolves (no secrets read)
veil instances add # add another database instance (interactive)
veil doctor --db prod # verify a specific instance
veil test-query --db prod "…" # query a specific instance
```
A single `database.url:` still works and becomes the lone `default` instance.
`veil init` sets up a single database interactively: it asks how to source the connection
string — **browse 1Password** (lists your vaults → items → fields and writes the `op://`
reference), paste a URL/reference, or name an env var. Choosing 1Password requires the `op` CLI
(≥ 2.0) installed and signed in; if it isn't, init tells you exactly how to fix it and lets you
paste a reference instead. To run against multiple environments, add a `databases:` map by hand
(see above) — init configures one.
If your config uses 1Password, `veil setup` checks that the `op` CLI is connected and prints the
exact desktop-app steps to enable if it isn't — without triggering Touch ID, so it's safe to run
after install. When `op` is set up but locked, veil tells you to unlock it (`op whoami`) rather
than to re-configure.
### Watch it live
```bash
veil monitor # TUI tailing veil-audit.jsonl: allowed / blocked / redaction counts
```
### Debugging
Add `--debug` (or set `VEIL_DEBUG=1`) to print logs to stderr — config loading, secret
resolution, the exact `op` commands run and their errors, guard decisions, and row/redaction
counts. Secrets are never logged (DSNs are shown host-only, `op read` output as a byte count).
```bash
veil --debug doctor
veil --debug test-query "SELECT 1"
VEIL_DEBUG=1 veil up # logs go to stderr, safe alongside the MCP stdout protocol
```
## Configuration
`veil init` writes a commented `veil.yaml`. Full reference in
[`examples/veil.example.yaml`](examples/veil.example.yaml). The essentials:
```yaml
database:
url: ${DATABASE_URL} # env refs kept out of the file
guard:
allow_select_star: false # block SELECT * on PII tables; force explicit columns
max_rows: 1000
statement_timeout_ms: 15000
pii_tables: [contacts, users]
redact:
builtin_patterns: { email: true, phone: true, credit_card: true, ssn: true, ip: false }
columns:
- { column: email, strategy: hash } # sha256, still join-able
- { column: full_name, strategy: mask } # -> [redacted]
- { column: ssn, strategy: partial, keep: 4 }
ner: { enabled: false, engine: presidio } # optional backstop
```
## How redaction is layered (and its honest limits)
`veil` defends from the **deterministic** side first, because that's the only kind you can trust
not to leak:
| Layer | What it catches | Deterministic? |
|---|---|---|
| **Column rules** | Known PII columns (`email`, `ssn`, …) by name | ✅ yes |
| **Built-in regex** | Emails, phones, Luhn-valid cards, SSNs, IPs — even aliased or in free-text | ✅ yes |
| **NER (Presidio)** | Names / addresses in free-text the above miss | ⚠️ probabilistic |
| **LLM (Ollama)** | Same, via a local model | ⚠️ probabilistic, experimental |
**Use the probabilistic layers only as a backstop.** ML/NER *will* eventually miss a name or an
oddly-formatted address — that's a leak. For columns you already know are sensitive, the column
rules are the real control. The LLM redactor fails *closed*: if the model errors, the cell is
masked, never passed through.
## Security model
- **Two independent read-only layers.** The parser rejects non-reads, *and* every query runs
inside a `SET TRANSACTION READ ONLY` transaction — so even a parser gap can't write.
- **Give veil a least-privilege credential.** Best practice is a `GRANT SELECT`-only database
role (ideally on a read replica). Then "read-only" is enforced by the database itself, and the
credential `veil` holds is low-blast-radius: a leak exposes already-masked reads and can write
nothing. `veil doctor` confirms the READ ONLY transaction rejects writes against your DB.
- **PII never leaves your machine unmasked.** Redaction happens in-process, before results are
serialized to the MCP client.
## Secure connectivity
`veil` connects to whatever DSN you give it, so the network path is yours to choose:
- **Tailscale** — put your DB behind a tailnet and point `database.url` at the tailnet host. No
public DB port.
- **Short-lived credentials** — `${DATABASE_URL}` is expanded at load, so you can inject an
ephemeral token (RDS IAM auth, Cloud SQL IAM, a Vault dynamic user) instead of a static
password.
- **Railway / managed PaaS** — use the provided TLS endpoint with a dedicated read-only role.
## Development
```bash
uv venv && source .venv/bin/activate
uv pip install -e '.[dev]'
pytest
```
## License
MIT