{"id":51012627,"url":"https://github.com/matinfo/pii-airlock","last_synced_at":"2026-06-21T05:30:29.659Z","repository":{"id":366209510,"uuid":"1275431644","full_name":"matinfo/pii-airlock","owner":"matinfo","description":"Keep real PII out of your AI tools — local, reversible PII scrubbing via CLI, a universal provider gateway, or Claude Code hooks. Built on Microsoft Presidio.","archived":false,"fork":false,"pushed_at":"2026-06-20T21:58:45.000Z","size":129,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-20T22:13:14.950Z","etag":null,"topics":["ai-agents","anonymization","claude-code","gateway","llm","pii","presidio","privacy","security","spacy"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/pii-airlock/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/matinfo.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-06-20T17:19:52.000Z","updated_at":"2026-06-20T21:58:47.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/matinfo/pii-airlock","commit_stats":null,"previous_names":["matinfo/pii-scrub","matinfo/pii-airlock"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/matinfo/pii-airlock","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matinfo%2Fpii-airlock","tags_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matinfo%2Fpii-airlock/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matinfo%2Fpii-airlock/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matinfo%2Fpii-airlock/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/matinfo","download_url":"https://codeload.github.com/matinfo/pii-airlock/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matinfo%2Fpii-airlock/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34596046,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-21T02:00:05.568Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","anonymization","claude-code","gateway","llm","pii","presidio","privacy","security","spacy"],"created_at":"2026-06-21T05:30:29.135Z","updated_at":"2026-06-21T05:30:29.651Z","avatar_url":"https://github.com/matinfo.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pii-airlock\n\n\u003e **Keep real personal data out of AI prompts in minutes — local, reversible, and 
provider-agnostic.**\n\n[![CI](https://github.com/matinfo/pii-airlock/actions/workflows/ci.yml/badge.svg)](https://github.com/matinfo/pii-airlock/actions/workflows/ci.yml)\n[![PyPI](https://img.shields.io/pypi/v/pii-airlock.svg)](https://pypi.org/project/pii-airlock/)\n![Python](https://img.shields.io/badge/python-3.10%E2%80%933.14-blue.svg)\n![Platforms](https://img.shields.io/badge/platforms-macOS%20%7C%20Linux%20%7C%20Windows-lightgrey.svg)\n[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)\n[![License: MIT](https://img.shields.io/badge/license-MIT-yellow.svg)](LICENSE)\n\n**[2-min quickstart](#2-minute-quickstart) · [Install](#install) · [Gateway](#universal-gateway-any-provider) · [CLI](#cli-usage) · [doctor](#doctor-health-checks) · [Troubleshooting](#troubleshooting) · [All agents →](AGENTS.md)**\n\n`pii-airlock` is a local privacy layer for AI tools. It replaces personal data with placeholders before requests leave your machine, then restores originals in responses.\n\nExample:\n\n```text\nJohn Smith (john@acme.com) → \u003cPERSON_1\u003e (\u003cEMAIL_ADDRESS_1\u003e)\n```\n\nRuns on **macOS, Linux, Windows** (Python ≥ 3.10). Built on [Microsoft Presidio](https://microsoft.github.io/presidio/).\n\n---\n\n## Why people use pii-airlock\n\n| Problem | What pii-airlock gives you |\n|---|---|\n| You use multiple AI clients/providers | One local gateway URL for all of them |\n| You need reversible anonymization | Stable placeholders + local mapping file |\n| You use Claude Code tools/files | Native hooks for prompt + tool-data checks |\n| You want to avoid vendor lock-in | Provider adapters: OpenAI-style, Anthropic, Gemini |\n\nDetection happens locally. 
Gateway traffic is forwarded only after PII is replaced.\n\n---\n\n## 2-minute quickstart\n\nIf you just want the safest default for most users:\n\n```bash\npipx install \"pii-airlock[proxy]\"\npii-airlock init\npii-airlock proxy\n```\n\nThen set your client base URL:\n\n```bash\nexport OPENAI_BASE_URL=http://127.0.0.1:8745/openai\n```\n\nPowerShell:\n\n```powershell\n$env:OPENAI_BASE_URL = \"http://127.0.0.1:8745/openai\"\n```\n\nNow prompts are scrubbed before provider calls, and responses are restored automatically.\n\n---\n\n## Install\n\nRequires Python ≥ 3.10 and [pipx](https://pipx.pypa.io/) (recommended) or pip.\nWorks the same on macOS, Linux and Windows.\n\n```bash\npipx install \"pii-airlock[proxy]\"   # recommended (gateway included)\npii-airlock init                    # guided setup (models + env hints)\n```\n\nLatest from source (before a release lands on PyPI):\n\n```bash\npipx install git+https://github.com/matinfo/pii-airlock\n```\n\nOptional format support (plain text, CSV and JSON work out of the box):\n\n```bash\npipx inject pii-airlock 'pii-airlock[docx]'   # Word .docx\npipx inject pii-airlock 'pii-airlock[pdf]'    # PDF (text extraction)\npipx inject pii-airlock 'pii-airlock[all]'    # everything\n```\n\nIf you installed without gateway support and need it later:\n\n```bash\npipx inject pii-airlock 'pii-airlock[proxy]'\n```\n\nIf `download-models` reports that your interpreter has no `pip` (common in\n`pipx`), run the printed `pipx inject pii-airlock \"\u003cmodel-wheel-url\u003e\"` commands.\nThe command now prints exact model wheel URLs for you.\n\n### doctor health checks\n\n```bash\npii-airlock doctor\n```\n\nThis checks proxy dependencies, model installation, and mapping roundtrip in one command.\n\n### Quick verification (2 commands)\n\n```bash\necho \"Contact John Smith at john@example.com\" | pii-airlock scrub --map /tmp/test.pii-map.json\necho \"Replied to \u003cPERSON_1\u003e on \u003cEMAIL_ADDRESS_1\u003e.\" | pii-airlock restore 
--map /tmp/test.pii-map.json\n```\n\n---\n\n## Universal gateway (any provider)\n\nThe gateway is the easiest mode for non-technical users: point your AI client at\n`localhost`, and pii-airlock handles scrub/restore automatically.\n\n```\n  your app ──http──▶ pii-airlock gateway ──https──▶ provider API\n           ◀───────────  (restore)  ◀───────────  (scrub)\n```\n\n```bash\n# already included if you installed with \"pii-airlock[proxy]\"\n# pipx inject pii-airlock 'pii-airlock[proxy]'\npii-airlock proxy            # listens on http://127.0.0.1:8745\n```\n\nThen set the base URL in whatever client you use:\n\n```bash\n# OpenAI SDK / Codex CLI / most OpenAI-compatible tools\nexport OPENAI_BASE_URL=http://127.0.0.1:8745/openai\n\n# Anthropic SDK\nexport ANTHROPIC_BASE_URL=http://127.0.0.1:8745/anthropic\n\n# Google Gemini — use base path\n#   https://generativelanguage.googleapis.com  -\u003e  http://127.0.0.1:8745/gemini\n```\n\n**Why this is practical:** configure once, keep existing clients, and avoid changing prompt habits.\n\n- **No TLS interception.** Your client talks plain HTTP to `localhost`; the proxy makes\n  the real HTTPS call upstream. No certificates to install. 
Bind stays on `127.0.0.1` by default.\n- **Auth passes through** untouched and pii-airlock never logs request headers or bodies.\n  (uvicorn runs at log level `warning`, so request lines aren't logged either.)\n- A **fresh in-memory mapping per request**: with the default `mapping_backend: memory`, the gateway writes nothing to disk.\n- **Concurrency-safe:** detection is serialized with a lock, so the engine is shared\n  safely across simultaneous requests.\n\n**Provider coverage** — three wire formats, each with an adapter in `pii_scrub/payload.py`:\n\n| Route | Wire format | Covers | Streaming |\n|---|---|---|---|\n| `/openai` | OpenAI Chat Completions | OpenAI, Codex, Cursor, Continue, Ollama, LiteLLM, vLLM, … | SSE ✅ |\n| `/anthropic` | Anthropic Messages | Claude SDKs, Claude-compatible tools | SSE ✅ |\n| `/gemini` | Gemini generateContent | Google Gemini | SSE ✅ · array-stream buffered |\n\nNeed setup for a specific client? Use **[AGENTS.md](AGENTS.md)** (Claude Code, Codex, Cursor, Continue, Aider, Gemini).\n\n---\n\n## CLI usage\n\n### Reversible scrub → restore pipe\n\n```bash\n# 1. Scrub — replaces PII with tokens, saves a mapping file\necho \"Contacte Jean Dupont à jean@acme.fr\" \\\n  | pii-airlock scrub --map /tmp/m.pii-map.json\n# → Contacte \u003cPERSON_1\u003e à \u003cEMAIL_ADDRESS_1\u003e\n\n# 2. Send the scrubbed text to your LLM …\n\n# 3. 
Restore — swap tokens back in the model's response\necho \"J'ai répondu à \u003cPERSON_1\u003e via \u003cEMAIL_ADDRESS_1\u003e.\" \\\n  | pii-airlock restore --map /tmp/m.pii-map.json\n# → J'ai répondu à Jean Dupont via jean@acme.fr.\n```\n\nSame value always gets the same token (`\u003cPERSON_1\u003e`) so the model still sees coreference.\n\n### Detect without changing text\n\n```bash\npii-airlock detect notes.txt\n# PERSON           0.85  [9:21]    'Jean Dupont'\n# EMAIL_ADDRESS    0.99  [24:38]   'jean@acme.fr'\n```\n\n### File formats\n\n```bash\npii-airlock scrub report.csv  -o report.scrubbed.csv\npii-airlock scrub data.json   -o data.scrubbed.json\npii-airlock scrub contract.docx -o contract.scrubbed.docx   # requires [docx]\npii-airlock scrub scan.pdf                                   # requires [pdf] → text on stdout\n```\n\n### Other options\n\n```bash\npii-airlock scrub prompt.txt --no-map          # irreversible one-way scrub\npii-airlock scrub --lang fr,en                 # explicit language list\npii-airlock scrub --threshold 0.7              # raise confidence cutoff\npii-airlock scrub input.txt -o out.txt --map secrets.pii-map.json\n```\n\n---\n\n## Claude Code hooks\n\nRegister guardrails that intercept PII before it reaches the model:\n\n```bash\npii-airlock install-hook                  # both events, project .claude/settings.json\npii-airlock install-hook --scope user     # ~/.claude/settings.json (all projects)\npii-airlock install-hook --event tool     # PreToolUse only\npii-airlock install-hook --event prompt   # UserPromptSubmit only\n```\n\n| Leak vector | Covered by |\n|---|---|\n| PII you type in a prompt | `UserPromptSubmit` |\n| PII in a file Claude reads | `PreToolUse` |\n| PII in a shell command Claude runs | `PreToolUse` |\n\nThe hooks **detect and warn/ask** — they don't silently rewrite payloads (the hook API doesn't support in-place rewriting). 
For a silent, reversible rewrite, use the CLI pipe above.\n\nSet `hook_decision: deny` in config to block instead of asking.\n\n---\n\n## Configuration\n\nOverride chain (lowest → highest priority):\n\n```\nbundled defaults → ~/.config/pii-airlock/config.yaml → ./.pii-airlock.yaml → CLI flags\n```\n\nDefault config (`config.default.yaml`):\n\n```yaml\nlanguages: [en, fr]\nmodels:\n  en: en_core_web_lg\n  fr: fr_core_news_lg\nscore_threshold: 0.5\nentities: []          # empty = all entities Presidio recognizes\nhook_decision: ask    # ask (surface + confirm) | deny (block)\n\n# Advanced (proxy runtime):\nmapping_backend: memory   # memory (default) | file\n# mapping_dir: /var/lib/pii-airlock/maps\n```\n\nFor higher-volume/self-hosted setups, `mapping_backend: file` can be useful for\ndebugging or operational inspection. `memory` stays the safest default for local\nuse because mappings are ephemeral.\n\n### Adding a language\n\n```bash\npython -m spacy download de_core_news_lg\n```\n\n```yaml\n# .pii-airlock.yaml\nlanguages: [en, fr, de]\nmodels:\n  de: de_core_news_lg\n```\n\n---\n\n## Guarantees \u0026 limitations\n\n**What pii-airlock guarantees**\n\n- **Reversibility is exact.** Any value the engine tokenized is restored byte-for-byte\n  via the mapping. `restore(scrub(text))` round-trips for tokenized spans.\n- **Determinism.** The same value gets the same token within a mapping, so coreference\n  is preserved for the model.\n- **Tokens are opaque \u0026 safe.** Restored values are re-inserted through proper JSON\n  encoding; values containing quotes, newlines or `\u003c…\u003e` won't corrupt payloads.\n- **Local-only detection.** Detection never makes a network call. 
Only the gateway\n  forwards (already-scrubbed) traffic onward.\n\n**What it does *not* guarantee — read this**\n\n- **Detection is best-effort, not complete.** Presidio + spaCy are statistical; they\n  miss and mis-tag entities (more so with `_sm` models, or for phone numbers with odd\n  spacing). pii-airlock reduces exposure — it is **not** a guarantee that every piece of\n  PII is removed. Review sensitive material; lower `score_threshold` to catch more\n  candidates (at the cost of more false positives), or add custom recognizers as needed.\n- **Gateway scope.** Only generation endpoints are scrubbed (chat/messages/generateContent).\n  Embeddings and other endpoints pass through unchanged. Tokens are restored in message\n  **content**, not inside tool-call/function arguments a model may emit.\n- **Gemini array streaming** (without `alt=sse`) is buffered, then restored as one\n  response rather than streamed live.\n- **`.docx`** scrubbing rewrites changed paragraphs into a single run, so inline\n  formatting within those paragraphs is not preserved. **PDF** is extract-only → scrubbed\n  text out (no PDF re-render).\n\n---\n\n## Platform support\n\nTested in CI on **Linux, macOS and Windows** (Python 3.10–3.14 on Linux; 3.12–3.13 on\nmacOS/Windows). One platform nuance:\n\n- Mapping files are created with mode **`0600` on POSIX** (macOS/Linux). **On Windows**\n  `chmod` is a no-op; the file inherits your account's default ACLs — typically already\n  user-private in a home directory. Treat mapping files as secrets regardless (below).\n\n---\n\n## Troubleshooting\n\nRun this first for a quick diagnosis:\n\n```bash\npii-airlock doctor\n```\n\n**`.venv/bin/pytest: bad interpreter .../pii-scrub/...`**\n\nYour venv points to an old repo path. 
Recreate it:\n\n```bash\nrm -rf .venv\npython3 -m venv .venv\n.venv/bin/python -m pip install -U pip\n.venv/bin/python -m pip install -e '.[dev]'\n```\n\n**`No module named spacy` when running `python -m spacy ...`**\n\nYou are likely using system Python instead of the `pii-airlock` environment.\nUse:\n\n```bash\npii-airlock download-models\n```\n\n**`download-models` says this interpreter has no pip**\n\nThis is normal in many `pipx` environments. Run the exact `pipx inject` command\nprinted by `download-models` for each model wheel URL.\n\n**`pii-airlock proxy` fails with missing `httpx` / `starlette` / `uvicorn`**\n\nInstall gateway dependencies:\n\n```bash\npipx inject pii-airlock 'pii-airlock[proxy]'\n```\n\n**`OSError: [E050] Can't find model ...` after download**\n\nInstall the model wheel directly into the same environment where `pii-airlock`\nruns (`download-models` prints the exact wheel URL in this case).\n\n---\n\n## ⚠ Security: the mapping file holds real PII\n\n`*.pii-map.json` contains the **original personal data** in plain text.\n\n- Created owner-only (`0600` on POSIX; default user ACLs on Windows — see above).\n- The bundled `.gitignore` excludes `*.pii-map.json` and `*.pii-map.*`.\n- **Never commit mapping files.**\n- Delete them when you no longer need to restore.\n- Use `--no-map` when reversibility isn't required.\n- By default (`mapping_backend: memory`), the gateway keeps its mapping in memory only and discards it after each response.\n\n---\n\n## Development\n\n```bash\ngit clone https://github.com/matinfo/pii-airlock\ncd pii-airlock\npip install -e \".[dev]\"\nruff check .\npytest -q\n```\n\nUnit tests stub out Presidio/spaCy and run entirely offline. 
For a live end-to-end check, run `pii-airlock download-models` first, then:\n\n```bash\necho \"Call John Smith at john@example.com\" | pii-airlock scrub --map /tmp/test.pii-map.json\n```\n\n---\n\n## Community\n\n- 🧩 **[Integrate with your agent](AGENTS.md)** — Claude Code, Cursor, Codex, Gemini, Continue, Aider, …\n- ❓ **Need help?** Open a **[bug report](https://github.com/matinfo/pii-airlock/issues/new/choose)** with:\n  - `pii-airlock --version`\n  - your install method (`pipx` or `pip`)\n  - the exact command and full error output\n- 🤝 **[Contributing](CONTRIBUTING.md)** — adding a language or a provider adapter is a great first PR\n- 🔒 **[Security policy](SECURITY.md)** — responsible disclosure + the honest threat model\n- 📜 **[Changelog](CHANGELOG.md)** · **[Code of Conduct](CODE_OF_CONDUCT.md)**\n\nIf pii-airlock helps you keep PII out of your AI tools, a ⭐ helps others find it.\n\n---\n\n## License\n\n[MIT](LICENSE) © pii-airlock contributors\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmatinfo%2Fpii-airlock","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmatinfo%2Fpii-airlock","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmatinfo%2Fpii-airlock/lists"}