{"id":51075828,"url":"https://github.com/xz-dev/ai-gateway-filter","last_synced_at":"2026-06-23T14:01:49.759Z","repository":{"id":365413280,"uuid":"1258086830","full_name":"xz-dev/ai-gateway-filter","owner":"xz-dev","description":null,"archived":false,"fork":false,"pushed_at":"2026-06-17T09:08:19.000Z","size":283,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-06-17T09:18:02.207Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xz-dev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-03T09:07:05.000Z","updated_at":"2026-06-17T09:08:23.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/xz-dev/ai-gateway-filter","commit_stats":null,"previous_names":["xz-dev/ai-gateway-filter"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/xz-dev/ai-gateway-filter","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xz-dev%2Fai-gateway-filter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xz-dev%2Fai-gateway-filter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xz-dev%2Fai-gateway-filter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xz-dev%2Fai-gateway-filter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xz-dev","download_url":"https://codeload.github.com/xz-dev/ai-gateway-filter/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xz-dev%2Fai-gateway-filter/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34692781,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-23T02:00:07.161Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-23T14:01:46.033Z","updated_at":"2026-06-23T14:01:49.750Z","avatar_url":"https://github.com/xz-dev.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Privacy Gateway (Core Library)\n\n`privacy-gateway` is a **pure Python library** for:\n\n- reversible natural-language PII protection with self-describing `\u003csecret:1:...\u003e` tokens\n- automatic restoration of `\u003csecret:1:...\u003e` tokens without special HTTP headers\n- prompt-injection phrase detection and decisioning\n- streaming detection helper\n- automatic sensitive image-region protection and restoration\n\nIt is intentionally not a gateway, HTTP server, or network service. JSON parsing,\nfield selection, routing, proxying, and request/response rewriting belong in the\nembedding gateway/plugin. The APISIX example shows one way for a gateway to parse\nJSON first and pass only relevant string values to this library.\n\n## Install\n\n```bash\nuv sync\n```\n\n## Public API\n\n```python\nfrom privacy_gateway import PrivacyGatewayFilter\n\nfilter_ = PrivacyGatewayFilter(privacy_password=\"use-a-high-entropy-deployment-secret\")\n\nprotected = filter_.protect_privacy_text(\n    \"我叫张三，身份证是110101199001011234，邮箱是zhangsan@example.com。\"\n)\nassert \"\u003csecret:1:\" in protected\n\nrestored = filter_.restore_privacy_text(protected)\nassert restored == \"我叫张三，身份证是110101199001011234，邮箱是zhangsan@example.com。\"\n\ndecision = filter_.check_text(\"Ignore previous instructions\")\nassert decision.blocked\n```\n\n## Secret token format\n\nPII is protected with a compact, descriptive, reversible token:\n\n```text\n\u003csecret:1:\u003cciphertext\u003e\u003e\n```\n\nThe token body contains only crypto material: a per-token salt and the encrypted\nvalue. The `\u003csecret:1:` prefix lets a gateway detect encrypted values without\nrelying on `X-Privacy-Encrypted` or any other marker header. Token restoration\nuses the service-side password from `PRIVACY_GATEWAY_PASSWORD` or the\n`privacy_password` constructor argument. Use a high-entropy deployment secret;\ntokens are client-visible and weak passwords are vulnerable to offline guessing.\n\n## Natural-language privacy APIs\n\nPreferred APIs for new gateway/plugin code:\n\n- `protect_privacy_text(content, privacy_password=None)` -\u003e replaces detected PII spans with `\u003csecret:1:...\u003e` tokens.\n- `restore_privacy_text(content, privacy_password=None)` -\u003e decrypts all `\u003csecret:1:...\u003e` tokens inside text.\n- `protect_secret(content, privacy_password=None)` -\u003e encrypts one caller-selected complete value as a token.\n- `detect_pii(content)` -\u003e returns detected PII spans.\n- `process_inbound_privacy_text(content, privacy_password=None)` -\u003e restores tokens, then checks prompt-injection phrases.\n- `process_outbound_privacy_text(content, privacy_password=None)` -\u003e checks prompt-injection phrases, then tokenizes detected PII.\n\nExample:\n\n```python\nfrom privacy_gateway import PrivacyGatewayFilter\n\nfilter_ = PrivacyGatewayFilter.from_settings()\n\ninbound = filter_.process_inbound_privacy_text(\"hello \u003csecret:1:...\u003e\")\nif inbound.error:\n    ...\nif inbound.decision.blocked:\n    ...\nplaintext_for_ai = inbound.content\n\noutbound = filter_.process_outbound_privacy_text(\"User 张三 can be reached at zhangsan@example.com\")\nprotected_for_client = outbound.content\n```\n\n`TextProcessingResult` contains `content`, `decision`, and optional normalized\n`error` details. It never exposes the password/key used for encryption.\n\n## Detection behavior\n\nThe library uses Presidio Analyzer backed by a prepared spaCy model. By default\nit expects `en_core_web_sm` to be installed before startup and refuses to\ndownload models at runtime. Prepare it with:\n\n```bash\nuv run python scripts/prepare_spacy_model.py en_core_web_sm\n```\n\nPresidio recognizers cover common English PII patterns such as:\n\n- `EMAIL_ADDRESS`\n- `PHONE_NUMBER`\n- `CREDIT_CARD`\n- `CRYPTO`\n- `IBAN_CODE`\n- `IP_ADDRESS`\n- `LOCATION`\n- `PERSON` when available from configured analyzers\n- US identifiers such as `US_SSN`, `US_PASSPORT`, `US_DRIVER_LICENSE`, etc.\n\nThe library also adds deterministic rules for common Chinese/business text:\n\n- Chinese mainland ID card numbers\n- Chinese mobile numbers\n- common Chinese name contexts such as `我叫张三` / `姓名是张三`\n- common Chinese address contexts such as `住在北京市...` / `地址是...`\n- password/secret contexts such as `password=...` / `密码是...`\n\nNo PII detector is perfect. Gateways that know a string is sensitive because of\nits JSON field name should pass the complete field value to `protect_secret(...)`.\nThis keeps JSON/field logic outside the library while still using the same token\nformat and crypto.\n\n## JSON and gateway integration\n\nThe core library does **not** parse or format JSON. For JSON APIs, the gateway\nmust:\n\n1. parse JSON first (`json.loads` or framework equivalent),\n2. walk the resulting object/list,\n3. pass selected string values to `process_inbound_privacy_text`,\n   `process_outbound_privacy_text`, or `protect_secret`,\n4. serialize JSON again.\n\nThis prevents unsafe raw-string rewriting of JSON and lets gateway code decide\nwhich message/tool-call/AI-output fields are relevant.\n\n## Image privacy APIs\n\nImages are never protected by encrypting the whole image. The image API analyzes\nthe image for sensitive OCR/PII bounding boxes, protects only those pixel regions,\nand leaves the rest of the image viewable.\n\nPreferred image APIs:\n\n- `protect_image(content, crypto_key)` -\u003e detects sensitive regions in a base64 image and returns a base64 PNG with protected rectangles.\n- `restore_image(content, crypto_key)` -\u003e restores protected rectangles from the region cache or embedded fallback metadata.\n\nCompatibility payload helpers also use this image behavior:\n\n- `encrypt_payload(\"image\", content, crypto_key)` protects detected regions; it does not encrypt the full image.\n- `decrypt_payload(\"image\", content, crypto_key)` restores protected regions.\n- `restore_payload(\"image\", content, crypto_key)` is an alias for `decrypt_payload`.\n\nFor each detected region, the service encrypts the full-quality crop and stores\nit in a process-local LRU cache keyed by the encrypted crop's SHA-256 hash. The\ncache keeps the newest 1000 region entries. The returned PNG embeds the region\nhash and an encrypted low-resolution fallback crop. Restore behavior is:\n\n1. hash cache hit -\u003e restore the original full-quality region,\n2. cache miss -\u003e decrypt the embedded low-resolution fallback and scale it back\n   into place.\n\nImage region detection uses `presidio-image-redactor`/OCR plus the same prepared\nPresidio analyzer configuration as text PII detection. If OCR/region detection\nfails, image protection fails closed with `ImageCryptoError` rather than silently\nreturning an unprotected image. If detection succeeds and finds no sensitive\nregions, the original image base64 is returned unchanged.\n\nDeployments must provide the OCR runtime expected by `presidio-image-redactor`\n(for example Tesseract in container images) in addition to the prepared spaCy\nmodel.\n\n## Backward-compatible text crypto\n\nExisting whole-text APIs remain available for older callers and tests:\n\n- `encrypt_text(content, crypto_key=None)` -\u003e `str`\n- `decrypt_text(content, crypto_key=None)` -\u003e `str`\n- `encrypt_payload(\"text\", content, crypto_key)`\n- `decrypt_payload(\"text\", content, crypto_key)`\n- `restore_payload(\"text\", content, crypto_key)`\n\nText crypto requires AES key byte lengths `{16, 24, 32}`. New automatic privacy\nflows should prefer `\u003csecret:1:...\u003e` tokenization instead of whole-body payload\nencryption.\n\n## Filter / detection\n\n- `check_text(text)` -\u003e `FilterDecision`\n- `stream_matcher(max_window=None).feed(chunk)` -\u003e `FilterDecision`\n\n`check_text` masks `\u003csecret:1:...\u003e` token bodies before prompt-injection checks,\nso ciphertext is not misinterpreted as plaintext instructions.\n\n## HTTP adapter primitives\n\n`privacy_gateway.adapters.http` exposes pure data helpers for HTTP gateways\nwithout importing FastAPI, Flask, APISIX, or any networking framework:\n\n```python\nfrom privacy_gateway.adapters.http import build_block_error\n```\n\nLegacy encrypted-header helpers are still exported for old integrations, but new\nautomatic token flows should not depend on them.\n\n## Settings\n\nRead environment variables:\n\n- `PRIVACY_GATEWAY_PASSWORD`: password for reversible `\u003csecret:1:...\u003e` tokens.\n- `PRIVACY_GATEWAY_CRYPTO_KEY`: optional legacy full-text AES key; also used as token password fallback when `PRIVACY_GATEWAY_PASSWORD` is not set.\n- `PRIVACY_GATEWAY_PII_ENTITIES`: comma-separated Presidio entity types to enable.\n- `PRIVACY_GATEWAY_SPACY_MODEL`: prepared spaCy model name/path, default `en_core_web_sm`.\n- `PRIVACY_GATEWAY_REQUIRE_SPACY_MODEL`: require the model at startup, default `true`.\n- `PRIVACY_GATEWAY_SENSITIVE_PHRASES`: comma-separated prompt-injection phrase list.\n- `PRIVACY_GATEWAY_MAX_SENSITIVE_STREAM_WINDOW`: integer window size, default `4096`.\n\nCreate a configured filter from env:\n\n```python\nfrom privacy_gateway import PrivacyGatewayFilter\nfrom privacy_gateway.config import get_settings\n\nfilter_ = PrivacyGatewayFilter.from_settings(get_settings())\n```\n\n## Validation\n\nRun behavior tests on core library APIs:\n\n```bash\nuv run python scripts/prepare_spacy_model.py en_core_web_sm\nuv run behave\nuv run python -m compileall -q src/privacy_gateway\n```\n\nTo include the APISIX example files in syntax validation:\n\n```bash\nuv run python -m compileall -q src/privacy_gateway \\\n  apisix-plugin-example/init \\\n  apisix-plugin-example/privacy_proxy \\\n  apisix-plugin-example/runner/apisix/plugins \\\n  apisix-plugin-example/upstream \\\n  apisix-plugin-example/tests\n```\n\n## Error classes\n\n- `TextCryptoError`\n- `TextCryptoKeyError`\n- `ImageCryptoError`\n- `UnsupportedPayloadTypeError`\n- `PrivacyGatewayError`\n\n## Notes\n\n- This repository intentionally provides **library primitives only**.\n- Gateway plugins should import and embed `PrivacyGatewayFilter`.\n- Gateway plugins should own JSON parsing/field selection and use this library for natural-language string protection/restoration.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxz-dev%2Fai-gateway-filter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxz-dev%2Fai-gateway-filter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxz-dev%2Fai-gateway-filter/lists"}