{"id":49673421,"url":"https://github.com/screenpipe/privacy-filter","last_synced_at":"2026-05-12T00:02:01.496Z","repository":{"id":353236655,"uuid":"1218498703","full_name":"screenpipe/privacy-filter","owner":"screenpipe","description":"PII token classifier (openai/privacy-filter) for Tinfoil confidential-compute deploys","archived":false,"fork":false,"pushed_at":"2026-05-07T00:38:44.000Z","size":56,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-07T01:27:18.939Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/screenpipe.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-23T00:03:36.000Z","updated_at":"2026-05-07T00:38:47.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/screenpipe/privacy-filter","commit_stats":null,"previous_names":["screenpipe/privacy-filter"],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/screenpipe/privacy-filter","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/screenpipe%2Fprivacy-filter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/screenpipe%2Fprivacy-filter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/screenpipe%2Fprivacy-filter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/screenpipe%2Fprivacy-filter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/screenpipe","download_url":"https://codeload.github.com/screenpipe/privacy-filter/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/screenpipe%2Fprivacy-filter/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32917885,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-11T17:09:15.040Z","status":"ssl_error","status_checked_at":"2026-05-11T17:08:45.420Z","response_time":120,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-05-07T01:15:00.590Z","updated_at":"2026-05-12T00:02:01.491Z","avatar_url":"https://github.com/screenpipe.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# privacy-filter\n\nCPU-only HTTP wrapper around two PII models, in one Tinfoil container:\n\n1. [`openai/privacy-filter`](https://huggingface.co/openai/privacy-filter) — 1.5B-param MoE (50M active) token classifier for **text PII**. Endpoint `POST /filter`.\n2. [`screenpipe/pii-image-redactor`](https://huggingface.co/screenpipe/pii-image-redactor) (`rfdetr_v8`) — RF-DETR-Nano detector for **image PII** in screenshots. Endpoint `POST /image/detect`.\n\nBoth deploy inside the same [Tinfoil](https://tinfoil.sh) confidential-compute enclave so neither pixels nor text leave an attested runtime. Intended to sit in front of screenpipe's outbound LLM calls (text path) and behind screenpipe's image-PII reconciliation worker (image path) so user data is masked before it reaches anywhere else.\n\nOne container, one image hash, one attestation measurement, one client config (one URL, one auth token).\n\n## API\n\n```\nGET  /health         → {\"status\": \"ok\", \"model_ready\": true, \"image_model_ready\": true, ...}\n\nPOST /filter         → {\"text\": \"My email is alice@foo.com\"}\n                    ←  {\"redacted\": \"My email is [EMAIL]\",\n                        \"spans\": [{\"label\": \"private_email\", \"start\": 12, \"end\": 25,\n                                   \"text\": \"alice@foo.com\", \"score\": 0.99}],\n                        \"latency_ms\": 180,\n                        \"model\": \"openai/privacy-filter\"}\n\nPOST /image/detect   → {\"image_b64\": \"\u003cb64-jpg-or-png\u003e\", \"threshold\": 0.30}\n                    ←  {\"detections\": [{\"bbox\": [x, y, w, h], \"label\": \"private_person\", \"score\": 0.95},\n                                       {\"bbox\": [x, y, w, h], \"label\": \"secret\",        \"score\": 0.91}],\n                        \"latency_ms\": 32, \"model\": \"rfdetr_v8\",\n                        \"width\": 2880, \"height\": 1800}\n```\n\nBbox is `[x, y, w, h]` in ORIGINAL-image pixel space (the server un-resizes from its 320×320 internal input). Labels are the canonical 12-class screenpipe PII taxonomy.\n\n## Local development\n\n```bash\n# build\ndocker build -t privacy-filter:dev .\n\n# run\ndocker run --rm -p 8080:8080 privacy-filter:dev\n\n# smoke test\ncurl -s http://localhost:8080/health\ncurl -s -X POST http://localhost:8080/filter \\\n     -H 'Content-Type: application/json' \\\n     -d '{\"text\":\"Call Alice at +1 415 555 0100 about alice@example.com\"}' | jq\n\n# Image: send a JPG/PNG as base64.\nB64=$(base64 -i some_screenshot.png)\ncurl -s -X POST http://localhost:8080/image/detect \\\n     -H 'Content-Type: application/json' \\\n     -d \"$(jq -nc --arg img \"$B64\" '{image_b64: $img, threshold: 0.30}')\" | jq\n```\n\nFirst build pre-downloads the 1.5B text model (~3 GB bf16) AND the 108 MB rfdetr_v8 ONNX into the image, so expect a 5–10 min initial build. Subsequent builds hit Docker's layer cache. The image-model `ADD --checksum=` directive verifies the SHA-256 against the value pinned in the `Dockerfile` so a rebuild can't silently drift to a different upstream weight.\n\n## Deploy to Tinfoil\n\n1. **Push the image to a public registry** (GitHub Container Registry):\n\n   ```bash\n   VERSION=v0.1.0\n   docker build -t ghcr.io/screenpipe/privacy-filter:$VERSION .\n   docker push ghcr.io/screenpipe/privacy-filter:$VERSION\n\n   # Grab the digest for tinfoil-config.yml (Tinfoil requires pinned digests).\n   docker inspect --format='{{index .RepoDigests 0}}' \\\n     ghcr.io/screenpipe/privacy-filter:$VERSION\n   ```\n\n2. **Pin the digest** in `tinfoil-config.yml` — replace the `REPLACE_WITH_DIGEST`\n   sentinel with the full `sha256:...` from the previous step. Commit and tag:\n\n   ```bash\n   git add tinfoil-config.yml\n   git commit -m \"release: privacy-filter $VERSION\"\n   git tag $VERSION \u0026\u0026 git push origin main --tags\n   ```\n\n3. **Click-through in the Tinfoil dashboard** (https://dash.tinfoil.sh):\n   - create an org (if you haven't already)\n   - connect the GitHub app to this repo\n   - pick the tag to deploy\n   - wait for status = `Running` (cold start ~30–60 s for the first model load)\n\n4. **Verify** — the service is now reachable at\n   `https://privacy-filter.\u003corg\u003e.containers.tinfoil.dev/health`.\n\n## Resource sizing (GPU)\n\n**Text model (OPF) — BF16 on CUDA:**\n\n| Metric | Value |\n|---|---|\n| Weights (BF16) | ~3 GB VRAM |\n| Active params per token | 50 M (MoE top-4 of 128 experts) |\n| Attention window | 257 tokens (banded, O(N)) |\n| H100 latency (512 tokens) | ~50–100 ms |\n| H100 latency (~2 KB OCR row, 600 tok) | ~150–300 ms |\n\n**Image model (rfdetr_v8) — TensorRT/CUDA EP:**\n\n| Metric | Value |\n|---|---|\n| Weights (FP32 ONNX) | 108 MB |\n| Params | ~25 M |\n| Input resolution | 320×320 |\n| H100 latency (per frame) | ~25–35 ms |\n| Bench accuracy | 95.3% zero-leak / 0% oversmash on screenpipe-pii-bench-image val |\n\n**Combined runtime** (32 GB CVM RAM + 80 GB H100 VRAM): ~5 GB GPU working set, ~30 req/sec short-text or ~30 frames/sec image, more if interleaved.\n\nThe CPU-only build (preserved on git history before v0.2.0) was the original deploy target — fits 8 vCPU / 32 GB RAM but at ~10× the latency. Switch back if cost dominates and the worker queue isn't backing up.\n\n## Security properties\n\n- **Tinfoil remote attestation** covers the exact image digest, so clients can verify the specific model bits + server code that handled their request.\n- **Only `/health`, `/filter`, and `/image/detect` are exposed** — Tinfoil's `shim.paths` allowlist blocks every other URL at the enclave boundary, so no introspection / debug endpoints can leak.\n- **Model weights are baked in.** Text model: build-time HF download then `TRANSFORMERS_OFFLINE=1`. Image model: build-time HF download with SHA-256 verification via Docker's `ADD --checksum=`. No runtime HuggingFace calls, so an attacker who subverts DNS can't swap weights out from under the enclave.\n- **Runs as UID 10001 non-root** inside the container.\n\n## Limitations\n\n- **Text:** English-primary. Multilingual coverage per upstream model card varies.\n- **Text:** 128K context upstream; we cap at `MAX_INPUT_TOKENS=8192` (override via env) to keep enclave memory bounded.\n- **Image:** the rfdetr_v8 model was trained on screenpipe-pii-bench-image, which covers Slack / Outlook / Cursor / Terminal / Confluence / GitHub / 1Password / calendars / browsers. Apps with very different UI chrome (e.g. Zoom name overlays on video tiles) are not yet learned — failures should be added back into the bench's synthetic templates rather than fine-tuned on real captures.\n- **Image:** payload cap `MAX_IMAGE_BYTES=20 MB` (override via env). Decoded RGB working buffer is ~3× the payload.\n- Not a compliance certification. One layer in a privacy-by-design stack.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscreenpipe%2Fprivacy-filter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fscreenpipe%2Fprivacy-filter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscreenpipe%2Fprivacy-filter/lists"}