{"id":49963781,"url":"https://github.com/shiftbloom-studio/openai-privacy-filter-api","last_synced_at":"2026-05-18T03:38:21.719Z","repository":{"id":356204671,"uuid":"1231186669","full_name":"shiftbloom-studio/openai-privacy-filter-api","owner":"shiftbloom-studio","description":"A small, inspectable FastAPI service and Next.js sandbox for running openai/privacy-filter. It detects privacy-related spans in text, applies configurable redaction, and exposes a minimal API that can be deployed behind a server-side web proxy.","archived":false,"fork":false,"pushed_at":"2026-05-07T03:16:45.000Z","size":1194,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-07T04:37:48.300Z","etag":null,"topics":["aisafety","classification","llm","openai","privacy"],"latest_commit_sha":null,"homepage":"https://privacy.shiftbloom.studio/","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shiftbloom-studio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-06T18:03:39.000Z","updated_at":"2026-05-07T03:16:15.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/shiftbloom-studio/openai-privacy-filter-api","commit_stats":null,"previous_names":["shiftbloom-studio/openai-privacy-filter-api"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/shiftbloom-studio/openai-privacy-filter-api","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shiftbloom-studio%2Fopenai-privacy-filter-api","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shiftbloom-studio%2Fopenai-privacy-filter-api/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shiftbloom-studio%2Fopenai-privacy-filter-api/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shiftbloom-studio%2Fopenai-privacy-filter-api/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shiftbloom-studio","download_url":"https://codeload.github.com/shiftbloom-studio/openai-privacy-filter-api/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shiftbloom-studio%2Fopenai-privacy-filter-api/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33163847,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-17T22:39:12.733Z","status":"online","status_checked_at":"2026-05-18T02:00:06.436Z","response_time":71,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aisafety","classification","llm","openai","privacy"],"created_at":"2026-05-18T03:38:17.863Z","updated_at":"2026-05-18T03:38:21.699Z","avatar_url":"https://github.com/shiftbloom-studio.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OpenAI Privacy Filter API\n\n[![CI](https://github.com/shiftbloom-studio/openai-privacy-filter-api/actions/workflows/ci.yml/badge.svg)](https://github.com/shiftbloom-studio/openai-privacy-filter-api/actions/workflows/ci.yml)\n[![Deploy AWS](https://github.com/shiftbloom-studio/openai-privacy-filter-api/actions/workflows/deploy-aws.yml/badge.svg)](https://github.com/shiftbloom-studio/openai-privacy-filter-api/actions/workflows/deploy-aws.yml)\n[![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](LICENSE)\n\nA small, inspectable FastAPI service and Next.js sandbox for running\n[`openai/privacy-filter`](https://huggingface.co/openai/privacy-filter). It detects privacy-related\nspans in text, applies configurable redaction, and exposes a minimal API that can be deployed behind\na server-side web proxy.\n\nThis project is intentionally narrow: it is a testable wrapper around a privacy-detection model, not\na production policy engine, classifier benchmark, or complete data governance system.\n\n## Features\n\n- FastAPI API with `/health` and `/v1/filter`.\n- Next.js sandbox UI for testing sample text and inspecting detected spans.\n- Redaction modes: `mask`, `remove`, and `annotate`.\n- Optional internal-token protection between the web proxy and API.\n- Docker images for the API and web app.\n- AWS App Runner deployment workflow with path-aware API/web redeploys.\n- Unit tests for schemas, redaction, API behavior, Lambda adapter, web proxy, and UI behavior.\n- Model files are kept outside source control and can be restored from deployment artifacts.\n\n## Architecture\n\n```mermaid\nflowchart LR\n  User[\"Browser\"] --\u003e Web[\"Next.js sandbox\"]\n  Web --\u003e Proxy[\"/api/filter server route\"]\n  Proxy --\u003e API[\"FastAPI /v1/filter\"]\n  API --\u003e Model[\"openai/privacy-filter\"]\n  API --\u003e Redaction[\"Redaction logic\"]\n  Redaction --\u003e API\n  API --\u003e Proxy\n  Proxy --\u003e Web\n```\n\nThe browser never calls the model API directly. The Next.js app proxies requests through its\nserver-side route, optionally adding `PRIVACY_FILTER_INTERNAL_TOKEN` so the API can reject direct\npublic traffic.\n\n## Repository Layout\n\n```text\napps/\n  api/      FastAPI service, Lambda adapter, redaction logic, and tests\n  web/      Next.js App Router sandbox, API proxy, and tests\ninfra/\n  docker/   API and web Dockerfiles\ndocs/       API contract and deployment notes\n```\n\n## Requirements\n\n- Python 3.11 or newer\n- Node.js 24 or newer\n- npm\n- Docker, optional but recommended for deployment parity\n- AWS CLI, only for managing the included Shiftbloom AWS deployment\n\nThe real model runtime requires `torch` and `transformers`. Unit tests do not download or load the\nmodel unless the explicit real-model smoke test is enabled.\n\n## Quick Start\n\nClone the repo and copy the example environment:\n\n```bash\ngit clone https://github.com/shiftbloom-studio/openai-privacy-filter-api.git\ncd openai-privacy-filter-api\ncp .env.example .env\n```\n\nStart the API without inference dependencies:\n\n```bash\npython3 -m venv .venv\nsource .venv/bin/activate\npython -m pip install -e \"apps/api[dev]\"\nuvicorn privacy_filter_api.main:app --app-dir apps/api/src --reload\n```\n\nStart the web sandbox in another shell:\n\n```bash\nnpm install\nPRIVACY_FILTER_API_URL=http://localhost:8000 npm --workspace apps/web run dev\n```\n\nOpen `http://localhost:3000`.\n\n## Running the Real Model Locally\n\nInstall inference dependencies:\n\n```bash\nsource .venv/bin/activate\npython -m pip install -e \"apps/api[dev,inference]\"\n```\n\nRun the API with a Hugging Face cache directory:\n\n```bash\nHF_HOME=.hf-cache uvicorn privacy_filter_api.main:app --app-dir apps/api/src --reload\n```\n\nThe first request that needs inference may download model files. To run the smoke test explicitly:\n\n```bash\nRUN_REAL_MODEL_TESTS=1 python -m pytest apps/api/tests/test_real_model_smoke.py\n```\n\n## API Usage\n\nHealth check:\n\n```bash\ncurl http://localhost:8000/health\n```\n\nFilter text:\n\n```bash\ncurl -X POST http://localhost:8000/v1/filter \\\n  -H \"content-type: application/json\" \\\n  -d '{\n    \"text\": \"My name is Alice Smith and my email is alice@example.com.\",\n    \"mode\": \"mask\",\n    \"mask_token\": \"[REDACTED]\",\n    \"include_spans\": true\n  }'\n```\n\nExample response:\n\n```json\n{\n  \"original_text\": \"My name is Alice Smith and my email is alice@example.com.\",\n  \"filtered_text\": \"My name is [REDACTED] and my email is [REDACTED].\",\n  \"spans\": [\n    {\n      \"label\": \"private_person\",\n      \"start\": 11,\n      \"end\": 22,\n      \"text\": \"Alice Smith\",\n      \"score\": 0.97\n    }\n  ],\n  \"model\": \"openai/privacy-filter\"\n}\n```\n\nSupported labels:\n\n- `account_number`\n- `private_address`\n- `private_email`\n- `private_person`\n- `private_phone`\n- `private_url`\n- `private_date`\n- `secret`\n\nSupported modes:\n\n- `mask`: replace each accepted span with `mask_token`.\n- `remove`: remove each accepted span.\n- `annotate`: replace each accepted span with `[label:value]`.\n\nSee [docs/api.md](docs/api.md) for the API contract.\n\n## Configuration\n\n| Variable | Used by | Default | Description |\n| --- | --- | --- | --- |\n| `PRIVACY_FILTER_MODEL_ID` | API | `openai/privacy-filter` | Hugging Face model id reported by the service and used when no model path is set. |\n| `PRIVACY_FILTER_MODEL_PATH` | API | empty | Local model directory. Use this for baked or mounted model files. |\n| `PRIVACY_FILTER_RUNTIME` | API | `local` | Runtime label returned by `/health`. |\n| `PRIVACY_FILTER_CORS_ORIGINS` | API | `http://localhost:3000,https://privacy.shiftbloom.studio` | Comma-separated CORS allowlist. |\n| `PRIVACY_FILTER_INTERNAL_TOKEN` | API and web | empty | Optional shared token. The web proxy sends it to the API. |\n| `PRIVACY_FILTER_DEVICE` | API | empty | Optional Transformers device setting. |\n| `PRIVACY_FILTER_REVISION` | API | empty | Optional Hugging Face model revision. |\n| `PRIVACY_FILTER_TRUST_REMOTE_CODE` | API | `false` | Enables remote model code if a future revision requires it. |\n| `HF_HOME` | API | `.hf-cache` locally | Hugging Face cache directory. |\n| `PRIVACY_FILTER_API_URL` | web | `http://localhost:8000` | API base URL used by the Next.js server-side proxy. |\n\n## Verification\n\nAPI:\n\n```bash\nsource .venv/bin/activate\npython -m ruff check apps/api\npython -m pytest apps/api\n```\n\nWeb:\n\n```bash\nnpm --workspace apps/web run lint\nnpm --workspace apps/web run typecheck\nnpm --workspace apps/web run test\nnpm --workspace apps/web run build\n```\n\nDocker smoke build:\n\n```bash\ndocker build -f infra/docker/api.Dockerfile --build-arg API_EXTRAS= -t privacy-filter-api:core .\ndocker build -f infra/docker/web.Dockerfile -t privacy-filter-web .\n```\n\n## Docker\n\nRun both services locally with Docker Compose:\n\n```bash\ndocker compose up --build\n```\n\nBuild the API image:\n\n```bash\ndocker build -f infra/docker/api.Dockerfile -t privacy-filter-api .\n```\n\nBuild the web image:\n\n```bash\ndocker build -f infra/docker/web.Dockerfile -t privacy-filter-web .\n```\n\nThe API Dockerfile copies `privacy-filter-model/` into `/models/privacy-filter`. For production\noffline inference, place the required model files there before building and set\n`PRIVACY_FILTER_MODEL_PATH=/models/privacy-filter`.\n\nRequired model files:\n\n- `config.json`\n- `model.safetensors`\n- `tokenizer.json`\n- `tokenizer_config.json`\n- `viterbi_calibration.json`\n\nDo not commit model files to the repository.\n\n## Deployment\n\nThe project can run anywhere that supports Docker containers. The included deployment notes cover\nDocker, AWS Lambda container images, Google Cloud Run, Cloudflare routing, and the current AWS App\nRunner setup. See [docs/deployment.md](docs/deployment.md).\n\n### Shiftbloom AWS App Runner\n\nThis repository includes a GitHub Actions workflow for the current Shiftbloom deployment:\n\n- API App Runner service: `privacy-filter-api`\n- Web App Runner service: `privacy-filter-web`\n- AWS region: `eu-central-1`\n- ECR repositories: `privacy-filter-api`, `privacy-filter-web`\n- Model artifact bucket: `shiftbloom-privacy-filter-build-349744179866-eu-central-1`\n\nOn pushes to `main`, [deploy-aws.yml](.github/workflows/deploy-aws.yml) detects changed paths and\ndeploys only the affected surface. It can also be run manually with `all`, `api`, `web`, or `auto`.\n\nForks should replace the AWS account id, service ARNs, ECR repositories, artifact bucket, and OIDC\nrole with their own infrastructure.\n\n## Security and Privacy Notes\n\n- Treat model output as advisory. Validate behavior against your data and policy requirements.\n- Do not send sensitive production data to infrastructure you do not control.\n- Use `PRIVACY_FILTER_INTERNAL_TOKEN` when the API is reachable outside a private network.\n- Keep CORS origins narrow in production.\n- Keep model files, caches, credentials, and deployment artifacts out of source control.\n- Review upstream model terms and dependencies before production use.\n\n## Contributing\n\nContributions are welcome. Keep changes focused and include tests for behavior changes.\n\nSuggested flow:\n\n1. Open an issue or draft PR for larger changes.\n2. Run the relevant verification commands locally.\n3. Keep API contract changes documented in [docs/api.md](docs/api.md).\n4. Keep deployment changes documented in [docs/deployment.md](docs/deployment.md).\n\nPlease avoid committing generated model files, local caches, credentials, or machine-specific build\nartifacts.\n\n## License\n\nApache License 2.0. See [LICENSE](LICENSE).\n\n## Acknowledgements\n\nThis project wraps [`openai/privacy-filter`](https://huggingface.co/openai/privacy-filter) through\nstandard Python and web application tooling. Model behavior, supported labels, and runtime\nrequirements may change with upstream model revisions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshiftbloom-studio%2Fopenai-privacy-filter-api","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshiftbloom-studio%2Fopenai-privacy-filter-api","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshiftbloom-studio%2Fopenai-privacy-filter-api/lists"}