{"id":47793484,"url":"https://github.com/datenlabor-bmz/easy-redact","last_synced_at":"2026-04-03T15:59:52.683Z","repository":{"id":339286447,"uuid":"1161265245","full_name":"datenlabor-bmz/easy-redact","owner":"datenlabor-bmz","description":null,"archived":false,"fork":false,"pushed_at":"2026-03-05T16:04:25.000Z","size":740,"stargazers_count":1,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-05T18:48:14.646Z","etag":null,"topics":["ai-agents","foi","freedom-of-information","gdpr","govtech","local-ai","local-first","ner","ollama","pii","privacy","redaction","spacy","vllm"],"latest_commit_sha":null,"homepage":"https://easyredact.io","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datenlabor-bmz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-18T23:09:24.000Z","updated_at":"2026-03-05T16:17:47.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/datenlabor-bmz/easy-redact","commit_stats":null,"previous_names":["datenlabor-bmz/easy-redact"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/datenlabor-bmz/easy-redact","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datenlabor-bmz%2Feasy-redact","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datenlabor-bmz%2Feasy-redact/tags","releases_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/datenlabor-bmz%2Feasy-redact/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datenlabor-bmz%2Feasy-redact/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datenlabor-bmz","download_url":"https://codeload.github.com/datenlabor-bmz/easy-redact/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datenlabor-bmz%2Feasy-redact/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31362607,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-03T15:19:21.178Z","status":"ssl_error","status_checked_at":"2026-04-03T15:19:20.670Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","foi","freedom-of-information","gdpr","govtech","local-ai","local-first","ner","ollama","pii","privacy","redaction","spacy","vllm"],"created_at":"2026-04-03T15:59:52.054Z","updated_at":"2026-04-03T15:59:52.675Z","avatar_url":"https://github.com/datenlabor-bmz.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"[[Gitlab repo](https://gitlab.opencode.de/datenlabor-bmz/easy-redact) | [Github mirror](https://github.com/datenlabor-bmz)]\n\n# EasyRedact\n\nAI-assisted PDF redaction tool for German federal ministries (BMZ). 
Upload a PDF or DOCX, have the AI suggest redactions via an interactive chat, review and adjust them, then export a fully redacted document — everything runs in the browser, documents never leave the machine unless you choose Cloud AI mode.\n\n## Features\n\n- **Two redaction modes**\n  - **PII** — redacts personal data: names, addresses, emails, phone numbers, bank details, dates of birth\n  - **FOI / IFG** — redacts based on the exemption clauses of a chosen Freedom of Information law; jurisdiction rules are loaded at runtime from [`datenlabor-bmz/redaction-rules`](https://github.com/datenlabor-bmz/redaction-rules)\n\n- **Cloud AI and Local AI** (switchable at any time via the mode selector)\n  - **Cloud AI** — any OpenAI-compatible LLM endpoint (e.g. Azure AI Foundry, GDPR-compliant, no data retention)\n  - **Local AI** — processes documents on your own infrastructure or in the browser; configured via `LOCAL_AI` (see below)\n\n- **AI chat assistant** — reads the document, asks targeted clarifying questions, then suggests redactions with exact text matches, confidence ratings, affected persons, and legal justifications\n\n- **Manual redactions** — draw rectangles or select text directly on the PDF without AI involvement\n\n- **Multi-document tabs** — open multiple PDFs at once; session (documents, redactions, chat) is persisted in IndexedDB and survives page reload\n\n- **Export**\n  - Preview PDF — yellow highlight boxes for review and sign-off\n  - Redacted PDF — text permanently removed via MuPDF, ready for publication\n\n- **DOCX → PDF conversion** — LibreOffice-backed, Docker deployment only\n\n## Architecture\n\nThree-panel layout rendered entirely client-side:\n\n| Panel | Content |\n|-------|---------|\n| Left | Redaction list grouped by person/page; accept / ignore controls; FOI rule assignment |\n| Center | PDF viewer (MuPDF WASM in a Comlink worker); zoom; export |\n| Right | Chat or NLP panel; AI mode selector; streaming SSE from `/api/chat` |\n\nThe 
Next.js API routes (`/api/chat`, `/api/docx`, `/api/nlp`) are thin server-side proxies — all document rendering and redaction geometry stay in the browser.\n\n## Getting Started\n\n```bash\nnpm install\nnpm run dev\n```\n\nOpen [http://localhost:3000](http://localhost:3000). Copy `.env.example` to `.env` and fill in at minimum the Cloud LLM credentials.\n\n## Environment Variables\n\nSee `.env.example` for the full list. Key variables:\n\n```env\n# Cloud LLM (any OpenAI-compatible API — Azure AI Foundry, OpenAI, etc.)\nCLOUD_LLM_API_BASE=https://YOUR-RESOURCE.openai.azure.com/openai/v1\nCLOUD_LLM_API_KEY=\nCLOUD_LLM_MODEL=gpt-5.1\n\n# Local LLM (Ollama, vLLM, llama.cpp, or other OpenAI-compatible API — used when LOCAL_AI=llm)\nLOCAL_LLM_API_BASE=http://localhost:11434/v1\nLOCAL_LLM_API_KEY=ollama\nLOCAL_LLM_MODEL=llama3.3:latest\n\n# Which local AI mode: 'ner-browser' (default), 'llm', or 'ner'\nLOCAL_AI=ner-browser\n\n# Set to 'false' to hide the Cloud AI option from the UI\nCLOUD_AI=true\n\n# Default UI language (optional): en, de, fr, es, ru, ar, zh\n# DEFAULT_LOCALE=de\n```\n\n### Azure AI Foundry setup\n\nTo use Azure OpenAI as the Cloud LLM, create an Azure AI Foundry resource and set:\n\n```env\nCLOUD_LLM_API_BASE=https://YOUR-RESOURCE.openai.azure.com/openai/v1\nCLOUD_LLM_API_KEY=your-azure-api-key\nCLOUD_LLM_MODEL=gpt-5.1\n```\n\nThe `/openai/v1` path exposes an OpenAI-compatible API. 
No Azure-specific SDK configuration is needed.\n\n### Deployment profiles\n\n| Profile | `CLOUD_AI` | `LOCAL_AI` | Use case |\n|---------|-----------|------------|----------|\n| Online demo | `true` (default) | `ner-browser` (default) | easyredact.io — Cloud AI + in-browser NLP |\n| On-premise (GPU) | `true` | `llm` | Cloud AI + local LLM (Ollama, vLLM, llama.cpp) |\n| On-premise (CPU) | `true` | `ner` | Cloud AI + spaCy NER for standard hardware |\n| Air-gapped | `false` | `llm` or `ner` | No cloud connection at all |\n\n## Docker\n\n### Pre-built images\n\nPre-built images are published to GitHub Container Registry on every release:\n\n```bash\n# Standard image (serves at /)\ndocker pull ghcr.io/datenlabor-bmz/easy-redact:latest\ndocker run -p 3000:3000 --env-file .env ghcr.io/datenlabor-bmz/easy-redact:latest\n\n# Image with BASE_PATH=/easyredact (serves at /easyredact/)\ndocker pull ghcr.io/datenlabor-bmz/easy-redact-with-base-path:latest\ndocker run -p 3000:3000 --env-file .env ghcr.io/datenlabor-bmz/easy-redact-with-base-path:latest\n```\n\n### Building from source\n\nThe Dockerfile bundles LibreOffice (DOCX conversion), Python + uv, and the German spaCy model (`de_core_news_lg`). By default it sets `LOCAL_AI=ner`:\n\n```bash\ndocker build -t easy-redact .\ndocker run -p 3000:3000 --env-file .env easy-redact\n```\n\nFor production deployment on `linux/amd64` (e.g. when building on Apple Silicon):\n\n```bash\ndocker buildx build --platform linux/amd64 -t easy-redact .\n```\n\nTo use a local LLM instead of spaCy, override `LOCAL_AI` at runtime:\n\n```bash\ndocker run -p 3000:3000 -e LOCAL_AI=llm --env-file .env easy-redact\n```\n\nDOCX upload and spaCy NLP are only available in the Docker build; they return HTTP 501 otherwise.\n\n### Subpath deployment\n\nTo serve the app under a subpath (e.g. 
`acme.bund.de/easyredact/`), pass `BASE_PATH` at build time:\n\n```bash\ndocker build --platform linux/amd64 --build-arg BASE_PATH=/easyredact -t easy-redact .\n```\n\nThis sets the Next.js `basePath`, which rewrites all routes, assets, and API endpoints. Configure nginx to forward requests without stripping the prefix:\n\n```nginx\nlocation /easyredact/ {\n    proxy_pass http://127.0.0.1:3000;\n    proxy_set_header Host $host;\n    proxy_set_header X-Real-IP $remote_addr;\n    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;\n    proxy_set_header X-Forwarded-Proto $scheme;\n}\n```\n\n## Tech Stack\n\n- **Next.js 15** (App Router) + **React 19**\n- **MuPDF** 1.27 — PDF rendering and redacted export via WASM + Comlink web worker\n- **OpenAI SDK** — standard `OpenAI` client for both cloud and local LLMs; streaming chat completions with function calling\n- **spaCy** (`de_core_news_lg`) — German NER, invoked via a Python script with `uv run`\n- **Tailwind CSS v4** + **shadcn/ui** (Radix primitives)\n- **IndexedDB** (via `idb`) — client-side persistence for files, session state, and chat history\n\n## See Also\n\n- [`datenlabor-bmz/redaction-ui`](https://github.com/datenlabor-bmz/redaction-ui) — standalone React component library (`@datenlabor-bmz/redaction-ui`) for PDF viewing and redaction, published to npm for use in other applications\n- [`datenlabor-bmz/redaction-rules`](https://github.com/datenlabor-bmz/redaction-rules) — machine-readable FOI exemption rules by jurisdiction, fetched at runtime in FOI mode\n\n## License\n\nAGPL-3.0. 
This project uses [MuPDF](https://mupdf.com/licensing/), which is licensed under the GNU Affero General Public License.\n\n## Credits\n\nBuilt by the [BMZ DataLab](https://www.bmz-digital.global/en/overview-of-initiatives/the-bmz-data-lab/), the data science unit of Germany's Federal Ministry for Economic Cooperation and Development.\n\nFunded by the European Union — [NextGenerationEU](https://next-generation-eu.europa.eu).\n\n\u003ca href=\"https://next-generation-eu.europa.eu\"\u003e\u003cimg src=\"public/logo-nextgen-eu.jpg\" alt=\"NextGenerationEU\" height=\"80\"\u003e\u003c/a\u003e  \u003ca href=\"https://www.bmz-digital.global/en/overview-of-initiatives/the-bmz-data-lab/\"\u003e\u003cimg src=\"public/logo-datalab.svg\" alt=\"BMZ DataLab\" height=\"60\"\u003e\u003c/a\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatenlabor-bmz%2Feasy-redact","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatenlabor-bmz%2Feasy-redact","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatenlabor-bmz%2Feasy-redact/lists"}