{"id":37071362,"url":"https://github.com/leo-gan/anonymizer","last_synced_at":"2026-01-14T08:21:03.789Z","repository":{"id":316982925,"uuid":"1041039262","full_name":"leo-gan/anonymizer","owner":"leo-gan","description":"An app and an SDK to anonymize large PDF files","archived":false,"fork":false,"pushed_at":"2025-10-06T16:30:41.000Z","size":288,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-06T17:01:25.930Z","etag":null,"topics":["anonymization","anonymize","anthropic","deanonymization","gemini","healthcare","huggingface-hub","legal-documents","llm","ollama","openai","openrouter","pdf","python","training-data"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/leo-gan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-19T22:22:26.000Z","updated_at":"2025-10-06T16:25:58.000Z","dependencies_parsed_at":"2025-09-28T03:12:18.748Z","dependency_job_id":"2ac87cf6-488a-49cf-a256-006c5925525e","html_url":"https://github.com/leo-gan/anonymizer","commit_stats":null,"previous_names":["leo-gan/anonymizer"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/leo-gan/anonymizer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leo-gan%2Fanonymizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leo-gan%2Fanonymizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leo-gan%2Fanonymizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leo-gan%2Fanonymizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/leo-gan","download_url":"https://codeload.github.com/leo-gan/anonymizer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leo-gan%2Fanonymizer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28413792,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T08:16:59.381Z","status":"ssl_error","status_checked_at":"2026-01-14T08:13:45.490Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anonymization","anonymize","anthropic","deanonymization","gemini","healthcare","huggingface-hub","legal-documents","llm","ollama","openai","openrouter","pdf","python","training-data"],"created_at":"2026-01-14T08:21:02.222Z","updated_at":"2026-01-14T08:21:02.979Z","avatar_url":"https://github.com/leo-gan.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🦉🫥 PDF Anonymizer\n\nThis application uses LLMs to anonymize large PDF, Markdown, and plain text files.\n\n- **High-Quality Anonymization**: Uses LLMs to identify and replace Personally Identifiable Information (PII) with high accuracy.\n- **Large File Support**: Consistently anonymizes large files (tested with files up to 1 GB).\n- **Multi-Provider \u0026 Cost-Effective**: Free to use with local [Ollama](https://ollama.com/) models. It also supports major providers such as [OpenAI](https://openai.com/), [Anthropic](https://www.anthropic.com/), [Google](https://ai.google.com/), [Hugging Face](https://huggingface.co/), and [OpenRouter](https://openrouter.ai/).\n- **Reversible**: Supports deanonymization to recover the original data when needed.\n- **Multi-Format**: Works with PDF, Markdown, and plain text files.\n\n## Project Structure\n\nThis project is a monorepo containing two main packages:\n\n- **`packages/pdf-anonymizer-core`**: The core library containing the anonymization and deanonymization logic. See the [core README](./packages/pdf-anonymizer-core/README.md) for more details.\n- **`packages/pdf-anonymizer-cli`**: A command-line interface for using the anonymizer. See the [CLI README](./packages/pdf-anonymizer-cli/README.md) for detailed usage instructions.\n\n## Development Installation\n\n1.  **Install `uv`**: This project uses `uv` for package management. Follow the [official installation instructions](https://astral.sh/docs/uv#installation).\n\n2.  **Clone the repository**:\n    ```bash\n    git clone \u003crepository_url\u003e\n    cd anonymizer\n    ```\n\n3.  **Install dependencies**:\n    ```bash\n    uv sync --group dev\n    ```\n\n4.  **Install Ollama (optional)**: If you want to use a local model for anonymization, install [Ollama](https://ollama.com/).\n\n5.  **Set up environment variables**: Create a `.env` file in the `packages/pdf-anonymizer-cli` directory and add the API keys for the providers you want to use. For example:\n    ```env\n    # For Google models\n    GOOGLE_API_KEY=\"YOUR_GOOGLE_API_KEY\"\n\n    # For OpenAI models\n    OPENAI_API_KEY=\"YOUR_OPENAI_API_KEY\"\n\n    # For Anthropic models\n    ANTHROPIC_API_KEY=\"YOUR_ANTHROPIC_API_KEY\"\n\n    # For Hugging Face models\n    HUGGING_FACE_TOKEN=\"YOUR_HF_TOKEN\"\n\n    # For OpenRouter models\n    OPENROUTER_API_KEY=\"YOUR_OPENROUTER_KEY\"\n    ```\n\n## Quick Start\n\nTo anonymize a file, use the `pdf-anonymizer` command:\n\n```bash\npdf-anonymizer run document.pdf\n```\n\nFor detailed command-line options and examples, refer to the [**CLI README**](./packages/pdf-anonymizer-cli/README.md).\n\n## Testing\n\nTo run the test suite:\n\n```bash\nuv run pytest\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fleo-gan%2Fanonymizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fleo-gan%2Fanonymizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fleo-gan%2Fanonymizer/lists"}