{"id":25673115,"url":"https://github.com/agentica5/a5-pii-anonymizer","last_synced_at":"2025-11-16T22:04:06.813Z","repository":{"id":278089949,"uuid":"934466970","full_name":"AgenticA5/A5-PII-Anonymizer","owner":"AgenticA5","description":"Desktop App with Built-In LLM for Removing Personal Identifiable Information in Documents","archived":false,"fork":false,"pushed_at":"2025-10-08T10:59:57.000Z","size":6252,"stargazers_count":36,"open_issues_count":1,"forks_count":12,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-10-08T12:35:27.145Z","etag":null,"topics":["ai","ai-safety","alm","anonymisation","anonymity","anonymization","artificial-intelligence","desktop-application","electron","electron-desktop","gdpr","hipaa","linux","llm","macos","personal-identifiable-information","pii","privacy","reasoning","windows"],"latest_commit_sha":null,"homepage":"https://amicus5.com/apps/pa","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AgenticA5.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-17T22:10:04.000Z","updated_at":"2025-10-08T11:00:01.000Z","dependencies_parsed_at":"2025-02-18T00:24:09.116Z","dependency_job_id":"3ccd69e1-7224-40f6-8add-f66fe74ae26e","html_url":"https://github.com/AgenticA5/A5-PII-Anonymizer","commit_stats":null,"previous_names":["agentica5/a5-pii-anonymizer"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/AgenticA5/A5-PII-Anonymizer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AgenticA5%2FA5-PII-Anonymizer",
"tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AgenticA5%2FA5-PII-Anonymizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AgenticA5%2FA5-PII-Anonymizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AgenticA5%2FA5-PII-Anonymizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AgenticA5","download_url":"https://codeload.github.com/AgenticA5/A5-PII-Anonymizer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AgenticA5%2FA5-PII-Anonymizer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":284783345,"owners_count":27062629,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-16T02:00:05.974Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","ai-safety","alm","anonymisation","anonymity","anonymization","artificial-intelligence","desktop-application","electron","electron-desktop","gdpr","hipaa","linux","llm","macos","personal-identifiable-information","pii","privacy","reasoning","windows"],"created_at":"2025-02-24T12:53:53.271Z","updated_at":"2025-11-16T22:04:06.770Z","avatar_url":"https://github.com/AgenticA5.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# A5 PII Anonymizer\n\n### Built-In LLM Desktop App 
Preview:\n\u003cimg src=\"./assets/preview.gif\" alt=\"Chrome Extension Demo Gif\" style=\"display: block; width: 100%; max-width: none;\" /\u003e\n\nThis repository provides an **Electron** desktop application for **locally anonymizing documents** before sending them to advanced Large Language Models (LLMs). By stripping out personal or identifiable information using a **context-aware** model, you can safely train or query external LLMs (e.g., OpenAI’s o3 model) with minimal privacy risk.\n\n## Motivation\n\n- **PII Removal**: Traditional RegEx-based anonymization often fails on nuanced data. With an ONNX-based model, you gain context-aware detection for **names, addresses, phone numbers, etc.**\n- **Safe LLM Usage**: Many companies need to keep real customer or employee data **internal** but still want to leverage powerful external LLMs. This tool helps them do so by anonymizing data **on their end** first.\n- **Flexible**: Supports `.txt`, `.docx`, `.xls(x)`, `.csv`, `.pdf`, and more. Converts text, merges tokens, and replaces them with consistent pseudonyms.\n\n## Key Features\n\n1. **Electron App**: Cross-platform desktop UI built on HTML/CSS/JS.  \n2. **Daily Limit**: **100** documents per day for the free tier (by default). You can raise or remove it if you prefer—this is open source.  \n3. **Context-Aware**: Relies on a local ONNX model downloaded separately (due to GitHub’s file-size constraints).  \n4. **Mapping** (Pro Mode): If you enable Pro, the app can produce a JSON file mapping each original entity (e.g., “John Smith”) to its anonymized token (e.g., “NAME_1”).  \n5. **MIT License**: Free to modify and distribute. We welcome contributions.\n\n## Getting Started\n\n1. **Clone or Download** this repository.  
\n2. **Download the Model**:  \n   - Model available here: [https://huggingface.co/iiiorg/piiranha-v1-detect-personal-information/tree/main](https://huggingface.co/iiiorg/piiranha-v1-detect-personal-information/tree/main)\n   - Download the ONNX model (`.onnx` files) from the link above (not included in this repository due to size constraints).  \n   - Place it under `./models/protectai/lakshyakh93-deberta_finetuned_pii-onnx/` or as directed in `fileProcessor.js`.  \n3. **Install Dependencies**:  \n   ```bash\n   npm install\n   ```  \n4. **Run (Dev Mode)**:  \n   ```bash\n   npm run dev\n   ```  \n   Or if you prefer:  \n   ```bash\n   npx electron .\n   ```  \n5. **Build / Package** (macOS example):  \n   ```bash\n   npm run build:mac\n   ```\n\n### Basic Usage\n\n- **Drop or Select Files**: The main UI allows you to drag-and-drop or pick multiple files/folders.  \n- **Output Directory**: Choose where the anonymized files should be placed.  \n- **Anonymize**: Click “Anonymize Files” to run.  \n- **Mapping (Pro)**: If you have a Pro key, the app will create an additional `-map.json` file capturing each replaced entity.  \n\n## How It Works\n\n- **Electron**:  \n  - **`main.js`**: Spawns the main window, handles file selection, passes tasks to `FileProcessor`.  \n  - **`renderer.js`**: Manages the UI (index.html), user interactions, daily usage counters, and “Pro” logic.  \n- **`fileProcessor.js`**:  \n  - Loads the local ONNX model (via `@xenova/transformers`).  \n  - Identifies personal data by context (names, addresses, etc.).  \n  - Replaces them with tokens (`NAME_1`, `PHONE_NUMBER_3`, etc.).  \n  - If Pro, writes a JSON mapping for re-identification.  \n- **Local Model**:  \n  - We rely on a context-aware token classification model. This is significantly more effective than simple RegEx for real-world PII.\n\n## Limitations \u0026 Notes\n\n- **Daily 100-File Limit**: By default, the free version only processes 100 documents per day. This is purely enforced in the UI. 
Since it’s open source, you can remove or change it as needed.  \n- **No Guarantee**: Even context-aware models can miss certain edge cases. Always manually review if 100% privacy is critical.  \n- **Cross-Platform**:  \n  - macOS builds are tested on both M-series (ARM) and Intel.  \n  - Windows and Linux builds are also available.\n\n## Contributing\n\nWe use the **MIT License**, so feel free to open pull requests, modify the code, or adapt it for your needs. If you make improvements or fix bugs, please share them back!\n\n---\n\n**Thanks for checking out the A5 PII Anonymizer.** We hope this helps you safely leverage powerful LLMs with real data while keeping personal information private.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fagentica5%2Fa5-pii-anonymizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fagentica5%2Fa5-pii-anonymizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fagentica5%2Fa5-pii-anonymizer/lists"}