{"id":28100239,"url":"https://github.com/rpgeeganage/pii-guard","last_synced_at":"2025-10-12T14:47:15.891Z","repository":{"id":291590370,"uuid":"946226847","full_name":"rpgeeganage/pII-guard","owner":"rpgeeganage","description":"🛡️ PII Guard is an LLM-powered tool that detects and manages Personally Identifiable Information (PII) in logs — designed to support data privacy and GDPR compliance","archived":false,"fork":false,"pushed_at":"2025-06-15T18:41:11.000Z","size":655,"stargazers_count":53,"open_issues_count":1,"forks_count":6,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-15T19:05:55.391Z","etag":null,"topics":["ai","large-language-model","large-language-models","llm","pii","pii-detection","privacy-enhancing-technologies","privacy-protection","privacy-tools"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rpgeeganage.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-03-10T20:11:48.000Z","updated_at":"2025-06-15T18:41:14.000Z","dependencies_parsed_at":"2025-05-05T14:50:43.648Z","dependency_job_id":"ec0e5b34-b952-4989-8bef-28c5990bc74f","html_url":"https://github.com/rpgeeganage/pII-guard","commit_stats":null,"previous_names":["rpgeeganage/pii-guard"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/rpgeeganage/pII-guard","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rpgeeganage%2FpII-guard","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rpgeeganage%2FpII-guard/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rpgeeganage%2FpII-guard/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rpgeeganage%2FpII-guard/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rpgeeganage","download_url":"https://codeload.github.com/rpgeeganage/pII-guard/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rpgeeganage%2FpII-guard/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279011609,"owners_count":26084964,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-12T02:00:06.719Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","large-language-model","large-language-models","llm","pii","pii-detection","privacy-enhancing-technologies","privacy-protection","privacy-tools"],"created_at":"2025-05-13T18:32:06.935Z","updated_at":"2025-10-12T14:47:15.857Z","avatar_url":"https://github.com/rpgeeganage.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🛡️ PII Guard\n\n**PII Guard** is an LLM-powered tool that detects and manages Personally Identifiable Information (PII) in logs — designed to support data privacy and GDPR compliance.\n\n\u003e ⚠️ **This is a personal side project**  \n\u003e Built to explore how Large Language Models can detect sensitive data in logs more intelligently than traditional regex-based approaches.\n\n## 📚 Table of Contents\n\n- [About](#-about)\n- [Why Use LLMs for PII Detection?](#-why-use-llms-for-pii-detection)\n- [PII Types Detected](#-pii-types-detected)\n  - [Identity Information](#-identity-information)\n  - [Sensitive Categories (GDPR Art 9)](#-sensitive-categories-gdpr-art-9)\n  - [Government \u0026 Financial Identifiers](#-government--financial-identifiers)\n  - [Network \u0026 Device Information](#-network--device-information)\n  - [Vehicle Information](#-vehicle-information)\n- [Architecture](#-architecture)\n- [Getting Started](#-getting-started)\n- [Try It Out](#-try-it-out)\n- [How to Test](#-how-to-test)\n- [Project Structure](#-project-structure)\n- [Suggestions \u0026 Contributions](#-suggestions--contributions)\n\n---\n\n## 🧠 About\n\nThis project experiments with **Large Language Models (LLMs)** — specifically the `gemma:3b` model running locally via **Ollama** — to evaluate how effectively they can identify PII in both structured and unstructured log data.\n\n\u003e 🧠 **LLM-Based Detection with Ollama**  \n\u003e - Uses `gemma:3b` through the Ollama runtime  \n\u003e - Analyzes logs using natural language understanding  \n\u003e - Handles real-world, messy logs better than regex  \n\u003e - Work in progress — contributions welcome!\n\n---\n\n## 💡 Why Use LLMs for PII Detection?\n\n- 🔍 Identifies PII even when it's obfuscated, incomplete, or embedded in text\n- 🌐 Handles multilingual input and inconsistent formats\n- 🧠 Leverages semantic context instead of relying on static patterns\n- 🧪 Ideal for experimenting with privacy tooling powered by AI\n\n\u003e Traditional detection rules often break under complexity — LLMs provide contextual intelligence.\n\n---\n\n## 🧾 PII Types Detected\n\n### 👤 Identity Information  \n`full-name`, `first-name`, `last-name`, `username`, `email`, `phone-number`, `mobile`, `address`, `postal-code`, `location`\n\n### 🧠 Sensitive Categories (GDPR Art. 9)  \n`racial-or-ethnic-origin`, `political-opinion`, `religious-belief`, `philosophical-belief`, `trade-union-membership`, `genetic-data`, `biometric-data`, `health-data`, `sex-life`, `sexual-orientation`\n\n### 🧾 Government \u0026 Financial Identifiers  \n`national-id`, `passport-number`, `driving-license-number`, `ssn`, `vat-number`, `credit-card`, `iban`, `bank-account`\n\n### 🌐 Network \u0026 Device Information  \n`ip-address`, `ip-addresses`, `mac-address`, `imei`, `device-id`, `device-metadata`, `browser-fingerprint`, `cookie-id`, `location-coordinates`\n\n### 🚘 Vehicle Information  \n`license-plate`\n\n---\n\n## 🏗️ Architecture\n\nThis is how _**PII Guard**_ works:\n\n![architecture](https://github.com/user-attachments/assets/753aa336-26a2-449f-8a8d-8e1efd40c33b)\n\n---\n\n## 🚀 Getting Started\n\n- Clone the repo and start everything with a single command:\n\n```bash\nmake all-in-up\n```\n\n- Shut down everything with:\n\n```bash\nmake all-in-down\n```\n\nThis will launch the full stack:\n\n- 🐘 PostgreSQL  \n- 🔎 Elasticsearch  \n- 🐇 RabbitMQ  \n- 🤖 Ollama (with `gemma:3b`)  \n- 🌐 PII Guard dashboard and backend API\n\n---\n\n## 🧪 Try It Out\n\n### 🖥️ Web Interface  \nVisit: [http://localhost:3000](http://localhost:3000)  \n\n### 🔌 API Endpoint  \n[http://localhost:8888/api/jobs](http://localhost:8888/api/jobs)\n\n### 🌀 Submit Sample Logs (cURL)\n\n```bash\ncurl --location 'http://localhost:8888/api/jobs/flush' \\\n--header 'Content-Type: application/json' \\\n--data-raw '{\n  \"version\": \"1.0.0\",\n  \"logs\": [\n    \"{\\\"timestamp\\\":\\\"2025-04-21T15:02:10Z\\\",\\\"service\\\":\\\"auth-service\\\",\\\"level\\\":\\\"INFO\\\",\\\"event\\\":\\\"user_login\\\",\\\"requestId\\\":\\\"1a9c7e21\\\",\\\"user\\\":{\\\"id\\\":\\\"u9001001\\\",\\\"name\\\":\\\"Leila Park\\\",\\\"email\\\":\\\"leila.park@example.io\\\"},\\\"srcIp\\\":\\\"198.51.100.15\\\"}\",\n    \"{\\\"timestamp\\\":\\\"2025-04-21T15:02:12Z\\\",\\\"service\\\":\\\"cache-service\\\",\\\"level\\\":\\\"DEBUG\\\",\\\"event\\\":\\\"cache_miss\\\",\\\"requestId\\\":\\\"82c5cc9f\\\",\\\"cacheKey\\\":\\\"product_44291_variant_blue\\\",\\\"region\\\":\\\"us-east-1\\\"}\"\n  ]\n}'\n```\n\n---\n\n## 🧪 How to Test\n\nPlease refer to the [Testing PII Guard](how-to-test/README.md) guide for instructions on running the test setup, including simulated log generation and stress testing.\n\nThis guide will help you set up a test environment to evaluate the performance and detection accuracy of PII Guard.\n\n---\n\n## 📂 Project Structure\n\n- **API**: [`api/`](https://github.com/rpgeeganage/pII-guard/tree/main/api)\n- **Dashboard**: [`ui/`](https://github.com/rpgeeganage/pII-guard/tree/main/ui)  \n- **LLM Prompt Template**: [`api/src/prompt/pii.prompt.ts`](https://github.com/rpgeeganage/pII-guard/tree/main/api/src/prompt/pii.prompt.ts)\n\n---\n\n## 🙌 Suggestions \u0026 Contributions\n\nGot a bug to report? Feature request? Wild idea? Bring it on!\n\n- 🐛 Bug reports help improve stability  \n- ✨ Feature requests help shape the product  \n- 💬 Suggestions, feedback, and contributions are all welcome!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frpgeeganage%2Fpii-guard","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frpgeeganage%2Fpii-guard","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frpgeeganage%2Fpii-guard/lists"}