https://github.com/rpgeeganage/pii-guard
PII Guard is an LLM-powered tool that detects and manages Personally Identifiable Information (PII) in logs, designed to support data privacy and GDPR compliance
- Host: GitHub
- URL: https://github.com/rpgeeganage/pii-guard
- Owner: rpgeeganage
- License: other
- Created: 2025-03-10T20:11:48.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-15T18:41:11.000Z (12 months ago)
- Last Synced: 2025-06-15T19:05:55.391Z (12 months ago)
- Topics: ai, large-language-model, large-language-models, llm, pii, pii-detection, privacy-enhancing-technologies, privacy-protection, privacy-tools
- Language: TypeScript
- Homepage:
- Size: 640 KB
- Stars: 53
- Watchers: 1
- Forks: 6
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# PII Guard
**PII Guard** is an LLM-powered tool that detects and manages Personally Identifiable Information (PII) in logs, designed to support data privacy and GDPR compliance.
> **This is a personal side project**
> Built to explore how Large Language Models can detect sensitive data in logs more intelligently than traditional regex-based approaches.
## Table of Contents
- [About](#about)
- [Why Use LLMs for PII Detection?](#why-use-llms-for-pii-detection)
- [PII Types Detected](#pii-types-detected)
  - [Identity Information](#identity-information)
  - [Sensitive Categories (GDPR Art. 9)](#sensitive-categories-gdpr-art-9)
  - [Government & Financial Identifiers](#government--financial-identifiers)
  - [Network & Device Information](#network--device-information)
  - [Vehicle Information](#vehicle-information)
- [Architecture](#architecture)
- [Getting Started](#getting-started)
- [Try It Out](#try-it-out)
- [How to Test](#how-to-test)
- [Project Structure](#project-structure)
- [Suggestions & Contributions](#suggestions--contributions)
---
## About
This project experiments with **Large Language Models (LLMs)**, specifically the `gemma:3b` model running locally via **Ollama**, to evaluate how effectively they can identify PII in both structured and unstructured log data.
> **LLM-Based Detection with Ollama**
> - Uses `gemma:3b` through the Ollama runtime
> - Analyzes logs using natural language understanding
> - Handles real-world, messy logs better than regex
> - Work in progress; contributions welcome!
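
As a rough illustration of what "using `gemma:3b` through the Ollama runtime" can look like, here is a minimal sketch that sends one log line to Ollama's `/api/generate` endpoint. It assumes Ollama is listening on its default port `11434`. The prompt text and the helper names `buildDetectionRequest` and `detectPii` are invented for this example; the prompt PII Guard actually uses lives in `api/src/prompt/pii.prompt.ts`.

```typescript
// Request body for Ollama's /api/generate endpoint (non-streaming, JSON output).
interface OllamaGenerateRequest {
  model: string;
  prompt: string;
  stream: boolean;
  format: 'json';
}

// Hypothetical helper: wraps a log line in a placeholder prompt.
// This is NOT the project's real prompt, only an illustration.
function buildDetectionRequest(logLine: string, model = 'gemma:3b'): OllamaGenerateRequest {
  const prompt = [
    'Identify any personally identifiable information in the log line below.',
    'Respond with a JSON object containing a "findings" array of {"type", "value"} items.',
    `Log line: ${logLine}`,
  ].join('\n');
  return { model, prompt, stream: false, format: 'json' };
}

// Hypothetical helper: posts the request and returns the model's raw text answer.
// Assumes Ollama's default address; adjust if your setup differs.
async function detectPii(logLine: string): Promise<string> {
  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildDetectionRequest(logLine)),
  });
  if (!res.ok) {
    throw new Error(`Ollama request failed with status ${res.status}`);
  }
  const data = (await res.json()) as { response: string };
  return data.response;
}
```

Calling `detectPii('user=leila.park@example.io logged in')` would return whatever text the model produces, so the caller still has to parse and validate it.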
---
## Why Use LLMs for PII Detection?
- Identifies PII even when it's obfuscated, incomplete, or embedded in text
- Handles multilingual input and inconsistent formats
- Leverages semantic context instead of relying on static patterns
- Ideal for experimenting with privacy tooling powered by AI
> Traditional detection rules often break under complexity; LLMs provide contextual intelligence.
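
To make the limitation of static patterns concrete, here is a small sketch of a regex-only email detector. The pattern and both log lines are invented for this illustration and are not taken from PII Guard. The detector finds a well-formed address but returns nothing once the same address is spelled out in words.

```typescript
// Naive pattern-based detector: matches only well-formed email addresses.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function findEmailsWithRegex(logLine: string): string[] {
  return logLine.match(EMAIL_PATTERN) ?? [];
}

// Hypothetical log lines, made up for this example.
const wellFormed = 'user_login email=leila.park@example.io';
const obfuscated = 'contact the user at leila dot park at example dot io';

console.log(findEmailsWithRegex(wellFormed)); // one match: the plain address
console.log(findEmailsWithRegex(obfuscated)); // empty: the pattern has nothing to anchor on
```

Adding more patterns for each spelling variant is possible, but every new variant needs its own rule, which is the maintenance burden an LLM-based approach tries to avoid.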
---
## PII Types Detected
### Identity Information
`full-name`, `first-name`, `last-name`, `username`, `email`, `phone-number`, `mobile`, `address`, `postal-code`, `location`
### Sensitive Categories (GDPR Art. 9)
`racial-or-ethnic-origin`, `political-opinion`, `religious-belief`, `philosophical-belief`, `trade-union-membership`, `genetic-data`, `biometric-data`, `health-data`, `sex-life`, `sexual-orientation`
### Government & Financial Identifiers
`national-id`, `passport-number`, `driving-license-number`, `ssn`, `vat-number`, `credit-card`, `iban`, `bank-account`
### Network & Device Information
`ip-address`, `ip-addresses`, `mac-address`, `imei`, `device-id`, `device-metadata`, `browser-fingerprint`, `cookie-id`, `location-coordinates`
### Vehicle Information
`license-plate`
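
If you consume these tags in your own TypeScript code, one way to model them is a typed lookup table. This is only a sketch: the tag strings are copied from the lists above, while the category keys and the names `PII_TAGS`, `PiiTag`, and `categoryOf` are invented here and may not match how PII Guard represents them internally.

```typescript
// Tag strings come from the README lists; the grouping keys are hypothetical.
const PII_TAGS = {
  identity: ['full-name', 'first-name', 'last-name', 'username', 'email',
    'phone-number', 'mobile', 'address', 'postal-code', 'location'],
  'sensitive-category': ['racial-or-ethnic-origin', 'political-opinion',
    'religious-belief', 'philosophical-belief', 'trade-union-membership',
    'genetic-data', 'biometric-data', 'health-data', 'sex-life', 'sexual-orientation'],
  'government-financial': ['national-id', 'passport-number', 'driving-license-number',
    'ssn', 'vat-number', 'credit-card', 'iban', 'bank-account'],
  'network-device': ['ip-address', 'ip-addresses', 'mac-address', 'imei', 'device-id',
    'device-metadata', 'browser-fingerprint', 'cookie-id', 'location-coordinates'],
  vehicle: ['license-plate'],
} as const;

type PiiCategory = keyof typeof PII_TAGS;
type PiiTag = (typeof PII_TAGS)[PiiCategory][number];

// Returns the category a tag belongs to, or undefined for an unknown tag.
function categoryOf(tag: string): PiiCategory | undefined {
  for (const category of Object.keys(PII_TAGS) as PiiCategory[]) {
    const tags: readonly string[] = PII_TAGS[category];
    if (tags.indexOf(tag) !== -1) {
      return category;
    }
  }
  return undefined;
}

const example: PiiTag = 'iban';
console.log(categoryOf(example));
```

Deriving `PiiTag` from the table keeps the type and the runtime list in sync, so adding a tag in one place updates both.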
---
## Architecture
This is how _**PII Guard**_ works:

---
## Getting Started
- Clone the repo and start everything with a single command:
```bash
make all-in-up
```
- Shut down everything with:
```bash
make all-in-down
```
`make all-in-up` launches the full stack:
- PostgreSQL
- Elasticsearch
- RabbitMQ
- Ollama (with `gemma:3b`)
- PII Guard dashboard and backend API
---
## Try It Out
### Web Interface
Visit: [http://localhost:3000](http://localhost:3000)
### API Endpoint
[http://localhost:8888/api/jobs](http://localhost:8888/api/jobs)
### Submit Sample Logs (cURL)
```bash
curl --location 'http://localhost:8888/api/jobs/flush' \
--header 'Content-Type: application/json' \
--data-raw '{
"version": "1.0.0",
"logs": [
"{\"timestamp\":\"2025-04-21T15:02:10Z\",\"service\":\"auth-service\",\"level\":\"INFO\",\"event\":\"user_login\",\"requestId\":\"1a9c7e21\",\"user\":{\"id\":\"u9001001\",\"name\":\"Leila Park\",\"email\":\"leila.park@example.io\"},\"srcIp\":\"198.51.100.15\"}",
"{\"timestamp\":\"2025-04-21T15:02:12Z\",\"service\":\"cache-service\",\"level\":\"DEBUG\",\"event\":\"cache_miss\",\"requestId\":\"82c5cc9f\",\"cacheKey\":\"product_44291_variant_blue\",\"region\":\"us-east-1\"}"
]
}'
```
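
The same request can be sent from TypeScript. The sketch below mirrors the cURL payload: each log entry is serialized to a JSON string and placed in the `logs` array alongside `version`. The helper names `buildFlushRequest` and `submitLogs` are invented for this example, and because the response format is not documented here, the sketch only returns the HTTP status code. It assumes a runtime with a global `fetch` (for example Node 18 or later).

```typescript
// Shape of the body used in the cURL example above.
interface FlushRequest {
  version: string;
  logs: string[];
}

// Hypothetical helper: serializes each log entry to a string, as the cURL payload does.
function buildFlushRequest(entries: Array<Record<string, unknown>>): FlushRequest {
  return {
    version: '1.0.0',
    logs: entries.map((entry) => JSON.stringify(entry)),
  };
}

// Hypothetical helper: posts the logs and returns the HTTP status code.
async function submitLogs(entries: Array<Record<string, unknown>>): Promise<number> {
  const response = await fetch('http://localhost:8888/api/jobs/flush', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildFlushRequest(entries)),
  });
  return response.status;
}

// Usage, with the stack running locally:
// submitLogs([{ service: 'cache-service', level: 'DEBUG', event: 'cache_miss' }])
//   .then((status) => console.log(status));
```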
---
## How to Test
See the [Testing PII Guard](how-to-test/README.md) guide for instructions on setting up a test environment, including simulated log generation and stress testing, to evaluate the performance and detection accuracy of PII Guard.
---
## Project Structure
- **API**: [`api/`](https://github.com/rpgeeganage/pII-guard/tree/main/api)
- **Dashboard**: [`ui/`](https://github.com/rpgeeganage/pII-guard/tree/main/ui)
- **LLM Prompt Template**: [`api/src/prompt/pii.prompt.ts`](https://github.com/rpgeeganage/pII-guard/tree/main/api/src/prompt/pii.prompt.ts)
---
## Suggestions & Contributions
Got a bug to report? Feature request? Wild idea? Bring it on!
- Bug reports help improve stability
- Feature requests help shape the product
- Suggestions, feedback, and contributions are all welcome!