https://github.com/rpgeeganage/pii-guard

# 🛡️ PII Guard

**PII Guard** is an LLM-powered tool that detects and manages Personally Identifiable Information (PII) in logs. It is designed to support data privacy and GDPR compliance.

> ⚠️ **This is a personal side project**
> Built to explore how Large Language Models can detect sensitive data in logs more intelligently than traditional regex-based approaches.

## 📚 Table of Contents

- [About](#-about)
- [Why Use LLMs for PII Detection?](#-why-use-llms-for-pii-detection)
- [PII Types Detected](#-pii-types-detected)
  - [Identity Information](#-identity-information)
  - [Sensitive Categories (GDPR Art. 9)](#-sensitive-categories-gdpr-art-9)
  - [Government & Financial Identifiers](#-government--financial-identifiers)
  - [Network & Device Information](#-network--device-information)
  - [Vehicle Information](#-vehicle-information)
- [Architecture](#-architecture)
- [Getting Started](#-getting-started)
- [Try It Out](#-try-it-out)
- [How to Test](#-how-to-test)
- [Project Structure](#-project-structure)
- [Suggestions & Contributions](#-suggestions--contributions)

---

## 🧠 About

This project experiments with **Large Language Models (LLMs)**, specifically the `gemma:3b` model running locally via **Ollama**, to evaluate how effectively they can identify PII in both structured and unstructured log data.

> 🧠 **LLM-Based Detection with Ollama**
> - Uses `gemma:3b` through the Ollama runtime
> - Analyzes logs using natural language understanding
> - Aims to handle real-world, messy logs better than regex
> - Work in progress; contributions welcome!
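
To make the Ollama step concrete, here is a minimal Python sketch of asking a locally running model about a single log line. It assumes Ollama listens on its default port (11434) and is called through its `/api/generate` endpoint; whether PII Guard uses that endpoint is an assumption. The prompt text and the function names are hypothetical. The project's actual prompt lives in `api/src/prompt/pii.prompt.ts`.

```python
import json
import urllib.request

# Assumption: Ollama is reachable on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_ollama_request(log_line, model="gemma:3b"):
    """Build a non-streaming generate request for one log line."""
    # Hypothetical prompt, not the template used by PII Guard.
    prompt = (
        "List every piece of personally identifiable information in the "
        "log line below. Answer with a JSON array of objects that have "
        "the keys 'type' and 'value'.\n\n"
        f"Log line: {log_line}"
    )
    return {"model": model, "prompt": prompt, "stream": False}


def detect_pii(log_line):
    """Send the request to Ollama and return the raw model text."""
    body = json.dumps(build_ollama_request(log_line)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        # A non-streaming reply carries the generated text in "response".
        return json.loads(response.read())["response"]
```

Calling `detect_pii(...)` needs the stack (or at least Ollama) to be running; `build_ollama_request(...)` can be inspected without it.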

---

## 💡 Why Use LLMs for PII Detection?

- 🔍 Identifies PII even when it's obfuscated, incomplete, or embedded in text
- 🌐 Handles multilingual input and inconsistent formats
- 🧠 Leverages semantic context instead of relying on static patterns
- 🧪 Ideal for experimenting with privacy tooling powered by AI

> Traditional detection rules often break under complexity; LLMs provide contextual intelligence.
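
As a small illustration of how a static pattern breaks, the Python sketch below runs a typical email regex over two log lines. The regex and both sample lines are made up for this example and are not taken from PII Guard.

```python
import re

# A typical static rule for email addresses (illustrative only).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def regex_finds_email(log_line):
    """Return True if the static pattern matches anywhere in the line."""
    return EMAIL_RE.search(log_line) is not None


plain = 'user_login email="leila.park@example.io"'
obfuscated = "user_login contact: leila dot park at example dot io"

print(regex_finds_email(plain))       # True: the address is in standard form
print(regex_finds_email(obfuscated))  # False: same address, spelled out
```

The second line still identifies a person to a human reader, which is the kind of case where context-aware detection is expected to help.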

---

## 🧾 PII Types Detected

### 👤 Identity Information
`full-name`, `first-name`, `last-name`, `username`, `email`, `phone-number`, `mobile`, `address`, `postal-code`, `location`

### 🧠 Sensitive Categories (GDPR Art. 9)
`racial-or-ethnic-origin`, `political-opinion`, `religious-belief`, `philosophical-belief`, `trade-union-membership`, `genetic-data`, `biometric-data`, `health-data`, `sex-life`, `sexual-orientation`

### 🧾 Government & Financial Identifiers
`national-id`, `passport-number`, `driving-license-number`, `ssn`, `vat-number`, `credit-card`, `iban`, `bank-account`

### 🌐 Network & Device Information
`ip-address`, `ip-addresses`, `mac-address`, `imei`, `device-id`, `device-metadata`, `browser-fingerprint`, `cookie-id`, `location-coordinates`

### 🚘 Vehicle Information
`license-plate`
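
If you post-process detection results, a lookup from label to category can be handy. The Python sketch below copies the labels listed above; the category keys and the `category_of` helper are hypothetical names, and how PII Guard represents these internally is not shown here.

```python
# Labels copied from the lists above; category keys are hypothetical.
PII_CATEGORIES = {
    "identity": [
        "full-name", "first-name", "last-name", "username", "email",
        "phone-number", "mobile", "address", "postal-code", "location",
    ],
    "sensitive": [
        "racial-or-ethnic-origin", "political-opinion", "religious-belief",
        "philosophical-belief", "trade-union-membership", "genetic-data",
        "biometric-data", "health-data", "sex-life", "sexual-orientation",
    ],
    "government-financial": [
        "national-id", "passport-number", "driving-license-number", "ssn",
        "vat-number", "credit-card", "iban", "bank-account",
    ],
    "network-device": [
        "ip-address", "ip-addresses", "mac-address", "imei", "device-id",
        "device-metadata", "browser-fingerprint", "cookie-id",
        "location-coordinates",
    ],
    "vehicle": ["license-plate"],
}

# Reverse index: label -> category.
LABEL_TO_CATEGORY = {
    label: category
    for category, labels in PII_CATEGORIES.items()
    for label in labels
}


def category_of(label):
    """Return the category for a label, or None if the label is unknown."""
    return LABEL_TO_CATEGORY.get(label)
```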

---

## 🏗️ Architecture

This is how _**PII Guard**_ works:

![architecture](https://github.com/user-attachments/assets/753aa336-26a2-449f-8a8d-8e1efd40c33b)

---

## 🚀 Getting Started

- Clone the repo and start everything with a single command:

```bash
make all-in-up
```

- Shut down everything with:

```bash
make all-in-down
```

`make all-in-up` launches the full stack:

- 🐘 PostgreSQL
- 🔎 Elasticsearch
- 🐇 RabbitMQ
- 🤖 Ollama (with `gemma:3b`)
- 🌐 PII Guard dashboard and backend API

---

## 🧪 Try It Out

### 🖥️ Web Interface
Visit: [http://localhost:3000](http://localhost:3000)

### 🔌 API Endpoint
[http://localhost:8888/api/jobs](http://localhost:8888/api/jobs)

### 🌀 Submit Sample Logs (cURL)

```bash
curl --location 'http://localhost:8888/api/jobs/flush' \
--header 'Content-Type: application/json' \
--data-raw '{
"version": "1.0.0",
"logs": [
"{\"timestamp\":\"2025-04-21T15:02:10Z\",\"service\":\"auth-service\",\"level\":\"INFO\",\"event\":\"user_login\",\"requestId\":\"1a9c7e21\",\"user\":{\"id\":\"u9001001\",\"name\":\"Leila Park\",\"email\":\"leila.park@example.io\"},\"srcIp\":\"198.51.100.15\"}",
"{\"timestamp\":\"2025-04-21T15:02:12Z\",\"service\":\"cache-service\",\"level\":\"DEBUG\",\"event\":\"cache_miss\",\"requestId\":\"82c5cc9f\",\"cacheKey\":\"product_44291_variant_blue\",\"region\":\"us-east-1\"}"
]
}'
```
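
Note that each entry in `logs` is a JSON-encoded string, not a nested object. The Python sketch below builds the same kind of request body from plain dictionaries; the function names are hypothetical, and the response format of the endpoint is not documented here, so only the HTTP status is returned.

```python
import json
import urllib.request

FLUSH_URL = "http://localhost:8888/api/jobs/flush"


def build_flush_payload(events, version="1.0.0"):
    """Wrap log events in the body shape used by the cURL example."""
    # Each event is serialized to a string before it goes into "logs".
    logs = [json.dumps(event, separators=(",", ":")) for event in events]
    return {"version": version, "logs": logs}


def submit_logs(events):
    """POST the events to a locally running PII Guard API."""
    body = json.dumps(build_flush_payload(events)).encode("utf-8")
    request = urllib.request.Request(
        FLUSH_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

For example, `submit_logs([{"service": "auth-service", "event": "user_login", "srcIp": "198.51.100.15"}])` sends one log line once the stack is up.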

---

## 🧪 How to Test

See the [Testing PII Guard](how-to-test/README.md) guide for instructions on setting up a test environment, including simulated log generation and stress testing, to evaluate the performance and detection accuracy of PII Guard.

---

## 📂 Project Structure

- **API**: [`api/`](https://github.com/rpgeeganage/pII-guard/tree/main/api)
- **Dashboard**: [`ui/`](https://github.com/rpgeeganage/pII-guard/tree/main/ui)
- **LLM Prompt Template**: [`api/src/prompt/pii.prompt.ts`](https://github.com/rpgeeganage/pII-guard/tree/main/api/src/prompt/pii.prompt.ts)

---

## 🙌 Suggestions & Contributions

Got a bug to report? Feature request? Wild idea? Bring it on!

- 🐛 Bug reports help improve stability
- ✨ Feature requests help shape the product
- 💬 Suggestions, feedback, and contributions are all welcome!