https://github.com/zafrem/pii-search


# PII Search

A comprehensive multi-language PII (Personally Identifiable Information) detection system with advanced parallel processing, cascaded detection models, and integrated data generation capabilities for training and testing.

## Overview

This application provides three PII detection approaches, ranging from regex rules to cascaded AI models, plus a data generation system:

1. **Basic Search** - Rule-based pattern matching using regex patterns
2. **Cascaded AI Detection** - Parallel processing with Multilingual BERT → DeBERTa v3 → Ollama LLM
3. **Simple Learning Engine** - Adaptive ML with continuous training capabilities
4. **Data Generation System** - Faker-based PII data generation for training and testing
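
As a rough illustration of what rule-based pattern matching involves, the sketch below scans text from standard input for two PII types using extended regular expressions. The `scan_pii` function name and both patterns are assumptions made for this example. They are deliberately simplified and are not the rule set shipped with PII Search.

```bash
# Illustrative only: simplified patterns, not the project's actual rules.

# Hypothetical helper: read text on stdin and print one "type<TAB>match"
# line for every hit.
scan_pii() {
  local text
  text=$(cat)

  # Email addresses (simplified pattern).
  printf '%s\n' "$text" |
    grep -oE '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' |
    while IFS= read -r match; do printf 'email\t%s\n' "$match"; done

  # Phone numbers written as NNN-NNN-NNNN (one format only).
  printf '%s\n' "$text" |
    grep -oE '[0-9]{3}-[0-9]{3}-[0-9]{4}' |
    while IFS= read -r match; do printf 'phone\t%s\n' "$match"; done
}
```

For example, `echo 'Contact jane.doe@example.com or call 555-123-4567.' | scan_pii` prints one `email` line and one `phone` line. Patterns like these miss context and locale-specific formats, which is the kind of gap the model-based approaches in the list above are aimed at.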

### Key Features

- **Advanced AI Detection** - Parallel processing with cascaded models and adaptive learning
- **Multi-language Support** - 12+ languages with locale-aware generation
- **Data Generation & Training** - Faker-based generation with 23+ data types
- **Comprehensive Labeling System** - Interactive annotation with multiple export formats
- **Privacy & Security** - Local processing with GDPR/HIPAA ready architecture
- **Production Features** - Docker containerization with health monitoring

## Demo

![Demo](./image/PII_Search.gif)

## 📋 Documentation

- [Installation Guide](doc/installation.md) - Setup instructions and deployment options
- [Usage Guide](doc/usage.md) - Detection workflows and supported PII types
- [Architecture](doc/architecture.md) - System components and technology stack
- [API Documentation](doc/api.md) - Complete API reference and examples
- [Development Guide](doc/development.md) - Contributing and development setup
- [Security & Privacy](doc/security.md) - Security features and compliance
- [Troubleshooting](doc/troubleshooting.md) - Common issues and solutions

## Quick Start

**Prerequisites**: Node.js 16+, Python 3.8+, Ollama

```bash
# Clone and install
git clone https://github.com/zafrem/pii-search
cd pii-search
npm install

# Setup engines
cd deep_search_engine && ./setup.sh && cd ..
cd context_search_engine && ./setup.sh && cd ..

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama3.2:3b

# Start application
npm run dev
```

**Access**: Frontend at http://localhost:3000

For complete setup instructions, see [Installation Guide](doc/installation.md).
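
Before running the Quick Start commands, it can help to confirm that the prerequisites are installed at the required versions. The following is a minimal sketch, assuming the tools are invoked as `node`, `python3`, and `ollama` and that each prints its version via `--version`. The helper names `version_ge`, `require`, and `check_prereqs` are made up for this example and are not part of the project.

```bash
# Sketch only: helper names are hypothetical and not part of the project.

# Succeed if dotted version $1 is greater than or equal to dotted version $2.
# Components are compared numerically, so 3.10 counts as newer than 3.8.
version_ge() {
  local have="$1"
  local want="$2"
  local h w
  while [ -n "$want" ]; do
    h=${have%%.*}
    w=${want%%.*}
    # Drop the component just read; leave empty when none remain.
    case $have in *.*) have=${have#*.} ;; *) have= ;; esac
    case $want in *.*) want=${want#*.} ;; *) want= ;; esac
    h=${h:-0}
    [ "$h" -gt "$w" ] && return 0
    [ "$h" -lt "$w" ] && return 1
  done
  return 0
}

# Check one tool: $1 = command name, $2 = minimum version.
require() {
  local cmd="$1"
  local min="$2"
  local found
  if ! command -v "$cmd" >/dev/null 2>&1; then
    echo "MISSING  $cmd"
    return 1
  fi
  # Take the first dotted number in the tool's --version output.
  found=$("$cmd" --version 2>&1 | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)
  if [ -n "$found" ] && version_ge "$found" "$min"; then
    echo "OK       $cmd $found"
  else
    echo "UNMET    $cmd ${found:-unknown} (need $min+)"
    return 1
  fi
}

# Check everything listed under Prerequisites.
check_prereqs() {
  local rc=0
  require node 16 || rc=1
  require python3 3.8 || rc=1
  require ollama 0 || rc=1   # no minimum Ollama version is stated
  return "$rc"
}
```

Calling `check_prereqs` prints one status line per tool and returns non-zero if anything is missing or too old, so it can gate the rest of a setup script.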

## Docker Usage

### Using Pre-built Images

Pull and run the latest image:

```bash
# Pull the image
docker pull zafrem/pii-search:latest

# Run with default configuration
docker run -p 3000:3000 -p 3001:3001 -p 8000:8000 -p 8001:8001 zafrem/pii-search:latest

# Run with custom tag
docker pull zafrem/pii-search:tagname
docker run -p 3000:3000 -p 3001:3001 -p 8000:8000 -p 8001:8001 zafrem/pii-search:tagname
```

### Building and Pushing Custom Images

```bash
# Build the image
docker build -t zafrem/pii-search:tagname .

# Push to registry
docker push zafrem/pii-search:tagname

# Run your custom build
docker run -p 3000:3000 -p 3001:3001 -p 8000:8000 -p 8001:8001 zafrem/pii-search:tagname
```

### Docker Compose (Recommended)

```bash
# Start all services
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down
```

**Access**:
- Frontend: http://localhost:3000
- Backend API: http://localhost:3001
- Deep Search Engine: http://localhost:8000
- Context Search Engine: http://localhost:8001
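
Once the containers are running, a quick way to confirm that all four services are reachable is to probe each address listed above. This is a sketch under assumptions: it treats any HTTP response on the root path as "up", since no dedicated health endpoint is documented here, and the names `SERVICES`, `is_up`, and `check_services` are hypothetical.

```bash
# Endpoints copied from the Access list; one "name=url" pair per line.
SERVICES='Frontend=http://localhost:3000
Backend API=http://localhost:3001
Deep Search Engine=http://localhost:8000
Context Search Engine=http://localhost:8001'

# Hypothetical helper: succeed if anything answers HTTP at URL $1.
# Without -f, curl exits 0 for any HTTP status, so this only tests reachability.
is_up() {
  curl -s --max-time 3 -o /dev/null "$1" </dev/null
}

# Print UP or DOWN for each service; return non-zero if any is down.
check_services() {
  local failed=0
  local name url
  while IFS='=' read -r name url; do
    if is_up "$url"; then
      printf 'UP    %s (%s)\n' "$name" "$url"
    else
      printf 'DOWN  %s (%s)\n' "$name" "$url"
      failed=1
    fi
  done <<EOF
$SERVICES
EOF
  return "$failed"
}
```

Run `check_services` after `docker-compose up -d`. Because the models may take a while to load, a `DOWN` line shortly after startup does not necessarily indicate a failure, and `docker-compose logs -f` is the place to look for details.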

## License

This project is dual-licensed:
- GNU General Public License v3.0: Free for open source and personal use. See the [LICENSE](LICENSE) file for details.
- Commercial License: Required for commercial use and available via separate agreement. Contact: zafrem@gmail.com

## Acknowledgments

- **Ollama** for local LLM capabilities
- **Hugging Face** for transformer models
- **React** and **TypeScript** communities
- **scikit-learn** for ML algorithms
- **FastAPI** for Python web framework

---

For detailed learning and training processes, see [PII_LEARNING_MANUAL.md](PII_LEARNING_MANUAL.md).