https://github.com/zafrem/pii-search
- Host: GitHub
- URL: https://github.com/zafrem/pii-search
- Owner: zafrem
- License: gpl-3.0
- Created: 2025-03-22T06:24:20.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-09-18T13:00:56.000Z (9 months ago)
- Last Synced: 2025-09-18T15:42:58.619Z (9 months ago)
- Language: Python
- Size: 19.4 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# PII Search
A comprehensive multi-language PII (Personally Identifiable Information) detection system with advanced parallel processing, cascaded detection models, and integrated data generation capabilities for training and testing.
## Overview
This application combines three PII detection approaches with a data generation system:
1. **Basic Search** - Rule-based pattern matching using regex patterns
2. **Cascaded AI Detection** - Parallel processing with Multilingual BERT → DeBERTa v3 → Ollama LLM
3. **Simple Learning Engine** - Adaptive ML with continuous training capabilities
4. **Data Generation System** - Faker-based PII data generation for training and testing
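To make the rule-based Basic Search approach concrete, here is a minimal sketch of regex pattern matching over text. The `PATTERNS` table and the `find_pii` helper are hypothetical names chosen for illustration, and the patterns are simplified assumptions, not the project's actual rule set.

```python
import re

# Hypothetical, simplified patterns for illustration only;
# they are not the project's actual rules.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_pii(text):
    """Return (pii_type, matched_text, start, end) tuples sorted by start offset."""
    hits = []
    for pii_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((pii_type, match.group(), match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit[2])


sample = "Contact jane.doe@example.com, SSN 123-45-6789, from 192.168.0.1."
for hit in find_pii(sample):
    print(hit)
```

Patterns like these are fast but produce false positives (the `ipv4` pattern, for instance, does not check that each octet is at most 255), which is one reason a rule-based pass is typically paired with model-based stages.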
### Key Features
- **Advanced AI Detection** - Parallel processing with cascaded models and adaptive learning
- **Multi-language Support** - 12+ languages with locale-aware generation
- **Data Generation & Training** - Faker-based generation with 23+ data types
- **Comprehensive Labeling System** - Interactive annotation with multiple export formats
- **Privacy & Security** - Local processing with GDPR/HIPAA ready architecture
- **Production Features** - Docker containerization with health monitoring
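One common way to structure a cascade of models is to try cheaper stages first and escalate only when a stage is not confident enough. The sketch below illustrates that idea under stated assumptions: `run_cascade`, the `threshold` parameter, and the stand-in stages are all hypothetical, and the project's real pipeline may combine its models differently.

```python
from typing import Callable, List, Optional, Tuple

# A stage takes text and returns (label or None, confidence in [0, 1]).
Stage = Callable[[str], Tuple[Optional[str], float]]


def run_cascade(text: str, stages: List[Tuple[str, Stage]], threshold: float = 0.8):
    """Try each stage in order and stop at the first confident one.

    Returns (stage_name, label, confidence) from the deciding stage,
    or from the last stage if none reached the threshold.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    result = None
    for name, stage in stages:
        label, confidence = stage(text)
        result = (name, label, confidence)
        if confidence >= threshold:
            break
    return result


# Stand-in stages: cheap heuristics in place of real models (hypothetical).
def fast_stage(text):
    return ("email", 0.95) if "@" in text else (None, 0.3)


def slow_stage(text):
    has_digit = any(ch.isdigit() for ch in text)
    return ("phone", 0.85) if has_digit else (None, 0.9)


stages = [("fast", fast_stage), ("slow", slow_stage)]
print(run_cascade("mail me at a@b.io", stages))
print(run_cascade("call 555 0100", stages))
```

The design choice here is early exit: text that an inexpensive stage handles confidently never reaches the slower, more expensive stages.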
## Demo

## 📋 Documentation
- [Installation Guide](doc/installation.md) - Setup instructions and deployment options
- [Usage Guide](doc/usage.md) - Detection workflows and supported PII types
- [Architecture](doc/architecture.md) - System components and technology stack
- [API Documentation](doc/api.md) - Complete API reference and examples
- [Development Guide](doc/development.md) - Contributing and development setup
- [Security & Privacy](doc/security.md) - Security features and compliance
- [Troubleshooting](doc/troubleshooting.md) - Common issues and solutions
## Quick Start
**Prerequisites**: Node.js 16+, Python 3.8+, Ollama
```bash
# Clone and install
git clone https://github.com/zafrem/pii-search.git
cd pii-search
npm install
# Setup engines
cd deep_search_engine && ./setup.sh && cd ..
cd context_search_engine && ./setup.sh && cd ..
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama3.2:3b
# Start application
npm run dev
```
**Access**: Frontend at http://localhost:3000
For complete setup instructions, see [Installation Guide](doc/installation.md).
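Before running the setup steps, it can help to confirm that the prerequisites are present. The following is a minimal sketch, assuming the tools are exposed on `PATH` under the command names `node`, `npm`, and `ollama`; the helper names are hypothetical, and the script does not verify the Node.js version.

```python
import shutil
import sys

# Command names assumed to be on PATH; adjust for your platform.
REQUIRED_TOOLS = ["node", "npm", "ollama"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_is_recent(minimum=(3, 8)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


print("Python 3.8+:", python_is_recent())
print("Missing tools:", missing_tools() or "none")
```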
## Docker Usage
### Using Pre-built Images
Pull and run the latest image:
```bash
# Pull the image
docker pull zafrem/pii-search:latest
# Run with default configuration
docker run -p 3000:3000 -p 3001:3001 -p 8000:8000 -p 8001:8001 zafrem/pii-search:latest
# Run with custom tag
docker pull zafrem/pii-search:tagname
docker run -p 3000:3000 -p 3001:3001 -p 8000:8000 -p 8001:8001 zafrem/pii-search:tagname
```
### Building and Pushing Custom Images
```bash
# Build the image
docker build -t zafrem/pii-search:tagname .
# Push to registry
docker push zafrem/pii-search:tagname
# Run your custom build
docker run -p 3000:3000 -p 3001:3001 -p 8000:8000 -p 8001:8001 zafrem/pii-search:tagname
```
### Docker Compose (Recommended)
```bash
# Start all services
docker-compose up -d
# View logs
docker-compose logs -f
# Stop services
docker-compose down
```
**Access**:
- Frontend: http://localhost:3000
- Backend API: http://localhost:3001
- Deep Search Engine: http://localhost:8000
- Context Search Engine: http://localhost:8001
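After starting the containers, a quick way to confirm that the four ports listed above are accepting connections is a plain TCP probe. This is a minimal sketch: the `SERVICES` labels and helper names are hypothetical, and it checks only TCP reachability, not application-level health, since the project's health endpoints are not described here.

```python
import socket

# Ports taken from the access list above; the names are just labels.
SERVICES = {
    "frontend": 3000,
    "backend-api": 3001,
    "deep-search-engine": 8000,
    "context-search-engine": 8001,
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host="localhost", services=SERVICES):
    """Map each service name to whether its port accepts TCP connections."""
    return {name: is_port_open(host, port) for name, port in services.items()}
```

Calling `check_services()` after `docker-compose up -d` returns a dictionary such as `{"frontend": True, ...}`, where a `False` entry points to a service that has not started or is not mapped to the expected port.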
## License
This project is dual-licensed under the GNU General Public License v3.0 and a Commercial License.
- GNU General Public License v3.0: Free for open source and personal use - see the [LICENSE](LICENSE) file for details.
- Commercial License: Required for commercial use, available via separate agreement. Contact: zafrem@gmail.com
## Acknowledgments
- **Ollama** for local LLM capabilities
- **Hugging Face** for transformer models
- **React** and **TypeScript** communities
- **scikit-learn** for ML algorithms
- **FastAPI** for Python web framework
---
For detailed learning and training processes, see [PII_LEARNING_MANUAL.md](PII_LEARNING_MANUAL.md).