{"id":44325896,"url":"https://github.com/zafrem/pii-search","last_synced_at":"2026-02-11T07:34:51.337Z","repository":{"id":308826094,"uuid":"952917381","full_name":"zafrem/pii-search","owner":"zafrem","description":null,"archived":false,"fork":false,"pushed_at":"2025-09-18T13:00:56.000Z","size":20336,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-18T15:42:58.619Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zafrem.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-03-22T06:24:20.000Z","updated_at":"2025-09-18T13:01:00.000Z","dependencies_parsed_at":"2025-09-06T15:20:18.520Z","dependency_job_id":null,"html_url":"https://github.com/zafrem/pii-search","commit_stats":null,"previous_names":["zafrem/pii-scanner","zafrem/pii-search"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zafrem/pii-search","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zafrem%2Fpii-search","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zafrem%2Fpii-search/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zafrem%2Fpii-search/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zafrem%2Fpii-search/manifests","owner_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/owners/zafrem","download_url":"https://codeload.github.com/zafrem/pii-search/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zafrem%2Fpii-search/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29329493,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-11T06:13:03.264Z","status":"ssl_error","status_checked_at":"2026-02-11T06:12:55.843Z","response_time":97,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-11T07:34:51.177Z","updated_at":"2026-02-11T07:34:51.332Z","avatar_url":"https://github.com/zafrem.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PII Search\n\nA comprehensive multi-language PII (Personally Identifiable Information) detection system with advanced parallel processing, cascaded detection models, and integrated data generation capabilities for training and testing.\n\n## Overview\n\nThis application provides multiple PII detection approaches with advanced AI models:\n\n1. **Basic Search** - Rule-based pattern matching using regex patterns\n2. **Cascaded AI Detection** - Parallel processing with Multilingual BERT → DeBERTa v3 → Ollama LLM\n3. **Simple Learning Engine** - Adaptive ML with continuous training capabilities\n4. 
**Data Generation System** - Faker-based PII data generation for training and testing\n\n### Key Features\n\n- **Advanced AI Detection** - Parallel processing with cascaded models and adaptive learning\n- **Multi-language Support** - 12+ languages with locale-aware generation\n- **Data Generation \u0026 Training** - Faker-based generation with 23+ data types\n- **Comprehensive Labeling System** - Interactive annotation with multiple export formats\n- **Privacy \u0026 Security** - Local processing with GDPR/HIPAA ready architecture\n- **Production Features** - Docker containerization with health monitoring\n\n## Demo\n\n![Demo](./image/PII_Search.gif)\n\n## 📋 Documentation\n\n- [Installation Guide](doc/installation.md) - Setup instructions and deployment options\n- [Usage Guide](doc/usage.md) - Detection workflows and supported PII types\n- [Architecture](doc/architecture.md) - System components and technology stack\n- [API Documentation](doc/api.md) - Complete API reference and examples\n- [Development Guide](doc/development.md) - Contributing and development setup\n- [Security \u0026 Privacy](doc/security.md) - Security features and compliance\n- [Troubleshooting](doc/troubleshooting.md) - Common issues and solutions\n\n## Quick Start\n\n**Prerequisites**: Node.js 16+, Python 3.8+, Ollama\n\n```bash\n# Clone and install\ngit clone \u003crepository-url\u003e\ncd pii-search\nnpm install\n\n# Setup engines\ncd deep_search_engine \u0026\u0026 ./setup.sh \u0026\u0026 cd ..\ncd context_search_engine \u0026\u0026 ./setup.sh \u0026\u0026 cd ..\n\n# Install Ollama\ncurl -fsSL https://ollama.ai/install.sh | sh\nollama pull llama3.2:3b\n\n# Start application\nnpm run dev\n```\n\n**Access**: Frontend at http://localhost:3000\n\nFor complete setup instructions, see [Installation Guide](doc/installation.md).\n\n## Docker Usage\n\n### Using Pre-built Images\n\nPull and run the latest image:\n\n```bash\n# Pull the image\ndocker pull zafrem/pii-search:latest\n\n# Run with 
default configuration\ndocker run -p 3000:3000 -p 3001:3001 -p 8000:8000 -p 8001:8001 zafrem/pii-search:latest\n\n# Run with custom tag\ndocker pull zafrem/pii-search:tagname\ndocker run -p 3000:3000 -p 3001:3001 -p 8000:8000 -p 8001:8001 zafrem/pii-search:tagname\n```\n\n### Building and Pushing Custom Images\n\n```bash\n# Build the image\ndocker build -t zafrem/pii-search:tagname .\n\n# Push to registry\ndocker push zafrem/pii-search:tagname\n\n# Run your custom build\ndocker run -p 3000:3000 -p 3001:3001 -p 8000:8000 -p 8001:8001 zafrem/pii-search:tagname\n```\n\n### Docker Compose (Recommended)\n\n```bash\n# Start all services\ndocker-compose up -d\n\n# View logs\ndocker-compose logs -f\n\n# Stop services\ndocker-compose down\n```\n\n**Access**:\n- Frontend: http://localhost:3000\n- Backend API: http://localhost:3001\n- Deep Search Engine: http://localhost:8000\n- Context Search Engine: http://localhost:8001\n\n## License\n\nThis project is dual-licensed under the GNU General Public License v3.0 and a Commercial License.\n- GNU General Public License v3.0: Free for open source and personal use - see the [LICENSE](LICENSE) file for details.\n- Commercial License: Required for commercial use, available via separate agreement. Contact: zafrem@gmail.com\n\n## Acknowledgments\n\n- **Ollama** for local LLM capabilities\n- **Hugging Face** for transformer models\n- **React** and **TypeScript** communities\n- **scikit-learn** for ML algorithms\n- **FastAPI** for Python web framework\n\n---\n\nFor detailed learning and training processes, see [PII_LEARNING_MANUAL.md](PII_LEARNING_MANUAL.md).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzafrem%2Fpii-search","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzafrem%2Fpii-search","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzafrem%2Fpii-search/lists"}