{"id":40276378,"url":"https://github.com/bluewave-labs/maskwise","last_synced_at":"2026-01-20T03:07:34.099Z","repository":{"id":311231158,"uuid":"1041018751","full_name":"bluewave-labs/maskwise","owner":"bluewave-labs","description":"Maskwise detects, redacts, masks, and anonymizes sensitive data across text, images, and structured data in training datasets for LLM systems. Powered by Microsoft Presidio","archived":false,"fork":false,"pushed_at":"2025-11-18T19:37:08.000Z","size":1420,"stargazers_count":46,"open_issues_count":3,"forks_count":11,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-14T01:37:07.210Z","etag":null,"topics":["data","data-anonymization","data-redaction","data-scanning","gdpr-compliance","hipaa-compliance","pii-anonymization","pii-detection","sensitive-data-masking"],"latest_commit_sha":null,"homepage":"https://verifywise.ai/maskwise","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bluewave-labs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-19T21:23:47.000Z","updated_at":"2025-12-28T14:40:32.000Z","dependencies_parsed_at":null,"dependency_job_id":"506fe70b-0a67-4ec2-8606-b7184b1c3c35","html_url":"https://github.com/bluewave-labs/maskwise","commit_stats":null,"previous_names":["bluewave-labs/maskwise"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/bluewave-labs/maskwise","repository_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/bluewave-labs%2Fmaskwise","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bluewave-labs%2Fmaskwise/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bluewave-labs%2Fmaskwise/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bluewave-labs%2Fmaskwise/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bluewave-labs","download_url":"https://codeload.github.com/bluewave-labs/maskwise/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bluewave-labs%2Fmaskwise/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28594958,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-20T02:08:49.799Z","status":"ssl_error","status_checked_at":"2026-01-20T02:08:44.148Z","response_time":117,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","data-anonymization","data-redaction","data-scanning","gdpr-compliance","hipaa-compliance","pii-anonymization","pii-detection","sensitive-data-masking"],"created_at":"2026-01-20T03:07:34.026Z","updated_at":"2026-01-20T03:07:34.091Z","avatar_url":"https://github.com/bluewave-labs.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MaskWise - Enterprise PII Anonymization Platform\n\nMaskwise by 
[VerifyWise](https://verifywise.ai/) is a data privacy platform built to detect, redact, mask, and anonymize sensitive information across unstructured text, images, and structured data within LLM training datasets. It automatically identifies and classifies PII, payment data, health records, and other regulated content.\n\nThe system supports 50+ document and file formats, applies anonymization while preserving original structure and formatting, and generates full compliance audit trails for traceability and verification.\n\nJoin the Discord server for discussions: [Click here](https://discord.gg/xr5ta83BBr)\n\n## Overview\n\n  - [Microsoft Presidio](https://github.com/microsoft/presidio/) integration with 15+ compliance entity types (SSN, Credit Cards, HIPAA, GDPR, etc.)\n  - RBAC with comprehensive audit trails\n  - Full Office Suite Support (Word, Excel, PowerPoint, PDF) with format preservation\n  - Batch Processing for enterprise-scale volumes\n  - OCR Integration for scanned documents\n  - Policy-driven Processing with customizable business rules\n  - Format-preserving Anonymization maintaining document usability\n  - Multiple Strategies (`redact`, `mask`, `replace`, `encrypt`)\n  - Original + Anonymized Downloads for audit workflows\n  - On-premise \u0026 Docker Installation\n  - RESTful API for existing system integration\n\n\u003ccenter\u003e\n \u003cimg width=\"737\" height=\"326\" alt=\"image\" src=\"https://github.com/user-attachments/assets/e3bca2fd-69d6-4dd2-affd-6e6a3f58312e\" /\u003e\n\u003c/center\u003e\n\n**Roadmap:**\n\n- Vault\n- Single Sign-On Ready (Active Directory, SAML, OIDC) (in the works)\n\nYou can deploy Maskwise in 24 hours and reduce PII exposure risk by 95%. Maskwise can process thousands of documents per hour.\n\n\n\n\n## 🚀 Quick Deploy with Docker (Recommended)\n\n**Deploy Maskwise in under 5 minutes using pre-built images:**\n\n```bash\n# 1. Clone repository\ngit clone https://github.com/bluewave-labs/maskwise.git\ncd maskwise\n\n# 2. 
Configure environment (required)\ncp .env.production.example .env\n# Edit .env: Set POSTGRES_PASSWORD and JWT_SECRET\n\n# 3. Deploy all services \ndocker-compose -f docker-compose.production.yml up -d\n\n# 4. Initialize database (one-time setup)\ndocker-compose -f docker-compose.production.yml exec api npx prisma migrate deploy\ndocker-compose -f docker-compose.production.yml exec api npx prisma db seed\n\n# 5. Access Maskwise\n# Frontend: http://localhost:3000\n# Login: admin@maskwise.com / admin123\n```\n\n**✅ Ready-to-use Docker images available:**\n- `ghcr.io/bluewave-labs/maskwise-api:latest`\n- `ghcr.io/bluewave-labs/maskwise-worker:latest` \n- `ghcr.io/bluewave-labs/maskwise-web:latest`\n\n## Maskwise use cases for AI and LLMs\n\n### 1. Safe training data curation  \nLLM training datasets often contain sensitive information like PII or confidential business data. Maskwise detects and anonymizes this content before ingestion, preventing models from memorizing or leaking private details.  \n\n### 2. Fine-tuning on proprietary data  \nWhen fine-tuning LLMs with internal corpora such as customer conversations or documents, regulated data may slip through. Maskwise redacts or masks sensitive fields while preserving structure, enabling safe and compliant fine-tuning.  \n\n### 3. Prompt and response anonymization  \nPrompts and outputs collected for evaluation or reinforcement learning can include sensitive content. Maskwise anonymizes these logs before they’re stored or shared, reducing exposure and ensuring privacy.  \n\n### 4. Synthetic dataset generation  \nTo expand training data safely, Maskwise anonymizes real records and replaces them with synthetic placeholders. This preserves realism for model training while protecting user privacy.  
\n\n## Architecture\n\nThis is a monorepo containing:\n\n- **apps/web** - Next.js frontend with shadcn/ui\n- **apps/api** - NestJS backend API\n- **apps/worker** - Background job processor\n- **packages/shared** - Shared utilities and helpers\n- **packages/types** - TypeScript type definitions\n- **packages/database** - Database schemas and migrations\n\n## Tech Stack\n\n- **Frontend**: Next.js 14 (App Router) + TypeScript + shadcn/ui + TailwindCSS\n- **Backend**: NestJS + TypeScript + PostgreSQL + Redis\n- **Processing**: Microsoft Presidio + Apache Tika + Tesseract OCR\n- **Deployment**: Docker Compose\n\n## Screenshots\n\n### Main dashboard\n\n\u003cimg width=\"1387\" height=\"790\" alt=\"image\" src=\"https://github.com/user-attachments/assets/29982118-74f9-4934-b90e-c755a0953f50\" /\u003e\n\n### Project view\n\n\u003cimg width=\"1389\" height=\"789\" alt=\"image\" src=\"https://github.com/user-attachments/assets/9e0e5d71-7f9f-4c67-89f5-733d66c6058b\" /\u003e\n\n### Datasets view\n\n\u003cimg width=\"1391\" height=\"785\" alt=\"image\" src=\"https://github.com/user-attachments/assets/c7bfa5c2-5cf2-4765-9cc6-a0190f7b7bd1\" /\u003e\n\n### Jobs overview\n\n\u003cimg width=\"1390\" height=\"786\" alt=\"image\" src=\"https://github.com/user-attachments/assets/debb8d53-0d9e-41f1-81cf-423eaaaa3fa4\" /\u003e\n\n### Anonymization workflow\n\n\u003cimg width=\"1387\" height=\"784\" alt=\"image\" src=\"https://github.com/user-attachments/assets/505d1aee-cb96-4a1c-805d-706c1ee91f2a\" /\u003e\n\n### Policies \n\n\u003cimg width=\"1383\" height=\"791\" alt=\"image\" src=\"https://github.com/user-attachments/assets/e7fbf7aa-abaa-41eb-8806-6262108b7500\" /\u003e\n\n\n### Settings \n\n\u003cimg width=\"1389\" height=\"778\" alt=\"image\" src=\"https://github.com/user-attachments/assets/228ce6d8-7dbc-49ef-bd2b-d8f0b2c818c8\" /\u003e\n\n\n## Quick Start\n\n### Option 1: Docker Images (Recommended)\n\n**🚀 Zero-build deployment with pre-built images from GitHub Container 
Registry:**\n\n**Prerequisites:**\n- Docker and Docker Compose installed\n- 4GB+ RAM available\n\n**Quick Deploy:**\n```bash\n# Clone and configure\ngit clone https://github.com/bluewave-labs/maskwise.git\ncd maskwise\ncp .env.production.example .env\n# Edit .env: Set POSTGRES_PASSWORD, JWT_SECRET\n\n# Deploy all services instantly\ndocker-compose -f docker-compose.production.yml up -d\n\n# One-time database setup\ndocker-compose -f docker-compose.production.yml exec api npx prisma migrate deploy\ndocker-compose -f docker-compose.production.yml exec api npx prisma db seed\n```\n\n**Access Application:**\n- **🌐 Web UI**: http://localhost:3000\n- **🔗 API**: http://localhost:3001  \n- **👤 Admin Login**: admin@maskwise.com / admin123\n\n**Service Status Check:**\n```bash\n# Verify all services are healthy\ndocker-compose -f docker-compose.production.yml ps\n```\n\n**✅ Pre-built Docker Images (No Build Required):**\n- `ghcr.io/bluewave-labs/maskwise-api:latest` - Backend API service\n- `ghcr.io/bluewave-labs/maskwise-worker:latest` - Background job processor  \n- `ghcr.io/bluewave-labs/maskwise-web:latest` - Frontend web application\n\n**Features:**\n- Multi-platform support (linux/amd64, linux/arm64)\n- Security-optimized Alpine Linux base\n- Automated health checks and restart policies\n- Production-ready with resource limits\n\nSee [DOCKER.md](DOCKER.md) for complete Docker deployment guide.\n\n### Option 2: Development Setup\n\n**For development with live code changes:**\n\n**Prerequisites:**\n- **Docker** and **Docker Compose** installed and running\n- **Node.js 18+** and **npm** installed\n- **PostgreSQL client** (optional, for direct database access)\n\n### Quick Setup Script (Alternative)\n```bash\n# Automated setup of infrastructure and database\n./start-dev.sh\n```\nThen follow the terminal instructions to start the three application services.\n\n### Installation Steps (Manual)\n\n1. 
**Clone and Install Dependencies**\n   ```bash\n   git clone https://github.com/bluewave-labs/maskwise.git\n   cd maskwise\n   npm install\n   ```\n\n2. **Start Infrastructure Services**\n   ```bash\n   # Start PostgreSQL, Redis, Presidio, Tika, and Tesseract\n   docker-compose up -d postgres redis presidio-analyzer presidio-anonymizer tika tesseract\n   \n   # Wait for services to be healthy (about 30-60 seconds)\n   docker-compose ps\n   ```\n\n3. **Set Up Database**\n   ```bash\n   # Navigate to database package\n   cd packages/database\n   \n   # Generate Prisma client\n   npx prisma generate\n   \n   # Run migrations\n   npx prisma migrate deploy\n   \n   # Seed database with admin user and policies\n   npx prisma db seed\n   \n   # Return to project root\n   cd ../..\n   ```\n\n4. **Start Application Services**\n   \n   Open 3 separate terminals and run:\n   \n   **Terminal 1 - API Server:**\n   ```bash\n   cd apps/api\n   JWT_SECRET=maskwise_jwt_secret_dev_only \\\n   DATABASE_URL=postgresql://maskwise:maskwise_dev_password@localhost:5436/maskwise \\\n   REDIS_URL=redis://localhost:6379 \\\n   npm run dev\n   ```\n   \n   **Terminal 2 - Worker Service:**\n   ```bash\n   cd apps/worker\n   npm run dev\n   ```\n   \n   **Terminal 3 - Web Frontend:**\n   ```bash\n   cd apps/web\n   npx next dev -p 3005\n   ```\n\n5. 
**Access the Application**\n   - **Frontend**: http://localhost:3005\n   - **API**: http://localhost:3001\n   - **Default Admin**: admin@maskwise.com / admin123\n\n### Verification\n```bash\n# Check Docker services are healthy\ndocker-compose ps\n\n# Test API is responding\ncurl http://localhost:3001/health\n\n# All services should show as running/healthy\n```\n\n### Troubleshooting\n- **Port conflicts**: Change ports in the commands above if needed\n- **Docker issues**: Run `docker-compose down` and restart\n- **Database connection**: Ensure PostgreSQL container is healthy before starting API\n- **Missing dependencies**: Run `npm install` in individual app directories if needed\n\n## Production Deployment\n\n### 🏭 Production Docker Deployment (Recommended)\n\n**Deploy Maskwise to production using battle-tested Docker images:**\n\n```bash\n# 1. Setup production environment\ngit clone https://github.com/bluewave-labs/maskwise.git\ncd maskwise\ncp .env.production.example .env\n\n# 2. Configure secure production values\n# Edit .env with:\n# - Strong POSTGRES_PASSWORD (use a password manager)\n# - Secure JWT_SECRET (32+ random characters)  \n# - External database URLs if using managed services\n# - Custom ports if needed\n\n# 3. Deploy instantly with pre-built images\ndocker-compose -f docker-compose.production.yml up -d\n\n# 4. Initialize database (first-time only)\ndocker-compose -f docker-compose.production.yml exec api npx prisma migrate deploy\ndocker-compose -f docker-compose.production.yml exec api npx prisma db seed\n\n# 5. 
Verify deployment\ndocker-compose -f docker-compose.production.yml ps\ncurl -f http://localhost:3001/health\n```\n\n**🛡️ Production Features:**\n- ✅ **Zero build time** - pre-built images ready to deploy\n- ✅ **Multi-platform** - works on amd64/arm64 (Apple Silicon, AWS Graviton)\n- ✅ **Security hardened** - Alpine Linux with non-root users\n- ✅ **Auto-healing** - health checks with automatic restart\n- ✅ **Resource optimized** - memory limits and CPU controls\n- ✅ **High availability** - separate API, Worker, and Web services\n\n### Option 2: Build from Source\n1. **Copy environment template**\n   ```bash\n   cp .env.example .env\n   # Edit .env with your production values\n   ```\n\n2. **Deploy**\n   ```bash\n   make prod\n   ```\n\n### Option 3: Kubernetes with Helm (Enterprise)\n1. **Prerequisites**\n   - Kubernetes cluster (1.19+)\n   - Helm 3.0+\n   - kubectl configured\n\n2. **Configure values**\n   ```bash\n   # Edit production values\n   cp k8s/values-production.yaml k8s/values-production-custom.yaml\n   # Update image registry, domains, secrets, etc.\n   ```\n\n3. **Deploy**\n   ```bash\n   # One-command deployment\n   make k8s-deploy\n   \n   # Or manual\n   ./k8s/deploy.sh\n   ```\n\n4. 
**Access**\n   ```bash\n   # Port forward for local access\n   make k8s-port\n   \n   # Or configure ingress for external access\n   # Update ingress.hosts in values file\n   ```\n\n### Kubernetes Features\n- **Auto-scaling**: HPA based on CPU/memory\n- **High Availability**: Multi-replica deployments\n- **Rolling Updates**: Zero-downtime deployments  \n- **Monitoring**: Prometheus integration ready\n- **Security**: Pod security contexts, network policies\n- **Storage**: Persistent volumes for data\n\n## Development\n\n### Starting Development Environment\nFollow the installation steps above to run in development mode.\n\n### Building and Testing\n```bash\n# Build individual packages\ncd apps/api \u0026\u0026 npm run build\ncd apps/web \u0026\u0026 npm run build\ncd apps/worker \u0026\u0026 npm run build\n\n# Run linting\ncd apps/api \u0026\u0026 npm run lint\ncd apps/web \u0026\u0026 npm run lint\n\n# Type checking\ncd apps/api \u0026\u0026 npm run type-check\ncd apps/web \u0026\u0026 npm run type-check\n\n# Run tests\ncd apps/api \u0026\u0026 npm test\n```\n\n### Database Operations\n```bash\ncd packages/database\n\n# Reset database (careful!)\nnpx prisma migrate reset\n\n# Apply new migrations\nnpx prisma migrate dev\n\n# View data in browser\nnpx prisma studio\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbluewave-labs%2Fmaskwise","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbluewave-labs%2Fmaskwise","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbluewave-labs%2Fmaskwise/lists"}