https://github.com/zensgit/dedupcad-vision
Graphics-based CAD drawing deduplication using computer vision
- Host: GitHub
- URL: https://github.com/zensgit/dedupcad-vision
- Owner: zensgit
- License: mit
- Created: 2025-11-19T01:04:51.000Z (7 months ago)
- Default Branch: master
- Last Pushed: 2026-03-30T08:25:53.000Z (2 months ago)
- Last Synced: 2026-03-30T10:14:42.939Z (2 months ago)
- Language: Python
- Size: 44.7 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Security: docs/security-audit-report.md
- Agents: AGENTS.md
# CADDedup Vision
**Graphics-based CAD drawing deduplication using computer vision techniques**
[CI](https://github.com/zensgit/dedupcad-vision/actions/workflows/ci.yml)
[Python](https://www.python.org/downloads/)
[License: MIT](https://opensource.org/licenses/MIT)
[Container image](https://github.com/users/zensgit/packages/container/package/dedupcad-vision)
## Overview
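CADDedup Vision is a production-ready system for detecting duplicate CAD drawings using computer vision. It uses a **progressive 4-layer search architecture**: cheap layers (perceptual hashing, approximate nearest-neighbor search) narrow the candidate set quickly, and the costlier verification layers (ML, geometric) run only when the selected search mode calls for them, trading speed against accuracy.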
## Documentation Map
- Documentation index: [docs/DOCUMENTATION_INDEX.md](docs/DOCUMENTATION_INDEX.md)
- Deployment guide: [docs/DEPLOYMENT.md](docs/DEPLOYMENT.md)
- Windows Server deployment: [docs/WINDOWS_SERVER_DEPLOYMENT.md](docs/WINDOWS_SERVER_DEPLOYMENT.md)
- Pre-release checklist: [docs/PRE_RELEASE_CHECKLIST.md](docs/PRE_RELEASE_CHECKLIST.md)
- Operations runbook: [docs/OPERATIONS_RUNBOOK.md](docs/OPERATIONS_RUNBOOK.md)
- API v2 reference: [docs/API_V2_REFERENCE.md](docs/API_V2_REFERENCE.md)
- Technical handoff note: [reports/TECHNICAL_SESSION_NOTES_20260310.md](reports/TECHNICAL_SESSION_NOTES_20260310.md)
### Key Features
- **Progressive Search**: L1 (pHash) → L2 (FAISS) → L3 (ML) → L4 (Geometric)
- **Sub-second Search**: 50-300 ms for most queries in `fast`/`balanced` modes; `precise` mode can take up to ~10 s
- **Scalable**: Handles 100K+ drawings with FAISS indexing
- **Production Ready**: Kubernetes Helm chart, monitoring, caching
- **Extensible**: Plugin architecture for ML Platform and DedupCAD integration
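Layer L1 compares perceptual hashes (pHash). The project's own hashing code (under `src/caddedup_vision/core/`) is not reproduced here; as a rough illustration, the classic DCT-based pHash looks like the sketch below. It assumes the drawing has already been rasterized, converted to grayscale and resized to 32x32 pixels (the usual pHash preprocessing), and the helper names are hypothetical. Two renderings of the same drawing should produce hashes with a small Hamming distance, which is presumably what a threshold such as `PHASH_THRESHOLD` limits (an assumption).

```python
import math
from statistics import median

SIZE = 32      # assumed: input already resized to 32x32 grayscale
LOW_FREQ = 8   # keep the 8x8 lowest-frequency block -> 64-bit hash


def _dct_ii(values: list[float], keep: int) -> list[float]:
    """Unnormalized 1-D DCT-II, returning only the first `keep` coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (2 * i + 1) * u / (2 * n)) for i, v in enumerate(values))
        for u in range(keep)
    ]


def phash(pixels: list[list[float]]) -> int:
    """64-bit perceptual hash of a 32x32 grayscale image (classic DCT pHash)."""
    if len(pixels) != SIZE or any(len(row) != SIZE for row in pixels):
        raise ValueError(f"expected a {SIZE}x{SIZE} grayscale image")
    # Separable 2-D DCT: transform each row, then each column of the result
    rows = [_dct_ii(row, LOW_FREQ) for row in pixels]  # SIZE rows x LOW_FREQ
    block = [_dct_ii([rows[y][u] for y in range(SIZE)], LOW_FREQ) for u in range(LOW_FREQ)]
    coeffs = [c for column in block for c in column]
    threshold = median(coeffs)
    # One bit per coefficient: 1 if above the median of the low-frequency block
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | (1 if c > threshold else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

Keeping only the 8x8 lowest-frequency coefficients is what makes the hash tolerant of small rendering differences such as anti-aliasing or noise, and thresholding at the median makes it unchanged by a uniform positive scaling of pixel intensities.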
### Architecture
```
┌──────────────────────────────────────────────────────────────────┐
│                    Progressive Search Engine                     │
├──────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐           │
│  │ L1: pHash   │ →  │ L2: FAISS   │ →  │ L3: ML      │ → L4      │
│  │ (~1ms)      │    │ (~10ms)     │    │ (optional)  │           │
│  │ Fast filter │    │ ANN search  │    │ Deep verify │           │
│  └─────────────┘    └─────────────┘    └─────────────┘           │
├──────────────────────────────────────────────────────────────────┤
│ Cache Layer (Redis) │ Rate Limiting │ Telemetry (OpenTelemetry)  │
└──────────────────────────────────────────────────────────────────┘
```
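The diagram can be read as a cascade in which each layer only sees the candidates that survived the previous one. The sketch below is a simplified, pure-Python illustration of that idea, not the project's search engine: brute-force cosine similarity stands in for the FAISS ANN index, `verify` is a hypothetical hook standing in for the optional L3/L4 checks, and the default thresholds simply reuse the `PHASH_THRESHOLD` and `FEATURE_SIMILARITY_MIN` example values from Configuration. Whether the real engine runs L2 only on L1 survivors or over the whole index is an assumption here.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Drawing:
    name: str
    phash: int                 # 64-bit perceptual hash (L1)
    features: Sequence[float]  # embedding vector (L2)


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def progressive_search(
    query: Drawing,
    index: Sequence[Drawing],
    phash_threshold: int = 10,    # mirrors PHASH_THRESHOLD
    feature_min: float = 0.85,    # mirrors FEATURE_SIMILARITY_MIN
    verify: Optional[Callable[[Drawing, Drawing], bool]] = None,  # hypothetical L3/L4 hook
) -> list[tuple[str, float]]:
    # L1: cheap bit comparison discards most of the index
    survivors = [d for d in index if hamming(query.phash, d.phash) <= phash_threshold]
    # L2: exact cosine similarity here; the project uses a FAISS ANN index for this layer
    scored = [(d, cosine(query.features, d.features)) for d in survivors]
    scored = [(d, s) for d, s in scored if s >= feature_min]
    # L3/L4: optional, expensive verification on the few remaining candidates only
    if verify is not None:
        scored = [(d, s) for d, s in scored if verify(query, d)]
    return sorted(((d.name, s) for d, s in scored), key=lambda p: p[1], reverse=True)
```

The ordering is the point of the design: the bit comparison costs almost nothing per drawing, so running it first keeps the expensive similarity and verification steps confined to a handful of candidates.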
## Quick Start
### Docker (Recommended)
```bash
# Pull and run (latest)
docker run -p 8000:8000 ghcr.io/zensgit/dedupcad-vision:latest
# Or pin a release version
docker run -p 8000:8000 ghcr.io/zensgit/dedupcad-vision:1.1.7
# Optional: Docker Hub mirror (if configured for this repo)
docker run -p 8000:8000 :latest
# Or with docker-compose
docker-compose up -d
```
Note: `ghcr.io` container packages may be private. If you see `401 Unauthorized`, either make the
package public (GitHub UI -> Packages -> Settings -> Change visibility) or login with a GitHub PAT:
`docker login ghcr.io` (token scope: `read:packages`).
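Note: The root Dockerfile exposes port 8000, so the Docker commands above serve on `http://localhost:8000`. The Python entrypoint (`caddedup-vision`) defaults to port 58001, which is the port used in the API examples below.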
### Python Installation
Tested Python versions: 3.10, 3.11, 3.13 (3.11 recommended). Python 3.13
uses NumPy 2.x and faiss-cpu>=1.10.0 via dependency markers.
```bash
# Install from PyPI
pip install caddedup-vision
# Install with all extras
pip install "caddedup-vision[all]"
# Start the server
caddedup-vision
```
Default port for the Python entrypoint is 58001. Override it with the `CADDEDUP_VISION_PORT` environment variable if needed.
### Kubernetes (Helm)
```bash
helm install caddedup-vision ./deploy/helm/caddedup-vision \
--set redis.auth.password=your-password \
--set persistence.enabled=true
```
If you deploy from `ghcr.io` and the image is private, create an `imagePullSecret` and set
`imagePullSecrets` in Helm values. See `deploy/helm/caddedup-vision/README.md`.
For detailed deployment instructions, see [Deployment Guide](docs/DEPLOYMENT.md).
For a step-by-step development + verification checklist, see `docs/DEV_AND_VERIFY_ZH.md`.
## API Usage
### Search for Duplicates
```bash
# Upload and search
curl -X POST http://localhost:58001/api/v2/search \
-F "file=@drawing.pdf" \
-F "mode=balanced"
```
### End-to-End Smoke Check (Search + Visual Diff)
Use the bundled script to verify the full flow:
upload/index -> search similar drawings -> generate colored visual diff.
```bash
# 1) start server
python3 start_server.py --port 58001
# 2) run smoke test in another terminal
scripts/smoke_search_visual_diff.sh
```
Optional arguments:
```bash
scripts/smoke_search_visual_diff.sh
```
Expected output includes:
- index response (`success=true`)
- search response with at least one candidate (`similar` or `duplicates`)
- visual diff response (`success=true`)
- generated diff image: `/tmp/visual_diff_stored.png`
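If you script the smoke check in Python instead of reading the shell output, the expectations above translate roughly into the checks below. The function name is hypothetical, and it assumes each step's JSON response has already been parsed into a dict; it is not part of the bundled script.

```python
from pathlib import Path


def check_smoke_results(
    index_resp: dict,
    search_resp: dict,
    diff_resp: dict,
    diff_image: str = "/tmp/visual_diff_stored.png",
) -> list[str]:
    """Return the failed expectations; an empty list means the smoke check passed."""
    failures = []
    if index_resp.get("success") is not True:
        failures.append("index response did not report success=true")
    # A candidate may appear under either key, so both lists count
    candidates = (search_resp.get("similar") or []) + (search_resp.get("duplicates") or [])
    if not candidates:
        failures.append("search returned no candidates in 'similar' or 'duplicates'")
    if diff_resp.get("success") is not True:
        failures.append("visual diff response did not report success=true")
    if not Path(diff_image).is_file():
        failures.append(f"diff image not found: {diff_image}")
    return failures
```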
### Python Client
```python
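import asyncio

import httpx


async def search_duplicates(path: str, mode: str = "balanced") -> None:
    # precise mode can take several seconds, so raise httpx's 5 s default timeout
    async with httpx.AsyncClient(timeout=30.0) as client:
        with open(path, "rb") as f:
            response = await client.post(
                "http://localhost:58001/api/v2/search",
                files={"file": f},
                data={"mode": mode},
            )
        response.raise_for_status()
        result = response.json()

    # Candidates can come back under either key; merge both lists
    matches = (result.get("duplicates") or []) + (result.get("similar") or [])
    for match in matches:
        print(f"Match: {match['file_name']} ({match['similarity']:.1%})")


asyncio.run(search_duplicates("drawing.pdf"))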
```
### Search Modes
| Mode | Layers | Typical Speed | Accuracy | Use Case |
|------|--------|---------------|----------|----------|
| `l1` | L1 (pHash) | ~5ms | Coarse | Ultra fast filtering |
| `fast` | L1 + L2 (FAISS) | ~10-50ms | Good | Quick screening |
| `balanced` | L1 + L2 (+ optional L3) | ~200-500ms | Better | Recommended |
| `precise` | L1 + L2 (+ optional L3/L4) | ~0.5-10s | Best | Final verification |
See [API Documentation](docs/API_USAGE.md) for complete reference.
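When a caller has a latency budget rather than a preferred accuracy level, the Search Modes table can be turned into a simple lookup. This is a hypothetical client-side helper (`pick_mode` is not part of the API) that uses the upper end of each mode's typical speed from the table; real latencies depend on index size and on whether the optional L3/L4 layers are enabled.

```python
# Upper end of each mode's typical latency from the table, in milliseconds,
# ordered from fastest/coarsest to slowest/most accurate.
MODE_BUDGET_MS = [
    ("l1", 5),
    ("fast", 50),
    ("balanced", 500),
    ("precise", 10_000),
]


def pick_mode(latency_budget_ms: float) -> str:
    """Most accurate mode whose typical upper latency fits the budget (falls back to 'l1')."""
    chosen = MODE_BUDGET_MS[0][0]
    for mode, upper_ms in MODE_BUDGET_MS:
        if upper_ms <= latency_budget_ms:
            chosen = mode
    return chosen
```

For example, an interactive upload form with a 1-second budget would get `balanced`, while a nightly batch job with no tight budget would get `precise`.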
## Web UI
The system includes a built-in Web UI for management and monitoring.
- **URL (Docker image)**: `http://localhost:8000`
- **URL (Python entrypoint)**: `http://localhost:58001`
- **Features**:
  - **Search**: Drag & drop file search with visual diff.
  - **License Manager**: Generate and validate licenses (requires auth).
  - **Update Monitor**: Track plugin update status and errors.
### Authentication
Admin features (License generation, Update config) are protected by Basic Authentication.
- **Default User**: `admin`
- **Default Password**: `admin`
- **Configuration**: Set `ADMIN_USER` and `ADMIN_PASSWORD` environment variables.
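For scripting against the admin features, the credentials travel as a standard HTTP Basic `Authorization` header. The sketch below shows both sides under that assumption: building the header on the client, and a hypothetical server-side check that reads `ADMIN_USER`/`ADMIN_PASSWORD` with the documented `admin`/`admin` fallbacks. It illustrates Basic authentication in general and is not the project's actual middleware.

```python
import base64
import binascii
import os
import secrets
from typing import Optional


def expected_credentials() -> tuple[str, str]:
    # Documented defaults (admin/admin) apply unless the environment overrides them
    return os.environ.get("ADMIN_USER", "admin"), os.environ.get("ADMIN_PASSWORD", "admin")


def basic_auth_header(user: str, password: str) -> str:
    """Client side: build the Authorization header value for HTTP Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def is_authorized(header: Optional[str]) -> bool:
    """Hypothetical server side: check a header against ADMIN_USER/ADMIN_PASSWORD."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[len("Basic "):], validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    exp_user, exp_password = expected_credentials()
    # compare_digest avoids revealing via timing how many leading characters matched
    user_ok = secrets.compare_digest(user.encode("utf-8"), exp_user.encode("utf-8"))
    password_ok = secrets.compare_digest(password.encode("utf-8"), exp_password.encode("utf-8"))
    return user_ok and password_ok
```

With the defaults, `basic_auth_header("admin", "admin")` yields `Basic YWRtaW46YWRtaW4=`, which is the value a client such as `curl -u admin:admin` sends.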
## Configuration
### Environment Variables
```bash
# Server
CADDEDUP_VISION_PORT=58001
CADDEDUP_VISION_WORKERS=1
# Search Thresholds
PHASH_THRESHOLD=10
FEATURE_SIMILARITY_MIN=0.85
# Redis
REDIS_URL=redis://localhost:6379/0
# Rate Limiting
RATE_LIMIT_ENABLED=true
RATE_LIMIT_SEARCH=100/minute
# Telemetry (optional)
OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
```
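A minimal sketch of how these variables could be loaded into a typed settings object. The `Settings` class and helper names are hypothetical; the defaults are just the example values above (with `OTEL_ENABLED` assumed off when unset), and the `count/period` parsing only covers the `100/minute` style shown, not every syntax a rate-limiting library might accept.

```python
import os
from dataclasses import dataclass

_PERIOD_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def _env_bool(name: str, default: bool) -> bool:
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def parse_rate_limit(spec: str) -> tuple[int, int]:
    """Parse a limit such as '100/minute' into (requests, window_seconds)."""
    count, _, period = spec.partition("/")
    return int(count), _PERIOD_SECONDS[period.strip().lower()]


@dataclass(frozen=True)
class Settings:
    port: int
    workers: int
    phash_threshold: int           # max hash distance accepted by L1 (assumed meaning)
    feature_similarity_min: float  # min feature similarity accepted by L2 (assumed meaning)
    redis_url: str
    rate_limit_enabled: bool
    rate_limit_search: tuple[int, int]
    otel_enabled: bool

    @classmethod
    def from_env(cls) -> "Settings":
        env = os.environ.get
        return cls(
            port=int(env("CADDEDUP_VISION_PORT", "58001")),
            workers=int(env("CADDEDUP_VISION_WORKERS", "1")),
            phash_threshold=int(env("PHASH_THRESHOLD", "10")),
            feature_similarity_min=float(env("FEATURE_SIMILARITY_MIN", "0.85")),
            redis_url=env("REDIS_URL", "redis://localhost:6379/0"),
            rate_limit_enabled=_env_bool("RATE_LIMIT_ENABLED", True),
            rate_limit_search=parse_rate_limit(env("RATE_LIMIT_SEARCH", "100/minute")),
            otel_enabled=_env_bool("OTEL_ENABLED", False),
        )
```

Parsing everything once at startup means a malformed value (for example `PHASH_THRESHOLD=ten`) fails immediately with a clear error instead of surfacing later inside a search request.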
### Helm Values (Production)
```yaml
# High Availability
replicaCount: 3
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
# Monitoring
metrics:
  serviceMonitor:
    enabled: true
  prometheusRule:
    enabled: true
grafana:
  dashboard:
    enabled: true
# Caching
redis:
  architecture: replication
```
See [Helm Chart README](deploy/helm/caddedup-vision/README.md) for full configuration.
## Operations
For production deployment and ops checklists, see `docs/OPERATIONS_RUNBOOK.md`.
## Delivery Pack
See `reports/DELIVERY_SUMMARY.md` for a concise handoff index.
## User Flow Recap
- English: `docs/USER_FLOW_RECAP.md`
- 中文版: `docs/USER_FLOW_RECAP_ZH.md`
## Project Structure
```
dedupcad-vision/
├── src/caddedup_vision/
│   ├── api/            # FastAPI application
│   ├── core/           # Core algorithms (pHash, features)
│   ├── search/         # Search engine & indexes
│   ├── cache/          # Multi-layer caching
│   ├── telemetry/      # OpenTelemetry integration
│   ├── logging/        # Structured logging
│   └── storage/        # Storage backends (S3, local)
├── tests/              # 287 tests
├── deploy/
│   └── helm/           # Kubernetes Helm chart
├── docs/               # Documentation
└── .github/workflows/  # CI/CD pipelines
```
## Development
### Setup
```bash
# Clone and install
git clone https://github.com/zensgit/dedupcad-vision.git
cd dedupcad-vision
# Create a virtual env (Python >= 3.10; 3.11 recommended)
python3.11 -m venv .venv
source .venv/bin/activate
python -m pip install -e ".[dev,test]"
# Run tests
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=src/caddedup_vision --cov-report=html
```
### Testing
```bash
# All tests
pytest tests/ -v
# Specific module
pytest tests/test_search.py -v
# With markers
pytest tests/ -m "not slow" -v
```
## Monitoring
### Metrics (Prometheus)
- `caddedup_vision_search_requests_total` - Search request count
- `caddedup_vision_search_duration_seconds` - Search latency histogram
- `caddedup_vision_search_layer_hits_total` - Layer hit distribution
- `caddedup_vision_cache_hit_rate` - Cache effectiveness
### Grafana Dashboard
Pre-built dashboard included in Helm chart:
- Request overview (QPS, latency, error rate)
- Progressive search layer analysis
- Redis & cache performance
- Resource utilization
### Alerting
PrometheusRule alerts for:
- High error rates
- Latency degradation
- Circuit breaker trips
- Resource exhaustion
## Roadmap
- [x] Core algorithms (pHash, FAISS)
- [x] Progressive 4-layer search
- [x] FastAPI REST API
- [x] Redis caching
- [x] Rate limiting
- [x] Kubernetes Helm chart
- [x] Prometheus metrics & Grafana dashboard
- [x] OpenTelemetry tracing
- [x] CI/CD pipelines
- [x] ML Platform integration (L3)
- [x] DedupCAD integration (L4)
- [x] Batch processing API
- [x] Web UI
## License
MIT License - see [LICENSE](LICENSE) for details.
## Acknowledgments
- [OpenCV](https://opencv.org/) - Computer vision
- [FAISS](https://github.com/facebookresearch/faiss) - Vector similarity search
- [FastAPI](https://fastapi.tiangolo.com/) - Modern web framework
- [OpenTelemetry](https://opentelemetry.io/) - Observability
---
**Version**: 1.0.0
**Status**: Production Ready