# CADDedup Vision

**Graphics-based CAD drawing deduplication using computer vision techniques**

[![CI](https://github.com/zensgit/dedupcad-vision/actions/workflows/ci.yml/badge.svg)](https://github.com/zensgit/dedupcad-vision/actions/workflows/ci.yml)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Docker](https://img.shields.io/badge/docker-ready-brightgreen.svg)](https://github.com/users/zensgit/packages/container/package/dedupcad-vision)

## Overview

CADDedup Vision is a high-performance, production-ready system for detecting duplicate CAD drawings using computer vision. It features a **progressive 4-layer search architecture** that balances speed and accuracy.

## Documentation Map

- Documentation index: [docs/DOCUMENTATION_INDEX.md](docs/DOCUMENTATION_INDEX.md)
- Deployment guide: [docs/DEPLOYMENT.md](docs/DEPLOYMENT.md)
- Windows Server deployment: [docs/WINDOWS_SERVER_DEPLOYMENT.md](docs/WINDOWS_SERVER_DEPLOYMENT.md)
- Pre-release checklist: [docs/PRE_RELEASE_CHECKLIST.md](docs/PRE_RELEASE_CHECKLIST.md)
- Operations runbook: [docs/OPERATIONS_RUNBOOK.md](docs/OPERATIONS_RUNBOOK.md)
- API v2 reference: [docs/API_V2_REFERENCE.md](docs/API_V2_REFERENCE.md)
- Technical handoff note: [reports/TECHNICAL_SESSION_NOTES_20260310.md](reports/TECHNICAL_SESSION_NOTES_20260310.md)

### Key Features

- **Progressive Search**: L1 (pHash) → L2 (FAISS) → L3 (ML) → L4 (Geometric)
- **Sub-second Search**: 50-300ms for most queries
- **Scalable**: Handles 100K+ drawings with FAISS indexing
- **Production Ready**: Kubernetes Helm chart, monitoring, caching
- **Extensible**: Plugin architecture for ML Platform and DedupCAD integration

### Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                    Progressive Search Engine                    │
├─────────────────────────────────────────────────────────────────┤
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐         │
│  │ L1: pHash    │ → │ L2: FAISS    │ → │ L3: ML       │ → L4   │
│  │ (~1ms)       │   │ (~10ms)      │   │ (optional)   │         │
│  │ Fast filter  │   │ ANN search   │   │ Deep verify  │         │
│  └──────────────┘   └──────────────┘   └──────────────┘         │
├─────────────────────────────────────────────────────────────────┤
│ Cache Layer (Redis) │ Rate Limiting │ Telemetry (OpenTelemetry) │
└─────────────────────────────────────────────────────────────────┘
```
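
The cascade in the diagram can be sketched as a chain of filters, each receiving the survivors of the previous one. This is a minimal illustration, not the project's engine: `Candidate`, `progressive_search`, and `min_score` are hypothetical names, and real layers would compute their own scores (pHash distance, FAISS similarity, and so on) rather than reuse a single number.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    drawing_id: str
    score: float  # similarity in [0, 1]; higher means more similar


# A layer takes the current candidates and returns the ones it keeps.
Layer = Callable[[list[Candidate]], list[Candidate]]


def progressive_search(candidates: list[Candidate], layers: list[Layer]) -> list[Candidate]:
    """Run layers in order and stop early once nothing survives."""
    for layer in layers:
        candidates = layer(candidates)
        if not candidates:
            break  # later (slower) layers are skipped entirely
    return sorted(candidates, key=lambda c: c.score, reverse=True)


def min_score(threshold: float) -> Layer:
    """Hypothetical stand-in for a real layer: keep scores >= threshold."""
    def layer(cands: list[Candidate]) -> list[Candidate]:
        return [c for c in cands if c.score >= threshold]
    return layer


pool = [Candidate("a.dxf", 0.97), Candidate("b.dxf", 0.88), Candidate("c.dxf", 0.40)]
survivors = progressive_search(pool, [min_score(0.5), min_score(0.9)])
print([c.drawing_id for c in survivors])
```

The point of the ordering is cost: cheap layers run on every candidate, while expensive ones only see what the cheap layers let through.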

## Quick Start

### Docker (Recommended)

```bash
# Pull and run (latest)
docker run -p 8000:8000 ghcr.io/zensgit/dedupcad-vision:latest

# Or pin a release version
docker run -p 8000:8000 ghcr.io/zensgit/dedupcad-vision:1.1.7

# Optional: Docker Hub mirror (if configured for this repo)
docker run -p 8000:8000 :latest

# Or with docker-compose
docker-compose up -d
```

Note: `ghcr.io` container packages may be private. If you see `401 Unauthorized`, either make the
package public (GitHub UI -> Packages -> Settings -> Change visibility) or log in with a GitHub PAT:
`docker login ghcr.io` (token scope: `read:packages`).

Note: The root Dockerfile exposes port 8000. The Python entrypoint defaults to 58001.

### Python Installation

Tested Python versions: 3.10, 3.11, 3.13 (3.11 recommended). Python 3.13
uses NumPy 2.x and faiss-cpu>=1.10.0 via dependency markers.

```bash
# Install from PyPI
pip install caddedup-vision

# Install with all extras
pip install caddedup-vision[all]

# Start the server
caddedup-vision
```

Default port for the Python entrypoint is 58001. Override with `CADDEDUP_VISION_PORT` if needed.
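
For client scripts that should follow the same override, a small helper can read the variable and fall back to the documented default. This is a client-side sketch only; `resolve_port` and `base_url` are hypothetical helpers, and how the server itself parses the variable is not shown here.

```python
import os
from typing import Mapping

DEFAULT_PORT = 58001  # documented default for the Python entrypoint


def resolve_port(env: Mapping[str, str] = os.environ) -> int:
    """Return the port from CADDEDUP_VISION_PORT, or the default if unset."""
    raw = env.get("CADDEDUP_VISION_PORT", "").strip()
    if not raw:
        return DEFAULT_PORT
    port = int(raw)  # raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


def base_url(env: Mapping[str, str] = os.environ, host: str = "localhost") -> str:
    """Build the base URL a client would call (host is an assumption)."""
    return f"http://{host}:{resolve_port(env)}"


print(base_url({}))
```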

### Kubernetes (Helm)

```bash
helm install caddedup-vision ./deploy/helm/caddedup-vision \
  --set redis.auth.password=your-password \
  --set persistence.enabled=true
```

If you deploy from `ghcr.io` and the image is private, create an `imagePullSecret` and set
`imagePullSecrets` in Helm values. See `deploy/helm/caddedup-vision/README.md`.

For detailed deployment instructions, see [Deployment Guide](docs/DEPLOYMENT.md).
For a step-by-step development + verification checklist, see `docs/DEV_AND_VERIFY_ZH.md`.

## API Usage

### Search for Duplicates

```bash
# Upload and search
curl -X POST http://localhost:58001/api/v2/search \
  -F "file=@drawing.pdf" \
  -F "mode=balanced"
```

### End-to-End Smoke Check (Search + Visual Diff)

Use the bundled script to verify the full flow:
upload/index -> search similar drawings -> generate colored visual diff.

```bash
# 1) start server
python3 start_server.py --port 58001

# 2) run smoke test in another terminal
scripts/smoke_search_visual_diff.sh
```

Optional arguments:

```bash
scripts/smoke_search_visual_diff.sh
```

Expected output includes:
- index response (`success=true`)
- search response with at least one candidate (`similar` or `duplicates`)
- visual diff response (`success=true`)
- generated diff image: `/tmp/visual_diff_stored.png`
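
If you want to assert these conditions in your own tooling, the checks can be expressed as below. This is a sketch that assumes each response has already been parsed into a dict; only the `success`, `similar`, and `duplicates` keys come from the list above, and `check_smoke_responses` is a hypothetical helper, not part of the bundled script.

```python
def check_smoke_responses(index_resp: dict, search_resp: dict, diff_resp: dict) -> list[str]:
    """Return a list of problems; an empty list means the smoke check passed."""
    problems = []
    if index_resp.get("success") is not True:
        problems.append("index: success is not true")
    # Either list may be missing or null, so normalise both to lists first.
    candidates = (search_resp.get("duplicates") or []) + (search_resp.get("similar") or [])
    if not candidates:
        problems.append("search: no candidates in 'duplicates' or 'similar'")
    if diff_resp.get("success") is not True:
        problems.append("visual diff: success is not true")
    return problems


# Illustrative payloads, not real server output
ok = check_smoke_responses(
    {"success": True},
    {"duplicates": None, "similar": [{"file_name": "drawing.pdf"}]},
    {"success": True},
)
print(ok)
```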

### Python Client

```python
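import asyncio

import httpx


async def search_duplicates(path: str) -> dict:
    """Upload a drawing and return the parsed search response."""
    async with httpx.AsyncClient() as client:
        with open(path, "rb") as f:
            response = await client.post(
                "http://localhost:58001/api/v2/search",
                files={"file": f},
                data={"mode": "balanced"},
            )
    response.raise_for_status()  # fail loudly on 4xx/5xx
    return response.json()


result = asyncio.run(search_duplicates("drawing.pdf"))

# Either key may be missing or null, so fall back to empty lists
matches = (result.get("duplicates") or []) + (result.get("similar") or [])
for match in matches:
    print(f"Match: {match['file_name']} ({match['similarity']:.1%})")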
```

### Search Modes

| Mode | Layers | Typical Speed | Accuracy | Use Case |
|------|--------|---------------|----------|----------|
| `l1` | L1 (pHash) | ~5ms | Coarse | Ultra fast filtering |
| `fast` | L1 + L2 (FAISS) | ~10-50ms | Good | Quick screening |
| `balanced` | L1 + L2 (+ optional L3) | ~200-500ms | Better | Recommended |
| `precise` | L1 + L2 (+ optional L3/L4) | ~0.5-10s | Best | Final verification |

See [API Documentation](docs/API_USAGE.md) for complete reference.
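
The mode table can also be read as a small lookup structure. The sketch below transcribes the "Layers" column into a dict; `MODE_LAYERS` and `layers_for` are hypothetical names, and how the server actually decides whether an optional layer runs is not specified here.

```python
from collections.abc import Collection

# Transcribed from the Search Modes table: required layers always run,
# optional layers run only if they are available.
MODE_LAYERS = {
    "l1": {"required": ("L1",), "optional": ()},
    "fast": {"required": ("L1", "L2"), "optional": ()},
    "balanced": {"required": ("L1", "L2"), "optional": ("L3",)},
    "precise": {"required": ("L1", "L2"), "optional": ("L3", "L4")},
}


def layers_for(mode: str, available: Collection[str] = ()) -> list[str]:
    """Layers a mode would run, given which optional layers are available."""
    spec = MODE_LAYERS[mode]  # raises KeyError for unknown modes
    optional = [layer for layer in spec["optional"] if layer in available]
    return list(spec["required"]) + optional


print(layers_for("precise", {"L3", "L4"}))
```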

## Web UI

The system includes a built-in Web UI for management and monitoring.

- **URL (Docker)**: `http://localhost:8000`
- **URL (Python entrypoint)**: `http://localhost:58001`
- **Features**:
  - **Search**: Drag & drop file search with visual diff.
  - **License Manager**: Generate and validate licenses (requires auth).
  - **Update Monitor**: Track plugin update status and errors.

### Authentication

Admin features (License generation, Update config) are protected by Basic Authentication.

- **Default User**: `admin`
- **Default Password**: `admin`
- **Configuration**: Set `ADMIN_USER` and `ADMIN_PASSWORD` environment variables.
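
For scripted access, HTTP Basic Authentication means sending a base64-encoded `user:password` pair in the `Authorization` header. The sketch below builds that header with the standard library; `basic_auth_header` is a hypothetical helper, and which endpoints require it is not listed here.

```python
import base64
import os


def basic_auth_header(user: str, password: str) -> dict[str, str]:
    """Build the Authorization header for HTTP Basic Authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Fall back to the documented defaults when the variables are unset
user = os.environ.get("ADMIN_USER", "admin")
password = os.environ.get("ADMIN_PASSWORD", "admin")
headers = basic_auth_header(user, password)
```

With `httpx`, passing `auth=(user, password)` to the client or request achieves the same result without building the header by hand.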

## Configuration

### Environment Variables

```bash
# Server
CADDEDUP_VISION_PORT=58001
CADDEDUP_VISION_WORKERS=1

# Search Thresholds
PHASH_THRESHOLD=10
FEATURE_SIMILARITY_MIN=0.85

# Redis
REDIS_URL=redis://localhost:6379/0

# Rate Limiting
RATE_LIMIT_ENABLED=true
RATE_LIMIT_SEARCH=100/minute

# Telemetry (optional)
OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
```
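
To give a feel for the two search thresholds, here is a sketch under stated assumptions: that `PHASH_THRESHOLD` is a maximum Hamming distance between perceptual hashes and that `FEATURE_SIMILARITY_MIN` is a minimum cosine similarity between feature vectors. Neither interpretation is confirmed above, and all function names are hypothetical.

```python
import math
from collections.abc import Sequence


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Number of differing bits between two integer-encoded hashes."""
    return bin(hash_a ^ hash_b).count("1")


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def passes_phash(hash_a: int, hash_b: int, phash_threshold: int = 10) -> bool:
    """Assumed rule: keep pairs whose hashes differ in at most N bits."""
    return hamming_distance(hash_a, hash_b) <= phash_threshold


def passes_features(u: Sequence[float], v: Sequence[float],
                    feature_similarity_min: float = 0.85) -> bool:
    """Assumed rule: keep pairs whose feature vectors are similar enough."""
    return cosine_similarity(u, v) >= feature_similarity_min


print(passes_phash(0xFF, 0x00), passes_features([1.0, 0.0], [0.0, 1.0]))
```

Under these assumptions, lowering `PHASH_THRESHOLD` or raising `FEATURE_SIMILARITY_MIN` makes matching stricter.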

### Helm Values (Production)

```yaml
# High Availability
replicaCount: 3
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10

# Monitoring
metrics:
  serviceMonitor:
    enabled: true
  prometheusRule:
    enabled: true

grafana:
  dashboard:
    enabled: true

# Caching
redis:
  architecture: replication
```

See [Helm Chart README](deploy/helm/caddedup-vision/README.md) for full configuration.

## Operations

For production deployment and ops checklists, see `docs/OPERATIONS_RUNBOOK.md`.

## Delivery Pack

See `reports/DELIVERY_SUMMARY.md` for a concise handoff index.

## User Flow Recap

- English: `docs/USER_FLOW_RECAP.md`
- Chinese: `docs/USER_FLOW_RECAP_ZH.md`

## Project Structure

```
dedupcad-vision/
├── src/caddedup_vision/
│   ├── api/            # FastAPI application
│   ├── core/           # Core algorithms (pHash, features)
│   ├── search/         # Search engine & indexes
│   ├── cache/          # Multi-layer caching
│   ├── telemetry/      # OpenTelemetry integration
│   ├── logging/        # Structured logging
│   └── storage/        # Storage backends (S3, local)
├── tests/              # 287 tests
├── deploy/
│   └── helm/           # Kubernetes Helm chart
├── docs/               # Documentation
└── .github/workflows/  # CI/CD pipelines
```

## Development

### Setup

```bash
# Clone and install
git clone https://github.com/zensgit/dedupcad-vision.git
cd dedupcad-vision

# Create a virtual env (Python >= 3.10, tested with 3.11)
python3.11 -m venv .venv
source .venv/bin/activate
python -m pip install -e ".[dev,test]"

# Run tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=src/caddedup_vision --cov-report=html
```

### Testing

```bash
# All tests
pytest tests/ -v

# Specific module
pytest tests/test_search.py -v

# With markers
pytest tests/ -m "not slow" -v
```

## Monitoring

### Metrics (Prometheus)

- `caddedup_vision_search_requests_total` - Search request count
- `caddedup_vision_search_duration_seconds` - Search latency histogram
- `caddedup_vision_search_layer_hits_total` - Layer hit distribution
- `caddedup_vision_cache_hit_rate` - Cache effectiveness
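
As a rough illustration of how these metrics can be consumed outside Grafana, the sketch below parses a few lines of Prometheus text exposition format and derives a mean search latency from the histogram's `_sum` and `_count` series. The sample values are invented for the example, the real metrics may carry labels that this simple parser does not handle, and `parse_samples` is a hypothetical helper.

```python
def parse_samples(text: str) -> dict[str, float]:
    """Parse unlabelled `name value` lines, skipping comments and blanks."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


# Illustrative values only, not real measurements
SAMPLE = """\
# TYPE caddedup_vision_search_requests_total counter
caddedup_vision_search_requests_total 200
# TYPE caddedup_vision_search_duration_seconds histogram
caddedup_vision_search_duration_seconds_sum 30.0
caddedup_vision_search_duration_seconds_count 200
caddedup_vision_cache_hit_rate 0.75
"""

samples = parse_samples(SAMPLE)
mean_latency = (
    samples["caddedup_vision_search_duration_seconds_sum"]
    / samples["caddedup_vision_search_duration_seconds_count"]
)
print(f"mean search latency: {mean_latency:.3f}s")
```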

### Grafana Dashboard

Pre-built dashboard included in Helm chart:
- Request overview (QPS, latency, error rate)
- Progressive search layer analysis
- Redis & cache performance
- Resource utilization

### Alerting

PrometheusRule alerts for:
- High error rates
- Latency degradation
- Circuit breaker trips
- Resource exhaustion

## Roadmap

- [x] Core algorithms (pHash, FAISS)
- [x] Progressive 4-layer search
- [x] FastAPI REST API
- [x] Redis caching
- [x] Rate limiting
- [x] Kubernetes Helm chart
- [x] Prometheus metrics & Grafana dashboard
- [x] OpenTelemetry tracing
- [x] CI/CD pipelines
- [x] ML Platform integration (L3)
- [x] DedupCAD integration (L4)
- [x] Batch processing API
- [x] Web UI

## License

MIT License - see [LICENSE](LICENSE) for details.

## Acknowledgments

- [OpenCV](https://opencv.org/) - Computer vision
- [FAISS](https://github.com/facebookresearch/faiss) - Vector similarity search
- [FastAPI](https://fastapi.tiangolo.com/) - Modern web framework
- [OpenTelemetry](https://opentelemetry.io/) - Observability

---

**Version**: 1.0.0
**Status**: Production Ready