https://github.com/redredchen01/wm-tool-enhanced
- Host: GitHub
- URL: https://github.com/redredchen01/wm-tool-enhanced
- Owner: redredchen01
- Created: 2026-04-15T10:03:44.000Z (about 2 months ago)
- Default Branch: feat/auto-quality-enhancement
- Last Pushed: 2026-04-15T10:29:55.000Z (about 2 months ago)
- Last Synced: 2026-04-15T12:15:56.801Z (about 2 months ago)
- Language: Python
- Size: 1.46 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Agents: AGENTS.md
README
# wm-tool
Local video floating-text watermark detector and remover — frame-by-frame OCR, temporal tracking, stable mask generation, multi-strategy removal, and optional custom watermark overlay.
## Features
- **EasyOCR / PaddleOCR detection** with confidence filtering, area ratio rejection, and edge-region focus
- **Temporal tracking** — assigns bounding boxes across keyframes using EMA-smoothed centroids; survives OCR misses via configurable gap tolerance
- **Geometry filter** — distinguishes flat 2D overlays from angled physical logos (clothing, products) via skew / aspect / area coefficient-of-variation tests
- **ROI refine pass** — second-pass detection inside predicted track regions at a lower confidence threshold to catch semi-transparent text
- **Stable mask generation** — unions per-track bounding boxes across all frames; supports expand/feather margins, hint regions, and strict-mode (detection-only, no interpolation)
- **Auto-chunking** — splits long videos into ≤4 GB RAM chunks and concatenates the output seamlessly
- **Multiple removal strategies** — see table below
- **Gradio web UI** — full pipeline accessible via browser without writing config files
- **Apple Silicon GPU support** — MPS auto-detected; CUDA supported on Linux
- **Phase 2 Performance Optimizations** — 55-60% speedup via dynamic ROI, optical flow skipping, batch OCR, and adaptive morphology
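The temporal tracking feature above can be pictured with a small sketch. This is not the project's tracker: the class name `EmaTracker`, the smoothing factor `alpha`, and the matching radius `max_dist` are hypothetical, and closed tracks are simply dropped here. Only the ideas of EMA-smoothed centroids and a gap tolerance come from the feature list.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Track:
    track_id: int
    cx: float        # EMA-smoothed centroid x
    cy: float        # EMA-smoothed centroid y
    last_seen: int   # frame index of the most recent match
    hits: int = 1    # number of keyframes matched so far


class EmaTracker:
    """Greedy nearest-centroid tracker with EMA smoothing (illustrative only)."""

    def __init__(self, alpha=0.5, max_dist=50.0, gap_tolerance=60):
        self.alpha = alpha                  # weight of the newest observation
        self.max_dist = max_dist            # max centroid distance for a match
        self.gap_tolerance = gap_tolerance  # frames a track survives without OCR hits
        self.tracks = []
        self._next_id = 0

    def update(self, frame_idx, boxes):
        """boxes: list of (x, y, w, h). Returns the track id assigned to each box."""
        # Close tracks whose OCR gap exceeds the tolerance.
        self.tracks = [t for t in self.tracks
                       if frame_idx - t.last_seen <= self.gap_tolerance]
        assigned, used = [], set()
        for x, y, w, h in boxes:
            cx, cy = x + w / 2, y + h / 2
            candidates = [t for t in self.tracks if t.track_id not in used]
            best = min(candidates,
                       key=lambda t: hypot(t.cx - cx, t.cy - cy),
                       default=None)
            if best is not None and hypot(best.cx - cx, best.cy - cy) <= self.max_dist:
                # Blend the new centroid into the smoothed position.
                best.cx = self.alpha * cx + (1 - self.alpha) * best.cx
                best.cy = self.alpha * cy + (1 - self.alpha) * best.cy
                best.last_seen = frame_idx
                best.hits += 1
            else:
                best = Track(self._next_id, cx, cy, frame_idx)
                self._next_id += 1
                self.tracks.append(best)
            used.add(best.track_id)
            assigned.append(best.track_id)
        return assigned


tracker = EmaTracker()
ids_a = tracker.update(0, [(100, 50, 80, 20)])    # new track
ids_b = tracker.update(40, [(104, 52, 80, 20)])   # 40-frame gap, within tolerance
ids_c = tracker.update(200, [(104, 52, 80, 20)])  # 160-frame gap, old track closed
```

In this sketch the second detection rejoins the first track despite the 40-frame gap, while the third starts a new track because the gap exceeds `gap_tolerance`.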
## Installation
```bash
pip install -r requirements.txt
```
**Choose an OCR engine** (Phase 1 optimizations are complete):
- EasyOCR (default): `pip install easyocr`; stable, multilingual, 100-150 ms/frame
- PaddleOCR (recommended): `pip install "paddleocr>=2.7"`; 2-3x faster, 80% smaller model, 30-50 ms/frame
Select `backend: "easyocr"` or `backend: "paddleocr"` in `configs/demo.yaml`.
Requires `ffmpeg` on `PATH` for accurate duration probing and video concatenation.
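As an illustration of why `ffmpeg` is needed for concatenation, the sketch below builds the inputs for FFmpeg's concat demuxer. Whether wm-tool uses the concat demuxer internally is an assumption; the helper name `build_concat_inputs` and the chunk file names are hypothetical. The command is only constructed, not executed.

```python
import shutil


def build_concat_inputs(chunk_paths, list_path, output_path):
    """Return (list_file_text, ffmpeg_argv) for FFmpeg's concat demuxer."""
    lines = []
    for path in chunk_paths:
        # Inside a single-quoted entry, a literal quote is written as '\''.
        escaped = str(path).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    argv = [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",  # read a list file; allow arbitrary paths
        "-i", str(list_path),
        "-c", "copy",                  # stream copy, no re-encode
        str(output_path),
    ]
    return "\n".join(lines) + "\n", argv


# True only if an ffmpeg binary is discoverable on PATH.
ffmpeg_available = shutil.which("ffmpeg") is not None

list_text, argv = build_concat_inputs(
    ["chunk_000.mp4", "chunk_001.mp4"], "chunks.txt", "out.mp4"
)
```

Writing `list_text` to `chunks.txt` and running `argv` with `subprocess.run` would join the chunks without re-encoding, provided all chunks share the same codec parameters.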
## Quick Start
**CLI**
```bash
# Single file
python -m src.app --input video.mp4 --output out.mp4 --config configs/demo.yaml
# Batch mode
python -m src.app --input-dir ./videos/ --output-dir ./output/ --config configs/demo.yaml
```
**Web UI**
```bash
python webui.py
# or
python webui.py --port 7861 --share
```
## Configuration
All options are set via a YAML file or the web UI. The full schema is defined in `src/config.py`.
```yaml
detection:
  backend: easyocr            # easyocr | paddleocr
  detect_interval: 10         # run OCR every N frames
  confidence_threshold: 0.3
  max_area_ratio: 0.25        # reject boxes covering >25% of frame (subtitles/logos)
  detect_max_side: 960        # downscale longer side before OCR (0 = disabled)
  frame_enhance: false        # CLAHE + unsharp mask; helps with semi-transparent text
  roi_refine: false           # second-pass inside predicted regions
  # Phase 2 Performance Optimizations (experimental, enabled by default)
  enable_dynamic_roi: true    # learn watermark position, auto-narrow search region
  roi_history_window: 15      # frames to accumulate before suggesting ROI
  enable_optical_flow: true   # skip detection on low-motion frames
  motion_threshold: 5.0       # optical flow magnitude threshold (pixels)
tracking:
  gap_tolerance: 60           # survive N-frame OCR gaps before closing a track
  min_track_frames: 2         # drop single-frame noise
  min_track_ratio: 0.0        # fraction of keyframes a track must appear in
  geometry_filter: true       # reject physically angled text
mask:
  expand_px: 20               # grow mask beyond detected bbox
  cover_expand_px: 15         # extra pixels during removal only
  feather_radius: 0           # soft edge (pixels)
  hint_regions:               # force-mask regions regardless of detection
    - [0.0, 0.0, 0.15, 0.08]  # x_pct, y_pct, w_pct, h_pct
  # Phase 2 Performance Optimizations
  enable_morphological_fastpath: true  # adaptive dilation for fast mask generation
remove:
  strategy: gaussian_blur     # default strategy when adaptive is disabled
  blur_ksize: 71
  # Phase 3.8: Adaptive Watermark Removal Strategy
  # Routes removal based on background complexity: simple regions → Gaussian blur (fast),
  # complex regions → LaMa (high quality). Improves output quality by 5-20%.
  enable_adaptive_strategy: true
  adaptive_complexity_threshold: 0.35    # 0.0 (simple) to 1.0 (complex)
  simple_region_strategy: gaussian_blur  # Fast strategies: gaussian_blur, smart_cover, solid
  complex_region_strategy: lama          # Quality strategies: lama, inpaint, temporal_median
  blur_adaptive_kernel_min: 11           # Small kernel for simple regions (preserve detail)
  blur_adaptive_kernel_max: 31           # Large kernel for complex regions (coverage)
  encode_crf: 23              # H.264 quality (18–28 typical)
  encode_preset: fast
debug:
  enabled: false
  output_dir: debug
```
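To make two of the options above concrete, here is a minimal sketch of how `confidence_threshold`, `max_area_ratio`, and `hint_regions` could be applied. The function names are hypothetical, and it assumes the `*_pct` values are fractions of the frame size in the range 0 to 1, as the example values suggest; the project's own handling may differ.

```python
def keep_detection(box, confidence, frame_w, frame_h,
                   confidence_threshold=0.3, max_area_ratio=0.25):
    """Return True if an OCR box passes the confidence and area-ratio filters.

    box is (x, y, w, h) in pixels.
    """
    _, _, w, h = box
    if confidence < confidence_threshold:
        return False
    area_ratio = (w * h) / float(frame_w * frame_h)
    # Very large boxes are more likely subtitles or full-frame graphics.
    return area_ratio <= max_area_ratio


def hint_region_to_pixels(region, frame_w, frame_h):
    """Convert [x_pct, y_pct, w_pct, h_pct] fractions to pixel (x, y, w, h)."""
    x_pct, y_pct, w_pct, h_pct = region
    return (round(x_pct * frame_w), round(y_pct * frame_h),
            round(w_pct * frame_w), round(h_pct * frame_h))


# The hint region from the config above, on a 1920x1080 frame.
hint_px = hint_region_to_pixels([0.0, 0.0, 0.15, 0.08], 1920, 1080)
```

On a 1920x1080 frame the example hint region covers the top-left 288x86 pixels.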
## Performance Optimization (Phase 2)
Phase 2 optimizations are **enabled by default** and can significantly reduce processing time:
| Optimization | Speedup | Condition |
|---|---|---|
| **Dynamic ROI** | 30-40% | Stable watermark position |
| **Optical Flow Skip** | 20-30% | Low-motion scenes |
| **Batch OCR** | 25-30% | Multi-frame detection |
| **Morphological Fast Path** | 15-25% | Mixed mask sizes |
| **Detector Cache** | 2-3s/video | Multi-video batch |
**Expected total improvement**: 55-60% for typical 1080p video (130s → 45-55s)
To disable any optimization:
```yaml
detection:
  enable_dynamic_roi: false
  enable_optical_flow: false
mask:
  enable_morphological_fastpath: false
```
See `docs/reports/OPTIMIZATION_COMPLETE_SUMMARY.md` and `docs/reports/AUTO_OPTIMIZATION_GUIDE.md` for detailed information.
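The dynamic ROI idea (accumulate detections for `roi_history_window` frames, then narrow the search region) can be sketched as follows. The class `DynamicRoi` and its `margin` parameter are hypothetical, and taking the union of recent boxes is an assumption about how the region might be derived, not a description of the project's code.

```python
from collections import deque


class DynamicRoi:
    """Suggest a search region from the most recent detections (illustrative)."""

    def __init__(self, history_window=15, margin=20):
        self.history = deque(maxlen=history_window)  # recent (x, y, w, h) boxes
        self.margin = margin                         # padding around the union, in pixels

    def add(self, box):
        self.history.append(box)

    def suggest(self, frame_w, frame_h):
        """Return (x0, y0, x1, y1), or None until the history window is full."""
        if len(self.history) < self.history.maxlen:
            return None
        x0 = min(x for x, y, w, h in self.history) - self.margin
        y0 = min(y for x, y, w, h in self.history) - self.margin
        x1 = max(x + w for x, y, w, h in self.history) + self.margin
        y1 = max(y + h for x, y, w, h in self.history) + self.margin
        # Clamp to the frame so the ROI can be used directly for cropping.
        return (max(0, x0), max(0, y0), min(frame_w, x1), min(frame_h, y1))


roi = DynamicRoi(history_window=15, margin=20)
for i in range(15):
    roi.add((100 + i, 50, 80, 20))  # watermark drifting slowly to the right
suggested = roi.suggest(1920, 1080)
```

Until 15 boxes have been seen, `suggest` returns `None` and a caller would run OCR on the full frame.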
## Phase 2 Complexity-Driven Removal (Beta)
**Phase 2 Unit 3+** introduces intelligent watermark removal via background complexity detection and adaptive parameter tuning:
### Background Complexity Detection
Analyzes the mask region to determine adaptive inpainting parameters:
- **Histogram Variance**: Measures color/brightness variation in masked region
- **Edge Density**: Computes Sobel gradient magnitude around mask boundary
- **Complexity Score** (0.0-1.0): Combined metric where 0=simple, 1=complex
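A rough, dependency-free sketch of such a score is shown below. It operates on a small grayscale patch given as nested lists, whereas a real implementation would more likely use OpenCV and NumPy and restrict the edge measure to the mask boundary. The normalization constants and the equal weighting of the two terms are assumptions, not the project's formula.

```python
from math import hypot
from statistics import pvariance


def intensity_variance(patch):
    """Variance of 8-bit intensities, normalized to [0, 1]."""
    values = [v for row in patch for v in row]
    # 127.5**2 is the largest possible variance for values in 0..255.
    return min(pvariance(values) / (127.5 ** 2), 1.0)


def edge_density(patch):
    """Mean 3x3 Sobel gradient magnitude over interior pixels, clamped to [0, 1]."""
    h, w = len(patch), len(patch[0])
    total, count = 0.0, 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (patch[y - 1][x + 1] + 2 * patch[y][x + 1] + patch[y + 1][x + 1]
                  - patch[y - 1][x - 1] - 2 * patch[y][x - 1] - patch[y + 1][x - 1])
            gy = (patch[y + 1][x - 1] + 2 * patch[y + 1][x] + patch[y + 1][x + 1]
                  - patch[y - 1][x - 1] - 2 * patch[y - 1][x] - patch[y - 1][x + 1])
            total += hypot(gx, gy)
            count += 1
    if count == 0:
        return 0.0
    # 4 * 255 is the largest single-axis Sobel response for 8-bit input.
    return min(total / count / (4 * 255), 1.0)


def complexity_score(patch, w_var=0.5, w_edge=0.5):
    """Weighted combination: 0.0 = simple background, 1.0 = complex."""
    return w_var * intensity_variance(patch) + w_edge * edge_density(patch)


flat = [[128] * 8 for _ in range(8)]             # uniform background
step = [[0] * 4 + [255] * 4 for _ in range(8)]   # hard vertical edge
flat_score = complexity_score(flat)
step_score = complexity_score(step)
```

A uniform patch scores 0.0, while the patch with a hard edge scores noticeably higher.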
### Adaptive LaMa Inpainting
When `remove.enable_complexity_detection: true`, the mask dilation kernel used for LaMa inpainting adapts per frame to the complexity score, within the configured minimum and maximum:
```yaml
remove:
  strategy: lama                     # High-fidelity learned inpainting
  enable_lama: true
  enable_complexity_detection: true  # Complexity-aware adaptation
  lama_complexity_kernel_min: 3      # Min dilation kernel (simple bkg)
  lama_complexity_kernel_max: 25     # Max dilation kernel (complex bkg)
  enable_lama_batch: true            # Batch frame processing (GPU)
  lama_batch_size: 4                 # Frames per batch
  enable_lama_model_cache: true      # Keep model in VRAM
```
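One plausible way to map a complexity score onto the kernel range is linear interpolation, sketched below. The linear mapping and the rounding to an odd size are assumptions made for illustration; the function name `lama_kernel_size` is hypothetical and the project may use a different curve.

```python
def lama_kernel_size(score, kernel_min=3, kernel_max=25):
    """Interpolate a dilation kernel size from a complexity score in [0, 1]."""
    score = min(max(score, 0.0), 1.0)  # clamp out-of-range scores
    size = int(round(kernel_min + score * (kernel_max - kernel_min)))
    if size % 2 == 0:
        size += 1                      # keep the kernel centered on a pixel
    if size > kernel_max:
        size -= 2                      # stay within the configured maximum
    return size


sizes = [lama_kernel_size(s) for s in (0.0, 0.5, 1.0)]
```

With the defaults above, a simple background gets the 3 px kernel and a fully complex one gets 25 px.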
### Phase 2 Presets
Three preconfigured presets optimize for different motion scenarios:
| Preset | detect_interval | kernel_range | Use Case |
|---|---|---|---|
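| `P2-穩定` (stable) | 15 | 3–15px | Stable background, low motion |
| `P2-快速` (fast) | 5 | 9–25px | Fast-moving watermarks, complex scenes |
| `P2-混合` (hybrid) | 10 | 5–21px | Balanced: variable motion/complexity |
Access the presets via the Web UI buttons under "🔮 Phase 2 複雜度感知預設" (Phase 2 complexity-aware presets), or set the equivalent values in YAML. The example below matches `P2-混合`: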
```yaml
detection:
  detect_interval: 10
  frame_enhance: true
remove:
  strategy: lama
  enable_complexity_detection: true
  lama_complexity_kernel_min: 5
  lama_complexity_kernel_max: 21
upscale:
  enable_upscale: true   # ESRGAN 4x super-resolution
  target_height: 1080
```
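The preset table can also be expressed as data. The sketch below is a hypothetical helper, not part of wm-tool: it copies a config dictionary and overlays the three values the table defines for each preset, assuming the `remove:` section name used in the rest of this README.

```python
import copy

# Values taken from the preset table; the dictionary layout is an assumption.
PRESETS = {
    "P2-穩定": {"detect_interval": 15, "kernel_min": 3, "kernel_max": 15},
    "P2-快速": {"detect_interval": 5, "kernel_min": 9, "kernel_max": 25},
    "P2-混合": {"detect_interval": 10, "kernel_min": 5, "kernel_max": 21},
}


def apply_preset(config, name):
    """Return a copy of config with the named preset's values applied."""
    preset = PRESETS[name]
    out = copy.deepcopy(config)  # leave the caller's config untouched
    out.setdefault("detection", {})["detect_interval"] = preset["detect_interval"]
    remove = out.setdefault("remove", {})
    remove["strategy"] = "lama"
    remove["enable_complexity_detection"] = True
    remove["lama_complexity_kernel_min"] = preset["kernel_min"]
    remove["lama_complexity_kernel_max"] = preset["kernel_max"]
    return out


base = {"detection": {"backend": "easyocr", "detect_interval": 10}}
fast_cfg = apply_preset(base, "P2-快速")
```

The resulting dictionary could then be dumped to YAML and passed via `--config`.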
### Expected Performance
Phase 2 with LaMa + ESRGAN upscaling:
- **Speed**: 3-6s/frame (GPU) → 2-4s/frame with batch processing (35% faster)
- **Quality**: SSIM ≥0.80, ΔE ≤15 (better texture/edge preservation vs. blur)
- **Resolution**: Auto-upscale 720p removal output to 1080p
## Phase 3.8: Adaptive Watermark Removal Strategy
Automatically routes watermark removal based on background complexity, achieving 5-20% quality improvement:
### How It Works
1. **Complexity Detection** per watermark region
   - **Low complexity** (uniform, simple textures) → Gaussian blur (fast, preserves detail)
   - **High complexity** (fine textures, edges) → LaMa (high-quality inpainting)
2. **Adaptive Kernel Sizing** for Gaussian blur
   - Simple regions use **small kernels** (11px) for detail preservation
   - Complex regions use **large kernels** (up to 31px) for coverage
   - Routing compares each region's complexity score against `adaptive_complexity_threshold`, which is intended to keep transitions smooth
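The routing described above can be sketched in a few lines. This is an illustration under assumptions: a score at or above the threshold is treated as complex, the blur kernel is interpolated linearly over the full 0 to 1 score range, and the function names are hypothetical.

```python
def choose_strategy(score, threshold=0.35,
                    simple="gaussian_blur", complex_="lama"):
    """Pick a removal strategy from a complexity score in [0, 1]."""
    return complex_ if score >= threshold else simple


def blur_kernel(score, kernel_min=11, kernel_max=31):
    """Odd Gaussian kernel size that grows with the complexity score."""
    score = min(max(score, 0.0), 1.0)
    size = int(round(kernel_min + score * (kernel_max - kernel_min)))
    if size % 2 == 0:
        size += 1          # Gaussian kernels must have an odd size
    return min(size, kernel_max if kernel_max % 2 else kernel_max - 1)


def plan_region(score, threshold=0.35):
    """Return the strategy and, for blur, the kernel size for one region."""
    strategy = choose_strategy(score, threshold)
    plan = {"strategy": strategy}
    if strategy == "gaussian_blur":
        plan["blur_ksize"] = blur_kernel(score)
    return plan


plans = [plan_region(s) for s in (0.1, 0.35, 0.8)]
```

Here only the region with score 0.1 is blurred; the other two reach the threshold and are handed to LaMa.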
### Configuration
```yaml
remove:
  enable_adaptive_strategy: true         # Enable adaptive routing
  adaptive_complexity_threshold: 0.35    # Complexity boundary (0.0-1.0)
  simple_region_strategy: gaussian_blur  # Fast removal for simple regions
  complex_region_strategy: lama          # Quality removal for complex regions
  blur_adaptive_kernel_min: 11           # Min kernel (simple regions)
  blur_adaptive_kernel_max: 31           # Max kernel (complex regions)
```
### Quality Targets
- **SSIM** ≥ 0.85 (detail preservation)
- **ΔE** ≤ 10 (color consistency)
- **Temporal stability** < 1% flicker
- **Processing** ≤ 2 min per minute of 1080p video
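These targets can be checked mechanically once the metrics are measured. The sketch below is a hypothetical quality gate: it assumes ΔE means the CIE76 color difference (Euclidean distance in CIELAB), which this README does not specify, and it takes SSIM and the flicker fraction as already-computed inputs.

```python
from math import sqrt


def delta_e_cie76(lab1, lab2):
    """CIE76 color difference between two (L*, a*, b*) triples."""
    return sqrt(sum((p - q) ** 2 for p, q in zip(lab1, lab2)))


def meets_targets(ssim, delta_e, flicker_fraction,
                  processing_seconds, video_seconds):
    """Check measured values against the four targets listed above."""
    return (ssim >= 0.85
            and delta_e <= 10
            and flicker_fraction < 0.01
            and processing_seconds <= 2 * video_seconds)


color_diff = delta_e_cie76((50.0, 0.0, 0.0), (50.0, 3.0, 4.0))
passed = meets_targets(ssim=0.90, delta_e=color_diff, flicker_fraction=0.005,
                       processing_seconds=100, video_seconds=60)
```

A run that takes 100 s for a 60 s clip is within the processing budget of two minutes per minute of video.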
### Tuning
- **Threshold too low**: more regions are classified as complex and routed to LaMa; slower, but higher quality
- **Threshold too high**: more regions are classified as simple and blurred; faster, but complex textures may be smeared
- **Default (0.35)**: Balanced for editorial workflows
## Strategies
| Strategy | Description |
|---|---|
| `gaussian_blur` | Repeated Gaussian blur over the masked region (default) |
| `smart_cover` | Blur with optional solid-color tint blended on top |
| `mosaic` | Pixelate the masked region |
| `solid` | Fill with a solid color |
| `delogo` | FFmpeg `delogo` filter — interpolates from surrounding pixels |
| `inpaint` | OpenCV Telea inpainting |
| `temporal_median` | Sample ±N frames and use the per-pixel median as background |
| `lama` | Learned LaMa inpainting (high fidelity; see the Phase 2 and Phase 3.8 sections) |
`temporal_median` produces the cleanest result for stationary watermarks but requires more memory and time.
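The idea behind `temporal_median` can be shown on tiny grayscale frames stored as nested lists. This is a conceptual sketch with a hypothetical function name, not the project's implementation, which presumably works on full color frames with NumPy. Note that the median only recovers the background where a pixel is uncovered in most of the sampled frames.

```python
from statistics import median


def temporal_median_fill(frames, index, mask, radius=2):
    """Replace masked pixels of frames[index] with the median over index ± radius.

    frames: list of 2D lists of intensities; mask: 2D list of booleans.
    """
    lo = max(0, index - radius)
    hi = min(len(frames), index + radius + 1)
    window = frames[lo:hi]
    out = [row[:] for row in frames[index]]  # copy so the input stays unchanged
    for y, mask_row in enumerate(mask):
        for x, masked in enumerate(mask_row):
            if masked:
                out[y][x] = median(frame[y][x] for frame in window)
    return out


# Five 2x2 frames; the top-left pixel is covered (value 200) only in frame 2.
frames = [[[10, 20], [30, 40]] for _ in range(5)]
frames[2][0][0] = 200
mask = [[True, False], [False, False]]
restored = temporal_median_fill(frames, index=2, mask=mask, radius=2)
```

The memory cost mentioned above comes from holding the 2N+1 neighboring frames in memory for every frame that is processed.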
## Debug Output
Enable `debug.enabled: true` to write per-stage artifacts to `debug/`:
- `keyframes/` — sampled frames sent to OCR
- `detections/` — frames with raw OCR bounding boxes drawn
- `tracks/` — frames with tracker assignments and IDs
- `masks/` — binary mask images per frame
- `removed/` — frames after removal, before re-encoding
Useful for diagnosing missed detections or incorrect mask placement.
## Requirements
- Python 3.9+
- OpenCV (`opencv-python`)
- EasyOCR (default) or PaddleOCR
- NumPy, Pydantic v2, PyYAML, tqdm
- Gradio (web UI only)
- `ffmpeg` binary (duration probing, video concatenation)