https://github.com/tazztone/subject-frame-extractor
extracting, analyzing, and filtering frames from video files or YouTube links
ffmpeg opencv pyiqa sam3 ytdlp
- Host: GitHub
- URL: https://github.com/tazztone/subject-frame-extractor
- Owner: tazztone
- License: mit
- Created: 2025-09-18T20:24:57.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2026-05-08T22:35:36.000Z (about 2 months ago)
- Last Synced: 2026-05-08T23:38:08.556Z (about 2 months ago)
- Topics: ffmpeg, opencv, pyiqa, sam3, ytdlp
- Language: Python
- Homepage:
- Size: 14.8 MB
- Stars: 2
- Watchers: 0
- Forks: 1
- Open Issues: 12
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Agents: AGENTS.md
# Subject Frame Extractor
An AI-powered tool for extracting, analyzing, and filtering high-quality frames from video footage. Designed for dataset builders (LoRA / Dreambooth training), content creators, and researchers who need curated image sets from raw video — not just raw frame dumps.
Also includes a **Photo Culling** mode for scoring and rating RAW photo libraries.
## Tech Stack
| Layer | Technology |
|---|---|
| Runtime | Python 3.10+ (3.12 recommended) |
| UI | Gradio 6.x |
| Segmentation | SAM 3 (Segment Anything Model 3, Facebook Research) |
| Object detection | YOLO (80 COCO classes) |
| Face analysis | InsightFace (similarity matching, blink detection, head pose) |
| Quality scoring | NIQE (perceptual), Laplacian variance (sharpness), pHash + LPIPS (dedup) |
| Video / media | FFmpeg, yt-dlp |
| RAW processing | ExifTool (embedded preview extraction — no demosaicing) |
| Data | PyTorch, NumPy, OpenCV, SQLite, Pydantic |
| Dependency management | uv |
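
To make the "Laplacian variance (sharpness)" entry concrete, here is a minimal dependency-free sketch of the metric. It is an illustration, not the code in `core/quality.py`; the function name and the list-of-rows image format are assumptions. With OpenCV the same idea is usually written as `cv2.Laplacian(gray, cv2.CV_64F).var()`, which also handles image borders (this sketch skips them, so values differ slightly).

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    `gray` is a list of rows of grayscale intensities (hypothetical format).
    Sharp images have strong edges, so the Laplacian response varies a lot;
    blurry images give a response close to constant.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3")
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


# A uniform image has no edges; a checkerboard is all edges.
flat = [[128] * 8 for _ in range(8)]
checker = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]
print(laplacian_variance(flat), laplacian_variance(checker))
```

A higher value means more high-frequency detail, which is why a single threshold on this number works as a rough blur filter.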
## Architecture
```
/
├── app.py            # Gradio UI entry point
├── cli.py            # Headless CLI (extract / analyze / full / status / photo)
├── core/
│   ├── config.py     # Full configuration schema (Pydantic)
│   ├── extractor.py  # Extraction strategies (keyframe, interval, scene, Nth)
│   ├── analyzer.py   # AI analysis pipeline (SAM seeding, tracking, metrics)
│   ├── tracker.py    # Subject tracking across scenes
│   ├── face.py       # InsightFace integration
│   ├── quality.py    # NIQE, sharpness, entropy, LPIPS scoring
│   ├── dedup.py      # pHash + LPIPS deduplication
│   ├── photo.py      # Photo culling: RAW ingest, scoring, XMP sidecar export
│   └── database.py   # SQLite session metadata
├── SAM3_repo/        # SAM 3 submodule
└── scripts/
    ├── linux_run_app.sh
    └── setup scripts
```
**Pipeline:** Extract frames → scene segmentation → AI seeding (face ref / text / YOLO) → SAM 3 propagation → quality metrics → interactive filtering → AR-aware crop export.
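
The last pipeline stage, AR-aware crop export, can be sketched as plain geometry: grow the subject's bounding box to the target aspect ratio, then keep the result inside the frame. The function below is a hypothetical illustration under those assumptions, not the project's exporter; `subject_crop` and its parameters are invented names.

```python
def subject_crop(bbox, image_size, aspect_ratio):
    """Return (left, top, right, bottom) of a crop centred on the subject.

    bbox:         (x0, y0, x1, y1) of the subject in pixels
    image_size:   (width, height) of the frame
    aspect_ratio: target width / height, e.g. 1.0 or 9 / 16
    """
    x0, y0, x1, y1 = bbox
    img_w, img_h = image_size
    box_w, box_h = x1 - x0, y1 - y0

    # Grow whichever side is too short for the requested ratio.
    crop_w = max(box_w, box_h * aspect_ratio)
    crop_h = crop_w / aspect_ratio

    # If the crop is larger than the frame, shrink it uniformly.
    # In that case the subject may be partly cut off.
    scale = min(1.0, img_w / crop_w, img_h / crop_h)
    crop_w, crop_h = crop_w * scale, crop_h * scale

    # Centre on the subject, then slide the window back inside the frame.
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    left = min(max(cx - crop_w / 2, 0), img_w - crop_w)
    top = min(max(cy - crop_h / 2, 0), img_h - crop_h)
    return (round(left), round(top), round(left + crop_w), round(top + crop_h))


print(subject_crop((800, 200, 1000, 600), (1920, 1080), 1.0))
```

Sliding the window rather than padding keeps every exported pixel real image content, at the cost of the subject sitting off-centre near frame edges.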
## Key Features
- **Extraction strategies** — keyframes, fixed intervals, scene-based, every Nth frame; YouTube URL support
- **Multi-class tracking** — find and track any of the 80 COCO object classes via YOLO + SAM 3, or describe the subject with open-vocabulary text
- **Face matching** — find every frame of a specific person from a reference photo, using InsightFace
- **Quality filtering** — interactive sliders for sharpness, contrast, NIQE perceptual score
- **Smart deduplication** — pHash + LPIPS removes near-identical frames per scene
- **AR-aware export** — subject-centred crops in 1:1, 9:16, 16:9, or custom ratios
- **Photo culling mode** — RAW preview extraction (CR2, NEF, ARW, DNG, ORF…), AI scoring, and export of star ratings as XMP sidecars for Lightroom / Capture One
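
The deduplication idea above can be illustrated with a simpler hash. The sketch below uses an average hash (aHash) in place of pHash and leaves out the LPIPS check entirely, so it is a stand-in for the technique rather than what `core/dedup.py` does. The function names, the `max_distance` threshold, and the assumption that each frame has already been reduced to a small grayscale grid are all hypothetical.

```python
def average_hash(gray):
    """One bit per pixel: 1 where the pixel is brighter than the image mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (p > mean)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def deduplicate(frames, max_distance=5):
    """Greedy pass: keep a frame only if no kept frame is within max_distance bits."""
    kept, kept_hashes = [], []
    for name, gray in frames:
        h = average_hash(gray)
        if all(hamming(h, other) > max_distance for other in kept_hashes):
            kept.append(name)
            kept_hashes.append(h)
    return kept


# Two horizontal gradients (one slightly brighter) and one vertical gradient.
a = [[x * 32 for x in range(8)] for _ in range(8)]
b = [[x * 32 + 3 for x in range(8)] for _ in range(8)]
c = [[y * 32 for _ in range(8)] for y in range(8)]
print(deduplicate([("a", a), ("b", b), ("c", c)]))
```

Because the hash compares each pixel to the image's own mean, a uniform brightness shift leaves it unchanged, which is the property that makes near-identical frames collide.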
## Quick Start
**Prerequisites:** Python 3.10+, FFmpeg in PATH, CUDA GPU recommended (~8 GB VRAM for SAM 3)
```bash
git clone --recursive https://github.com/tazztone/subject-frame-extractor.git
cd subject-frame-extractor
uv sync
# Launch Gradio UI
uv run python app.py
# → http://127.0.0.1:7860
```
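
Before the first launch it can help to confirm the prerequisites listed above. This is a small standalone sketch using only the standard library; it is not part of the repository, and `preflight` is a hypothetical name. It does not check for a GPU. If PyTorch is installed, `torch.cuda.is_available()` covers that.

```python
import shutil
import sys


def preflight(min_python=(3, 10), tools=("ffmpeg",)):
    """Return a list of problems; an empty list means the basics are in place."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {wanted}+ required, found {found}")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found in PATH")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("OK" if not issues else "\n".join(issues))
```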
## CLI Usage
```bash
# Extract frames
uv run python cli.py extract --video video.mp4 --output ./results --nth-frame 10
# Run AI analysis (with face reference)
uv run python cli.py analyze --session ./results --video video.mp4 --face-ref person.png --resume
# Full pipeline in one command
uv run python cli.py full --video video.mp4 --output ./results --face-ref person.png
# Photo culling workflow
uv run python cli.py photo ingest --folder /path/to/raws --output ./photo_session
uv run python cli.py photo score --session ./photo_session
uv run python cli.py photo export --session ./photo_session # → XMP sidecars
```
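
The `photo export` step writes XMP sidecars. As a rough picture of what such a file contains, the sketch below renders a minimal sidecar carrying only the standard `xmp:Rating` property. The score-to-stars mapping, the function names, and the choice of `IMG_0001.xmp` (rather than `IMG_0001.CR2.xmp`, which some tools use) are assumptions for illustration; the project's actual output may differ.

```python
from pathlib import Path

# Minimal XMP document with a single rating attribute.
XMP_TEMPLATE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:xmp="http://ns.adobe.com/xap/1.0/"
        xmp:Rating="{stars}"/>
  </rdf:RDF>
</x:xmpmeta>
"""


def score_to_stars(score):
    """Hypothetical mapping: clamp a 0..1 score and spread it over 1..5 stars."""
    score = min(max(score, 0.0), 1.0)
    return 1 + min(int(score * 5), 4)


def render_xmp(stars):
    """Return the sidecar text for a 0..5 star rating."""
    if not 0 <= stars <= 5:
        raise ValueError("stars must be between 0 and 5")
    return XMP_TEMPLATE.format(stars=stars)


def sidecar_path(raw_path):
    """IMG_0001.CR2 -> IMG_0001.xmp, next to the RAW file."""
    return Path(raw_path).with_suffix(".xmp")


def write_sidecar(raw_path, score):
    """Write the sidecar for one RAW file and return its path."""
    target = sidecar_path(raw_path)
    target.write_text(render_xmp(score_to_stars(score)), encoding="utf-8")
    return target


print(render_xmp(score_to_stars(0.5)))
```

Writing ratings to a sidecar leaves the RAW file itself untouched, so the export can be rerun or deleted without risk to the originals.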
## Configuration
See `core/config.py` for the full Pydantic schema. Key settings:
| Category | Key Fields | Default |
|---|---|---|
| Paths | `logs_dir`, `models_dir`, `downloads_dir` | `logs`, `models`, `downloads` |
| Models | `face_model_name`, `tracker_model_name` | `buffalo_l`, `sam3` |
| Performance | `analysis_default_workers`, `cache_size` | `4`, `200` |
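
For quick reference, the defaults in the table can be mirrored in a few lines. This sketch deliberately uses a standard-library dataclass so it runs without dependencies; it is not the Pydantic schema from `core/config.py`, which likely has more fields and validation. The class name `ConfigDefaults` is hypothetical.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ConfigDefaults:
    # Paths
    logs_dir: str = "logs"
    models_dir: str = "models"
    downloads_dir: str = "downloads"
    # Models
    face_model_name: str = "buffalo_l"
    tracker_model_name: str = "sam3"
    # Performance
    analysis_default_workers: int = 4
    cache_size: int = 200


defaults = ConfigDefaults()
# Frozen instances are overridden by copying, which keeps the defaults intact.
low_memory = replace(defaults, analysis_default_workers=1, cache_size=50)
print(asdict(low_memory))
```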
See [AGENTS.md](AGENTS.md) for architecture details, critical rules, and development guidelines.
## License
MIT — see [LICENSE](LICENSE).
---
## Technical Debt & Roadmap
This project uses a semi-automated TODO tracking system to prioritize refactors and features.
- **Check Current Debt**: Run `uv run python scripts/generate_todo_report.py` to generate `TODO_REPORT.md`.
- **Top 20 Summary**:
  1. [High] Refactor `core/pipelines.py` to use modular `core/managers`. (In Progress)
  2. [High] Implement thread-safe model access for InsightFace.
  3. [Medium] Add temporal consistency smoothing between frames in `MaskPropagator`.
  4. [Medium] Add adaptive quality thresholds based on propagation distance.
  5. [Low] Support demosaicing for RAW photo ingest (currently uses previews).
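
One common shape for the "thread-safe model access" item is a wrapper that loads the model once and serialises calls behind a lock. The sketch below shows that pattern with a fake model. It is not the project's planned design, and `SharedModel`, `fake_loader`, and the callable-model interface are assumptions made for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedModel:
    """Lazily load a model once and let only one thread call it at a time."""

    def __init__(self, loader):
        self._loader = loader  # zero-argument function returning a callable model
        self._model = None
        self._lock = threading.Lock()

    def run(self, *args, **kwargs):
        with self._lock:
            # Loading happens under the same lock, so it runs exactly once.
            if self._model is None:
                self._model = self._loader()
            return self._model(*args, **kwargs)


load_calls = []


def fake_loader():
    """Stand-in for an expensive model load; returns a trivial 'model'."""
    load_calls.append(1)
    return lambda x: x * 2


shared = SharedModel(fake_loader)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(shared.run, range(8)))
print(results, len(load_calls))
```

A single lock trades throughput for safety: worker threads still overlap on I/O and preprocessing, but inference itself runs one call at a time.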