https://github.com/skurbee/ytarchiver
Download + Compress + Transcribe + Organize + Browse
https://github.com/skurbee/ytarchiver
4kdownloader batch-processing cuda faster-whisper ffmpeg matplotlib transcription vlc-media-player whisper whisper-ai youtube yt-dlp yt-dlp-gui yt-dlp-wrapper
Last synced: 15 days ago
JSON representation
Download + Compress + Transcribe + Organize + Browse
- Host: GitHub
- URL: https://github.com/skurbee/ytarchiver
- Owner: skurbee
- License: mit
- Created: 2026-03-07T18:24:41.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-06-03T21:29:36.000Z (18 days ago)
- Last Synced: 2026-06-03T23:06:43.529Z (18 days ago)
- Topics: 4kdownloader, batch-processing, cuda, faster-whisper, ffmpeg, matplotlib, transcription, vlc-media-player, whisper, whisper-ai, youtube, yt-dlp, yt-dlp-gui, yt-dlp-wrapper
- Language: Python
- Homepage:
- Size: 101 MB
- Stars: 12
- Watchers: 0
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Contributing: docs/CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
README
# YTArchiver
Download, organize, transcribe, search, compress, and browse entire channels.
A tour of the UI is on the [**wiki**](https://github.com/skurbee/YTArchiver/wiki).
## Features
### Downloading & Syncing
* **Channel subscriptions** — add channels to a sub list with per-channel settings
* **Selective archiving** — set a start date, grab the full history, or only pull new uploads
* **Auto-sync** — set a recurring interval (1hr, 3hr, 6hr, etc.) to automatically check all subbed channels for new videos
* **Duration filters** — exclude Shorts, livestreams, or long-form videos by setting min/max duration limits
* **Resolution control** — per-channel resolution settings (144p–1080p, or "best")
* **Redownload** — change a channel's resolution and retroactively redownload all videos at the new setting
* **Resumable downloads** — ID caching, so interrupted syncs pick back up without re-scanning
* **Cookie support** — uses Firefox cookies for age-restricted or member content, and helps avoid IP rate-limiting
### Organization
* **Folder sorting** — sort videos into `\YYYY\` or `\YYYY\MM\` folders, configurable per channel
* **Reorganize tool** — re-sort existing downloads into a new org structure at any time
* **Date Fix** — retroactively set file dates to the original YouTube upload date (fuzzy title matching, useful for migrating from other tools)
### Transcription
* **Auto-captions first** — pulls YouTube's built-in captions when available, with punctuation model cleanup
* **Whisper GPU fallback** — runs Whisper locally on GPU for videos without captions; model selectable per channel
* **Auto-transcribe** — per-channel toggle to automatically transcribe new videos after each sync
* **Transcript output** — clean `.txt` files with an option to follow the channel's folder org or combine into a single file per channel
* **Hidden JSONL sidecars** — per-word timestamps and video IDs stored alongside readable transcripts
### Browse Tab
* **Searchable transcript database** — full-text search across all transcribed channels
* **Embedded video player** — HTML5 video with a synced, scrolling transcript alongside
* **Click-to-seek** — click any word in the transcript to jump the video to that moment
* **Word frequency analysis** — frequency graphs (Year / Month / Week buckets) and word cloud visualizations
* **Recent downloads** — list or thumbnail-grid view of recently-archived videos, filterable
### Compression
* **AV1 NVENC encoding** — compress archived videos using AV1 hardware encoding (NVIDIA GPU)
* **Quality presets** — Generous, Average, and Below Average quality tiers with target bitrate-per-hour calculations
* **HQ downscale** — download at a higher resolution, then downscale for better quality at lower resolutions
### UI & Workflow
* **Four-tab layout** — Download, Subs, Browse, Settings
* **Settings categories** — General / Performance / Appearance / Tools / Index sub-tabs
* **Simple / Verbose log modes** — toggle between a readable sync view and full yt-dlp output
* **Pause / resume** — pause active downloads or transcriptions mid-session and resume without losing progress
* **Processing task queue** — transcription and compression jobs run in their own reorderable queue, separate from downloads
* **System tray** — sits in the tray with separate indicators for downloads and Processing tasks; auto-sync controllable from the tray menu
* **Internet monitoring** — automatically pauses on connection loss, resumes when connectivity is restored
* **Drive monitoring** — automatically pauses on drive failure, resumes when drive is restored
* **Auto-update** — checks for new releases on GitHub at startup
## Tech Stack
Built on **yt-dlp** + **ffmpeg** for downloading, **OpenAI Whisper** (faster-whisper) for GPU transcription, **SQLite FTS5** for the transcript index, **Chart.js** for graphing, and a **pywebview** shell rendering an HTML/CSS/JS frontend.
## Documentation
* [Architecture](docs/ARCHITECTURE.md) — process model, sync pipeline, design decisions
* [Building the exe](docs/BUILD.md) — PyInstaller workflow
* [Contributing](docs/CONTRIBUTING.md) — project layout and how to find your bearings
* [Changelog](docs/CHANGELOG.md) — release notes
* [Project map](docs/PROJECT_MAP.md) — file-by-file index