https://github.com/jhtwu/youtube-to-srt
YouTube → best-channel (VAD+SNR) → dual-mono → SRT via Faster-Whisper
- Host: GitHub
- URL: https://github.com/jhtwu/youtube-to-srt
- Owner: jhtwu
- Created: 2025-09-04T01:05:55.000Z
- Default Branch: main
- Last Pushed: 2025-09-04T01:11:01.000Z
- Last Synced: 2025-09-04T03:13:16.259Z
- Topics: asr, audio, dualmono, faster-whisper, srt, whisper, youtube, yt-dlp
- Language: Python
- Size: 8.79 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Agents: AGENTS.md
README
# YouTube Audio → Dual‑Mono → SRT (Simple Pipeline)
## Overview
This repo provides a minimal, reliable pipeline to:
- Download audio from YouTube.
- Auto‑select the better speech channel (L/R) using VAD+SNR.
- Produce a dual‑mono MP3 to avoid mono/earbud phase cancellation.
- Transcribe the audio to SRT using Faster‑Whisper.

For a full Chinese guide, see `README_ZH.md`.
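The channel-selection step in the overview can be sketched as follows. This is a minimal, stdlib-only illustration, not the repository's `scripts/select_best_channel.py`: WebRTC VAD is swapped for a simple energy-threshold detector (a 30 ms frame counts as speech when it is more than 6 dB above a 20th-percentile noise floor), and the combined score `speech_ratio * SNR_dB` is an assumed weighting. All function names are hypothetical.

```python
import math
from typing import Sequence

FRAME_MS = 30  # frame length; WebRTC VAD itself accepts 10, 20 or 30 ms frames


def frame_powers(samples: Sequence[float], sample_rate: int) -> list[float]:
    """Mean power of each non-overlapping FRAME_MS frame (trailing partial frame dropped)."""
    n = sample_rate * FRAME_MS // 1000
    return [
        sum(x * x for x in samples[i:i + n]) / n
        for i in range(0, len(samples) - n + 1, n)
    ]


def score_channel(samples: Sequence[float], sample_rate: int,
                  threshold_db: float = 6.0) -> tuple[float, float]:
    """Return (speech_ratio, snr_db) for one channel.

    Stand-in VAD (assumption): a frame is speech when its power exceeds
    the noise floor, estimated as the 20th-percentile frame power, by threshold_db.
    """
    powers = frame_powers(samples, sample_rate)
    if not powers:
        return 0.0, 0.0
    eps = 1e-12
    floor = sorted(powers)[len(powers) // 5] + eps
    is_speech = [10 * math.log10((p + eps) / floor) > threshold_db for p in powers]
    speech = [p for p, s in zip(powers, is_speech) if s]
    noise = [p for p, s in zip(powers, is_speech) if not s]
    ratio = len(speech) / len(powers)
    if not speech:
        return ratio, 0.0
    speech_power = sum(speech) / len(speech)
    noise_power = sum(noise) / len(noise) if noise else floor
    return ratio, 10 * math.log10(speech_power / (noise_power + eps))


def pick_best_channel(left: Sequence[float], right: Sequence[float], sample_rate: int) -> str:
    """Return 'L' or 'R': the channel with the higher speech_ratio * SNR score."""
    def score(ch: Sequence[float]) -> float:
        ratio, snr_db = score_channel(ch, sample_rate)
        return ratio * max(snr_db, 0.0)
    return "L" if score(left) >= score(right) else "R"
```

With the real `webrtcvad` package, the per-frame speech decision would instead come from `webrtcvad.Vad(mode).is_speech(frame_bytes, sample_rate)` on 16-bit mono PCM frames of 10, 20, or 30 ms.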
## Requirements
- Python 3.10+
- `ffmpeg`
- Network access (download models and YouTube media)
- Install deps: `pip install -r requirements.txt`

## One‑Command Pipeline
- Activate your venv, then run:
  - `python scripts/auto_simple_pipeline.py "" -o downloads -m small --device auto`
- Outputs:
  - Audio: `downloads/.dualmono.mp3`
  - Subtitles: `downloads/.dualmono.srt`

## Components
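As a rough illustration of the download and dual‑mono stages that the pipeline chains together, the two external commands could be built as below. This is a sketch under assumptions: the helper names `ytdlp_audio_cmd` and `dual_mono_cmd`, the WAV intermediate, the `VIDEO_ID` placeholders, and the MP3 encoder settings are hypothetical and may differ from what `scripts/auto_simple_pipeline.py` actually does. Only the `yt-dlp` options (`-x`, `--audio-format`, `-o`) and FFmpeg's `pan` filter syntax are taken from those tools.

```python
import shlex
from pathlib import Path


def ytdlp_audio_cmd(url: str, out_dir: str) -> list[str]:
    """Build a yt-dlp command that extracts the audio track as WAV into out_dir."""
    # %(id)s and %(ext)s are yt-dlp output-template fields (video ID, file extension).
    template = str(Path(out_dir) / "%(id)s.%(ext)s")
    return ["yt-dlp", "-x", "--audio-format", "wav", "-o", template, url]


def dual_mono_cmd(src: str, dst: str, channel: str) -> list[str]:
    """Build an ffmpeg command that copies one input channel ('L' or 'R') to both outputs."""
    src_ch = {"L": "c0", "R": "c1"}[channel]
    # pan filter: stereo output whose c0 and c1 both take the chosen input channel.
    pan = f"pan=stereo|c0={src_ch}|c1={src_ch}"
    return ["ffmpeg", "-y", "-i", src, "-af", pan, "-c:a", "libmp3lame", "-q:a", "2", dst]


if __name__ == "__main__":
    # Placeholder URL and file names; each command could be run with subprocess.run(cmd, check=True).
    print(shlex.join(ytdlp_audio_cmd("https://www.youtube.com/watch?v=VIDEO_ID", "downloads")))
    print(shlex.join(dual_mono_cmd("downloads/VIDEO_ID.wav", "downloads/VIDEO_ID.dualmono.mp3", "L")))
```

Returning argument lists rather than shell strings avoids quoting problems with the `|` characters in the filter and with URLs.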
- `scripts/auto_simple_pipeline.py`: Orchestrates download → best channel → dual‑mono → SRT.
- `scripts/select_best_channel.py`: Scores L/R via WebRTC VAD + SNR and writes dual‑mono.
- `scripts/transcribe_simple.py`: Full‑file SRT transcription via Faster‑Whisper (no chunking or enhancement).

## Notes
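A minimal sketch of full‑file transcription to SRT, in the spirit of `scripts/transcribe_simple.py`, with the model, device, and compute‑type options listed below passed straight to Faster‑Whisper. It assumes the `faster-whisper` package's `WhisperModel` class and its `transcribe()` method, which returns a lazy iterator of segments with `start`, `end`, and `text` attributes; the helper names `srt_timestamp`, `to_srt`, and `transcribe_to_srt` are hypothetical and not the repository's code.

```python
from typing import Iterable


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered SRT cues separated by blank lines."""
    cues = []
    for i, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n")
    return "\n".join(cues)


def transcribe_to_srt(audio_path: str, model_size: str = "small",
                      device: str = "auto", compute_type: str = "default") -> str:
    """Transcribe the whole file with Faster-Whisper (no chunking) and return SRT text."""
    # Imported lazily so the formatting helpers work without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device=device, compute_type=compute_type)
    segments, _info = model.transcribe(audio_path)
    return to_srt((s.start, s.end, s.text) for s in segments)
```

For example, `transcribe_to_srt("example.dualmono.mp3", "small", device="cuda", compute_type="float16")` (hypothetical file name) would follow the GPU recommendation below.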
- Models: `tiny/base/small/medium/large-v3`. The default, `small`, balances speed and accuracy.
- Device: `--device auto|cpu|cuda`; with an NVIDIA GPU, prefer `cuda`.
- Compute type:
  - CPU: `int8` (fast, compact) or `float32` (slower, slightly more accurate).
  - GPU: `float16` is a solid default; use `int8_float16` if it is supported and GPU memory is tight.
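- Why dual‑mono? Some videos contain near‑out‑of‑phase stereo segments, where the left and right channels largely cancel when summed. Mono downmixing (including some one‑ear/earbud playback modes) can therefore lose the speech. Dual‑mono copies the selected channel to both sides, so any downmix keeps that channel intact; since ASR front ends typically downmix stereo to mono as well, this also improves ASR stability.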
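To see why a downmix can lose speech, consider the extreme case of a fully out‑of‑phase segment, where R = −L. This small stdlib‑only demonstration (function names are hypothetical, and a plain average stands in for whatever downmix a player or ASR front end actually applies) shows the mono sum vanishing for the original stereo pair but not for a dual‑mono copy of one channel.

```python
import math


def mono_downmix(left, right):
    """Average L and R sample by sample, as a simple mono downmix would."""
    return [(l + r) / 2 for l, r in zip(left, right)]


def dual_mono(channel):
    """Copy one channel to both sides so that L == R."""
    return list(channel), list(channel)


def rms(x):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0


sr = 16000
speech = [0.5 * math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]  # 1 s stand-in "speech"
left, right = speech, [-s for s in speech]  # fully out of phase

print(round(rms(mono_downmix(left, right)), 6))  # 0.0: the speech cancels
l2, r2 = dual_mono(left)
print(round(rms(mono_downmix(l2, r2)), 3))       # 0.354: the speech survives
```

Real near‑out‑of‑phase material cancels only partially, but the same mechanism explains the quiet or missing speech.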