https://github.com/jhtwu/youtube-to-srt

YouTube → best-channel (VAD+SNR) → dual-mono → SRT via Faster-Whisper

asr audio dualmono faster-whisper srt whisper youtube yt-dlp

Last synced: 20 days ago


Host: GitHub
URL: https://github.com/jhtwu/youtube-to-srt
Owner: jhtwu
Created: 2025-09-04T01:05:55.000Z (about 1 month ago)
Default Branch: main
Last Pushed: 2025-09-04T01:11:01.000Z (about 1 month ago)
Last Synced: 2025-09-04T03:13:16.259Z (about 1 month ago)
Topics: asr, audio, dualmono, faster-whisper, srt, whisper, youtube, yt-dlp
Language: Python
Size: 8.79 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Agents: AGENTS.md

README

# YouTube Audio → Dual‑Mono → SRT (Simple Pipeline)

## Overview
This repo provides a minimal, reliable pipeline to:
- Download audio from YouTube.
- Auto‑select the better speech channel (L/R) using VAD+SNR.
- Produce a dual‑mono MP3 to avoid mono/earbud phase cancellation.
- Transcribe the audio to SRT using Faster‑Whisper.

For a full Chinese guide, see `README_ZH.md`.

## Requirements
- Python 3.10+
- `ffmpeg`
- Network access (to download models and YouTube media)
- Install deps: `pip install -r requirements.txt`

## One‑Command Pipeline
- Activate your venv, then run:
  - `python scripts/auto_simple_pipeline.py "<YOUTUBE_URL>" -o downloads -m small --device auto`
- Outputs:
  - Audio: `downloads/<name>.dualmono.mp3`
  - Subtitles: `downloads/<name>.dualmono.srt`
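
The subtitle output is a plain-text SubRip (`.srt`) file: numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range and the cue text. As a rough illustration of that format only (this is not the repo's code, and the helper names `srt_timestamp` and `segments_to_srt` are hypothetical), timed segments can be rendered like this:

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render (start_seconds, end_seconds, text) tuples as SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(segments_to_srt([(0.0, 1.25, " Hello "), (1.25, 2.0, "World")]))
```

Faster‑Whisper yields segments with `start`, `end`, and `text` attributes, so a real transcript could be passed in as `(seg.start, seg.end, seg.text)` tuples.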

## Components
- `scripts/auto_simple_pipeline.py`: Orchestrates download → best channel → dual‑mono → SRT.
- `scripts/select_best_channel.py`: Scores L/R via WebRTC VAD + SNR and writes dual‑mono.
- `scripts/transcribe_simple.py`: Full‑file SRT transcription via Faster‑Whisper (no chunking or enhancement).
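
To give a feel for the channel-selection step, here is a minimal sketch of SNR-based scoring. It is an assumption-laden stand-in, not the code in `select_best_channel.py`: the repo uses WebRTC VAD to find speech, whereas this sketch simply treats the loudest frames as speech and the quietest frames as the noise floor. All function names are hypothetical.

```python
import math


def frame_energies(samples, frame_len=480):
    """Mean-square energy per full frame (480 samples = 30 ms at 16 kHz)."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def estimate_snr_db(samples, frame_len=480, quantile=0.2):
    """Crude SNR estimate: loudest 20% of frames vs. quietest 20% of frames.

    Assumption: an energy ranking stands in for a real VAD here.
    """
    energies = sorted(frame_energies(samples, frame_len))
    if not energies:
        raise ValueError("need at least one full frame of audio")
    k = max(1, int(len(energies) * quantile))
    noise = sum(energies[:k]) / k
    speech = sum(energies[-k:]) / k
    # Small epsilon avoids log(0) on digitally silent frames.
    return 10.0 * math.log10((speech + 1e-12) / (noise + 1e-12))


def pick_channel(left, right):
    """Return "L" or "R", whichever channel has the higher estimated SNR."""
    return "L" if estimate_snr_db(left) >= estimate_snr_db(right) else "R"


def to_dual_mono(channel):
    """Duplicate one channel into (left, right) sample pairs."""
    return [(s, s) for s in channel]
```

Given two lists of float samples, `pick_channel(left, right)` names the cleaner side, and `to_dual_mono` on that channel produces the stereo pairs that would then be encoded.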

## Notes
- Models: `tiny/base/small/medium/large-v3`. Default `small` balances speed/accuracy.
- Device: `--device auto|cpu|cuda`; with an NVIDIA GPU, prefer `cuda`.
- Compute type:
  - CPU: `int8` (fast, compact) or `float32` (slower, slightly more accurate).
  - GPU: `float16` is a solid default; `int8_float16` if supported and memory‑constrained.
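- Why dual‑mono? Some videos contain stereo segments where the left and right channels are nearly out of phase; when such audio is downmixed to mono (for example, for mono or single‑earbud playback), the speech can cancel out. Dual‑mono copies one channel to both sides, which prevents this and improves ASR stability.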
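
The cancellation effect is easy to reproduce numerically. The sketch below uses a synthetic 220 Hz tone as an idealized worst case (a fully inverted right channel; real videos are usually only partly out of phase) and compares the mono downmix of the original stereo with the downmix of a dual‑mono version. Names and values are illustrative assumptions.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def downmix_to_mono(left, right):
    """Average left and right, as a typical mono downmix does."""
    return [(l + r) / 2 for l, r in zip(left, right)]


rate = 16000
tone = [math.sin(2 * math.pi * 220 * t / rate) for t in range(rate)]

left = tone
right = [-s for s in tone]  # same signal, polarity inverted

out_of_phase_mix = downmix_to_mono(left, right)  # channels cancel
dual_mono_mix = downmix_to_mono(left, left)      # same channel on both sides

if __name__ == "__main__":
    print("stereo downmix RMS:   ", rms(out_of_phase_mix))
    print("dual-mono downmix RMS:", rms(dual_mono_mix))
```

The inverted pair sums to silence, while the dual‑mono pair keeps the full level of the chosen channel.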
