https://github.com/sergiomarquezdev/yt-transcriber
🛠️ CLI tool to transcribe YouTube videos using OpenAI Whisper with CUDA acceleration, generate AI summaries (EN/ES) with Gemini, and create LinkedIn/Twitter content. Supports YouTube, Google Drive, and local files.
ai cli cuda gemini python transcription whisper youtube
- Host: GitHub
- URL: https://github.com/sergiomarquezdev/yt-transcriber
- Owner: sergiomarquezdev
- Created: 2025-12-04T10:38:17.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2026-05-06T08:33:25.000Z (about 2 months ago)
- Last Synced: 2026-05-06T10:38:36.877Z (about 2 months ago)
- Topics: ai, cli, cuda, gemini, python, transcription, whisper, youtube
- Language: Python
- Homepage:
- Size: 316 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Agents: AGENTS.md
# YouTube Video Transcriber & Summarizer
**Transform YouTube videos into searchable text, AI summaries, and social media content.**
## Features
- **CUDA-accelerated transcription** with OpenAI Whisper
- **Multi-language support** with auto-detection
- **AI-powered summaries** (EN + ES) via Claude CLI:
  - Executive summary (2-3 sentences)
  - Key points (5-10 bullets)
  - Smart timestamps (inferred from content)
  - Action items (when applicable)
- **Social Media Post Kits**: Auto-generate LinkedIn posts + Twitter threads
- **Optional timestamped segments**: emit `_segments.json` sidecar with Whisper `start/end/text`
- **Optional visual evidence (V1)**: extract midpoint frames per segment for local files
- **Multiple input sources**: YouTube, Google Drive, local files
- **Performance optimized**:
  - Automatic memory cleanup (no memory leaks)
  - Auto temp file cleanup
## Quick Start
```bash
# Clone the repository
git clone https://github.com/sergiomarquezdev/yt-transcriber.git
cd yt-transcriber
# Create virtual environment
python -m venv venv
source venv/bin/activate       # Linux/macOS
# .\venv\Scripts\activate      # Windows (run this instead)
# Install dependencies
pip install -e .
# Configure environment
cp .env.example .env
# Ensure `claude` CLI is in PATH (requires Claude Max/Pro subscription)
# Transcribe your first video
yt-transcriber transcribe --url "https://www.youtube.com/watch?v=VIDEO_ID"
```
## Prerequisites
- **Python 3.12+**
- **FFmpeg** - Required for audio processing
- **CUDA 12.8** (Optional) - For GPU acceleration
- **Claude CLI** (Optional) - Needed only for AI summaries, translations, and post kits. Install from [Claude Code](https://docs.anthropic.com/en/docs/claude-code) and ensure `claude` is in your PATH with an active subscription.
### FFmpeg Installation
**Windows:**
```powershell
# Download from https://github.com/BtbN/FFmpeg-Builds/releases
# Extract to C:\ffmpeg and add to PATH
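# Append to the user-level Path only ($env:Path also contains the machine-level entries)
$userPath = [Environment]::GetEnvironmentVariable("Path", [EnvironmentVariableTarget]::User)
[Environment]::SetEnvironmentVariable("Path", "$userPath;C:\ffmpeg\bin", [EnvironmentVariableTarget]::User)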
```
**macOS:**
```bash
brew install ffmpeg
```
**Linux:**
```bash
sudo apt install ffmpeg # Ubuntu/Debian
```
## Usage
```bash
# Only transcription (DEFAULT)
yt-transcriber transcribe --url "https://www.youtube.com/watch?v=VIDEO_ID"
# Transcription + AI summaries (EN + ES)
yt-transcriber transcribe --url "URL" --summarize
# Transcription + summaries + Post Kits (LinkedIn + Twitter)
yt-transcriber transcribe --url "URL" --post-kits
# Force Spanish transcription
yt-transcriber transcribe --url "URL" --language es
# Local file
yt-transcriber transcribe --url "path/to/video.mp4" --summarize
# Emit timestamped segments sidecar JSON
yt-transcriber transcribe --url "URL_OR_LOCAL_FILE" --segments
# Extract visual evidence frames (implies --segments, local files only in V1)
yt-transcriber transcribe --url "path/to/video.mp4" --visual-evidence
```
### CLI Options
| Option | Short | Description |
|--------|-------|-------------|
| `--url` | `-u` | YouTube URL, Google Drive URL, or local file path |
| `--language` | `-l` | Language code (`en`, `es`) - auto-detect if omitted |
| `--summarize` | | Generate AI summaries (EN + ES) |
| `--post-kits` | | Generate LinkedIn + Twitter content (implies `--summarize`) |
| `--segments` / `--no-segments` | | Enable/disable the `_segments.json` sidecar. Overrides `TRANSCRIPT_SEGMENTS_ENABLED`; the env value is used when the flag is omitted |
| `--visual-evidence` / `--no-visual-evidence` | | Enable/disable frame extraction per segment (local files only in V1; implies `--segments`) |
| `--ffmpeg-location` | | Custom FFmpeg path |
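The override rule for `--segments` and `--visual-evidence` (CLI flag first, environment variable when the flag is omitted) can be sketched as below. This is an illustrative sketch, not the project's code: `resolve_flag` and `resolve_outputs` are hypothetical names, and the set of strings treated as "true" is an assumption.

```python
import os
from typing import Optional, Tuple

# Assumption: these strings count as "enabled" in the environment.
TRUE_VALUES = {"1", "true", "yes", "on"}


def resolve_flag(cli_value: Optional[bool], env_name: str, default: bool = False) -> bool:
    """Return the CLI value when given, else the env value, else the default."""
    if cli_value is not None:
        return cli_value
    raw = os.environ.get(env_name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUE_VALUES


def resolve_outputs(segments_cli: Optional[bool], visual_cli: Optional[bool]) -> Tuple[bool, bool]:
    """Resolve both options; enabling visual evidence also enables segments."""
    visual = resolve_flag(visual_cli, "VISUAL_EVIDENCE_ENABLED")
    segments = resolve_flag(segments_cli, "TRANSCRIPT_SEGMENTS_ENABLED") or visual
    return segments, visual
```

Passing `None` models an omitted flag, so the tri-state (`True`, `False`, `None`) keeps "not specified" distinct from an explicit `--no-segments`.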
## Output
Files are saved to `output/`:
```
output/
├── transcripts/
│   ├── {title}_vid_{id}.txt              # Raw transcription (always)
│   ├── {title}_vid_{id}_segments.json    # Optional: segments sidecar
│   └── {title}_vid_{id}_frame_{idx}.jpg  # Optional: visual evidence frames
└── summaries/
    ├── {title}_vid_{id}_summary_EN.md    # English summary
    ├── {title}_vid_{id}_summary_ES.md    # Spanish summary
    └── {title}_vid_{id}_post_kits.md     # LinkedIn + Twitter content
```
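A downstream script could consume the `_segments.json` sidecar roughly as follows. The exact schema is not documented here, so this sketch assumes a JSON list of objects with `start`, `end`, and `text` keys (times in seconds, as Whisper reports them) and also tolerates a `{"segments": [...]}` wrapper. The helper names are hypothetical, and the midpoint filter only illustrates how a minimum segment length such as `VISUAL_EVIDENCE_MIN_SEGMENT_SECONDS` might be applied.

```python
import json
from pathlib import Path


def load_segments(path):
    """Load a segments sidecar; assumes a JSON list of {start, end, text} objects."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Assumption: some writers wrap the list as {"segments": [...]}.
    if isinstance(data, dict):
        data = data.get("segments", [])
    return data


def format_timestamp(seconds):
    """Render a time in seconds as HH:MM:SS (fractional part dropped)."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def midpoints(segments, min_seconds=1.0):
    """Midpoint time of every segment that lasts at least min_seconds."""
    return [
        (seg["start"] + seg["end"]) / 2
        for seg in segments
        if seg["end"] - seg["start"] >= min_seconds
    ]


def print_transcript(segments):
    """Print one line per segment, prefixed with its start time."""
    for seg in segments:
        print(f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}")
```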
### AI Summary Contents
- Executive summary
- Key points (5-7 bullets)
- Timestamps (5-8 important moments)
- Action items
### Post Kits Contents
- **LinkedIn post** (800-1200 chars): Professional hook, insights, CTA, hashtags
- **Twitter thread** (8-12 tweets): Numbered tweets, max 280 chars each
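Because the generated text comes from an LLM, it can be worth checking it against the limits above before posting. The sketch below is an assumption-laden helper, not part of the tool: `check_post_kit` is a hypothetical name, and `len()` counts Unicode code points, which only approximates how Twitter weighs characters and URLs.

```python
def check_post_kit(linkedin_post, tweets):
    """Return a list of human-readable problems; an empty list means all limits are met."""
    problems = []
    if not 800 <= len(linkedin_post) <= 1200:
        problems.append(
            f"LinkedIn post has {len(linkedin_post)} chars (expected 800-1200)"
        )
    if not 8 <= len(tweets) <= 12:
        problems.append(f"Thread has {len(tweets)} tweets (expected 8-12)")
    for number, tweet in enumerate(tweets, start=1):
        if len(tweet) > 280:
            problems.append(f"Tweet {number} has {len(tweet)} chars (max 280)")
    return problems
```

An empty result means the kit is within the documented ranges; otherwise each entry names the item to shorten or regenerate.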
## Configuration
Create `.env` from the `.env.example` template and adjust the values:
```bash
# Whisper Model
WHISPER_MODEL_NAME=base # tiny, base, small, medium, large
WHISPER_DEVICE=cuda # cuda or cpu
# Claude CLI (required for summaries, translations, post kits)
# Ensure `claude` is in PATH with active subscription (Max/Pro)
# CLAUDE_CLI_PATH=claude
# CLAUDE_CLI_TIMEOUT=180
# DEFAULT_LLM_MODEL=sonnet
# Directories
TEMP_DOWNLOAD_DIR=temp_files/
OUTPUT_TRANSCRIPTS_DIR=output/transcripts/
SUMMARY_OUTPUT_DIR=output/summaries/
# Optional transcript segments sidecar (default off)
TRANSCRIPT_SEGMENTS_ENABLED=false
# Optional visual evidence extraction (default off, local files only in V1)
VISUAL_EVIDENCE_ENABLED=false
VISUAL_EVIDENCE_MIN_SEGMENT_SECONDS=1.0
# Logging
LOG_LEVEL=INFO
```
### Model Selection
| Model | Speed | Accuracy | VRAM | Use Case |
|-------|-------|----------|------|----------|
| `tiny` | Fast | Low | ~1GB | Quick drafts |
| `base` | Good | Medium | ~1GB | **Default - Balanced** |
| `small` | Medium | Good | ~2GB | Better quality |
| `medium` | Slow | High | ~5GB | High accuracy |
| `large` | Slowest | Best | ~10GB | Best quality |
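The table can be turned into a simple selection rule: pick the largest model whose approximate VRAM need fits the GPU. The sketch below hard-codes the figures from the table; `largest_model_that_fits` and the 1 GB `headroom_gb` default are assumptions for illustration, not behavior of the tool.

```python
# (model name, approximate VRAM in GB), smallest first, taken from the table above.
MODEL_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def largest_model_that_fits(available_gb, headroom_gb=1.0):
    """Return the largest model whose VRAM need fits within available_gb minus headroom."""
    usable = available_gb - headroom_gb
    choice = "tiny"  # fall back to the smallest model when nothing fits
    for name, needed_gb in MODEL_VRAM_GB:
        if needed_gb <= usable:
            choice = name
    return choice


# Example: write the result into .env as WHISPER_MODEL_NAME.
print(largest_model_that_fits(8))
```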
## Programmatic Usage
```python
from yt_transcriber.cli import run_transcribe_command
# Only transcription (default)
transcript, _, _, _ = run_transcribe_command(url="path/to/video.mp4")
# With summaries
transcript, summary_en, summary_es, _ = run_transcribe_command(
    url="https://www.youtube.com/watch?v=VIDEO_ID",
    generate_summary=True,
)
# With post kits (implies summary)
transcript, summary_en, summary_es, post_kits = run_transcribe_command(
    url="https://www.youtube.com/watch?v=VIDEO_ID",
    generate_post_kits=True,
)
```
## Performance & Resource Management
The application includes automatic optimizations for production use:
**Memory Management:**
- Whisper model auto-loads/unloads per video (prevents memory leaks)
- Automatic garbage collection after each transcription
**Resource Cleanup:**
- Temp files auto-deleted after processing (even on errors)
- No manual cleanup needed
**FFmpeg:**
- Timeout protection (5 min) stops hung FFmpeg processes
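The two guarantees above (temp files removed even on errors, and a 5-minute limit on external processes) map onto standard-library patterns. The following is a minimal sketch of those patterns, not the project's actual implementation; `run_with_timeout`, `process_in_temp_dir`, and the directory prefix are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

FFMPEG_TIMEOUT_SECONDS = 300  # 5 minutes, matching the limit listed above


def run_with_timeout(cmd, timeout=FFMPEG_TIMEOUT_SECONDS):
    """Run a command; subprocess.run kills the child if the timeout expires."""
    try:
        return subprocess.run(
            cmd, capture_output=True, text=True, check=True, timeout=timeout
        )
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"command exceeded {timeout}s: {cmd[0]}") from exc


def process_in_temp_dir(work):
    """Call work(tmp_path) in a directory that is deleted even if work raises."""
    with tempfile.TemporaryDirectory(prefix="yt_transcriber_") as tmp:
        return work(Path(tmp))


if __name__ == "__main__":
    # Stand-in for an FFmpeg call so the sketch runs without FFmpeg installed.
    result = run_with_timeout([sys.executable, "-c", "print('ok')"], timeout=30)
    print(result.stdout.strip())
```

Using a context manager for the temp directory means cleanup does not depend on every error path remembering to delete files.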
## Troubleshooting
**FFmpeg not found:**
```bash
# Use direct path
yt-transcriber transcribe --url "URL" --ffmpeg-location "C:\ffmpeg\bin\ffmpeg.exe"
```
**CUDA not available:**
```bash
# Check installation
python -c "import torch; print(torch.cuda.is_available())"
# Fall back to CPU in .env
WHISPER_DEVICE=cpu
```
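The same check can be done in code to fall back to the CPU automatically. This is a sketch under the assumption that the caller passes the configured `WHISPER_DEVICE` value; `pick_device` is a hypothetical helper, and only `torch.cuda.is_available()` is a real PyTorch call.

```python
def pick_device(requested="cuda"):
    """Return "cuda" only when it was requested and is usable; otherwise "cpu"."""
    if requested != "cuda":
        return "cpu"
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device("cuda"))
```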
**Out of memory:**
```bash
# Use smaller model in .env
WHISPER_MODEL_NAME=tiny
```
## License
MIT License - see [LICENSE](LICENSE)
## Acknowledgments
- [OpenAI Whisper](https://github.com/openai/whisper) - AI transcription
- [yt-dlp](https://github.com/yt-dlp/yt-dlp) - Video downloading
- [Claude Code](https://docs.anthropic.com/en/docs/claude-code) - AI summarization & translation
---
Made with care by [Sergio Marquez](https://github.com/sergiomarquezdev)