https://github.com/sergiomarquezdev/yt-transcriber

🛠️ CLI tool to transcribe YouTube videos using OpenAI Whisper with CUDA acceleration, generate AI summaries (EN/ES) with Gemini, and create LinkedIn/Twitter content. Supports YouTube, Google Drive, and local files.

# YouTube Video Transcriber & Summarizer

[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
[![PyTorch](https://img.shields.io/badge/PyTorch-CUDA%2012.8-red.svg)](https://pytorch.org/)

**Transform YouTube videos into searchable text, AI summaries, and social media content.**

## Features

- **CUDA-accelerated transcription** with OpenAI Whisper
- **Multi-language support** with auto-detection
- **AI-powered summaries** (EN + ES) via Claude CLI:
  - Executive summary (2-3 sentences)
  - Key points (5-10 bullets)
  - Smart timestamps (inferred from content)
  - Action items (when applicable)
- **Social Media Post Kits**: Auto-generate LinkedIn posts + Twitter threads
- **Optional timestamped segments**: emit a `_segments.json` sidecar with Whisper's `start`/`end`/`text` fields
- **Optional visual evidence (V1)**: extract the midpoint frame of each segment (local files only)
- **Multiple input sources**: YouTube, Google Drive, local files
- **Performance optimized**:
  - Automatic memory cleanup (Whisper model unloaded after each video)
  - Automatic temp file cleanup

## Quick Start

```bash
# Clone the repository
git clone https://github.com/sergiomarquezdev/yt-transcriber.git
cd yt-transcriber

# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/macOS
.\venv\Scripts\activate # Windows

# Install dependencies
pip install -e .

# Configure environment
cp .env.example .env
# Ensure `claude` CLI is in PATH (requires Claude Max/Pro subscription)

# Transcribe your first video
yt-transcriber transcribe --url "https://www.youtube.com/watch?v=VIDEO_ID"
```

## Prerequisites

- **Python 3.12+**
- **FFmpeg** - Required for audio processing
- **CUDA 12.8** (Optional) - For GPU acceleration
- **Claude CLI** (Optional) - Required for AI summaries, translations, and post kits. Install from [Claude Code](https://docs.anthropic.com/en/docs/claude-code) and ensure `claude` is in your PATH with an active subscription.

### FFmpeg Installation

**Windows:**
```powershell
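# Download a build from https://github.com/BtbN/FFmpeg-Builds/releases
# Extract it to C:\ffmpeg, then append C:\ffmpeg\bin to the user PATH
$userPath = [Environment]::GetEnvironmentVariable("Path", [EnvironmentVariableTarget]::User)
[Environment]::SetEnvironmentVariable("Path", "$userPath;C:\ffmpeg\bin", [EnvironmentVariableTarget]::User)
# Open a new terminal so the updated PATH is picked up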
```

**macOS:**
```bash
brew install ffmpeg
```

**Linux:**
```bash
sudo apt install ffmpeg # Ubuntu/Debian
```
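
Before a first run, a quick pre-flight check can confirm the prerequisites above are in place. The sketch below is not part of yt-transcriber: `check_prerequisites` is a hypothetical helper that only tests the Python version and whether `ffmpeg` and `claude` are on `PATH`, using the standard-library `shutil.which`. Run it with the same interpreter you installed the project into.

```python
import shutil
import sys


def check_prerequisites(tools=("ffmpeg", "claude")):
    """Return {check_name: bool} for the Python version and required executables.

    Hypothetical helper: it only checks presence on PATH, not versions,
    CUDA availability, or an active Claude subscription.
    """
    report = {"python>=3.12": sys.version_info >= (3, 12)}
    for tool in tools:
        # shutil.which returns the executable's full path, or None if not on PATH
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        status = "ok" if ok else "MISSING"
        print(f"{status:8} {name}")
```

CUDA support is checked separately (see Troubleshooting), since it is optional.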

## Usage

```bash
# Transcription only (default)
yt-transcriber transcribe --url "https://www.youtube.com/watch?v=VIDEO_ID"

# Transcription + AI summaries (EN + ES)
yt-transcriber transcribe --url "URL" --summarize

# Transcription + summaries + Post Kits (LinkedIn + Twitter)
yt-transcriber transcribe --url "URL" --post-kits

# Force Spanish transcription
yt-transcriber transcribe --url "URL" --language es

# Local file
yt-transcriber transcribe --url "path/to/video.mp4" --summarize

# Emit timestamped segments sidecar JSON
yt-transcriber transcribe --url "URL_OR_LOCAL_FILE" --segments

# Extract visual evidence frames (implies --segments, local files only in V1)
yt-transcriber transcribe --url "path/to/video.mp4" --visual-evidence
```
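
`--visual-evidence` extracts a frame at the midpoint of each segment. The project's own implementation is not shown in this README, so the following is a minimal sketch of the idea under stated assumptions: segments are Whisper-style dicts with `start`/`end` in seconds, segments shorter than `VISUAL_EVIDENCE_MIN_SEGMENT_SECONDS` are skipped, and frames are named `{stem}_frame_{idx}.jpg` after the segment index. `midpoint_frame_commands` is a hypothetical helper that only builds the FFmpeg commands (`-ss` seek, `-frames:v 1`) without running them.

```python
from pathlib import Path


def midpoint_frame_commands(video, segments, out_stem, min_seconds=1.0, ffmpeg="ffmpeg"):
    """Build one FFmpeg command per segment that grabs the frame at its midpoint.

    Hypothetical helper. Assumes Whisper-style segments ({"start": s, "end": s, ...}),
    skips segments shorter than `min_seconds`, and numbers frames by segment index.
    """
    commands = []
    for idx, seg in enumerate(segments):
        start, end = float(seg["start"]), float(seg["end"])
        if end - start < min_seconds:
            continue  # too short to be worth a frame
        midpoint = (start + end) / 2
        out_path = Path(f"{out_stem}_frame_{idx}.jpg")
        commands.append([
            ffmpeg,
            "-y",                      # overwrite an existing frame file
            "-ss", f"{midpoint:.3f}",  # placed before -i, so FFmpeg seeks the input
            "-i", str(video),
            "-frames:v", "1",          # write exactly one video frame
            "-q:v", "2",               # high JPEG quality (lower is better)
            str(out_path),
        ])
    return commands
```

Each returned command can then be executed with `subprocess.run(cmd, check=True)`. Building the argument lists separately from running them keeps the selection logic easy to test without FFmpeg installed.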

### CLI Options

| Option | Short | Description |
|--------|-------|-------------|
| `--url` | `-u` | YouTube URL, Google Drive URL, or local file path |
| `--language` | `-l` | Language code (`en`, `es`) - auto-detect if omitted |
| `--summarize` | | Generate AI summaries (EN + ES) |
| `--post-kits` | | Generate LinkedIn + Twitter content (implies `--summarize`) |
| `--segments` / `--no-segments` | | Enable/disable the `_segments.json` sidecar; when omitted, falls back to `TRANSCRIPT_SEGMENTS_ENABLED` |
| `--visual-evidence` / `--no-visual-evidence` | | Enable/disable per-segment frame extraction (local files only in V1; implies `--segments`) |
| `--ffmpeg-location` | | Custom FFmpeg path |

## Output

By default, files are saved under `output/` (configurable via `OUTPUT_TRANSCRIPTS_DIR` and `SUMMARY_OUTPUT_DIR`):

```
output/
├── transcripts/
│   ├── {title}_vid_{id}.txt                 # Raw transcription (always)
│   ├── {title}_vid_{id}_segments.json       # Optional: segments sidecar
│   └── {title}_vid_{id}_frame_{idx}.jpg     # Optional: visual evidence frames
└── summaries/
    ├── {title}_vid_{id}_summary_EN.md       # English summary
    ├── {title}_vid_{id}_summary_ES.md       # Spanish summary
    └── {title}_vid_{id}_post_kits.md        # LinkedIn + Twitter content
```
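
The `_segments.json` sidecar makes it easy to build a timestamped outline. Its exact schema is not documented here beyond Whisper's `start`/`end`/`text` fields, so this sketch assumes either a top-level list of segment objects or an object with a `segments` key; `load_segments`, `format_timestamp`, and `to_outline` are hypothetical helpers.

```python
import json
from pathlib import Path


def format_timestamp(seconds):
    """Render seconds as M:SS, or H:MM:SS once past the hour."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}" if hours else f"{minutes}:{secs:02d}"


def load_segments(path):
    """Return [(start, end, text), ...] from a *_segments.json sidecar.

    Assumed schema: a list of {"start", "end", "text"} objects, or an object
    holding that list under "segments". Check a real file and adjust.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    segments = data["segments"] if isinstance(data, dict) else data
    return [(float(s["start"]), float(s["end"]), s["text"].strip()) for s in segments]


def to_outline(path):
    """One '[M:SS] text' line per segment, e.g. for chapter markers or notes."""
    return "\n".join(
        f"[{format_timestamp(start)}] {text}" for start, _end, text in load_segments(path)
    )
```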

### AI Summary Contents

- Executive summary
- Key points (5-7 bullets)
- Timestamps (5-8 important moments)
- Action items

### Post Kits Contents

- **LinkedIn post** (800-1200 chars): Professional hook, insights, CTA, hashtags
- **Twitter thread** (8-12 tweets): Numbered tweets, max 280 chars each
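
If you edit the generated posts, a small check against the limits above helps keep them publishable. `check_post_kit` is a hypothetical helper that takes the LinkedIn text and an already-split list of tweets; how `_post_kits.md` separates individual tweets is not specified here, so splitting is left to you. Note that `len()` only approximates X/Twitter's weighted character count, in which URLs and some characters count differently.

```python
def check_post_kit(linkedin, tweets, linkedin_chars=(800, 1200), tweet_count=(8, 12), max_tweet_chars=280):
    """Return a list of problems; an empty list means the kit fits the documented limits.

    Hypothetical helper. len() counts code points, which only approximates
    X/Twitter's weighted length.
    """
    problems = []
    lo, hi = linkedin_chars
    if not lo <= len(linkedin) <= hi:
        problems.append(f"LinkedIn post has {len(linkedin)} chars (expected {lo}-{hi})")
    lo, hi = tweet_count
    if not lo <= len(tweets) <= hi:
        problems.append(f"Thread has {len(tweets)} tweets (expected {lo}-{hi})")
    for number, tweet in enumerate(tweets, start=1):
        if len(tweet) > max_tweet_chars:
            problems.append(f"Tweet {number} has {len(tweet)} chars (max {max_tweet_chars})")
    return problems
```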

## Configuration

Create `.env` from the template (`cp .env.example .env`) and adjust as needed:

```bash
# Whisper Model
WHISPER_MODEL_NAME=base # tiny, base, small, medium, large
WHISPER_DEVICE=cuda # cuda or cpu

# Claude CLI (required for summaries, translations, post kits)
# Ensure `claude` is in PATH with active subscription (Max/Pro)
# CLAUDE_CLI_PATH=claude
# CLAUDE_CLI_TIMEOUT=180
# DEFAULT_LLM_MODEL=sonnet

# Directories
TEMP_DOWNLOAD_DIR=temp_files/
OUTPUT_TRANSCRIPTS_DIR=output/transcripts/
SUMMARY_OUTPUT_DIR=output/summaries/

# Optional transcript segments sidecar (default off)
TRANSCRIPT_SEGMENTS_ENABLED=false

# Optional visual evidence extraction (default off, local files only in V1)
VISUAL_EVIDENCE_ENABLED=false
VISUAL_EVIDENCE_MIN_SEGMENT_SECONDS=1.0

# Logging
LOG_LEVEL=INFO
```
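
For your own scripts, here is a sketch of how these variables could be read, using the values from the example above as defaults. It is an assumption, not the project's actual config loader: `load_settings`, `env_flag`, and `effective_output_flags` are hypothetical, and the accepted boolean spellings (`1`/`true`/`yes`/`on`) are chosen for illustration. It encodes the precedence described in the CLI options table (a `--x`/`--no-x` flag overrides the environment value, and visual evidence implies segments), assuming the environment fallback works the same way for visual evidence as for segments.

```python
import os

_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(env, name, default=False):
    """Parse a boolean env var; unset or empty falls back to `default`."""
    raw = env.get(name, "").strip().lower()
    return default if raw == "" else raw in _TRUTHY


def load_settings(env=None):
    """Read the variables above; defaults mirror the .env example (assumed)."""
    env = os.environ if env is None else env
    return {
        "whisper_model": env.get("WHISPER_MODEL_NAME", "base"),
        "whisper_device": env.get("WHISPER_DEVICE", "cuda"),
        "temp_dir": env.get("TEMP_DOWNLOAD_DIR", "temp_files/"),
        "segments": env_flag(env, "TRANSCRIPT_SEGMENTS_ENABLED"),
        "visual_evidence": env_flag(env, "VISUAL_EVIDENCE_ENABLED"),
        "visual_min_segment_seconds": float(env.get("VISUAL_EVIDENCE_MIN_SEGMENT_SECONDS", "1.0")),
    }


def effective_output_flags(settings, cli_segments=None, cli_visual=None):
    """Return (segments, visual). CLI values win when given; None means 'not passed'.

    Visual evidence implies segments, as documented for --visual-evidence.
    """
    visual = settings["visual_evidence"] if cli_visual is None else cli_visual
    segments = settings["segments"] if cli_segments is None else cli_segments
    return segments or visual, visual
```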

### Model Selection

| Model | Speed | Accuracy | VRAM | Use Case |
|-------|-------|----------|------|----------|
| `tiny` | Fastest | Low | ~1GB | Quick drafts |
| `base` | Fast | Medium | ~1GB | **Default - Balanced** |
| `small` | Medium | Good | ~2GB | Better quality |
| `medium` | Slow | High | ~5GB | High accuracy |
| `large` | Slowest | Best | ~10GB | Best quality |
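
To choose a model automatically, you can compare free GPU memory against the approximate VRAM figures in the table. `pick_model` and `free_vram_gb` are hypothetical helpers; the headroom value is an arbitrary safety margin, and `free_vram_gb` relies on PyTorch's `torch.cuda.mem_get_info()`, which returns free and total device memory in bytes.

```python
# Approximate VRAM needs in GB, copied from the table above.
MODEL_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}
LARGEST_FIRST = ["large", "medium", "small", "base", "tiny"]


def pick_model(free_gb, headroom_gb=0.5):
    """Return the largest model that fits in `free_gb` with some headroom, or None."""
    for name in LARGEST_FIRST:
        if MODEL_VRAM_GB[name] + headroom_gb <= free_gb:
            return name
    return None  # nothing fits: use WHISPER_DEVICE=cpu instead


def free_vram_gb():
    """Free memory on the current CUDA device in GB (0.0 without PyTorch or CUDA)."""
    try:
        import torch
    except ImportError:
        return 0.0
    if not torch.cuda.is_available():
        return 0.0
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    return free_bytes / 1024**3
```

If `pick_model(free_vram_gb())` returns `None`, set `WHISPER_DEVICE=cpu`; otherwise put the returned name in `WHISPER_MODEL_NAME`.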

## Programmatic Usage

```python
from yt_transcriber.cli import run_transcribe_command

# Transcription only (default)
transcript, _, _, _ = run_transcribe_command(url="path/to/video.mp4")

# With summaries
transcript, summary_en, summary_es, _ = run_transcribe_command(
    url="https://www.youtube.com/watch?v=VIDEO_ID",
    generate_summary=True,
)

# With post kits (implies summary)
transcript, summary_en, summary_es, post_kits = run_transcribe_command(
    url="https://www.youtube.com/watch?v=VIDEO_ID",
    generate_post_kits=True,
)
```

## Performance & Resource Management

The application includes automatic optimizations for production use:

**Memory Management:**
- Whisper model auto-loads/unloads per video (prevents memory leaks)
- Automatic garbage collection after each transcription
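
The per-video load/unload pattern can be expressed as a small load, run, release helper. This is a sketch of the general technique, not the project's code: `run_with_model` is hypothetical. It calls a loader, runs one job with the model, then drops the only reference, runs `gc.collect()`, and releases PyTorch's cached CUDA memory with `torch.cuda.empty_cache()` when a GPU is present. Passing the job in as a function (rather than yielding the model) means no caller-side reference survives to keep the weights alive.

```python
import gc


def _release_cuda_cache():
    """Return cached CUDA blocks to the driver if PyTorch and a GPU are available."""
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def run_with_model(loader, job, *loader_args, **loader_kwargs):
    """Load a model, return job(model), and free the model even if job raises.

    Hypothetical helper. `job` must not store the model anywhere, or it stays alive.
    """
    model = loader(*loader_args, **loader_kwargs)
    try:
        return job(model)
    finally:
        del model              # drop the only reference held here
        gc.collect()           # collect any reference cycles right away
        _release_cuda_cache()  # hand cached GPU memory back


# Possible use with openai-whisper (not executed here):
#   import whisper
#   text = run_with_model(whisper.load_model, lambda m: m.transcribe("audio.wav")["text"],
#                         "base", device="cuda")
```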

**Resource Cleanup:**
- Temp files auto-deleted after processing (even on errors)
- No manual cleanup needed
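
Cleanup that also happens on errors is what `tempfile.TemporaryDirectory` provides: the directory and its contents are removed when the `with` block exits, whether the body returns or raises. A minimal sketch, assuming a per-job subdirectory under `TEMP_DOWNLOAD_DIR` (default `temp_files/`); `process_with_temp_dir` is a hypothetical helper, not the project's function.

```python
import tempfile
from pathlib import Path


def process_with_temp_dir(work, base_dir="temp_files"):
    """Run work(tmp_dir) in a fresh per-job directory that is always deleted afterwards.

    Hypothetical helper. TemporaryDirectory removes the directory on exit from
    the with-block, whether work() returns normally or raises.
    """
    Path(base_dir).mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory(prefix="job_", dir=base_dir) as tmp:
        return work(Path(tmp))
```

Using one directory per job (rather than fixed filenames in `temp_files/`) also keeps concurrent runs from overwriting each other's downloads.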

**FFmpeg:**
- Timeout protection (5 min) prevents hung FFmpeg processes
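
A timeout like this can be enforced with `subprocess.run(..., timeout=...)`, which kills the child process and raises `subprocess.TimeoutExpired` when the limit is hit. The sketch below assumes a 300-second limit to match the 5-minute figure above; `run_with_timeout` and `extract_audio` are hypothetical helpers, and the 16 kHz mono settings mirror what Whisper itself uses when decoding audio.

```python
import subprocess


def run_with_timeout(cmd, timeout=300):
    """Run an external command, failing fast if it exceeds `timeout` seconds.

    Non-zero exit codes raise subprocess.CalledProcessError (check=True).
    """
    try:
        # On timeout, subprocess.run kills the child before raising TimeoutExpired
        return subprocess.run(cmd, check=True, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"{cmd[0]} timed out after {timeout}s") from exc


def extract_audio(video, wav_out, timeout=300):
    """Hypothetical helper: decode the audio track to 16 kHz mono WAV."""
    cmd = ["ffmpeg", "-hide_banner", "-y", "-i", str(video),
           "-vn", "-ac", "1", "-ar", "16000", str(wav_out)]
    return run_with_timeout(cmd, timeout=timeout)
```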

## Troubleshooting

**FFmpeg not found:**
```bash
# Pass the full path to the FFmpeg binary
yt-transcriber transcribe --url "URL" --ffmpeg-location "C:\ffmpeg\bin\ffmpeg.exe"
```

**CUDA not available:**
```bash
# Check whether PyTorch can see a CUDA device
python -c "import torch; print(torch.cuda.is_available())"

# If it prints False, fall back to CPU in .env
WHISPER_DEVICE=cpu
```

**Out of memory:**
```bash
# Use smaller model in .env
WHISPER_MODEL_NAME=tiny
```

## License

MIT License - see [LICENSE](LICENSE)

## Acknowledgments

- [OpenAI Whisper](https://github.com/openai/whisper) - AI transcription
- [yt-dlp](https://github.com/yt-dlp/yt-dlp) - Video downloading
- [Claude Code](https://docs.anthropic.com/en/docs/claude-code) - AI summarization & translation

---

Made with care by [Sergio Marquez](https://github.com/sergiomarquezdev)