https://github.com/sergiomarquezdev/yt-transcriber

🛠️ CLI tool to transcribe YouTube videos using OpenAI Whisper with CUDA acceleration, generate AI summaries (EN/ES) with Gemini, and create LinkedIn/Twitter content. Supports YouTube, Google Drive, and local files.

Host: GitHub
URL: https://github.com/sergiomarquezdev/yt-transcriber
Owner: sergiomarquezdev
Created: 2025-12-04T10:38:17.000Z (7 months ago)
Default Branch: main
Last Pushed: 2026-05-06T08:33:25.000Z (about 2 months ago)
Last Synced: 2026-05-06T10:38:36.877Z (about 2 months ago)
Topics: ai, cli, cuda, gemini, python, transcription, whisper, youtube
Language: Python
Homepage:
Size: 316 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Agents: AGENTS.md

# YouTube Video Transcriber & Summarizer

[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)

[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)

[![PyTorch](https://img.shields.io/badge/PyTorch-CUDA%2012.8-red.svg)](https://pytorch.org/)

**Transform YouTube videos into searchable text, AI summaries, and social media content.**

## Features

- **CUDA-accelerated transcription** with OpenAI Whisper

- **Multi-language support** with auto-detection

- **AI-powered summaries** (EN + ES) via Claude CLI:

  - Executive summary (2-3 sentences)

  - Key points (5-10 bullets)

  - Smart timestamps (inferred from content)

  - Action items (when applicable)

- **Social Media Post Kits**: Auto-generate LinkedIn posts + Twitter threads

- **Optional timestamped segments**: emit `_segments.json` sidecar with Whisper `start/end/text`

- **Optional visual evidence (V1)**: extract midpoint frames per segment for local files

- **Multiple input sources**: YouTube, Google Drive, local files

- **Performance optimized**:

  - Automatic memory cleanup (no memory leaks)

  - Auto temp file cleanup

## Quick Start

```bash

# Clone the repository
git clone https://github.com/sergiomarquezdev/yt-transcriber.git
cd yt-transcriber

# Create a virtual environment
python -m venv venv
source venv/bin/activate    # Linux/macOS
# .\venv\Scripts\activate   # Windows (run this instead of the line above)

# Install dependencies
pip install -e .

# Configure environment
cp .env.example .env

# Ensure the `claude` CLI is in PATH (requires a Claude Max/Pro subscription)

# Transcribe your first video
yt-transcriber transcribe --url "https://www.youtube.com/watch?v=VIDEO_ID"

```

## Prerequisites

- **Python 3.12+**

- **FFmpeg** - Required for audio processing

- **CUDA 12.8** (Optional) - For GPU acceleration

- **Claude CLI** (Optional) - Required for AI summaries, translations, and post kits. Install from [Claude Code](https://docs.anthropic.com/en/docs/claude-code) and ensure `claude` is in your PATH with an active subscription.

### FFmpeg Installation

**Windows:**

```powershell

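# Download from https://github.com/BtbN/FFmpeg-Builds/releases
# Extract to C:\ffmpeg, then append its bin folder to the user PATH
# (read the user-level Path first so the machine-level Path is not copied into it)
$userPath = [Environment]::GetEnvironmentVariable("Path", [EnvironmentVariableTarget]::User)
[Environment]::SetEnvironmentVariable("Path", $userPath + ";C:\ffmpeg\bin", [EnvironmentVariableTarget]::User)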

```

**macOS:**

```bash

brew install ffmpeg

```

**Linux:**

```bash

sudo apt install ffmpeg  # Ubuntu/Debian

```

## Usage

```bash

# Only transcription (DEFAULT)
yt-transcriber transcribe --url "https://www.youtube.com/watch?v=VIDEO_ID"

# Transcription + AI summaries (EN + ES)
yt-transcriber transcribe --url "URL" --summarize

# Transcription + summaries + Post Kits (LinkedIn + Twitter)
yt-transcriber transcribe --url "URL" --post-kits

# Force Spanish transcription
yt-transcriber transcribe --url "URL" --language es

# Local file
yt-transcriber transcribe --url "path/to/video.mp4" --summarize

# Emit timestamped segments sidecar JSON
yt-transcriber transcribe --url "URL_OR_LOCAL_FILE" --segments

# Extract visual evidence frames (implies --segments, local files only in V1)
yt-transcriber transcribe --url "path/to/video.mp4" --visual-evidence

```
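
To make "midpoint frames per segment" concrete, here is a minimal sketch of how frame timestamps could be derived from Whisper-style segments. It is an illustration, not the project's code: the function name `midpoint_timestamps` is hypothetical, and skipping segments shorter than a minimum duration is an assumption based on the `VISUAL_EVIDENCE_MIN_SEGMENT_SECONDS` setting.

```python
from typing import Iterable


def midpoint_timestamps(segments: Iterable[dict], min_seconds: float = 1.0) -> list[float]:
    """Return one midpoint (in seconds) for each segment that is long enough."""
    midpoints = []
    for seg in segments:
        start, end = float(seg["start"]), float(seg["end"])
        if end - start < min_seconds:
            continue  # assumption: segments below the minimum get no frame
        midpoints.append((start + end) / 2)
    return midpoints


# Segments shaped like Whisper output: start/end in seconds plus the text.
segments = [
    {"start": 0.0, "end": 4.0, "text": "Intro"},
    {"start": 4.0, "end": 4.5, "text": "Um"},
    {"start": 4.5, "end": 10.5, "text": "Main point"},
]
print(midpoint_timestamps(segments))  # [2.0, 7.5]
```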

### CLI Options

| Option | Short | Description |
|--------|-------|-------------|
| `--url` | `-u` | YouTube URL, Google Drive URL, or local file path |
| `--language` | `-l` | Language code (`en`, `es`); auto-detected if omitted |
| `--summarize` | | Generate AI summaries (EN + ES) |
| `--post-kits` | | Generate LinkedIn + Twitter content (implies `--summarize`) |
| `--segments` / `--no-segments` | | Enable/disable the `_segments.json` sidecar (overrides the env setting; falls back to it when omitted) |
| `--visual-evidence` / `--no-visual-evidence` | | Enable/disable frame extraction per segment (local files only in V1; implies `--segments`) |
| `--ffmpeg-location` | | Custom FFmpeg path |
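
The precedence described for `--segments` and `--visual-evidence` (CLI value first, env fallback when the flag is omitted, visual evidence implying segments) can be sketched as follows. This is a hedged illustration: `env_flag` and `resolve_flags` are hypothetical names, the accepted truthy strings are an assumption, and the behaviour for `--no-segments` combined with `--visual-evidence` is inferred from "visual implies segments".

```python
import os
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}  # assumed set of accepted truthy strings


def env_flag(name: str, env: Optional[Mapping[str, str]] = None, default: bool = False) -> bool:
    """Read a boolean setting from the environment, defaulting to off."""
    env = os.environ if env is None else env
    raw = env.get(name)
    return default if raw is None else raw.strip().lower() in TRUTHY


def resolve_flags(cli_segments: Optional[bool], cli_visual: Optional[bool],
                  env: Optional[Mapping[str, str]] = None) -> tuple[bool, bool]:
    """cli_* is True/False when the flag was passed and None when it was omitted."""
    segments = cli_segments if cli_segments is not None else env_flag("TRANSCRIPT_SEGMENTS_ENABLED", env)
    visual = cli_visual if cli_visual is not None else env_flag("VISUAL_EVIDENCE_ENABLED", env)
    if visual:
        segments = True  # visual evidence implies segments
    return segments, visual


print(resolve_flags(None, None, env={"TRANSCRIPT_SEGMENTS_ENABLED": "true"}))   # (True, False)
print(resolve_flags(False, None, env={"TRANSCRIPT_SEGMENTS_ENABLED": "true"}))  # (False, False)
```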

## Output

Files are saved to `output/`:

```

output/
├── transcripts/
│   ├── {title}_vid_{id}.txt             # Raw transcription (always)
│   ├── {title}_vid_{id}_segments.json   # Optional: segments sidecar
│   └── {title}_vid_{id}_frame_{idx}.jpg # Optional: visual evidence frames
└── summaries/
    ├── {title}_vid_{id}_summary_EN.md  # English summary
    ├── {title}_vid_{id}_summary_ES.md  # Spanish summary
    └── {title}_vid_{id}_post_kits.md   # LinkedIn + Twitter content

```
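
If you need to locate these files from a script, the naming pattern above can be reproduced with a small helper. This is a sketch under assumptions: `expected_outputs` is a hypothetical function, and it takes the title exactly as it appears in the file name, since how the tool sanitizes video titles is not documented here.

```python
from pathlib import Path


def expected_outputs(title: str, video_id: str, root: str = "output") -> dict[str, Path]:
    """Build the output paths that follow the {title}_vid_{id} naming pattern."""
    stem = f"{title}_vid_{video_id}"
    transcripts = Path(root) / "transcripts"
    summaries = Path(root) / "summaries"
    return {
        "transcript": transcripts / f"{stem}.txt",
        "segments": transcripts / f"{stem}_segments.json",
        "summary_en": summaries / f"{stem}_summary_EN.md",
        "summary_es": summaries / f"{stem}_summary_ES.md",
        "post_kits": summaries / f"{stem}_post_kits.md",
    }


paths = expected_outputs("My_Talk", "abc123")
print(paths["transcript"].as_posix())  # output/transcripts/My_Talk_vid_abc123.txt
```

Only the transcript is always written; the other paths exist when the matching option was enabled.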

### AI Summary Contents

- Executive summary

- Key points (5-7 bullets)

- Timestamps (5-8 important moments)

- Action items

### Post Kits Contents

- **LinkedIn post** (800-1200 chars): Professional hook, insights, CTA, hashtags

- **Twitter thread** (8-12 tweets): Numbered tweets, max 280 chars each
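
The length limits above are easy to check before publishing. The following is a minimal sketch, not part of the tool: `check_post_kit` is a hypothetical helper, and it counts characters with `len()`, which may differ from how each platform counts URLs and some emoji.

```python
def check_post_kit(linkedin_post: str, tweets: list[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means all limits hold."""
    problems = []
    if not 800 <= len(linkedin_post) <= 1200:
        problems.append(f"LinkedIn post is {len(linkedin_post)} chars (expected 800-1200)")
    if not 8 <= len(tweets) <= 12:
        problems.append(f"Thread has {len(tweets)} tweets (expected 8-12)")
    for number, tweet in enumerate(tweets, start=1):
        if len(tweet) > 280:
            problems.append(f"Tweet {number} is {len(tweet)} chars (max 280)")
    return problems


print(check_post_kit("x" * 1000, ["1/ Short tweet"] * 8))  # []
```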

## Configuration

Create `.env` from the `.env.example` template:

```bash

# Whisper Model
WHISPER_MODEL_NAME=base    # tiny, base, small, medium, large
WHISPER_DEVICE=cuda        # cuda or cpu

# Claude CLI (required for summaries, translations, post kits)
# Ensure `claude` is in PATH with active subscription (Max/Pro)
# CLAUDE_CLI_PATH=claude
# CLAUDE_CLI_TIMEOUT=180
# DEFAULT_LLM_MODEL=sonnet

# Directories
TEMP_DOWNLOAD_DIR=temp_files/
OUTPUT_TRANSCRIPTS_DIR=output/transcripts/
SUMMARY_OUTPUT_DIR=output/summaries/

# Optional transcript segments sidecar (default off)
TRANSCRIPT_SEGMENTS_ENABLED=false

# Optional visual evidence extraction (default off, local files only in V1)
VISUAL_EVIDENCE_ENABLED=false
VISUAL_EVIDENCE_MIN_SEGMENT_SECONDS=1.0

# Logging
LOG_LEVEL=INFO

```
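
A quick way to sanity-check a `.env` file before a long run is to parse it and validate the two Whisper settings against the values listed in the comments above. This sketch is independent of however the project actually loads its settings: `parse_env` and `validate` are hypothetical names, the parser simply treats everything after `#` as a comment (so it would mangle values that contain `#`), and the defaults used for missing keys are assumptions taken from the template.

```python
VALID_MODELS = {"tiny", "base", "small", "medium", "large"}
VALID_DEVICES = {"cuda", "cpu"}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines, full-line and inline comments."""
    values = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def validate(values: dict[str, str]) -> list[str]:
    """Return a list of problems with the Whisper settings (empty when valid)."""
    errors = []
    model = values.get("WHISPER_MODEL_NAME", "base")  # assumed default
    if model not in VALID_MODELS:
        errors.append(f"WHISPER_MODEL_NAME={model!r} is not one of {sorted(VALID_MODELS)}")
    device = values.get("WHISPER_DEVICE", "cuda")  # assumed default
    if device not in VALID_DEVICES:
        errors.append(f"WHISPER_DEVICE={device!r} is not one of {sorted(VALID_DEVICES)}")
    return errors


sample = "WHISPER_MODEL_NAME=base    # tiny, base, small\nWHISPER_DEVICE=cuda\n# CLAUDE_CLI_TIMEOUT=180\n"
print(parse_env(sample))            # {'WHISPER_MODEL_NAME': 'base', 'WHISPER_DEVICE': 'cuda'}
print(validate(parse_env(sample)))  # []
```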

### Model Selection

| Model | Speed | Accuracy | VRAM | Use Case |
|-------|-------|----------|------|----------|
| `tiny` | Fast | Low | ~1GB | Quick drafts |
| `base` | Good | Medium | ~1GB | **Default - Balanced** |
| `small` | Medium | Good | ~2GB | Better quality |
| `medium` | Slow | High | ~5GB | High accuracy |
| `large` | Slowest | Best | ~10GB | Best quality |
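
As a rough guide, the table can be turned into a lookup that picks the largest model fitting in the VRAM you have free. This is only a sketch: `largest_model_that_fits` and the 0.5 GB headroom are hypothetical, and the VRAM figures are the approximate values from the table, not measurements.

```python
# (model name, approximate VRAM in GB), smallest first, taken from the table above
MODEL_VRAM_GB = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large", 10)]


def largest_model_that_fits(free_vram_gb: float, headroom_gb: float = 0.5) -> str:
    """Pick the largest model whose approximate VRAM need plus headroom fits."""
    best = "tiny"  # fall back to the smallest model when nothing fits comfortably
    for name, needed_gb in MODEL_VRAM_GB:
        if needed_gb + headroom_gb <= free_vram_gb:
            best = name
    return best


print(largest_model_that_fits(6.0))   # medium
print(largest_model_that_fits(12.0))  # large
```

The result can then be written to `WHISPER_MODEL_NAME` in `.env`.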

## Programmatic Usage

```python

from yt_transcriber.cli import run_transcribe_command

# Only transcription (default)
transcript, _, _, _ = run_transcribe_command(url="path/to/video.mp4")

# With summaries
transcript, summary_en, summary_es, _ = run_transcribe_command(
    url="https://www.youtube.com/watch?v=VIDEO_ID",
    generate_summary=True,
)

# With post kits (implies summary)
transcript, summary_en, summary_es, post_kits = run_transcribe_command(
    url="https://www.youtube.com/watch?v=VIDEO_ID",
    generate_post_kits=True,
)

```
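
For several inputs, a thin wrapper around `run_transcribe_command` keeps one failure from stopping the rest. The sketch below is an assumption-laden example rather than part of the package: `transcribe_many` is a hypothetical helper, it assumes failures surface as exceptions (not confirmed by the examples above), and it takes the transcribe function as a parameter so it can be tried without the package installed.

```python
from typing import Callable, Iterable


def transcribe_many(sources: Iterable[str], transcribe: Callable[..., tuple], **options):
    """Run `transcribe` on each source; collect transcripts and error messages separately."""
    results, failures = {}, {}
    for source in sources:
        try:
            # The first element of the returned 4-tuple is the transcript.
            transcript, *_ = transcribe(url=source, **options)
            results[source] = transcript
        except Exception as exc:  # assumption: errors are raised as exceptions
            failures[source] = str(exc)
    return results, failures


# Intended use with the real function (requires the package to be installed):
# from yt_transcriber.cli import run_transcribe_command
# results, failures = transcribe_many(["a.mp4", "b.mp4"], run_transcribe_command)
```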

## Performance & Resource Management

The application includes automatic optimizations for production use:

**Memory Management:**

- Whisper model auto-loads/unloads per video (prevents memory leaks)

- Automatic garbage collection after each transcription
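
The load/unload-per-video pattern can be sketched as a context manager. This is an illustration of the general technique, not the project's implementation: `loaded_model` is a hypothetical name, and the CUDA cache is only cleared when PyTorch is installed and a GPU is available.

```python
import gc
from contextlib import contextmanager


@contextmanager
def loaded_model(load):
    """Load a model for one job and release it afterwards, even if the job fails."""
    model = load()
    try:
        yield model
    finally:
        del model     # drop this function's reference
        gc.collect()  # collect anything that is now unreachable
        try:
            import torch
            if torch.cuda.is_available():
                torch.cuda.empty_cache()  # return cached GPU memory to the driver
        except ImportError:
            pass  # CPU-only environment without PyTorch


# Intended use with Whisper (requires the openai-whisper package):
# import whisper
# with loaded_model(lambda: whisper.load_model("base", device="cuda")) as model:
#     result = model.transcribe("audio.wav")
```

Memory is only reclaimed once the caller also drops its own reference, so avoid keeping `model` around after the `with` block.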

**Resource Cleanup:**

- Temp files auto-deleted after processing (even on errors)

- No manual cleanup needed
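
Cleanup that also runs on errors is what Python's `tempfile.TemporaryDirectory` provides when used as a context manager. The sketch below shows that general pattern; `process_with_temp_dir` is a hypothetical helper, and whether the tool itself uses this mechanism is not stated here.

```python
import tempfile
from pathlib import Path


def process_with_temp_dir(work, base_dir=None):
    """Run work(tmp_path) in a fresh temp directory that is deleted afterwards."""
    # base_dir, if given, must already exist (for example the TEMP_DOWNLOAD_DIR folder).
    with tempfile.TemporaryDirectory(prefix="yt_", dir=base_dir) as tmp:
        return work(Path(tmp))  # the directory is removed on return or on exception


def fake_job(tmp: Path) -> str:
    (tmp / "audio.wav").write_bytes(b"placeholder")  # stands in for a downloaded file
    return "done"


print(process_with_temp_dir(fake_job))  # done
```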

**FFmpeg:**

- Timeout protection (5min) prevents hung processes
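
A timeout like this is typically implemented with the `timeout` argument of `subprocess.run`, which kills the child process once the limit is reached. The following is a minimal sketch of that approach, assuming a 300-second limit to match the 5 minutes above; `run_with_timeout` is a hypothetical helper and the FFmpeg command in the comment is only an example invocation.

```python
import subprocess

FFMPEG_TIMEOUT_SECONDS = 300  # 5 minutes, as stated above


def run_with_timeout(cmd: list[str], timeout: float = FFMPEG_TIMEOUT_SECONDS) -> subprocess.CompletedProcess:
    """Run a command, failing with a clear error if it exceeds the timeout."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout, check=True)
    except subprocess.TimeoutExpired as exc:
        # subprocess.run has already killed the child process at this point.
        raise RuntimeError(f"Command timed out after {timeout}s: {cmd[0]}") from exc


# Example: extract 16 kHz mono audio from a video (requires FFmpeg on PATH):
# run_with_timeout(["ffmpeg", "-y", "-i", "input.mp4", "-vn", "-ar", "16000", "-ac", "1", "audio.wav"])
```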

## Troubleshooting

**FFmpeg not found:**

```bash

# Use direct path
yt-transcriber transcribe --url "URL" --ffmpeg-location "C:\ffmpeg\bin\ffmpeg.exe"

```

**CUDA not available:**

```bash

# Check installation
python -c "import torch; print(torch.cuda.is_available())"

# Fall back to CPU in .env
WHISPER_DEVICE=cpu

```
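
The same check can be made in Python to choose a device automatically when calling the tool from a script. This is a sketch, not behaviour the CLI is documented to have: `pick_device` is a hypothetical helper that falls back to `cpu` when PyTorch is missing or reports no CUDA device.

```python
def pick_device(preferred: str = "cuda") -> str:
    """Return "cuda" only when it was requested and PyTorch can actually use it."""
    if preferred != "cuda":
        return "cpu"
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())  # "cuda" on a working GPU setup, otherwise "cpu"
```

The result could be exported as `WHISPER_DEVICE` before running the tool.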

**Out of memory:**

```bash

# Use smaller model in .env
WHISPER_MODEL_NAME=tiny

```

## License

MIT License - see [LICENSE](LICENSE)

## Acknowledgments

- [OpenAI Whisper](https://github.com/openai/whisper) - AI transcription

- [yt-dlp](https://github.com/yt-dlp/yt-dlp) - Video downloading

- [Claude Code](https://docs.anthropic.com/en/docs/claude-code) - AI summarization & translation

---

Made with care by [Sergio Marquez](https://github.com/sergiomarquezdev)
