# transcriber

A fast video transcription CLI tool powered by whisper.cpp. Extracts audio from video files and transcribes it to text with GPU acceleration.

## Features

- **Video → Audio → Text**: Automatic audio extraction via FFmpeg
- **GPU Accelerated**: Metal (Apple Silicon Macs), Vulkan (e.g. NVIDIA GPUs), CUDA, or CPU fallback
- **Multiple Output Formats**: TXT, SRT (subtitle), JSON (with word-level timestamps), MD (Markdown with YAML front matter)
- **Batch Processing**: Transcribe entire directories recursively
- **Model Quantization**: Q4_K/Q5_K/Q6_K/Q8_0 for speed/size tradeoffs
- **Configurable**: YAML configuration file with CLI argument overrides
- **Progress Reporting**: Real-time progress bars for downloads and transcription
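
For orientation, the audio-extraction step can be approximated by hand. The examples in whisper.cpp use 16 kHz, mono, 16-bit PCM WAV input, and the sketch below produces that with FFmpeg. The helper name `extract_audio` is hypothetical, and the exact FFmpeg arguments transcriber passes internally are an assumption rather than something this README states.

```bash
#!/usr/bin/env bash
# Sketch only: convert a video's audio track to 16 kHz mono 16-bit PCM WAV,
# the input format used in whisper.cpp's examples.
# `extract_audio` is a hypothetical helper, not part of transcriber.
extract_audio() {
  local input=$1 output=$2
  if ! command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg not found on PATH" >&2
    return 127
  fi
  # -vn: drop video, -ar: sample rate, -ac: channel count, -c:a: audio codec
  ffmpeg -y -i "$input" -vn -ar 16000 -ac 1 -c:a pcm_s16le "$output"
}

# Usage: extract_audio video.mp4 audio.wav
```

Running the conversion manually like this can help confirm that FFmpeg is installed and can read a given file before starting a long transcription.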

## Installation

### Prerequisites

- **FFmpeg**: Required for audio extraction
  - macOS: `brew install ffmpeg`
  - Ubuntu: `sudo apt install ffmpeg`
  - Windows: Download from https://ffmpeg.org/

### Homebrew (macOS / Linux)

```bash
brew tap RedAtman/tap
brew install transcriber
```

> Uses the [RedAtman/homebrew-tap](https://github.com/RedAtman/homebrew-tap) tap. The formula is updated automatically on each release.

### From Source

```bash
git clone https://github.com/redatman/transcriber
cd transcriber
cargo build --release
```

The binary is at `./target/release/transcriber`.

## Usage

### Single File Transcription

```bash
# Transcribe with default settings (base model, txt output)
transcriber -i video.mp4

# Specify model and language
transcriber -i video.mp4 -m medium -l zh

# Custom output directory and SRT format
transcriber -i video.mp4 -o ./subtitles --format srt
```

### Batch Processing

```bash
# Transcribe all videos in a directory
transcriber -d ./videos

# Skip already-transcribed files
transcriber -d ./videos --skip-existing

# Multiple output formats
transcriber -d ./videos --format "txt,srt,json"
```
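
To illustrate what skipping already-transcribed files amounts to, here is a rough shell equivalent. It is a sketch under stated assumptions: the transcript is assumed to live in the output directory under the video's base name with a `.txt` extension, and only `.mp4` files are considered. The real matching rules of `--skip-existing` may differ, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: skip videos that already have a transcript.
# Assumption: the transcript sits in the output directory and is named after
# the video with a .txt extension (video.mp4 -> video.txt).
needs_transcription() {
  local video=$1 out_dir=$2
  local name
  name=$(basename "$video")
  [ ! -e "$out_dir/${name%.*}.txt" ]
}

# Walk a directory recursively and transcribe only the pending .mp4 files.
transcribe_pending() {
  local video_dir=$1 out_dir=$2
  find "$video_dir" -type f -name '*.mp4' -print0 |
    while IFS= read -r -d '' video; do
      if needs_transcription "$video" "$out_dir"; then
        # </dev/null keeps the command from consuming the file list on stdin
        transcriber -i "$video" -o "$out_dir" </dev/null
      fi
    done
}

# Usage: transcribe_pending ./videos ./transcripts
```

In practice the built-in `-d` and `--skip-existing` flags cover this; the loop is mainly useful as a mental model or as a starting point for custom filtering.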

### Inference Parameters

```bash
# Initial prompt for decoder context
transcriber -i video.mp4 --initial-prompt "technical terms"

# Sampling temperature (0.0 = deterministic, 1.0 = more random)
transcriber -i video.mp4 --temperature 0.2

# Suppress non-speech tokens
transcriber -i video.mp4 --suppress-non-speech

# No-speech detection threshold
transcriber -i video.mp4 --no-speech-threshold 0.5

# Split on word boundaries
transcriber -i video.mp4 --split-on-word

# Combined example
transcriber -i video.mp4 -m medium -l en --initial-prompt "technology" --temperature 0.3
transcriber -i video.mp4 -m medium -l zh --initial-prompt "科技" --temperature 0.3
```

### Media Metadata

```bash
# Add custom metadata to the output (stored in Markdown YAML front matter)
transcriber -i video.mp4 --format md --meta title="My Talk" --meta location=Beijing

# Media metadata is auto-detected via ffprobe when using md format:
# - source: filename, size, format, bitrate, duration
# - video: codec, resolution, FPS
# - audio: codec, sample rate, channels
```
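
The two metadata sources above can be pictured with a short sketch. `meta_to_yaml` shows one plausible way a `key=value` pair could become a front matter line, and `probe_media` shows an ffprobe call that returns the kind of container and stream details listed. Both helper names are hypothetical, and the front matter layout transcriber actually writes is an assumption here.

```bash
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical and not part of transcriber.

# Turn one key=value pair into a YAML front matter line.
# Splits on the first "=" only, so values may themselves contain "=".
# Values containing double quotes would need escaping, which is omitted here.
meta_to_yaml() {
  local pair=$1
  local key=${pair%%=*}
  local value=${pair#*=}
  printf '%s: "%s"\n' "$key" "$value"
}

# Dump container (format) and per-stream details as JSON.
# Requires ffprobe, which ships with FFmpeg.
probe_media() {
  ffprobe -v error -show_format -show_streams -of json "$1"
}

meta_to_yaml 'title=My Talk'      # prints: title: "My Talk"
meta_to_yaml 'location=Beijing'   # prints: location: "Beijing"
```

Running `probe_media video.mp4` on a real file is a quick way to preview the codec, resolution, and sample-rate fields before generating Markdown output.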

### Configuration

```bash
# Generate default config file
transcriber init

# Use custom config
transcriber -i video.mp4 --config ./my-config.yaml
```

Default config location: `~/.config/transcriber/config.yaml`

### Available Models

| Model | Size | Description |
|-------|------|-------------|
| tiny | 75 MB | Fastest, lowest quality |
| base | 148 MB | Default, balanced |
| small | 488 MB | Better quality |
| medium | 1.5 GB | High quality |
| large-v3-turbo | 800 MB | Best quality/speed ratio |

## Output Formats

- **TXT**: Plain text, one segment per line
- **SRT**: SubRip subtitle format with timestamps
- **JSON**: Structured data with word-level timing and metadata
- **MD**: Markdown with YAML front matter — includes auto-detected media metadata (source, video, audio info), custom metadata, and plain text body (no timestamps)
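
For readers unfamiliar with SRT, the sketch below prints two cues in the standard SubRip layout: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the text, and a blank line. The helper names and the sample text are hypothetical, and this is not how transcriber itself generates the file.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical and not part of transcriber.

# Convert milliseconds to an SRT timestamp (HH:MM:SS,mmm).
srt_timestamp() {
  local ms=$1
  printf '%02d:%02d:%02d,%03d' \
    $((ms / 3600000)) $((ms / 60000 % 60)) $((ms / 1000 % 60)) $((ms % 1000))
}

# Print one cue: index, time range, text, then a blank separator line.
srt_cue() {
  local index=$1 start_ms=$2 end_ms=$3 text=$4
  printf '%d\n%s --> %s\n%s\n\n' \
    "$index" "$(srt_timestamp "$start_ms")" "$(srt_timestamp "$end_ms")" "$text"
}

srt_cue 1 0 2500 'Hello and welcome.'
srt_cue 2 2500 6000 'Today we look at transcription.'
```

The first cue's time line comes out as `00:00:00,000 --> 00:00:02,500`. Note the comma before the milliseconds, which is what distinguishes SRT timestamps from the dot used in WebVTT.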

## License

MIT