https://github.com/redatman/transcriber
A fast video transcription CLI tool powered by whisper.cpp. Extracts audio from video files and transcribes it to text with GPU acceleration.
- Host: GitHub
- URL: https://github.com/redatman/transcriber
- Owner: RedAtman
- License: mit
- Created: 2026-05-01T14:01:25.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-05-01T16:00:28.000Z (about 1 month ago)
- Last Synced: 2026-05-01T16:10:46.476Z (about 1 month ago)
- Language: Rust
- Size: 41 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# transcriber
A fast video transcription CLI tool powered by whisper.cpp. Extracts audio from video files and transcribes it to text with GPU acceleration.
## Features
- **Video → Audio → Text**: Automatic audio extraction via FFmpeg
- **GPU Accelerated**: Metal (Apple Silicon Macs), Vulkan (cross-vendor GPUs, including NVIDIA), CUDA (NVIDIA), or CPU fallback
- **Multiple Output Formats**: TXT, SRT (subtitle), JSON (with word-level timestamps), MD (Markdown with YAML front matter)
- **Batch Processing**: Transcribe entire directories recursively
- **Model Quantization**: Q4_K/Q5_K/Q6_K/Q8_0 variants for smaller files and faster inference, at some cost in accuracy
- **Configurable**: YAML configuration file with CLI argument overrides
- **Progress Reporting**: Real-time progress bars for downloads and transcription
## Installation
### Prerequisites
- **FFmpeg**: Required for audio extraction
- macOS: `brew install ffmpeg`
- Ubuntu: `sudo apt install ffmpeg`
- Windows: Download from https://ffmpeg.org/
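whisper.cpp models expect 16 kHz mono audio, so the extraction step has to resample and downmix whatever audio track the video carries. The Rust sketch below builds one plausible FFmpeg command for that. The helper names `ffmpeg_extract_args` and `extract_audio` are hypothetical, and the exact flags `transcriber` passes to FFmpeg may differ.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Build FFmpeg arguments (hypothetical helper) that read `input`, drop any video
/// streams, and write 16 kHz, mono, 16-bit PCM WAV to `output`.
fn ffmpeg_extract_args(input: &Path, output: &Path) -> Vec<String> {
    vec![
        "-y".to_string(), // overwrite `output` if it already exists
        "-i".to_string(),
        input.display().to_string(),
        "-vn".to_string(), // ignore video streams
        "-ar".to_string(),
        "16000".to_string(), // resample to 16 kHz
        "-ac".to_string(),
        "1".to_string(), // downmix to mono
        "-c:a".to_string(),
        "pcm_s16le".to_string(), // 16-bit little-endian PCM
        output.display().to_string(),
    ]
}

/// Run FFmpeg (hypothetical helper). Returns Ok(true) if it exited successfully,
/// or an io::Error if the `ffmpeg` binary could not be started at all.
#[allow(dead_code)]
fn extract_audio(input: &Path, output: &Path) -> io::Result<bool> {
    let status = Command::new("ffmpeg")
        .args(ffmpeg_extract_args(input, output))
        .status()?;
    Ok(status.success())
}

fn main() {
    // Print the command instead of running it, so this works without FFmpeg installed.
    let args = ffmpeg_extract_args(Path::new("video.mp4"), Path::new("video.wav"));
    println!("ffmpeg {}", args.join(" "));
}
```

Calling `extract_audio` needs the `ffmpeg` binary on `PATH`.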
### Homebrew (macOS / Linux)
```bash
brew tap RedAtman/tap
brew install transcriber
```
> Adds the [RedAtman/homebrew-tap](https://github.com/RedAtman/homebrew-tap) tap. The formula is updated automatically on each release.
### From Source
```bash
git clone https://github.com/redatman/transcriber
cd transcriber
cargo build --release
```
The binary is at `./target/release/transcriber`.
## Usage
### Single File Transcription
```bash
# Transcribe with default settings (base model, txt output)
transcriber -i video.mp4
# Specify model and language
transcriber -i video.mp4 -m medium -l zh
# Custom output directory and SRT format
transcriber -i video.mp4 -o ./subtitles --format srt
```
### Batch Processing
```bash
# Transcribe all videos in a directory
transcriber -d ./videos
# Skip already-transcribed files
transcriber -d ./videos --skip-existing
# Multiple output formats
transcriber -d ./videos --format "txt,srt,json"
```
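Batch mode combines three pieces: finding video files under a directory, expanding `--format` into a list of output types, and, with `--skip-existing`, checking whether outputs are already on disk. The sketch below shows one way to do that with only the Rust standard library. The extension list, the `<stem>.<format>` output naming, writing outputs next to each video, and the rule that a file is skipped only when *every* requested format exists are all assumptions for illustration, not documented behavior of `transcriber`.

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical set of extensions treated as video input (compared lowercase).
const VIDEO_EXTS: &[&str] = &["mp4", "mkv", "mov", "avi", "webm"];

/// Split a value like "txt,srt,json" into trimmed, lowercase format names.
fn parse_formats(spec: &str) -> Vec<String> {
    spec.split(',')
        .map(|s| s.trim().to_ascii_lowercase())
        .filter(|s| !s.is_empty())
        .collect()
}

/// Assumed naming scheme: <out_dir>/<video stem>.<format>.
fn output_path(video: &Path, out_dir: &Path, format: &str) -> PathBuf {
    // Build the name by hand: with_extension() would mangle stems like "talk.v2".
    let mut name: OsString = video.file_stem().unwrap_or_default().to_os_string();
    name.push(".");
    name.push(format);
    out_dir.join(name)
}

/// Assumed --skip-existing rule: skip only if every requested output already exists.
fn should_skip(video: &Path, out_dir: &Path, formats: &[String]) -> bool {
    formats.iter().all(|f| output_path(video, out_dir, f).exists())
}

/// Recursively collect files whose extension is in VIDEO_EXTS.
/// Note: is_dir() follows symlinks, so a symlink loop would recurse forever.
fn collect_videos(dir: &Path, found: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_videos(&path, found)?;
        } else if path
            .extension()
            .and_then(|e| e.to_str())
            .map(|e| VIDEO_EXTS.contains(&e.to_ascii_lowercase().as_str()))
            .unwrap_or(false)
        {
            found.push(path);
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let formats = parse_formats("txt, srt,json");
    let root = Path::new("./videos");
    let mut videos = Vec::new();
    if root.is_dir() {
        collect_videos(root, &mut videos)?;
    }
    videos.sort();
    for video in &videos {
        // Assumption: outputs are written next to each video.
        let out_dir = video.parent().unwrap_or(Path::new("."));
        let action = if should_skip(video, out_dir, &formats) { "skip" } else { "transcribe" };
        println!("{action}: {}", video.display());
    }
    Ok(())
}
```

Requiring all formats to exist means that asking for a new format on a later run (say, `json` after `txt`) re-transcribes the file; skipping when *any* output exists would be cheaper but could leave requested formats missing.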
### Inference Parameters
```bash
# Initial prompt for decoder context
transcriber -i video.mp4 --initial-prompt "technical terms"
# Sampling temperature (0.0 = deterministic, 1.0 = more random)
transcriber -i video.mp4 --temperature 0.2
# Suppress non-speech tokens
transcriber -i video.mp4 --suppress-non-speech
# No-speech detection threshold
transcriber -i video.mp4 --no-speech-threshold 0.5
# Split on word boundaries
transcriber -i video.mp4 --split-on-word
# Combined examples (English, then Chinese; the prompt "科技" means "technology")
transcriber -i video.mp4 -m medium -l en --initial-prompt "technology" --temperature 0.3
transcriber -i video.mp4 -m medium -l zh --initial-prompt "科技" --temperature 0.3
```
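In Whisper-style decoding, temperature divides the logits before the softmax: at 0 the decoder picks the most likely token every time (greedy decoding), and higher values flatten the distribution so less likely tokens get sampled more often. The no-speech threshold is compared against the model's estimated probability that a window contains no speech. The sketch below illustrates both ideas generically; it is not `transcriber`'s or whisper.cpp's code. The silence rule mirrors OpenAI's reference Whisper implementation, which also requires a low average log-probability before treating a segment as silent; that `logprob_threshold` parameter is an assumption here and is not one of the flags listed above.

```rust
/// Convert logits into probabilities at a given temperature.
/// temperature <= 0.0 is treated as greedy: all probability mass on the arg-max token.
fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    if logits.is_empty() {
        return Vec::new();
    }
    if temperature <= 0.0 {
        let best = argmax(logits);
        return (0..logits.len()).map(|i| if i == best { 1.0 } else { 0.0 }).collect();
    }
    // Subtract the max logit before exponentiating for numerical stability.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&l| ((l - max) / temperature).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Index of the largest value (first one wins on ties).
fn argmax(values: &[f32]) -> usize {
    values
        .iter()
        .enumerate()
        .fold((0, f32::NEG_INFINITY), |(bi, bv), (i, &v)| if v > bv { (i, v) } else { (bi, bv) })
        .0
}

/// Silence check in the style of OpenAI's reference Whisper (assumed semantics):
/// a segment is silent if no-speech is likely AND the decoded text is low-confidence.
fn is_no_speech(no_speech_prob: f32, avg_logprob: f32, no_speech_threshold: f32, logprob_threshold: f32) -> bool {
    no_speech_prob > no_speech_threshold && avg_logprob <= logprob_threshold
}

fn main() {
    let logits = [2.0_f32, 1.0, 0.1];
    for t in [0.0_f32, 0.2, 1.0] {
        let p = softmax_with_temperature(&logits, t);
        println!("T={t}: {p:?}");
    }
    println!("silent: {}", is_no_speech(0.8, -1.5, 0.5, -1.0));
}
```

Lower temperatures concentrate probability on the top token, which is why `--temperature 0.2` behaves close to deterministic decoding.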
### Media Metadata
```bash
# Add custom metadata to the output (stored in Markdown YAML front matter)
transcriber -i video.mp4 --format md --meta title="My Talk" --meta location=Beijing
# Media metadata is auto-detected via ffprobe when using md format:
# - source: filename, size, format, bitrate, duration
# - video: codec, resolution, FPS
# - audio: codec, sample rate, channels
```
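Each `--meta` value is a `key=value` pair, and by the time the program sees it the shell has already stripped the quotes from `title="My Talk"`. The sketch below splits on the first `=` (so values may themselves contain `=`) and renders the pairs as a YAML front matter block ahead of the Markdown body. The function names are hypothetical, every value is emitted as a double-quoted YAML string to sidestep quoting pitfalls, keys are assumed to be simple identifiers, and the auto-detected ffprobe fields are left out.

```rust
/// Split a `--meta key=value` argument on the first '='.
/// Returns None when there is no '=' or the key is empty.
fn parse_meta(arg: &str) -> Option<(String, String)> {
    let (key, value) = arg.split_once('=')?;
    let key = key.trim();
    if key.is_empty() {
        return None;
    }
    Some((key.to_string(), value.to_string()))
}

/// Quote a value as a YAML double-quoted scalar, escaping backslashes, quotes, and newlines.
fn yaml_quote(value: &str) -> String {
    let escaped = value
        .replace('\\', "\\\\")
        .replace('"', "\\\"")
        .replace('\n', "\\n");
    format!("\"{escaped}\"")
}

/// Render key/value pairs as YAML front matter followed by the Markdown body.
/// Keys are written unquoted, so they are assumed to be simple identifiers.
fn render_markdown(meta: &[(String, String)], body: &str) -> String {
    let mut out = String::from("---\n");
    for (key, value) in meta {
        out.push_str(&format!("{key}: {}\n", yaml_quote(value)));
    }
    out.push_str("---\n\n");
    out.push_str(body);
    out.push('\n');
    out
}

fn main() {
    // What the program receives for: --meta title="My Talk" --meta location=Beijing
    let args = ["title=My Talk", "location=Beijing"];
    let meta: Vec<(String, String)> = args.iter().filter_map(|a| parse_meta(a)).collect();
    print!("{}", render_markdown(&meta, "Transcript text goes here."));
}
```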
### Configuration
```bash
# Generate default config file
transcriber init
# Use custom config
transcriber -i video.mp4 --config ./my-config.yaml
```
Default config location: `~/.config/transcriber/config.yaml`
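The configuration is layered: values in the YAML file apply unless a CLI argument overrides them. One common way to express that precedence in Rust is to keep every setting optional in both layers and resolve CLI, then config file, then built-in default. The struct and field names below are hypothetical (the real keys are whatever `transcriber init` writes), only the `base` model default comes from this README, the temperature default of 0.0 is an assumption, and YAML parsing is left out so the sketch stays dependency-free.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical settings layer: `None` means "not set at this layer".
#[derive(Debug, Default)]
struct Settings {
    model: Option<String>,
    language: Option<String>,
    temperature: Option<f32>,
}

/// Fully resolved settings after applying defaults.
#[derive(Debug, PartialEq)]
struct Resolved {
    model: String,
    language: Option<String>, // None = leave language detection to the model (assumption)
    temperature: f32,
}

/// Precedence: CLI flag, then config file, then built-in default.
fn resolve(cli: &Settings, file: &Settings) -> Resolved {
    Resolved {
        model: cli
            .model
            .clone()
            .or_else(|| file.model.clone())
            .unwrap_or_else(|| "base".to_string()), // README: base is the default model
        language: cli.language.clone().or_else(|| file.language.clone()),
        temperature: cli.temperature.or(file.temperature).unwrap_or(0.0), // assumed default
    }
}

/// ~/.config/transcriber/config.yaml, built from $HOME (Windows is not handled here).
fn default_config_path() -> Option<PathBuf> {
    env::var_os("HOME").map(|home| PathBuf::from(home).join(".config/transcriber/config.yaml"))
}

fn main() {
    let file = Settings { model: Some("small".into()), language: Some("zh".into()), temperature: None };
    let cli = Settings { model: Some("medium".into()), ..Default::default() };
    println!("{:?}", resolve(&cli, &file));
    println!("{:?}", default_config_path());
}
```

Keeping both layers as `Option`s makes "not set" distinguishable from "set to the default value", which a flat struct with filled-in defaults cannot express.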
### Available Models
| Model | Size | Description |
|-------|------|-------------|
| tiny | 75 MB | Fastest, lowest quality |
| base | 148 MB | Default, balanced |
| small | 488 MB | Better quality |
| medium | 1.5 GB | High quality |
| large-v3-turbo | 800 MB | Best quality/speed ratio |
## Output Formats
- **TXT**: Plain text, one segment per line
- **SRT**: SubRip subtitle format with timestamps
- **JSON**: Structured data with word-level timing and metadata
- **MD**: Markdown with YAML front matter — includes auto-detected media metadata (source, video, audio info), custom metadata, and plain text body (no timestamps)
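SRT is a plain-text format of numbered cues: a 1-based index, a `start --> end` line with `HH:MM:SS,mmm` timestamps (a comma before the milliseconds), the cue text, and a blank line. The sketch below renders segments in that layout, plus the TXT form (one segment per line). The `Segment` type and its millisecond fields are hypothetical stand-ins for whatever `transcriber` gets back from whisper.cpp.

```rust
/// A transcribed segment with start/end times in milliseconds (hypothetical type).
struct Segment {
    start_ms: u64,
    end_ms: u64,
    text: String,
}

/// Format milliseconds as an SRT timestamp: HH:MM:SS,mmm.
fn srt_timestamp(ms: u64) -> String {
    let (h, rem) = (ms / 3_600_000, ms % 3_600_000);
    let (m, rem) = (rem / 60_000, rem % 60_000);
    let (s, millis) = (rem / 1_000, rem % 1_000);
    format!("{h:02}:{m:02}:{s:02},{millis:03}")
}

/// Render segments as SRT: cue number, time range, trimmed text, blank line.
fn to_srt(segments: &[Segment]) -> String {
    let mut out = String::new();
    for (i, seg) in segments.iter().enumerate() {
        out.push_str(&format!(
            "{}\n{} --> {}\n{}\n\n",
            i + 1,
            srt_timestamp(seg.start_ms),
            srt_timestamp(seg.end_ms),
            seg.text.trim()
        ));
    }
    out
}

/// TXT output: one trimmed segment per line.
fn to_txt(segments: &[Segment]) -> String {
    segments.iter().map(|s| s.text.trim()).collect::<Vec<_>>().join("\n")
}

fn main() {
    let segments = vec![
        Segment { start_ms: 0, end_ms: 2_500, text: " Hello and welcome.".into() },
        Segment { start_ms: 2_500, end_ms: 3_723_456, text: " Let's begin.".into() },
    ];
    print!("{}", to_srt(&segments));
    println!("{}", to_txt(&segments));
}
```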
## License
MIT