https://github.com/nnanto/transcriber
Audio transcription tool powered by whisper.cpp, designed for real-time transcription. NO API/CLOUD
https://github.com/nnanto/transcriber
audio-transcription whisper-cpp
Last synced: 5 months ago
JSON representation
Audio transcription tool powered by whisper.cpp, designed for real-time transcription. NO API/CLOUD
- Host: GitHub
- URL: https://github.com/nnanto/transcriber
- Owner: nnanto
- License: mit
- Created: 2025-05-29T02:54:44.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-05-29T05:49:45.000Z (11 months ago)
- Last Synced: 2025-06-29T18:51:04.242Z (9 months ago)
- Topics: audio-transcription, whisper-cpp
- Language: Go
- Homepage:
- Size: 40 KB
- Stars: 3
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# 🎙️ Transcriber
> CLI for audio transcription tool powered by whisper.cpp, designed for real-time transcription.
[](https://github.com/nnanto/transcriber/stargazers)
[](https://golang.org)
[](LICENSE)
[]()
## ✨ Features
- 🎯 **Real-time transcription** - Record and transcribe audio on the fly
- 🚀 **Cross-platform** - Works on Linux, macOS, and Windows
- ⚙️ **Configurable** - Flexible configuration options
- 🔧 **Multiple models** - Support for various Whisper model sizes
- 💻 **CLI-friendly** - Easy-to-use command-line interface
## 🚀 Quick Start
### Prerequisites
- Install `whisper-cli`
- Mac OS: `brew install whisper-cpp`
- Other OS: Follow the [whisper.cpp installation guide](https://github.com/ggml-org/whisper.cpp?tab=readme-ov-file#quick-start)
- Install `ffmpeg` for audio recording
- Mac OS: `brew install ffmpeg`
- Linux: Use your package manager (e.g., `apt install ffmpeg`)
- Windows: Download from [FFmpeg official site](https://ffmpeg.org/download.html)
- Audio recording capabilities (microphone)
- At least 4GB RAM (recommended for larger models)
### Installation
#### Option 1: Download pre-built binary (Recommended)
Download the latest release for your platform from [GitHub Releases](https://github.com/nnanto/transcriber/releases):
**One-line MacOS installation**
```bash
sh -c "$(curl -fsSL https://raw.githubusercontent.com/nnanto/transcriber/main/scripts/install-macos.sh)"
```
**Linux/macOS:**
```bash
# Download and install
curl -L https://github.com/nnanto/transcriber/releases/download/latest/transcriber-linux-amd64.tar.gz | tar -xz
sudo mv transcriber-* /usr/local/bin/transcriber
# Or for macOS
curl -L https://github.com/nnanto/transcriber/releases/download/latest/transcriber-darwin-amd64.tar.gz | tar -xz
sudo mv transcriber-* /usr/local/bin/transcriber
# Make executable
chmod +x /usr/local/bin/transcriber
```
**Windows:**
1. Download `transcriber-windows-amd64.zip` from [releases page](https://github.com/nnanto/transcriber/releases)
2. Extract the ZIP file
3. Add the extracted folder to your PATH or move `transcriber.exe` to a folder in your PATH
**Verify installation:**
```bash
transcriber version
```
#### Option 2: Install from source
```bash
git clone https://github.com/nnanto/transcriber.git
cd transcriber
make install
```
#### Option 3: Build locally
```bash
git clone https://github.com/nnanto/transcriber.git
cd transcriber
make build
```
### First Run
1. **Install whisper-cli** if not already done.
See [prerequisites](#prerequisites)
2. **Download a Whisper model** (required on first use):
```bash
transcriber download-model
```
You can specify custom model using `--model` option. Available models are found in the [whisper.cpp HF models](https://huggingface.co/ggerganov/whisper.cpp/tree/main)
You can also specify a custom model path in the configuration file.
3. **Start transcribing**:
```bash
transcriber run --output ./transcriptions
```
4. **Check your configuration**:
```bash
transcriber config
```
## 📖 Usage Guide
### Commands Overview
| Command | Description | Example |
|---------|-------------|---------|
| `run` | Record and transcribe in real-time | `transcriber run --duration 2m` |
| `process` | Process existing audio files | `transcriber process --input ./audio` |
| `config` | Show current configuration | `transcriber config` |
| `download` | Download Whisper models | `transcriber download-model --model large` |
| `stop` | Stop all running processes | `transcriber stop` |
| `version` | Show version info | `transcriber version` |
### Real-time Transcription
Start recording and transcribing immediately:
```bash
# Record for 30 minutes (default)
transcriber run
# Record for specific duration
transcriber run --duration 5m --output ./my-transcriptions
# Custom configuration
transcriber run --config ./custom-config --duration 1h
```
### Model Management
Download and manage Whisper models:
```bash
# Download specific model
transcriber download-model --model ggml-large-v3-turbo-q5_0
```
Available models are found in the [whisper.cpp HF models](https://huggingface.co/ggerganov/whisper.cpp/tree/main)
## ⚙️ Configuration
The configuration file is automatically created at `~/.transcriber/config.json`:
```json
{
"model_path": "~/.transcriber/models/ggml-large-v3-turbo-q5_0.bin",
"language": "English",
"temp_dir": "/tmp/transcriber",
"output_format": "txt",
"whisper_cmd": "whisper-cli",
"recording_cmd": "ffmpeg",
"chunk_duration_in_secs": 30,
"min_required_unique_word_count": 5
}
```
### Configuration Options
- **model_path**: Path to the Whisper model file
- **language**: Language for transcription (e.g., "English", "Spanish", "auto")
- **temp_dir**: Directory for temporary audio files during processing
- **output_format**: Output format (`txt`, `json`)
- **whisper_cmd**: Command to use for Whisper transcription (default: "whisper-cli")
- **recording_cmd**: Command to use for audio recording (default: "ffmpeg")
- **chunk_duration_in_secs**: Duration in seconds for each audio chunk during real-time transcription (default: 30)
- **min_required_unique_word_count**: Minimum number of unique words required to process a chunk (default: 5)
## 🛠️ Development Guide
### Project Structure
```
transcriber/
├── cmd.go # CLI command handling
├── main.go # Application entry point
├── transcriber.go # Core transcription logic
├── config.go # Configuration management
├── audio.go # Audio recording/processing
├── models.go # Model download/management
├── Makefile # Build automation
└── README.md # This file
```
### Building from Source
```bash
# Clone the repository
git clone https://github.com/nnanto/transcriber.git
cd transcriber
# Install dependencies
go mod download
# Build for development (with race detection)
make dev
# Build for production
make build
# Run tests
make test
# Build for all platforms
make release-local
```
## 🐛 Troubleshooting
### Common Issues
#### "Model not found" error
```bash
# Download the required model first
transcriber download-model [--model ggml-base]
```
#### Permission denied on macOS/Linux
```bash
# Make sure the binary is executable
chmod +x transcriber
# Or install system-wide
make install
```
#### High CPU usage
- Try using a smaller model (`ggml-tiny` or `ggml-base`)
- Reduce recording quality in config
- Limit recording duration
#### Audio recording issues
- Check microphone permissions
- Verify audio device availability
- Test with shorter durations first
### Getting Help
- Check the [Issues](https://github.com/nnanto/transcriber/issues) page
- Review configuration with `transcriber config`
- Enable verbose logging in development builds
## 📋 System Requirements
### Minimum Requirements
- **OS**: Linux, macOS 10.14+, Windows 10+
- **RAM**: 2GB (4GB recommended)
- **Storage**: 1GB free space
- **Go**: 1.19+ (for building from source)
### Model Size Requirements
- **tiny**: ~39MB, ~125MB RAM
- **base**: ~142MB, ~210MB RAM
- **small**: ~466MB, ~550MB RAM
- **medium**: ~1.5GB, ~2GB RAM
- **large**: ~2.9GB, ~4GB RAM
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
---
Made with ❤️ by the Transcriber team