https://github.com/app-vox/vox
Your privacy-first voice-to-text tool. Local Whisper transcription with optional LLM correction so your audio never leaves your computer.
https://github.com/app-vox/vox
dictation electron llm macos menu-bar-app privacy productivity react speech-recognition typescript voice-to-text whisper
Last synced: 3 months ago
JSON representation
Your privacy-first voice-to-text tool. Local Whisper transcription with optional LLM correction so your audio never leaves your computer.
- Host: GitHub
- URL: https://github.com/app-vox/vox
- Owner: app-vox
- License: mit
- Created: 2026-02-08T17:36:21.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-02-15T20:31:29.000Z (4 months ago)
- Last Synced: 2026-02-15T20:47:19.071Z (4 months ago)
- Topics: dictation, electron, llm, macos, menu-bar-app, privacy, productivity, react, speech-recognition, typescript, voice-to-text, whisper
- Language: TypeScript
- Homepage: https://app-vox.github.io/vox
- Size: 13.1 MB
- Stars: 12
- Watchers: 1
- Forks: 2
- Open Issues: 47
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Codeowners: .github/CODEOWNERS
- Security: SECURITY.md
Awesome Lists containing this project
README
# Vox
Open-source voice-to-text app with local Whisper transcription and AI-powered correction.
[](https://usevox.app/)
[](https://github.com/app-vox/vox/actions/workflows/ci.yml)
[](https://github.com/app-vox/vox/actions/workflows/release.yml)
[](https://codecov.io/gh/app-vox/vox)
[](LICENSE)
[](https://www.apple.com/macos/)
[](https://buymeacoffee.com/rodrigoluizs)
Hold a keyboard shortcut, speak, and Vox transcribes your voice locally using [whisper.cpp](https://github.com/ggerganov/whisper.cpp), optionally corrects it with AI, and pastes the text into your active app.
## Demo

> ⚠️ **Platform Support**
> Vox currently runs on **macOS** (Apple Silicon and Intel). Cross-platform support for Windows and Linux is planned for future releases.
## Table of Contents
- [Quick Start](#quick-start)
- [Features](#features)
- [Use Cases](#use-cases)
- [How Vox Compares](#how-vox-compares)
- [Requirements](#requirements)
- [Configuration](#configuration)
- [Usage](#usage)
- [FAQ](#faq)
- [Development](#development)
- [Contributing](#contributing)
- [License](#license)
## Quick Start
Download the latest version from the [releases page](https://github.com/app-vox/vox/releases/latest) and drag `Vox.app` to your Applications folder.
## First Launch
When you first launch Vox, you'll need to:
1. **Download a Whisper Model** — Go to Settings > Local Model and download at least one speech recognition model. The "small" model (Recommended) is a good starting point.
2. **Grant Permissions** — Vox needs:
- **Microphone**: Required for voice recording
- **Accessibility**: Required for keyboard shortcuts and auto-paste
3. **Configure Shortcuts** (optional) — Customize keyboard shortcuts in Settings > Shortcuts
4. **Enable AI Improvements** (optional) — Configure LLM provider in Settings > AI Improvements
Vox will guide you through this setup process with visual indicators showing what's incomplete.
Once configured, hold `Alt+Space` to start recording.
## Features
- **🔒 100% Local transcription** — Powered by whisper.cpp, audio stays on your device
- **🤖 AI correction** — Removes filler words and fixes grammar (optional)
- **⚙️ Custom prompts** — Tailor corrections for medical, technical, creative, or any workflow
- **⌨️ Hold or toggle modes** — Press-and-hold or toggle recording on/off
- **📋 Auto-paste** — Text appears in your focused app via Cmd+V
- **🎯 Multiple models** — Choose speed vs accuracy (tiny to large)
- **☁️ Multiple LLM providers** — OpenAI-compatible or AWS Bedrock
- **🎨 Menu bar app** — Runs quietly in the background with dark/light mode support
## Use Cases
### 👨⚕️ Medical Professionals
Preserve medical terminology and standard abbreviations. Vox understands context and won't autocorrect "OA" to "okay" or "PT" to "patient."
**Example custom prompt:**
> "Preserve medical terminology, standard abbreviations (e.g., OA, PT, BP), and format as clinical notes."
### 👨💻 Developers & Engineers
Format technical dictation as concise documentation. Remove filler words while keeping technical terms intact.
**Example custom prompt:**
> "Format as technical documentation. Be concise, remove filler words, preserve code terms and abbreviations."
### ✍️ Writers & Content Creators
Enhance prose while maintaining your unique voice. Turn spoken ideas into polished text ready for editing.
**Example custom prompt:**
> "Enhance prose for readability while maintaining the author's voice. Fix grammar but keep the casual tone."
### 🌍 Language Learners
Practice speaking by translating and correcting your speech in real-time.
**Example custom prompt:**
> "Translate to German and correct grammar. Output only the German translation."
### 📝 Note-Taking & Productivity
Capture thoughts quickly without typing. Perfect for meetings, brainstorming, or journaling.
## How Vox Compares
| Feature | Vox | Dragon NaturallySpeaking | macOS Dictation | Whisper Desktop Apps |
|---------|-----|--------------------------|-----------------|---------------------|
| **Price** | Free & Open Source | $300+ | Free (limited) | Varies ($0-50) |
| **Privacy** | 100% Local | Cloud-based | Cloud-based | Mostly local |
| **Custom Prompts** | ✅ Full control | ❌ Limited | ❌ None | ⚠️ Some apps |
| **AI Enhancement** | ✅ Your own API | ❌ None | ⚠️ Basic | ⚠️ Varies |
| **Offline Mode** | ✅ Full | ⚠️ Limited | ❌ Requires internet | ✅ Most |
| **macOS Native** | ✅ Menu bar app | ⚠️ Full app | ✅ Built-in | ✅ Varies |
| **Custom Shortcuts** | ✅ Configurable | ✅ Yes | ⚠️ Limited | ✅ Most |
| **Open Source** | ✅ FSL-1.1-ALv2 | ❌ Proprietary | ❌ Proprietary | ⚠️ Some |
**Why Vox?**
- **Privacy-first**: Your audio never leaves your device
- **Flexibility**: Use any OpenAI-compatible LLM or AWS Bedrock
- **Customization**: Tailor AI corrections to your exact needs
- **Free & Open**: No subscription, no cloud lock-in
## Requirements
- **macOS** (Apple Silicon or Intel)
- **LLM provider** (optional) — for text correction:
- OpenAI-compatible endpoint with API key
- Or AWS Bedrock credentials with model access
## Configuration
### Whisper Models
Download at least one model from the Whisper tab:
| Model | Size | Speed | Accuracy |
|--------|---------|--------|----------|
| tiny | ~75 MB | Fastest| Lower |
| base | ~140 MB | Fast | Decent |
| small | ~460 MB | Good | Good |
| medium | ~1.5 GB | Slow | Better |
| large | ~3 GB | Slowest| Best |
### LLM Provider
**Foundry (OpenAI-compatible)**
- Endpoint URL
- API key
- Model name (e.g., `gpt-4o`)
**AWS Bedrock**
- AWS region
- Credentials (access key, profile, or default chain)
- Model ID (e.g., `anthropic.claude-3-5-sonnet-20241022-v2:0`)
### Shortcuts
Customize keyboard shortcuts in the Shortcuts tab:
- **Hold mode** (default: `Alt+Space`)
- **Toggle mode** (default: `Alt+Shift+Space`)
## Usage
Once configured, Vox runs as a menu bar icon.
Press your shortcut to record. The floating indicator shows:
- **Red** — Recording
- **Yellow** — Transcribing
- **Blue** — Correcting (if LLM enabled)
Release (hold mode) or press again (toggle mode) to stop. Text is pasted automatically.
If correction fails, raw transcription is used. If transcription is empty (silence/noise), nothing is pasted.
## Development
### Setup
> Requires [cmake](https://cmake.org/download).
```bash
git clone https://github.com/app-vox/vox.git
cd vox
make install # installs npm deps + builds whisper.cpp
```
### Run
```bash
make dev # development with hot reload
npm test # run tests
npm run dist # build production app
```
Built with Electron, React, TypeScript, and whisper.cpp.
## Contributing
Contributions welcome! To contribute:
1. Fork and create a feature branch
2. Make your changes
3. Run `npm run typecheck && npm run lint && npm test`
4. Commit with [Conventional Commits](https://www.conventionalcommits.org/) (e.g., `feat(audio): add noise gate`)
5. Open a pull request
⚠️ See more details in [CONTRIBUTING.md](CONTRIBUTING.md).
## FAQ
### Is Vox really free?
Yes, Vox is 100% free and open-source. Transcription runs locally using Whisper.cpp. If you use optional AI enhancement, you'll need your own API keys (OpenAI-compatible or AWS Bedrock), but there are no fees from Vox.
### Does my audio leave my device?
No. Transcription happens entirely on your Mac. Only if you enable AI enhancement does the *text* (not audio) get sent to your configured LLM provider for correction. Your audio recordings never leave your device.
### What's the difference between local transcription and AI enhancement?
- **Local transcription**: Whisper.cpp converts your speech to text on your device. Fast, accurate, 100% private.
- **AI enhancement** (optional): Sends the transcribed *text* to an LLM to remove filler words ("um", "uh"), fix grammar, or apply custom corrections based on your prompt.
### Which Whisper model should I use?
- **Small** (~460MB): Best balance of speed and accuracy. Recommended for most users.
- **Tiny/Base**: Faster but less accurate. Good for quick notes.
- **Medium/Large**: Slower but more accurate. Good for technical/medical content or noisy environments.
You can switch models anytime in Settings.
### Can I use Vox with Claude/ChatGPT/other LLMs?
Yes! Vox works with:
- **OpenAI-compatible APIs**: OpenAI, Anthropic (via Bedrock), OpenRouter, local LLMs with OpenAI-compatible endpoints
- **AWS Bedrock**: Claude, Llama, Mistral, and other Bedrock models
### Does Vox work offline?
Yes. Local transcription works 100% offline. AI enhancement requires internet (since it calls your LLM provider API), but you can disable it and use raw transcription offline.
### Why does Vox need Accessibility permissions?
Vox needs Accessibility access to:
1. Listen for your custom keyboard shortcuts globally
2. Simulate `Cmd+V` to paste transcribed text into your active app
Without this, Vox can't detect shortcuts or auto-paste text.
### Can I contribute to Vox?
Absolutely! Vox is open-source. See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines. We welcome bug reports, feature requests, and pull requests.
### What about Windows and Linux?
macOS-only for now, but cross-platform support is planned. Follow the repo for updates!
## License
This project is licensed under the [Functional Source License, Version 1.1, ALv2 Future License](LICENSE).
You can use, modify, and redistribute the code for any purpose **except** building a competing commercial product or service. After two years, each release automatically converts to the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
See [LICENSE](LICENSE) for full details.