https://github.com/app-vox/vox

Your privacy-first voice-to-text tool. Local Whisper transcription with optional LLM correction so your audio never leaves your computer.
https://github.com/app-vox/vox

dictation electron llm macos menu-bar-app privacy productivity react speech-recognition typescript voice-to-text whisper

Last synced: 3 months ago
JSON representation

Your privacy-first voice-to-text tool. Local Whisper transcription with optional LLM correction so your audio never leaves your computer.

Host: GitHub
URL: https://github.com/app-vox/vox
Owner: app-vox
License: mit
Created: 2026-02-08T17:36:21.000Z (4 months ago)
Default Branch: main
Last Pushed: 2026-02-15T20:31:29.000Z (4 months ago)
Last Synced: 2026-02-15T20:47:19.071Z (4 months ago)
Topics: dictation, electron, llm, macos, menu-bar-app, privacy, productivity, react, speech-recognition, typescript, voice-to-text, whisper
Language: TypeScript
Homepage: https://app-vox.github.io/vox
Size: 13.1 MB
Stars: 12
Watchers: 1
Forks: 2
Open Issues: 47
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Codeowners: .github/CODEOWNERS
- Security: SECURITY.md

Awesome Lists containing this project

README

          # Vox

Open-source voice-to-text app with local Whisper transcription and AI-powered correction.

[![Website](https://img.shields.io/badge/Website-usevox.app-6366f1?style=flat&logo=safari&logoColor=white)](https://usevox.app/)

[![CI](https://github.com/app-vox/vox/actions/workflows/ci.yml/badge.svg)](https://github.com/app-vox/vox/actions/workflows/ci.yml)

[![Release](https://github.com/app-vox/vox/actions/workflows/release.yml/badge.svg)](https://github.com/app-vox/vox/actions/workflows/release.yml)

[![codecov](https://codecov.io/gh/app-vox/vox/graph/badge.svg)](https://codecov.io/gh/app-vox/vox)

[![License: FSL-1.1-ALv2](https://img.shields.io/badge/License-FSL--1.1--ALv2-blue.svg)](LICENSE)

[![macOS](https://img.shields.io/badge/macOS-12.0+-000000?logo=apple&logoColor=white)](https://www.apple.com/macos/)

[![Buy Me a Coffee](https://img.shields.io/badge/Buy%20Me%20a%20Coffee-ffdd00?logo=buy-me-a-coffee&logoColor=black)](https://buymeacoffee.com/rodrigoluizs)

Hold a keyboard shortcut, speak, and Vox transcribes your voice locally using [whisper.cpp](https://github.com/ggerganov/whisper.cpp), optionally corrects it with AI, and pastes the text into your active app.

## Demo



![Vox Demo](docs/images/demo.gif)



> ⚠️ **Platform Support**

> Vox currently runs on **macOS** (Apple Silicon and Intel). Cross-platform support for Windows and Linux is planned for future releases.

## Table of Contents

- [Quick Start](#quick-start)

- [Features](#features)

- [Use Cases](#use-cases)

- [How Vox Compares](#how-vox-compares)

- [Requirements](#requirements)

- [Configuration](#configuration)

- [Usage](#usage)

- [FAQ](#faq)

- [Development](#development)

- [Contributing](#contributing)

- [License](#license)

## Quick Start

Download the latest version from the [releases page](https://github.com/app-vox/vox/releases/latest) and drag `Vox.app` to your Applications folder.

## First Launch

When you first launch Vox, you'll need to:

1. **Download a Whisper Model** — Go to Settings > Local Model and download at least one speech recognition model. The "small" model (Recommended) is a good starting point.

2. **Grant Permissions** — Vox needs:

   - **Microphone**: Required for voice recording

   - **Accessibility**: Required for keyboard shortcuts and auto-paste

3. **Configure Shortcuts** (optional) — Customize keyboard shortcuts in Settings > Shortcuts

4. **Enable AI Improvements** (optional) — Configure LLM provider in Settings > AI Improvements

Vox will guide you through this setup process with visual indicators showing what's incomplete.

Once configured, hold `Alt+Space` to start recording.

## Features

- **🔒 100% Local transcription** — Powered by whisper.cpp, audio stays on your device

- **🤖 AI correction** — Removes filler words and fixes grammar (optional)

- **⚙️ Custom prompts** — Tailor corrections for medical, technical, creative, or any workflow

- **⌨️ Hold or toggle modes** — Press-and-hold or toggle recording on/off

- **📋 Auto-paste** — Text appears in your focused app via Cmd+V

- **🎯 Multiple models** — Choose speed vs accuracy (tiny to large)

- **☁️ Multiple LLM providers** — OpenAI-compatible or AWS Bedrock

- **🎨 Menu bar app** — Runs quietly in the background with dark/light mode support

## Use Cases

### 👨‍⚕️ Medical Professionals

Preserve medical terminology and standard abbreviations. Vox understands context and won't autocorrect "OA" to "okay" or "PT" to "patient."

**Example custom prompt:**

> "Preserve medical terminology, standard abbreviations (e.g., OA, PT, BP), and format as clinical notes."

### 👨‍💻 Developers & Engineers

Format technical dictation as concise documentation. Remove filler words while keeping technical terms intact.

**Example custom prompt:**

> "Format as technical documentation. Be concise, remove filler words, preserve code terms and abbreviations."

### ✍️ Writers & Content Creators

Enhance prose while maintaining your unique voice. Turn spoken ideas into polished text ready for editing.

**Example custom prompt:**

> "Enhance prose for readability while maintaining the author's voice. Fix grammar but keep the casual tone."

### 🌍 Language Learners

Practice speaking by translating and correcting your speech in real-time.

**Example custom prompt:**

> "Translate to German and correct grammar. Output only the German translation."

### 📝 Note-Taking & Productivity

Capture thoughts quickly without typing. Perfect for meetings, brainstorming, or journaling.

## How Vox Compares

| Feature | Vox | Dragon NaturallySpeaking | macOS Dictation | Whisper Desktop Apps |

|---------|-----|--------------------------|-----------------|---------------------|

| **Price** | Free & Open Source | $300+ | Free (limited) | Varies ($0-50) |

| **Privacy** | 100% Local | Cloud-based | Cloud-based | Mostly local |

| **Custom Prompts** | ✅ Full control | ❌ Limited | ❌ None | ⚠️ Some apps |

| **AI Enhancement** | ✅ Your own API | ❌ None | ⚠️ Basic | ⚠️ Varies |

| **Offline Mode** | ✅ Full | ⚠️ Limited | ❌ Requires internet | ✅ Most |

| **macOS Native** | ✅ Menu bar app | ⚠️ Full app | ✅ Built-in | ✅ Varies |

| **Custom Shortcuts** | ✅ Configurable | ✅ Yes | ⚠️ Limited | ✅ Most |

| **Open Source** | ✅ FSL-1.1-ALv2 | ❌ Proprietary | ❌ Proprietary | ⚠️ Some |

**Why Vox?**

- **Privacy-first**: Your audio never leaves your device

- **Flexibility**: Use any OpenAI-compatible LLM or AWS Bedrock

- **Customization**: Tailor AI corrections to your exact needs

- **Free & Open**: No subscription, no cloud lock-in

## Requirements

- **macOS** (Apple Silicon or Intel)

- **LLM provider** (optional) — for text correction:

  - OpenAI-compatible endpoint with API key

  - Or AWS Bedrock credentials with model access

## Configuration

### Whisper Models

Download at least one model from the Whisper tab:

| Model  | Size    | Speed  | Accuracy |

|--------|---------|--------|----------|

| tiny   | ~75 MB  | Fastest| Lower    |

| base   | ~140 MB | Fast   | Decent   |

| small  | ~460 MB | Good   | Good     |

| medium | ~1.5 GB | Slow   | Better   |

| large  | ~3 GB   | Slowest| Best     |

### LLM Provider

**Foundry (OpenAI-compatible)**

- Endpoint URL

- API key

- Model name (e.g., `gpt-4o`)

**AWS Bedrock**

- AWS region

- Credentials (access key, profile, or default chain)

- Model ID (e.g., `anthropic.claude-3-5-sonnet-20241022-v2:0`)

### Shortcuts

Customize keyboard shortcuts in the Shortcuts tab:

- **Hold mode** (default: `Alt+Space`)

- **Toggle mode** (default: `Alt+Shift+Space`)

## Usage

Once configured, Vox runs as a menu bar icon.

Press your shortcut to record. The floating indicator shows:

- **Red** — Recording

- **Yellow** — Transcribing

- **Blue** — Correcting (if LLM enabled)

Release (hold mode) or press again (toggle mode) to stop. Text is pasted automatically.

If correction fails, raw transcription is used. If transcription is empty (silence/noise), nothing is pasted.

## Development

### Setup

> Requires [cmake](https://cmake.org/download).

```bash

git clone https://github.com/app-vox/vox.git

cd vox

make install   # installs npm deps + builds whisper.cpp

```

### Run

```bash

make dev        # development with hot reload

npm test        # run tests

npm run dist    # build production app

```

Built with Electron, React, TypeScript, and whisper.cpp.

## Contributing

Contributions welcome! To contribute:

1. Fork and create a feature branch

2. Make your changes

3. Run `npm run typecheck && npm run lint && npm test`

4. Commit with [Conventional Commits](https://www.conventionalcommits.org/) (e.g., `feat(audio): add noise gate`)

5. Open a pull request

⚠️ See more details in [CONTRIBUTING.md](CONTRIBUTING.md).

## FAQ

### Is Vox really free?

Yes, Vox is 100% free and open-source. Transcription runs locally using Whisper.cpp. If you use optional AI enhancement, you'll need your own API keys (OpenAI-compatible or AWS Bedrock), but there are no fees from Vox.

### Does my audio leave my device?

No. Transcription happens entirely on your Mac. Only if you enable AI enhancement does the *text* (not audio) get sent to your configured LLM provider for correction. Your audio recordings never leave your device.

### What's the difference between local transcription and AI enhancement?

- **Local transcription**: Whisper.cpp converts your speech to text on your device. Fast, accurate, 100% private.

- **AI enhancement** (optional): Sends the transcribed *text* to an LLM to remove filler words ("um", "uh"), fix grammar, or apply custom corrections based on your prompt.

### Which Whisper model should I use?

- **Small** (~460MB): Best balance of speed and accuracy. Recommended for most users.

- **Tiny/Base**: Faster but less accurate. Good for quick notes.

- **Medium/Large**: Slower but more accurate. Good for technical/medical content or noisy environments.

You can switch models anytime in Settings.

### Can I use Vox with Claude/ChatGPT/other LLMs?

Yes! Vox works with:

- **OpenAI-compatible APIs**: OpenAI, Anthropic (via Bedrock), OpenRouter, local LLMs with OpenAI-compatible endpoints

- **AWS Bedrock**: Claude, Llama, Mistral, and other Bedrock models

### Does Vox work offline?

Yes. Local transcription works 100% offline. AI enhancement requires internet (since it calls your LLM provider API), but you can disable it and use raw transcription offline.

### Why does Vox need Accessibility permissions?

Vox needs Accessibility access to:

1. Listen for your custom keyboard shortcuts globally

2. Simulate `Cmd+V` to paste transcribed text into your active app

Without this, Vox can't detect shortcuts or auto-paste text.

### Can I contribute to Vox?

Absolutely! Vox is open-source. See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines. We welcome bug reports, feature requests, and pull requests.

### What about Windows and Linux?

macOS-only for now, but cross-platform support is planned. Follow the repo for updates!

## License

This project is licensed under the [Functional Source License, Version 1.1, ALv2 Future License](LICENSE).

You can use, modify, and redistribute the code for any purpose **except** building a competing commercial product or service. After two years, each release automatically converts to the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).

See [LICENSE](LICENSE) for full details.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/app-vox/vox

Awesome Lists containing this project

README