# Koe (声)


Lightning-Fast, Privacy-First Voice Dictation for Windows, iOS, and Android

[![Release](https://img.shields.io/github/v/release/JStaRFilms/Koe)](https://github.com/JStaRFilms/Koe/releases)
[![License](https://img.shields.io/badge/license-ISC-green.svg)](LICENSE)
[![Electron](https://img.shields.io/badge/Electron-40.6.1-47848F?logo=electron)](https://electronjs.org/)
[![Groq](https://img.shields.io/badge/Powered%20by-Groq-orange)](https://groq.com/)

---

## What is Koe?

**Koe** (声, Japanese for "voice") is a free, open-source alternative to subscription-based voice dictation tools. Press a hotkey (Desktop) or tap a button (Mobile), speak naturally, and get AI-polished text typed at your cursor or copied to your clipboard.

Instead of charging a monthly subscription, Koe uses your own [Groq API key](https://console.groq.com/keys) (BYOK, bring your own key). On Groq's free tier, that covers up to 8 hours of transcription per day at no cost.

### Why Koe?

| Feature | WhisperFlow ($8+/mo) | Built-in OS Dictation | **Koe (Free)** |
|---------|---------------------|----------------------|----------------|
| Cost | Subscription | Free | **Free (BYOK)** |
| Accuracy | High | Poor | **High (Whisper)** |
| AI Enhancement | Yes | No | **Yes** |
| Privacy | Cloud audio | Local | **Local VAD + BYOK** |
| Global Hotkey | Yes | Limited | **Yes** |
| Auto-Paste | Yes | No | **Yes** |

---

## Features

- **Cross-Platform** — Native performance on Windows (Desktop) and iOS/Android (Mobile)
- **Global Hotkey (Desktop)** — Press `Ctrl + Shift + Space` anywhere to start or stop dictation
- **Clipboard-First (Mobile)** — High-fidelity audio capture with instant polished results copied to your clipboard
- **Pause Naturally** — Koe keeps listening through short pauses instead of treating every breath like the end of a recording
- **Rolling Segments** — Long recordings are processed in the background as ordered chunks, so performance stays fast even on longer sessions
- **Instant Transcription** — Groq Whisper handles speech-to-text at high speed
- **AI Text Enhancement** — Each segment is refined before it is committed, so only polished text is returned
- **Auto-Type (Desktop)** — Refined text is typed progressively into the focused text field while you are still talking
- **Minimalist UI** — A premium, high-contrast interface designed for focus and speed
- **Transcription History** — One-click copy and retry for saved transcripts
- **Usage Dashboard** — Track daily audio seconds, request pressure, and queue activity

---

## Installation

### Desktop (Windows)

1. Download the latest `.exe` from [Releases](https://github.com/JStaRFilms/Koe/releases).
2. Install and launch. Koe will live in your system tray.

### Mobile (iOS & Android)

1. Clone the repo and navigate to `apps/mobile`.
2. Install [Expo Go](https://expo.dev/go) on your device.
3. Run `pnpm dev:mobile` and scan the QR code.

*Note: Native builds (`.ipa`/`.apk`) can be generated via EAS.*

### Build Everything from Source

```bash
# Clone the repository
git clone https://github.com/JStaRFilms/Koe.git
cd Koe

# Install all dependencies (Monorepo)
pnpm install

# Run Desktop
pnpm dev

# Build for production
pnpm build

# Run Mobile
pnpm dev:mobile
```

If `pnpm dev` fails with `Electron failed to install correctly`, pnpm likely skipped Electron's install script during dependency setup. This repo now allowlists the required build/install scripts for pnpm 10+, and an existing checkout can be repaired with:

```bash
pnpm rebuild electron esbuild protobufjs electron-winstaller
```

### Release Builds

- Real release artifacts should be built on GitHub Actions, not locally
- Push a matching version tag such as `v1.1.3` after updating `package.json`
- The release workflow will build Windows and macOS and attach artifacts to that GitHub Release
- See [docs/release-process.md](docs/release-process.md)

### Vercel Deployment

- The marketing website is the Next.js app in `koe-website/`
- In Vercel Project Settings -> Build and Deployment -> Root Directory, set the root to `koe-website`
- Leave the framework as Next.js for that project
- A root-level `vercel.json` is also included as a fallback so root builds target `koe-website`

### Requirements

- Windows 10/11 (for Desktop) or iOS/Android (for Mobile)
- [Groq API Key](https://console.groq.com/keys) (free tier available)
- Microphone access

---

## Quick Start

1. **Launch** Koe — it minimizes to your system tray
2. **Configure** — Right-click the tray icon → **Settings** → Enter your Groq API key
3. **Dictate** — Click any text field and press `Ctrl + Shift + Space`
4. **Speak** — The pill UI appears. Talk naturally, including pauses
5. **Done** — Press the hotkey again when you're finished. Koe finalizes the session, copies the full refined transcript, and keeps it in history

---

## Usage Guide

### Global Hotkeys

| Action | Shortcut |
|--------|----------|
| Start / Stop Recording | `Ctrl + Shift + Space` |
| Retry Last Failed / Latest Transcript | `Ctrl + Shift + ,` |
| Open Settings | Tray menu |

### How Recording Works

- Koe records one continuous session until you stop it
- Internally, it breaks longer recordings into ordered segments
- Segments are transcribed and refined in the background
- Refined text is typed in order as it becomes ready
- When the session ends, Koe copies the full final transcript to the clipboard and saves it in history
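
The ordering behaviour described above can be sketched as a small reorder buffer. This is an illustrative sketch, not Koe's actual code: the `OrderedCommitter` class, its method names, and the space-joined final transcript are all assumptions made for the example. It shows how segments that finish out of order in the background can still be typed strictly in recording order.

```typescript
// Hypothetical sketch: commit refined segments strictly in recording order,
// even when background transcription/refinement finishes out of order.
class OrderedCommitter {
  private nextIndex = 0; // index of the next segment allowed to be typed
  private readonly ready = new Map<number, string>(); // finished, not yet committed
  private readonly committed: string[] = [];
  private readonly emit: (text: string) => void;

  constructor(emit: (text: string) => void) {
    this.emit = emit;
  }

  // Called whenever a segment has been transcribed and refined.
  complete(index: number, refinedText: string): void {
    this.ready.set(index, refinedText);
    // Flush every consecutive segment that is now available.
    while (this.ready.has(this.nextIndex)) {
      const text = this.ready.get(this.nextIndex) as string;
      this.ready.delete(this.nextIndex);
      this.committed.push(text);
      this.emit(text);
      this.nextIndex += 1;
    }
  }

  // Full transcript for clipboard and history once the session ends.
  finalTranscript(): string {
    return this.committed.join(" ");
  }
}

const typed: string[] = [];
const committer = new OrderedCommitter((text) => typed.push(text));
committer.complete(1, "second segment."); // held back: segment 0 is not ready yet
committer.complete(0, "First segment,"); // releases segment 0, then segment 1
console.log(typed); // ["First segment,", "second segment."]
console.log(committer.finalTranscript()); // "First segment, second segment."
```

Holding finished segments in a map keyed by index keeps the background workers independent of each other while the typed output stays deterministic.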

### The Pill UI

The floating pill is designed to stay out of the way while still telling you what matters:

- **Idle** — Waiting for the next dictation
- **Listening** — Live voice levels and active recording state
- **Warning** — Mic fallback or chunk failure without immediately killing the session
- **Processing** — Finalizing remaining work after you stop
- **Complete** — Brief success state before hiding

### Settings

Configure via the settings window (right-click the tray icon):

| Setting | Description | Default |
|---------|-------------|---------|
| Groq API Key | Your API key from [console.groq.com](https://console.groq.com/keys) | — |
| Language | Transcription language (`auto` for detection) | `auto` |
| Prompt Style | How Koe refines the transcript | `Clean` |
| Auto-Paste | Automatically type into the focused window | `enabled` |
| Theme | Dark / Light mode | `dark` |
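
As a rough illustration of the table above, the defaults could be modelled as a typed object that stored values are merged over. The `KoeSettings` interface, its field names, and the `resolveSettings` and `canDictate` helpers are hypothetical; only the default values come from the table.

```typescript
// Hypothetical settings shape; field names are assumptions, defaults follow the table.
interface KoeSettings {
  groqApiKey: string; // empty until the user enters a key
  language: string; // "auto" or a language code
  promptStyle: string; // how the transcript is refined
  autoPaste: boolean; // type into the focused window
  theme: "dark" | "light";
}

const DEFAULT_SETTINGS: KoeSettings = {
  groqApiKey: "",
  language: "auto",
  promptStyle: "Clean",
  autoPaste: true,
  theme: "dark",
};

// Merge whatever was persisted over the defaults.
function resolveSettings(stored: Partial<KoeSettings>): KoeSettings {
  return { ...DEFAULT_SETTINGS, ...stored };
}

// Dictation needs an API key; everything else has a usable default.
function canDictate(settings: KoeSettings): boolean {
  return settings.groqApiKey.trim().length > 0;
}

// "gsk_example" is a placeholder, not a real key.
const resolved = resolveSettings({ groqApiKey: "gsk_example", theme: "light" });
console.log(resolved.language, resolved.theme, canDictate(resolved)); // auto light true
console.log(canDictate(resolveSettings({}))); // false
```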

---

## Architecture


Koe uses a shared core architecture to ensure consistency across Desktop and Mobile. Business logic lives in `@koe/core`, while platform-specific drivers handle audio and output.

See the [Detailed Architecture Guide](docs/Architecture.md) for more info.
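
One way to picture the split between shared logic and platform drivers is an interface that the core calls and each platform implements. Everything named here (`OutputDriver`, `DictationSession`, `DesktopOutput`, `MobileOutput`) is invented for illustration and is not the `@koe/core` API; the Architecture guide remains the reference for the real design.

```typescript
// Hypothetical driver contract between shared logic and a platform.
interface OutputDriver {
  onSegment(text: string): void; // called for each refined segment, in order
  onFinish(fullTranscript: string): void; // called once when the session ends
}

// Shared, platform-agnostic session logic.
class DictationSession {
  private readonly segments: string[] = [];
  private readonly output: OutputDriver;

  constructor(output: OutputDriver) {
    this.output = output;
  }

  commit(refinedText: string): void {
    this.segments.push(refinedText);
    this.output.onSegment(refinedText);
  }

  finish(): string {
    const full = this.segments.join(" ");
    this.output.onFinish(full);
    return full;
  }
}

// Desktop stand-in: records what a real driver would type as keystrokes.
class DesktopOutput implements OutputDriver {
  typed: string[] = [];
  clipboard = "";
  onSegment(text: string): void {
    this.typed.push(text);
  }
  onFinish(fullTranscript: string): void {
    this.clipboard = fullTranscript;
  }
}

// Mobile stand-in: clipboard-first, nothing is typed per segment.
class MobileOutput implements OutputDriver {
  clipboard = "";
  onSegment(_text: string): void {}
  onFinish(fullTranscript: string): void {
    this.clipboard = fullTranscript;
  }
}

const desktop = new DesktopOutput();
const mobile = new MobileOutput();
for (const output of [desktop, mobile]) {
  const session = new DictationSession(output);
  session.commit("Hello there.");
  session.commit("How are you?");
  session.finish();
}
console.log(desktop.typed.length, mobile.clipboard); // 2 "Hello there. How are you?"
```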

### Platform Specifics

| Feature | Desktop (Windows) | Mobile (iOS/Android) |
|---------|-------------------|----------------------|
| **Trigger** | Global Hotkey | Capture Button |
| **Output** | Auto-Paste / Type | Clipboard-First |
| **Storage** | `electron-store` | `SecureStore` |
| **Capture Logic** | Local VAD + ordered segments | Metering-driven chunk rotation + ordered segments |
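
To make "metering-driven chunk rotation" more concrete, here is a minimal sketch of one possible rule: close the current chunk after a long enough quiet stretch, or when it hits a maximum length. The thresholds, the `step` function, and the type names are assumptions for the example and may differ from what the mobile app does.

```typescript
// Hypothetical rotation rule driven by microphone metering (levels in dB).
interface RotationConfig {
  silenceDb: number; // levels below this count as quiet
  pauseMs: number; // quiet time needed before rotating
  minChunkMs: number; // never rotate on a pause before this
  maxChunkMs: number; // always rotate at this length
}

interface ChunkState {
  elapsedMs: number; // length of the current chunk
  quietMs: number; // length of the current quiet stretch
}

// Advance by one metering sample and decide whether to close the chunk.
function step(
  state: ChunkState,
  levelDb: number,
  deltaMs: number,
  config: RotationConfig,
): { state: ChunkState; rotate: boolean } {
  const elapsedMs = state.elapsedMs + deltaMs;
  const quietMs = levelDb < config.silenceDb ? state.quietMs + deltaMs : 0;
  const pausedLongEnough = elapsedMs >= config.minChunkMs && quietMs >= config.pauseMs;
  const rotate = pausedLongEnough || elapsedMs >= config.maxChunkMs;
  return { state: rotate ? { elapsedMs: 0, quietMs: 0 } : { elapsedMs, quietMs }, rotate };
}

const config: RotationConfig = { silenceDb: -45, pauseMs: 600, minChunkMs: 3000, maxChunkMs: 30000 };

// Index of the first sample that triggers a rotation, or -1 if none does.
function firstRotationIndex(levelsDb: number[], deltaMs: number): number {
  let state: ChunkState = { elapsedMs: 0, quietMs: 0 };
  return levelsDb.findIndex((levelDb) => {
    const result = step(state, levelDb, deltaMs, config);
    state = result.state;
    return result.rotate;
  });
}

function repeat(levelDb: number, count: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < count; i += 1) out.push(levelDb);
  return out;
}

// 3.5 s of speech, then silence: rotates once 600 ms of quiet has passed.
const longPause = firstRotationIndex([...repeat(-20, 35), ...repeat(-60, 10)], 100);
// A 300 ms breath in the middle of speech does not end the chunk.
const shortPause = firstRotationIndex([...repeat(-20, 35), ...repeat(-60, 3), ...repeat(-20, 20)], 100);
// Uninterrupted speech is cut at the maximum chunk length.
const noPause = firstRotationIndex(repeat(-20, 400), 100);
console.log(longPause, shortPause, noPause); // 40 -1 299
```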

### Privacy-First Design

1. **Desktop speech detection** runs locally using ONNX WebAssembly
2. **Mobile recording control** stays on-device until a chunk is ready to transcribe
3. **Retry audio** is stored only for failed or unresolved segments
4. **Your API key** is stored locally on each platform and only used for transcription/refinement requests

---

## Tech Stack

| Layer | Technology |
|-------|------------|
| **Framework** | Electron + Vite |
| **Frontend** | Vanilla JavaScript, Custom CSS |
| **Audio Capture** | Web Audio API |
| **Voice Detection** | `@ricky0123/vad-web` (Silero VAD) |
| **Transcription** | Groq Whisper API (`whisper-large-v3-turbo`) |
| **Text Enhancement** | Groq chat refinement pipeline |
| **Storage** | `electron-store` + temp retry files |
| **Packaging** | `electron-builder` |
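
For orientation, a transcription call to Groq's OpenAI-compatible endpoint might be assembled roughly as below. This is a sketch, not Koe's request code: the `buildTranscriptionRequest` helper and the `TranscriptionRequest` shape are hypothetical, and the choice to omit `language` when the setting is `auto` is an assumption. The audio itself would be attached as a multipart `file` field when the request is sent.

```typescript
// Hypothetical request descriptor; sending it (multipart form + audio file) is left out.
interface TranscriptionRequest {
  url: string;
  headers: Record<string, string>;
  fields: Record<string, string>; // multipart form fields, excluding the audio "file"
}

function buildTranscriptionRequest(apiKey: string, language: string): TranscriptionRequest {
  const fields: Record<string, string> = {
    model: "whisper-large-v3-turbo",
    response_format: "json",
  };
  // Assumption: "auto" means letting the model detect the language.
  if (language !== "auto") {
    fields["language"] = language;
  }
  return {
    url: "https://api.groq.com/openai/v1/audio/transcriptions",
    headers: { Authorization: "Bearer " + apiKey },
    fields,
  };
}

// "gsk_example" is a placeholder, not a real key.
const autoRequest = buildTranscriptionRequest("gsk_example", "auto");
const englishRequest = buildTranscriptionRequest("gsk_example", "en");
console.log(Object.keys(autoRequest.fields)); // ["model", "response_format"]
console.log(englishRequest.fields["language"]); // "en"
```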

---

## Groq API Limits

Koe is designed to stay inside Groq's free-tier limits:

| Metric | Limit | Approximate Usage |
|--------|-------|-------------------|
| Requests per minute | 20 | ~6 transcribed segments/minute with paced refinement |
| Requests per day | 2,000 | ~8 hours of normal dictation |
| Audio per day | 28,800 sec | 8 hours |
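
The figures in the table can be related with a little arithmetic. The sketch below is only a back-of-the-envelope check: `requestsPerSegment` is an assumption, since the table does not say whether refinement requests count against the same daily cap as transcription requests. It shows how long segments must be on average for the request cap not to run out before the audio cap.

```typescript
// Limits taken from the table above.
const AUDIO_SECONDS_PER_DAY = 28800;
const REQUESTS_PER_DAY = 2000;

// How many segments fit in the daily request budget.
function segmentsPerDay(requestsPerSegment: number): number {
  return Math.floor(REQUESTS_PER_DAY / requestsPerSegment);
}

// Average segment length needed to use the whole audio budget
// before the request budget is exhausted.
function minAverageSegmentSeconds(requestsPerSegment: number): number {
  return AUDIO_SECONDS_PER_DAY / segmentsPerDay(requestsPerSegment);
}

console.log(AUDIO_SECONDS_PER_DAY / 3600); // 8 (hours of audio per day)
console.log(minAverageSegmentSeconds(1)); // 14.4 (one request per segment)
console.log(minAverageSegmentSeconds(2)); // 28.8 (transcription + refinement on one cap)
```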

The built-in scheduler tracks request pressure and keeps the app responsive while staying inside the cap.
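
A scheduler of this kind is often built on a sliding-window counter. The following is a generic sketch of that technique, assuming the 20 requests-per-minute cap from the table; the `RequestWindow` class and its method names are hypothetical and are not Koe's scheduler.

```typescript
// Hypothetical sliding-window limiter for a requests-per-minute cap.
type AcquireResult = { ok: true } | { ok: false; retryInMs: number };

class RequestWindow {
  private timestamps: number[] = []; // send times still inside the window
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Drop timestamps that have left the window.
  private prune(nowMs: number): void {
    const cutoff = nowMs - this.windowMs;
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
  }

  // Fraction of the window already used: 0 is idle, 1 is at the cap.
  pressure(nowMs: number): number {
    this.prune(nowMs);
    return this.timestamps.length / this.limit;
  }

  // Record a request if there is room; otherwise report how long to wait.
  tryAcquire(nowMs: number): AcquireResult {
    this.prune(nowMs);
    if (this.timestamps.length < this.limit) {
      this.timestamps.push(nowMs);
      return { ok: true };
    }
    const oldest = this.timestamps[0] ?? nowMs;
    return { ok: false, retryInMs: oldest + this.windowMs - nowMs };
  }
}

const limiter = new RequestWindow(20, 60000);
for (let second = 0; second < 20; second += 1) {
  limiter.tryAcquire(second * 1000); // one request per second fills the window
}
const denied = limiter.tryAcquire(20000); // 21st request inside the same minute
const pressureAtCap = limiter.pressure(20000);
const later = limiter.tryAcquire(60000); // the first request has aged out
console.log(denied, pressureAtCap, later.ok); // { ok: false, retryInMs: 40000 } 1 true
```

Passing the current time in explicitly keeps the limiter deterministic and easy to test; a real caller would pass `Date.now()`.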

---

## Roadmap

### Completed
- [x] Global hotkey toggle (Desktop)
- [x] Local VAD speech detection
- [x] Groq Whisper transcription
- [x] AI transcript refinement
- [x] Auto-paste to focused window (Desktop)
- [x] Transcription history & Usage dashboard
- [x] Mobile App (iOS/Android V1)
- [x] Shared Core Extraction

### Planned
- [x] Custom AI prompts
- [x] Keyboard shortcut customization
- [ ] Export history as `.txt` / `.md`
- [x] Native macOS support (Electron)
- [ ] Android IME (Custom Keyboard) implementation

### Future
- [ ] Snippet library with voice shortcuts
- [ ] App-specific tone profiles
- [ ] Cloud sync across devices
- [ ] Team collaboration features

See [Feature Requests](docs/issues/) for the full backlog.

---

## Contributing

Contributions are welcome. Please see the repo docs and existing code patterns before opening a PR.

### Development Setup

```bash
# Fork and clone
git clone https://github.com/your-username/Koe.git
cd Koe

# Install dependencies
pnpm install

# Start development
pnpm dev
```

### Monorepo Structure

Koe is transitioning to a monorepo to support multiple platforms:

- **Root**: Legacy Electron Desktop app and shared workspace configuration
- **`apps/mobile`**: Expo-based mobile client (iOS/Android)
- **`packages/koe-core`**: Shared business logic, types, and API services

### Development Commands

| Target | Command | Description |
|--------|---------|-------------|
| **Desktop** | `pnpm dev` | Start the Electron app in dev mode |
| **Mobile** | `pnpm dev:mobile` | Start the Expo development server |
| **Core** | `pnpm build:core` | Build the shared logic package |
| **All** | `pnpm type-check` | Run type-checking across all packages |

### Project Structure

```text
Koe/
├── apps/                   # Application projects
│   └── mobile/             # Expo mobile app
├── packages/               # Shared logic
│   └── koe-core/           # Core services (Whisper, Sessions)
├── src/                    # Legacy Desktop source
│   ├── main/               # Electron main process
│   └── renderer/           # UI code
├── docs/                   # Documentation & Tasks
├── pnpm-workspace.yaml     # Workspace config
└── package.json            # Root manifest & scripts
```

---

## Troubleshooting

### "No audio detected"
- Ensure microphone permissions are granted in Windows Settings
- Check that your default recording device is selected
- If another app is holding the mic, Koe will try another available input and warn you in the pill UI

### "API rate limit exceeded"
- Wait for the per-minute window to clear
- Check the usage dashboard for queue pressure
- Very long continuous dictation can still pile up requests on the free tier
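
For anyone scripting against the API directly, waiting out a rate limit usually means honouring the server's hint and otherwise backing off. The helper below is a generic sketch, assuming the API answers with HTTP 429 and may include a `Retry-After` header in seconds; the `retryDelayMs` name, the 1-second base, and the 30-second cap are choices made for the example, not Koe's behaviour.

```typescript
// Hypothetical helper: how long to wait before retrying a rate-limited request.
function retryDelayMs(attempt: number, retryAfterHeader: string | null): number {
  // Prefer the server's hint when it is a plain number of seconds.
  if (retryAfterHeader !== null && retryAfterHeader.trim() !== "") {
    const seconds = Number(retryAfterHeader);
    if (Number.isFinite(seconds) && seconds >= 0) {
      return Math.ceil(seconds * 1000);
    }
  }
  // Otherwise fall back to capped exponential backoff: 1 s, 2 s, 4 s, ...
  const baseMs = 1000;
  const capMs = 30000;
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

console.log(retryDelayMs(0, "2")); // 2000 (server hint wins)
console.log(retryDelayMs(3, null)); // 8000 (1 s doubled three times)
console.log(retryDelayMs(10, null)); // 30000 (capped)
console.log(retryDelayMs(1, "soon")); // 2000 (unparseable hint, backoff used)
```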

### "Auto-paste not working"
- Some applications block simulated keystrokes
- Disable auto-paste in Settings and use `Ctrl + V` manually
- Run Koe as administrator if the issue persists

### App won't launch
- Ensure you're on Windows 10/11 64-bit
- Check that [Visual C++ Redistributables](https://aka.ms/vs/17/release/vc_redist.x64.exe) are installed
- Check Windows Event Viewer for crash details

---

## Acknowledgments

- **Groq** — For the fast Whisper and chat APIs
- **Silero** — For the VAD model
- **@ricky0123** — For the `vad-web` library
- **WhisperFlow** — For helping prove the category exists

---

## License

Koe is licensed under the ISC License. See the [LICENSE](LICENSE) file for details.

---


Built with ❤️ by J StaR Films Studios


Star us on GitHub if you find Koe useful.