https://github.com/jackdoe/speak

push to talk asr with whisper.cpp on metal
https://github.com/jackdoe/speak

Last synced: about 1 month ago
JSON representation

push to talk asr with whisper.cpp on metal

Host: GitHub
URL: https://github.com/jackdoe/speak
Owner: jackdoe
Created: 2026-02-07T12:36:57.000Z (4 months ago)
Default Branch: master
Last Pushed: 2026-02-07T14:37:57.000Z (4 months ago)
Last Synced: 2026-02-07T23:17:12.485Z (4 months ago)
Language: Swift
Size: 73.2 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

          # Speak

Push-to-talk transcription for macOS. Hold a key, speak, release — your words appear wherever your cursor is.

Uses [whisper.cpp](https://github.com/ggml-org/whisper.cpp) with Metal GPU acceleration on Apple Silicon.

## Features

- **Push-to-talk** — F12 to transcribe, F11 to transcribe + send (hits Return)

- **Menu bar app** — lives in the status bar, no dock icon

- **Metal accelerated** — runs whisper.cpp on the GPU

- **Voice activity detection** — filters silence, supports 10+ minute recordings with pauses

- **Recording overlay** — floating indicator with VAD state, duration, and audio level

- **Model management** — download models from HuggingFace directly in the app

- **Configurable** — hotkeys, output mode (type/paste), VAD thresholds, all whisper.cpp parameters

- **Instant start** — optional always-on microphone for zero-latency recording

## Requirements

- macOS 14+

- Apple Silicon (M1/M2/M3/M4)

- Xcode Command Line Tools

- CMake (`brew install cmake`)

## Build

```bash

git clone --recursive https://github.com/user/speak.git

cd speak

make app

```

This builds whisper.cpp with Metal, compiles the Swift app, and creates `Speak.app`.

To install to `/Applications`:

```bash

make install

```

## First Launch

The app guides you through setup:

1. **Microphone permission** — for audio capture

2. **Accessibility permission** — for global hotkeys and text output

3. **Model download** — pick a whisper model (Large V3 Turbo Q5 recommended)

4. **Preferences** — configure hotkeys and behavior

## Usage

- **F12** (hold) — record speech, (release) — transcribe and output text

- **F11** (hold) — same as F12 but also presses Return after (for chat apps)

- **Menu bar icon** — switch models, toggle mic warm, open settings

## Make Targets

| Command | Description |

|---------|-------------|

| `make whisper` | Build whisper.cpp with Metal |

| `make build` | Build whisper.cpp + release binary |

| `make debug` | Build debug binary |

| `make run` | Build debug + run |

| `make app` | Create Speak.app bundle |

| `make install` | Copy Speak.app to /Applications |

| `make uninstall` | Remove from /Applications |

| `make clean` | Remove build artifacts |

| `make models` | Download base.en + medium.en + large-v3 models |

| `make models-large-turbo-q5` | Download large-v3-turbo-q5 (recommended) |

## Project Structure

```

Sources/Speak/

├── App/              HotkeyManager, StatusBarController, SpeakApp

├── Audio/            AudioEngine, RingBuffer, VoiceActivityDetector

├── Whisper/          WhisperContext, ModelManager, ModelDownloader, WhisperSettings

├── Pipeline/         TranscriptionPipeline

├── UI/               SettingsView, RecordingOverlay, OnboardingView

├── Utils/            TextOutput, PerformanceMonitor, LoginItemManager

└── Benchmark/        BenchmarkRunner, BenchmarkView

```

## Models

Models are stored in `~/Library/Application Support/Speak/models/`. Download via the app settings or CLI:

| Model | Size | Speed (5s audio on M1) |

|-------|------|----------------------|

| tiny.en | 75 MB | ~0.15s |

| base.en | 142 MB | ~0.3s |

| small.en | 466 MB | ~0.8s |

| large-v3-turbo-q5 | 547 MB | ~0.5s |

| medium.en | 1.5 GB | ~2.5s |

| large-v3 | 2.9 GB | ~5s |

## Built by

Written by Claude Opus 4.6 in team mode — architecture, research, and implementation done by a coordinated team of specialist agents (ML engineer, audio engineer, UI engineer) working in parallel.

## License

whisper.cpp is MIT licensed. This project wraps it with a native macOS interface.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/jackdoe/speak

Awesome Lists containing this project

README