https://github.com/georgemandis/stenographer
Speech-to-text CLI powered by native macOS Speech Recognition. Transcribe audio files or live mic input across 63 languages, on-device.
- Host: GitHub
- URL: https://github.com/georgemandis/stenographer
- Owner: georgemandis
- Created: 2026-05-29T00:09:09.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-06-08T21:37:19.000Z (26 days ago)
- Last Synced: 2026-06-16T12:33:31.983Z (19 days ago)
- Topics: cli, command-line-tool, macos, speech-recognition, speech-to-text, transcription
- Language: Zig
- Homepage:
- Size: 26.4 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# stenographer
Speech-to-text from the command line, powered by native macOS Speech Recognition.
Transcribe audio files or live microphone input into text. Supports 63 languages, on-device recognition (no network required), and partial results for real-time transcription. No API keys and no downloads: it uses Apple's built-in speech recognizer.
Written in Zig. Uses Apple's Speech and AVFoundation frameworks via Objective-C runtime bindings.
## Install
### Homebrew
```bash
brew install georgemandis/tap/stenographer
```
### From source
Requires [Zig 0.16+](https://ziglang.org/download/) and macOS.
```bash
git clone https://github.com/georgemandis/stenographer.git
cd stenographer
zig build -Doptimize=ReleaseFast
```
## Usage
### Transcribe an audio file
```bash
$ stenographer transcribe recording.wav
Hello world this is a test of the sound classification system
$ stenographer transcribe interview.mp3 --json
{"text":"Hello world this is a test of the sound classification system"}
# Force on-device recognition (no network, lower latency)
$ stenographer transcribe podcast.m4a --on-device
```
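To transcribe several files in one go, the `transcribe` command can be wrapped in a small shell loop. The sketch below is not part of stenographer: `transcript_path` and `transcribe_all` are hypothetical helper names, and it assumes every input path ends in a file extension.

```bash
# Hypothetical helper (not part of stenographer): swap the final extension
# of an audio path for .txt, e.g. talks/meeting.wav -> talks/meeting.txt.
# Assumes the file name contains an extension.
transcript_path() {
  printf '%s.txt\n' "${1%.*}"
}

# Hypothetical helper: transcribe each argument and write the text next to
# the source file. Uses only the transcribe command and the --on-device
# flag documented in this README.
transcribe_all() {
  for audio in "$@"; do
    stenographer transcribe "$audio" --on-device > "$(transcript_path "$audio")"
  done
}
```

With these defined, `transcribe_all recordings/*.wav` would leave a `.txt` transcript beside each recording.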
### Transcribe from the microphone
```bash
$ stenographer listen
Listening for 10.0s...
The quick brown fox jumps over the lazy dog
$ stenographer listen --duration=5000
Listening for 5.0s...
Hello world
$ stenographer listen --duration=30000 --json
Listening for 30.0s...
{"text":"This is a longer recording with multiple sentences"}
```
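Because `--duration` takes milliseconds, a thin wrapper can make whole-second durations easier to type. This is only a sketch: `seconds_to_ms` and `listen_for` are hypothetical names rather than stenographer commands, and the arithmetic handles whole seconds only.

```bash
# Hypothetical helper: convert whole seconds to the milliseconds that
# --duration expects (5 -> 5000).
seconds_to_ms() {
  printf '%s\n' "$(( $1 * 1000 ))"
}

# Hypothetical wrapper: the first argument is the duration in seconds;
# any remaining arguments (for example --json) are passed through to
# `stenographer listen` unchanged.
listen_for() {
  secs=$1
  shift
  stenographer listen --duration="$(seconds_to_ms "$secs")" "$@"
}
```

Under these assumptions, `listen_for 30 --json` runs the same command as `stenographer listen --duration=30000 --json`.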
### Multi-language support
```bash
$ stenographer transcribe french_audio.aiff --locale=fr-FR
Bonjour le monde comment allez-vous aujourd'hui
$ stenographer transcribe japanese.wav --locale=ja-JP --on-device
$ stenographer locales
ar-SA
ca-ES
cs-CZ
da-DK
de-AT
de-CH
de-DE
...
zh-TW
$ stenographer locales --json
["ar-SA","ca-ES","cs-CZ",...]
```
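A script may want to confirm that a locale is available before passing it to `--locale`. The following sketch assumes the plain `locales` output is one code per line, as in the listing above; `locale_supported` is a hypothetical helper, not a stenographer command.

```bash
# Hypothetical helper: read a newline-separated locale list on stdin and
# succeed only if one whole line matches the requested code exactly
# (-x: whole-line match, -F: fixed string, -q: no output).
locale_supported() {
  grep -qxF -- "$1"
}
```

It could be used as `stenographer locales | locale_supported fr-FR && stenographer transcribe french_audio.aiff --locale=fr-FR`, so the transcription runs only when the locale appears in the list.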
## Composability
```bash
# Transcribe then detect language
stenographer transcribe audio.wav | lingua detect
# Transcribe and extract entities
stenographer transcribe meeting.wav | lingua entities
# Transcribe and analyze sentiment
stenographer transcribe review.wav | lingua sentiment --per-sentence
# Save a transcript to a file
stenographer transcribe meeting.wav --on-device > transcript.txt
```
## Options
```
stenographer <command> [options]

Commands:
  transcribe <file>   Transcribe an audio file
  listen              Transcribe from the microphone
  locales             List supported languages

Options:
  --locale=CODE       Language locale (default: en-US)
  --on-device         Force on-device recognition (no network)
  --duration=MS       Listen duration in ms (default: 10000)
  --json              Output as JSON
  --help, -h          Show this help message
  --version, -v       Show version
```
## Requirements
- macOS 10.15+ (Catalina or later)
- Zig 0.16+ (to build from source)
- Speech recognition permission (prompted on first use)
- For `--on-device`: macOS will download language models as needed
## How It Works
stenographer uses Apple's [Speech](https://developer.apple.com/documentation/speech) framework with `SFSpeechRecognizer` for both file and live transcription.
- **File transcription:** `SFSpeechURLRecognitionRequest` loads audio files and runs recognition via a result handler block
- **Live mic:** `AVAudioEngine` captures microphone input, feeding buffers to `SFSpeechAudioBufferRecognitionRequest`
- **Block ABI:** ObjC block construction (`_NSConcreteStackBlock` / `_NSConcreteGlobalBlock`) for recognition result handlers and audio tap callbacks
- **Run loop:** `CFRunLoopRunInMode` pumps the event loop for async recognition callbacks
## Related Projects
- [lingua](https://github.com/georgemandis/lingua) — NLP CLI (NaturalLanguage framework)
- [cacophony](https://github.com/georgemandis/cacophony) — Sound classification CLI (SoundAnalysis framework)
- [tezcatl](https://github.com/georgemandis/tezcatl) — Headless web rendering CLI (WebKit)
- [loupe](https://github.com/georgemandis/loupe) — Computer vision CLI (Vision framework)
- [whereami](https://github.com/georgemandis/whereami) — Location CLI (CoreLocation)
- [nearme](https://github.com/georgemandis/nearme) — Local search CLI (MapKit)
## Credits
Created by [George Mandis](https://george.mand.is) while at the [Recurse Center](https://www.recurse.com/).