https://github.com/matanhakim/local-speech-to-text
Offline, local speech-to-text for any language: set up Whisper, dictate by hotkey, and a drop-in transcription skill.
https://github.com/matanhakim/local-speech-to-text
claude-code dictation faster-whisper hebrew local-first offline speech-to-text voice-to-text whisper
Last synced: about 8 hours ago
JSON representation
Offline, local speech-to-text for any language: set up Whisper, dictate by hotkey, and a drop-in transcription skill.
- Host: GitHub
- URL: https://github.com/matanhakim/local-speech-to-text
- Owner: matanhakim
- License: mit
- Created: 2026-06-25T20:12:25.000Z (4 days ago)
- Default Branch: main
- Last Pushed: 2026-06-28T07:29:18.000Z (1 day ago)
- Last Synced: 2026-06-28T09:15:40.488Z (1 day ago)
- Topics: claude-code, dictation, faster-whisper, hebrew, local-first, offline, speech-to-text, voice-to-text, whisper
- Language: Python
- Size: 27.3 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# local-speech-to-text
**Free, offline speech-to-text for any language, running entirely on your own
machine.** No cloud, no API key, no per-minute meter.
Voice input in AI tools is almost always English-first. Most built-in dictation
and voice features cover English and a short list of major languages; many
languages aren't covered at all. The moment you want to dictate or transcribe in,
say, Hebrew, you get pushed toward a paid provider, an API key, and a meter.
This repo needs none of that. Everything runs locally on
[`faster-whisper`](https://github.com/SYSTRAN/faster-whisper) - on a regular CPU
laptop - so you can dictate and transcribe in a language the big tools leave out.
## Three standalone tools
Each works on its own; pick what you need.
| # | Part | What it does |
|---|------|--------------|
| 1 | [**install-whisper/**](install-whisper/) | Set up the local transcription model (Whisper via faster-whisper). The foundation for the other two. |
| 2 | [**dictation/**](dictation/) | System-wide push-to-talk: tap a key, speak, and your words are pasted into whatever field has focus - an editor, a browser, the Claude Code prompt. |
| 3 | [**skill/**](skill/) | Turn a recording (meeting, call, voice memo) into a timestamped Markdown transcript. Ships as a drop-in Claude Code skill, and runs standalone too. |
## Why bother talking instead of typing
The bottleneck to good output from a language model is often how much context you
bother to give it. Typing pushes you toward short, pruned input; speaking lets
you spill everything that's actually in your head. The dictation tool (part 2)
exists to remove that bottleneck: tap a key, think out loud, and the model gets
the full paragraph you would never have typed. The skill (part 3) does the same
for context captured earlier - record a long voice memo, hand over the
transcript.
## Hardware
The default model (`ivrit-ai/whisper-large-v3-turbo-ct2`, a Hebrew fine-tune)
runs comfortably on a CPU laptop with 16-32 GB RAM - no GPU needed. For other
languages or smaller machines, see the model guide in
[`install-whisper/`](install-whisper/). Only **NVIDIA** GPUs accelerate this
stack; on everything else it runs on the CPU, which is fine - transcription runs
at roughly real time.
## Adapt it to your language
Hebrew is the worked example throughout. To switch languages, change `MODEL` and
`LANGUAGE` in `dictation/dictate.py`, and pass `-l ` / `-m ` to
`skill/transcribe.py`. For any non-English language, prefer `large-v3-turbo` over
the tiny/base models, and check Hugging Face for a fine-tune in your language
first - it usually beats the generic model of the same size.
## Privacy
The speech-to-text is **fully local** - audio never leaves the machine. Models
download once from Hugging Face, then run offline.
## Platform note
The dictation daemon (part 2) is **Windows-focused** (global key hook,
auto-paste, and beeps use Win32 APIs). The model setup (part 1) and the
transcription script (part 3) are cross-platform.
## Background
Companion to the blog post
[*Free, Offline Speech-to-Text for Non-English Languages*](https://www.matanhakim.com/posts/2026-06-29-local-transcription/).
## License
[MIT](LICENSE) © Matan Hakim