https://github.com/bcelary/gnome-speech2text
Local speech-to-text extension for GNOME Shell using Whisper.cpp. Record audio with keyboard shortcuts, transcribe offline, and insert text anywhere - fully private with no cloud APIs.
gnome gnome-shell-extension python python-service whisper whisper-cpp
- Host: GitHub
- URL: https://github.com/bcelary/gnome-speech2text
- Owner: bcelary
- License: mit
- Created: 2025-10-12T19:14:00.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2026-03-01T09:46:16.000Z (3 months ago)
- Last Synced: 2026-03-01T13:22:53.954Z (3 months ago)
- Topics: gnome, gnome-shell-extension, python, python-service, whisper, whisper-cpp
- Language: Python
- Homepage:
- Size: 1.4 MB
- Stars: 9
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# GNOME Speech2Text using Whisper.cpp
**Press shortcut → Speak → Get Text**
Local speech-to-text for GNOME Shell. No cloud. No APIs.
A status indicator in the system tray (top-right panel) always shows the recording/processing state.
## Choose Your Experience
- **Minimal** - Errors only, stay-out-of-the-way mode
- **Normal** - Brief notifications, multitask while recording
- **Focused** - Modal during recording only, transcription in background
- **Blocking** - Full-screen modal, focused workflow (blocks during recording + transcription)
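One way to read the four modes is as a small table of flags. The sketch below is illustrative only: `Mode`, `ModeBehavior`, `BEHAVIOR`, and `blocks_ui` are hypothetical names, not the extension's actual settings schema. It encodes only the modal behavior stated in the list above; notification verbosity, which is what separates Minimal from Normal, is not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    MINIMAL = "minimal"
    NORMAL = "normal"
    FOCUSED = "focused"
    BLOCKING = "blocking"


@dataclass(frozen=True)
class ModeBehavior:
    # Hypothetical flags derived from the mode descriptions above.
    modal_while_recording: bool
    modal_while_transcribing: bool


BEHAVIOR = {
    Mode.MINIMAL: ModeBehavior(False, False),   # stays out of the way
    Mode.NORMAL: ModeBehavior(False, False),    # notifications only
    Mode.FOCUSED: ModeBehavior(True, False),    # modal during recording only
    Mode.BLOCKING: ModeBehavior(True, True),    # modal for both phases
}


def blocks_ui(mode: Mode, state: str) -> bool:
    """Return True if `mode` shows a modal dialog while in `state`."""
    behavior = BEHAVIOR[mode]
    if state == "recording":
        return behavior.modal_while_recording
    if state == "transcribing":
        return behavior.modal_while_transcribing
    return False  # nothing is blocked while idle
```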
## Features
- Tray icon presents status (Idle/Recording/Transcribing)
- Keyboard shortcut (Super+Alt+Space)
- Multi-language support
- Auto text insertion (X11 only) or clipboard
- Customizable models and Voice Activity Detection
- Fast local transcription (no cloud/APIs)
## How It Works
Three components are required:
- **Extension** - GNOME Shell UI, shortcuts, dialogs
- **D-Bus Service** - Python backend (audio recording, processing)
- **whisper.cpp** - [ggml-org/whisper.cpp](https://github.com/ggml-org/whisper.cpp) server for transcription
All three must be installed separately (see Installation below).
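This README does not spell out how the service talks to the whisper.cpp server. As a rough sketch, the example server shipped with whisper.cpp accepts a multipart file upload on an HTTP `/inference` endpoint, so the hand-off could look like the code below. The address `127.0.0.1:8080`, the `"text"` key in the JSON reply, and the function names `build_inference_request` and `transcribe` are assumptions for illustration, not this project's actual code.

```python
import json
import urllib.request
import uuid

# Assumption: a whisper.cpp server is listening here; adjust to your setup.
SERVER_URL = "http://127.0.0.1:8080/inference"


def build_inference_request(wav_bytes: bytes, url: str = SERVER_URL) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying one WAV file."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="response_format"\r\n'
        "\r\n"
        "json\r\n"
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
        "Content-Type: audio/wav\r\n"
        "\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return urllib.request.Request(
        url,
        data=head + wav_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(wav_bytes: bytes, url: str = SERVER_URL) -> str:
    """Send audio to the server and return the transcribed text."""
    request = build_inference_request(wav_bytes, url)
    with urllib.request.urlopen(request, timeout=60) as response:
        # Assumption: the JSON reply carries the transcript under "text".
        return json.loads(response.read())["text"]
```

Splitting request construction from the network call keeps the multipart encoding checkable without a running server.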
## Installation
### Quick Install (Recommended)
Install extension from [extensions.gnome.org](https://extensions.gnome.org/extension/8706/speech2text-with-whispercpp/), then:
**1. Install Dependencies**
```bash
# Ubuntu/Debian
sudo apt install build-essential cmake python3 pipx ffmpeg python3-dbus python3-gi wl-clipboard xdotool xclip
# Fedora
sudo dnf install gcc gcc-c++ cmake python3 pipx ffmpeg python3-dbus python3-gobject wl-clipboard xdotool xclip
```
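To confirm the packages above actually put their tools on `PATH`, a short check like the following can help. It is a minimal sketch: the executable names are assumed from the package names (for instance, `wl-clipboard` provides `wl-copy`), and `missing_tools` is a hypothetical helper, not part of this project.

```python
import shutil

# Executables assumed to be provided by the packages listed above.
REQUIRED_TOOLS = ["cmake", "python3", "pipx", "ffmpeg", "xdotool", "xclip", "wl-copy"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All tools found")
```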
**2. Install whisper.cpp**
```bash
# Clone
git clone https://github.com/ggml-org/whisper.cpp.git
cd whisper.cpp
# Build with CUDA support (NVIDIA GPU); drop -DGGML_CUDA=1 for a CPU-only build
cmake -B build -DGGML_CUDA=1 -DCMAKE_INSTALL_PREFIX="$HOME/.local"
cmake --build build -j --config Release
cmake --install build
# Add to shell environment (~/.bashrc or ~/.zshrc) for CLI usage
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH="$HOME/.local/lib:$LD_LIBRARY_PATH"' >> ~/.bashrc
source ~/.bashrc
# Add to GNOME environment for the service
mkdir -p ~/.config/environment.d
cat >> ~/.config/environment.d/custom-env.conf < …
```
*Extension preferences for customizing behavior and keyboard shortcuts*
## Usage
1. Press `Super+Alt+Space` (or click the extension's round icon)
2. Speak
3. Press `Super+Alt+Space` (or icon) again to stop recording
4. Get the result, or review the transcription and act on it (if using the preview action)
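The four steps above amount to a small state machine over the three tray states (Idle, Recording, Transcribing). The sketch below models that flow; the `Session` class and its method names are hypothetical, and ignoring the shortcut during transcription is an assumption rather than documented behavior.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    RECORDING = "Recording"
    TRANSCRIBING = "Transcribing"


class Session:
    """Hypothetical model of the shortcut-driven workflow."""

    def __init__(self):
        self.state = State.IDLE

    def on_shortcut(self) -> State:
        """Handle a Super+Alt+Space press (or a click on the icon)."""
        if self.state is State.IDLE:
            self.state = State.RECORDING       # step 1: start recording
        elif self.state is State.RECORDING:
            self.state = State.TRANSCRIBING    # step 3: stop, then transcribe
        # Assumption: presses during transcription are ignored.
        return self.state

    def on_transcription_done(self, text: str) -> str:
        """Step 4: transcription finished; go back to idle and return the text."""
        self.state = State.IDLE
        return text
```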
## Troubleshooting
**Check installation:**
```bash
make status
gnome-extensions enable speech2text-whispercpp@bcelary.github
```
**View logs:**
```bash
./scripts/tail-logs.sh # Extension logs
./scripts/tail-service-logs.sh # Service logs
```
**Note:** Text insertion requires X11. On Wayland, use clipboard mode.
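The X11/Wayland split can be detected from the standard `XDG_SESSION_TYPE` environment variable. The sketch below picks a delivery command accordingly: `xdotool type` on X11, otherwise the clipboard via `wl-copy`. The function name `insertion_command` and the choice to fall back to `wl-copy` for any non-X11 session are assumptions; the extension's own logic may differ.

```python
import os


def insertion_command(text, session_type=None):
    """Return (argv, stdin_text) describing how to deliver `text`.

    `session_type` defaults to the XDG_SESSION_TYPE environment variable.
    """
    session = (session_type or os.environ.get("XDG_SESSION_TYPE", "")).lower()
    if session == "x11":
        # xdotool types the text into the currently focused window.
        return ["xdotool", "type", "--clearmodifiers", text], None
    # Clipboard fallback: wl-copy reads the text to copy from stdin.
    return ["wl-copy"], text
```

The returned pair can be passed to `subprocess.run(argv, input=stdin_text, text=True, check=True)` to carry out the action.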
## Development
```bash
make help # See all available targets
```
For service development, see [service/README.md](./service/README.md).
## Uninstall
```bash
make uninstall
```
## License
MIT - see [LICENSE](LICENSE)
Forked from [kavehtehrani/gnome-speech2text](https://github.com/kavehtehrani/gnome-speech2text)