https://github.com/bcelary/gnome-speech2text

Local speech-to-text extension for GNOME Shell using Whisper.cpp. Record audio with keyboard shortcuts, transcribe offline, and insert text anywhere - fully private with no cloud APIs.

# GNOME Speech2Text using Whisper.cpp

**Press shortcut → Speak → Get Text**

Local speech-to-text for GNOME Shell. No cloud. No APIs.

The status indicator in the system tray (top-right panel) always shows the current recording/processing state.

## Choose Your Experience

- **Minimal** - Errors only, stay-out-of-the-way mode
- **Normal** - Brief notifications, multitask while recording
- **Focused** - Modal during recording only, transcription in background
- **Blocking** - Full-screen modal, focused workflow (blocks during recording + transcription)

## Features

- Tray icon presents status (Idle/Recording/Transcribing)
- Keyboard shortcut (Super+Alt+Space)
- Multi-language support
- Auto text insertion (X11 only) or clipboard
- Customizable models and Voice Activity Detection
- Fast local transcription (no cloud/APIs)

## How It Works

Three components are required:
- **Extension** - GNOME Shell UI, shortcuts, dialogs
- **D-Bus Service** - Python backend (audio recording, processing)
- **whisper.cpp** - [ggml-org/whisper.cpp](https://github.com/ggml-org/whisper.cpp) server for transcription

All three must be installed separately (see Installation below).
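
As a rough way to see which pieces are already present, the sketch below probes for two of the three components. It rests on assumptions: the server binary name `whisper-server` is what recent whisper.cpp builds install but is not stated in this README, and `check` is a hypothetical helper, not part of the project. The D-Bus service is left out because its name is not given here.

```shell
#!/bin/sh
# check LABEL CMD [ARGS...]: run CMD quietly and report whether it succeeded.
# (hypothetical helper for illustration only)
check() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "ok: $label"
  else
    echo "missing: $label"
  fi
}

# whisper.cpp server on PATH (binary name is an assumption)
check "whisper.cpp server" command -v whisper-server

# Extension, looked up by the UUID used in the Troubleshooting section
check "GNOME Shell extension" gnome-extensions info speech2text-whispercpp@bcelary.github
```

A "missing" line only means the probe failed; for example, a `whisper-server` installed outside `PATH` would also be reported as missing.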

## Installation

### Quick Install (Recommended)

Install the extension from [extensions.gnome.org](https://extensions.gnome.org/extension/8706/speech2text-with-whispercpp/), then:

**1. Install Dependencies**

```bash
# Ubuntu/Debian
sudo apt install build-essential cmake python3 pipx ffmpeg python3-dbus python3-gi wl-clipboard xdotool xclip

# Fedora
sudo dnf install gcc gcc-c++ cmake python3 pipx ffmpeg python3-dbus python3-gobject wl-clipboard xdotool xclip
```

**2. Install whisper.cpp**

```bash
# Clone
git clone https://github.com/ggml-org/whisper.cpp.git
cd whisper.cpp

# Build with CUDA support (NVIDIA GPU); omit -DGGML_CUDA=1 for a CPU-only build
cmake -B build -DGGML_CUDA=1 -DCMAKE_INSTALL_PREFIX="$HOME/.local"
cmake --build build -j --config Release
cmake --install build

# Add to shell environment (~/.bashrc or ~/.zshrc) for CLI usage
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH="$HOME/.local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"' >> ~/.bashrc
source ~/.bashrc

# Add to GNOME environment for the service
mkdir -p ~/.config/environment.d
cat >> ~/.config/environment.d/custom-env.conf << ...
```

*Extension preferences for customizing behavior and keyboard shortcuts*

## Usage

1. Press `Super+Alt+Space` (or click the extension's round icon)
2. Speak
3. Press `Super+Alt+Space` (or icon) again to stop recording
4. Get the result directly, or review the transcription and act on it (if using the preview action)

## Troubleshooting

**Check installation:**
```bash
make status
gnome-extensions enable speech2text-whispercpp@bcelary.github
```

**View logs:**
```bash
./scripts/tail-logs.sh # Extension logs
./scripts/tail-service-logs.sh # Service logs
```

**Note:** Text insertion requires X11. On Wayland, use clipboard mode.
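
To find out which case applies, the session type can usually be read from `XDG_SESSION_TYPE`. The following is a minimal sketch: `suggest_output_mode` and the labels `insert` and `clipboard` are hypothetical names for illustration, not settings of the extension.

```shell
#!/bin/sh
# Map a session type to the output mode that fits it.
# Uses the first argument if given, otherwise XDG_SESSION_TYPE,
# which the login manager typically sets to "x11" or "wayland".
suggest_output_mode() {
  case "${1:-${XDG_SESSION_TYPE:-unknown}}" in
    x11)     echo "insert" ;;     # automatic text insertion is available
    wayland) echo "clipboard" ;;  # use clipboard mode instead
    *)       echo "unknown" ;;    # session type not recognized or not set
  esac
}

# Report the suggestion for the current session
suggest_output_mode
```

If the variable is unset (for example in an SSH session), the function prints `unknown` and the choice has to be made manually.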

## Development

```bash
make help # See all available targets
```

For service development, see [service/README.md](./service/README.md).

## Uninstall

```bash
make uninstall
```

## License

MIT - see [LICENSE](LICENSE)

Forked from [kavehtehrani/gnome-speech2text](https://github.com/kavehtehrani/gnome-speech2text)