https://github.com/ashbuk/speak-to-ai
Native Linux voice-to-text app with hotkey support
- Host: GitHub
- URL: https://github.com/ashbuk/speak-to-ai
- Owner: AshBuk
- License: MIT
- Created: 2025-04-22T01:19:28.000Z (6 months ago)
- Default Branch: master
- Last Pushed: 2025-08-25T13:28:02.000Z (about 2 months ago)
- Last Synced: 2025-08-25T15:35:59.770Z (about 2 months ago)
- Topics: appimage, flatpak-apps, golang, linux, voice-to-text, whisper-cpp
- Language: Go
- Size: 8.32 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
# Speak to AI
> Native Linux voice-to-text app 🗣️
**A minimalist, privacy-focused desktop application that enables voice input (speech-to-text) into text editors, IDEs, or AI assistants without sending your voice to the cloud. Runs the Whisper model locally for speech recognition. Written in Go and optimized for the Linux desktop.**
## ✦ Features
- **Cross-platform support** for X11 and Wayland
- **Desktop Environment Support**: Native integration with GNOME, KDE, and other Linux DEs
- **Privacy-first**: no data sent to external servers
- **Portable**: available as AppImage and Flatpak
- **100% Offline** speech recognition using Whisper.cpp
- **System tray integration** with recording status (🎤 / 💤)
- **Key binding support** (AltGr + ,) and customizable hotkeys
- **Automatic typing** in active window after transcription
- **Clipboard support** for copying transcribed text
- **WebSocket API** for external integrations (optional)
- **Visual notifications** for status changes

## ✦ Installation
Get prebuilt packages on the [Releases](https://github.com/AshBuk/speak-to-ai/releases) page:
- AppImage: portable binary for most Linux distributions
- Flatpak: sandboxed install

### AppImage
Download the latest AppImage from [Releases](https://github.com/AshBuk/speak-to-ai/releases):
```bash
# Download the file, then:
chmod +x speak-to-ai-*.AppImage
# Open:
./speak-to-ai-*.AppImage
```

### Flatpak
Download and install the Flatpak from [Releases](https://github.com/AshBuk/speak-to-ai/releases):
```bash
# Download the file, then:
flatpak install --user io.github.ashbuk.speak-to-ai.flatpak
# Run the application
flatpak run io.github.ashbuk.speak-to-ai
```

## ✦ Configuration
The configuration file is created automatically at:
- **AppImage**: `~/.config/speak-to-ai/config.yaml`
- **Flatpak**: `~/.var/app/io.github.ashbuk.speak-to-ai/config/speak-to-ai/config.yaml`

### Desktop Environment Compatibility
#### GNOME & KDE (Recommended)
Global hotkeys work seamlessly using the `org.freedesktop.portal.GlobalShortcuts` portal:
- **GNOME**: Full native support, no additional configuration needed
- **KDE Plasma**: Full native support, no additional configuration needed

#### Other Desktop Environments
For DEs without GlobalShortcuts portal support (XFCE, MATE, i3, etc.):
- Hotkeys may be limited by Flatpak sandboxing
- Optional: grant input device access for better hotkey support:

```bash
flatpak override --user --device=input io.github.ashbuk.speak-to-ai
```

Then restart the app. This is optional and only needed on DEs without the GlobalShortcuts portal.
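
For reference, the configuration file uses YAML. The snippet below is a purely illustrative sketch: every key name in it is an assumption, not the project's actual schema, so check the auto-generated file for the real options.

```yaml
# Illustrative example only — key names are guesses, not the real schema.
# The values reflect features described in this README (default hotkey,
# arecord/ffmpeg backends, output modes, optional WebSocket API on 8080).
hotkey: "AltGr+,"       # recording toggle shortcut
output_mode: combined   # active_window | clipboard | combined
audio_backend: arecord  # arecord | ffmpeg
websocket:
  enabled: false        # optional API for external integrations
  port: 8080
```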
## ✦ Project Status
Core functionality and the Go code are ready. The UI/UX is still being polished; for now it is more geek-friendly than user-friendly. High-quality AppImage and Flatpak builds are in progress.
## ✦ For Developers
Start onboarding with:
- [ARCHITECTURE.md](ARCHITECTURE.md) — system architecture and component design
- [DEVELOPMENT.md](DEVELOPMENT.md) — development workflow and build instructions
- [CONTRIBUTING.md](CONTRIBUTING.md) — contribution guidelines and how to help improve the project
- [docker/README.md](docker/README.md) — Docker-based development

## ✦ Architecture & Components
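At a high level, the daemon wires a hotkey press to audio capture, Whisper transcription, and text output. As a minimal, purely illustrative Go sketch of the final output-dispatch step (all names and types below are assumptions, not the project's actual API):

```go
package main

import "fmt"

// OutputMode mirrors the three text-output modes described in this section.
// All identifiers here are illustrative and are not taken from the codebase.
type OutputMode int

const (
	ActiveWindow OutputMode = iota // type into the focused window
	Clipboard                      // copy to the system clipboard
	Combined                       // do both
)

// dispatch routes a finished transcription to the configured sinks.
func dispatch(mode OutputMode, text string, typeOut, copyOut func(string)) {
	if mode == ActiveWindow || mode == Combined {
		typeOut(text)
	}
	if mode == Clipboard || mode == Combined {
		copyOut(text)
	}
}

func main() {
	var typed, copied string
	dispatch(Combined, "hello from whisper",
		func(s string) { typed = s },
		func(s string) { copied = s })
	fmt.Printf("typed=%q copied=%q\n", typed, copied)
}
```

The concrete components are listed below.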
- **Local Daemon**: Go application handling hotkeys, audio recording, and output
- **Whisper Engine**: Uses `whisper.cpp` binary for speech recognition
- **Audio Recording**: Supports `arecord` and `ffmpeg` backends
- **Text Output**:
- **Active Window Mode**: Automatically types transcribed text into the currently active window
- **Clipboard Mode**: Copies transcribed text to system clipboard
- **Combined Mode**: Both typing and clipboard operations
- **WebSocket Server**: Provides an API for external applications (optional, port 8080)

## ✦ System Requirements
- **OS**: Linux (Ubuntu 20.04+, Fedora 35+, or similar)
- **Desktop**: X11 or Wayland environment
- **Audio**: Microphone/recording capability
- **Storage**: ~200MB for model and dependencies
- **Memory**: ~500MB RAM during operation

## ✦ Acknowledgments
- [whisper.cpp](https://github.com/ggerganov/whisper.cpp) for the excellent C++ implementation of OpenAI Whisper
- [getlantern/systray](https://github.com/getlantern/systray) for cross-platform system tray support
- OpenAI for the original Whisper model

---
Shared with the community for privacy-conscious Linux users
---
## ✦ License

MIT — see `LICENSE`.
## ✦ Sponsor

[GitHub Sponsors](https://github.com/sponsors/AshBuk) · [PayPal](https://www.paypal.com/donate/?hosted_button_id=R3HZH8DX7SCJG)

If you find Speak-to-AI useful, please consider supporting development. Your support funds ongoing improvements, real-time streaming, and security hardening.