https://github.com/rambip/sttui
Modern speech-to-text in your terminal. Use it for scripting, writing, vibe coding, and more!
productivity python speech-to-text tui vibe-coding
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/rambip/sttui
- Owner: rambip
- License: mit
- Created: 2026-03-14T10:09:38.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-03-22T20:20:01.000Z (3 months ago)
- Last Synced: 2026-03-23T06:21:19.699Z (3 months ago)
- Topics: productivity, python, speech-to-text, tui, vibe-coding
- Language: Python
- Homepage:
- Size: 292 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Roadmap: ROADMAP.md
- Agents: AGENTS.md
README
# `sttui`: Speech To Text in your terminal
**No browser. No Web UI. Fast speech-to-text with the best models**
[PyPI version](https://badge.fury.io/py/sttui)
https://github.com/user-attachments/assets/252ba77e-d3f3-4689-bcc1-77f536f10c60
# Setup
```bash
pip install sttui
```
(or if you have `uv` installed, `uvx sttui`)
Then you need an account on [OpenRouter](https://openrouter.ai/) and an API key.
To register it, run:
```bash
sttui auth
```
**Storage of your key:** your key is stored in `~/.config/sttui/auth.json`. Make sure you don't commit this file!
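If your `~/.config` directory is tracked in a dotfiles repository, the key file could be committed by accident. The following is a minimal sketch, not part of `sttui`, that checks whether a path sits inside a Git working tree. The helper name `enclosing_git_repo` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def enclosing_git_repo(path: Path) -> Optional[Path]:
    """Return the nearest parent directory that contains a `.git` entry, or None."""
    resolved = path.expanduser().resolve()
    for parent in resolved.parents:
        # `.git` is a directory in a normal clone and a file in a linked worktree.
        if (parent / ".git").exists():
            return parent
    return None
```

For example, `enclosing_git_repo(Path("~/.config/sttui/auth.json"))` returns a directory only if the key file lives inside a repository, in which case adding the file to that repository's `.gitignore` is a reasonable precaution.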
# Config
When you first start the app, a config file is created at: `~/.config/sttui/config.toml`
You can specify the default model (without the `openrouter` prefix), the prompt, and the maximum audio length.
```toml
[transcription]
model = "mistralai/voxtral-small-24b-2507"
prompt = """
You are a helpful assistant that can hear audio and write text.
Return a transcription of the user audio as json. If the user request is empty, return null.
{
"transcription": ""
}
{
"transcription": null
}
"""
max_seconds = 600
```
⚠️ Make sure the prompt asks the model to answer in this JSON format; it is the one `sttui` expects.
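To make the expected format concrete, here is a minimal sketch of how a reply in this shape could be parsed. It is an illustration, not `sttui`'s actual parser: the function name `parse_reply` is hypothetical, and it assumes the model returns bare JSON with no surrounding Markdown fence.

```python
import json
from typing import Optional


def parse_reply(reply: str) -> Optional[str]:
    """Extract the transcript from a reply shaped like {"transcription": ...}.

    Returns the transcript string, or None when the model reports empty audio.
    Raises ValueError if the reply does not follow the expected format.
    """
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or "transcription" not in data:
        raise ValueError('reply must be a JSON object with a "transcription" key')
    text = data["transcription"]
    if text is not None and not isinstance(text, str):
        raise ValueError('"transcription" must be a string or null')
    return text
```

A custom prompt that lets the model wrap its answer in extra prose would fail the first check here, which is why the prompt should insist on the JSON object alone.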
# Commands
```bash
# Start interactive dictation TUI
sttui
# Equivalent explicit run command
sttui run
# Show CLI help
sttui --help
# Set or update API key
sttui auth
# TUI + write transcript to stdout on Enter
sttui --stdout
# Override model and recording cap for this run
sttui --model google/gemini-2.5-flash --max-seconds 120
# Use a custom config file
sttui --config ~/.config/sttui/config.toml
# Record, transcribe, and send to an HTTP endpoint
sttui send --post https://example.com --body '{"text": $0}'
# Send transcript to a shell command
sttui send --command 'xargs -I {} notify-send "{}"'
# Chain multiple sends (with 1s delay between them)
sttui send --post https://example.com/foo --body '{"a": $1}' \
           --post https://example.com/bar --body '{}' \
           --delay 1000
# Background lifecycle (no TUI)
sttui background start
sttui background stop
sttui background toggle
# Same with desktop notifications
sttui background --notify start
```
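As an illustration of what a `--command` target could look like, here is a small Python sketch that appends each transcript to a notes file. It assumes the transcript arrives on standard input, which is inferred from the `xargs` example above rather than documented. The function name `append_note` and the file `~/dictated-notes.txt` are hypothetical.

```python
import sys
from datetime import datetime
from pathlib import Path
from typing import TextIO

# Hypothetical destination file; pick any path you like.
NOTES_FILE = Path.home() / "dictated-notes.txt"


def append_note(stream: TextIO = sys.stdin, notes_file: Path = NOTES_FILE) -> bool:
    """Read a transcript from `stream` and append it to `notes_file`.

    Returns True if a line was written, False if the transcript was empty.
    """
    text = " ".join(stream.read().split())  # collapse newlines and extra spaces
    if not text:
        return False
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    with notes_file.open("a", encoding="utf-8") as f:
        f.write(f"[{stamp}] {text}\n")
    return True
```

Saved as a script that ends with a call to `append_note()`, this could be wired up with something like `sttui send --command 'python append_note.py'`.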
## Send Command Templates
In `--body` templates, use `$0` for the full transcript, `$1`/`$2`/etc. for individual parts.
Values are JSON-escaped automatically when a `--body` template is provided.
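The substitution can be pictured with the following sketch. It is not `sttui`'s implementation: the function name `render_body` is hypothetical, and since the README does not say how a transcript is split into parts, the caller supplies them here as a list.

```python
import json
import re
from typing import Sequence

_PLACEHOLDER = re.compile(r"\$(\d+)")


def render_body(template: str, transcript: str, parts: Sequence[str] = ()) -> str:
    """Fill `$0` with the full transcript and `$N` with parts[N-1], JSON-encoded.

    Raises IndexError if the template references a part that was not supplied.
    """

    def substitute(match: "re.Match[str]") -> str:
        index = int(match.group(1))
        value = transcript if index == 0 else parts[index - 1]
        return json.dumps(value)  # adds the quotes and escapes special characters

    return _PLACEHOLDER.sub(substitute, template)
```

Because each value is encoded as a JSON string, quotes included, the placeholder is written without surrounding quotes in the template: `'{"text": $0}'`, not `'{"text": "$0"}'`.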
All recordings and transcripts are stored in `~/.local/share/sttui/recordings/`.
# Contributing
This is a side project of mine. I must admit that most of the code is AI-generated, but I try to review it and ensure good practices.
I don't have strong opinions about how this project should evolve. If you find it useful, feel free to contribute!