https://github.com/marcindulak/stt-mcp-server-linux
Local speech-to-text MCP server for Tmux on Linux (for use not only with Claude Code)
claude-code linux mcp mcp-server tmux
- Host: GitHub
- URL: https://github.com/marcindulak/stt-mcp-server-linux
- Owner: marcindulak
- License: apache-2.0
- Created: 2025-09-07T00:38:30.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2026-04-28T16:13:03.000Z (about 2 months ago)
- Last Synced: 2026-04-28T16:33:46.046Z (about 2 months ago)
- Topics: claude-code, linux, mcp, mcp-server, tmux
- Language: Python
- Homepage:
- Size: 77.1 KB
- Stars: 22
- Watchers: 1
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- awesome-mcp - marcindulak/stt-mcp-server-linux - to-text for Tmux on Linux ☆`19` (Media Processing / Video)
- awesome-ai - marcindulak/stt-mcp-server-linux - to-text MCP server for Linux (for use with Claude Code) ☆`17` (Claude Code / Laravel)
- awesome-openclaw-skills - stt-mcp-server-linux - to-talk speech transcription setup for Linux using a Python MCP server. Runs locally in Docke... | - | (MCP Servers & Protocol)
README
> Co-Authored-By: Claude
# Functionality overview
Local speech-to-text MCP server for Linux.
The speech-to-text functionality can also be used in a standalone mode in Tmux, without relying on MCP.
Claude Code must run inside a Tmux session so that the transcribed text can be injected into Claude's input stream.
The MCP server runs in a Docker container with access to host input and audio devices.
The server provides a `transcribe` tool accessible through the MCP protocol.
When the tool is activated, the server monitors the `Right Ctrl` key for push-to-talk functionality.
Key press detection uses `/dev/input` keyboard devices.
Audio recording uses the microphone device under `/dev/snd`.
On `Right Ctrl` key release, the recorded speech is transcribed to text (using the Whisper tiny model by default).
The transcribed text is injected into Claude's input stream via Tmux send-keys.
The MCP server is Linux-only due to `/dev` device dependencies.
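
The press and release flow described above can be illustrated with a minimal sketch that decodes raw Linux input events using only the standard library. This is not the project's code: the helper names `parse_event` and `push_to_talk_actions` are hypothetical, and the struct layout assumes a 64-bit Linux kernel.

```python
import struct

# Layout of `struct input_event` on 64-bit Linux:
# two longs (timestamp), u16 type, u16 code, s32 value.
EVENT_FORMAT = "llHHi"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)

EV_KEY = 0x01            # event type for keys (linux/input-event-codes.h)
KEY_RIGHTCTRL = 97       # key code of Right Ctrl
KEY_UP, KEY_DOWN = 0, 1  # value 2 (auto-repeat) is ignored below


def parse_event(raw: bytes) -> tuple[int, int, int]:
    """Return (type, code, value) from one raw input event."""
    _sec, _usec, ev_type, code, value = struct.unpack(EVENT_FORMAT, raw)
    return ev_type, code, value


def push_to_talk_actions(events):
    """Map a stream of (type, code, value) tuples to recording actions."""
    recording = False
    for ev_type, code, value in events:
        if ev_type != EV_KEY or code != KEY_RIGHTCTRL:
            continue  # not a Right Ctrl key event
        if value == KEY_DOWN and not recording:
            recording = True
            yield "start_recording"
        elif value == KEY_UP and recording:
            recording = False
            yield "stop_and_transcribe"
```

In such a sketch, events would be read from a keyboard device under `/dev/input` in chunks of `EVENT_SIZE` bytes and passed through `parse_event` before reaching the state machine.
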
> [!WARNING]
> This project will create the `~/.stt-mcp-server-linux` directory.
# Usage examples
The instructions follow below.
1. Install [Docker Engine](https://docs.docker.com/engine/install/) or [Docker Desktop](https://docs.docker.com/desktop/).
2. Install [Tmux](https://github.com/tmux/tmux).
If you are unfamiliar with Tmux, watch this [YouTube tutorial](https://www.youtube.com/watch?v=UxbiDtEXuxg&list=PLT98CRl2KxKGiyV1u6wHDV8VwcQdzfuKe) and check out this [cheat sheet](https://tmuxcheatsheet.com) for a reference of shortcuts.
3. Clone this repository, and `cd` into it:
```
git clone https://github.com/marcindulak/stt-mcp-server-linux
cd stt-mcp-server-linux
export STT_MCP_SERVER_LINUX_PATH=$(pwd)
```
4. Build the Docker image of the MCP server:
```
bash scripts/build_docker_image.sh
```
5. Download the Whisper tiny model under `~/.stt-mcp-server-linux/whisper`:
```
bash scripts/download_whisper_model.sh
```
6. Configure Tmux so that `~/.tmux.conf` contains at least:
```
# Enable mouse support for scrolling
set -g mouse on
# Set large scrollback lines buffer
set -g history-limit 1000000
# Hide status bar to reduce flicker
set -g status off
# Reduce escape key delay to reduce flicker
set -g escape-time 0
```
## MCP mode
In this mode, the local MCP server running in the Docker container exposes the `transcribe` tool.
The tool is activated from Claude Code, running in a Tmux session.
The tool sends the transcribed text into the Tmux session's input buffer.
7. Install [Claude Code](https://docs.anthropic.com/en/docs/claude-code/setup).
8. Add the MCP server to Claude (MCP client).
Navigate to any of your Claude directories.
```
bash "${STT_MCP_SERVER_LINUX_PATH}/scripts/add_mcp_server_to_claude.sh"
```
Verify the Claude connection to the MCP server with:
```
claude mcp list
```
Expected output:
```
stt-mcp-server-linux: ... ✓ Connected
```
> [!NOTE]
> The MCP server needs to be added only once, because it is added with `--scope user`.
> The first time setup is now complete!
9. Navigate to any of your Claude directories and start Claude in a new Tmux session whose socket is stored under `~/.stt-mcp-server-linux/tmux`.
The reason for using a custom `TMUX_TMPDIR` location instead of the default `/tmp/tmux-$(id -u)` is to make it shareable between the Docker host and the container with correct file ownership.
```
TMUX_TMPDIR=~/.stt-mcp-server-linux/tmux tmux new-session -s claude 'claude'
```
Then ask Claude to `Run the transcribe tool provided by the stt-mcp-server-linux MCP server`.
Press the `Right Ctrl` key to activate `Push-to-Talk` functionality.
Release the key to perform the transcription and inject the resulting text into Claude.
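
To make the role of the shared `TMUX_TMPDIR` more concrete, here is a minimal sketch of how a process could inject text into the `claude` session with `tmux send-keys`. The function names are hypothetical and the project's actual invocation may differ; only the `tmux send-keys` flags `-t` (target) and `-l` (literal text) are standard Tmux options.

```python
import os
import subprocess


def build_send_keys_command(session: str, text: str) -> list[str]:
    """Build a `tmux send-keys` command that types `text` literally."""
    # -t selects the target session; -l sends the text as literal characters
    # instead of interpreting words such as "Enter" as key names.
    return ["tmux", "send-keys", "-t", session, "-l", text]


def tmux_environment(tmpdir: str) -> dict[str, str]:
    """Copy the current environment and point tmux at a custom socket directory."""
    env = dict(os.environ)
    env["TMUX_TMPDIR"] = os.path.expanduser(tmpdir)
    return env


def inject_text(session: str, text: str, tmpdir: str) -> None:
    """Send `text` to the input of a running Tmux session."""
    subprocess.run(
        build_send_keys_command(session, text),
        env=tmux_environment(tmpdir),
        check=True,  # raise if tmux is missing or the session does not exist
    )
```

With the session started as shown above, `inject_text("claude", "hello", "~/.stt-mcp-server-linux/tmux")` would type `hello` into Claude's prompt, provided that both sides use the same `TMUX_TMPDIR`.
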
> [!NOTE]
> Give the MCP server some time to initialize.
> You may need to explicitly verify its status with the `/mcp` command.
>
> Use `docker logs stt-mcp-server-linux` to check the progress.
>
> Once a `{... "message": "Waiting for Right Ctrl key press on ... keyboard ..."}` log line appears, the transcription feature should be available.
>
> The `transcribe` tool returns immediately and runs in the background.
> Claude Code's terminal remains responsive for your input and commands while keyboard monitoring continues.
>
> To stop the transcription service, use `/quit` or close Claude Code.
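
The "returns immediately and runs in the background" behavior noted above can be sketched with `asyncio`. This is an illustrative example, not the server's implementation: `monitor_keyboard`, `handle_transcribe`, and `demo` are hypothetical names, and an `asyncio.Event` stands in for the real shutdown signal.

```python
import asyncio


async def monitor_keyboard(stop: asyncio.Event, log: list[str]) -> None:
    """Stand-in for the long-running keyboard monitoring loop."""
    log.append("monitoring started")
    await stop.wait()
    log.append("monitoring stopped")


async def handle_transcribe(stop, log, tasks) -> str:
    """Schedule monitoring in the background and return right away."""
    # Keep a reference to the task so it is not garbage-collected.
    tasks.append(asyncio.create_task(monitor_keyboard(stop, log)))
    return "transcription service started"


async def demo() -> tuple[str, list[str]]:
    stop, log, tasks = asyncio.Event(), [], []
    reply = await handle_transcribe(stop, log, tasks)
    await asyncio.sleep(0)  # yield once so the background task can start
    stop.set()              # simulate shutting the service down
    await asyncio.gather(*tasks)
    return reply, log
```

The handler's reply is available before the monitoring loop finishes, which is what keeps the client responsive while the loop continues on the same event loop.
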
## Standalone mode (without MCP)
The speech-to-text transcription can be performed without the MCP protocol.
7. Start a new Tmux session. The example here starts a session for bash:
```
TMUX_TMPDIR=~/.stt-mcp-server-linux/tmux tmux new-session -s bash 'bash'
```
8. Start the transcription service in standalone mode:
```
DEBUG=human MODE=standalone OUTPUT=tmux TMUX_SESSION=bash bash scripts/restart_mcp_server.sh
```
Press the `Right Ctrl` key to activate `Push-to-Talk` functionality.
The transcribed text will be inserted into the Tmux session's input buffer.
> [!WARNING]
> The current limitation is that only one transcription container can be running at a given time.
> If you start a new container, it will stop and replace the previous one.
# Running tests
Tests run inside Docker containers so that they have access to the required dependencies.
## Unit tests
```
bash scripts/test_unit.sh
```
## Integration test
The end-to-end integration test verifies that text is injected into the Tmux input:
```
bash scripts/test_tmux_integration.sh
```
## Type checking
Run mypy static type checking:
```
bash scripts/test_mypy.sh
```
# Implementation overview
The system uses object composition with separated responsibilities across multiple classes:
1. **MCPServer**: Handles JSON-RPC protocol communication with Claude using async/await. Manages an event loop that keeps the server responsive while background tasks execute. Routes MCP requests and schedules the speech-to-text service as background asyncio tasks.
2. **AudioRecorder**: Manages audio stream capture and buffering. Provides start/stop interface for recording sessions.
3. **TranscriptionEngine**: Abstract base with concrete implementations (WhisperEngine, VoskEngine) for different transcription models.
4. **OutputHandler**: Abstract base with concrete implementations (TmuxOutputHandler, StdoutOutputHandler) for different output destinations.
5. **KeyboardMonitor**: Handles keyboard device detection and Right Ctrl key event monitoring using evdev. Runs as an async coroutine allowing non-blocking keyboard monitoring.
6. **SpeechToTextService**: Main coordinator that orchestrates all components. Provides `start_async()` for MCP mode (background execution) and `start()` for standalone mode (blocking execution).
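
The composition described in this list can be sketched as follows. The class names `TranscriptionEngine`, `OutputHandler`, and `SpeechToTextService` come from the list above, but the method names (`transcribe`, `send`, `handle_recording`) and the two fake implementations are assumptions made for illustration, not the project's actual interfaces.

```python
from abc import ABC, abstractmethod


class TranscriptionEngine(ABC):
    @abstractmethod
    def transcribe(self, audio: bytes) -> str: ...


class OutputHandler(ABC):
    @abstractmethod
    def send(self, text: str) -> None: ...


class FakeEngine(TranscriptionEngine):
    """Hypothetical engine standing in for WhisperEngine or VoskEngine."""

    def transcribe(self, audio: bytes) -> str:
        return f"{len(audio)} bytes of audio"


class ListOutputHandler(OutputHandler):
    """Hypothetical handler that collects text instead of calling Tmux."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def send(self, text: str) -> None:
        self.lines.append(text)


class SpeechToTextService:
    """Coordinator that receives its collaborators through the constructor."""

    def __init__(self, engine: TranscriptionEngine, output: OutputHandler) -> None:
        self.engine = engine
        self.output = output

    def handle_recording(self, audio: bytes) -> None:
        # Transcribe the recorded audio and forward the text to the output.
        self.output.send(self.engine.transcribe(audio))
```

Because the coordinator depends only on the abstract bases, swapping the engine or the output destination does not require changing `SpeechToTextService`, which also makes it straightforward to test with fakes like the ones above.
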
# Abandoned ideas
## ydotool for Wayland text injection
Considered using ydotool for keyboard access on Wayland systems. Abandoned because:
- Requires root privileges for /dev/uinput access
- Python wrappers are unmaintained (pydotool, pyydotool, TotoBotKey)
- Not packaged for Debian
## xdotool for X11 text injection
Attempted using xdotool for keyboard simulation on X11. Abandoned because:
- Most modern Linux systems run on Wayland, not X11 (check your system with `echo $XDG_SESSION_TYPE`)
## Direct MCP text injection
Investigated injecting text directly through MCP protocol. Abandoned because:
- It seems that MCP tools can only return content to Claude (as output), not inject text into its input stream
## Hot-swapping audio devices (microphones) without container restart
Attempted to support switching between microphones without restarting the container. Abandoned because:
- Docker's `--device` flag captures a snapshot of `/dev/snd` at container startup. Subsequently selected or hot-plugged audio devices are not visible inside the container.
- Bind mounting the entire `/dev` directory (`-v /dev:/dev`) to enable real-time device visibility is a security risk.
- This is a known Docker limitation documented in [moby/moby#39262](https://github.com/moby/moby/issues/39262).
- Workaround: Restart the container when switching microphones.
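
One way to check whether a restart is needed is to compare the capture devices visible on the host with those visible inside the container. The sketch below is an assumption-laden helper, not part of the project: the function names are hypothetical, and it relies only on the ALSA convention of naming PCM nodes `pcmC<card>D<device>c` for capture devices.

```python
import os
import re

# ALSA names PCM device nodes pcmC<card>D<device><p|c>; "c" marks capture.
CAPTURE_NODE = re.compile(r"^pcmC(\d+)D(\d+)c$")


def capture_devices(names):
    """Return sorted (card, device) pairs for capture nodes in `names`."""
    pairs = []
    for name in names:
        match = CAPTURE_NODE.match(name)
        if match:
            pairs.append((int(match.group(1)), int(match.group(2))))
    return sorted(pairs)


def visible_capture_devices(path="/dev/snd"):
    """List capture devices visible to the current process."""
    if not os.path.isdir(path):
        return []
    return capture_devices(os.listdir(path))
```

Running `visible_capture_devices()` on the host and inside the container after plugging in a microphone should show the new device only on the host if the container was started earlier.
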