https://github.com/marcindulak/stt-mcp-server-linux
Local speech-to-text MCP server for Linux (to be used with Claude Code)
claude-code linux mcp mcp-server tmux
- Host: GitHub
- URL: https://github.com/marcindulak/stt-mcp-server-linux
- Owner: marcindulak
- License: apache-2.0
- Created: 2025-09-07T00:38:30.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2025-09-11T07:24:45.000Z (27 days ago)
- Last Synced: 2025-09-11T10:44:39.818Z (27 days ago)
- Topics: claude-code, linux, mcp, mcp-server, tmux
- Language: Python
- Homepage:
- Size: 39.1 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-claude-code - `stt-mcp-server-linux` (Tooling 🧰)
README
> Co-Authored-By: Claude
# Functionality overview
Local speech-to-text MCP server for Linux.
> [!NOTE]
> Here is the summary of the usage, assuming the setup below has been performed.
>
> Navigate to any of your Claude directories, start Claude in a new Tmux session stored under `~/.tmux`:
>
> `TMUX_TMPDIR=~/.tmux tmux new-session -s claude 'claude'`
>
> and ask to `Run the transcribe tool provided by the stt-mcp-server-linux MCP server`.
>
> Press the `Right Ctrl` key to activate `Push-to-Talk` functionality.
> Release the key to perform the transcription and inject the resulting text into Claude.
> In Claude, press `esc to interrupt` and the `transcribe` tool will continue running in the background.
>
> Using this setup, to paste into Tmux, use `Left Ctrl+Shift+V`.
>
> To copy from Tmux, press `Shift` and use text mouse selection.

The setup requires Claude to run inside a Tmux session to enable the transcribed text injection into Claude's input stream.
The MCP server runs in a Docker container with access to host input and audio devices.
The server provides a `transcribe` tool accessible through MCP protocol.
When the tool is activated, the server monitors the `Right Ctrl` key for push-to-talk functionality.
Key press detection uses `/dev/input` keyboard devices.
Audio recording uses `/dev/snd` microphone device.

On `Right Ctrl` key release, speech-to-text transcription occurs (using Whisper tiny model by default).
The transcribed text is injected into Claude's input stream via Tmux send-keys.

The MCP server is Linux-only due to `/dev` device dependencies.
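The push-to-talk flow above can be sketched in a few lines of Python. This is a minimal illustration and not the repository's code: it assumes the raw `input_event` record layout used by `/dev/input` devices on 64-bit Linux, the standard `EV_KEY` and `KEY_RIGHTCTRL` codes from the kernel headers, and the `-l` (literal) flag of `tmux send-keys`. The helper names `classify_event` and `build_send_keys_command` are hypothetical.

```python
import struct
from typing import List, Optional

# Assumed layout of one input_event record: timeval (two longs), type, code, value.
EVENT_FORMAT = "llHHi"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)  # bytes to read per event

EV_KEY = 0x01        # event type for key events (linux/input-event-codes.h)
KEY_RIGHTCTRL = 97   # key code for Right Ctrl
KEY_UP, KEY_DOWN = 0, 1


def classify_event(raw: bytes) -> Optional[str]:
    """Map one raw event to "start" (key down), "stop" (key up), or None."""
    _sec, _usec, ev_type, code, value = struct.unpack(EVENT_FORMAT, raw)
    if ev_type != EV_KEY or code != KEY_RIGHTCTRL:
        return None
    if value == KEY_DOWN:
        return "start"   # begin recording
    if value == KEY_UP:
        return "stop"    # stop recording, then transcribe
    return None          # value 2 is key auto-repeat; ignore it


def build_send_keys_command(session: str, text: str) -> List[str]:
    """Build a tmux command that types `text` into the given session."""
    # -l sends the text as literal characters instead of key names.
    return ["tmux", "send-keys", "-t", session, "-l", text]
```

With the session name used in this README, `build_send_keys_command("claude", "hello")` yields an argument list that could be passed to `subprocess.run`, provided `TMUX_TMPDIR` points at the same socket directory as the session.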
# Usage examples
The full setup instructions follow below.
0. Install [Claude Code](https://docs.anthropic.com/en/docs/claude-code/setup)
1. Install [Docker Engine](https://docs.docker.com/engine/install/) or [Docker Desktop](https://docs.docker.com/desktop/)
2. Install [Tmux](https://github.com/tmux/tmux).
If you are unfamiliar with Tmux, watch this [YouTube tutorial](https://www.youtube.com/watch?v=UxbiDtEXuxg&list=PLT98CRl2KxKGiyV1u6wHDV8VwcQdzfuKe) and check out this [cheat sheet](https://tmuxcheatsheet.com) for a shortcuts reference.

3. Clone this repository, and `cd` into it:
```
git clone https://github.com/marcindulak/stt-mcp-server-linux
cd stt-mcp-server-linux
export STT_MCP_SERVER_LINUX_PATH=$(readlink -f .)
```

4. Build the Docker image of the MCP server:
```
bash scripts/build_docker_image.sh
```

5. Download the Whisper tiny model under `~/.whisper`:
```
bash scripts/download_whisper_model.sh
```

6. Add the MCP server to Claude (MCP client).
Navigate to any of your Claude directories.
```
bash "$STT_MCP_SERVER_LINUX_PATH"/scripts/add_mcp_server_to_claude.sh
```

Verify the Claude connection to the MCP server with:
```
claude mcp list --debug
```

Expected output:
```
stt-mcp-server-linux: ... ✓ Connected
```

> [!WARNING]
> The need to add the MCP server per directory seems to be due to https://github.com/anthropics/claude-code/issues/6888

7. Configure Tmux, so `~/.tmux.conf` contains at least:
```
# Enable mouse support for scrolling
set -g mouse on

# Set large scrollback lines buffer
set -g history-limit 1000000

# Hide status bar to reduce flicker
set -g status off

# Reduce escape key delay to reduce flicker
set -g escape-time 0
```

8. Navigate to any of your Claude directories, start Claude in a new Tmux session stored under `~/.tmux`.
The reason for using a custom `TMUX_TMPDIR` location instead of the default `/tmp/tmux-$(id -u)` is to make it shareable between the Docker host and the container with correct file ownership.

```
TMUX_TMPDIR=~/.tmux tmux new-session -s claude 'claude'
```

and ask to `Run the transcribe tool provided by the stt-mcp-server-linux MCP server`.
Press the `Right Ctrl` key to activate `Push-to-Talk` functionality.
Release the key to perform the transcription and inject the resulting text into Claude.

> [!WARNING]
> Give the MCP server some time to initialize and download the Whisper model.
>
> Use `docker logs stt-mcp-server-linux` to check the progress.
>
> Once a `{... "message": "Waiting for Right Ctrl key press on ... keyboard ..."}` log line appears the transcription feature should be available.
>
> In Claude, press `esc to interrupt` and the `transcribe` tool will continue running in the background.
>
> There seems to be no way to stop the MCP server transcribe tool, other than to `/quit` Claude.

# Running tests
Tests run inside Docker containers to have access to required dependencies.
## Unit tests
```
bash scripts/test_unit.sh
```

## Integration test
The end-to-end integration test verifies that text is injected into the Tmux input:
```
bash scripts/test_tmux_integration.sh
```

## Type checking
Run mypy static type checking:
```
bash scripts/test_mypy.sh
```

# Implementation overview
The system uses object composition with separated responsibilities across multiple classes:
1. **MCPServer**: Handles JSON-RPC protocol communication with Claude. Manages tool registration and request routing.
2. **AudioRecorder**: Manages audio stream capture and buffering. Provides start/stop interface for recording sessions.
3. **TranscriptionEngine**: Abstract base with concrete implementations (WhisperEngine, VoskEngine) for different transcription models.
4. **OutputHandler**: Abstract base with concrete implementations (TmuxOutputHandler, StdoutOutputHandler) for different output destinations.
5. **KeyboardMonitor**: Handles keyboard device detection and Right Ctrl key event monitoring using evdev.
6. **SpeechToTextService**: Main coordinator that orchestrates all components. Handles the transcription workflow.
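The composition described in the list above can be sketched as follows. The class names `TranscriptionEngine`, `OutputHandler` and `SpeechToTextService` come from the list; everything else is an assumption made for illustration, including the method names `transcribe`, `send` and `handle_recording` and the stand-in classes `FakeEngine` and `ListOutputHandler`. The actual interfaces in the repository may differ.

```python
from abc import ABC, abstractmethod
from typing import List


class TranscriptionEngine(ABC):
    """Abstract base for transcription models (method name is assumed)."""

    @abstractmethod
    def transcribe(self, audio: bytes) -> str: ...


class OutputHandler(ABC):
    """Abstract base for output destinations (method name is assumed)."""

    @abstractmethod
    def send(self, text: str) -> None: ...


class FakeEngine(TranscriptionEngine):
    """Hypothetical stand-in for a real engine such as WhisperEngine."""

    def transcribe(self, audio: bytes) -> str:
        return f"{len(audio)} bytes of audio"


class ListOutputHandler(OutputHandler):
    """Hypothetical handler that collects text instead of typing it into Tmux."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def send(self, text: str) -> None:
        self.sent.append(text)


class SpeechToTextService:
    """Coordinator that receives its collaborators instead of creating them."""

    def __init__(self, engine: TranscriptionEngine, output: OutputHandler) -> None:
        self.engine = engine
        self.output = output

    def handle_recording(self, audio: bytes) -> None:
        # Transcribe the captured audio and forward non-empty text.
        text = self.engine.transcribe(audio).strip()
        if text:
            self.output.send(text)
```

Because the coordinator only depends on the two abstract bases, an engine or output destination can be swapped without touching the workflow, which also makes the workflow testable without audio hardware or a Tmux session.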
# Abandoned ideas
## ydotool for Wayland text injection
Considered using ydotool for keyboard access on Wayland systems. Abandoned because:
- Requires root privileges for /dev/uinput access
- Python wrappers are unmaintained (pydotool, pyydotool, TotoBotKey)
- Not packaged for Debian

## xdotool for X11 text injection
Attempted using xdotool for keyboard simulation on X11. Abandoned because:
- Most modern Linux systems run on Wayland, not X11 (check your system with `echo $XDG_SESSION_TYPE`)

## Direct MCP text injection
Investigated injecting text directly through MCP protocol. Abandoned because:
- It seems that MCP tools can only return content to Claude (as output), not inject into input stream
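
The shape of a tool result illustrates this limitation. The sketch below assumes the `tools/call` result layout from the MCP specification, where a tool answers with a `content` list of typed items; `make_tool_result` is a hypothetical helper and not code from this repository.

```python
import json
from typing import Any, Dict


def make_tool_result(request_id: int, text: str) -> Dict[str, Any]:
    """Build a JSON-RPC response carrying a tool's text output."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,  # must match the id of the tools/call request
        "result": {
            "content": [{"type": "text", "text": text}],
            "isError": False,
        },
    }


if __name__ == "__main__":
    # The response only flows back to the client as tool output.
    print(json.dumps(make_tool_result(1, "transcribed words")))
```

Nothing in this response addresses the user's input line: the text reaches Claude as the output of a tool call, which is why the transcription is typed in through Tmux instead.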