{"id":30986144,"url":"https://github.com/marcindulak/stt-mcp-server-linux","last_synced_at":"2026-05-08T07:31:58.393Z","repository":{"id":314231342,"uuid":"1051899047","full_name":"marcindulak/stt-mcp-server-linux","owner":"marcindulak","description":"Local speech-to-text MCP server for Tmux on Linux (for use not only with Claude Code)","archived":false,"fork":false,"pushed_at":"2026-04-28T16:13:03.000Z","size":79,"stargazers_count":22,"open_issues_count":0,"forks_count":4,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-04-28T16:33:46.046Z","etag":null,"topics":["claude-code","linux","mcp","mcp-server","tmux"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/marcindulak.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-07T00:38:30.000Z","updated_at":"2026-04-28T16:13:20.000Z","dependencies_parsed_at":"2025-09-11T10:54:45.108Z","dependency_job_id":null,"html_url":"https://github.com/marcindulak/stt-mcp-server-linux","commit_stats":null,"previous_names":["marcindulak/stt-mcp-server-linux"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/marcindulak/stt-mcp-server-linux","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marcindulak%2Fstt-mcp-server-linux","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marcindulak%2Fstt-mcp-server-linux/tags","releases_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marcindulak%2Fstt-mcp-server-linux/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marcindulak%2Fstt-mcp-server-linux/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/marcindulak","download_url":"https://codeload.github.com/marcindulak/stt-mcp-server-linux/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marcindulak%2Fstt-mcp-server-linux/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32770988,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-08T02:36:36.067Z","status":"ssl_error","status_checked_at":"2026-05-08T02:36:07.210Z","response_time":54,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["claude-code","linux","mcp","mcp-server","tmux"],"created_at":"2025-09-12T14:36:41.947Z","updated_at":"2026-05-08T07:31:58.365Z","avatar_url":"https://github.com/marcindulak.png","language":"Python","funding_links":[],"categories":["Media Processing","Tooling 🧰","Claude Code","MCP Servers \u0026 Protocol"],"sub_categories":["Video","General","Laravel"],"readme":"[![test](https://github.com/marcindulak/stt-mcp-server-linux/actions/workflows/test.yml/badge.svg)](https://github.com/marcindulak/stt-mcp-server-linux/actions/workflows/test.yml)\n\n\u003e Co-Authored-By: Claude\n\n# 
Functionality overview\n\nLocal speech-to-text MCP server for Linux.\nThe speech-to-text functionality can also be used in standalone mode in Tmux, without relying on MCP.\n\nClaude Code must run inside a Tmux session so that the transcribed text can be injected into Claude's input stream.\n\nThe MCP server runs in a Docker container with access to host input and audio devices.\nThe server provides a `transcribe` tool accessible over MCP.\nWhen the tool is activated, the server monitors the `Right Ctrl` key for push-to-talk functionality.\nKey press detection uses `/dev/input` keyboard devices.\nAudio recording uses the `/dev/snd` microphone device.\n\nOn `Right Ctrl` key release, speech-to-text transcription occurs (using the Whisper tiny model by default).\nThe transcribed text is injected into Claude's input stream via Tmux `send-keys`.\n\nThe MCP server is Linux-only due to its `/dev` device dependencies.\n\n\u003e [!WARNING]\n\u003e This project will create the `~/.stt-mcp-server-linux` directory.\n\n# Usage examples\n\nFollow the steps below.\n\n1. Install [Docker Engine](https://docs.docker.com/engine/install/) or [Docker Desktop](https://docs.docker.com/desktop/).\n\n2. Install [Tmux](https://github.com/tmux/tmux).\n   If you are unfamiliar with Tmux, watch this [YouTube tutorial](https://www.youtube.com/watch?v=UxbiDtEXuxg\u0026list=PLT98CRl2KxKGiyV1u6wHDV8VwcQdzfuKe) and check out this [cheat sheet](https://tmuxcheatsheet.com) for a shortcut reference.\n\n3. Clone this repository and `cd` into it:\n\n   ```\n   git clone https://github.com/marcindulak/stt-mcp-server-linux\n   cd stt-mcp-server-linux\n   export STT_MCP_SERVER_LINUX_PATH=$(pwd)\n   ```\n\n4. Build the Docker image of the MCP server:\n\n   ```\n   bash scripts/build_docker_image.sh\n   ```\n\n5. Download the Whisper tiny model under `~/.stt-mcp-server-linux/whisper`:\n\n   ```\n   bash scripts/download_whisper_model.sh\n   ```\n\n6. 
Configure Tmux so that `~/.tmux.conf` contains at least:\n\n   ```\n   # Enable mouse support for scrolling\n   set -g mouse on\n\n   # Set a large scrollback buffer (in lines)\n   set -g history-limit 1000000\n\n   # Hide status bar to reduce flicker\n   set -g status off\n\n   # Reduce escape key delay to reduce flicker\n   set -g escape-time 0\n   ```\n\n## MCP mode\n\nIn this mode, the local MCP server running in the Docker container exposes the `transcribe` tool.\nThe tool is activated from Claude Code running in a Tmux session.\nThe tool sends the transcribed text into the Tmux session's input buffer.\n\n7. Install [Claude Code](https://docs.anthropic.com/en/docs/claude-code/setup).\n\n8. Add the MCP server to Claude (the MCP client).\n\n   Navigate to any of your Claude directories and run:\n\n   ```\n   bash \"${STT_MCP_SERVER_LINUX_PATH}/scripts/add_mcp_server_to_claude.sh\"\n   ```\n\n   Verify the Claude connection to the MCP server with:\n\n   ```\n   claude mcp list\n   ```\n\n   Expected output:\n\n   ```\n   stt-mcp-server-linux: ... ✓ Connected\n   ```\n\n\u003e [!NOTE]\n\u003e The MCP server needs to be added only once, because it is added with `--scope user`.\n\u003e The first-time setup is now complete!\n\n9. 
Navigate to any of your Claude directories and start Claude in a new Tmux session stored under `~/.stt-mcp-server-linux/tmux`.\n   A custom `TMUX_TMPDIR` location is used instead of the default `/tmp/tmux-$(id -u)` so that the Tmux socket can be shared between the Docker host and the container with correct file ownership.\n\n   ```\n   TMUX_TMPDIR=~/.stt-mcp-server-linux/tmux tmux new-session -s claude 'claude'\n   ```\n\n   Then ask Claude to `Run the transcribe tool provided by the stt-mcp-server-linux MCP server`.\n\n   Press and hold the `Right Ctrl` key to activate `Push-to-Talk` functionality.\n   Release the key to perform the transcription and inject the resulting text into Claude.\n\n\u003e [!NOTE]\n\u003e Give the MCP server some time to initialize.\n\u003e You may need to explicitly verify its status with the `/mcp` command.\n\u003e\n\u003e Use `docker logs stt-mcp-server-linux` to check the progress.\n\u003e\n\u003e Once a `{... \"message\": \"Waiting for Right Ctrl key press on ... keyboard ...\"}` log line appears, the transcription feature should be available.\n\u003e\n\u003e The `transcribe` tool returns immediately and runs in the background.\n\u003e Claude Code's terminal remains responsive to your input and commands while keyboard monitoring continues.\n\u003e\n\u003e To stop the transcription service, use `/quit` or close Claude Code.\n\n## Standalone mode (without MCP)\n\nSpeech-to-text transcription can also be performed without MCP.\n\n7. Start a new Tmux session. This example starts a session running bash:\n\n   ```\n   TMUX_TMPDIR=~/.stt-mcp-server-linux/tmux tmux new-session -s bash 'bash'\n   ```\n\n8. 
From the repository directory, start the transcription service in standalone mode:\n\n   ```\n   DEBUG=human MODE=standalone OUTPUT=tmux TMUX_SESSION=bash bash scripts/restart_mcp_server.sh\n   ```\n\n   Press and hold the `Right Ctrl` key to activate `Push-to-Talk` functionality, then release it to perform the transcription.\n\n   The transcribed text will be inserted into the Tmux session's input buffer.\n\n\u003e [!WARNING]\n\u003e Currently, only one transcription container can run at a time.\n\u003e If you start a new container, it will stop and replace the previous one.\n\n# Running tests\n\nTests run inside Docker containers, which provide the required dependencies.\n\n## Unit tests\n\n```\nbash scripts/test_unit.sh\n```\n\n## Integration test\n\nAn end-to-end integration test verifies that text is injected into the Tmux input:\n\n```\nbash scripts/test_tmux_integration.sh\n```\n\n## Type checking\n\nRun static type checking with mypy:\n\n```\nbash scripts/test_mypy.sh\n```\n\n# Implementation overview\n\nThe system uses object composition, with responsibilities separated across multiple classes:\n\n1. **MCPServer**: Handles JSON-RPC communication with Claude using async/await. Manages an event loop that keeps the server responsive while background tasks execute. Routes MCP requests and schedules the speech-to-text service as background asyncio tasks.\n\n2. **AudioRecorder**: Manages audio stream capture and buffering. Provides a start/stop interface for recording sessions.\n\n3. **TranscriptionEngine**: Abstract base class with concrete implementations (WhisperEngine, VoskEngine) for different transcription models.\n\n4. **OutputHandler**: Abstract base class with concrete implementations (TmuxOutputHandler, StdoutOutputHandler) for different output destinations.\n\n5. **KeyboardMonitor**: Handles keyboard device detection and Right Ctrl key event monitoring using evdev. Runs as an async coroutine, so keyboard monitoring does not block the event loop.\n\n6. 
**SpeechToTextService**: Main coordinator that orchestrates all components. Provides `start_async()` for MCP mode (background execution) and `start()` for standalone mode (blocking execution).\n\n# Abandoned ideas\n\n## ydotool for Wayland text injection\nConsidered using ydotool for keyboard access on Wayland systems. Abandoned because:\n- Requires root privileges for `/dev/uinput` access\n- Python wrappers are unmaintained (pydotool, pyydotool, TotoBotKey)\n- Not packaged for Debian\n\n## xdotool for X11 text injection\nAttempted to use xdotool for keyboard simulation on X11. Abandoned because:\n- Most modern Linux desktops run on Wayland, not X11 (check your system with `echo $XDG_SESSION_TYPE`)\n\n## Direct MCP text injection\nInvestigated injecting text directly through the MCP protocol. Abandoned because:\n- It seems that MCP tools can only return content to Claude (as output), not inject it into the input stream\n\n## Hot-swapping audio devices (microphones) without container restart\nAttempted to support switching between microphones without restarting the container. Abandoned because:\n- Docker's `--device` flag captures a snapshot of `/dev/snd` at container startup. Subsequently selected or hot-plugged audio devices are not visible inside the container.\n- Bind mounting the entire `/dev` directory (`-v /dev:/dev`) to enable real-time device visibility is a security risk.\n- This is a known Docker limitation documented in [moby/moby#39262](https://github.com/moby/moby/issues/39262).\n- Workaround: Restart the container when switching microphones.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarcindulak%2Fstt-mcp-server-linux","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmarcindulak%2Fstt-mcp-server-linux","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarcindulak%2Fstt-mcp-server-linux/lists"}