{"id":49238051,"url":"https://github.com/deepgram/voice-keyboard-linux","last_synced_at":"2026-04-24T17:37:31.487Z","repository":{"id":313402979,"uuid":"1018202318","full_name":"deepgram/voice-keyboard-linux","owner":"deepgram","description":"Linux virtual keyboard driver which types what you say using Deepgram Flux STT API","archived":false,"fork":false,"pushed_at":"2025-11-06T18:08:15.000Z","size":151,"stargazers_count":6,"open_issues_count":0,"forks_count":3,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-06T20:20:40.769Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"isc","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/deepgram.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-07-11T19:38:10.000Z","updated_at":"2025-11-06T18:08:12.000Z","dependencies_parsed_at":"2025-09-05T21:16:26.500Z","dependency_job_id":"0caba3ae-d156-49b9-9799-10c2da9793b3","html_url":"https://github.com/deepgram/voice-keyboard-linux","commit_stats":null,"previous_names":["deepgram/voice-keyboard-linux"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/deepgram/voice-keyboard-linux","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepgram%2Fvoice-keyboard-linux","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepgram%2Fvoice-keyboard-linux/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepgram%2Fvoice-keyboard-linux/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepgram%2Fvoice-keyboard-linux/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/deepgram","download_url":"https://codeload.github.com/deepgram/voice-keyboard-linux/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepgram%2Fvoice-keyboard-linux/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32234726,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-24T13:21:15.438Z","status":"ssl_error","status_checked_at":"2026-04-24T13:21:15.005Z","response_time":64,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-04-24T17:37:30.964Z","updated_at":"2026-04-24T17:37:31.480Z","avatar_url":"https://github.com/deepgram.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Voice Keyboard\n\nVoice keyboard is a demo application showcasing Deepgram's new turn-taking speech-to-text API: **Flux**.\n\nA voice-controlled Linux virtual keyboard that converts speech to text and types it into any application.\n\nAs a result of directly targeting Linux as a driver, this works with all Linux applications.\n\n## Features\n\n- **Voice-to-Text**: Real-time speech recognition using Deepgram's **Flux** API service (turn-taking STT)\n- **Virtual Keyboard**: Creates a virtual input device that works with all applications\n- **Incremental Typing**: Smart transcript updates with minimal backspacing for real-time corrections\n\n## Architecture\n\nThe application solves a common Linux privilege problem:\n- **Virtual keyboard creation** requires root access to `/dev/uinput`\n- **Audio input** requires user-space access to PipeWire/PulseAudio\n\n**Solution**: The application starts with root privileges, creates the virtual keyboard, then drops privileges to access the user's audio session.\n\n## Installation\n\n### Prerequisites\n\n```bash\n# Install Rust\ncurl --proto '=https' --tlsv1.2 -sSf https://rustup.rs | sh\n\n# Install required system packages (Fedora/RHEL)\nsudo dnf install alsa-lib-devel\n\n# Install required system packages (Ubuntu/Debian)\nsudo apt install libasound2-dev\n```\n\n### Build\n\n```bash\ngit clone \u003crepository-url\u003e\ncd voice-keyboard\ncargo build\n```\n\n### Acquire a Deepgram API key\n\nYou’ll need a Deepgram API key to authenticate with Flux.\n\n- Create or manage keys in the Deepgram console: [Create additional API keys](https://developers.deepgram.com/docs/create-additional-api-keys)\n- Export the key so the app can pick it up (recommended):\n  ```bash\n  export DEEPGRAM_API_KEY=\"dg_your_api_key_here\"\n  ```\n- The client sends the header `Authorization: Token \u003cDEEPGRAM_API_KEY\u003e`.\n- For CI or systemd services, set `DEEPGRAM_API_KEY` in the environment for the service user.\n- Security tip: treat API keys like passwords. Prefer env vars over committing keys to files.\n\n## Usage\n\n### Easy Method (Recommended)\n\nUse the provided runner script:\n\n```bash\n./run.sh\n```\n\n### Manual Method\n\n```bash\n# Build and run with proper privilege handling\ncargo build\nsudo -E ./target/debug/voice-keyboard --test-stt\n```\n\n**Important**: Always use `sudo -E` to preserve environment variables needed for audio access.\n\n## Speech-to-Text Service\n\nThis application uses **Deepgram Flux**, the company's new turn‑taking STT API. The default WebSocket URL is `wss://api.deepgram.com/v2/listen`.\n\n## Command Line Options\n\n```bash\nvoice-keyboard [OPTIONS]\n\nOPTIONS:\n    --test-audio        Test audio input and show levels\n    --test-stt          Test speech-to-text functionality (default if no other mode specified)\n    --debug-stt         Debug speech-to-text (print transcripts without typing)\n    --stt-url \u003cURL\u003e     Custom STT service URL (default: wss://api.deepgram.com/v2/listen)\n    -h, --help          Print help information\n    -V, --version       Print version information\n```\n\n**Note**: If no mode is specified, the application defaults to `--test-stt` behavior.\n\n## How It Works\n\n1. **Initialization**: Application starts with root privileges\n2. **Virtual Keyboard**: Creates `/dev/uinput` device as root\n3. **Privilege Drop**: Drops to original user privileges\n4. **Audio Access**: Accesses PipeWire/PulseAudio in user space\n5. **Speech Recognition**: Streams audio to **Deepgram Flux** STT service\n6. **Incremental Typing**: Updates text in real-time with smart backspacing\n7. **Turn Finalization**: Clears tracking on \"EndOfTurn\" events (user presses Enter manually)\n\n### Transcript Handling\n\nThe application provides sophisticated real-time transcript updates:\n\n- **Incremental Updates**: As speech is recognized, the application updates the typed text by finding the common prefix between the current and new transcript, backspacing only the changed portion, and typing the new ending\n- **Smart Backspacing**: Minimizes cursor movement by only removing characters that actually changed\n- **Turn Management**: On \"EndOfTurn\" events, the application clears its internal tracking but doesn't automatically press Enter, allowing users to review before submitting\n\n## About Deepgram Flux (Early Access)\n\n- **Endpoint**: `wss://api.deepgram.com/v2/listen`\n- **What it is**: Flux is Deepgram's turn‑taking, low‑latency STT API designed for conversational experiences.\n- **Authentication**: Send an `Authorization` header. Common forms:\n  - `Token \u003cDEEPGRAM_API_KEY\u003e` (what this app uses)\n  - `token \u003cDEEPGRAM_API_KEY\u003e` or `Bearer \u003cJWT\u003e` are also accepted by the platform\n- **Message types** (each server message includes a JSON `type` field):\n  - `Connected` — initial connection confirmation\n  - `TurnInfo` — streaming transcription updates with fields: `event` (`Update`, `StartOfTurn`, `Preflight`, `SpeechResumed`, `EndOfTurn`), `turn_index`, `audio_window_start`, `audio_window_end`, `transcript`, `words[] { word, confidence }`, `end_of_turn_confidence`\n  - `Error` — fatal error with fields: `code`, `description` (may also include a close code)\n  - `Configuration` — echoes/acknowledges configuration (e.g., thresholds) when provided\n- **Client close protocol**: After sending your final audio, send a control message:\n  - `{ \"type\": \"CloseStream\" }`\n  The server will flush any remaining responses and then close the WebSocket.\n- **Update cadence**: Flux produces updates about every **240 ms** with a typical worst‑case latency of ~**500 ms**.\n- **Common query parameters** (as supported by the preview spec):\n  - `model`, `encoding`, `sample_rate`, `preflight_threshold`, `eot_threshold`, `eot_timeout_ms`, `keyterm`, `mip_opt_out`, `tag`\n\n## Security\n\n- **Minimal Root Time**: Only root during virtual keyboard creation\n- **Environment Preservation**: Maintains user's audio session access\n- **Clean Privilege Drop**: Properly drops both user and group privileges\n- **No System Changes**: No permanent system configuration required\n\n## Troubleshooting\n\n### Audio Issues\n\nIf you get \"Host is down\" or \"I/O error\" when testing audio:\n\n1. **Use `sudo -E`**: Always preserve environment variables\n2. **Check PipeWire**: Ensure PipeWire is running: `systemctl --user status pipewire`\n3. **Test without sudo**: Try `./target/debug/voice-keyboard --test-audio` (will fail on keyboard creation but audio should work)\n\n### Permission Issues\n\nIf you get \"Permission denied\" for `/dev/uinput`:\n\n1. **Check uinput module**: `sudo modprobe uinput`\n2. **Verify device exists**: `ls -la /dev/uinput`\n3. **Use sudo**: The application is designed to run with `sudo -E`\n\n## Development\n\n### Project Structure\n\n```\nsrc/\n├── main.rs              # Main application and privilege dropping\n├── virtual_keyboard.rs  # Virtual keyboard device management\n├── audio_input.rs       # Audio capture and processing\n├── stt_client.rs        # WebSocket STT client\n└── input_event.rs       # Linux input event constants\n```\n\n### Key Components\n\n- **OriginalUser**: Captures and restores user context\n- **VirtualKeyboard**: Manages uinput device lifecycle with smart transcript updates\n- **AudioInput**: Cross-platform audio capture\n- **SttClient**: WebSocket-based speech-to-text client\n- **AudioBuffer**: Manages audio chunking for STT streaming\n\n## License\n\nISC License. See LICENSE.txt\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeepgram%2Fvoice-keyboard-linux","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdeepgram%2Fvoice-keyboard-linux","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeepgram%2Fvoice-keyboard-linux/lists"}