{"id":26122200,"url":"https://github.com/mpaepper/vibevoice","last_synced_at":"2025-04-13T13:15:44.597Z","repository":{"id":277354899,"uuid":"930567533","full_name":"mpaepper/vibevoice","owner":"mpaepper","description":"Fast local speech-to-text for any app using faster-whisper","archived":false,"fork":false,"pushed_at":"2025-03-31T13:51:48.000Z","size":1037,"stargazers_count":61,"open_issues_count":0,"forks_count":5,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-13T13:15:34.143Z","etag":null,"topics":["ai","machine-learning","python"],"latest_commit_sha":null,"homepage":"https://www.paepper.com/blog/posts/vibe-coding-with-vibevoice-fast-speech-to-text-for-any-app/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mpaepper.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-10T20:59:33.000Z","updated_at":"2025-04-04T04:30:59.000Z","dependencies_parsed_at":"2025-02-13T14:31:14.036Z","dependency_job_id":"778dea4d-3da6-45fc-92dd-ab01d3fb8ca6","html_url":"https://github.com/mpaepper/vibevoice","commit_stats":null,"previous_names":["mpaepper/vibevoice"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpaepper%2Fvibevoice","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpaepper%2Fvibevoice/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpaepper%2Fvibevoice/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpaepper%2Fvi
bevoice/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mpaepper","download_url":"https://codeload.github.com/mpaepper/vibevoice/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248717238,"owners_count":21150389,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","machine-learning","python"],"created_at":"2025-03-10T14:44:24.509Z","updated_at":"2025-04-13T13:15:44.588Z","avatar_url":"https://github.com/mpaepper.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Vibevoice 🎙️\n\nHi, I'm [Marc Päpper](https://x.com/mpaepper) and I wanted to vibe code like [Karpathy](https://x.com/karpathy/status/1886192184808149383) ;D, so I looked around and found the cool work of [Vlad](https://github.com/vlad-ds/whisper-keyboard). I extended it to run with a local whisper model, so I don't need to pay for OpenAI tokens.\nI hope you have fun with it!\n\n## What it does 🚀\n\n![Demo Video](docs/vibevoice-demo-caption.gif)\n\nSimply run `cli.py` and start dictating text anywhere in your system:\n1. Hold down the right control key (Ctrl_r)\n2. Speak your text\n3. Release the key\n4. Watch as your spoken words are transcribed and automatically typed!\n\nWorks in any application or window - your text editor, browser, chat apps, anywhere you can type!\n\nNEW: LLM voice command mode:\n\n1. Hold down the scroll_lock key (I chose it because I think it's normally not used anymore)\n2. Speak what you want the LLM to do\n3. 
The LLM receives your transcribed text and a screenshot of your current view\n4. The LLM's answer is typed out as keyboard input (streamed)\n\nThis works everywhere on your system, and the LLM always has the screen context.\n\n## Installation 🛠️\n\n```bash\ngit clone https://github.com/mpaepper/vibevoice.git\ncd vibevoice\npip install -r requirements.txt\npython src/vibevoice/cli.py\n```\n\n## Requirements 📋\n\n### Python Dependencies\n- Python 3.12 or higher\n\n### System Requirements\n- CUDA-capable GPU (recommended); you can enable CPU use in `server.py`\n- CUDA 12.x\n- cuBLAS\n- cuDNN 9.x\n- If you get the error `OSError: PortAudio library not found`, run `sudo apt install libportaudio2`\n- [Ollama](https://ollama.com) for AI command mode (with multimodal models for screenshot support)\n\n#### Setting up Ollama\n1. Install Ollama by following the instructions at [ollama.com](https://ollama.com)\n2. Pull a model that supports both text and images for best results:\n   ```bash\n   ollama pull gemma3:27b  # Great model which can run on RTX 3090 or similar\n   ```\n3. 
Make sure Ollama is running in the background:\n   ```bash\n   ollama serve\n   ```\n\n#### Handling the CUDA requirements\n\n* Make sure that you have CUDA \u003e= 12.4 and cuDNN \u003e= 9.x\n* I had some trouble at first with Ubuntu 24.04, so I did the following:\n\n```bash\nsudo apt update \u0026\u0026 sudo apt upgrade\nsudo apt autoremove nvidia* --purge\nubuntu-drivers devices\nsudo ubuntu-drivers autoinstall\nwget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb\nsudo dpkg -i cuda-keyring_1.1-1_all.deb \u0026\u0026 sudo apt update\nsudo apt install cuda-toolkit-12-8\n```\nor alternatively:\n\n```bash\nwget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb\nsudo dpkg -i cuda-keyring_1.1-1_all.deb\nsudo apt update\nsudo apt install cudnn9-cuda-12\n```\n\n* After rebooting, it worked well.\n\n## Usage 💡\n\n1. Start the application:\n```bash\npython src/vibevoice/cli.py\n```\n\n2. Hold down the right control key (Ctrl_r) while speaking\n3. Release to transcribe\n4. 
Your text appears wherever your cursor is!\n\n### Configuration\n\nYou can customize various aspects of VibeVoice with the following environment variables:\n\n#### Keyboard Controls\n- `VOICEKEY`: Change the dictation activation key (default: \"ctrl_r\")\n  ```bash\n  export VOICEKEY=\"ctrl\"  # Use left control instead\n  ```\n- `VOICEKEY_CMD`: Set the key for AI command mode (default: \"scroll_lock\")\n  ```bash\n  export VOICEKEY_CMD=\"ctrl\"  # Use left control instead of Scroll Lock key\n  ```\n\n#### AI and Screenshot Features\n- `OLLAMA_MODEL`: Specify which Ollama model to use (default: \"gemma3:27b\")\n  ```bash\n  export OLLAMA_MODEL=\"gemma3:4b\"  # Use a smaller VLM if you have less GPU memory\n  ```\n- `INCLUDE_SCREENSHOT`: Enable or disable screenshots in AI command mode (default: \"true\")\n  ```bash\n  export INCLUDE_SCREENSHOT=\"false\"  # Disable screenshots (they stay local anyway)\n  ```\n- `SCREENSHOT_MAX_WIDTH`: Set the maximum width for screenshots (default: \"1024\")\n  ```bash\n  export SCREENSHOT_MAX_WIDTH=\"800\"  # Smaller screenshots\n  ```\n\n#### Screenshot Dependencies\nTo use the screenshot functionality, install `gnome-screenshot`:\n```bash\nsudo apt install gnome-screenshot\n```\n\n## Usage Modes 💡\n\nVibeVoice supports two modes:\n\n### 1. Dictation Mode\n1. Hold down the dictation key (default: right Control)\n2. Speak your text\n3. Release to transcribe\n4. Your text appears wherever your cursor is!\n\n### 2. AI Command Mode\n1. Hold down the command key (default: Scroll Lock)\n2. Ask a question or give a command\n3. Release the key\n4. 
The AI will analyze your request (and current screen if enabled) and type a response\n\n## Credits 🙏\n\n- Original inspiration: [whisper-keyboard](https://github.com/vlad-ds/whisper-keyboard) by Vlad\n- [Faster Whisper](https://github.com/guillaumekln/faster-whisper) for the optimized Whisper implementation\n- Built by [Marc Päpper](https://www.paepper.com)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmpaepper%2Fvibevoice","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmpaepper%2Fvibevoice","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmpaepper%2Fvibevoice/lists"}