{"id":22160904,"url":"https://github.com/themanyone/voice_typing","last_synced_at":"2025-10-28T00:44:58.725Z","repository":{"id":157277650,"uuid":"630316972","full_name":"themanyone/voice_typing","owner":"themanyone","description":"State-of-the-art offline voice typing everywhere + txt terminals (Linux or WFL sesson on Windows.) with a simple bash script. Usable with X. Does not require X.","archived":false,"fork":false,"pushed_at":"2025-05-08T11:55:37.000Z","size":47,"stargazers_count":113,"open_issues_count":1,"forks_count":9,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-05-26T08:55:10.687Z","etag":null,"topics":["bash","command-line","free","lightweight","open-source","privacy","simple","speech-recognition","speech-to-text"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/themanyone.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":"FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-04-20T06:02:06.000Z","updated_at":"2025-05-20T22:45:58.000Z","dependencies_parsed_at":null,"dependency_job_id":"8e2d133c-a265-4881-87b2-7bb5e8a1985b","html_url":"https://github.com/themanyone/voice_typing","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/themanyone/voice_typing","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/themanyone%2Fvoice_typing","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/themanyone%2Fvoice_typing/tags","releases_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/themanyone%2Fvoice_typing/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/themanyone%2Fvoice_typing/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/themanyone","download_url":"https://codeload.github.com/themanyone/voice_typing/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/themanyone%2Fvoice_typing/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281366883,"owners_count":26488696,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-27T02:00:05.855Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bash","command-line","free","lightweight","open-source","privacy","simple","speech-recognition","speech-to-text"],"created_at":"2024-12-02T04:11:14.835Z","updated_at":"2025-10-28T00:44:58.713Z","avatar_url":"https://github.com/themanyone.png","language":"Shell","funding_links":["https://buymeacoffee.com/isreality"],"categories":[],"sub_categories":[],"readme":"# Voice Typing with Openai-Whisper\n\nState-of-the-art, offline voice typing in a Linux tty (or a WSL session on Windows) with a tiny bash script.\n\n**No graphical OS required.** Yet it *does* work with GUI X, Wayland, whatever. Speak text everywhere, like a boss.\n\n- Privacy-focused. 
Uses [Whisper AI](https://github.com/openai/whisper) or [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) for offline speech recognition.\n- Hands-free, using `sox` for rudimentary voice activity detection (VAD).\n- Leverages `ydotool` to type text into any active window (but does not require a graphical OS).\n- Low memory requirements. Resources may be freed between spoken interactions.\n\n## Caveats\n\nWhen `voice_typing` detects speech, it trims unwanted background noise and then loads Whisper, which causes a noticeable wait before text appears. That makes it good for occasional use, and it is the most economical on resources.\n\nFor heavier usage, instead of loading and unloading Whisper multiple times, we have added `voice_client`. It connects to your [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) server. The server may run continuously on the same machine, or on a dedicated GPU server somewhere across the network. Try it. You might see a significant speedup. :)\n\nFor even faster, continuous, networked dictation with more features, try the [whisper_dictation](https://github.com/themanyone/whisper_dictation.git) AI assistant project. Features include a conversational chatbot, AI image generation, and voice-controlled program launchers leveraging the full power of Python.\n\n**End feature creep.** This project is just a starting point, and will remain so. [There is no end to what you might do from here](https://github.com/ReimuNotMoe/ydotool). If you have time, take [record.py](https://github.com/themanyone/whisper_dictation/blob/main/record.py) from [whisper_dictation](https://github.com/themanyone/whisper_dictation.git). Just download the file and adapt this script to use that instead of `sox`. It runs a delay loop \"time machine\" that rewinds to catch the beginning of speech, so you don't have to clear your throat or say \"hey Linus\" before talking. GStreamer 1.0 is required to run it, though. 
And some people prefer minimal [embedded systems on a chip](https://www.reddit.com/r/embedded/comments/16xakmp/how_to_design_a_simple_pcb_running_linux/).\n\n## Requirements\n- [Whisper AI](https://github.com/openai/whisper) or [Whisper.cpp](https://github.com/ggerganov/whisper.cpp)\n- [ffmpeg](https://ffmpeg.org/)\n- [sox](https://sox.sourceforge.net/)\n- [lame](https://lame.sourceforge.io/)\n- \u003cdel\u003e[xdotool](https://github.com/jordansissel/xdotool)\u003c/del\u003e\n  - [ydotool](https://github.com/ReimuNotMoe/ydotool)\n- [tmux](https://github.com/tmux/tmux/wiki) or [screen](https://linuxize.com/post/how-to-use-linux-screen/) (optional)\n- [curl](https://curl.se/) (for clients)\n\n## Install Dependencies\n\nThis assumes [Whisper AI](https://github.com/openai/whisper) or [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) and dependencies are installed and working. Most are available through the official software update app for each platform. Please examine the `voice_typing` and `voice_client` scripts and see how easy they are to customize for any occasion. They are only around 50 lines. Do not run untrusted code.\n\nFedora/CentOS:\n```\nsudo dnf -y install sox curl lame ydotool\nsudo cp /usr/lib/systemd/system/ydotool.service /etc/systemd/system/\nsudo chmod +x /etc/systemd/system/ydotool.service\nsudo systemctl daemon-reload\nsudo systemctl enable ydotool\nsudo systemctl start ydotool\nsudo chmod +s $(which ydotool)\n```\n\nYou might need Rpmfusion-freeworld installed to get versions of `lame` and `sox` that write mp3 files. 
`sudo dnf install https://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm`\n\nThe `ydotool` package has instructions in `/usr/share/doc/ydotool/README.md`, which notes that the man page may not be up to date.\n\nDebian-based systems:\n```\nsudo apt install sox curl lame ydotool openai-whisper libsox-fmt-mp3 scdoc\n```\n\nIf ydotool is not available, or you need a later version, snwfdhmp suggested building it from source:\n\n```\ngit clone https://github.com/ReimuNotMoe/ydotool\ncd ydotool\nmkdir build\ncd build\ncmake -DSYSTEMD_SYSTEM_SERVICE=ON -DSYSTEMD_USER_SERVICE=OFF ..\nmake -j `nproc`\nsudo ln -s $(pwd)/ydotool /usr/local/bin/ydotool\nsudo ln -s $(pwd)/ydotoold /usr/local/bin/ydotoold\nsudo cp ./ydotoold.service /etc/systemd/system/\nsudo systemctl daemon-reload\nsudo systemctl enable ydotoold.service # later changed to ydotool.service\nsudo systemctl start ydotoold.service\n```\nFor quick troubleshooting:\n```\nsudo systemctl status ydotoold  # Should show \"Active: active (running)\"\njournalctl -u ydotoold -b | tail -n 20\n```\n\n## Setup\n\nEdit `.bashrc` and add the line `export YDOTOOL_SOCKET=/tmp/.ydotool_socket`\n\n```\ngit clone https://github.com/themanyone/voice_typing.git\nsudo systemctl enable ydotool.service\nsudo systemctl start ydotool.service\nydotool type hello!\ncd voice_typing\n./voice_typing\n```\n\nSpeak and text appears. No other interaction is required.\n\n## Optional Whisper.cpp client/server setup.\n\nCompile [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) with some type of acceleration for best results. We are using cuBLAS for about 4x speedup. If it complains about an unsupported compiler, the best option is to search for rpms to install the compatibility version of `gcc`, currently `gcc-13`. Fedora 42 with version 15 of gcc is not supported. But it can work if you remove the compatibility version of `gcc-14` and replace it with `gcc-13` from the Fedora 41 repos. 
[Refer to the writeup for whisper_dictation](https://github.com/themanyone/whisper_dictation#Preparation).\n\nYou may run into [issues](https://github.com/ggerganov/whisper.cpp/issues/1587) that make you want to try compiling with `-allow-unsupported-compiler`. Not recommended. The `-ng` (no graphics) flag will make it work though. Running with `-ng` is not ideal, but matrix multiplications will still use cuBLAS for CPU, giving about a 2x speedup, similar to OpenBLAS.\n\nTo minimize resources, launch `whisper-server` with `ggml-tiny.en.bin`. It uses just over 111 MiB VRAM on our budget laptop. (48 MiB with `ggml-tiny.en-q4_0.bin`, quantized to 4 bits, which is also usable with no graphics card.)\n\n```shell\n./whisper-server -l en -m models/ggml-tiny.en.bin --port 7777 --convert\n```\n\nEdit `voice_client` to change the server location from localhost to wherever it resides on the network.\n\nRun it.\n```shell\n./voice_client\n```\n\n## Notes\n\n- Adjust mic volume for best results. If recording never stops, mic volume is up too high. If you can't adjust volume for some reason, edit `voice_typing` or `voice_client` and change the silence-detection thresholds from 4% and 2% to something higher, for example:\n```rec -c 1 -r 22050 -t mp3 \"$tmp\" silence 1 0.2 6% 1 1.0 5%```\n\n- Optionally create a keybinding for mic mute/unmute. If there is continuous noise in the background, it goes into a recording loop and never gets around to typing text.\n\n- The first run of `voice_typing` might be slow because it needs to download the model. Better yet, run whisper or whisper.cpp from the command line first to download the (tiny) model.\n\n## Troubleshooting\n\"failed to connect socket `/tmp/.ydotool_socket': Permission denied\" Error\n\nWhen encountering the error \"failed to connect socket `/tmp/.ydotool_socket': Permission denied,\" it's essential to ensure that the current user has sufficient permissions to access the socket file. 
Here are some steps to troubleshoot this issue:\n\nCheck User Permissions and Service Status.\n    Ensure that the user has been added to the \"input\" group and has the necessary permissions to access the socket file.\n    Verify the status of the ydotool service to ensure it is running as expected.\n\nSetuid Bit on the Executable.\n    Consider setting the setuid bit on the ydotool executable using the command:\n\n    sudo chmod +s $(which ydotool)\n\nThis step can help address permission issues when running ydotool as a user.\n\nAddress Already in Use.\n    If encountering the error \"failed to bind socket: Address already in use,\" it may be necessary to delete the socket file from /tmp to resolve the issue.\n\nLinking to the Expected Socket.\n    If ydotool, started as a user, looks for the socket \"/run/user/1000/.ydotool_socket\" but the daemon, running as a system-wide service, listens on /tmp/.ydotool_socket, consider creating a link to the expected socket to ensure proper functionality.\n\nReport other issues in the [GitHub issue tracker](https://github.com/themanyone/voice_typing).\n\nThanks for trying voice_typing!\n\n## Similar Projects\n\n- [Whisper Typer Tool](https://github.com/dynamiccreator/whisper-typer-tool)\n- [Whisper Dictation](https://github.com/themanyone/whisper_dictation.git)\n\n### Thanks for trying out Voice Typing!\n\n- GitHub https://github.com/themanyone\n- YouTube https://www.youtube.com/themanyone\n- Mastodon https://mastodon.social/@themanyone\n- LinkedIn https://www.linkedin.com/in/henry-kroll-iii-93860426/\n- Buy me a coffee https://buymeacoffee.com/isreality\n- [TheNerdShow.com](http://thenerdshow.com/)\n\nCopyright (C) 2024-2025 Henry Kroll III, www.thenerdshow.com.\nSee [LICENSE](LICENSE) for 
details.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthemanyone%2Fvoice_typing","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthemanyone%2Fvoice_typing","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthemanyone%2Fvoice_typing/lists"}