https://github.com/thiswillbeyourgithub/quick-whisper-typer
Simple shell script to enter text using whisper using yad, openai, possibly chatgpt
https://github.com/thiswillbeyourgithub/quick-whisper-typer
Last synced: 5 months ago
JSON representation
Simple shell script to enter text using whisper using yad, openai, possibly chatgpt
- Host: GitHub
- URL: https://github.com/thiswillbeyourgithub/quick-whisper-typer
- Owner: thiswillbeyourgithub
- License: gpl-3.0
- Created: 2023-10-24T10:34:46.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-03-27T18:41:26.000Z (11 months ago)
- Last Synced: 2025-04-04T09:51:11.843Z (11 months ago)
- Language: Python
- Size: 325 KB
- Stars: 5
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Awesome Lists containing this project
README
# Quick Whisper Typer
Super simple python script to start recording sound, send it to whisper then have it type for you anywhere.
* Can also modify text according to voice commands.
* Latency is as low as I could (instant if deepgram is used, <1s for openai's whisper).
* It can be seen as a minimalist alternative to [AquaVoice](https://withaqua.com/) and can be extended easily to replace [Deepgram's Shortcut feature](https://deepgram.com/learn/introducing-shortcut-by-poised-voice-ai-tool).t
## The way each task works
### write
1. starts recording
2. when you're done press shift (escape or spacebar to cancel)
3. whisper will transcribe your speech
4.a if `--auto_paste` is True: your current clipboard will be saved, replaced by the transcription, "ctrl+v" will automatically be pressed, then your old clipboard will replace again like nothing happened.
4.b if `--auto_paste` is False: your clipboard will be replaced by the transcription
### transform_clipboard
1. starts recording
2. when you're done press shift (escape or spacebar to cancel)
3. whisper will transcribe your speech
4. the transcription will be interpreted as an instruction for `--llm_model` on how to transform the text found in your clipboard
5. the result will either be pasted or stored in the clipboard like for `--task=write`
### new_voice_chat
1. starts recording
2. when you're done press shift (escape or spacebar to cancel)
3. whisper will transcribe your speech
4. the transcription will be interpreted as the first user message in a conversation with `--llm_model`
5. the result will either be pasted or stored in the clipboard like for `--task=write`, and optionaly read aloud if `--voice_engine` is set
6. To continue the conversation, use the task `--task=continue_voice_chat`
# Examples
* I want to write text: `python quick_whisper_typer.py --task=write --auto_paste`
* I want to translate text: copy the text in to the clipboard then `python quick_whisper_typer.py --task=transform_clipboard --auto_paste`
* I want to start a vocal conversation: `python quick_whisper_typer.py --task="new_voice_chat" --voice_engine='openai'`
* I want to continue the conversation: `python quick_whisper_typer.py --task="continue_voice_chat" --voice_engine='openai'`
* I want to call it from anywhere without setting up keybindings, use `--loop` then press `shift` key several times from anywhere and you'll see a notification appear to trigger the tasks.
## Features
* Supports any spoken languages supported by whisper
* Supports both openai's whisper and [deepgram's whisper](deepgram.com)
* Supports for local transcription by supplying a custom URL.
* For example start [whispercpp](https://github.com/ggerganov/whisper.cpp) with `./server -m models/small_acft_q8_0.bin --threads 8 --audio-ctx 1500 -l fr --no-gpu --debug-mode --convert -p 1` ([models from FUTO](https://github.com/futo-org/whisper-acft/)) and use `--custom_transcription_url="http://127.0.0.1:8080/inference"`
* You can set these environment variables for custom transcription:
* `CUSTOM_WHISPER_API_KEY`: API key for the custom transcription server
* `CUSTOM_WHISPER_MODEL`: Model name to use with the custom transcription server
* Minimalist code
* Low latency: it starts as fast as possible to be ready to listen to you
* Four supported voice_engine: openai, [piper](https://github.com/rhasspy/piper), [deepgram](deepgram.com), espeak (fallback if any of the other fails)
* Optional audio cleanup and long silence removal via sox
* `--loop` to trigger the script from anywhere just by pressing shift multiple times. You can define any king of argument to customize your loop shortcuts by passing a dict to `--loop_tasks`
* Support virtually any type of LLM (ChatGPT, Claude, Huggingface, Llama, etc) thanks to [litellm](https://docs.litellm.ai/).
* Supposedly multiplatform, but I can't test it on anything else than Linux so please open an issue to tell me how it went!
## How to
* Make sure your environment contains the appropriate api keys (eg as OPENAI_API_KEY, MISTRAL_API_KEY, DEEPGRAM_API_KEY etc)
* *optional: add a keyboard shortcut to call this script. See my i3 bindings below.*
* If using deepgram: make sure you are on python 3.10+
* create a venv: `uv venv --python 3.10` and activate it `source .venv/bin/activate`
* `pip install -r requirements.txt`
* if you have issues installing the python package `playsound`, try installing `playsound3` instead.
### Run in the background using systemd units
To always have quick_whisper_typer running in the background, you can do this after modifying the `quick_whisper_typer_launcher.sh` file and `chmod +x it`:
```shell
mkdir -p ~/.config/systemd/user/
echo "[Unit]
Description=quick_whisper_typer
After=graphical-session.target
[Service]
Type=simple
ExecStart=[YOUR_APPROPRIATE_PATH]/Quick_Whisper_Typer/quick_whisper_typer_launcher.sh
Restart=on-failure
RestartSec=10
[Install]
; WantedBy=graphical-session.target
WantedBy=default.target
" > ~/.config/systemd/user/quick_whisper_typer.service
systemctl --user enable quick_whisper_typer.service
systemctl --user start quick_whisper_typer.service
```
### i3 bindings
```
mode "$mode_launch_microphone" {
# enter text
bindsym f exec /PATH/TO/quick_whisper_typer.py --task write, mode "default
# edit clipboard
bindsym e exec /PATH/TO/quick_whisper_typer.py --task=transform_clipboard, mode "default"
bindsym v exec /PATH/TO/quick_whisper_typer.py --task=continue_voice_chat, mode "default"
bindsym shift+V exec /PATH/TO/quick_whisper_typer.py --task=new_voice_chat, mode "default"
bindsym Return mode "default"
bindsym Escape mode "default"
}
```
# Credits
* `.ogg` files were in my `/usr/share/sounds/ubuntu/notifications` folder.