https://github.com/remifabre/voice2chatgpt
One-key voice-to-transcription tool: record speech, transcribe locally with Whisper, and send to ChatGPT or improve with a local LLM.
https://github.com/remifabre/voice2chatgpt
chatgpt linux llm ollama open-source productivity python speech-recognition transcription voice voice-to-text whisper
Last synced: 4 months ago
JSON representation
One-key voice-to-transcription tool: record speech, transcribe locally with Whisper, and send to ChatGPT or improve with a local LLM.
- Host: GitHub
- URL: https://github.com/remifabre/voice2chatgpt
- Owner: RemiFabre
- Created: 2025-05-02T18:27:39.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-05-15T06:45:48.000Z (5 months ago)
- Last Synced: 2025-05-15T07:34:13.053Z (5 months ago)
- Topics: chatgpt, linux, llm, ollama, open-source, productivity, python, speech-recognition, transcription, voice, voice-to-text, whisper
- Language: Python
- Homepage: https://github.com/RemiFabre/voice2chatgpt
- Size: 82 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# ๐๏ธ Voice2ChatGPT
**Instant voice capture for transcription, clipboard, and ChatGPT interaction โ all in one keypress.**
## ๐ Main Use Case
This tool makes it **effortless** to capture voice notes or ideas during your workflow. You hit a single key, talk, and it:
- records your voice;
- transcribes it using a local Whisper model;
- copies the text to your clipboard;
- optionally pastes it directly into ChatGPT;
- saves the audio and transcript into a clean, timestamped folder.This is ideal for:
- code commentary,
- journaling,
- bug reporting,
- voice-based chat prompting,
- hands-free idea dumps.---
## โจ Features
- ๐ค Voice recording from a keypress (with visual feedback).
- ๐ Local Whisper transcription (via `faster-whisper`).
- ๐ Automatically copies text to clipboard.
- ๐ง [Optional] Local LLM cleanup & smart filename generation (via Ollama).
- ๐ฌ Paste directly into ChatGPT (existing or new tab).
- ๐๏ธ Saved as daily folders with time-based subfolders (`recordings/YYYY-MM-DD/HH-MM-SS/`).
- โจ๏ธ Can be launched with a **global keyboard shortcut**.---
## ๐งฐ Requirements
Tested on **Ubuntu 22.04** with:
- Python 3.10+
- `faster-whisper` (for transcription)
- `ollama` with a small model (e.g. `gemma:2b`) [optional]
- `xdotool`, `ffmpeg`, `playsound`, `pyautogui`, `pyperclip`, `pynput`, `requests`---
## ๐ฆ Installation
Create a fresh Python virtual environment:
```bash
python3 -m venv ~/.virtualenvs/voice2chatgpt
source ~/.virtualenvs/voice2chatgpt/bin/activate
pip install -r requirements.txt
````You may also need system packages:
```bash
sudo apt install portaudio19-dev xdotool ffmpeg scrot
```> Tip: If `playsound` gives warnings, ignore them or switch to a custom sound player.
---
## ๐ง Optional: Local LLM setup
To enable the text improvement and filename suggestion feature (mode 4):
1. [Install Ollama](https://ollama.com/)
2. If needed run `ollama serve`
3. Run:```bash
ollama run gemma:2b
```
4. Make sure `OLLAMA_URL` and `OLLAMA_MODEL` are configured in `voice_transcriber.py`.If Ollama is not available, the script will still function normally (just without smart cleanup).
---
## ๐ฑ๏ธ Launch with a Global Shortcut (Ubuntu only)
You can launch the tool with a single shortcut from anywhere:
1. Use the `run_transcriber.sh` file in this repo as a launcher.
2. Edit the paths inside it:
```bash
#!/bin/bash
source /home/YOUR_USER/.virtualenvs/voice2chatgpt/bin/activate
cd /home/YOUR_USER/path/to/voice2chatgpt
gnome-terminal -- bash -c 'python3 voice_transcriber.py; exec bash'
```3. Make it executable:
```bash
chmod +x run_transcriber.sh
```4. Go to **Settings > Keyboard > Shortcuts**, add a **custom shortcut**:
* Name: `Voice2ChatGPT`
* Command: `/full/path/to/run_transcriber.sh`
* Shortcut: for example `Ctrl + Alt + U`That's it! From now on, pressing your chosen shortcut will open a terminal, start recording, and you can begin speaking immediately.
> ๐ง Similar shortcut systems can be set up on other OSes using AutoHotKey (Windows) or Automator (macOS), but are not included in this guide.
---
## ๐๏ธ Folder Structure
Each session is stored in:
```
recordings/
โโโ 2025-05-03/
โโโ 14-38-12/
โโโ audio.wav
โโโ transcript.txt
```If mode 4 is used, the folder will be renamed to include the suggested topic (e.g., `14-38-12_MercuryDashboardFix`).
---
## ๐งช Modes (choose after recording)
| Key | Action |
| --- | ------------------------------------ |
| 1 | Show transcription (default) |
| 2 | Paste into existing ChatGPT tab |
| 3 | Open ChatGPT and paste |
| 4 | Use local LLM to clean text & rename |
| 5 | Cancel (discard all) |> Text is always copied to clipboard automatically.
---
## ๐ ๏ธ TODO / Known Limitations
* Local LLM punctuation is optional, and may be slow on GPUs with limited VRAM.
* Visual ChatGPT field detection relies on screenshots (may be fragile).
* Currently Linux-only for automation features (xdotool, pyautogui).---
## ๐งก Credits
* Whisper transcription by [faster-whisper](https://github.com/guillaumekln/faster-whisper)
* Optional LLM via [Ollama](https://ollama.com/)
* ChatGPT integration via Firefox + xdotool