An open API service indexing awesome lists of open source software.

https://github.com/remifabre/voice2chatgpt

One-key voice-to-transcription tool: record speech, transcribe locally with Whisper, and send to ChatGPT or improve with a local LLM.
https://github.com/remifabre/voice2chatgpt

chatgpt linux llm ollama open-source productivity python speech-recognition transcription voice voice-to-text whisper

Last synced: 4 months ago
JSON representation

One-key voice-to-transcription tool: record speech, transcribe locally with Whisper, and send to ChatGPT or improve with a local LLM.

Awesome Lists containing this project

README

          

# ๐ŸŽ™๏ธ Voice2ChatGPT

**Instant voice capture for transcription, clipboard, and ChatGPT interaction โ€“ all in one keypress.**

## ๐Ÿš€ Main Use Case

This tool makes it **effortless** to capture voice notes or ideas during your workflow. You hit a single key, talk, and it:

- records your voice;
- transcribes it using a local Whisper model;
- copies the text to your clipboard;
- optionally pastes it directly into ChatGPT;
- saves the audio and transcript into a clean, timestamped folder.

This is ideal for:

- code commentary,
- journaling,
- bug reporting,
- voice-based chat prompting,
- hands-free idea dumps.

---

## โœจ Features

- ๐ŸŽค Voice recording from a keypress (with visual feedback).
- ๐Ÿ”  Local Whisper transcription (via `faster-whisper`).
- ๐Ÿ“‹ Automatically copies text to clipboard.
- ๐Ÿง  [Optional] Local LLM cleanup & smart filename generation (via Ollama).
- ๐Ÿ’ฌ Paste directly into ChatGPT (existing or new tab).
- ๐Ÿ—‚๏ธ Saved as daily folders with time-based subfolders (`recordings/YYYY-MM-DD/HH-MM-SS/`).
- โŒจ๏ธ Can be launched with a **global keyboard shortcut**.

---

## ๐Ÿงฐ Requirements

Tested on **Ubuntu 22.04** with:

- Python 3.10+
- `faster-whisper` (for transcription)
- `ollama` with a small model (e.g. `gemma:2b`) [optional]
- `xdotool`, `ffmpeg`, `playsound`, `pyautogui`, `pyperclip`, `pynput`, `requests`

---

## ๐Ÿ“ฆ Installation

Create a fresh Python virtual environment:

```bash
python3 -m venv ~/.virtualenvs/voice2chatgpt
source ~/.virtualenvs/voice2chatgpt/bin/activate
pip install -r requirements.txt
````

You may also need system packages:

```bash
sudo apt install portaudio19-dev xdotool ffmpeg scrot
```

> Tip: If `playsound` gives warnings, ignore them or switch to a custom sound player.

---

## ๐Ÿง  Optional: Local LLM setup

To enable the text improvement and filename suggestion feature (mode 4):

1. [Install Ollama](https://ollama.com/)
2. If needed run `ollama serve`
3. Run:

```bash
ollama run gemma:2b
```
4. Make sure `OLLAMA_URL` and `OLLAMA_MODEL` are configured in `voice_transcriber.py`.

If Ollama is not available, the script will still function normally (just without smart cleanup).

---

## ๐Ÿ–ฑ๏ธ Launch with a Global Shortcut (Ubuntu only)

You can launch the tool with a single shortcut from anywhere:

1. Use the `run_transcriber.sh` file in this repo as a launcher.

2. Edit the paths inside it:

```bash
#!/bin/bash
source /home/YOUR_USER/.virtualenvs/voice2chatgpt/bin/activate
cd /home/YOUR_USER/path/to/voice2chatgpt
gnome-terminal -- bash -c 'python3 voice_transcriber.py; exec bash'
```

3. Make it executable:

```bash
chmod +x run_transcriber.sh
```

4. Go to **Settings > Keyboard > Shortcuts**, add a **custom shortcut**:

* Name: `Voice2ChatGPT`
* Command: `/full/path/to/run_transcriber.sh`
* Shortcut: for example `Ctrl + Alt + U`

That's it! From now on, pressing your chosen shortcut will open a terminal, start recording, and you can begin speaking immediately.

> ๐Ÿง  Similar shortcut systems can be set up on other OSes using AutoHotKey (Windows) or Automator (macOS), but are not included in this guide.

---

## ๐Ÿ—ƒ๏ธ Folder Structure

Each session is stored in:

```
recordings/
โ””โ”€โ”€ 2025-05-03/
โ””โ”€โ”€ 14-38-12/
โ”œโ”€โ”€ audio.wav
โ””โ”€โ”€ transcript.txt
```

If mode 4 is used, the folder will be renamed to include the suggested topic (e.g., `14-38-12_MercuryDashboardFix`).

---

## ๐Ÿงช Modes (choose after recording)

| Key | Action |
| --- | ------------------------------------ |
| 1 | Show transcription (default) |
| 2 | Paste into existing ChatGPT tab |
| 3 | Open ChatGPT and paste |
| 4 | Use local LLM to clean text & rename |
| 5 | Cancel (discard all) |

> Text is always copied to clipboard automatically.

---

## ๐Ÿ› ๏ธ TODO / Known Limitations

* Local LLM punctuation is optional, and may be slow on GPUs with limited VRAM.
* Visual ChatGPT field detection relies on screenshots (may be fragile).
* Currently Linux-only for automation features (xdotool, pyautogui).

---

## ๐Ÿงก Credits

* Whisper transcription by [faster-whisper](https://github.com/guillaumekln/faster-whisper)
* Optional LLM via [Ollama](https://ollama.com/)
* ChatGPT integration via Firefox + xdotool