https://github.com/yuhao-he/voice-input-assistant
Cloud-based voice input
gcp gemini producitivity voice-to-text
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/yuhao-he/voice-input-assistant
- Owner: yuhao-he
- Created: 2026-02-10T07:45:48.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-03-07T09:21:16.000Z (3 months ago)
- Last Synced: 2026-03-07T16:44:44.650Z (3 months ago)
- Topics: gcp, gemini, producitivity, voice-to-text
- Language: Python
- Homepage:
- Size: 818 KB
- Stars: 3
- Watchers: 0
- Forks: 2
- Open Issues: 5
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-gcp-llm-projects - Voice Input Assistant - A cloud-based voice input assistant using GCP and Gemini for enhanced productivity. #Voice #Productivity #Gemini (Projects / Chatbots & Assistants)
README
# VIA
**VIA (Voice Input Assistant)** is a desktop application for real-time speech transcription. It transcribes your speech with high accuracy and acts as an intelligent typing assistant in any application you use.
Supports macOS, Windows, and Linux (X11 only; Wayland is not supported).
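If you are unsure whether your Linux session is X11 or Wayland, the following is a minimal sketch of one way to check. It relies on the standard `XDG_SESSION_TYPE` environment variable; the helper names are hypothetical and are not part of VIA.

```python
import os
import sys


def linux_session_type(env=None):
    """Return the lowercase XDG session type ('x11', 'wayland', ...) or '' if unset."""
    env = os.environ if env is None else env
    return env.get("XDG_SESSION_TYPE", "").strip().lower()


def is_supported_platform(platform=None, env=None):
    """Hypothetical helper: mirror the support matrix stated in this README."""
    platform = sys.platform if platform is None else platform
    if platform.startswith("linux"):
        # Assumption: an unset session type is treated as "not Wayland".
        return linux_session_type(env) != "wayland"
    return platform in ("darwin", "win32")


if __name__ == "__main__":
    print("Supported platform:", is_supported_platform())
```

On many distributions you can choose an X11 ("Xorg") session on the login screen if this reports a Wayland session.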
## Features & Capabilities
- **Instant Voice Input:** Easily trigger transcription via push-to-talk (hold) or tap-to-talk modes.
- **AI Post-Processing:** Optionally refine, format, or improve your transcribed text automatically using Gemini before it even hits your screen.
- **Auto-Paste Integration:** Your transcribed and processed text is immediately inserted into whatever app or window you are actively working in.
- **Floating Transcript Interface:** View your live transcription through a non-intrusive floating overlay.
- **History & Editing Control:** Access a chat-like history menu of your prior voice notes, giving you full control to copy, edit, or re-insert transcripts as needed.
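To illustrate how the two trigger modes differ, here is a small sketch of the press/release logic. It is an illustration only, not VIA's actual implementation; the `TalkTrigger` class and its mode names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TalkTrigger:
    """Tracks whether recording should be active for a given trigger mode."""

    mode: str = "hold"        # "hold" = push-to-talk, "tap" = tap-to-talk
    recording: bool = False
    _down: bool = False       # True while the hotkey is physically held

    def on_press(self) -> bool:
        if self._down:
            # Ignore auto-repeat press events sent while the key is held.
            return self.recording
        self._down = True
        if self.mode == "hold":
            self.recording = True                 # record while held
        else:
            self.recording = not self.recording   # each tap toggles
        return self.recording

    def on_release(self) -> bool:
        self._down = False
        if self.mode == "hold":
            self.recording = False                # releasing stops recording
        return self.recording
```

In hold mode, recording lasts exactly as long as the key is down. In tap mode, the first tap starts recording and the next tap stops it.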
## Prerequisites
- **Python 3.10+**
- **Google Cloud account** with a billing-enabled project
- A **Google Cloud API key**
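If you plan to run from source, a quick way to confirm that your interpreter meets the Python requirement is sketched below. The `python_is_supported` helper is hypothetical and not part of VIA.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the prerequisites


def python_is_supported(version_info=None):
    """Return True if the given (or current) interpreter is at least MIN_PYTHON."""
    v = sys.version_info if version_info is None else version_info
    return tuple(v[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
```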
## API Key Setup
1. Go to [console.cloud.google.com](https://console.cloud.google.com)
2. Select or create a project with billing enabled
3. Create an API key (**APIs & Services → Credentials → Create Credentials → API key**)
4. Enable the required APIs (**APIs & Services → Library**), and restrict the API key to them:
   - **Cloud Speech-to-Text API**
   - **Generative Language API** *(for intelligent post-processing)*
5. Launch the app, click the 🎙 icon in the menu bar → **Show / Hide Settings**, and paste the key into the **Google Cloud API Key** field
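If you want to check the key outside the app, the sketch below builds a request to the public Cloud Speech-to-Text REST endpoint (`speech:recognize`) and passes the key in the `X-Goog-Api-Key` header. This is only a way to test the key. VIA itself may call the API differently (for example, through a streaming interface), and the `build_recognize_request` helper is hypothetical.

```python
import base64
import json
import urllib.request

SPEECH_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(api_key, pcm_bytes, sample_rate_hz=16000,
                            language_code="en-US"):
    """Build (but do not send) a recognize request for raw 16-bit PCM audio."""
    body = {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
        },
        # The REST API expects the audio bytes as a base64 string.
        "audio": {"content": base64.b64encode(pcm_bytes).decode("ascii")},
    }
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Goog-Api-Key": api_key,  # keeps the key out of the URL
        },
        method="POST",
    )


if __name__ == "__main__":
    silence = b"\x00\x00" * 16000  # one second of 16 kHz silence
    request = build_recognize_request("YOUR_API_KEY", silence)
    with urllib.request.urlopen(request) as response:
        print(response.status, response.read().decode("utf-8"))
```

A successful response suggests the key is accepted for Speech-to-Text. An HTTP 4xx error usually points to a problem with the key, its restrictions, or the API not being enabled for the project.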
## Installation & Usage
You can run VIA either by downloading a pre-built standalone app or by building it from source.
### 🍎 macOS
#### Option 1: Using Pre-built Binaries (Recommended)
1. Go to the [Releases](https://github.com/yuhao-he/voice-input-assistant/releases) page.
2. Download the `VoiceInputAssistant.dmg` file from the latest release.
3. Open the `.dmg` and drag **VoiceInputAssistant.app** to your Applications folder.
4. Open the application. *Note: The app is not currently signed with an Apple Developer certificate, so macOS Gatekeeper may block it on first launch. To bypass this, **Right-Click** the app in Finder and select **Open**.*
#### Option 2: Building from Source
```bash
# Clone the repository
git clone https://github.com/yuhao-he/voice-input-assistant.git
cd voice-input-assistant
# Create a virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run the app
python main.py
```
#### macOS Permissions (Required)
Before VIA can listen to your voice or inject text, you must grant it permissions in macOS.
1. Go to **System Settings → Privacy & Security**.
2. **Accessibility:** Enable this permission so VIA can send keyboard commands to automatically paste text.
   - *If using Option 1 (Binary):* Add and enable `VoiceInputAssistant.app`.
   - *If using Option 2 (Source):* Enable your host application (e.g., `Terminal`, `iTerm2`, or `VS Code`).
3. **Input Monitoring:** Enable this permission so VIA can listen for your global push-to-talk hotkeys when it is running in the background.
   - *If using Option 1 (Binary):* Add and enable `VoiceInputAssistant.app`.
   - *If using Option 2 (Source):* Enable your host application.
4. **Microphone (Option 1 Only):** The first time you press the hotkey in the standalone app, macOS will prompt you to allow Microphone access. Click **OK**.
> **Note on using the `Fn` (Globe) key as a hotkey:** If you intend to use the `Fn` key as your hotkey, you will likely need to disable macOS system shortcuts that use it to prevent conflicts.
---
### 🪟 Windows
#### Option 1: Using Pre-built Binaries (Recommended)
1. Go to the [Releases](https://github.com/yuhao-he/voice-input-assistant/releases) page.
2. Download the `VoiceInputAssistant-Windows.zip` file.
3. Extract the ZIP file and run `VoiceInputAssistant.exe`.
#### Option 2: Building from Source
```powershell
# Clone the repo
git clone https://github.com/yuhao-he/voice-input-assistant.git
cd voice-input-assistant
# Create and activate virtual environment
python -m venv venv
venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
pip install "PyQt6==6.6.1" "PyQt6-Qt6==6.6.2"
# Run the app
python main.py
```
---
### 🐧 Linux
#### Option 1: Using Pre-built Binaries (Recommended)
1. Go to the [Releases](https://github.com/yuhao-he/voice-input-assistant/releases) page.
2. Download the `VoiceInputAssistant-Linux.tar.gz` file.
3. Extract the archive and run the `VoiceInputAssistant` executable.
#### Option 2: Building from Source
To build on Linux, you need the C development headers for the audio bindings (the `libasound2-dev` or `portaudio19-dev` package on Debian/Ubuntu).
```bash
# Clone the repository
git clone https://github.com/yuhao-he/voice-input-assistant.git
cd voice-input-assistant
# Create a virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run the app
python main.py
```
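If the app fails to start because of audio errors on Linux, one quick diagnostic is to check whether the audio shared libraries can be found at runtime. The sketch below uses the standard library's `ctypes.util.find_library`. Note that it looks for the runtime libraries, not the development headers needed at build time, and that the library names `portaudio` and `asound` are assumptions based on the packages mentioned above.

```python
import ctypes.util


def find_audio_libs(names=("portaudio", "asound")):
    """Map each library name to its resolved file name, or None if not found."""
    return {name: ctypes.util.find_library(name) for name in names}


if __name__ == "__main__":
    for name, path in find_audio_libs().items():
        print(f"{name}: {path or 'not found'}")
```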
## Running the App
Once the app is running, press **`Ctrl` + `Shift` + `Alt` + `Q`** anywhere to show or hide the settings menu.
You can also access the settings menu via the microphone icon in your system tray or macOS menu bar.