https://github.com/yuhao-he/voice-input-assistant

Cloud based voice input

gcp gemini producitivity voice-to-text

Last synced: 3 months ago

Host: GitHub
URL: https://github.com/yuhao-he/voice-input-assistant
Owner: yuhao-he
Created: 2026-02-10T07:45:48.000Z (4 months ago)
Default Branch: main
Last Pushed: 2026-03-07T09:21:16.000Z (3 months ago)
Last Synced: 2026-03-07T16:44:44.650Z (3 months ago)
Topics: gcp, gemini, producitivity, voice-to-text
Language: Python
Homepage:
Size: 818 KB
Stars: 3
Watchers: 0
Forks: 2
Open Issues: 5
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

awesome-gcp-llm-projects - Voice Input Assistant - A cloud-based voice input assistant using GCP and Gemini for enhanced productivity. #Voice #Productivity #Gemini (Projects / Chatbots & Assistants)

README

# VIA

**VIA (Voice Input Assistant)** is a desktop application for real-time speech transcription. It transcribes your voice with high accuracy and inserts the text into whichever application you are using, acting as an intelligent typing assistant.

Supports macOS, Windows, and Linux (X11 only; Wayland is not supported).
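
Because Wayland is not supported, Linux users may want to check their session type before launching. The snippet below is a minimal sketch and is not part of VIA itself. The function name `detect_display_server` is hypothetical, and the check relies on the conventional `XDG_SESSION_TYPE`, `WAYLAND_DISPLAY`, and `DISPLAY` environment variables, which not every desktop environment sets.

```python
import os
import sys


def detect_display_server(env=None, platform=None):
    """Return 'x11', 'wayland', 'unknown', or 'not-linux' (hypothetical helper)."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    if not platform.startswith("linux"):
        return "not-linux"
    session = env.get("XDG_SESSION_TYPE", "").lower()
    if session in ("x11", "wayland"):
        return session
    # Fall back to the display variables when the session type is unset.
    # WAYLAND_DISPLAY is checked first because XWayland also sets DISPLAY.
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"


if __name__ == "__main__":
    print(detect_display_server())
```

If this prints `wayland`, logging in to an X11 session (where your distribution offers one) is likely required before VIA will work.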

## Features & Capabilities

- **Instant Voice Input:** Easily trigger transcription via push-to-talk (hold) or tap-to-talk modes.
- **AI Post-Processing:** Optionally refine, format, or improve your transcribed text automatically using Gemini before it is inserted.
- **Auto-Paste Integration:** Your transcribed and processed text is immediately inserted into whatever app or window you are actively working in.
- **Floating Transcript Interface:** View your live transcription through a non-intrusive floating overlay.
- **History & Editing Control:** Access a chat-like history menu of your prior voice notes, giving you full control to copy, edit, or re-insert transcripts as needed.
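
The difference between the two trigger modes can be sketched as a small state machine. This is an illustration under assumptions, not VIA's actual implementation: the class `TriggerState` and its method names are hypothetical, and it assumes that tap-to-talk toggles recording on each key press.

```python
class TriggerState:
    """Tracks whether recording is active for a trigger mode (hypothetical)."""

    def __init__(self, mode):
        if mode not in ("push", "tap"):
            raise ValueError("mode must be 'push' or 'tap'")
        self.mode = mode
        self.recording = False

    def on_key_down(self):
        if self.mode == "push":
            self.recording = True                # record while the key is held
        else:
            self.recording = not self.recording  # each tap toggles recording
        return self.recording

    def on_key_up(self):
        if self.mode == "push":
            self.recording = False               # releasing the key stops recording
        return self.recording
```

A real implementation would also need to ignore the key auto-repeat events that the operating system generates while a key is held; this sketch leaves that out.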

## Prerequisites

- **Python 3.10+**
- **Google Cloud account** with a billing-enabled project
- A **Google Cloud API key**

## API Key Setup

1. Go to [console.cloud.google.com](https://console.cloud.google.com)
2. Select or create a project with billing enabled
3. Create an API key (**APIs & Services → Credentials → Create Credentials → API key**)
4. Enable the required APIs (**APIs & Services → Library**) and restrict the API key to them:
   - **Cloud Speech-to-Text API**
   - **Generative Language API** *(for intelligent post-processing)*
5. Launch the app, click the 🎙 icon in the menu bar → **Show / Hide Settings**, and paste the key into the **Google Cloud API Key** field
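
Before pasting the key into the app, you may want to confirm that the Generative Language API accepts it. The following is a rough sketch using only the standard library. It assumes the public `v1beta` model-listing endpoint, the helper names are hypothetical, and it does not verify Cloud Speech-to-Text access or reflect how VIA itself calls the APIs.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the Generative Language API's model-listing method.
MODELS_ENDPOINT = "https://generativelanguage.googleapis.com/v1beta/models"


def build_models_url(api_key):
    """Build the model-listing URL with the key as a query parameter."""
    return MODELS_ENDPOINT + "?" + urllib.parse.urlencode({"key": api_key})


def key_can_list_models(api_key, timeout=10):
    """Return True if the endpoint accepts the key and returns a model list."""
    try:
        with urllib.request.urlopen(build_models_url(api_key), timeout=timeout) as resp:
            return "models" in json.load(resp)
    except (OSError, ValueError):
        # OSError covers HTTP errors (such as a rejected key), network
        # failures, and timeouts; ValueError covers a non-JSON response.
        return False
```

Calling `key_can_list_models("YOUR_API_KEY")` returns `False` for a rejected key, but also for a network failure, so a `False` result alone does not prove the key is invalid.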

## Installation & Usage

You can run VIA either by downloading a pre-built standalone app or by building it from source.

### 🍎 macOS

#### Option 1: Using Pre-built Binaries (Recommended)
1. Go to the [Releases](https://github.com/yuhao-he/voice-input-assistant/releases) page.
2. Download the `VoiceInputAssistant.dmg` file from the latest release.
3. Open the `.dmg` and drag **VoiceInputAssistant.app** to your Applications folder.
4. Open the application. *Note: Because the app is not currently signed with an Apple Developer certificate, macOS Gatekeeper may block it at first. To bypass this, **Right-Click** the app in Finder and select **Open**.*

#### Option 2: Building from Source
```bash
# Clone the repository
git clone https://github.com/yuhao-he/voice-input-assistant.git
cd voice-input-assistant

# Create a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run the app
python main.py
```

#### macOS Permissions (Required)
Before VIA can listen to your voice or inject text, you must grant it permissions in macOS.
1. Go to **System Settings → Privacy & Security**.
2. **Accessibility:** Enable this permission so VIA can send keyboard commands to automatically paste text.
   - *If using Option 1 (Binary):* Add and enable `VoiceInputAssistant.app`.
   - *If using Option 2 (Source):* Enable your host application (e.g., `Terminal`, `iTerm2`, or `VS Code`).
3. **Input Monitoring:** Enable this permission so VIA can listen for your global push-to-talk hotkeys when it is running in the background.
   - *If using Option 1 (Binary):* Add and enable `VoiceInputAssistant.app`.
   - *If using Option 2 (Source):* Enable your host application.
4. **Microphone (Option 1 Only):** The first time you press the hotkey in the standalone app, macOS will prompt you to allow Microphone access. Click **OK**.

> **Note on using the `Fn` (Globe) key as a hotkey:** If you intend to use the `Fn` key as your hotkey, you will likely need to disable macOS system shortcuts that use it to prevent conflicts.

---

### 🪟 Windows

#### Option 2: Building from Source
```powershell
# Clone the repo
git clone https://github.com/yuhao-he/voice-input-assistant.git
cd voice-input-assistant

# Create and activate virtual environment
python -m venv venv
venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Pin the PyQt6 and PyQt6-Qt6 package versions
pip install "PyQt6==6.6.1" "PyQt6-Qt6==6.6.2"

# Run the app
python main.py
```

---

### 🐧 Linux

#### Option 2: Building from Source
To build on Linux, you will need C headers for the audio bindings (`libasound2-dev` or `portaudio19-dev`).
```bash
# Clone the repository
git clone https://github.com/yuhao-he/voice-input-assistant.git
cd voice-input-assistant

# Create a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run the app
python main.py
```

## Running the App

Once running, press **`Ctrl` + `Shift` + `Alt` + `Q`** anywhere to show or hide the settings menu.

You can also access the settings menu via the microphone icon in your system tray or macOS menu bar.
