https://github.com/yuhao-he/voice-input-assistant

Cloud-based voice input

gcp gemini productivity voice-to-text


# VIA






**VIA (Voice Input Assistant)** is a desktop application for real-time speech transcription. It provides high-accuracy transcription and acts as an intelligent typing assistant in any application you use.

Supports macOS, Windows, and Linux (X11 only; Wayland is not supported).

## Features & Capabilities

- **Instant Voice Input:** Easily trigger transcription via push-to-talk (hold) or tap-to-talk modes.
- **AI Post-Processing:** Optionally refine, format, or improve your transcribed text automatically using Gemini before it even hits your screen.
- **Auto-Paste Integration:** Your transcribed and processed text is immediately inserted into whatever app or window you are actively working in.
- **Floating Transcript Interface:** View your live transcription through a non-intrusive floating overlay.
- **History & Editing Control:** Access a chat-like history menu of your prior voice notes, giving you full control to copy, edit, or re-insert transcripts as needed.
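
The difference between the push-to-talk and tap-to-talk modes listed above can be sketched as a small state machine. This is an illustration only, not VIA's actual implementation: the `HotkeyRecorder` class and its method names are hypothetical, and the sketch assumes the hotkey layer delivers separate press and release events (possibly with auto-repeated presses while the key is held).

```python
from dataclasses import dataclass


@dataclass
class HotkeyRecorder:
    """Hypothetical model of the two trigger modes (not VIA's real code)."""

    mode: str = "push"       # "push" = hold to record, "tap" = tap to toggle
    recording: bool = False
    _held: bool = False      # tracks whether the key is physically down

    def on_press(self) -> bool:
        # Ignore auto-repeated press events while the key stays held down.
        if self._held:
            return self.recording
        self._held = True
        if self.mode == "push":
            self.recording = True                # record while held
        else:
            self.recording = not self.recording  # each tap toggles
        return self.recording

    def on_release(self) -> bool:
        self._held = False
        if self.mode == "push":
            self.recording = False               # releasing ends the recording
        return self.recording
```

In push mode a recording lasts exactly as long as the key is held. In tap mode the release is ignored and the next tap stops the recording.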

## Prerequisites

- **Python 3.10+**
- **Google Cloud account** with a billing-enabled project
- A **Google Cloud API key**
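
If you are unsure whether your interpreter meets the Python 3.10+ requirement, a check along these lines can help. It is a minimal sketch: the names `MINIMUM_VERSION` and `meets_minimum` are illustrative and not part of VIA.

```python
import sys

MINIMUM_VERSION = (3, 10)  # taken from the prerequisite above


def meets_minimum(version=None, minimum=MINIMUM_VERSION) -> bool:
    """Return True if the (major, minor) version is at least `minimum`."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= tuple(minimum)
```

Calling `meets_minimum()` with no arguments checks the interpreter that is running the script.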

## API Key Setup

1. Go to [console.cloud.google.com](https://console.cloud.google.com)
2. Select or create a project with billing enabled
3. Create an API key (**APIs & Services → Credentials → Create Credentials → API key**)
4. Enable the required APIs for the project (**APIs & Services → Library**), then restrict the API key to them in the key's **API restrictions** settings:
   - **Cloud Speech-to-Text API**
   - **Generative Language API** *(for intelligent post-processing)*
5. Launch the app, click the 🎙 icon in the menu bar → **Show / Hide Settings**, and paste the key into the **Google Cloud API Key** field
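
Before pasting the key into VIA, you may want to confirm that it can reach the Generative Language API. The sketch below lists the available models through the public REST endpoint using only the standard library. It is an independent check, not something VIA runs, and the function names are illustrative. It does not verify the Cloud Speech-to-Text API, which needs an audio payload to test.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Public "list models" endpoint of the Generative Language API.
MODELS_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_models_url(api_key: str) -> str:
    """Build the list-models URL with the key as a query parameter."""
    return f"{MODELS_URL}?{urllib.parse.urlencode({'key': api_key})}"


def key_can_list_models(api_key: str, timeout: float = 10.0) -> bool:
    """Return True if the key can list at least one model.

    Any HTTP or network error (invalid key, API not enabled, API blocked
    by the key's restrictions, no connectivity) is reported as False.
    """
    try:
        with urllib.request.urlopen(build_models_url(api_key), timeout=timeout) as resp:
            payload = json.load(resp)
    except urllib.error.URLError:  # HTTPError is a subclass of URLError
        return False
    return bool(payload.get("models"))
```

For example, `key_can_list_models("YOUR_API_KEY")` returning `False` suggests rechecking that the Generative Language API is enabled and allowed by the key's restrictions.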

## Installation & Usage

You can run VIA either by downloading a pre-built standalone app or by building it from source.

### 🍎 macOS

#### Option 1: Using Pre-built Binaries (Recommended)
1. Go to the [Releases](https://github.com/yuhao-he/voice-input-assistant/releases) page.
2. Download the `VoiceInputAssistant.dmg` file from the latest release.
3. Open the `.dmg` and drag **VoiceInputAssistant.app** to your Applications folder.
4. Open the application. *Note: Because the app is not yet signed with an Apple Developer ID, macOS Gatekeeper may block it on first launch. To bypass this, **right-click** the app in Finder and select **Open**.*

#### Option 2: Building from Source
```bash
# Clone the repository
git clone https://github.com/yuhao-he/voice-input-assistant.git
cd voice-input-assistant

# Create a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run the app
python main.py
```

#### macOS Permissions (Required)
Before VIA can listen to your voice or inject text, you must grant it permissions in macOS.
1. Go to **System Settings → Privacy & Security**.
2. **Accessibility:** Enable this permission so VIA can send keyboard commands to automatically paste text.
   - *If using Option 1 (Binary):* Add and enable `VoiceInputAssistant.app`.
   - *If using Option 2 (Source):* Enable your host application (e.g., `Terminal`, `iTerm2`, or `VS Code`).
3. **Input Monitoring:** Enable this permission so VIA can listen for your global push-to-talk hotkeys when it is running in the background.
   - *If using Option 1 (Binary):* Add and enable `VoiceInputAssistant.app`.
   - *If using Option 2 (Source):* Enable your host application.
4. **Microphone (Option 1 Only):** The first time you press the hotkey in the standalone app, macOS will prompt you to allow Microphone access. Click **OK**.

> **Note on using the `Fn` (Globe) key as a hotkey:** If you intend to use the `Fn` key as your hotkey, you will likely need to disable macOS system shortcuts that use it to prevent conflicts.

---

### 🪟 Windows

#### Option 1: Using Pre-built Binaries (Recommended)
1. Go to the [Releases](https://github.com/yuhao-he/voice-input-assistant/releases) page.
2. Download the `VoiceInputAssistant-Windows.zip` file.
3. Extract the ZIP file and run `VoiceInputAssistant.exe`.

#### Option 2: Building from Source
```powershell
# Clone the repo
git clone https://github.com/yuhao-he/voice-input-assistant.git
cd voice-input-assistant

# Create and activate virtual environment
python -m venv venv
venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
pip install "PyQt6==6.6.1" "PyQt6-Qt6==6.6.2"

# Run the app
python main.py
```

---

### 🐧 Linux

#### Option 1: Using Pre-built Binaries (Recommended)
1. Go to the [Releases](https://github.com/yuhao-he/voice-input-assistant/releases) page.
2. Download the `VoiceInputAssistant-Linux.tar.gz` file.
3. Extract the archive and run the `VoiceInputAssistant` executable.

#### Option 2: Building from Source
To build on Linux, you will need C headers for the audio bindings (`libasound2-dev` or `portaudio19-dev`).
```bash
# Clone the repository
git clone https://github.com/yuhao-he/voice-input-assistant.git
cd voice-input-assistant

# Create a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run the app
python main.py
```
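
Because VIA supports X11 sessions only, it can be useful to check which display server your session uses before troubleshooting. The following is a minimal sketch that reads the standard `XDG_SESSION_TYPE`, `WAYLAND_DISPLAY`, and `DISPLAY` environment variables. The `session_type` helper is illustrative and not part of VIA.

```python
import os


def session_type(env=None) -> str:
    """Return "x11", "wayland", or "unknown" for the current session."""
    env = os.environ if env is None else env
    declared = env.get("XDG_SESSION_TYPE", "").strip().lower()
    if declared in ("x11", "wayland"):
        return declared
    # Fallbacks: check WAYLAND_DISPLAY first, because Wayland sessions
    # often also set DISPLAY for XWayland clients.
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"
```

If `session_type()` returns `"wayland"`, log out and choose an X11 (Xorg) session at the login screen, if your distribution offers one.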

## Running the App

Once VIA is running, press **`Ctrl` + `Shift` + `Alt` + `Q`** anywhere to show or hide the settings menu.

You can also access the settings menu via the microphone icon in your system tray or macOS menu bar.