An open API service indexing awesome lists of open source software.

https://github.com/yukukotani/pi-voice

Headless voice interface for the Pi Coding Agent
https://github.com/yukukotani/pi-voice

pi voice-ai

Last synced: 4 months ago
JSON representation

Headless voice interface for the Pi Coding Agent

Awesome Lists containing this project

README

          

# pi-voice

Headless voice interface for the [Pi Coding Agent](https://github.com/badlogic/pi-mono). Hold a key, speak, and pi executes your instructions with voice feedback.

#### Demo using ElevenLabs provider (make sure unmuted)

https://github.com/user-attachments/assets/76adb941-83cf-4394-b8d2-f6d73a1df8bc

## Installation

```bash
npm i -g pi-voice
# or
bun i -g pi-voice
```

## Usage

pi-voice is a daemon-style application that runs in the background once started. You can push-to-talk with the agent.

```bash
pi-voice start # start the daemon in the background
pi-voice status # show state, PID, and uptime
pi-voice stop # stop the daemon
```

The push-to-talk trigger defaults to `Cmd+Shift+I` (macOS) / `Win+Shift+I` (Windows). Hold the key to record, release to send.

## Setting

### pi agent configuration

pi-voice launches a Pi agent session with the directory where `pi-voice start` was executed. This means **all standard pi configuration works as-is**:

- `AGENTS.md` — walked up from `cwd` to the filesystem root
- `.pi/settings.json` — project-level settings
- `.pi/skills/`, `.pi/extensions/`, `.pi/prompts/` — project-level resources
- `~/.pi/agent/` — global settings, skills, extensions, prompts, and models
- and more

Refer to the [Pi documentation](https://github.com/badlogic/pi-mono/tree/main/packages/coding-agent) for details on these settings.

### pi-voice configuration

You can configure pi-voice in `.pi/pi-voice.json`:

```json
{
"key": "ctrl+t",
"provider": "local"
}
```

| Key | Description |
| --- | --- |
| `key` | Push-to-talk shortcut. Combine modifiers (`ctrl`, `shift`, `alt`/`opt`, `meta`/`cmd`) and a main key with `+`. Examples: `"ctrl+t"`, `"alt+space"`, `"ctrl+shift+r"`. Default: `"meta+shift+i"`. |
| `provider` | Speech provider for STT & TTS. `"local"`, `"gemini"` (Vertex AI or Gemini API), `"openai"`, or `"elevenlabs"`. Default: `"local"`. |

### Environment variables

| Provider | Required variables |
| --- | --- |
| `local` | None (model is auto-downloaded on first launch). Optional: `WHISPER_MODEL_PATH` (custom model path), `WHISPER_MODEL` (model name, default `medium-q5_0`), `SAY_VOICE` (macOS `say` voice name, e.g. `"Kyoko"`). |
| `gemini` | **Vertex AI:** `GOOGLE_CLOUD_PROJECT`, `GOOGLE_CLOUD_LOCATION` (optional, default `us-central1`). **Gemini API:** `GEMINI_API_KEY` or `GOOGLE_API_KEY`. If `GOOGLE_CLOUD_PROJECT` is set, Vertex AI is used; set `GOOGLE_GENAI_USE_VERTEXAI=false` to force API key mode. |
| `openai` | `OPENAI_API_KEY` |
| `elevenlabs` | `ELEVENLABS_API_KEY`. Optional: `ELEVENLABS_VOICE_ID` (TTS voice, default `CwhRBWXzGAHq8TQ4Fs17`), `ELEVENLABS_TTS_MODEL` (default `eleven_flash_v2_5`). |

#### Logging

The daemon writes structured JSON logs to both the console and a log file. The default log file path is `$XDG_CONFIG_HOME/pi-voice/daemon.log` (falls back to `~/.config/pi-voice/daemon.log`).

To override the log file path:

```bash
export PI_VOICE_LOG_PATH=/path/to/custom.log
```

#### Whisper model (local provider)

The `local` provider uses [Whisper](https://github.com/openai/whisper) for STT and the macOS `say` command for TTS. On first launch, a ggml-format Whisper model (`medium-q5_0`, ~514 MB) is automatically downloaded to `~/.pi-agent/whisper/` and cached for subsequent runs.

To use a different model, set `WHISPER_MODEL`:

```bash
export WHISPER_MODEL=base # smaller & faster
```

Or point to your own model file directly:

```bash
export WHISPER_MODEL_PATH=/path/to/ggml-custom.bin
```

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, build commands, and release workflow.