https://github.com/yukukotani/pi-voice
Headless voice interface for the Pi Coding Agent
- Host: GitHub
- URL: https://github.com/yukukotani/pi-voice
- Owner: yukukotani
- License: mit
- Created: 2026-02-08T15:18:25.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-02-11T04:21:32.000Z (4 months ago)
- Last Synced: 2026-02-12T14:56:46.036Z (4 months ago)
- Topics: pi, voice-ai
- Language: TypeScript
- Homepage:
- Size: 527 KB
- Stars: 43
- Watchers: 0
- Forks: 3
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Agents: AGENTS.md
Awesome Lists containing this project
- awesome-pi-coding-agent - yukukotani-pi-voice
README
# pi-voice
Headless voice interface for the [Pi Coding Agent](https://github.com/badlogic/pi-mono). Hold a key, speak, and pi executes your instructions with voice feedback.
#### Demo using the ElevenLabs provider (make sure your audio is unmuted)
https://github.com/user-attachments/assets/76adb941-83cf-4394-b8d2-f6d73a1df8bc
## Installation
```bash
npm i -g pi-voice
# or
bun i -g pi-voice
```
## Usage
Once started, pi-voice runs in the background as a daemon, and you talk to the agent with push-to-talk.
```bash
pi-voice start # start the daemon in the background
pi-voice status # show state, PID, and uptime
pi-voice stop # stop the daemon
```
The push-to-talk shortcut defaults to `Cmd+Shift+I` (macOS) / `Win+Shift+I` (Windows). Hold it down to record and release it to send.
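The hold-to-record, release-to-send flow can be pictured as a small state machine. The sketch below is illustrative only: the state names, the event names, and the choice to ignore key presses while a reply is being processed are assumptions, not taken from pi-voice's source.

```typescript
// Hypothetical push-to-talk state machine; names are illustrative, not pi-voice internals.
type State = "idle" | "recording" | "processing";
type PttEvent = "keyDown" | "keyUp" | "done";

function next(state: State, event: PttEvent): State {
  switch (state) {
    case "idle":
      // Pressing the shortcut starts capturing audio.
      return event === "keyDown" ? "recording" : "idle";
    case "recording":
      // Releasing the shortcut hands the audio to STT and then to the agent.
      return event === "keyUp" ? "processing" : "recording";
    case "processing":
      // Assumption: key presses are ignored until the spoken reply finishes.
      return event === "done" ? "idle" : "processing";
  }
}
```

Feeding `keyDown`, `keyUp`, and `done` in that order takes the machine from `idle` through `recording` and `processing` back to `idle`.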
## Settings
### pi agent configuration
pi-voice launches a Pi agent session in the directory where `pi-voice start` was run, so **all standard pi configuration works as-is**:
- `AGENTS.md` — walked up from `cwd` to the filesystem root
- `.pi/settings.json` — project-level settings
- `.pi/skills/`, `.pi/extensions/`, `.pi/prompts/` — project-level resources
- `~/.pi/agent/` — global settings, skills, extensions, prompts, and models
- and more
Refer to the [Pi documentation](https://github.com/badlogic/pi-mono/tree/main/packages/coding-agent) for details on these settings.
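The upward `AGENTS.md` lookup can be illustrated with Node's `path` module. `agentsMdCandidates` is a hypothetical helper, not a pi API: it only lists candidate paths from `cwd` up to the filesystem root, nearest first, and leaves existence checks and merge order (which Pi itself defines) to the caller.

```typescript
import * as path from "node:path";

// Hypothetical helper: candidate AGENTS.md locations from cwd up to the root,
// nearest directory first. It does not check whether the files exist.
function agentsMdCandidates(cwd: string): string[] {
  const candidates: string[] = [];
  let dir = path.resolve(cwd);
  while (true) {
    candidates.push(path.join(dir, "AGENTS.md"));
    const parent = path.dirname(dir);
    if (parent === dir) break; // dirname of the root is the root itself
    dir = parent;
  }
  return candidates;
}
```

On POSIX, `agentsMdCandidates("/home/me/proj")` yields `/home/me/proj/AGENTS.md`, `/home/me/AGENTS.md`, `/home/AGENTS.md`, and `/AGENTS.md`.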
### pi-voice configuration
You can configure pi-voice in `.pi/pi-voice.json`:
```json
{
  "key": "ctrl+t",
  "provider": "local"
}
```
| Key | Description |
| --- | --- |
| `key` | Push-to-talk shortcut. Combine modifiers (`ctrl`, `shift`, `alt`/`opt`, `meta`/`cmd`) and a main key with `+`. Examples: `"ctrl+t"`, `"alt+space"`, `"ctrl+shift+r"`. Default: `"meta+shift+i"`. |
| `provider` | Speech provider for STT & TTS. `"local"`, `"gemini"` (Vertex AI or Gemini API), `"openai"`, or `"elevenlabs"`. Default: `"local"`. |
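To make the `key` format and the defaults concrete, here is a minimal sketch of how `.pi/pi-voice.json` could be interpreted. `resolveConfig` and `parseShortcut` are hypothetical names, and the validation rules (case-insensitive segments, last segment is the main key, unknown modifiers rejected) are assumptions rather than pi-voice's documented behavior.

```typescript
type Modifier = "ctrl" | "shift" | "alt" | "meta";
type Provider = "local" | "gemini" | "openai" | "elevenlabs";

interface VoiceConfig {
  key: string;
  provider: Provider;
}

// Defaults as documented in the table above.
const DEFAULTS: VoiceConfig = { key: "meta+shift+i", provider: "local" };

// Fill any setting missing from .pi/pi-voice.json with its default.
function resolveConfig(raw: Partial<VoiceConfig>): VoiceConfig {
  return { key: raw.key ?? DEFAULTS.key, provider: raw.provider ?? DEFAULTS.provider };
}

// alt/opt and meta/cmd name the same modifier.
const MODIFIER_ALIASES = new Map<string, Modifier>([
  ["ctrl", "ctrl"],
  ["shift", "shift"],
  ["alt", "alt"],
  ["opt", "alt"],
  ["meta", "meta"],
  ["cmd", "meta"],
]);

interface Shortcut {
  modifiers: Set<Modifier>;
  key: string;
}

// Hypothetical parser: "ctrl+shift+r" -> { modifiers: {ctrl, shift}, key: "r" }.
function parseShortcut(spec: string): Shortcut {
  const parts = spec.toLowerCase().split("+").map((p) => p.trim());
  const key = parts.pop(); // the last segment is the main key
  if (!key) throw new Error(`Missing main key in "${spec}"`);
  const modifiers = new Set<Modifier>();
  for (const part of parts) {
    const mod = MODIFIER_ALIASES.get(part);
    if (mod === undefined) throw new Error(`Unknown modifier "${part}" in "${spec}"`);
    modifiers.add(mod);
  }
  return { modifiers, key };
}
```

For example, `parseShortcut(resolveConfig({ provider: "openai" }).key)` falls back to the default shortcut and yields the `meta` and `shift` modifiers with main key `i`.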
### Environment variables
| Provider | Required variables |
| --- | --- |
| `local` | None (model is auto-downloaded on first launch). Optional: `WHISPER_MODEL_PATH` (custom model path), `WHISPER_MODEL` (model name, default `medium-q5_0`), `SAY_VOICE` (macOS `say` voice name, e.g. `"Kyoko"`). |
| `gemini` | **Vertex AI:** `GOOGLE_CLOUD_PROJECT`, `GOOGLE_CLOUD_LOCATION` (optional, default `us-central1`). **Gemini API:** `GEMINI_API_KEY` or `GOOGLE_API_KEY`. If `GOOGLE_CLOUD_PROJECT` is set, Vertex AI is used; set `GOOGLE_GENAI_USE_VERTEXAI=false` to force API key mode. |
| `openai` | `OPENAI_API_KEY` |
| `elevenlabs` | `ELEVENLABS_API_KEY`. Optional: `ELEVENLABS_VOICE_ID` (TTS voice, default `CwhRBWXzGAHq8TQ4Fs17`), `ELEVENLABS_TTS_MODEL` (default `eleven_flash_v2_5`). |
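The `gemini` row describes a precedence rule that can be read as a small decision function. This is a sketch under assumptions: `selectGeminiBackend` is a hypothetical name, and giving `GEMINI_API_KEY` precedence over `GOOGLE_API_KEY` is a guess, since the table only says either one works.

```typescript
type GeminiBackend =
  | { mode: "vertex"; project: string; location: string }
  | { mode: "api-key"; apiKey: string };

type Env = Record<string, string | undefined>;

// Hypothetical reading of the gemini rules in the table above.
function selectGeminiBackend(env: Env): GeminiBackend {
  const project = env.GOOGLE_CLOUD_PROJECT;
  // GOOGLE_GENAI_USE_VERTEXAI=false opts out of Vertex AI even when a project is set.
  const forceApiKey = env.GOOGLE_GENAI_USE_VERTEXAI?.toLowerCase() === "false";
  if (project && !forceApiKey) {
    return { mode: "vertex", project, location: env.GOOGLE_CLOUD_LOCATION || "us-central1" };
  }
  // Assumed precedence: GEMINI_API_KEY first, then GOOGLE_API_KEY.
  const apiKey = env.GEMINI_API_KEY || env.GOOGLE_API_KEY;
  if (!apiKey) {
    throw new Error("gemini provider: set GOOGLE_CLOUD_PROJECT, or GEMINI_API_KEY / GOOGLE_API_KEY");
  }
  return { mode: "api-key", apiKey };
}
```

Calling `selectGeminiBackend(process.env)` at startup would fail fast with a readable message when neither Vertex AI nor an API key is configured.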
#### Logging
The daemon writes structured JSON logs to both the console and a log file. The default log file path is `$XDG_CONFIG_HOME/pi-voice/daemon.log` (falls back to `~/.config/pi-voice/daemon.log`).
To override the log file path:
```bash
export PI_VOICE_LOG_PATH=/path/to/custom.log
```
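The log-file lookup order described above (`PI_VOICE_LOG_PATH`, then `$XDG_CONFIG_HOME`, then `~/.config`) can be sketched as follows; `resolveLogPath` is a hypothetical helper, not an export of pi-voice.

```typescript
import * as os from "node:os";
import * as path from "node:path";

type Env = Record<string, string | undefined>;

// Hypothetical helper mirroring the documented lookup order.
function resolveLogPath(env: Env = process.env): string {
  // An explicit override always wins.
  if (env.PI_VOICE_LOG_PATH) return env.PI_VOICE_LOG_PATH;
  // Otherwise use $XDG_CONFIG_HOME, falling back to ~/.config when it is unset or empty.
  const configHome = env.XDG_CONFIG_HOME || path.join(os.homedir(), ".config");
  return path.join(configHome, "pi-voice", "daemon.log");
}
```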
#### Whisper model (local provider)
The `local` provider uses [Whisper](https://github.com/openai/whisper) for STT and the macOS `say` command for TTS. On first launch, a ggml-format Whisper model (`medium-q5_0`, ~514 MB) is automatically downloaded to `~/.pi-agent/whisper/` and cached for subsequent runs.
To use a different model, set `WHISPER_MODEL`:
```bash
export WHISPER_MODEL=base # smaller & faster
```
Or point to your own model file directly:
```bash
export WHISPER_MODEL_PATH=/path/to/ggml-custom.bin
```
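The model-file resolution can be sketched the same way. `resolveWhisperModelPath` and `modelIsCached` are hypothetical helpers, and the `ggml-<model>.bin` file name follows the whisper.cpp naming convention as an assumption; pi-voice's actual cache layout inside `~/.pi-agent/whisper/` may differ.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

type Env = Record<string, string | undefined>;

const DEFAULT_WHISPER_MODEL = "medium-q5_0";

// Hypothetical helper: WHISPER_MODEL_PATH beats WHISPER_MODEL, which beats the default.
function resolveWhisperModelPath(env: Env = process.env): string {
  if (env.WHISPER_MODEL_PATH) return env.WHISPER_MODEL_PATH;
  const model = env.WHISPER_MODEL || DEFAULT_WHISPER_MODEL;
  // Assumed file name, following whisper.cpp's ggml-<model>.bin convention.
  return path.join(os.homedir(), ".pi-agent", "whisper", `ggml-${model}.bin`);
}

// The first-launch download is only needed when the file is not cached yet.
function modelIsCached(modelPath: string): boolean {
  return fs.existsSync(modelPath);
}
```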
## Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, build commands, and release workflow.