https://github.com/jamditis/yap
Voice-coding tool for Windows 10 and 11.
https://github.com/jamditis/yap
coding productivity speech-to-text terminal transcript transcription voice-recognition voice-to-text vtt
Last synced: about 1 month ago
JSON representation
Voice-coding tool for Windows 10 and 11.
- Host: GitHub
- URL: https://github.com/jamditis/yap
- Owner: jamditis
- Created: 2025-12-10T04:32:16.000Z (3 months ago)
- Default Branch: master
- Last Pushed: 2025-12-18T15:31:45.000Z (2 months ago)
- Last Synced: 2025-12-21T19:54:00.455Z (2 months ago)
- Topics: coding, productivity, speech-to-text, terminal, transcript, transcription, voice-recognition, voice-to-text, vtt
- Language: TypeScript
- Homepage: https://jamditis.github.io/yap/
- Size: 469 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Yap - Voice Assistant for Developers
Voice-to-text dictation tool that runs in your system tray
Perfect for hands-free coding with Claude Code, Gemini CLI, or any terminal.
---
## Download
**[⬇️ Download Latest Release](https://github.com/jamditis/yap/releases/latest)**
---
## Features
- **Headless tray mode** - Runs silently in system tray, always ready
- **Global hotkeys** - `Alt+S` to record from any window, `Alt+H` to show settings
- **Multiple transcription engines:**
- **Gemini 2.0/2.5 Flash** - Cloud-based, high quality, supports Agent mode
- **Parakeet** - Local NVIDIA GPU transcription, free, high quality
- **Windows Speech** - Built-in Windows engine, no setup required
- **Agent mode** - Converts natural speech to CLI commands (e.g., "show git status" → `git status`)
- **Audio visualization** - Real-time waveform on mic button
- **Sound notifications** - Audio feedback for recording start/stop/success/error
---
## User Guide
### Installation
1. Download `Yap Setup 1.0.0.exe` from the [releases page](https://github.com/jamditis/yap/releases/latest)
2. Run the installer
3. Yap starts automatically and appears in your system tray
### First-Time Setup
#### Option 1: Gemini (Recommended - Best Quality)
1. Get a free API key at [Google AI Studio](https://aistudio.google.com/app/apikey)
2. Press `Alt+H` to open Yap settings
3. The app looks for `GEMINI_API_KEY` in your environment or `.env.local` file
4. Create `.env.local` in the app directory:
```
GEMINI_API_KEY=your_api_key_here
```
#### Option 2: Parakeet (Local - NVIDIA GPU)
Requires:
- NVIDIA GPU with CUDA support
- Python 3.10+
- NeMo toolkit: `pip install nemo_toolkit[asr] torch torchaudio pydub`
- FFmpeg installed and in PATH
The Parakeet server starts automatically with Yap.
#### Option 3: Windows Speech (Local - No Setup)
Just select "Windows Speech (Local/CPU)" in the model dropdown. Uses Windows' built-in speech recognition - works offline with no configuration.
### Basic Usage
| Action | Hotkey |
|--------|--------|
| Start/Stop Recording | `Alt+S` |
| Show/Hide Settings | `Alt+H` |
**Workflow:**
1. Focus your terminal (Claude Code, PowerShell, etc.)
2. Press `Alt+S` - hear the start sound
3. Speak your command or text
4. Press `Alt+S` again - hear the stop sound
5. Wait for transcription (you'll hear success/error sound)
6. Right-click in your terminal to paste
### Transcription Modes
#### RAW Mode (Default)
Transcribes exactly what you say. Use for:
- Dictating text, comments, documentation
- Any verbatim transcription
#### Agent Mode (Gemini only)
Converts natural language to CLI commands:
- "show me the git status" → `git status`
- "list all files" → `ls -la`
- "make a new folder called test" → `mkdir test`
- "install lodash" → `npm install lodash`
### Settings
| Setting | Description |
|---------|-------------|
| **Microphone** | Select input device |
| **Model** | Choose transcription engine |
| **Auto-Copy** | Automatically copy results to clipboard |
| **Paste Mode** | Enable for terminal paste workflow |
| **Agent Mode** | Convert speech to CLI commands (Gemini only) |
---
## Development
### Prerequisites
- Node.js 18+
- npm
### Setup
```bash
git clone https://github.com/jamditis/yap.git
cd yap
npm install
```
### Environment Variables
Create `.env.local`:
```
GEMINI_API_KEY=your_api_key_here
```
### Run in Development
```bash
npm run electron:dev
```
### Build for Production
```bash
npm run electron:build
```
Output: `dist/Yap Setup 1.0.0.exe`
---
## Troubleshooting
### No transcription with Gemini
- Check your API key is correct in `.env.local`
- Verify you have internet connection
- Check console for error messages (`Alt+H` → DevTools)
### Parakeet not working
- Ensure NVIDIA GPU drivers are installed
- Check Python version: `python --version` (needs 3.10+)
- Verify NeMo: `python -c "import nemo.collections.asr"`
- Check server: `curl http://localhost:8002/health`
### Windows Speech not transcribing
- Check microphone permissions in Windows Settings
- Try speaking louder/clearer
- Default timeout is 10 seconds
### Paste not working in Claude Code
- Yap copies to clipboard - you need to right-click to paste
- Claude Code doesn't support Ctrl+V, only right-click paste
---
## Tech Stack
- **Electron** - Desktop app framework
- **React 19** - UI
- **Vite** - Build tool
- **Tailwind CSS** - Styling
- **Google Gemini API** - Cloud transcription
- **NVIDIA Parakeet** - Local GPU transcription
- **Windows SAPI** - Local CPU transcription
---
## License
MIT
---
## Contributing
Issues and PRs welcome! Please open an issue first to discuss major changes.
---
Made with ❤️ for developers who prefer talking to typing