https://github.com/acoyfellow/t2t
Voice-to-text with MCP support. System-wide dictation (hold fn) and AI agent mode (hold fn+ctrl) that connects to any MCP server. Cross-platform desktop app with local Whisper transcription.
- Host: GitHub
- URL: https://github.com/acoyfellow/t2t
- Owner: acoyfellow
- License: mit
- Created: 2025-12-10T20:19:47.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-12-30T00:52:17.000Z (6 months ago)
- Last Synced: 2026-01-13T19:43:32.694Z (5 months ago)
- Topics: accessibility, ai-agents, clipboard, desktop-app, dictation, linux, local-first, macos, mcp, offline, openrouter, productivity, push-to-talk, rust, speech-to-text, svelte, sveltekit, tauri, whisper, windows
- Language: Svelte
- Homepage: https://t2t.now
- Size: 11.3 MB
- Stars: 8
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# t2t

**Voice-to-text with intelligence. Hold fn to talk, hold fn+ctrl to command.**
## Download
**[Download for macOS →](https://t2t.now)**
[View all releases on GitHub →](https://github.com/acoyfellow/t2t/releases)
> **Note:** The app is not code-signed yet. On first launch, macOS may show a security warning. To open it:
> - Right-click the app → **Open**, then click **Open** in the dialog
> - Or run: `xattr -cr /Applications/t2t.app` in Terminal
>
> **Heads up:** This is an unsigned build while we polish things up. Each time you update to a new version, you'll need to remove t2t from System Settings → Privacy & Security → Accessibility (and Microphone if needed), then re-add it. We'll get it properly signed soon!
## How It Works
- **Hold Fn** → records microphone audio
- **Release Fn** → transcribes the audio with a local Whisper model
- **Typing mode** (red bar): hold Fn alone → pastes the transcription into the focused text field while preserving your clipboard contents
- **Agent mode** (cyan bar): hold Fn+Ctrl → sends your spoken command to an AI agent
  - **MCP mode** (if configured): connects to your MCP servers and uses their tools via OpenRouter
  - **AppleScript mode** (fallback): generates and executes AppleScript for macOS automation
- **Visual feedback**: red or cyan bar while recording (depending on mode), amber while processing
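The hotkey-to-mode mapping above can be modeled as a small pure function. The TypeScript below is a minimal sketch for illustration, not t2t's actual hotkey code; the names `resolveMode` and `barColor` are hypothetical, and it assumes agent mode picks MCP whenever at least one MCP server is enabled.

```typescript
// Hypothetical model of t2t's recording modes; not the app's real code.
type Mode = "typing" | "agent-mcp" | "agent-applescript";

interface HotkeyState {
  fn: boolean;   // Fn key currently held
  ctrl: boolean; // Ctrl key currently held
}

// Decide which mode a recording belongs to, or null if Fn is not held.
// Assumption: agent mode uses MCP whenever at least one server is enabled,
// and falls back to AppleScript otherwise.
function resolveMode(keys: HotkeyState, enabledMcpServers: number): Mode | null {
  if (!keys.fn) return null;
  if (!keys.ctrl) return "typing";
  return enabledMcpServers > 0 ? "agent-mcp" : "agent-applescript";
}

// Bar color: red for typing, cyan for either agent mode, amber while processing.
function barColor(mode: Mode, processing: boolean): "red" | "cyan" | "amber" {
  if (processing) return "amber";
  return mode === "typing" ? "red" : "cyan";
}

console.log(resolveMode({ fn: true, ctrl: false }, 0)); // typing
```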
## Requirements
- **macOS** (currently macOS only; tested on Apple Silicon)
- **Accessibility permission**: Required for Fn key detection and for focusing the correct field before pasting
- **Microphone permission**: Required for audio recording
- **OpenRouter API key** (for agent mode): Get one at [openrouter.ai](https://openrouter.ai)
The app will prompt you if permissions are missing.
## Getting Started
1. **Download and install** the app from [t2t.now](https://t2t.now)
2. **Grant permissions** when prompted (Accessibility and Microphone)
3. **Get an OpenRouter API key** at [openrouter.ai](https://openrouter.ai) (required for agent mode)
4. **Open settings**: Click the menu bar icon → **View Settings**
5. **Configure agent mode** (optional):
   - Add your OpenRouter API key in settings
   - Optionally configure MCP servers for extended automation
## Settings & Analytics
The settings window (Menu bar icon → **View Settings**) includes three tabs:
### Analytics Tab
View your transcription usage statistics:
- **Total Words**: Lifetime count of all transcribed words
- **Lifetime Average**: Average words per minute across all sessions
- **Session Average**: Average words per minute for current session
- **Sessions**: Total number of transcription sessions
- **Hours Active**: Total time spent transcribing
- **Recent Activity**: Hourly activity chart covering the last 48 hours
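As a rough illustration of how these numbers relate, here is a hypothetical TypeScript sketch of the words-per-minute average and the 48-hour hourly chart. The `Session` shape is an assumption, and so is the choice to compute the average as total words over total recording minutes rather than as a mean of per-session rates.

```typescript
// Hypothetical session record; t2t's real schema may differ.
interface Session {
  startedAt: Date;
  durationMs: number; // time spent recording
  words: number;      // words in the transcript
}

const HOUR_MS = 60 * 60 * 1000;

// Words per minute across the given sessions (0 if nothing was recorded).
// Assumption: total words divided by total recording minutes.
function wordsPerMinute(sessions: Session[]): number {
  const words = sessions.reduce((sum, s) => sum + s.words, 0);
  const ms = sessions.reduce((sum, s) => sum + s.durationMs, 0);
  return ms === 0 ? 0 : words / (ms / 60000);
}

// Words per hour for the `hours` hours ending at `now`.
// buckets[0] is the oldest hour; buckets[hours - 1] is the most recent.
function hourlyActivity(sessions: Session[], now: Date, hours: number = 48): number[] {
  const buckets: number[] = [];
  for (let i = 0; i < hours; i++) buckets.push(0);
  for (const s of sessions) {
    const ageMs = now.getTime() - s.startedAt.getTime();
    if (ageMs < 0 || ageMs >= hours * HOUR_MS) continue; // outside the window
    const hoursAgo = Math.floor(ageMs / HOUR_MS);
    buckets[hours - 1 - hoursAgo] += s.words;
  }
  return buckets;
}
```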
### Settings Tab
Configure your t2t installation:
- **Theme**: Toggle between light and dark mode
- **OpenRouter API Key**: Set your API key for agent mode
- **AI Model Selection**: Choose which model to use for agent mode
  - Supports all OpenRouter models
  - Auto-refresh available to fetch the latest models
- **MCP Servers**: Add, configure, and manage MCP servers
  - Test connections and view available tools
  - Enable/disable servers individually
  - Supports stdio, HTTP, and SSE transports
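The model list refresh could be backed by OpenRouter's model listing endpoint (`GET https://openrouter.ai/api/v1/models`), which returns a JSON object with a `data` array of models. This sketch assumes each entry has an `id`, an optional `name`, and an optional `architecture.input_modalities` array; the helper names are hypothetical and this is not t2t's actual settings code.

```typescript
// Minimal shape relied on here; other fields in the response are ignored.
// Assumption: `architecture.input_modalities` contains "image" for vision models.
interface OpenRouterModel {
  id: string;
  name?: string;
  architecture?: { input_modalities?: string[] };
}

interface ModelOption {
  id: string;     // e.g. "openai/gpt-4o"
  label: string;  // display name, falling back to the id
  vision: boolean;
}

// Turn the /api/v1/models response body into sorted picker options.
function toModelOptions(body: { data: OpenRouterModel[] }): ModelOption[] {
  return body.data
    .map((m) => ({
      id: m.id,
      label: m.name ?? m.id,
      vision: (m.architecture?.input_modalities ?? []).indexOf("image") !== -1,
    }))
    .sort((a, b) => a.id.localeCompare(b.id));
}

// Fetch the current model list (the refresh action); requires a global fetch.
function fetchModelOptions(): Promise<ModelOption[]> {
  return fetch("https://openrouter.ai/api/v1/models")
    .then((res) => {
      if (!res.ok) throw new Error(`OpenRouter returned HTTP ${res.status}`);
      return res.json();
    })
    .then(toModelOptions);
}
```

Sorting by `id` groups models by provider prefix such as `anthropic/` or `openai/`.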
### History Tab
See [History & Logging](#history--logging) section below.
## MCP (Model Context Protocol) Support
When MCP servers are configured in settings, agent mode uses MCP instead of AppleScript. This enables:
- **Extensible automation**: Connect to any MCP-compatible service (databases, APIs, file systems, etc.)
- **Tool-based execution**: AI agent uses tools provided by your MCP servers
- **Multiple servers**: Connect to multiple MCP servers simultaneously
- **Transport options**: Supports stdio, HTTP, and SSE transports
**To configure**: Menu bar icon → **View Settings** → Settings tab → MCP Servers section. Requires an OpenRouter API key.
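A server entry needs different fields depending on its transport. The sketch below models the three transports as a TypeScript discriminated union and keeps only enabled, valid servers; the field names (`command`, `args`, `url`, `enabled`) and helper names are assumptions rather than t2t's real settings schema.

```typescript
// Hypothetical config shapes for the three supported transports.
type McpTransport =
  | { kind: "stdio"; command: string; args: string[] }
  | { kind: "http"; url: string }
  | { kind: "sse"; url: string };

interface McpServerConfig {
  name: string;
  enabled: boolean;
  transport: McpTransport;
}

// Returns a list of problems; an empty list means the entry looks usable.
function validateServer(cfg: McpServerConfig): string[] {
  const problems: string[] = [];
  if (cfg.name.trim() === "") problems.push("name is empty");
  const t = cfg.transport;
  switch (t.kind) {
    case "stdio":
      if (t.command.trim() === "") problems.push("stdio command is empty");
      break;
    case "http":
    case "sse":
      if (!/^https?:\/\//.test(t.url)) {
        problems.push(`${t.kind} url must start with http:// or https://`);
      }
      break;
  }
  return problems;
}

// Servers the agent would actually connect to: enabled and valid.
function activeServers(servers: McpServerConfig[]): McpServerConfig[] {
  return servers.filter((s) => s.enabled && validateServer(s).length === 0);
}
```

For example, a stdio entry for the reference filesystem server would use `command: "npx"` with `args: ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/dir"]`. Under the routing rule described above, an empty `activeServers` result would mean agent mode falls back to AppleScript.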
## Vision Support & Automatic Screenshots
t2t automatically captures a screenshot and includes it with every agent call, so vision-capable models can "see" your screen context. This works with any model: vision-capable models process the image, while text-only models ignore it.
### How It Works
- **Automatic capture**: When you use agent mode (Fn+Ctrl), a screenshot is captured before sending your prompt
- **Universal support**: Screenshots are included with all agent calls, regardless of model selection
- **Smart routing**: OpenRouter automatically routes to vision-capable models when available, or ignores the image for text-only models
- **Seamless integration**: Screenshots are included in the API request without any additional UI or user action
- **Privacy**: Full screenshots are only sent to the API and are not stored locally; small thumbnails are visible in History
### Privacy & Permissions
- **Screen Recording permission**: macOS may prompt for screen recording permission the first time you use agent mode
- **No local storage**: Full screenshots are not saved to disk; they are only sent to the API
- **Thumbnails**: Small thumbnails (150x150px) are stored locally in History for reference
- **Error handling**: If screenshot capture fails (e.g., permission denied), the agent falls back to text-only mode
### Technical Details
- Screenshots are captured using the macOS `screencapture` command
- Images are encoded as base64 PNG and included in the OpenAI-compatible message format
- The screenshot is included in both initial requests and follow-up requests after tool execution
- Vision-capable models (GPT-4 Vision, Claude 3.5 Sonnet, etc.) can process the image to understand your screen context
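Putting the technical details together, the user message for an agent call might be assembled as follows. This TypeScript sketch uses the OpenAI-compatible content-part format (a `text` part plus an `image_url` part holding a `data:image/png;base64,` URL); `buildUserMessage` is a hypothetical name, and passing `null` models the text-only fallback used when capture fails.

```typescript
// OpenAI-compatible content parts for a chat message.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface UserMessage {
  role: "user";
  content: ContentPart[];
}

// Build the user message for an agent call. `screenshotBase64` is the PNG
// already base64-encoded, or null if capture failed (text-only fallback).
function buildUserMessage(transcript: string, screenshotBase64: string | null): UserMessage {
  const content: ContentPart[] = [{ type: "text", text: transcript }];
  if (screenshotBase64 !== null) {
    content.push({
      type: "image_url",
      image_url: { url: `data:image/png;base64,${screenshotBase64}` },
    });
  }
  return { role: "user", content };
}
```

Because the message is a plain value, the same object can stay in the conversation array so that follow-up requests after tool execution carry the screenshot too.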
## History & Logging
t2t automatically logs all transcriptions and agent calls for review and debugging.
### Features
- **Transcription history**: All voice transcriptions are saved with timestamps
- **Agent call logging**: Complete request/response logs for all OpenRouter API calls
- **Screenshot thumbnails**: Tiny thumbnails (150x150px) of screenshots captured with all agent calls
- **Search**: Fast local search across all history entries
- **Expandable details**: Click any entry to view full request/response JSON and tool calls
### Accessing History
Menu bar icon → **View Settings** → **History** tab
### Configuration
- **History limit**: Set the `T2T_HISTORY_LIMIT` environment variable to change the maximum number of stored entries (default: 1000)
- **Storage**: History is stored locally in `history.json` via Tauri's store plugin
- **Privacy**: History data stays on your machine and is never sent to external services
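A minimal sketch of how the history limit might be applied, assuming entries are stored oldest-first and that an unset or invalid `T2T_HISTORY_LIMIT` falls back to the default of 1000. Both function names are hypothetical; in a Node or Bun script you would pass `process.env.T2T_HISTORY_LIMIT` as the raw value.

```typescript
const DEFAULT_HISTORY_LIMIT = 1000;

// Parse the raw T2T_HISTORY_LIMIT value.
// Assumption: missing, non-numeric, or zero values fall back to the default.
function historyLimit(raw: string | undefined): number {
  if (raw === undefined) return DEFAULT_HISTORY_LIMIT;
  const trimmed = raw.trim();
  if (!/^\d+$/.test(trimmed)) return DEFAULT_HISTORY_LIMIT;
  const n = parseInt(trimmed, 10);
  return n > 0 ? n : DEFAULT_HISTORY_LIMIT;
}

// Keep only the newest `limit` entries (assumes oldest-first storage).
function trimHistory<T>(entries: T[], limit: number): T[] {
  return entries.length > limit ? entries.slice(entries.length - limit) : entries;
}
```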
### What's Logged
**Transcriptions:**
- Timestamp
- Transcribed text
**Agent Calls:**
- Timestamp
- Transcript (your voice input)
- Model used
- Full request JSON (messages, parameters)
- Full response JSON (AI output, tool calls)
- Tool calls executed (if any)
- Screenshot thumbnail (captured automatically with each agent call)
- Success/error status
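The fields listed above map naturally onto a discriminated union. The TypeScript below is a hypothetical shape for `history.json` entries (the real schema may differ) plus a simple case-insensitive substring search of the kind the History tab's search could use; `searchHistory` is an illustrative name.

```typescript
// Hypothetical entry shapes mirroring the "What's Logged" list.
interface TranscriptionEntry {
  kind: "transcription";
  timestamp: string; // ISO 8601
  text: string;
}

interface AgentCallEntry {
  kind: "agent";
  timestamp: string;
  transcript: string;  // your voice input
  model: string;
  request: unknown;    // full request JSON
  response: unknown;   // full response JSON
  toolCalls: string[]; // names of tools executed, if any
  thumbnail?: string;  // base64 thumbnail, if capture succeeded
  ok: boolean;         // success/error status
}

type HistoryEntry = TranscriptionEntry | AgentCallEntry;

// Case-insensitive substring search over the human-readable fields.
function searchHistory(entries: HistoryEntry[], query: string): HistoryEntry[] {
  const q = query.trim().toLowerCase();
  if (q === "") return entries;
  return entries.filter((e) => {
    const haystack =
      e.kind === "transcription"
        ? e.text
        : [e.transcript, e.model].concat(e.toolCalls).join(" ");
    return haystack.toLowerCase().indexOf(q) !== -1;
  });
}
```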
## First Run
On first launch, the app automatically downloads the Whisper model (~150MB) to `~/.cache/whisper/ggml-base.en.bin`. This happens in the background.
## For Developers
### Setup
```bash
# Install dependencies (in desktop/)
cd desktop && bun install
# Development
bun dev # From root, or:
cd desktop && bun tauri dev
# Build
bun run build # From root, or:
cd desktop && bun tauri build
```
### Requirements
- **Rust** (install via rustup)
- **Bun** (recommended) or Node.js 18+
### Tech Stack
- **Frontend**: Svelte 5 + SvelteKit
- **Backend**: Rust + Tauri
- **STT**: whisper-rs (local Whisper.cpp model)
- **AI**: OpenRouter API (direct calls, no infrastructure needed)
- **MCP**: Model Context Protocol client (local stdio/HTTP/SSE)
- **Hotkey**: macOS event monitoring (Fn key) + fallbacks
- **Audio capture**: native (Rust via cpal)
**Architecture**: Fully local. Only OpenRouter API calls go out. No servers, workers, or infrastructure required.
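Since only OpenRouter API calls leave the machine, an agent request is just an HTTPS POST. This sketch builds, but does not send, a request to OpenRouter's OpenAI-compatible `https://openrouter.ai/api/v1/chat/completions` endpoint, converting MCP tool descriptors (`name`, `description`, `inputSchema`, as returned by an MCP server's `tools/list`) into OpenAI-style function tools. The function names and the exact request t2t sends are assumptions.

```typescript
// An MCP tool as listed by a server's tools/list response.
interface McpTool {
  name: string;
  description?: string;
  inputSchema: object; // JSON Schema for the tool's arguments
}

// OpenAI-compatible function tool definition.
interface FunctionTool {
  type: "function";
  function: { name: string; description: string; parameters: object };
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function toFunctionTool(tool: McpTool): FunctionTool {
  return {
    type: "function",
    function: { name: tool.name, description: tool.description ?? "", parameters: tool.inputSchema },
  };
}

// Build (but do not send) a chat completion request for OpenRouter.
function buildChatRequest(apiKey: string, model: string, messages: object[], tools: McpTool[]): PreparedRequest {
  const body: { model: string; messages: object[]; tools?: FunctionTool[] } = { model, messages };
  if (tools.length > 0) body.tools = tools.map(toFunctionTool); // omit when no MCP tools
  return {
    url: "https://openrouter.ai/api/v1/chat/completions",
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```

To send it, pass the result to `fetch(req.url, req.init)` in any runtime with a global `fetch`.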
### Debugging
- **Logs**: `~/Library/Logs/t2t.log`
- **Model location**: `~/.cache/whisper/ggml-base.en.bin`
- **History storage**: `history.json` (via Tauri store, location depends on Tauri config)
## License
MIT