https://github.com/acoyfellow/t2t

Voice-to-text with MCP support. System-wide dictation (hold fn) and AI agent mode (hold fn+ctrl) that connects to any MCP server. Cross-platform desktop app with local Whisper transcription.
https://github.com/acoyfellow/t2t

accessibility ai-agents clipboard desktop-app dictation linux local-first macos mcp offline openrouter productivity push-to-talk rust speech-to-text svelte sveltekit tauri whisper windows

Last synced: 5 months ago
JSON representation

Voice-to-text with MCP support. System-wide dictation (hold fn) and AI agent mode (hold fn+ctrl) that connects to any MCP server. Cross-platform desktop app with local Whisper transcription.

Host: GitHub
URL: https://github.com/acoyfellow/t2t
Owner: acoyfellow
License: mit
Created: 2025-12-10T20:19:47.000Z (7 months ago)
Default Branch: main
Last Pushed: 2025-12-30T00:52:17.000Z (6 months ago)
Last Synced: 2026-01-13T19:43:32.694Z (5 months ago)
Topics: accessibility, ai-agents, clipboard, desktop-app, dictation, linux, local-first, macos, mcp, offline, openrouter, productivity, push-to-talk, rust, speech-to-text, svelte, sveltekit, tauri, whisper, windows
Language: Svelte
Homepage: https://t2t.now
Size: 11.3 MB
Stars: 8
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# t2t

![t2t logo](web/static/logo.svg)

**Voice-to-text with intelligence. Hold fn to talk, hold fn+ctrl to command.**

## Download

**[Download for macOS →](https://t2t.now)**

[View all releases on GitHub →](https://github.com/acoyfellow/t2t/releases)

> **Note:** The app is not code-signed yet. On first launch, macOS may show a security warning. To open it:
> - Right-click the app → **Open**, then click **Open** in the dialog
> - Or run: `xattr -cr /Applications/t2t.app` in Terminal
>
> **Heads up:** This is an unsigned build while we polish things up. Each time you update to a new version, you'll need to remove t2t from System Settings → Privacy & Security → Accessibility (and Microphone if needed), then re-add it. We'll get it properly signed soon!

## How It Works

- **Hold Fn key** → records microphone audio
- **Release Fn key** → transcribes using local Whisper model
- **Typing mode** (red bar): Hold Fn alone → pastes transcription into focused text field, preserves clipboard
- **Agent mode** (cyan bar): Hold Fn+Ctrl → speaks commands to AI agent
- **MCP mode** (if configured): Connects to MCP servers, uses their tools via OpenRouter AI
- **AppleScript mode** (fallback): Generates and executes AppleScript for macOS automation
- Visual feedback: red/cyan bar while recording (based on mode), amber while processing

## Requirements

- **macOS** (currently macOS only; tested on Apple Silicon)
- **Accessibility permission** - Required for Fn key detection and focusing the correct field before paste
- **Microphone permission** - Required for audio recording
- **OpenRouter API key** (for agent mode) - Get one at [openrouter.ai](https://openrouter.ai)

The app will prompt you if permissions are missing.

## Getting Started

1. **Download and install** the app from [t2t.now](https://t2t.now)
2. **Grant permissions** when prompted (Accessibility and Microphone)
3. **Get an OpenRouter API key** at [openrouter.ai](https://openrouter.ai) (required for agent mode)
4. **Open settings**: Click the menu bar icon → **View Settings**
5. **Configure agent mode** (optional):
- Add your OpenRouter API key in settings
- Optionally configure MCP servers for extended automation

## Settings & Analytics

The settings window (Menu bar icon → **View Settings**) includes three tabs:

### Analytics Tab

View your transcription usage statistics:
- **Total Words**: Lifetime count of all transcribed words
- **Lifetime Average**: Average words per minute across all sessions
- **Session Average**: Average words per minute for current session
- **Sessions**: Total number of transcription sessions
- **Hours Active**: Total time spent transcribing
- **Recent Activity**: 48-hour hourly activity chart

### Settings Tab

Configure your t2t installation:
- **Theme**: Toggle between light and dark mode
- **OpenRouter API Key**: Set your API key for agent mode
- **AI Model Selection**: Choose which model to use for agent mode
- Supports all OpenRouter models
- Auto-refresh available to fetch latest models
- **MCP Servers**: Add, configure, and manage MCP servers
- Test connections and view available tools
- Enable/disable servers individually
- Supports stdio, HTTP, and SSE transports

### History Tab

See [History & Logging](#history--logging) section below.

## MCP (Model Context Protocol) Support

When MCP servers are configured in settings, agent mode uses MCP instead of AppleScript. This enables:

- **Extensible automation**: Connect to any MCP-compatible service (databases, APIs, file systems, etc.)
- **Tool-based execution**: AI agent uses tools provided by your MCP servers
- **Multiple servers**: Connect to multiple MCP servers simultaneously
- **Transport options**: Supports stdio, HTTP, and SSE transports

**To configure**: Menu bar icon → **View Settings** → Settings tab → MCP Servers section. Requires an OpenRouter API key.

## Vision Support & Automatic Screenshots

t2t automatically captures and includes a screenshot with every agent call, enabling vision-capable models to "see" your screen context. This works seamlessly with any model - vision-capable models process the image, while text-only models simply ignore it.

### How It Works

- **Automatic capture**: When you use agent mode (Fn+Ctrl), a screenshot is captured before sending your prompt
- **Universal support**: Screenshots are included with all agent calls, regardless of model selection
- **Smart routing**: OpenRouter automatically routes to vision-capable models when available, or ignores the image for text-only models
- **Seamless integration**: Screenshots are included in the API request without any additional UI or user action
- **Privacy**: Screenshots are only sent to the API (not stored locally), and thumbnails are visible in History

### Privacy & Permissions

- **Screen Recording permission**: macOS may prompt for screen recording permission the first time you use agent mode
- **No local storage**: Full screenshots are not saved to disk - they're only sent to the API
- **Thumbnails**: Small thumbnails (150x150px) are stored locally in History for reference
- **Error handling**: If screenshot capture fails (e.g., permission denied), the agent falls back to text-only mode

### Technical Details

- Screenshots are captured using macOS `screencapture` command
- Images are encoded as base64 PNG and included in the OpenAI-compatible message format
- The screenshot is included in both initial requests and follow-up requests after tool execution
- Vision-capable models (GPT-4 Vision, Claude 3.5 Sonnet, etc.) can process the image to understand your screen context

## History & Logging

t2t automatically logs all transcriptions and agent calls for review and debugging.

### Features

- **Transcription history**: All voice transcriptions are saved with timestamps
- **Agent call logging**: Complete request/response logs for all OpenRouter API calls
- **Screenshot thumbnails**: Tiny thumbnails (150x150px) of screenshots captured with all agent calls
- **Search**: Fast local search across all history entries
- **Expandable details**: Click any entry to view full request/response JSON and tool calls

### Accessing History

Menu bar icon → **View Settings** → **History** tab

### Configuration

- **History limit**: Set `T2T_HISTORY_LIMIT` environment variable (default: 1000 entries)
- **Storage**: History is stored locally in `history.json` via Tauri's store plugin
- **Privacy**: All data stays on your machine - nothing is sent to external services

### What's Logged

**Transcriptions:**
- Timestamp
- Transcribed text

**Agent Calls:**
- Timestamp
- Transcript (your voice input)
- Model used
- Full request JSON (messages, parameters)
- Full response JSON (AI output, tool calls)
- Tool calls executed (if any)
- Screenshot thumbnail (captured automatically with each agent call)
- Success/error status

## First Run

On first launch, the app automatically downloads the Whisper model (~150MB) to `~/.cache/whisper/ggml-base.en.bin`. This happens in the background.

## For Developers

### Setup

```bash
# Install dependencies (in desktop/)
cd desktop && bun install

# Development
bun dev # From root, or:
cd desktop && bun tauri dev

# Build
bun build # From root, or:
cd desktop && bun tauri build
```

### Requirements

- **Rust** (install via rustup)
- **Bun** (recommended) or Node.js 18+

### Tech Stack

- **Frontend**: Svelte 5 + SvelteKit
- **Backend**: Rust + Tauri
- **STT**: whisper-rs (local Whisper.cpp model)
- **AI**: OpenRouter API (direct calls, no infrastructure needed)
- **MCP**: Model Context Protocol client (local stdio/HTTP/SSE)
- **Hotkey**: macOS event monitoring (Fn key) + fallbacks
- **Audio capture**: native (Rust via cpal)

**Architecture**: Fully local. Only OpenRouter API calls go out. No servers, workers, or infrastructure required.

### Debugging

- **Logs**: `~/Library/Logs/t2t.log`
- **Model location**: `~/.cache/whisper/ggml-base.en.bin`
- **History storage**: `history.json` (via Tauri store, location depends on Tauri config)

## License

MIT

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/acoyfellow/t2t

Awesome Lists containing this project

README