{"id":39463806,"url":"https://github.com/4over7/speakout","last_synced_at":"2026-04-23T12:01:33.805Z","repository":{"id":333140641,"uuid":"1132045498","full_name":"4over7/SpeakOut","owner":"4over7","description":"Offline-first AI voice input for macOS. 8 ASR models + 6 cloud providers, AI polish via 12 LLMs, 11-language translation, flash notes, AI organize, AI debug, correction feedback. 598 tests, fully private.","archived":false,"fork":false,"pushed_at":"2026-04-20T04:28:44.000Z","size":307074,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-20T04:42:00.178Z","etag":null,"topics":["ai-polish","asr","blackbox-testing","dictation","flutter","llm","macos","offline","ollama","paraformer","sensevoice","sherpa-onnx","speech-recognition","translation","voice-input","voice-to-text","whisper"],"latest_commit_sha":null,"homepage":null,"language":"Dart","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/4over7.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-11T07:53:18.000Z","updated_at":"2026-04-20T04:28:48.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/4over7/SpeakOut","commit_stats":null,"previous_names":["4over7/speakout"],"tags_count":40,"template":false,"template_full_name":null,"purl":"pkg:github/4over7/SpeakOut","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4over7%2FSpe
akOut","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4over7%2FSpeakOut/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4over7%2FSpeakOut/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4over7%2FSpeakOut/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/4over7","download_url":"https://codeload.github.com/4over7/SpeakOut/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4over7%2FSpeakOut/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32179387,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-23T11:42:27.955Z","status":"ssl_error","status_checked_at":"2026-04-23T11:42:18.877Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-polish","asr","blackbox-testing","dictation","flutter","llm","macos","offline","ollama","paraformer","sensevoice","sherpa-onnx","speech-recognition","translation","voice-input","voice-to-text","whisper"],"created_at":"2026-01-18T04:47:52.262Z","updated_at":"2026-04-23T12:01:33.800Z","avatar_url":"https://github.com/4over7.png","language":"Dart","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n  \u003cimg src=\"assets/app_icon_rounded.png\" width=\"160\" height=\"160\" alt=\"SpeakOut 
Icon\" /\u003e\n\n# 子曰 SpeakOut\n\n  **Offline-First AI Voice Input for macOS**\n  *Hold a key. Speak. Auto-type.*\n\n  [Download](https://github.com/4over7/SpeakOut/releases/latest) · [Wiki](https://github.com/4over7/SpeakOut/wiki) · [Changelog](CHANGELOG.md)\n\n  ![Platform](https://img.shields.io/badge/platform-macOS%2013+-blue)\n  ![Version](https://img.shields.io/badge/version-1.7.1-brightgreen)\n  ![Tests](https://img.shields.io/badge/tests-598%20passed-brightgreen)\n  ![License](https://img.shields.io/badge/license-proprietary-lightgrey)\n\n\u003c/div\u003e\n\n---\n\n## What is SpeakOut?\n\nA macOS desktop app that turns your voice into text — offline by default, with optional cloud enhancement. Press a hotkey, speak naturally, and text appears at your cursor. Supports 11 languages, real-time translation, and AI-powered text polishing.\n\n**Works 100% offline with production-quality results.** No account, no API key, no internet required. Just install, download a model, and start speaking. Cloud features (AI polish, translation, cloud ASR) are optional enhancements — the core voice input experience is fully local.\n\n**Core principles**: privacy first (audio never leaves your device in offline mode), low latency (sub-second response), and zero configuration (works out of the box).\n\n---\n\n## Features\n\n### Voice Input\n\n| | Offline Mode | Smart Mode | Cloud Mode |\n|---|---|---|---|\n| **ASR Engine** | Sherpa-ONNX (local) | Sherpa-ONNX (local) | Cloud ASR (Groq, DashScope, etc.) 
|\n| **AI Polish** | — | LLM correction + translation | — |\n| **Privacy** | 100% offline | ASR offline, LLM via cloud | Audio sent to cloud |\n| **Latency** | Fastest | +0.5~1s for LLM | Depends on network |\n\n- **8 Offline Models** — SenseVoice, Paraformer, Whisper Large-v3, FireRedASR, and more\n- **Two Trigger Modes** — Hold to Speak (PTT) or Tap to Toggle\n- **Streaming \u0026 Offline** — Real-time subtitles while speaking, or higher accuracy after release\n\n### 11 Languages + Translation\n\n| Languages | Input | Output | Translation |\n|-----------|-------|--------|-------------|\n| Chinese, English, Japanese, Korean, Cantonese | All modes | All modes | — |\n| Spanish, French, German, Russian, Portuguese | Whisper / Cloud | Smart Mode | Via LLM |\n\n- **Auto-detect** — Let the model detect what language you're speaking\n- **Translation Mode** — Set different input/output languages (e.g., speak Chinese → output English). Requires Smart Mode.\n- **Script Control** — Choose Simplified or Traditional Chinese output\n\n### Cloud ASR (6 Providers)\n\n| Provider | Protocol | Highlights |\n|----------|----------|------------|\n| **DashScope** (Aliyun) | WebSocket | Paraformer realtime, Chinese optimized |\n| **Groq** | REST (Whisper) | Fast, 99 languages |\n| **OpenAI** | REST (Whisper/GPT-4o) | Most accurate multilingual |\n| **Volcengine** (ByteDance) | WebSocket (binary) | Seed-ASR, highest Chinese accuracy |\n| **iFlytek** | WebSocket | 202 dialects |\n| **Tencent Cloud** | WebSocket | 5h/month free |\n\n### AI Polish (Smart Mode)\n\nLLM post-processing: fix homophones, remove filler words, translate, enforce output language.\n\n- **12 LLM Providers** — DashScope, DeepSeek, Volcengine, OpenAI, Anthropic, Zhipu, Kimi, MiniMax, Gemini, iFlytek, Groq, Ollama (local)\n- **Professional Vocabulary** — Industry dictionaries (Tech/Medical/Legal/Finance/Education) + personal dictionary\n- **Typewriter Mode** (Alpha) — Stream LLM output character by character to 
cursor\n\n### ⚡ Superpowers\n\nHotkey-driven productivity features on top of voice input:\n\n- **Flash Notes** — Dedicated hotkey, speak and auto-save as timestamped Markdown to any folder\n- **AI Organize** — Select any text, press hotkey, LLM restructures logic and appends below the original (keeps source intact)\n- **Instant Translation** — Speak source language, output target language in real time (works with any AI Polish LLM)\n- **Correction Feedback** — Spot an ASR/LLM mistake? Fix it inline, press the feedback hotkey — LLM diffs against the last recording trace and auto-learns the term into your vocabulary\n- **AI Debug** — Hold hotkey to capture screenshot + voice description of a bug, auto-sent to Claude Code / Cursor / bound AI coding windows (up to 5 slots)\n\n### Smart Audio\n\n- **Bluetooth Detection** — Auto-detects headset connect/disconnect\n- **Device Selection** — Choose preferred mic in settings\n- **Pre-segmentation** — 3s pause triggers background decoding, minimizing final wait on stop\n\n---\n\n## Install\n\n1. Download `SpeakOut.dmg` from [Releases](https://github.com/4over7/SpeakOut/releases/latest)\n2. Drag to `/Applications`\n3. First launch: `xattr -cr /Applications/SpeakOut.app` (required until Developer ID signing)\n4. Grant permissions: **Input Monitoring**, **Accessibility**, **Microphone**\n5. 
Follow the onboarding wizard to download a voice model\n\n### System Requirements\n\n- macOS 13+ (Ventura or later)\n- ~230MB for default model, up to ~1.4GB for Whisper/FireRedASR\n\n---\n\n## Offline Models\n\n### Streaming (Real-time subtitles)\n\n| Model | Languages | Size |\n|-------|-----------|------|\n| Zipformer Bilingual | Zh/En | ~490MB |\n| Paraformer Streaming | Zh/En | ~1GB |\n\n### Offline (Higher accuracy)\n\n| Model | Languages | Size | Notes |\n|-------|-----------|------|-------|\n| **SenseVoice 2024** | Zh/En/Ja/Ko/Yue | ~228MB | Default, built-in punctuation |\n| SenseVoice 2025 | Zh/En/Ja/Ko/Yue | ~158MB | Cantonese enhanced |\n| Paraformer Offline | Zh/En | ~217MB | Mature \u0026 stable |\n| Paraformer Dialect | Zh/En + Sichuan | ~218MB | Dialect support |\n| Whisper Large-v3 | 99 languages | ~1.0GB | Best multilingual |\n| FireRedASR Large | Zh/En + dialects | ~1.4GB | Highest capacity |\n\n---\n\n## Architecture\n\n```\nHotkey → native_input.m (CGEventTap)\n  → C Ring Buffer (16kHz PCM)\n  → CoreEngine FFI polling\n  → ASR (8 offline models / 6 cloud providers)\n  → LLM polish + translation (optional, 12 providers)\n  → Clipboard paste to active app\n```\n\n| Layer | Path | Description |\n|-------|------|-------------|\n| Engine | `lib/engine/` | CoreEngine, ASR providers, model management |\n| Service | `lib/services/` | Config, LLM, billing, diary, audio devices |\n| UI | `lib/ui/` | macOS-native UI (macos_ui), settings, overlay |\n| Native | `native_lib/` | Objective-C: CGEventTap + AudioQueue ring buffer |\n| Gateway | `gateway/` | Cloudflare Workers (Hono): license, billing, version check |\n\n**Codebase**: ~29,000 lines across 86 files. 
598 tests.\n\n---\n\n## Build from Source\n\n```bash\nflutter pub get          # Dependencies\nflutter analyze          # Static analysis (0 issues)\nflutter test             # Run tests (598 tests)\nflutter build macos --release  # Build\n./scripts/install.sh     # Install to /Applications\n./scripts/create_styled_dmg.sh  # Create DMG\n\n# Native library (after modifying native_input.m)\ncd native_lib \u0026\u0026 clang -dynamiclib -framework Cocoa -framework Carbon \\\n  -framework AVFoundation -framework AudioToolbox -framework CoreAudio \\\n  -framework Accelerate -o libnative_input.dylib native_input.m -fobjc-arc\n```\n\n---\n\n## Security\n\n- **Offline Mode** — Audio never leaves your device\n- **Credentials** — API keys stored in SharedPreferences (local, not synced); export/backup includes plaintext keys with explicit user confirmation\n- **Logging** — User speech content never logged by default; developer mode logs may include input/output text for debugging\n- **Independent Review** — Passed 4 rounds of independent third-party security review\n\n---\n\n## License\n\nCopyright © 2025-2026 Leon Xu (云梦泽). All Rights Reserved.\n\nSee [LICENSE](./LICENSE) for full terms. 
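Source code is publicly visible for\ntransparency, user trust, and security review — this is **not** an open-source\nlicense.\n\n---\n\n\u003cdiv align=\"center\"\u003e\n\n# 子曰 SpeakOut\n\n  **Offline-First AI Voice Input for macOS**\n  *Hold a key. Speak. Auto-type.*\n\n  [Download Latest](https://github.com/4over7/SpeakOut/releases/latest) · [Wiki](https://github.com/4over7/SpeakOut/wiki) · [Changelog](CHANGELOG.md)\n\n\u003c/div\u003e\n\n---\n\n## Feature Highlights\n\n### Voice Input\n- **Fully usable offline** — No account, no internet connection, and no API key required; install and go. 8 local models built on [Sherpa-ONNX](https://github.com/k2-fsa/sherpa-onnx), with Chinese and English recognition accuracy comparable to cloud services; audio never leaves the device\n- **Three working modes** — Offline (privacy first) / Smart (offline recognition + AI polish) / Cloud (high accuracy)\n- **Two trigger modes** — Hold to Speak (PTT) or Tap to Toggle; PTT and Toggle can share a single key\n- **Pre-segmentation** — A 3-second pause detected during recording triggers background decoding, so on stop you wait only for the final segment, significantly reducing the wait\n\n### 11 Languages + Interpretation\n- Chinese, English, Japanese, Korean, Cantonese + Spanish, French, German, Russian, Portuguese, with auto-detection for input/output\n- **Interpretation Mode** — Any combination such as Chinese input → English output, translated automatically by the LLM (requires Smart Mode)\n\n### ⚡ Superpowers (Hotkey-Driven)\n- **Flash Notes** — Dedicated hotkey; speech is saved directly as Markdown, archived by day in a custom folder\n- **AI Organize** — Select text and press the hotkey; the LLM thoroughly restructures the logic and appends the result on the line below the original\n- **Instant Translation** — Hold to speak and get automatic translation into the target language, without affecting normal recording\n- **Correction Feedback** — Spot a recognition error, fix it, and press the feedback key; the LLM compares against the most recent recording and automatically learns the term into your vocabulary\n- **One-Key AI Debug** — Hold to capture a screenshot plus a voice description of a bug, sent automatically to the bound Claude Code / Cursor window (up to 5 slots)\n\n### Cloud Services (Optional Enhancements)\n- **6 Cloud ASR Providers** — Aliyun Bailian (DashScope realtime), Groq, OpenAI, Volcengine, iFlytek, Tencent Cloud\n- **12 LLM Providers** — Bailian, DeepSeek, Doubao, OpenAI, Claude, Zhipu, Kimi, MiniMax, Gemini, iFlytek, Groq, Ollama (local)\n- **Preset Providers** — New users see the full provider list as soon as they open Cloud Accounts; click to configure and use\n- **Account Import/Export** — Migrate across devices; JSONL format, credentials included\n\n### Professional Vocabulary \u0026 Security\n- **Industry Dictionaries + Personal Vocabulary** — Terms are injected into the LLM for domain awareness\n- **Local API Key Storage** — SharedPreferences; never uploaded or synced; exported backups contain plaintext keys and require user confirmation\n- **Signed and Notarized** — Developer ID signing + Apple notarization; download and double-click to run, no Gatekeeper warning\n\n## Installation\n\n1. Download `SpeakOut.dmg` from [Releases](https://github.com/4over7/SpeakOut/releases/latest)\n2. Drag to `/Applications`\n3. Before first launch: `xattr -cr /Applications/SpeakOut.app`\n4. Grant permissions: **Input Monitoring**, **Accessibility**, **Microphone**\n5. Follow the onboarding guide to download a voice model and start using the app\n\n**System requirements**: macOS 13+, 230MB to 1.4GB of disk space (depending on the model chosen)\n\n---\n\n## Contact\n\n\u003ca href=\"https://x.com/4over7\"\u003e\u003cimg src=\"https://img.shields.io/badge/X-@4over7-000?logo=x\" alt=\"X\" /\u003e\u003c/a\u003e\n\n\u003cimg src=\"assets/wx.jpg\" width=\"200\" alt=\"WeChat\" /\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F4over7%2Fspeakout","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F4over7%2Fspeakout","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F4over7%2Fspeakout/lists"}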