{"id":45974097,"url":"https://github.com/alan890104/sumi","last_synced_at":"2026-02-28T16:01:05.098Z","repository":{"id":340523487,"uuid":"1166431133","full_name":"alan890104/sumi","owner":"alan890104","description":"Sumi — Free, open-source voice dictation for macOS. Local-first Whisper + LLM polish, with built-in free cloud APIs.","archived":false,"fork":false,"pushed_at":"2026-02-28T15:35:27.000Z","size":13648,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-28T15:37:16.349Z","etag":null,"topics":["dictation","llm","local-first","macos","metal","open-source","privacy","productivity","rust","speech-to-text","tauri","voice-dictation","whisper"],"latest_commit_sha":null,"homepage":"https://sumivoice.com","language":"Svelte","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alan890104.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-25T08:07:46.000Z","updated_at":"2026-02-28T15:33:13.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/alan890104/sumi","commit_stats":null,"previous_names":["alan890104/opentypeless","alan890104/sumi"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/alan890104/sumi","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alan890104%2Fsumi","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alan890104%2Fsumi/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alan890104%2Fsumi/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alan890104%2Fsumi/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alan890104","download_url":"https://codeload.github.com/alan890104/sumi/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alan890104%2Fsumi/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29941794,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-28T13:49:17.081Z","status":"ssl_error","status_checked_at":"2026-02-28T13:48:50.396Z","response_time":90,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dictation","llm","local-first","macos","metal","open-source","privacy","productivity","rust","speech-to-text","tauri","voice-dictation","whisper"],"created_at":"2026-02-28T16:01:04.301Z","updated_at":"2026-02-28T16:01:05.076Z","avatar_url":"https://github.com/alan890104.png","language":"Svelte","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Sumi\n\n![GitHub Release](https://img.shields.io/github/v/release/alan890104/sumi)\n![License](https://img.shields.io/github/license/alan890104/sumi)\n![GitHub stars](https://img.shields.io/github/stars/alan890104/sumi?style=social)\n![GitHub forks](https://img.shields.io/github/forks/alan890104/sumi?style=social)\n![Rust](https://img.shields.io/badge/Rust-black?style=flat-square\u0026logo=rust)\n![Tauri](https://img.shields.io/badge/Tauri_v2-FFC131?style=flat-square\u0026logo=tauri\u0026logoColor=white)\n![Svelte](https://img.shields.io/badge/Svelte_5-FF3E00?style=flat-square\u0026logo=svelte\u0026logoColor=white)\n![macOS](https://img.shields.io/badge/macOS-000000?style=flat-square\u0026logo=apple\u0026logoColor=white)\n\nEnglish | [繁體中文](README_TW.md)\n\n**Voice-to-text that adapts to what you're doing.**\n\nSumi is a macOS app that transcribes your speech and polishes it with AI — automatically adjusting tone and style based on the app you're in. Casual in LINE, professional in Slack, formal email format in Gmail. You define the rules, or let Sumi's built-in presets handle it.\n\nFree, open source, and local-first.\n\n\u003c!-- TODO: Replace with actual demo GIF\n![Sumi Demo](demo.gif)\n--\u003e\n\n## Why Sumi?\n\n### Per-app rules you can customize\n\nMost voice-to-text tools produce the same output everywhere. Sumi detects the frontmost app and URL, then applies prompt rules that shape the AI's output. It ships with 18 built-in presets (Gmail, Slack, Discord, GitHub, VSCode, Terminal, and more), and you can create your own for any app — even by just describing what you want in natural language and letting the LLM generate the rule for you.\n\n**Same speech, different output:**\n\n\u003e You say: *\"um I think the project is kind of behind schedule and we should probably have a meeting to figure out what to do next\"*\n\u003e\n\u003e **In LINE** (chat — casual, natural, may add emoji):\n\u003e I think the project is behind schedule, we should have a meeting to figure out what to do next\n\u003e\n\u003e **In Slack** (professional but approachable, concise):\n\u003e I think the project is behind schedule. We should have a meeting to discuss next steps.\n\u003e\n\u003e **In Gmail** (formal email format with greeting/body/sign-off):\n\u003e Hi,\n\u003e\n\u003e I believe the project is currently behind schedule. Could we schedule a meeting to discuss the next steps?\n\u003e\n\u003e Best regards\n\n### Local-first privacy\n\nRun Whisper + LLM entirely on your Mac's GPU (Metal accelerated). In local mode, your audio and text never leave your device — verifiable because the code is open source.\n\n### Free \u0026 open source\n\nGPLv3 licensed. No subscription, no word limits, no account required. Bring your own API keys for cloud providers if you want faster processing, or use local models for free.\n\n## Features\n\n### Context-Aware AI\n\n- **Per-app prompt rules** — 18 built-in presets covering email (Gmail), chat (Slack, Discord, WhatsApp, Telegram, LINE), code editors (VSCode, Cursor, Antigravity), terminals (Terminal, iTerm2), AI CLI tools (Claude Code, Gemini CLI, Codex CLI, Aider), docs (Notion), developer platforms (GitHub), and social media (X/Twitter). Rules match by app name, bundle ID, or URL.\n- **Multi-match rules** — A single rule can match multiple conditions (e.g. Slack desktop app OR `app.slack.com` in browser).\n- **Create rules with your voice** — Describe what you want in natural language; the LLM generates the structured rule for you.\n- **Edit by Voice** — Select text, press `Ctrl+Option+Z`, speak an instruction (\"translate to English\", \"make it more formal\"), and the AI rewrites the selection in place.\n- **Custom dictionary** — Add proper nouns, names, or domain terms so the AI always gets them right. Dictionary terms are injected into both Whisper and LLM prompts.\n\n### Speech-to-Text\n\n- **Local Whisper** — 7 model variants (large-v3-turbo default, quantized lite, Chinese-tuned, medium, small, base) with Metal GPU acceleration via `whisper-rs`.\n- **Cloud STT** — Bring your own API keys for Groq, OpenAI, Deepgram, Azure, or any custom endpoint.\n- **Silero VAD** — Optional voice activity detection filters out silence and non-speech before transcription.\n- **Zero-latency start** — Audio stream runs continuously; recording starts by flipping an atomic flag with no stream initialization delay.\n\n### AI Polish\n\n- **Local LLM** — 3 models via `llama-cpp-2` with Metal acceleration: Llama 3 Taiwan 8B (~4.9 GB), Qwen 2.5 7B (~4.7 GB), Qwen 3 8B (~5.0 GB).\n- **Cloud LLM** — Groq, OpenRouter, OpenAI, Gemini, GitHub Models, SambaNova, or any OpenAI-compatible endpoint.\n- **Reasoning toggle** — Enable/disable model thinking (e.g. Qwen 3 `\u003cthink\u003e` blocks) per your preference.\n\n### UX\n\n- **Global hotkey** — `Option+Z` to toggle recording (customizable). Press once to start, again to stop and paste.\n- **Floating overlay** — Transparent always-on-top capsule with real-time waveform, elapsed timer, and status.\n- **Auto-paste** — Transcribed text is pasted at your cursor via simulated `Cmd+V`.\n- **Transcription history** — Browse past transcriptions with audio playback and export.\n- **58 languages** — UI available in 58 languages including English, Chinese, Japanese, Korean, Spanish, French, German, and many more.\n- **Menu bar app** — Lives in the menu bar, stays out of your way.\n\n## Comparison\n\n\u003e [!NOTE]\n\u003e This table reflects our best understanding as of the time of writing. Competitors update their features frequently — corrections are welcome via issues or PRs.\n\n| | **Sumi** | Built-in Dictation | Typeless | Wispr Flow | VoiceInk | SuperWhisper |\n|---|---|---|---|---|---|---|\n| **Price** | **Free** | Free | 4K words/wk free, $12-30/mo | 2K words/wk free, $12-15/mo | $25-49 (one-time) | Free trial, ~$8/mo |\n| **Open Source** | ✅ GPLv3 | ❌ | ❌ | ❌ | ✅ GPLv3 | ❌ |\n| **Local STT** | ✅ Whisper+Metal | ✅ Apple Silicon | ❌ Cloud only | ❌ Cloud only | ✅ | ✅ |\n| **Cloud STT** | ✅ BYOK | ❌ | ✅ | ✅ | ✅ Optional | ✅ |\n| **AI Polish** | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ |\n| **Local LLM Polish** | ✅ 3 models | ❌ | ❌ | ❌ | ❌ | ✅ |\n| **Per-App Rules** | ✅ 18 presets + custom | ❌ | ❌ | ✅ Styles | ✅ Power Modes | ✅ Custom modes |\n| **Context-Aware** | ✅ App + URL | ❌ | ✅ App | ✅ App | ✅ App | ✅ Super Mode |\n| **Edit by Voice** | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |\n| **Dictionary** | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ |\n| **History** | ✅ + audio export | ❌ | ✅ | ✅ | ✅ | ✅ |\n| **Platforms** | macOS | macOS, iOS | macOS, Win, iOS, Android | macOS, Win, iOS, Android | macOS | macOS, Win, iOS |\n\n## Installation\n\n### Homebrew\n\n```bash\nbrew tap alan890104/sumi\nbrew install --cask sumi\n```\n\n### Download\n\n1. Download the latest DMG from [GitHub Releases](https://github.com/alan890104/sumi/releases/latest).\n2. Open the DMG and drag **Sumi** into `/Applications`.\n3. Since this app is not notarized by Apple, macOS will flag it. Run in Terminal:\n\n   ```bash\n   xattr -cr /Applications/Sumi.app\n   ```\n\n4. Launch the app. On first launch it will ask for:\n   - **Microphone** access for recording.\n   - **Accessibility** permissions (System Settings \u003e Privacy \u0026 Security \u003e Accessibility) for auto-paste.\n\n### Build from Source\n\n```bash\ngit clone https://github.com/alan890104/sumi.git\ncd sumi\n\n# Run in development mode\ncargo tauri dev\n\n# Build for production (outputs .dmg)\ncargo tauri build\n```\n\nRequires [Rust](https://rustup.rs/) and [Tauri CLI](https://v2.tauri.app/) (`cargo install tauri-cli --version \"^2\"`).\n\n## Usage\n\n1. Start the application. You will see an icon in your Menu Bar.\n2. Focus any text field where you want to type.\n3. Press `Option+Z` (⌥Z) to start recording. A floating indicator appears.\n4. Speak naturally (max 30 seconds).\n5. Press `Option+Z` again to stop.\n6. The transcribed text is pasted at your cursor position.\n\n**Edit by Voice:** Select text, then press `Ctrl+Option+Z` (⌃⌥Z). Speak your instruction (e.g. \"translate to Japanese\"), and the AI will rewrite the selected text accordingly.\n\n## Tech Stack\n\n- **Framework**: Tauri v2\n- **Backend**: Rust\n- **Frontend**: Svelte 5 + TypeScript + Vite\n- **Audio Capture**: `cpal`\n- **Speech Recognition**: `whisper-rs` (local, Metal-accelerated) or cloud API (Groq / OpenAI / Deepgram / Azure)\n- **AI Polishing**: `llama-cpp-2` (local, Metal-accelerated) or cloud API (OpenAI-compatible)\n- **Voice Activity Detection**: Silero VAD\n\n## License\n\nGPLv3\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falan890104%2Fsumi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falan890104%2Fsumi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falan890104%2Fsumi/lists"}