{"id":49666717,"url":"https://github.com/myuan19/voiceinput","last_synced_at":"2026-06-08T03:06:20.887Z","repository":{"id":356091663,"uuid":"1193238999","full_name":"myuan19/voiceInput","owner":"myuan19","description":"Windows AI voice input 🎙: press a hotkey and speak to type, with optional polishing. Break free from the limits of typing and express yourself freely and efficiently.","archived":false,"fork":false,"pushed_at":"2026-05-28T18:45:08.000Z","size":3201,"stargazers_count":36,"open_issues_count":4,"forks_count":6,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-28T20:23:40.977Z","etag":null,"topics":["asr","dashscope","productivity","pyqt6","python","qwen-asr","speech-to-text","tool","voice-input","windows"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/myuan19.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-27T02:34:17.000Z","updated_at":"2026-05-28T18:45:07.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/myuan19/voiceInput","commit_stats":null,"previous_names":["myuan19/voiceinput"],"tags_count":26,"template":false,"template_full_name":null,"purl":"pkg:github/myuan19/voiceInput","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/myuan19%2FvoiceInput","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/myuan19%2FvoiceInput/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/myuan19%2FvoiceInput/releases","manifests_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/myuan19%2FvoiceInput/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/myuan19","download_url":"https://codeload.github.com/myuan19/voiceInput/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/myuan19%2FvoiceInput/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33646428,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-29T02:00:06.066Z","response_time":107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asr","dashscope","productivity","pyqt6","python","qwen-asr","speech-to-text","tool","voice-input","windows"],"created_at":"2026-05-06T17:01:30.063Z","updated_at":"2026-06-08T03:06:20.881Z","avatar_url":"https://github.com/myuan19.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# VoiceInput 
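⌨️🎙️\n\n[![Stars](https://img.shields.io/github/stars/myuan19/voiceInput?style=flat-square)](https://github.com/myuan19/voiceInput/stargazers)\n[![Release](https://img.shields.io/github/v/release/myuan19/voiceInput?style=flat-square)](https://github.com/myuan19/voiceInput/releases/latest)\n[![Downloads](https://img.shields.io/github/downloads/myuan19/voiceInput/total?style=flat-square)](https://github.com/myuan19/voiceInput/releases)\n[![License](https://img.shields.io/github/license/myuan19/voiceInput?style=flat-square)](LICENSE)\n[![Platform](https://img.shields.io/badge/platform-Windows-blue?style=flat-square)]()\n[![Python](https://img.shields.io/badge/Python-3.12-blue?style=flat-square)]()\n\nAn AI voice input tool for Windows: press a hotkey and speak to type. It replaces keyboard input and frees your hands; a requirement that used to take 5 minutes to sort out now goes straight into the input box in about 30 seconds.\n\n\u003e Built on Alibaba Cloud DashScope (Tongyi Qianwen / Qwen ASR): low latency, accurate recognition, and a free quota that is enough for everyday use.\n\n## 💡 Core Value\n\nThe tool aims to make interaction with AI natural: you express your ideas freely in everyday language, and the AI turns the speech into a clear, readable requirement description and fills it directly into the input box. That is the fundamental goal: to replace the keyboard as the primary means of interaction, so that users are freed from the limits of typing and can express themselves freely and efficiently.\n\n## ✨ Features\n\n- **Global hotkey**: press `Ctrl+Shift+R` (configurable) in any application to start/stop recording; the text is typed at the cursor automatically and also copied to the clipboard.\n- **Smart polishing**: optional LLM polishing that fixes colloquial phrasing, adds punctuation, and removes filler words; supports switching among multiple models (qwen3.5-flash / qwen-flash / qwen-plus / qwen-max)\n- **Floating top indicator**: a minimal mini bar that expands droplet-style on hover; shows a waveform and a stop button while recording; **long-press the stop button to discard the current recording**, while a short press still ends it normally; hover during recording to see the current microphone and polishing model\n- **System tray**: switch mode, polishing model, hotkey, API Key, and microphone (including the system default device name); toggle saving of recording files; open the history and log directories\n- **Robust experience**: single-instance guard against duplicate launches; guided setup when no Key is configured; persistent notice when the microphone is unavailable; repeated triggers blocked while transcription is in progress; automatic stop with a notice when the microphone disconnects\n- **Sound feedback**: start / stop / done cues\n\n## 🎬 Demo\n\n|   | Description | Demo |\n|:-:|------|:----:|\n| demo1 | Everyday voice input: replying in chats, looking things up, writing documents | ![demo1](./docs/PixPin_2026-03-27_21-40-35.gif) |\n| demo2 | Recommended workflow: start recording, talk through the project architecture as you sort it out, then press the hotkey to finish and paste the requirement straight into the chat box | ![demo2](./docs/PixPin_2026-03-27_21-54-22.gif) |\n| demo3 | A replacement for Cursor's voice input, which only supports English | ![demo3](./docs/PixPin_2026-03-27_22-10-19.gif) |\n\n## 🚀 Quick Start\n\n1. Go to [Releases](https://github.com/myuan19/voiceInput/releases) and download the latest release package (portable archive or single file, as described on the Release page)\n2. 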
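After extracting or launching it, right-click the tray icon, open **API Key**, and enter your DashScope Key\n   [Get an API Key](https://bailian.console.aliyun.com/cn-beijing/?tab=model#/api-key)\n\nTo run from source instead:\n\n```bash\ngit clone https://github.com/myuan19/voiceInput.git\ncd voiceInput\nuv venv\n.venv\\Scripts\\activate\nuv pip install -r src/requirements.txt\nset DASHSCOPE_API_KEY=sk-xxxxxxxx\npython -u src\\main.py\n```\n\n## 📖 Usage\n\n| Action                   | Description                                                                                   |\n| ------------------------ | --------------------------------------------------------------------------------------------- |\n| Default hotkey           | Start/stop recording (can be changed in the tray menu)                                        |\n| Left-click tray icon     | Start/stop recording (can be disabled in the config)                                          |\n| Right-click tray icon    | Menu: mode, polishing model, device, hotkey, Key, save recordings, reset indicator position, history, logs, exit |\n| Hover over top indicator | Expands the panel: recording, polishing toggle, whether to pop up the original text           |\n| Hover while recording    | Shows the microphone and polishing model currently in use                                     |\n| While recording          | Short-press the stop button: end and recognize; long-press until the ring completes: discard the current recording |\n\n**Modes**\n\n- **Transcribe only**: speech is converted directly to text\n- **Smart polishing**: ASR followed by LLM polishing\n\n## ⚙️ Configuration\n\nConfig file: `%USERPROFILE%\\.voiceinput\\config.json` (generated automatically on first run).\n\n| Option                   | Description                                              | Default                    |\n| ------------------------ | -------------------------------------------------------- | -------------------------- |\n| `hotkey`               | Global hotkey (left and right modifiers can be distinguished, e.g. `lctrl+lshift+r`) | `lctrl+lshift+r`         |\n| `trigger_mode`         | Trigger mode                                             | `toggle`                 |\n| `mode`                 | `transcribe` / `polish`                              | `transcribe`             |\n| `custom_prompt`        | Custom polishing prompt (reserved)                       | empty                      |\n| `language`             | Language                                                 | `auto`                   |\n| `api_key`              | DashScope API Key                                        | empty (environment variable may be used)       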
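  |\n| `api_base_url`         | API base URL                                             | official default           |\n| `asr_model`            | ASR model ID                                             | `qwen3-asr-flash-2026-02-10` |\n| `polish_models`        | Polishing model menu (`[{ \"id\", \"label\" }, ...]`)       | see the factory list in `config.py` |\n| `enabled_polish_models` | List of polishing model IDs enabled and shown in the tray menu | `qwen3.6-flash`, `qwen3.6-plus`, `qwen3.7-max` |\n| `polish_model`         | Currently selected polishing model ID                    | `qwen3.6-flash`          |\n| `mic_index`            | Microphone device index (resolved automatically from the device name; no manual editing needed) | `null` (default device)  |\n| `paste_result`         | Paste the result at the cursor after recognition         | `true`                   |\n| `restore_clipboard`    | Restore the clipboard after pasting                      | `false`                  |\n| `simulate_keypresses`  | Simulate keypresses (reserved)                           | `false`                  |\n| `tray_click_to_record` | Left-click the tray icon to record                       | `true`                   |\n| `play_sounds`          | Sound effects                                            | `true`                   |\n| `save_history`         | Save history                                             | `true`                   |\n| `save_audio`           | Whether to save each raw recording (WAV, stored in the history directory) | `false`                  |\n| `mini_window_x`        | Horizontal anchor of the indicator (pixels; clear it to reset) | `null`                   |\n\nThe `DASHSCOPE_API_KEY` environment variable can be used alongside the config file (it is read when `api_key` in the config is empty).\n\nFor developer notes, see [`_docs/开发者文档/配置文件开发参考.md`](_docs/开发者文档/配置文件开发参考.md); for release migration, see [`_docs/config-versioning.md`](_docs/config-versioning.md).\n\nLog directory: `%USERPROFILE%\\.voiceinput\\logs\\` (a new file per launch, containing the full record from startup to exit, including WARNING / ERROR entries).\n\n## 🗂️ Project Structure\n\n```\nsrc/\n├── main.py              # Entry point (single instance)\n├── config.py\n├── core/\n│   ├── engine.py        # Recording / ASR / polishing / injection\n│   ├── recorder.py\n│   ├── asr.py\n│   ├── polisher.py\n│   ├── injector.py\n│   ├── history.py\n│   └── 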
log.py\n└── ui/\n    ├── tray.py\n    ├── mini_window.py\n    ├── waveform_widget.py\n    ├── icons.py\n    ├── sounds.py\n    └── theme.py\n```\n\n## 🛠 Tech Stack\n\n- Python 3.12 + PyQt6\n- DashScope SDK (ASR: Tongyi Qianwen / Qwen) + OpenAI-compatible API (LLM polishing)\n- PyAudio, pynput, NumPy, loguru\n\n## ⭐ Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=myuan19/voiceInput\u0026type=Date)](https://star-history.com/#myuan19/voiceInput\u0026Date)\n\n## 📄 License\n\nThis project is released under the [MIT](LICENSE) license.\n\n---\n\nThanks to the [LINUX DO](https://linux.do/) community","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmyuan19%2Fvoiceinput","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmyuan19%2Fvoiceinput","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmyuan19%2Fvoiceinput/lists"}