{"id":28926773,"url":"https://github.com/obsidianplusplus/speechrecognition","last_synced_at":"2026-04-29T09:04:49.494Z","repository":{"id":298775667,"uuid":"915502258","full_name":"obsidianplusplus/SpeechRecognition","owner":"obsidianplusplus","description":"一个基于 PyQt 和 Python，用于将音频文件转换为文本工具。支持多种 Whisper 模型选择、语言设置和 GPU 加速 | A tool based on PyQt and Python for converting audio files to text. It supports various Whisper model selections, language settings, and GPU acceleration.","archived":false,"fork":false,"pushed_at":"2025-01-12T02:29:59.000Z","size":4,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-22T12:12:54.655Z","etag":null,"topics":["audio","audio-to-text","huggingface","model","recognition","speech","speech-recognition","text","to","transformers","whisper"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/obsidianplusplus.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-01-12T02:29:55.000Z","updated_at":"2025-04-16T01:16:52.000Z","dependencies_parsed_at":"2025-06-12T22:09:16.895Z","dependency_job_id":null,"html_url":"https://github.com/obsidianplusplus/SpeechRecognition","commit_stats":null,"previous_names":["obsidianplusplus/speechrecognition"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/obsidianplusplus/SpeechRecognition","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/obsidianplusplus%2FSpeechRecognition","ta
gs_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/obsidianplusplus%2FSpeechRecognition/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/obsidianplusplus%2FSpeechRecognition/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/obsidianplusplus%2FSpeechRecognition/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/obsidianplusplus","download_url":"https://codeload.github.com/obsidianplusplus/SpeechRecognition/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/obsidianplusplus%2FSpeechRecognition/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32418192,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T06:29:02.080Z","status":"ssl_error","status_checked_at":"2026-04-29T06:29:00.631Z","response_time":110,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio","audio-to-text","huggingface","model","recognition","speech","speech-recognition","text","to","transformers","whisper"],"created_at":"2025-06-22T12:12:04.716Z","updated_at":"2026-04-29T09:04:49.487Z","avatar_url":"https://github.com/obsidianplusplus.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n  
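\u003ch1\u003e🎙️ Audio File to Text Tool\u003c/h1\u003e\n  \u003cp\u003eA simple, easy-to-use desktop application for audio transcription, built on PyQt and Transformers.\u003c/p\u003e\n\u003c/div\u003e\n\n## 🌟 Features\n\n- 🎧 Supports multiple audio formats (wav, mp3, ogg)\n- 📝 Quickly converts audio files to text\n- ⚙️ Selectable Whisper models (openai/whisper-large-v3, openai/whisper-medium, openai/whisper-small, openai/whisper-tiny)\n- 🌐 Supports multiple languages (Chinese, English, French, German, Spanish, Japanese, Korean)\n- 🚀 Optional GPU acceleration (if available)\n- 💾 Saves transcription results as .txt files\n- 📊 Accurate transcription progress display\n- ✨ Clean, friendly user interface\n\n## 🛠️ Installation\n\n1. **Clone or download the repository:**\n\n   ```bash\n   git clone https://github.com/loveboyme/SpeechRecognition\n   ```\n\n2. **Create a virtual environment (recommended):**\n\n   ```\n   python -m venv venv\n   source venv/bin/activate  # On Linux/macOS\n   venv\\Scripts\\activate  # On Windows\n   ```\n\n3. **Install dependencies:**\n\n   ```\n   pip install -r requirements.txt\n   ```\n\n   Alternatively, you can install the following packages manually:\n\n   ```\n   pip install PyQt5 transformers torch librosa numpy\n   ```\n\n4. **Run the application:**\n\n   ```\n   python SpeechRecognition.py\n   ```\n\n## ⚙️ Usage\n\n1. **Select a model:** Choose the Whisper model you want to use from the drop-down menu. Larger models are usually more accurate but require more computing resources. Model files are downloaded and cached on first use.\n2. **Select a language:** Choose the language spoken in the audio.\n3. **Open an audio file:** Click the \"打开音频文件\" (Open Audio File) button and select the audio file you want to transcribe.\n4. **Start transcription:** Click the \"开始转录\" (Start Transcription) button to begin. The progress bar shows how far the transcription has progressed.\n5. **View the result:** The transcribed text appears in the text box below.\n6. **Save:** Click the \"保存\" (Save) button to save the transcription to a .txt file.\n\n## 📂 Model Files\n\n- Model files are downloaded to and stored in the model folder under the current working directory.\n\n## 📝 Dependencies\n\n- [PyQt5](https://www.riverbankcomputing.com/software/pyqt/intro): used to build the graphical user interface.\n- [Transformers](https://huggingface.co/transformers): a library from Hugging Face for natural language processing, which includes the Whisper models.\n- [Torch](https://pytorch.org/): an open-source deep learning framework used to run the Whisper models.\n- [Librosa](https://librosa.org/): a Python library for audio and music analysis.\n- [Numpy](https://numpy.org/): a Python library for scientific computing.\n\n## 💡 Notes\n\n- The first time you use a given model, downloading the model files may take some time.\n- If your system has an NVIDIA GPU available and CUDA is installed correctly, the application will try to use the GPU to speed up transcription.\n- Transcription accuracy may be affected by audio quality, background noise, and the selected model.\n\n## 🙏 Acknowledgments\n\nThanks to [Hugging Face](https://huggingface.co/) for providing the Transformers library and the pretrained models.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fobsidianplusplus%2Fspeechrecognition","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fobsidianplusplus%2Fspeechrecognition","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fobsidianplusplus%2Fspeechrecognition/lists"}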