https://github.com/obsidianplusplus/speechrecognition

A tool based on PyQt and Python for converting audio files to text. It supports several Whisper model choices, language settings, and GPU acceleration.

audio audio-to-text huggingface model recognition speech speech-recognition text to transformers whisper


Host: GitHub
URL: https://github.com/obsidianplusplus/speechrecognition
Owner: obsidianplusplus
Created: 2025-01-12T02:29:55.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-01-12T02:29:59.000Z (over 1 year ago)
Last Synced: 2025-06-22T12:12:54.655Z (about 1 year ago)
Topics: audio, audio-to-text, huggingface, model, recognition, speech, speech-recognition, text, to, transformers, whisper
Language: Python
Homepage:
Size: 3.91 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: readme.md


README

# 🎙️ Audio-to-Text Tool

A simple, easy-to-use desktop application for converting audio files to text, built on PyQt and Transformers.

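## 🌟 Features

- 🎧 Supports multiple audio formats (wav, mp3, ogg)
- 📝 Fast conversion of audio files to text
- ⚙️ Selectable Whisper models (openai/whisper-large-v3, openai/whisper-medium, openai/whisper-small, openai/whisper-tiny)
- 🌐 Supports multiple languages (Chinese, English, French, German, Spanish, Japanese, Korean)
- 🚀 Optional GPU acceleration (when available)
- 💾 Saves transcripts as .txt files
- 📊 Accurate transcription progress display
- ✨ Clean, user-friendly interface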
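As a rough sketch, the options listed above could be kept as plain Python constants that the UI reads from. The names `SUPPORTED_EXTENSIONS`, `WHISPER_MODELS`, `LANGUAGES`, `qt_file_filter` and `is_supported_audio` are hypothetical and are not taken from `SpeechRecognition.py`; the language values are the ISO 639-1 codes Whisper uses for these languages.

```python
from pathlib import Path

# Hypothetical constants mirroring the feature list; the real app's names are not documented.
SUPPORTED_EXTENSIONS = (".wav", ".mp3", ".ogg")

WHISPER_MODELS = (
    "openai/whisper-large-v3",
    "openai/whisper-medium",
    "openai/whisper-small",
    "openai/whisper-tiny",
)

# Display name -> Whisper language code (ISO 639-1).
LANGUAGES = {
    "Chinese": "zh",
    "English": "en",
    "French": "fr",
    "German": "de",
    "Spanish": "es",
    "Japanese": "ja",
    "Korean": "ko",
}


def qt_file_filter() -> str:
    """Build a name filter in the "Description (*.ext ...)" form that QFileDialog accepts."""
    patterns = " ".join(f"*{ext}" for ext in SUPPORTED_EXTENSIONS)
    return f"Audio Files ({patterns})"


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension (case-insensitive) is a supported audio format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

The filter string could be passed as the fourth argument of PyQt5's `QFileDialog.getOpenFileName(parent, caption, directory, filter)`, and the language code used as Whisper's `language` generation option; how the actual app wires these up is not documented.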

## 🛠️ Installation

1. **Clone or download the repository:**

```bash
git clone https://github.com/loveboyme/SpeechRecognition
cd SpeechRecognition
```

2. **Create a virtual environment (recommended):**

```bash
python -m venv venv
# Linux/macOS:
source venv/bin/activate
# Windows:
venv\Scripts\activate
```

3. **Install dependencies:**

```bash
pip install -r requirements.txt
```

Alternatively, install the dependencies manually:

```bash
pip install PyQt5 transformers torch librosa numpy
```

4. **Run the application:**

```bash
python SpeechRecognition.py
```
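After installing, one quick way to confirm that every package listed above can be found is a standard-library check with `importlib.util.find_spec`. This `missing_modules` helper is hypothetical and not part of the repository. Note that `find_spec` only locates a package without importing it, so a broken install (for example a CUDA mismatch in `torch`) would not be caught.

```python
import importlib.util

# Import names of the dependencies from the install step (PyQt5 is imported as "PyQt5").
REQUIRED_MODULES = ("PyQt5", "transformers", "torch", "librosa", "numpy")


def missing_modules(names=REQUIRED_MODULES) -> list[str]:
    """Return the top-level modules that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```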

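## ⚙️ Usage

1. **Select a model:** Choose a Whisper model from the drop-down menu. Larger models are generally more accurate but need more computing resources. Each model is downloaded and cached the first time it is used.
2. **Select a language:** Choose the language spoken in the audio.
3. **Open an audio file:** Click the "打开音频文件" ("Open Audio File") button and choose the file to transcribe.
4. **Start transcription:** Click the "开始转录" ("Start Transcription") button. The progress bar shows how far transcription has progressed.
5. **View the result:** The transcribed text appears in the text box below.
6. **Save:** Click the "保存" ("Save") button to save the transcript to a .txt file.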
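The README does not say how the progress percentage is computed. One plausible approach, assumed here, is to split the audio into 30-second windows (Whisper's input length, at the 16 kHz sample rate it expects), transcribe one window at a time, and report the share of samples processed after each window. `chunk_bounds` and `progress_steps` are hypothetical names.

```python
SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio
CHUNK_SECONDS = 30    # length of Whisper's input window


def chunk_bounds(num_samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Split sample indices [0, num_samples) into consecutive windows of at most chunk_seconds."""
    step = sample_rate * chunk_seconds
    return [(start, min(start + step, num_samples)) for start in range(0, num_samples, step)]


def progress_steps(num_samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Yield (start, end, percent) per window; percent is the integer share of samples done."""
    for start, end in chunk_bounds(num_samples, sample_rate, chunk_seconds):
        # A real worker would transcribe audio[start:end] here before reporting progress.
        yield start, end, 100 * end // num_samples


if __name__ == "__main__":
    # 65 s of audio: windows end at 30 s, 60 s and 65 s -> 46 %, 92 %, 100 %
    for start, end, percent in progress_steps(SAMPLE_RATE * 65):
        print(start, end, percent)
```

In a PyQt5 app, each percentage would typically be emitted from a worker thread through a `pyqtSignal(int)` connected to `QProgressBar.setValue`, so the window stays responsive while the model runs. Cutting at fixed boundaries can split a word across two windows; overlapping windows are a common refinement.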

## 📂 Model Files

- Model files are downloaded to, and stored in, the `model` folder in the current working directory.
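Assuming the app passes `cache_dir="model"` to Transformers' `from_pretrained` (the README only says files go to a `model` folder), the Hugging Face Hub cache stores each model in a subfolder named `models--<org>--<name>` containing `blobs/`, `refs/` and `snapshots/`. The hypothetical helpers below locate that folder and guess whether a model has already been downloaded, which is why only the first use of a model is slow.

```python
from pathlib import Path

CACHE_DIR = Path.cwd() / "model"  # assumed: the "model" folder in the working directory


def hub_cache_folder(repo_id: str, cache_dir: Path = CACHE_DIR) -> Path:
    """Folder the Hub cache uses for a model repo, e.g. models--openai--whisper-small."""
    return Path(cache_dir) / ("models--" + repo_id.replace("/", "--"))


def is_downloaded(repo_id: str, cache_dir: Path = CACHE_DIR) -> bool:
    """Heuristic: treat the model as cached if its snapshots folder exists and is non-empty."""
    snapshots = hub_cache_folder(repo_id, cache_dir) / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())
```

For example, `WhisperForConditionalGeneration.from_pretrained("openai/whisper-small", cache_dir="model")` would populate `model/models--openai--whisper-small`.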

## 📝 Dependencies

- [PyQt5](https://www.riverbankcomputing.com/software/pyqt/intro): used to build the graphical user interface.
- [Transformers](https://huggingface.co/transformers): Hugging Face's library of pretrained models, including Whisper.
- [PyTorch](https://pytorch.org/) (`torch`): an open-source deep learning framework, used to run the Whisper models.
- [Librosa](https://librosa.org/): a Python library for audio and music analysis.
- [NumPy](https://numpy.org/): a Python library for scientific computing.

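## 💡 Notes

- The first time you use a model, downloading its files may take a while.
- If your system has an NVIDIA GPU with CUDA correctly installed, the application tries to use the GPU, which speeds up transcription.
- Transcription accuracy depends on audio quality, background noise, and the selected model.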
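A common way to implement "use the GPU if available" with PyTorch is to check `torch.cuda.is_available()` and fall back to the CPU. The sketch below also picks half precision on the GPU, a frequent choice for Whisper that is assumed here rather than taken from the repository. `pick_device` is a hypothetical name, and PyTorch is imported inside the function so the snippet still runs where it is not installed.

```python
def pick_device(prefer_gpu=True):
    """Return (device, dtype_name): ("cuda:0", "float16") if a CUDA GPU is usable, else ("cpu", "float32")."""
    try:
        import torch  # imported lazily so this sketch also runs without PyTorch installed
    except ImportError:
        return "cpu", "float32"
    if prefer_gpu and torch.cuda.is_available():
        return "cuda:0", "float16"
    return "cpu", "float32"
```

The returned strings map to PyTorch objects via `torch.device(device)` and `getattr(torch, dtype)`.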

## 🙏 Acknowledgements

Thanks to [Hugging Face](https://huggingface.co/) for the Transformers library and pretrained models.
