{"id":21069465,"url":"https://github.com/lissettecarlr/automaticspeechrecognition","last_synced_at":"2025-05-16T04:34:26.327Z","repository":{"id":203492089,"uuid":"706594770","full_name":"lissettecarlr/AutomaticSpeechRecognition","owner":"lissettecarlr","description":"Python wrappers for several speech-to-text backends (paraformer, whisper_online, whisper_offline, funasr), built to serve the kuon repository","archived":false,"fork":false,"pushed_at":"2025-02-13T07:08:05.000Z","size":907,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-03T20:21:43.434Z","etag":null,"topics":["ai","asr","audio","audio-processing","deepl","paraformer","python","speech-to-text","text","whisper"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lissettecarlr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-18T08:57:50.000Z","updated_at":"2025-03-03T06:32:05.000Z","dependencies_parsed_at":null,"dependency_job_id":"c60d4bde-ea6c-4d43-99f7-5a7626c73f12","html_url":"https://github.com/lissettecarlr/AutomaticSpeechRecognition","commit_stats":null,"previous_names":["lissettecarlr/automaticspeechrecognition"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lissettecarlr%2FAutomaticSpeechRecognition","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lissettecarlr%2FAutomaticSpeechRecognition/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lisset
tecarlr%2FAutomaticSpeechRecognition/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lissettecarlr%2FAutomaticSpeechRecognition/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lissettecarlr","download_url":"https://codeload.github.com/lissettecarlr/AutomaticSpeechRecognition/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254470303,"owners_count":22076566,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","asr","audio","audio-processing","deepl","paraformer","python","speech-to-text","text","whisper"],"created_at":"2024-11-19T18:35:38.348Z","updated_at":"2025-05-16T04:34:21.317Z","avatar_url":"https://github.com/lissettecarlr.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## kuonasr\nThis repository provides speech recognition. It currently has four implementations: paraformer, whisper_online, funasr, and whisper_offline. It mainly serves the [kuon](https://github.com/lissettecarlr/kuon) repository.\n\n## Dependencies\n\nYou can install everything at once with `pip install -r requirements.txt`, or install only the packages required by the implementation you choose.\n\n### paraformer\n\n* onnxruntime-gpu or onnxruntime\n* numpy\n* librosa, for audio analysis and processing\n* pyyaml\n* typeguard==2.13.3\n* scipy\n\n### whisper_online\n\n* openai\n* langid\n\n### funasr_client\n\n* websockets\n\n### whisper_offline\n\n* torch\n* faster-whisper\n\n## Configuration\n```bash\ncp config.yaml.example config.yaml\n```\n* channel: choose one of paraformer, whisper_online, funasr, or whisper_offline\n* If you choose whisper_online, configure the OpenAI key and the proxy address\n* If you choose funasr, configure the address of the funasr server\n* 
If you choose whisper_offline, select a model (tiny, base, medium, small, large-v2, large-v3, tiny.en, base.en, medium.en, small.en) and a device (cpu or cuda)\n\n## Usage\n\n*funasr requires a deployed server; it is the recommended option here*\n\n```python\nfrom kuonasr import ASR\ntest = ASR()\ntest.test()\n```\n\n```python\nfrom kuonasr import ASR\nasr = ASR()\ntry:\n    result = asr.convert(\"./kuonasr/audio/asr_example.wav\")\n    print(result)\nexcept Exception as e:\n    print(e)\n```\n\nYou can also run `python .\\example.py` directly to test.\n\nWith paraformer:\n![paraformer](./file/paraformer.gif)\n\nWith whisper_online:\n![whisper_online](./file/whisper_online.gif)\n\nWith funasr:\n![funasr](./file/funasr.gif)\n\nWith whisper_offline:\n![whisper_offline](./file/whisper_offline.png)\n\n\n## About the conversion methods\n\n### paraformer\n\nThe source code comes from RapidAI's [RapidASR repository](https://github.com/RapidAI/RapidASR/blob/main/README.md).\n\n[Model on Baidu Cloud](https://pan.baidu.com/s/1sY6ENdKcxM-X7bqK07RThg?pwd=kuon): take the file named asr_paraformerv2 from the paraformer folder and place it in the kuonasr/paraformer/models folder. Alternatively, download it from the original project.\n\n### whisper_online\n\nOpenAI's online Whisper speech recognition; see the [official documentation](https://platform.openai.com/docs/guides/speech-to-text). This implementation simply calls that API.\nBefore use, upgrade the openai package to the latest version, because the calling convention changed. You also need to configure the key and the proxy address. Accuracy is acceptable, but it is too slow.\n\n### funasr\n\n[GitHub repository](https://github.com/alibaba-damo-academy/FunASR). The server must be deployed first; the code here is only a client that calls its interface. For deployment, see the official repository or these [notes](https://blog.kala.love/posts/cbe699d7/). This is currently the best option.\n\n### whisper_offline\n\nUses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) for local inference.\n\n## Errors\n\n### 1\n```bash\nValueError: An error occurred: unknown format: 3\n```\nThe input audio format is not supported. You can convert it with sox, for example:\n```bash\nsox test.wav -b 16 -e signed-integer test2.wav\n```\n* [sox on GitHub](https://github.com/chirlu/sox)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flissettecarlr%2Fautomaticspeechrecognition","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flissettecarlr%2Fautomaticspeechrecognition","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flissettecarlr%2Fautomaticspeechrecognition/lists"}