{"id":14989219,"url":"https://github.com/zthxxx/python-speech_recognition","last_synced_at":"2025-04-12T00:31:28.224Z","repository":{"id":53066978,"uuid":"68082739","full_name":"zthxxx/python-Speech_Recognition","owner":"zthxxx","description":"A simple example for use speech recognition baidu api with python.","archived":false,"fork":false,"pushed_at":"2021-04-08T08:03:42.000Z","size":513,"stargazers_count":115,"open_issues_count":0,"forks_count":40,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-03-25T20:51:11.419Z","etag":null,"topics":["pyaudio","python","scipy","speech","speech-recognition"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zthxxx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-09-13T06:52:46.000Z","updated_at":"2024-10-31T03:37:37.000Z","dependencies_parsed_at":"2022-09-08T11:51:25.507Z","dependency_job_id":null,"html_url":"https://github.com/zthxxx/python-Speech_Recognition","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zthxxx%2Fpython-Speech_Recognition","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zthxxx%2Fpython-Speech_Recognition/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zthxxx%2Fpython-Speech_Recognition/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zthxxx%2Fpython-Speech_Recognition/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zthxxx","download_url":"htt
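ps://codeload.github.com/zthxxx/python-Speech_Recognition/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248501238,"owners_count":21114636,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["pyaudio","python","scipy","speech","speech-recognition"],"created_at":"2024-09-24T14:17:53.308Z","updated_at":"2025-04-12T00:31:27.761Z","avatar_url":"https://github.com/zthxxx.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Python Speech Recognition Project\n----------------------------------------------\n`python3.5` `speech recognition` `Baidu Speech API`\n\n[![Build Status](https://api.travis-ci.org/zthxxx/python-Speech_Recognition.png?branch=master)](https://travis-ci.org/zthxxx/python-Speech_Recognition)\n[![Coverage Status](https://coveralls.io/repos/github/zthxxx/python-Speech_Recognition/badge.svg?branch=master)](https://coveralls.io/github/zthxxx/python-Speech_Recognition?branch=master)\n[![Code Climate](https://codeclimate.com/github/zthxxx/python-Speech_Recognition/badges/gpa.svg)](https://codeclimate.com/github/zthxxx/python-Speech_Recognition)\n\n### Overview\n\nThis project targets python3.5 and manages packages with pip3.5. It uses pyaudio for recording, numpy for computation, scipy for filtering, and pylab for plotting waveforms and spectra.\n\n\n### Environment\n\nIt is recommended to create a python virtual environment with virtualenv in a directory that sits alongside the project root:\n```bash\nvirtualenv --no-site-packages venv\n```\nThe first run installs the virtual environment files automatically. Once that finishes, activate the environment.  \nOn Windows:\n```bash\nvenv\\Scripts\\activate\n```\nOn Linux:\n```bash\nsource venv/bin/activate\n```\n\nThen cd back to the project root. All project dependencies are listed in requirements.txt.  \n\n#### Windows\n\nOn my Win 10 machine, neither **`numpy 1.11.1+mkl`** nor **`scipy 
0.18.1`** could be installed through pip,  \nso I went to the [University of California, Irvine mirror](http://www.lfd.uci.edu/~gohlke/pythonlibs/) and downloaded the Windows .whl files for \n[numpy](http://www.lfd.uci.edu/~gohlke/pythonlibs/dp2ng7en/numpy-1.11.2rc1+mkl-cp35-cp35m-win_amd64.whl) \n[scipy](http://www.lfd.uci.edu/~gohlke/pythonlibs/dp2ng7en/scipy-0.18.1-cp35-cp35m-win_amd64.whl) \nand installed them with pip first (adjust the file names to match the wheels you actually downloaded):\n```bash\npip install numpy-1.11.1+mkl-cp35-cp35m-win_amd64.whl\npip install scipy-0.18.1-cp35-cp35m-win_amd64.whl\n```\nThen install the remaining dependencies from requirements.txt with pip:\n```bash\npip install -r requirements.txt\n```\nThe project environment is ready only after everything installs successfully.  \n\n#### Ubuntu 14.04 trusty\n\nThe project's Travis-CI builds run on ubuntu 14.04 trusty,  \nso on ubuntu you can install the dependencies the same way `.travis.yml` does, by running `travis_env_init.sh`.  \nInside the virtualenv created above, from the project root, run:  \n```bash\nsudo bash travis_env_init.sh\n```\nor\n```bash\nsudo ./travis_env_init.sh\n```\nAfter this pre-installation step, run `pip install -r requirements.txt` to install the remaining dependencies. \nThe install script was written for the Travis environment and is not meant to be portable,  \nso if other packages fail to install, please troubleshoot them manually.  \n\n\n### Configuration\n\nThis project uses the [Baidu Speech Recognition API](http://yuyin.baidu.com/docs/asr/57),  \nso first create an application on the [Baidu Speech Open Platform](http://yuyin.baidu.com/) and apply for an API key and a Secret key.  \nThe article [玩转百度语音识别，就是这么简单](http://www.cnblogs.com/bigdataZJ/p/SpeechRecognition.html) walks through the application process.  \n`BaiduOAuthSample.ini` in the `./BaiduSpeech` directory is a sample configuration file. First copy `BaiduOAuthSample.ini` to `BaiduOAuth.ini`,  \nthen fill in your own `api_key` and `secret_key` at the positions shown in the sample. Leave one space on each side of the equals sign between key and value, and do not quote the values.  \nIf you already have a token, you can fill in the `access_token` entry as well.\n\n\n### Usage\n\n`SpeechRecognise.py` in the project root is the entry point for speech recognition:\n```bash\npython3.5 SpeechRecognise.py\n```\nAfter it starts, speak into the microphone and the recognition result is printed to the console. (How close you need to be to the microphone depends on its sensitivity.)\n\n\n### Project Structure\n\n`SpeechRecognise.py` in the project root is the entry point for speech recognition.  \nThe `WaveOperate` package wraps common sound-card operations, such as  \nrecording, playback, saving files, reading files, plotting waveforms, plotting spectra, and filtering audio.  \nThe `BaiduSpeech` package wraps calls to the Baidu API; the `BaiduOAuth.ini` file in it holds the Baidu API key configuration.  \nThe `UnitTest` package contains the unit tests for each module.\nRunning `python3.5 -m nose -vs --with-coverage` from the project root runs the unit tests and prints the results and coverage.\n\n\n### Approach\n\nThe speech recognition pipeline this project aims for is:  \n\n1. Record with a microphone array to produce an audio stream  \n2. 
Speech enhancement  \n    2.1 Real-time band-pass filtering of the audio stream to remove low and high frequencies  \n    2.2 VAD (voice activity detection) endpoint detection using the zero-crossing rate (ZCR) and the short-time energy Ep  \n    2.3 Segmenting out the portions of audio judged to contain speech  \n3. Dereverberation (echo removal)  \n    3.1 Dual-microphone technique  \n    3.2 NLMS adaptive filtering  \n4. Background noise removal  \n    4.1 Dual-microphone background cancellation  \n5. Beamforming  \n    5.1 Sound source separation   \n6. Speech extraction  \n    6.1 Voiceprint recognition  \n7. Speech recognition  \n    7.1 Baidu Speech API  \n    7.2 Google Speech API  \n    7.3 iFlytek Speech API  \n8. Semantic analysis  \n    8.1 Hanlp syntactic, lexical, and dependency analysis  \n    8.2 Boson syntactic, lexical, and dependency analysis  \n9. Parameterized commands  \n    9.1 Cortana XML command parsing  \n\n**Of course, not all of the above has been implemented yet...**\n\nImplemented so far:  \n\n- [x] Recording with pyaudio  \n- [x] Real-time band-pass filtering of the audio stream  \n- [x] Short-time energy Ep detection  \n- [x] Segmenting out the portions of audio judged to contain speech  \n- [x] Speech recognition with the Baidu Speech API  \n- [x] Hanlp dependency analysis  \n- [x] Boson dependency analysis  \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzthxxx%2Fpython-speech_recognition","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzthxxx%2Fpython-speech_recognition","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzthxxx%2Fpython-speech_recognition/lists"}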