{"id":29688794,"url":"https://github.com/modelscope/kws-training-suite","last_synced_at":"2025-07-23T05:38:46.821Z","repository":{"id":95634842,"uuid":"583200017","full_name":"modelscope/kws-training-suite","owner":"modelscope","description":null,"archived":false,"fork":false,"pushed_at":"2023-05-26T06:25:17.000Z","size":53577,"stargazers_count":129,"open_issues_count":9,"forks_count":25,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-07-21T15:10:06.297Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/modelscope.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-12-29T04:04:16.000Z","updated_at":"2025-07-17T06:43:56.000Z","dependencies_parsed_at":"2025-07-21T15:20:19.779Z","dependency_job_id":null,"html_url":"https://github.com/modelscope/kws-training-suite","commit_stats":null,"previous_names":["modelscope/kws-training-suite"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/modelscope/kws-training-suite","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2Fkws-training-suite","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2Fkws-training-suite/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2Fkws-training-suite/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2Fkws-training-suite/manifests","owner_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/owners/modelscope","download_url":"https://codeload.github.com/modelscope/kws-training-suite/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2Fkws-training-suite/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266625389,"owners_count":23958306,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-23T02:00:09.312Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-23T05:38:46.120Z","updated_at":"2025-07-23T05:38:46.792Z","avatar_url":"https://github.com/modelscope.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Overview\n### Software architecture\n![](https://intranetproxy.alipay.com/skylark/lark/0/2022/jpeg/2639/1670305061456-81c162b4-c564-4744-b45c-160c4087230c.jpeg)\n### Inputs and outputs\n![](https://intranetproxy.alipay.com/skylark/lark/0/2022/jpeg/2639/1670304405390-09fd554c-d272-4611-9174-db8244bf15a4.jpeg)\n\n### Training workflow\n![](https://intranetproxy.alipay.com/skylark/lark/0/2022/jpeg/2639/1669865969613-8e2651a3-2286-40fa-bd18-4304b386ab21.jpeg)\n\n## Data preparation\n### Data categories\nTraining requires roughly the following categories of data. Unless noted otherwise, every file must be a mono, PCM-encoded .wav file sampled at 16000 Hz.\n\n- Annotated wake-word audio\n- Negative-sample audio\n- Noise audio (single-channel / multi-channel)\n\nNegative-sample audio and single-channel noise audio can be fetched with the suite's built-in downloader for open-source data. It provides three datasets: AISHELL2, DNS-Challenge, and musan. The data totals about 170 GB, or about 130 GB when packed as zip archives.\nWe recommend recording some audio in your actual deployment scenario. A model trained on open-source data alone is unlikely to reach the desired performance in real use.\n\n#### 
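Wake-word audio\nWake-word audio files are usually crowdsourced recordings of the wake word, spoken clearly against a quiet background.\nData volume: at least 100 speakers * 100 utterances = 10000 recordings are required, each saved as a separate file.\nMore data is better, and for the same total amount of data, more speakers is better.\n##### Data labeling\nA tool automatically annotates the wake-word information in each recording so that the model can learn from it.\nIn the figure below, the upper half shows the wake-word audio and the lower half shows the corresponding annotation.\n![image.png](https://intranetproxy.alipay.com/skylark/lark/0/2022/png/2639/1670412114735-aa18ea8f-72b9-43d0-bb17-5d8363ca3f16.png#clientId=ua84797a3-1e78-4\u0026crop=0\u0026crop=0\u0026crop=1\u0026crop=1\u0026from=paste\u0026height=244\u0026id=uf6abdb13\u0026margin=%5Bobject%20Object%5D\u0026name=image.png\u0026originHeight=488\u0026originWidth=621\u0026originalType=binary\u0026ratio=1\u0026rotation=0\u0026showTitle=false\u0026size=37706\u0026status=done\u0026style=none\u0026taskId=ua14540ae-9e73-4dd3-8081-aad584bbe85\u0026title=\u0026width=310.5)\nEnter the kws-training-scripts directory and run force_align.py. Add the -h option to see the help text. In the example below:\n\n- -t sets the number of concurrent worker threads\n- /data/wav is the directory holding the wake-word audio; store different wake words in different directories, because only one wake word can be processed per run\n- 天猫精灵 is the wake word; only one can be specified per run\n```shell\npython force_align.py -t 10 /data/wav 天猫精灵\n```\n\n##### Using real online data\nIf your wake word is already served online and you have accumulated real wake-up audio from production, adding that data to training helps a lot. However, production wake-up conditions are uncontrolled and the audio may contain a lot of noise and interference, so it must be filtered before use.\nThe recommended approach is to first train a preliminary wake-word model on a small amount of wake-word audio, then use that model to filter and label the online audio.\n\n#### Negative-sample audio\nClear human speech that does not contain the wake word. It can be obtained from [open-source corpora](http://www.openslr.org/).\n\n#### Noise audio\nSingle-channel noise audio can include all kinds of noise, such as music, robot vacuums, and vacuum cleaners. It can also be music, stories, or TV programs played back by the device itself.\nMulti-channel noise audio should be multi-channel audio recorded on the real device.\nWhen preparing noise audio, cover as many scenarios as possible. Each scenario should have more than 8 hours of audio, split into 1-minute segments.\nFor good results, prepare more than 100 hours of ordinary noise data such as TV programs and music, plus more than 20 hours for special scenarios such as robot-vacuum noise and wind noise.\n\n### Preparing audio file lists\nThe training program reads data through audio file lists. Once the audio files of each category above are ready locally, generate a separate file list for each category.\nEach list is a text file (.txt) in which every line is the absolute local path of one wav file.\n\n## Environment setup\n### Requirements\n#### Hardware:\n\n- 64 CPUs and 48 GB of RAM (recommended values; a higher configuration trains faster)\n- 1 GPU (Tesla P4 or better) with 6 GB of GPU memory\n- 400 GB of storage\n#### Software:\n\n- CUDA \u003e= 11.0\n- Java SDK \u003e= 8\n- Python \u003e= 3.7\n- PyTorch \u003e= 1.11\n- ModelScope \u003e= 1.1\n\nThe configuration above supports 60 concurrent workers, and the whole training workflow takes about 3 days.\n\nReference figures:\n\n- With 60 concurrent workers, the first training round of 500 epochs takes 35 hours; the second round of 200 epochs is expected to take 10 hours\n- Testing speed depends on the amount of test data; testing 50 models takes about 10 hours\n- Over 60 hours expected in total\n\n#### Network\nWhile running, the program downloads open-source data from network storage and connects to the ModelScope website to update models and other data, so it needs access to the public internet.\nThe following domains must be reachable:\n\n```\n*.aliyuncs.com\n*.modelscope.cn\n```\n\n### Option A. 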
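Use the Docker image (recommended)\nUse the Docker image provided by ModelScope, which comes with the Python environment and the ModelScope framework needed for model training preinstalled.\n\n```\n# CPU version:\nregistry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-py37-torch1.11.0-tf1.15.5-1.1.0\n# GPU version:\nregistry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-cuda11.3.0-py37-torch1.11.0-tf1.15.5-1.1.0\n```\n\n#### Installing other dependencies\nInstall the dependencies missing from the official Docker image:\n```shell\napt-get update\napt-get install unzip\napt-get install openjdk-11-jdk\n```\nThen go straight to “Verifying inference with the built-in wake-word model”.\n\n### Option B. Manually install ModelScope and its dependencies\n#### Python environment\nWe recommend setting up the Python environment with Anaconda. See its [official documentation](https://docs.anaconda.com/anaconda/install) for installation steps.\nRun the following Anaconda commands to create a Python environment for ModelScope (Python \u003e= 3.7 is required):\n\n```shell\nconda create -n modelscope python=3.7\nconda activate modelscope\n```\nCheck that the python and pip commands now resolve to the conda environment:\n\n```shell\nwhich python\n# ~/anaconda3/envs/modelscope/bin/python\nwhich pip\n# ~/anaconda3/envs/modelscope/bin/pip\n```\n\nThe pip version that ships with an Anaconda environment is fairly old, so upgrade it first:\n\n```shell\npython -m pip install --upgrade pip\n```\n\n#### Installing PyTorch\nThis model has been tested with PyTorch 1.8 to 1.11. Run the following command to install PyTorch v1.11 specifically. If downloads are slow, you can point pip at a PyPI mirror in China, such as the Alibaba Cloud or Tsinghua mirror.\n```\npip install torch==1.11 torchaudio torchvision\n```\n\n#### Installing libsndfile1\nThe model's pipeline uses the third-party library SoundFile to process wav files. **On Linux, you must manually install libsndfile, the underlying library that SoundFile depends on.** On Windows and macOS it is installed automatically and needs no action. For details, see the [SoundFile website](https://github.com/bastibe/python-soundfile#installation). On Ubuntu, for example, run:\n\n```shell\nsudo apt-get update\nsudo apt-get install libsndfile1\n```\n\n#### Installing ModelScope and the speech-model dependencies\n\n```\npip install \"modelscope[audio]\" -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html\n```\n\n### Verifying inference with the built-in wake-word model\nIf the steps above succeeded and the environment is installed correctly, the following Python code should run and print 5 wake-word detections.\n**Note:** the code below downloads model data from the ModelScope website at run time, so make sure the network is working.\n\n```python\nfrom modelscope.pipelines import pipeline\nfrom modelscope.utils.constant import Tasks\n\n\nkws = pipeline(\n    Tasks.keyword_spotting,\n    model='damo/speech_dfsmn_kws_char_farfield_16k_nihaomiya')\n# you can also use local file path\nresult = 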
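kws('https://modelscope.oss-cn-beijing.aliyuncs.com/test/audios/3ch_nihaomiya.wav')\nprint(result)\n```\n\n### Preparing the training suite\nOn top of ModelScope's model training capabilities, this training suite wraps data labeling, configuration, model conversion, testing, and training-workflow orchestration, so that you can complete training with a single command.\n\n#### Training a mini model\nThe training suite provides the **try_me.py** script, which automatically downloads a data package of less than 200 MB and generates a matching configuration file. Training started with this configuration file finishes within 1 hour and yields a model that can be woken up in quiet conditions. The wake word is “你好米雅” (ni hao mi ya).\nRun the commands below, replacing `/your/test_dir` with a real path of your own. All downloaded and generated data is saved there.\n\n```shell\n# Enter the training suite directory\ncd kws-training-scripts\n# Run the script to prepare the data and the configuration file\n# The threads argument sets the number of concurrent threads used during training; set it according to the number of CPUs actually available\npython try_me.py threads /your/test_dir\n# Run the training suite\npython pipeline.py -1 /your/test_dir/config.yml\n```\n\n##### Output after a successful run:\nThe many lines at the top report each model's wake-up rate (51/57) and false wake-up rate (5/0) in testing. A full training run tests dozens of models to pick the best one and prints many similar lines, which you can usually ignore.\nThe red box marks the information for the best model that was finally selected:\n\n- .pth is the model saved in PyTorch format; training can be resumed from it\n- .txt is the model parameter file used by the wake-up tool, for later testing\n- The line `model kw frr and level` indicates:\n   - the recommended wake-up threshold for this model is `0.86`\n   - at that threshold, the overall false rejection rate of the wake word '0_ni_hao_mi_ya' across all test scenarios is `0.14035087719298245`, which corresponds to a wake-up rate of `1 - 0.14035087719298245 ≈ 0.86`\n\n![image.png](https://intranetproxy.alipay.com/skylark/lark/0/2022/png/2639/1670226175459-10893a4c-6f62-4815-930e-25784a63c84e.png#clientId=uafcf068e-9738-4\u0026crop=0\u0026crop=0\u0026crop=1\u0026crop=1\u0026from=paste\u0026height=794\u0026id=u993996b4\u0026margin=%5Bobject%20Object%5D\u0026name=image.png\u0026originHeight=1588\u0026originWidth=2146\u0026originalType=binary\u0026ratio=1\u0026rotation=0\u0026showTitle=false\u0026size=372161\u0026status=done\u0026style=none\u0026taskId=u50c56780-d0b0-4909-a9c2-ddf530241ad\u0026title=\u0026width=1073)\n\n##### Testing the mini model\nFirst make a copy of the wake-up tool configuration file:\n\n```\n# The following commands are still run from the training suite directory\n# Copy the wake-up tool configuration\ncp /your/test_dir/tmp.conf .\n```\nManually edit the wake-up model path in the configuration file so that it points to the model parameter file (.txt) generated above, for example:\n\n```\n# Wake-up model path.\nkws_model_base = /your/test_dir/first_txt/top_01_checkpoint_0399_loss_train_0.1136_loss_val_0.1098.txt\n```\nRun the wake-up tool. Its arguments are, in order, the configuration file, the test audio, and the processed output audio.\nFields in the output:\n\n* In detected x, x is the wake-up id\n* kw is the wake word\n* spot, bestend, and duration are all wake-up timing information\n* confidence is the confidence score\n* bestch is the channel selection information\n\n```\n./bin/SoundConnect ./tmp.conf test.wav ./output.wav\n# Output follows\n[detected 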
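0], kw: 0_xiao_ai_tong_xue, spot: 13.219999, bestend: 13.219999, duration: [12.139999-12.940000], confidence: 0.926316, bestch: 0\n[detected 1], kw: 0_xiao_ai_tong_xue, spot: 31.699999, bestend: 31.699999, duration: [30.660000-31.420000], confidence: 0.914814, bestch: 0\n[detected 2], kw: 0_xiao_ai_tong_xue, spot: 40.899998, bestend: 40.899998, duration: [39.899998-40.619999], confidence: 0.853534, bestch: 0\n```\n\n## Configuration and running\n### Configuration\nFor the configuration options and their descriptions, see the [KWS Training Suite Configuration Guide](https://github.com/alibaba-damo-academy/kws-training-suite/blob/main/HOW_TO_CONFIG.md).\n### Running\nEnter the kws-training-scripts directory and run the following commands:\n\n```shell\n# Set this environment variable to the id of the GPU you want to use; ids start at 0\nexport CUDA_VISIBLE_DEVICES=gpu_id\n# config.yml is the training configuration file\n# --remote_dataset requests downloading the third-party open-source datasets\n# /data/open_dataset is the user-chosen directory for storing the datasets; it needs at least 300 GB of disk space\n# Downloads are resumable and already-downloaded data is detected, so nothing is downloaded twice\npython pipeline.py config.yml --remote_dataset /data/open_dataset\n```\n\n### Steps and artifacts\n\n- Check the data and generate the final training configuration\n- Training stage: read the raw data on the fly, generate training data, and train the model. The model checkpoint from each epoch is saved as a .pth file under $work_dir/first. Training runs for 500 epochs by default\n- After training, pick the batch of checkpoints with the lowest loss (about 20% of all checkpoints), convert them to inference-format .txt files, and save them under $work_dir/first_txt\n- Test every model on the test set for wake-up rate and false wake-up rate in each scenario. Summary results go to $work_dir/first_roc and detailed results to $work_dir/first_roc_eval\n- Rank the models by combining the wake-up rate and false wake-up rate results, and store the ranking under $work_dir/first_roc_sort\n- Report the top-ranked model\n\nThe first and second training rounds produce the same artifacts. Paths for the first round are prefixed with first, and those for the second round with second.\n\n## Other tools\n### KWS model labeling tool\nThe kws_align.py script included in the suite uses a wake-word model to filter and label online audio. The processed data can then be used to train the production model.\n\nThe script is invoked as follows:\n\n```\nkws_align.py [-h] -m MODEL_TXT [-o OUT_DIR] [-t THREADS] input keyword_desc\nwhere:\ninput\t\t\tdirectory containing the online audio\nkeyword_desc\twake-word descriptor; for the “小爱同学” (xiao ai tong xue) model it is 0_xiao_ai_tong_xue,1,2,3,4\n-m MODEL_TXT\ttxt model parameter file extracted from the zip package in the attachment\n-o OUT_DIR\t\toutput directory for the generated audio data\n-t THREADS\t\tnumber of concurrent processing threads\n```\nFor example:\n`python kws_align.py /your/audio/data/ 0_xiao_ai_tong_xue,1,2,3,4 -m top_28_checkpoint_0089.txt`\n\n### Testing a model manually\nTo test model performance partway through training, follow the steps below.\n#### Preparation\nCreate a new test_dir, copy the training config.yml into it, and change the work dir setting to test_dir. You can also adjust the workers setting to match your hardware: the more concurrent workers, the faster the test.\n\n#### Converting the model\n\n```\n# train_dir is the training directory\npython print_model.py /train_dir/first/checkpoint_xx.pth \u003e 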
/test_dir/txt/checkpoint_xxx.txt\n```\nTo test several models at once, repeat the command above for each of them.\n\n#### Testing\nThe script tests every model under /test_dir/txt/ one by one. Intermediate results are saved under /test_dir/roc_eval/ and final results under /test_dir/roc.\n```\nPYTHONPATH=. python evaluate/batch_roc.py /test_dir/config.yml /test_dir/txt/\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmodelscope%2Fkws-training-suite","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmodelscope%2Fkws-training-suite","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmodelscope%2Fkws-training-suite/lists"}