{"id":17258891,"url":"https://github.com/blueloveth/speech_commands_recognition","last_synced_at":"2025-04-14T06:11:51.357Z","repository":{"id":41455450,"uuid":"314294921","full_name":"blueloveTH/speech_commands_recognition","owner":"blueloveTH","description":"CCF practice competition: general audio classification","archived":false,"fork":false,"pushed_at":"2021-05-31T03:16:18.000Z","size":523,"stargazers_count":21,"open_issues_count":1,"forks_count":4,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-06T03:57:14.383Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/blueloveTH.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-11-19T15:49:30.000Z","updated_at":"2024-08-12T13:44:00.000Z","dependencies_parsed_at":"2022-09-24T13:45:01.758Z","dependency_job_id":null,"html_url":"https://github.com/blueloveTH/speech_commands_recognition","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blueloveTH%2Fspeech_commands_recognition","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blueloveTH%2Fspeech_commands_recognition/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blueloveTH%2Fspeech_commands_recognition/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blueloveTH%2Fspeech_commands_recognition/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/blueloveTH","download_url":"https://codeload.github.com/blueloveTH/speech_comma
nds_recognition/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248830395,"owners_count":21168272,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-15T07:22:29.596Z","updated_at":"2025-04-14T06:11:51.336Z","avatar_url":"https://github.com/blueloveTH.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Speech Commands Recognition\n\n## Overview\n\nhttps://zhuanlan.zhihu.com/p/331833198\n\n## Results\n\n| Local CV Score | Test Score    |\n| -------------- | ------------- |\n| 0.977 ± 0.001  | 0.975 ± 0.001 |\n\n\n\nThis solution is built on PyTorch and [keras4torch](https://github.com/blueloveTH/keras4torch). To make it easier to port to other frameworks for testing, the main training settings are listed below.\n\n## Main Settings\n\n| setting           | value                        |\n| ----------------- | ---------------------------- |\n| features          | 1x32x32 melspectrogram       |\n| model             | wide resnet28                |\n| total parameters  | 36491726                     |\n| epochs            | 40                           |\n| batch size        | 96                           |\n| optimizer         | SGD with momentum            |\n| learning rate     | 1e-2 -\u003e 3e-3 -\u003e 9e-4 -\u003e 8e-5 |\n| L2 regularization | 1e-2                         |\n| label smoothing   | 0.1                          |\n| epoch time        | 82s (1 * RTX 2080Ti)         |\n\n\n\n## Model Architecture\n\n![](model_architecture.jpg)\n\n## Running the Code\n\n#### 
Environment Setup\n\n```txt\ntorch\u003e=1.6.0\nkeras4torch==1.1.3\nscikit-learn==0.23.2\n\nlibrosa==0.8.0\n```\n\nOn Linux, run the following command first; librosa cannot be installed without it.\n\n```bash\n! sudo apt-get install -y libsndfile1\n```\n\n\n\n#### Data Preprocessing\n\nMake sure the raw data is placed in the data/ folder, then run preprocess.ipynb.\n\nThe files are organized as follows:\n\n- data/\n  - train/\n  - test/\n- preprocess.ipynb\n- train.ipynb\n\n\n\n#### Training and Prediction\n\nOnce the previous step is complete, run train.ipynb.\n\nWhen it finishes, the predictions for the test set (probabilities) are saved as a .npy file.\n\n\n\n## Feedback\n\n+ [GitHub Issues](https://github.com/blueloveTH/speech_commands_recognition/issues)\n+ Email: blueloveTH@foxmail.com","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblueloveth%2Fspeech_commands_recognition","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fblueloveth%2Fspeech_commands_recognition","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblueloveth%2Fspeech_commands_recognition/lists"}