# tugstugi/pytorch-speech-commands

Speech commands recognition with PyTorch | Kaggle 10th place solution in TensorFlow Speech Recognition Challenge
- Host: GitHub
- URL: https://github.com/tugstugi/pytorch-speech-commands
- Owner: tugstugi
- Created: 2018-01-20T14:07:04.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2024-01-19T11:31:41.000Z (over 1 year ago)
- Last Synced: 2025-03-24T10:56:25.569Z (about 1 month ago)
- Topics: cifar10, classification, deep-learning, densenet, dual-path-networks, kaggle, neural-network, pytorch, resnet, resnext, speech-recognition, wide-residual-networks
- Language: Python
- Size: 62.5 KB
- Stars: 198
- Watchers: 3
- Forks: 47
- Open Issues: 3
Metadata Files:
- Readme: README.md
# README
Convolutional neural networks for the [Google speech commands data set](https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html)
with [PyTorch](http://pytorch.org/).
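Before looking at the training code, it can help to inspect the data itself. As an illustration only (the repository ships its own download and preprocessing scripts), torchaudio's built-in loader can fetch the data set:

```python
# Illustration: torchaudio's built-in loader for the Google speech commands
# data set; the repository itself uses its own download scripts.
import torchaudio

# v0.01 is the version that was current during the Kaggle competition.
dataset = torchaudio.datasets.SPEECHCOMMANDS(
    root=".", url="speech_commands_v0.01", download=True)

# Each item is (waveform, sample_rate, label, speaker_id, utterance_number);
# clips are mono, 16 kHz, and at most one second long.
waveform, sample_rate, label, speaker_id, utterance_number = dataset[0]
print(waveform.shape, sample_rate, label)
```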
# General

We, [xuyuan](https://github.com/xuyuan) and [tugstugi](https://github.com/tugstugi), participated
in the Kaggle competition [TensorFlow Speech Recognition Challenge](https://www.kaggle.com/c/tensorflow-speech-recognition-challenge)
and reached 10th place. This repository contains a simplified and cleaned-up version of our team's code.

# Features
* `1x32x32` mel-spectrogram as network input (see the sketch after this list)
* a single network implementation for both the CIFAR10 and Google speech commands data sets
* faster audio data augmentation performed directly on the STFT
* Kaggle private LB scores evaluated on 150,000+ audio files
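To make the `1x32x32` input shape concrete, here is a minimal sketch using torchaudio. The `n_fft`, `hop_length`, and `n_mels` values are assumptions chosen so that a one-second 16 kHz clip maps to a 32x32 image; they are not necessarily the parameters this repository uses:

```python
# Minimal sketch: turn a one-second 16 kHz clip into a 1x32x32 log-mel image.
# The STFT/mel parameters below are illustrative assumptions.
import torch
import torchaudio

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000,
    n_fft=1024,
    hop_length=512,   # 16000 // 512 + 1 = 32 time frames (with center padding)
    n_mels=32,        # 32 mel bands -> 32 frequency rows
)
to_db = torchaudio.transforms.AmplitudeToDB()

waveform = torch.randn(1, 16000)     # stand-in for a one-second mono clip
spec = to_db(mel(waveform))          # shape: (1, 32, 32) = channel x mels x frames
print(spec.shape)                    # torch.Size([1, 32, 32])
```

Working in the spectrogram domain also explains the augmentation bullet: shifts and stretches applied to the STFT are cheaper than modifying the raw audio and recomputing the transform for every example.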
# Results

Due to the time limit of the competition, we trained most of the networks with `sgd` using `ReduceLROnPlateau` for 70 epochs.
For the training parameters and dependencies, see [TRAINING.md](TRAINING.md). Stopping the training earlier sometimes
produces a better Kaggle score.
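A sketch of that setup is below. `train_epoch` and `validate` are hypothetical placeholders, and the hyperparameters are illustrative; the real values are in TRAINING.md:

```python
# Sketch of SGD + ReduceLROnPlateau training; hyperparameters are
# illustrative, and train_epoch/validate are hypothetical placeholders.
import torch

model = torch.nn.Linear(32 * 32, 12)     # stand-in for a real network

optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.1, patience=5)

best_accuracy = 0.0
for epoch in range(70):
    train_epoch(model, optimizer)        # hypothetical: one pass over the data
    accuracy = validate(model)           # hypothetical: validation accuracy
    scheduler.step(accuracy)             # reduce the LR when accuracy plateaus
    if accuracy > best_accuracy:         # keep the best checkpoint, since an
        best_accuracy = accuracy         # earlier epoch can beat the final one
        torch.save(model.state_dict(), "best.pth")
```

Saving the best checkpoint rather than the last one is what makes the "stop earlier" observation actionable.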
| Model | CIFAR10 test set accuracy | Speech Commands test set accuracy | Speech Commands test set accuracy with crop | Speech Commands Kaggle private LB score | Speech Commands Kaggle private LB score with crop | Remarks |
|---|---|---|---|---|---|---|
| VGG19 BN | 93.56% | 97.337235% | 97.527432% | 0.87454 | 0.88030 | |
| ResNet32 | - | 96.181419% | 96.196050% | 0.87078 | 0.87419 | |
| WRN-28-10 | - | 97.937089% | 97.922458% | 0.88546 | 0.88699 | |
| WRN-28-10-dropout | 96.22% | 97.702999% | 97.717630% | 0.89580 | 0.89568 | |
| WRN-52-10 | - | 98.039503% | 97.980980% | 0.88159 | 0.88323 | another trained model reached 97.52% / 0.89322 |
| ResNext29 8x64 | - | 97.190929% | 97.161668% | 0.89533 | 0.89733 | our best model during the competition |
| DPN92 | - | 97.190929% | 97.249451% | 0.89075 | 0.89286 | |
| DenseNet-BC (L=100, k=12) | 95.52% | 97.161668% | 97.147037% | 0.88946 | 0.89134 | |
| DenseNet-BC (L=190, k=40) | - | 97.117776% | 97.147037% | 0.89369 | 0.89521 | |
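The "with crop" columns report test-time augmentation. The following is a hedged sketch of what such an evaluation can look like, averaging class probabilities over several time-shifted crops; this is an assumption for illustration, and the repository's exact procedure may differ:

```python
# Hedged sketch of "with crop" scoring: average class probabilities over
# several time-shifted crops of a (padded) spectrogram. Assumption only.
import torch

def predict_with_crops(model, spec, crop_width=32, n_crops=5):
    """spec: (1, n_mels, T) log-mel spectrogram with T >= crop_width."""
    T = spec.size(-1)
    offsets = torch.linspace(0, T - crop_width, n_crops).long().tolist()
    crops = torch.stack([spec[..., o:o + crop_width] for o in offsets])
    with torch.no_grad():
        probs = torch.softmax(model(crops), dim=1)
    return probs.mean(dim=0)   # averaged class probabilities over all crops
```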
# Results with Mixup

After the competition, some of the networks were retrained using [mixup: Beyond Empirical Risk Minimization](https://arxiv.org/abs/1710.09412) by Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin and David Lopez-Paz.
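A minimal sketch of the technique, following the paper rather than this repository's exact code: each batch is blended with a shuffled copy of itself, and the loss is the matching convex combination of the two cross-entropies:

```python
# Minimal mixup sketch following Zhang et al. (2018); not necessarily the
# exact implementation used in this repository.
import numpy as np
import torch
import torch.nn.functional as F

def mixup_batch(x, y, alpha=1.0):
    """Blend a batch with a shuffled copy of itself; lam ~ Beta(alpha, alpha)."""
    lam = float(np.random.beta(alpha, alpha))
    index = torch.randperm(x.size(0))
    return lam * x + (1.0 - lam) * x[index], y, y[index], lam

def mixup_loss(logits, y_a, y_b, lam):
    """Convex combination of the losses against both sets of labels."""
    return (lam * F.cross_entropy(logits, y_a)
            + (1.0 - lam) * F.cross_entropy(logits, y_b))
```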
| Model | CIFAR10 test set accuracy | Speech Commands test set accuracy | Speech Commands test set accuracy with crop | Speech Commands Kaggle private LB score | Speech Commands Kaggle private LB score with crop | Remarks |
|---|---|---|---|---|---|---|
| VGG19 BN | - | 97.483541% | 97.542063% | 0.89521 | 0.89839 | |
| WRN-52-10 | - | 97.454279% | 97.498171% | 0.90273 | 0.90355 | same score as the 16th place in Kaggle |