Speech commands recognition with PyTorch | Kaggle 10th place solution in TensorFlow Speech Recognition Challenge

https://github.com/tugstugi/pytorch-speech-commands

Convolutional neural networks for [Google speech commands data set](https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html)
with [PyTorch](http://pytorch.org/).

# General
We, [xuyuan](https://github.com/xuyuan) and [tugstugi](https://github.com/tugstugi), participated
in the Kaggle competition [TensorFlow Speech Recognition Challenge](https://www.kaggle.com/c/tensorflow-speech-recognition-challenge)
and reached 10th place. This repository contains a simplified and cleaned-up version of our team's code.

# Features
* `1x32x32` mel-spectrogram as network input (see the sketch after this list)
* a single network implementation for both the CIFAR10 and Google speech commands data sets
* faster audio data augmentation on the STFT
* Kaggle private LB scores evaluated on 150,000+ audio files
# Results
Due to the time limit of the competition, we trained most of the networks with `sgd` using `ReduceLROnPlateau` for 70 epochs.
For the training parameters and dependencies, see [TRAINING.md](TRAINING.md). Stopping the training earlier sometimes produces a better score on Kaggle.
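
The following is only a rough sketch of that setup (`sgd` plus `ReduceLROnPlateau`), with a placeholder model and dummy data so it runs standalone; the actual networks, loaders, and hyperparameters are in the repository and [TRAINING.md](TRAINING.md):

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Placeholder model and dummy batches; the real networks (VGG, ResNet, WRN,
# DenseNet, ...) and data loaders live in the repository.
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 12))
train_loader = [(torch.randn(8, 1, 32, 32), torch.randint(0, 12, (8,)))]
valid_loader = [(torch.randn(8, 1, 32, 32), torch.randint(0, 12, (8,)))]

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-2)
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=5)

for epoch in range(70):
    model.train()
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()

    # the validation loss drives the learning-rate schedule
    model.eval()
    with torch.no_grad():
        val_loss = sum(criterion(model(x), y).item() for x, y in valid_loader) / len(valid_loader)
    scheduler.step(val_loss)
```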

| Model | CIFAR10 test set accuracy | Speech Commands test set accuracy | Speech Commands test set accuracy with crop | Speech Commands Kaggle private LB score | Speech Commands Kaggle private LB score with crop | Remarks |
| --- | --- | --- | --- | --- | --- | --- |
| VGG19 BN | 93.56% | 97.337235% | 97.527432% | 0.87454 | 0.88030 | |
| ResNet32 | - | 96.181419% | 96.196050% | 0.87078 | 0.87419 | |
| WRN-28-10 | - | 97.937089% | 97.922458% | 0.88546 | 0.88699 | |
| WRN-28-10-dropout | 96.22% | 97.702999% | 97.717630% | 0.89580 | 0.89568 | |
| WRN-52-10 | - | 98.039503% | 97.980980% | 0.88159 | 0.88323 | another trained model has 97.52%/0.89322 |
| ResNext29 8x64 | - | 97.190929% | 97.161668% | 0.89533 | 0.89733 | our best model during the competition |
| DPN92 | - | 97.190929% | 97.249451% | 0.89075 | 0.89286 | |
| DenseNet-BC (L=100, k=12) | 95.52% | 97.161668% | 97.147037% | 0.88946 | 0.89134 | |
| DenseNet-BC (L=190, k=40) | - | 97.117776% | 97.147037% | 0.89369 | 0.89521 | |

# Results with Mixup

After the competition, some of the networks were retrained using [mixup: Beyond Empirical Risk Minimization](https://arxiv.org/abs/1710.09412) by Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin and David Lopez-Paz.
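
For reference, a minimal mixup sketch in PyTorch; the helper names and the `alpha` value are illustrative assumptions, not the exact implementation used to produce the scores below:

```python
import numpy as np
import torch
import torch.nn.functional as F

def mixup_batch(inputs, targets, alpha=1.0):
    """Mix a batch with a shuffled copy of itself; return both target sets and the weight."""
    lam = np.random.beta(alpha, alpha)
    index = torch.randperm(inputs.size(0))
    mixed = lam * inputs + (1.0 - lam) * inputs[index]
    return mixed, targets, targets[index], lam

def mixup_loss(outputs, targets_a, targets_b, lam):
    """Cross-entropy weighted between the two sets of targets."""
    return lam * F.cross_entropy(outputs, targets_a) + (1.0 - lam) * F.cross_entropy(outputs, targets_b)

# usage inside a training step (model and optimizer as in the earlier sketch):
# mixed, y_a, y_b, lam = mixup_batch(inputs, targets, alpha=1.0)
# loss = mixup_loss(model(mixed), y_a, y_b, lam)
```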

| Model | CIFAR10 test set accuracy | Speech Commands test set accuracy | Speech Commands test set accuracy with crop | Speech Commands Kaggle private LB score | Speech Commands Kaggle private LB score with crop | Remarks |
| --- | --- | --- | --- | --- | --- | --- |
| VGG19 BN | - | 97.483541% | 97.542063% | 0.89521 | 0.89839 | |
| WRN-52-10 | - | 97.454279% | 97.498171% | 0.90273 | 0.90355 | same score as the 16th place in Kaggle |