https://github.com/pannous/caffe-speech-recognition
Speech Recognition with the Caffe deep learning framework, migrating to
Last synced: 18 days ago
- Host: GitHub
- URL: https://github.com/pannous/caffe-speech-recognition
- Owner: pannous
- Created: 2014-12-15T17:03:12.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2018-11-20T12:44:05.000Z (over 5 years ago)
- Last Synced: 2024-05-02T01:29:06.870Z (about 2 months ago)
- Language: Jupyter Notebook
- Homepage: https://github.com/pannous/tensorflow-speech-recognition
- Size: 62.4 MB
- Stars: 323
- Watchers: 45
- Forks: 126
- Open Issues: 5
Metadata Files:
- Readme: README.md
Lists
- Awesome-Caffe - Speech Recognition
README
Speech Recognition with BVLC caffe
==================================

Speech Recognition with the [caffe](https://github.com/BVLC/caffe) deep learning framework
UPDATE: We are migrating to [tensorflow](https://github.com/pannous/tensorflow-speech-recognition/)
This project is quite fresh and only the first of three milestones is accomplished:
Even now it might be useful if you just want to train a handful of commands/options (1, 2, 3, ... yes/no/cancel/...)

1) training spoken **numbers**:
* get spectrogram training images from http://pannous.net/spoken_numbers.tar (470 MB)
* start ./train.sh
* test with `ipython notebook test-speech-recognition.ipynb`
or `caffe test ...` or `/python/classify.py`
* 99% accuracy, nice!
* online recognition and learning with `./recognition-server.py` and `./record.py` scripts

![Sample spectrogram, That's what she said, too laid?](https://raw.githubusercontent.com/pannous/caffe-speech-recognition/master/0_Karen_160.png)
Sample spectrogram, Karen uttering 'zero' with 160 words per minute.
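
The sample file name `0_Karen_160.png` and its caption suggest a naming scheme of `<label>_<speaker>_<words per minute>.png`. The following is a minimal sketch of how training labels could be read from such file names. The scheme is inferred from this single example and is an assumption, and `Sample` and `parse_sample_name` are hypothetical names that do not come from the repository.

```python
import re
from typing import NamedTuple


class Sample(NamedTuple):
    """Fields recovered from a spectrogram file name (hypothetical helper type)."""
    label: str             # spoken token, e.g. "0" for 'zero'
    speaker: str           # voice name, e.g. "Karen"
    words_per_minute: int  # speaking rate, e.g. 160


# Assumed pattern: <label>_<speaker>_<wpm>.png, inferred from "0_Karen_160.png".
_NAME_PATTERN = re.compile(r"^(?P<label>[^_]+)_(?P<speaker>[^_]+)_(?P<wpm>\d+)\.png$")


def parse_sample_name(filename: str) -> Sample:
    """Split a spectrogram file name into label, speaker and speaking rate."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return Sample(match["label"], match["speaker"], int(match["wpm"]))


if __name__ == "__main__":
    print(parse_sample_name("0_Karen_160.png"))
```

If the files in the archive follow a different scheme, only the regular expression needs to change.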
2) training **words**:
* 4GB of training [data](https://dl.dropboxusercontent.com/u/23615316/spoken_words.tar)
* net topology: work in progress ...
* todo: use [upcoming new](https://github.com/BVLC/caffe/issues/1653) caffe [LSTM](https://en.wikipedia.org/wiki/Long_short_term_memory) layers etc
* UPDATE [LSTMs get rolling](https://github.com/BVLC/caffe/pull/1873), [still not merged](https://github.com/BVLC/caffe/pull/2033)
* UPDATE since the caffe project leaders have a hindering merging policy and this pull request was shifted many times without ever being merged, we are migrating to [tensorflow](https://github.com/pannous/tensorflow-speech-recognition)
* todo: add extra categories for a) silence b) common noises like typing, achoo c) ALL other noises

3) training **speech**:
* todo!
* 100GB of training data here: http://www.openslr.org/12/
* [TIMIT dataset](https://catalog.ldc.upenn.edu/memberships) $27,000.00 membership fee or [$250 for non-members](https://catalog.ldc.upenn.edu/LDC93S1)+[$2400 under research-only license](https://catalog.ldc.upenn.edu/LDC2016MNP)?
* combine with google n-grams

Theoretical background: **papers**
A. Graves and N. Jaitly. Towards end-to-end speech recognition with recurrent neural networks. [In ICML, 2014](http://jmlr.org/proceedings/papers/v32/graves14.pdf)
O. Vinyals, S. V. Ravuri, and D. Povey. Revisiting recurrent neural networks for robust ASR. [In ICASSP, 2012](http://research.microsoft.com/pubs/164627/4085.pdf)
[Andrew Ng et al](http://arxiv.org/pdf/1406.7806.pdf) / [Baidu](http://arxiv.org/abs/1412.5567)
[Hinton et al / Toronto](http://www.cs.toronto.edu/~hinton/absps/RNN13.pdf)
[good old Hinton](http://psych.stanford.edu/~jlm/pdfs/Hinton12IEEE_SignalProcessingMagazine.pdf)
[Schmidhuber et al](http://arxiv.org/pdf/1402.3511v1.pdf) using new 'ClockWork-RNNs'
The **book**:
[Automatic Speech Recognition: A Deep Learning Approach](http://www.amazon.com/Automatic-Speech-Recognition-Communication-Technology/dp/1447157788/ref=sr_1_1?ie=UTF8&qid=1422013427&sr=8-1&keywords=speech+recognition) (Signals and Communication Technology) Hardcover – November 11, 2014 by Dong Yu (Author) and Li Deng (Author)

**Related work**
Also see the [Kaldi](http://kaldi.sourceforge.net/about.html) project, which seems a bit messy but already uses deep learning with [LSTM](https://en.wikipedia.org/wiki/Long_short_term_memory)
Another experimental LSTM network, which works out-of-the-box: [Currennt](http://sourceforge.net/projects/currennt/)