Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/pannous/tensorflow-speech-recognition
🎙Speech recognition using the tensorflow deep learning framework, sequence-to-sequence neural networks
deep-learning neural-network speech-recognition speech-to-text stt tensorflow
- Host: GitHub
- URL: https://github.com/pannous/tensorflow-speech-recognition
- Owner: pannous
- License: other
- Created: 2015-12-07T10:02:43.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2024-01-17T14:27:13.000Z (about 1 year ago)
- Last Synced: 2025-01-10T20:08:08.740Z (11 days ago)
- Topics: deep-learning, neural-network, speech-recognition, speech-to-text, stt, tensorflow
- Language: Python
- Homepage:
- Size: 31.1 MB
- Stars: 2,163
- Watchers: 189
- Forks: 637
- Open Issues: 33
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
# Tensorflow Speech Recognition
Speech recognition using Google's [tensorflow](https://github.com/tensorflow/tensorflow/) deep learning framework and [sequence-to-sequence](https://www.tensorflow.org/versions/master/tutorials/seq2seq/index.html) neural networks.

Replaces [caffe-speech-recognition](https://github.com/pannous/caffe-speech-recognition); see there for some background.
## Update 2024: Use **Whisper**!
This (relatively) old project is NO LONGER UP TO DATE.
The tensorflow 1.0 API it was written against is no longer compatible with current releases, and the theory is no longer state of the art either.
We highly recommend you check out and use [whisper](https://github.com/ggerganov/whisper.cpp) instead.

## Update 2020: **Mozilla** released [DeepSpeech](https://github.com/mozilla/DeepSpeech)
They achieve good [error rates](http://doyouunderstand.me). Free Speech is in good hands, go *there* if you are an end user.
For now *this* project is only maintained for educational purposes.

## Ultimate goal
Create a decent standalone speech recognizer for Linux and other platforms.
Some people say we have the models but not enough training data.
We disagree: there is plenty of training data (100GB [here](http://www.openslr.org/12) and 21GB [here on openslr.org](http://www.openslr.org/7/), synthetic text-to-speech snippets, movies with transcripts, Gutenberg, YouTube with captions, etc.); we just need a simple yet powerful model. It's only a question of time...

![Sample spectrogram, That's what she said, too laid?](images/0_Karen_160.png)
Sample spectrogram: Karen uttering 'zero' at 160 words per minute.
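
For readers who have not worked with spectrograms, here is a minimal sketch of how an image like the one above might be computed from raw audio with a short-time Fourier transform. It uses only NumPy; the function name `spectrogram`, the frame length, the hop size, and the synthetic 440 Hz tone are all assumptions chosen for illustration, and the preprocessing in this repository may well differ.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram via a short-time Fourier transform.

    Hypothetical helper for illustration, not part of this repository.
    """
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)  # taper each frame to reduce leakage
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # rows are frequency bins, columns are time frames
    return np.log1p(magnitude).T  # shape: (frame_len // 2 + 1, n_frames)


# Stand-in for a recorded utterance: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)
spec = spectrogram(tone)
print(spec.shape)
```

Each column of `spec` describes the frequency content of one short slice of audio, which is the 2D representation a classifier can consume in place of the raw waveform.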
## Installation
### clone code
```
git clone https://github.com/pannous/tensorflow-speech-recognition
cd tensorflow-speech-recognition
git clone https://github.com/pannous/layer.git
git clone https://github.com/pannous/tensorpeers.git
```

### pyaudio
#### Requirement: portaudio from http://www.portaudio.com/
```
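# fetch portaudio and build it into a local prefix
git clone https://git.assembla.com/portaudio.git
cd portaudio
./configure --prefix=/path/to/your/local
make
make install
# make the local build visible to the dynamic linker and the compiler
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/your/local/lib
export LIBRARY_PATH=$LIBRARY_PATH:/path/to/your/local/lib
export CPATH=$CPATH:/path/to/your/local/include
# only needed if you added the exports above to ~/.bashrc
source ~/.bashrc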
```
#### install pyaudio
```
pip install pyaudio
```

## Getting started
Toy examples:
`./number_classifier_tflearn.py`
`./speaker_classifier_tflearn.py`

Some less trivial architectures:
`./densenet_layer.py`

Later:
`./train.sh`
`./record.py`

![Sample spectrogram or record.py](images/spectrogram.demo.png)
Update: Nervana [demonstrated](https://www.youtube.com/watch?v=NaqZkV_fBIM) that it is possible for 'independents' to build speech recognizers that are state of the art.
### Fun tasks for newcomers
* Watch the video: https://www.youtube.com/watch?v=u9FPqkuoEJ8
* Understand and correct the corresponding code: [lstm-tflearn.py](/lstm-tflearn.py)
* Data augmentation: create on-the-fly modulations of the data, e.g. increase the speech frequency, add background noise, alter the pitch, etc.

### Extensions
**Extensions** to current tensorflow which are probably needed:
* [WarpCTC on the GPU](https://github.com/baidu-research/warp-ctc/tree/master/tensorflow_binding) see [issue](https://github.com/tensorflow/tensorflow/issues/2146)
* Incremental collaborative snapshots ('[P2P learning](https://github.com/pannous/tensorpeers)') !
* Modular graphs/models + persistence

Even though this project is far from finished, we hope it gives you some starting points.
Looking for a tensorflow collaboration / consultant / deep learning contractor? Reach out to [[email protected]](mailto:[email protected]?subject=contractor)
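
As a possible starting point for the data augmentation task listed under "Fun tasks for newcomers", the sketch below perturbs a clip on the fly with NumPy alone. The helper names (`add_noise`, `change_speed`, `augment`) and the parameter ranges are hypothetical and do not exist in this repository. Note that the simple resampling used here changes tempo and pitch together; shifting pitch independently of speed would need a different technique.

```python
import numpy as np


def add_noise(signal, snr_db, rng):
    """Mix in white Gaussian noise at the given signal-to-noise ratio (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def change_speed(signal, rate):
    """Resample by linear interpolation.

    rate > 1 shortens the clip, raising tempo and pitch together.
    """
    n_out = int(round(len(signal) / rate))
    old_idx = np.arange(len(signal))
    new_idx = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_idx, old_idx, signal)


def augment(signal, rng):
    """Apply a random speed change, then random background noise."""
    rate = rng.uniform(0.9, 1.1)      # assumed range: up to 10% slower or faster
    snr_db = rng.uniform(10.0, 30.0)  # assumed range of noise levels
    return add_noise(change_speed(signal, rate), snr_db, rng)


# Stand-in for a recorded utterance: one second of a 220 Hz tone at 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
clip = np.sin(2 * np.pi * 220.0 * t)
augmented = augment(clip, rng)
```

Calling `augment` afresh for every training batch yields a slightly different variant of each clip per epoch, which is the "on-the-fly" part of the task.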