https://github.com/hirofumi0810/tensorflow_end2end_speech_recognition
End-to-End speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)
- Host: GitHub
- URL: https://github.com/hirofumi0810/tensorflow_end2end_speech_recognition
- Owner: hirofumi0810
- License: mit
- Created: 2017-05-24T09:35:21.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-01-23T02:05:10.000Z (almost 7 years ago)
- Last Synced: 2024-08-08T23:21:34.487Z (6 months ago)
- Topics: asr, attention-mechanism, automatic-speech-recognition, beam-search, csj, ctc, end-to-end, end-to-end-learning, joint-ctc-attention, librispeech, speech-recognition, speech-to-text, tensorflow, timit, timit-dataset
- Language: Python
- Homepage:
- Size: 4.17 MB
- Stars: 313
- Watchers: 34
- Forks: 120
- Open Issues: 11
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## TensorFlow Implementation of End-to-End Speech Recognition
### Requirements
- TensorFlow >= 1.3.0
- tqdm >= 4.14.0
- python-Levenshtein >= 0.12.0
- setproctitle >= 1.1.10
- seaborn >= 0.7.1

### Corpus
#### [TIMIT](https://catalog.ldc.upenn.edu/LDC93S1)
- Phone (39, 48, 61 phones)
- Character

#### [LibriSpeech](http://www.openslr.org/12/)
- Phone (under implementation)
- Character
- Word

#### [CSJ (Corpus of Spontaneous Japanese)](http://pj.ninjal.ac.jp/corpus_center/csj/en/)
- Phone (under implementation)
- Japanese kana character (about 150 classes)
- Japanese kanji characters (about 3000 classes)

These corpora will be added in the future:
- Switchboard
- WSJ
- [AMI](http://groups.inf.ed.ac.uk/ami/corpus/)

This repository doesn't include pre-processing code; it relies on [this repo](https://github.com/hirofumi0810/asr_preprocessing) for that.
If you want to run pre-processing, please see that repo.

### Model
#### Encoder
- BLSTM
- LSTM
- BGRU
- GRU
- VGG-BLSTM
- VGG-LSTM
- Multi-task BLSTM
  - You can attach an additional CTC layer to an arbitrary layer.
- Multi-task LSTM
- VGG

#### Connectionist Temporal Classification (CTC) [\[Graves+ 2006\]](http://dl.acm.org/citation.cfm?id=1143891)
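As a rough illustration of the greedy decoder listed in this section, the sketch below performs best-path CTC decoding in plain Python: take the most probable class in each frame, merge consecutive repeats, then drop blanks. The function name `ctc_greedy_decode`, the `blank_index` parameter, and the toy posteriors are assumptions made for this example and are not taken from this repository's code.

```python
def ctc_greedy_decode(frame_posteriors, blank_index):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    decoded = []
    previous = None
    for frame in frame_posteriors:
        # Pick the most probable class for this frame.
        best = max(range(len(frame)), key=frame.__getitem__)
        # Emit only when the label changes and is not the blank symbol.
        if best != previous and best != blank_index:
            decoded.append(best)
        previous = best
    return decoded


# Toy example: 3 classes, with class 2 assumed to be the blank.
posteriors = [
    [0.6, 0.3, 0.1],  # class 0
    [0.7, 0.2, 0.1],  # class 0 again (repeat, merged)
    [0.1, 0.1, 0.8],  # blank
    [0.8, 0.1, 0.1],  # class 0 (new emission after a blank)
    [0.2, 0.7, 0.1],  # class 1
]
print(ctc_greedy_decode(posteriors, blank_index=2))  # prints [0, 0, 1]
```

Passing the blank index explicitly avoids hard-coding where the blank class sits in the output layer.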
- Greedy decoder
- Beam Search decoder
- Beam Search decoder w/ CharLM (under implementation)

##### Options
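Frame stacking, one of the options in this section, can be sketched as follows: consecutive feature frames are concatenated into one wider frame, and the window then advances by a fixed skip, which shortens the input sequence. The names `stack_frames`, `num_stack`, and `num_skip`, as well as the zero-padding of the final window, are assumptions for illustration and may differ from what this repository does.

```python
def stack_frames(frames, num_stack, num_skip):
    """Concatenate `num_stack` consecutive frames, advancing `num_skip` frames per step."""
    if not frames:
        return []
    feature_dim = len(frames[0])
    stacked = []
    for start in range(0, len(frames), num_skip):
        # Slicing returns a new list, so padding below does not modify `frames`.
        window = frames[start:start + num_stack]
        # Zero-pad the last window when fewer than `num_stack` frames remain
        # (an assumption; other padding schemes are possible).
        while len(window) < num_stack:
            window.append([0.0] * feature_dim)
        # Flatten the window into a single wide feature vector.
        stacked.append([value for frame in window for value in frame])
    return stacked


# Toy input: 5 frames with 2 features each.
frames = [[float(t), float(t) + 0.5] for t in range(5)]
stacked = stack_frames(frames, num_stack=3, num_skip=3)
print(len(stacked), len(stacked[0]))  # 2 stacked frames, 6 features each
```

With `num_stack=3` and `num_skip=3`, the five input frames become two stacked frames, so the encoder sees a sequence roughly one third as long.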
- Frame-stacking [\[Sak+ 2015\]](https://arxiv.org/abs/1507.06947)
- Multi-GPU training (synchronous)
- Splicing
- Down sampling (under implementation)

#### Attention Mechanism
##### Decoder
- Greedy decoder
- Beam search decoder (under implementation)

##### Attention type
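To make the attention types in this section more concrete, here is a minimal plain-Python sketch of dot-product attention: the score for each encoder state is its dot product with the current decoder state, the scores are normalized with a softmax, and the context vector is the weighted sum of the encoder states. The helper names and toy vectors are hypothetical and not drawn from this repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(decoder_state, encoder_states):
    """Return (attention weights, context vector) using dot-product scoring."""
    # One score per encoder time step: dot product with the decoder state.
    scores = [
        sum(d * h for d, h in zip(decoder_state, enc))
        for enc in encoder_states
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy example: the decoder state aligns with the first encoder state.
weights, context = dot_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(weights, context)
```

The other scoring functions in the list differ mainly in how `scores` is computed; the softmax and weighted-sum steps stay the same in this sketch.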
- Bahdanau's content-based attention
- Bahdanau's normed content-based attention (under implementation)
- location-based attention
- Hybrid attention
- Luong's dot attention
- Luong's scaled dot attention (under implementation)
- Luong's general attention
- Luong's concat attention
- Baidu's attention (under implementation)

###### Options
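For the temperature option in this section, a common formulation divides the logits by a temperature before the softmax. The sketch below assumes that formulation, which may not match this repository's implementation exactly; the function name `softmax_with_temperature` is hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over `logits` after dividing each logit by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
baseline = softmax_with_temperature(logits, temperature=1.0)
flatter = softmax_with_temperature(logits, temperature=5.0)
sharper = softmax_with_temperature(logits, temperature=0.5)
print(baseline, flatter, sharper)
```

A temperature above 1 flattens the distribution, and a temperature below 1 makes it more peaked.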
- Sharpening
- Temperature regularization in the softmax layer (Output posteriors)
- Joint CTC-Attention [\[Kim 2016\]](https://arxiv.org/abs/1609.06773)
- Coverage (under implementation)

### Usage
Please refer to the docs for each corpus:
- TIMIT
- LibriSpeech
- CSJ

### License
MIT

### Contact
[email protected]