https://github.com/hirofumi0810/tensorflow_end2end_speech_recognition

End-to-End speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)

Host: GitHub
URL: https://github.com/hirofumi0810/tensorflow_end2end_speech_recognition
Owner: hirofumi0810
License: mit
Created: 2017-05-24T09:35:21.000Z (about 8 years ago)
Default Branch: master
Last Pushed: 2018-01-23T02:05:10.000Z (over 7 years ago)
Last Synced: 2024-08-08T23:21:34.487Z (11 months ago)
Topics: asr, attention-mechanism, automatic-speech-recognition, beam-search, csj, ctc, end-to-end, end-to-end-learning, joint-ctc-attention, librispeech, speech-recognition, speech-to-text, tensorflow, timit, timit-dataset
Language: Python
Homepage:
Size: 4.17 MB
Stars: 313
Watchers: 34
Forks: 120
Open Issues: 11
Metadata Files:
- Readme: README.md
- License: LICENSE

README

## TensorFlow Implementation of End-to-End Speech Recognition

### Requirements

- TensorFlow >= 1.3.0

- tqdm >= 4.14.0

- python-Levenshtein >= 0.12.0

- setproctitle >= 1.1.10

- seaborn >= 0.7.1

### Corpus

#### [TIMIT](https://catalog.ldc.upenn.edu/LDC93S1)

- Phone (39, 48, 61 phones)

- Character

#### [LibriSpeech](http://www.openslr.org/12/)

- Phone (under implementation)

- Character

- Word

#### [CSJ (Corpus of Spontaneous Japanese)](http://pj.ninjal.ac.jp/corpus_center/csj/en/)

- Phone (under implementation)

- Japanese kana characters (about 150 classes)

- Japanese kanji characters (about 3000 classes)

The following corpora will be added in the future:

- Switchboard

- WSJ

- [AMI](http://groups.inf.ed.ac.uk/ami/corpus/)

This repository does not include pre-processing; the pre-processing it relies on is implemented in [this repo](https://github.com/hirofumi0810/asr_preprocessing).

If you want to run pre-processing, please refer to that repository.

### Model

#### Encoder

- BLSTM

- LSTM

- BGRU

- GRU

- VGG-BLSTM

- VGG-LSTM

- Multi-task BLSTM

  - An additional CTC layer can be attached to an arbitrary layer of the encoder.

- Multi-task LSTM

- VGG

#### Connectionist Temporal Classification (CTC) [\[Graves+ 2006\]](http://dl.acm.org/citation.cfm?id=1143891)

- Greedy decoder

- Beam search decoder

- Beam search decoder w/ CharLM (under implementation)
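The greedy and beam search decoders listed above can be illustrated with a minimal pure-Python sketch. It is not the repository's TensorFlow code: the function names are hypothetical, the input is assumed to be per-frame softmax posteriors (a list of frames, each a list of class probabilities) with the blank label at index 0, and the beam search is a plain CTC prefix beam search without a language model.

```python
from collections import defaultdict


def ctc_greedy_decode(probs, blank=0):
    """Best-path decoding: argmax per frame, merge repeats, then drop blanks."""
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    decoded, prev = [], None
    for label in best_path:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


def ctc_prefix_beam_search(probs, beam_width=8, blank=0):
    """CTC prefix beam search without a language model (linear-space probabilities)."""
    # Each prefix maps to [p_blank, p_nonblank]: the total probability of all
    # alignments of that prefix ending in a blank / ending in its last label.
    beams = {(): [1.0, 0.0]}
    for frame in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            last = prefix[-1] if prefix else None
            for c, p in enumerate(frame):
                if c == blank:
                    # A blank never changes the prefix.
                    next_beams[prefix][0] += (p_b + p_nb) * p
                elif c == last:
                    # Repeating the last label collapses into the same prefix...
                    next_beams[prefix][1] += p_nb * p
                    # ...unless a blank separated the two occurrences.
                    next_beams[prefix + (c,)][1] += p_b * p
                else:
                    next_beams[prefix + (c,)][1] += (p_b + p_nb) * p
        # Keep only the most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    best_prefix = max(beams, key=lambda k: sum(beams[k]))
    return list(best_prefix)
```

Greedy decoding commits to the single best class per frame, while prefix beam search sums over all alignments of each candidate prefix. For two frames with posteriors `[0.6, 0.4]`, greedy decoding returns `[]`, whereas beam search returns `[1]` (total probability 0.64 versus 0.36). Probabilities are kept in linear space for readability; a practical decoder would work in log space to avoid underflow.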

##### Options

- Frame-stacking [\[Sak+ 2015\]](https://arxiv.org/abs/1507.06947)

- Multi-GPU training (synchronous)

- Splicing

- Downsampling (under implementation)
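Frame stacking, downsampling, and splicing can be sketched as follows. This illustration rests on several assumptions and is not the repository's implementation. Features are plain Python lists with one list of floats per frame. When splicing, edges are padded by repeating the first or last frame. When stacking, incomplete windows at the end of an utterance are dropped. `splice`, `stack_frames`, `context`, `num_stack`, and `num_skip` are hypothetical names.

```python
def splice(frames, context=2):
    """Concatenate each frame with `context` neighbours on each side.

    The output has the same number of frames as the input; out-of-range
    neighbours are replaced by the first/last frame (edge padding).
    """
    num_frames = len(frames)
    spliced = []
    for t in range(num_frames):
        window = []
        for offset in range(-context, context + 1):
            idx = min(max(t + offset, 0), num_frames - 1)
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


def stack_frames(frames, num_stack=3, num_skip=3):
    """Stack `num_stack` consecutive frames, emitting one stacked frame every `num_skip` frames.

    With num_skip > 1 this also downsamples the sequence; trailing frames that
    do not fill a complete window are dropped.
    """
    stacked = []
    for start in range(0, len(frames) - num_stack + 1, num_skip):
        window = []
        for frame in frames[start:start + num_stack]:
            window.extend(frame)
        stacked.append(window)
    return stacked
```

Splicing keeps the frame rate and widens each feature vector by a factor of `2 * context + 1`. Stacking with `num_skip > 1` also lowers the frame rate: with 10 ms frames, `num_stack=8, num_skip=3` yields one 8-frame input every 30 ms, which shortens the sequence the encoder has to process.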

#### Attention Mechanism

##### Decoder

- Greedy decoder

- Beam search decoder (under implementation)

##### Attention type

- Bahdanau's content-based attention

- Bahdanau's normed content-based attention (under implementation)

- Location-based attention

- Hybrid attention

- Luong's dot attention

- Luong's scaled dot attention (under implementation)

- Luong's general attention

- Luong's concat attention

- Baidu's attention (under implementation)
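Several of the attention types above differ mainly in how they score a decoder state `s` against each encoder state `h_j` before the softmax. The following pure-Python sketch uses commonly cited textbook forms, which are an assumption here:

- dot: `s·h`
- scaled dot: `s·h / sqrt(d)`
- general: `s·W h`
- concat, or Bahdanau content-based: `v·tanh(W_s s + W_h h)`
- hybrid: the concat score plus location features obtained by convolving the previous attention weights

The hybrid variant is simplified to a single convolution filter. All weights and function names are hypothetical, and the repository's TensorFlow implementation may differ.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(W, x):
    return [dot(row, x) for row in W]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scores]
    z = sum(exps)
    return [e / z for e in exps]


def score_dot(s, h):
    return dot(s, h)


def score_scaled_dot(s, h):
    return dot(s, h) / math.sqrt(len(h))


def score_general(s, h, W):
    return dot(s, matvec(W, h))


def score_concat(s, h, W_s, W_h, v):
    hidden = [math.tanh(a + b) for a, b in zip(matvec(W_s, s), matvec(W_h, h))]
    return dot(v, hidden)


def location_features(prev_weights, kernel):
    """Zero-padded, same-length 1-D cross-correlation of the previous alignment
    (what deep-learning 'convolution' layers compute); `kernel` has odd length."""
    r = len(kernel) // 2
    n = len(prev_weights)
    feats = []
    for j in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = j + k - r
            if 0 <= idx < n:
                acc += w * prev_weights[idx]
        feats.append(acc)
    return feats


def score_hybrid(s, h, f_j, W_s, W_h, u, v):
    """Content term plus a location term u * f_j (single-filter simplification)."""
    hidden = [math.tanh(a + b + c * f_j)
              for a, b, c in zip(matvec(W_s, s), matvec(W_h, h), u)]
    return dot(v, hidden)


def attend(scores, encoder_states):
    """Normalize scores into attention weights and return (weights, context vector)."""
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context
```

Dot and scaled-dot attention need no extra parameters but require `s` and `h_j` to have the same dimension. The general, concat, and hybrid forms add learned projections, and the hybrid form additionally conditions on where the decoder attended at the previous step.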

###### Options

- Sharpening

- Temperature regularization in the softmax layer (output posteriors)

- Joint CTC-Attention [\[Kim+ 2016\]](https://arxiv.org/abs/1609.06773)

- Coverage (under implementation)
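The temperature and sharpening options and the joint CTC-Attention objective reduce to a few lines of arithmetic. The names in this sketch are hypothetical and do not come from the repository's API.

- **Temperature:** the logits are divided by `T` before the softmax. `T > 1` flattens the distribution and `T < 1` sharpens it.
- **Sharpening:** sharpening attention scores by a factor `beta > 1` is equivalent to using `T = 1 / beta`.
- **Joint loss:** the multi-task loss is assumed to take the form `λ·L_CTC + (1 - λ)·L_attention` with `λ` in [0, 1], following the joint CTC-Attention formulation cited above.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature (temperature > 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def sequence_cross_entropy(step_posteriors, targets):
    """Negative log-likelihood of the target labels under per-step decoder posteriors."""
    return -sum(math.log(p[y]) for p, y in zip(step_posteriors, targets))


def joint_ctc_attention_loss(ctc_loss, attention_loss, ctc_weight=0.5):
    """Interpolate the CTC and attention losses with weight lambda = ctc_weight."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss
```

Setting `ctc_weight=0` recovers a pure attention model and `ctc_weight=1` a pure CTC model. Intermediate values let the CTC branch, which enforces monotonic alignments, regularize the attention decoder.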

### Usage

Please refer to the docs for each corpus:

- TIMIT

- LibriSpeech

- CSJ

### License

MIT

### Contact

[email protected]
