https://github.com/HawkAaron/RNN-Transducer

MXNet implementation of RNN Transducer (Graves 2012): Sequence Transduction with Recurrent Neural Networks

asr end-to-end mxnet rnn-transducer rnnt-joint rnnt-model sequence-transduction speech-recognition timit transducers

# End-to-End Speech Recognition using RNN-Transducer
## File description
* eval.py: decoding with the RNNT joint model
* model.py: RNNT model, which contains the acoustic and phoneme models
* model2012.py: RNNT model following Graves 2012
* seq2seq/*: seq2seq with attention
* rnnt_np.py: RNNT loss function implemented for MXNet, supporting both the Symbol and Gluon APIs (see the [PyTorch implementation](https://github.com/awni/transducer) for reference)
* DataLoader.py: data processing
* train.py: RNNT training script; the model can be initialized from CTC and PM models
* train_ctc.py: CTC training script
* train_att.py: attention training script
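
For orientation, the transducer loss of Graves 2012 is the negative log-likelihood of the label sequence, summed over all alignments with a forward recursion on a `(T, U+1)` lattice. The block below is a minimal NumPy sketch of that recursion, not the code in `rnnt_np.py`; the function name `rnnt_neg_log_likelihood`, the `(T, U+1, V)` input layout, and the default blank index 0 are assumptions made for illustration.

```python
import numpy as np

def rnnt_neg_log_likelihood(log_probs, labels, blank=0):
    """Forward-algorithm sketch of the transducer loss (illustrative only).

    log_probs: (T, U+1, V) log-softmax outputs of the joint network.
    labels:    length-U sequence of target label ids (no blanks).
    """
    T, U1, _ = log_probs.shape
    assert U1 == len(labels) + 1
    # alpha[t, u] = log-probability of having emitted labels[:u] on reaching frame t
    alpha = np.full((T, U1), -np.inf)
    alpha[0, 0] = 0.0
    for t in range(T):
        for u in range(U1):
            if t == 0 and u == 0:
                continue
            # reach (t, u) by emitting blank at (t-1, u)
            stay = alpha[t - 1, u] + log_probs[t - 1, u, blank] if t > 0 else -np.inf
            # reach (t, u) by emitting labels[u-1] at (t, u-1)
            emit = alpha[t, u - 1] + log_probs[t, u - 1, labels[u - 1]] if u > 0 else -np.inf
            alpha[t, u] = np.logaddexp(stay, emit)
    # every alignment ends with a blank emitted from the last lattice node
    return -(alpha[T - 1, U1 - 1] + log_probs[T - 1, U1 - 1, blank])
```

The double loop is written for readability; a practical implementation would vectorize it or run it in a compiled kernel, and would also need the backward pass for gradients.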

## Directory description
* conf: Kaldi feature extraction config

## Reference Paper
* RNN Transducer (Graves 2012): [Sequence Transduction with Recurrent Neural Networks](https://arxiv.org/abs/1211.3711)
* RNNT joint (Graves 2013): [Speech Recognition with Deep Recurrent Neural Networks](https://arxiv.org/abs/1303.5778)
* E2E criterion comparison (Baidu 2017): [Exploring Neural Transducers for End-to-End Speech Recognition](https://arxiv.org/abs/1707.07413)
* Seq2Seq-Attention: [Attention-Based Models for Speech Recognition](https://arxiv.org/abs/1506.07503)

## Run
* Compile RNNT loss:
  Follow the instructions [here](https://github.com/HawkAaron/mxnet-transducer/tree/master) to compile MXNet with RNNT loss.

* Extract features:
  * Link the Kaldi TIMIT example dirs (`local`, `steps`, `utils`).
  * Execute `run.sh` to extract 40-dim fbank features.
  * Run `feature_transform.sh` to get the 123-dim features described in Graves 2013.

* Train RNNT model:
```bash
python train.py --lr 1e-3 --bi --dropout .5 --out exp/rnnt_bi_lr1e-3 --schedule
```
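
On the feature-extraction step above: the 123 dimensions in Graves 2013 are 40 filter-bank coefficients plus energy (41 values) together with their first and second temporal derivatives (41 × 3 = 123). The sketch below illustrates that arithmetic in NumPy. It is not what `feature_transform.sh` runs; `delta` and `add_deltas` are hypothetical helpers, and the regression window of 2 frames is an assumed, commonly used default.

```python
import numpy as np

def delta(feats, window=2):
    """Regression-based temporal derivative of a (T, D) feature matrix."""
    T = feats.shape[0]
    # repeat the first/last frame so every frame has `window` neighbours
    padded = np.pad(feats, ((window, window), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = np.zeros_like(feats, dtype=float)
    for n in range(1, window + 1):
        ahead = padded[window + n : window + n + T]
        behind = padded[window - n : window - n + T]
        out += n * (ahead - behind)
    return out / denom

def add_deltas(feats, window=2):
    """Append first and second derivatives: (T, D) -> (T, 3 * D)."""
    d1 = delta(feats, window)
    d2 = delta(d1, window)
    return np.concatenate([feats, d1, d2], axis=1)

static = np.random.randn(100, 41)   # stand-in for 40 fbank + energy
print(add_deltas(static).shape)     # (100, 123)
```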

## Evaluation
By default, evaluation covers only the RNNT model.
* Greedy decoding:
```bash
python eval.py --bi
```
* Beam search:
```bash
python eval.py --bi --beam
```
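
As a rough picture of what greedy transducer decoding does: at each encoder frame, take the argmax of the joint output; a non-blank label is appended and fed back into the prediction network, while a blank moves on to the next frame. The sketch below assumes two hypothetical callables, `predict_step` and `joint`, standing in for the prediction and joint networks; they are not functions from `eval.py`. The `max_symbols` cap per frame is likewise an assumption, added so the loop always terminates.

```python
import numpy as np

def greedy_decode(enc_out, predict_step, joint, blank=0, max_symbols=10):
    """Greedy transducer decoding sketch (illustrative only).

    enc_out:      (T, H) encoder outputs, one row per frame.
    predict_step: hypothetical callable (label, state) -> (pred_vec, new_state).
    joint:        hypothetical callable (enc_vec, pred_vec) -> (V,) scores.
    """
    hyp = []
    # prime the prediction network with the blank symbol
    pred_vec, state = predict_step(blank, None)
    for enc_vec in enc_out:
        for _ in range(max_symbols):
            k = int(np.argmax(joint(enc_vec, pred_vec)))
            if k == blank:
                break  # blank: advance to the next frame
            hyp.append(k)
            # non-blank: update the prediction network and stay on this frame
            pred_vec, state = predict_step(k, state)
    return hyp
```

Beam search follows the same frame-by-frame structure but keeps several partial hypotheses alive instead of committing to the single best label at each step.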

## Results
* CTC

| Decode | PER (%) |
| --- | --- |
| greedy | 20.36 |
| beam 100 | 20.03 |

* Transducer

| Decode | PER (%) |
| --- | --- |
| greedy | 20.74 |
| beam 40 | 19.84 |
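
For readers unfamiliar with the metric, phoneme error rate (PER) is conventionally the total edit distance between hypothesis and reference phoneme sequences divided by the total reference length. The sketch below shows that computation, assuming the tables report it as a percentage; `edit_distance` and `phoneme_error_rate` are hypothetical helpers, not the repository's scoring code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]

def phoneme_error_rate(refs, hyps):
    """PER in percent over paired lists of phoneme sequences."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total

# one substitution in a four-phoneme reference
print(phoneme_error_rate([["a", "b", "c", "d"]], [["a", "x", "c", "d"]]))  # 25.0
```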

## Requirements
* Python 3.6
* MXNet 1.1.0
* NumPy 1.14

## TODO
* Beam search acceleration
* Seq2Seq with attention