https://github.com/HawkAaron/RNN-Transducer
MXNet implementation of RNN Transducer (Graves 2012): Sequence Transduction with Recurrent Neural Networks
- Host: GitHub
- URL: https://github.com/HawkAaron/RNN-Transducer
- Owner: HawkAaron
- Created: 2018-04-10T08:20:19.000Z (about 8 years ago)
- Default Branch: graves2013
- Last Pushed: 2021-06-07T15:39:34.000Z (about 5 years ago)
- Last Synced: 2026-02-01T21:27:22.614Z (5 months ago)
- Topics: asr, end-to-end, mxnet, rnn-transducer, rnnt-joint, rnnt-model, sequence-transduction, speech-recognition, timit, transducers
- Language: Python
- Homepage:
- Size: 48.8 KB
- Stars: 139
- Watchers: 6
- Forks: 31
- Open Issues: 10
Metadata Files:
- Readme: README.md
README
# End-to-End Speech Recognition using RNN-Transducer
## File description
* eval.py: RNN-T joint model decoding
* model.py: RNN-T model, which contains the acoustic model and the phoneme model (PM)
* model2012.py: RNN-T model following Graves (2012)
* seq2seq/*: sequence-to-sequence model with attention
* rnnt_np.py: RNN-T loss implemented in MXNet, supporting both the Symbol and Gluon APIs (based on this [PyTorch implementation](https://github.com/awni/transducer))
* DataLoader.py: data processing
* train.py: RNN-T training script; the model can be initialized from pretrained CTC and PM models
* train_ctc.py: CTC training script
* train_att.py: attention model training script
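The RNN-T loss in `rnnt_np.py` is built on the forward algorithm from Graves (2012): a lattice over (frame `t`, emitted labels `u`) where each node either emits blank (advance one frame) or the next label (advance one label). The sketch below is a minimal NumPy version for a single utterance, assuming the joint network's log-softmax outputs are already given as a `(T, U + 1, V)` array; `rnnt_loss_single` is a hypothetical name, and this is not the repository's MXNet code.

```python
import numpy as np


def rnnt_loss_single(log_probs, labels, blank=0):
    """Negative log-likelihood of one label sequence under the RNN-T lattice.

    log_probs: (T, U + 1, V) log-softmax outputs of the joint network
               (assumed input layout, chosen for this sketch).
    labels:    sequence of U label ids, blank excluded.
    """
    T, U1, _ = log_probs.shape
    U = U1 - 1
    assert len(labels) == U
    # alpha[t, u]: log-probability of having consumed t frames (0-based node t)
    # and emitted the first u labels
    alpha = np.full((T, U1), -np.inf)
    alpha[0, 0] = 0.0
    for t in range(T):
        for u in range(U1):
            if t == 0 and u == 0:
                continue
            # arrive via blank emitted at (t - 1, u): advance one frame
            from_blank = alpha[t - 1, u] + log_probs[t - 1, u, blank] if t > 0 else -np.inf
            # arrive via label u - 1 emitted at (t, u - 1): advance one label
            from_label = alpha[t, u - 1] + log_probs[t, u - 1, labels[u - 1]] if u > 0 else -np.inf
            alpha[t, u] = np.logaddexp(from_blank, from_label)
    # every alignment ends with a final blank from the last node
    return -(alpha[T - 1, U] + log_probs[T - 1, U, blank])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(4, 3, 5))  # T=4 frames, U=2 labels, V=5 symbols
    log_probs = logits - np.logaddexp.reduce(logits, axis=-1, keepdims=True)
    print(rnnt_loss_single(log_probs, [1, 3]))
```

With uniform outputs over `V` symbols every alignment has probability `V**-(T + U)`, and there are `C(T + U - 1, U)` alignments, which makes a convenient sanity check. A production loss would also need the backward pass (or autograd) and batching, which this sketch omits.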
## Directory description
* conf: Kaldi feature extraction configuration
## Reference Paper
* RNN Transducer (Graves 2012): [Sequence Transduction with Recurrent Neural Networks](https://arxiv.org/abs/1211.3711)
* RNNT joint (Graves 2013): [Speech Recognition with Deep Recurrent Neural Networks](https://arxiv.org/abs/1303.5778)
* E2E criterion comparison (Baidu 2017): [Exploring Neural Transducers for End-to-End Speech Recognition](https://arxiv.org/abs/1707.07413)
* Seq2Seq-Attention: [Attention-Based Models for Speech Recognition](https://arxiv.org/abs/1506.07503)
## Run
* Compile the RNN-T loss: follow the instructions [here](https://github.com/HawkAaron/mxnet-transducer/tree/master) to compile MXNet with the RNN-T loss.
* Extract features:
  * Link the Kaldi TIMIT example directories (`local`, `steps`, `utils`).
  * Execute `run.sh` to extract 40-dimensional fbank features.
  * Run `feature_transform.sh` to obtain the 123-dimensional features described in Graves (2013).
* Train the RNN-T model:
```bash
python train.py --lr 1e-3 --bi --dropout .5 --out exp/rnnt_bi_lr1e-3 --schedule
```
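The feature-extraction step goes from 40-dimensional fbank features to 123-dimensional inputs. Graves (2013) describes the 123 dimensions as 40 mel filterbank coefficients plus energy (41 static values) together with their first and second temporal derivatives. The sketch below assumes that layout; the function names, the regression window of 2 frames, and the edge-frame padding are choices made for illustration and are not taken from `feature_transform.sh`.

```python
import numpy as np


def deltas(feats, window=2):
    """Regression-based temporal derivative of a (T, D) feature matrix.

    d_t = sum_{n=1}^{N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1}^{N} n^2),
    where the first and last frames are repeated to pad the edges (assumption).
    """
    T = feats.shape[0]
    padded = np.pad(feats, ((window, window), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = np.zeros_like(feats, dtype=float)
    for n in range(1, window + 1):
        # padded[window + k + t] holds original frame t + k
        out += n * (padded[window + n: window + n + T] - padded[window - n: window - n + T])
    return out / denom


def add_deltas(static):
    """Stack static, delta and delta-delta features: (T, D) -> (T, 3 * D)."""
    d1 = deltas(static)
    d2 = deltas(d1)  # second-order features as the delta of the deltas
    return np.concatenate([static, d1, d2], axis=1)


if __name__ == "__main__":
    # hypothetical 41-dim static frames: 40 fbank coefficients + energy
    static = np.random.default_rng(0).normal(size=(100, 41))
    print(add_deltas(static).shape)
```

Kaldi's own `add-deltas` computes the second-order coefficients with a combined filter, so edge frames may differ slightly from this delta-of-deltas version.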
## Evaluation
By default, `eval.py` evaluates only the RNN-T model.
* Greedy decoding:
```bash
python eval.py --bi
```
* Beam search:
```bash
python eval.py --bi --beam
```
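Greedy RNN-T decoding walks the encoder outputs frame by frame: at each frame it takes the most probable joint-network output, moves to the next frame on blank, and otherwise emits the symbol and advances the prediction network. The sketch below illustrates this on a hypothetical `ToyTransducer` with random weights, whose joint network uses the Graves (2013) form `tanh(W_f f_t + W_g g_u + b)` followed by an output projection. The `max_symbols_per_frame` cap is an assumption added to avoid unbounded loops; it is not a documented `eval.py` option, and the repository's decoder may differ (for example, by emitting at most one symbol per frame).

```python
import numpy as np


class ToyTransducer:
    """Hypothetical RNN-T with random weights, used only to illustrate decoding."""

    def __init__(self, vocab, enc_dim, pred_dim, joint_dim, blank=0, seed=0):
        rng = np.random.default_rng(seed)
        self.vocab, self.blank, self.pred_dim = vocab, blank, pred_dim
        # prediction network: a single Elman RNN over one-hot labels
        self.W_in = rng.normal(scale=0.1, size=(pred_dim, vocab))
        self.W_rec = rng.normal(scale=0.1, size=(pred_dim, pred_dim))
        # joint network: h = tanh(W_f f_t + W_g g_u + b), logits = W_o h + b_o
        self.W_f = rng.normal(scale=0.1, size=(joint_dim, enc_dim))
        self.W_g = rng.normal(scale=0.1, size=(joint_dim, pred_dim))
        self.b = np.zeros(joint_dim)
        self.W_o = rng.normal(scale=0.1, size=(vocab, joint_dim))
        self.b_o = np.zeros(vocab)

    def predict(self, label, state):
        """One prediction-network step; label=None is the start-of-sequence input."""
        x = np.zeros(self.vocab)
        if label is not None:
            x[label] = 1.0
        new_state = np.tanh(self.W_in @ x + self.W_rec @ state)
        return new_state, new_state  # (output g_u, next recurrent state)

    def joint(self, f_t, g_u):
        return self.W_o @ np.tanh(self.W_f @ f_t + self.W_g @ g_u + self.b) + self.b_o


def greedy_decode(model, enc_out, max_symbols_per_frame=5):
    """Greedy RNN-T decoding over encoder outputs of shape (T, enc_dim)."""
    hyp = []
    g, state = model.predict(None, np.zeros(model.pred_dim))
    for f_t in enc_out:
        for _ in range(max_symbols_per_frame):
            k = int(np.argmax(model.joint(f_t, g)))
            if k == model.blank:
                break  # blank: move on to the next frame
            hyp.append(k)  # non-blank: emit and advance the prediction network
            g, state = model.predict(k, state)
    return hyp


if __name__ == "__main__":
    model = ToyTransducer(vocab=6, enc_dim=8, pred_dim=4, joint_dim=5)
    enc_out = np.random.default_rng(1).normal(size=(7, 8))
    print(greedy_decode(model, enc_out))
```

Beam search keeps several such hypotheses at once and merges their probabilities, which is more accurate but considerably slower than this single-path search.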
## Results
* CTC

| Decode | PER (%) |
| --- | --- |
| greedy | 20.36 |
| beam (width 100) | 20.03 |

* Transducer

| Decode | PER (%) |
| --- | --- |
| greedy | 20.74 |
| beam (width 40) | 19.84 |
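PER (phone error rate) is the total Levenshtein edit distance (substitutions, deletions, insertions) between reference and hypothesis phone sequences, divided by the total number of reference phones. The sketch below computes it at corpus level; the function names are hypothetical, and how `eval.py` tokenizes or maps TIMIT phones before scoring is not specified here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete r
                         cur[j - 1] + 1,          # insert h
                         prev[j - 1] + (r != h))  # substitute or match
        prev = cur
    return prev[-1]


def per(refs, hyps):
    """Corpus-level phone error rate in percent."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total


if __name__ == "__main__":
    ref = "sh iy hh ae d".split()
    hyp = "sh iy ae d y".split()  # one deletion (hh) and one insertion (y)
    print(per([ref], [hyp]))  # 2 errors / 5 reference phones -> 40.0
```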
## Requirements
* Python 3.6
* MXNet 1.1.0
* NumPy 1.14
## TODO
* Beam search acceleration
* Seq2Seq with attention