https://github.com/HawkAaron/RNN-Transducer

MXNet implementation of RNN Transducer (Graves 2012): Sequence Transduction with Recurrent Neural Networks
asr end-to-end mxnet rnn-transducer rnnt-joint rnnt-model sequence-transduction speech-recognition timit transducers

Host: GitHub
URL: https://github.com/HawkAaron/RNN-Transducer
Owner: HawkAaron
Created: 2018-04-10T08:20:19.000Z (about 8 years ago)
Default Branch: graves2013
Last Pushed: 2021-06-07T15:39:34.000Z (about 5 years ago)
Last Synced: 2026-02-01T21:27:22.614Z (5 months ago)
Topics: asr, end-to-end, mxnet, rnn-transducer, rnnt-joint, rnnt-model, sequence-transduction, speech-recognition, timit, transducers
Language: Python
Homepage:
Size: 48.8 KB
Stars: 139
Watchers: 6
Forks: 31
Open Issues: 10
Metadata Files:
- Readme: README.md

# End-to-End Speech Recognition using RNN-Transducer

## File description

* eval.py: decoding with the RNNT joint model

* model.py: RNNT model, which contains the acoustic and phoneme models

* model2012.py: RNNT model following Graves 2012

* seq2seq/*: seq2seq with attention

* rnnt_np.py: RNNT loss function implemented for MXNet, supporting both the Symbol and Gluon APIs; see the [PyTorch implementation](https://github.com/awni/transducer) it refers to

* DataLoader.py: data processing

* train.py: RNNT training script; the model can be initialized from CTC and PM models

* train_ctc.py: CTC training script

* train_att.py: attention training script
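
To make the role of the RNNT loss listed above concrete, here is a minimal NumPy sketch of the forward (alpha) recursion from Graves 2012. It is not the code in `rnnt_np.py`: the function name, the `(T, U+1, V)` input layout, and the choice of `blank=0` are assumptions made for illustration.

```python
import numpy as np

def rnnt_neg_log_likelihood(log_probs, labels, blank=0):
    """Negative log-likelihood of `labels` under a transducer lattice.

    log_probs: array of shape (T, U+1, V), where log_probs[t, u, k] is the
               log-probability of emitting symbol k at frame t after u labels.
    labels:    sequence of U non-blank label ids.
    (Layout and blank index are assumptions for this sketch.)
    """
    T, U1, _ = log_probs.shape
    assert U1 == len(labels) + 1
    alpha = np.full((T, U1), -np.inf)
    alpha[0, 0] = 0.0  # log(1): the empty prefix at the first frame
    for t in range(T):
        for u in range(U1):
            if t == 0 and u == 0:
                continue
            stay = -np.inf  # arrive by emitting blank at frame t-1
            emit = -np.inf  # arrive by emitting label u-1 at frame t
            if t > 0:
                stay = alpha[t - 1, u] + log_probs[t - 1, u, blank]
            if u > 0:
                emit = alpha[t, u - 1] + log_probs[t, u - 1, labels[u - 1]]
            alpha[t, u] = np.logaddexp(stay, emit)
    # Every alignment ends with a blank emitted from the last lattice node.
    return -(alpha[T - 1, U1 - 1] + log_probs[T - 1, U1 - 1, blank])
```

The double loop keeps the recursion readable; a practical implementation would vectorize over the lattice diagonals and also compute the backward variables for the gradient.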

## Directory description

* conf: Kaldi feature extraction config

## Reference Paper

* RNN Transducer (Graves 2012): [Sequence Transduction with Recurrent Neural Networks](https://arxiv.org/abs/1211.3711)

* RNNT joint (Graves 2013): [Speech Recognition with Deep Recurrent Neural Networks](https://arxiv.org/abs/1303.5778)

* E2E criterion comparison (Baidu 2017): [Exploring Neural Transducers for End-to-End Speech Recognition](https://arxiv.org/abs/1707.07413)

* Seq2Seq-Attention: [Attention-Based Models for Speech Recognition](https://arxiv.org/abs/1506.07503)

## Run

* Compile RNNT Loss

Follow the instructions [here](https://github.com/HawkAaron/mxnet-transducer/tree/master) to compile MXNet with the RNNT loss.

* Extract feature

Link the Kaldi TIMIT example directories (`local`, `steps`, `utils`).

Execute `run.sh` to extract 40-dimensional fbank features.

Run `feature_transform.sh` to get the 123-dimensional features described in Graves 2013.

* Train RNNT model:

```bash

python train.py --lr 1e-3 --bi --dropout .5 --out exp/rnnt_bi_lr1e-3 --schedule

```
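
For context on the 123-dimensional features in the extraction step: Graves 2013 describes them as 40 filter-bank coefficients plus energy, together with their first and second temporal derivatives (41 × 3 = 123). The sketch below illustrates that stacking in NumPy. It is not the repo's `feature_transform.sh`; the function name `add_deltas`, the regression window of 2, and the edge padding are assumptions.

```python
import numpy as np

def add_deltas(static, window=2):
    """Append first and second temporal derivatives to (frames, dims) features.

    Uses the common regression formula
        d[t] = sum_n n * (x[t+n] - x[t-n]) / (2 * sum_n n^2),  n = 1..window,
    with edge frames repeated as padding (an assumption for this sketch).
    """
    def delta(x):
        num_frames = len(x)
        padded = np.pad(x, ((window, window), (0, 0)), mode="edge")
        denom = 2 * sum(n * n for n in range(1, window + 1))
        out = np.zeros_like(x, dtype=float)
        for n in range(1, window + 1):
            ahead = padded[window + n: window + n + num_frames]
            behind = padded[window - n: window - n + num_frames]
            out += n * (ahead - behind)
        return out / denom

    d1 = delta(static)   # first derivative
    d2 = delta(d1)       # second derivative
    return np.concatenate([static, d1, d2], axis=1)

# 41 static dims (40 fbank + energy) become 123 dims per frame.
features = add_deltas(np.random.randn(100, 41))
```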

## Evaluation

By default, evaluation supports only the RNNT model.

* Greedy decoding:

```bash

python eval.py --bi

```

* Beam search:

```bash

python eval.py --bi --beam

```
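
As a rough illustration of the greedy option above, transducer greedy decoding walks over the encoder frames and, at each frame, takes the arg-max output of the joint network; a non-blank symbol is appended to the hypothesis and fed back into the prediction network. The sketch below is not the code in `eval.py`: `predict_step`, `joint`, `blank=0`, and `max_symbols_per_frame` are hypothetical names and defaults chosen for this example.

```python
import numpy as np

def greedy_decode(enc_out, predict_step, joint, blank=0, max_symbols_per_frame=1):
    """Greedy transducer decoding (illustrative sketch).

    enc_out:      iterable of per-frame encoder vectors f_t.
    predict_step: hypothetical callable (label, state) -> (g, state) that
                  advances the prediction network by one label.
    joint:        hypothetical callable (f_t, g) -> scores over the vocabulary.
    """
    hyp = []
    g, state = predict_step(blank, None)  # start from the blank/null input
    for f_t in enc_out:
        for _ in range(max_symbols_per_frame):
            k = int(np.argmax(joint(f_t, g)))
            if k == blank:
                break  # blank means: move on to the next frame
            hyp.append(k)
            g, state = predict_step(k, state)
    return hyp
```

Capping the number of symbols per frame guarantees termination; beam search instead keeps several partial hypotheses per frame and is considerably more involved.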

## Results

* CTC

    | Decode | PER (%) |
    | --- | --- |
    | greedy | 20.36 |
    | beam 100 | 20.03 |

* Transducer

    | Decode | PER (%) |
    | --- | --- |
    | greedy | 20.74 |
    | beam 40 | 19.84 |
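
The PER figures above are presumably phoneme error rates on TIMIT (the README does not expand the abbreviation). Under that assumption, PER is the total edit distance between reference and hypothesis phoneme sequences divided by the total reference length. A minimal sketch follows; the function names are hypothetical and this is not the scoring code used to produce the tables.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]

def phoneme_error_rate(refs, hyps):
    """PER in percent over a corpus of (reference, hypothesis) pairs."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total
```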

## Requirements

* Python 3.6

* MXNet 1.1.0

* numpy 1.14

## TODO

* Beam search acceleration

* Seq2Seq with attention
