Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jeremiedb/translatR
Lightweight tools for translation tasks with mxnet
- Host: GitHub
- URL: https://github.com/jeremiedb/translatR
- Owner: jeremiedb
- License: apache-2.0
- Created: 2018-02-25T01:03:36.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2019-03-10T19:46:33.000Z (almost 6 years ago)
- Last Synced: 2024-08-01T22:41:46.637Z (5 months ago)
- Language: R
- Size: 55.1 MB
- Stars: 3
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
Awesome Lists containing this project
- Awesome-MXNet - translatR
README
---
title: "translatR"
output: github_document
---

Lightweight tools for translation tasks with MXNet R.
Inspiration taken from the [AWS Sockeye](https://github.com/awslabs/sockeye) project.
## Getting started
Prepare data from the WMT datasets using `Preproc_wmt15_train.Rmd`. This creates a source and a target matrix of word indices, along with the associated dictionary. Data preparation relies mainly on the `data.table` package.
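As an illustration of that word-index representation, here is a toy sketch (not the logic of `Preproc_wmt15_train.Rmd`; all names below are hypothetical):

```r
library(data.table)

# Toy parallel corpus (source / target sentences)
corpus <- data.table(
  src = c("the cat sits", "a dog runs"),
  tgt = c("le chat est assis", "un chien court")
)

# Word-index dictionary built from the source vocabulary
src_dict <- data.table(word = unique(unlist(strsplit(corpus$src, " ", fixed = TRUE))))
src_dict[, index := .I]

# Map each sentence to a fixed-length vector of word indices (0 = padding)
to_indices <- function(sentence, dict, len) {
  tokens <- strsplit(sentence, " ", fixed = TRUE)[[1]]
  idx <- dict$index[match(tokens, dict$word)]
  c(idx, rep(0L, len - length(idx)))
}
src_mat <- t(sapply(corpus$src, to_indices, dict = src_dict, len = 5L))
# The target matrix and its dictionary are built the same way from corpus$tgt
```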
An RNN encoder-decoder training demo is provided in `NMT_rnn_rnn.Rmd`, and a CNN-RNN architecture is shown in `NMT_cnn_rnn.Rmd`.
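The demos are R Markdown documents; assuming the `rmarkdown` package and a working MXNet R installation are available, one straightforward way to run them end to end is simply to render them:

```r
# Render the data preparation and training notebooks in order
rmarkdown::render("Preproc_wmt15_train.Rmd")
rmarkdown::render("NMT_rnn_rnn.Rmd")  # or NMT_cnn_rnn.Rmd for the CNN-RNN demo
```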
## Performance
Performance during training is tracked using the perplexity metric. Once training is complete, the above scripts show how to perform batch inference on test data, specifically the official WMT test set. [sacreBLEU](https://github.com/mjpost/sacreBLEU) is then used to compute the BLEU score, providing a clear comparison point with the metric typically reported in publications.
For example:
`cat data/trans_test_wmt_en_fr_rnn_72_Capital.txt | sacrebleu -t wmt14 -l en-fr`
A BLEU score of 28.2 was obtained with the CNN-RNN architecture.
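For reference, the perplexity tracked during training is the exponential of the average per-token negative log-likelihood; a minimal helper (not part of the repository) illustrating that relation:

```r
# Perplexity from per-token negative log-likelihoods (cross-entropy in nats)
perplexity <- function(nll) exp(mean(nll))

perplexity(c(2.1, 1.7, 3.0))  # exp(2.27) ~ 9.6
```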
## Features:
#### Encoders:
- Bidirectional RNN encoders (LSTM and GRU).
- Convolutional encoder with residual gates.

#### Decoders:
- RNN (LSTM and GRU)
#### Attention:
Architectures are fully attention based. All information is passed from the encoder to the decoder through the encoded sequences, weighted by an attention mechanism (RNN hidden states are not carried forward from the encoder to the decoder).
Attention modules are defined in `attention.R`. Three attention approaches are supported, all described in Luong et al., 2015:
- Bilinear (`attn_bilinear`)
- Dot (`attn_dot`)
- MLP (`attn_MLP`)

The decoder network takes an attention module as a parameter along with an encoder graph. All attention modules are implemented using the query-key-value approach.
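As a conceptual sketch of the query-key-value weighting, here is dot attention on plain matrices (the repository's `attn_dot`, `attn_bilinear` and `attn_MLP` are built from MXNet symbols instead; names below are hypothetical):

```r
# Dot attention: score each source position against the decoder query,
# softmax the scores, and return the weighted sum of the values.
dot_attention <- function(query, keys, values) {
  # query: 1 x d decoder state; keys/values: seq_len x d encoded sequence
  scores  <- keys %*% t(query)               # seq_len x 1 alignment scores
  weights <- exp(scores) / sum(exp(scores))  # softmax over source positions
  t(weights) %*% values                      # 1 x d context vector
}

set.seed(42)
keys    <- matrix(rnorm(4 * 8), nrow = 4)    # 4 source positions, depth 8
query   <- matrix(rnorm(8), nrow = 1)
context <- dot_attention(query, keys, keys)  # values == keys for dot attention
```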
# To do:
- Test multi-GPU support
- More efficient data preprocessing to handle larger scale tasks (currently works fine up to around 4M parallel sequences).
- Support for bucketing
- Positional embedding
- Transformer encoder-decoder
- Encoder and decoder self-attention
- Beam search

Tutorial to be added to [Examples of application of RNN](https://jeremiedb.github.io/mxnet_R_bucketing/index.html).