https://github.com/howardyclo/pytorch-seq2seq-example
Fully batched seq2seq example based on practical-pytorch, and more extra features.
- Host: GitHub
- URL: https://github.com/howardyclo/pytorch-seq2seq-example
- Owner: howardyclo
- Created: 2018-02-04T11:58:23.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2018-03-11T10:31:02.000Z (almost 8 years ago)
- Last Synced: 2025-04-18T19:41:01.860Z (10 months ago)
- Topics: attention-seq2seq, checkpoint, fixed-embedding, glove-embeddings, jupyter-notebook, nlp, nlp-machine-learning, pretrained-embedding, pytorch, pytorch-nlp-tutorial, pytorch-tutorial, seq2seq, shared-embedding, tensorboard, tensorboard-visualization, tie-embedding
- Language: Jupyter Notebook
- Size: 151 KB
- Stars: 76
- Watchers: 3
- Forks: 17
- Open Issues: 4
Metadata Files:
- Readme: README.md
# Batched Seq2Seq Example
Based on the [`seq2seq-translation-batched.ipynb`](https://github.com/spro/practical-pytorch/blob/master/seq2seq-translation/seq2seq-translation-batched.ipynb) notebook from *practical-pytorch*, with extra features added.
This example runs a grammatical error correction task, where the source sequence is a grammatically erroneous English sentence and the target sequence is a grammatically correct English sentence. The corpus and evaluation script can be downloaded at: https://github.com/keisks/jfleg.
### Extra features
- Cleaner codebase
- Very detailed comments for learners
- Implements a PyTorch-native dataset and dataloader for batching
- Correctly handles the hidden state from the bidirectional encoder and passes it to the decoder as the initial hidden state (see the first sketch below)
- Fully batched attention computation (only `general` attention is implemented, but it is sufficient; see the attention sketch below). Note: the original code still computes attention with a for-loop, which is very slow.
- Supports LSTM in addition to GRU
- Shared embeddings (the encoder's input embedding and the decoder's input embedding)
- Pretrained GloVe embeddings
- Fixed (non-trainable) embeddings
- Tied embeddings (the decoder's input embedding and the decoder's output embedding; see the embedding sketch below)
- TensorBoard visualization
- Checkpoint saving and loading
- Replaces unknown words in the output by copying the source token with the highest attention score (at translation time)
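
A minimal sketch of the bidirectional-hidden-state handling, with hypothetical names and shapes (the notebook's exact code may differ); an LSTM needs the same treatment applied to both the hidden and cell states:

```python
import torch
import torch.nn as nn

# A bidirectional GRU returns hidden of shape (num_layers * 2, batch, hidden_size).
# One common way to feed a unidirectional decoder is to concatenate the forward and
# backward states per layer and project them back down to hidden_size.
num_layers, hidden_size = 2, 512
bridge = nn.Linear(2 * hidden_size, hidden_size)

def init_decoder_hidden(encoder_hidden):
    # (num_layers * 2, batch, hidden) -> (num_layers, 2, batch, hidden)
    h = encoder_hidden.view(num_layers, 2, -1, hidden_size)
    h = torch.cat([h[:, 0], h[:, 1]], dim=-1)   # (num_layers, batch, 2 * hidden)
    return torch.tanh(bridge(h))                # (num_layers, batch, hidden)
```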
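
Batched `general` (Luong) attention boils down to a single batched matrix multiplication per decoder step; a sketch under assumed tensor names and shapes:

```python
import torch
import torch.nn.functional as F

# Assumed shapes:
#   decoder_hidden:  (batch, 1, hidden)        current decoder output
#   encoder_outputs: (batch, src_len, hidden)  all encoder time steps
#   W_a:             nn.Linear(hidden, hidden, bias=False), Luong's "general" score

def general_attention(decoder_hidden, encoder_outputs, W_a):
    # score(h_t, h_s) = h_t^T W_a h_s for every source position, batched in one bmm
    scores = torch.bmm(decoder_hidden, W_a(encoder_outputs).transpose(1, 2))  # (batch, 1, src_len)
    attn_weights = F.softmax(scores, dim=-1)
    context = torch.bmm(attn_weights, encoder_outputs)                        # (batch, 1, hidden)
    return context, attn_weights
```

The same `attn_weights` also drive the unknown-word replacement listed above: `attn_weights.squeeze(1).argmax(dim=-1)` picks, for each batch element, the source position whose token can be copied in place of `<unk>`.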
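
Shared, fixed, and tied embeddings all come down to reusing or freezing one weight matrix; a sketch assuming a single shared vocabulary and matching embedding/output dimensions (required for tying):

```python
import torch
import torch.nn as nn

vocab_size, emb_dim = 10000, 300

shared_embedding = nn.Embedding(vocab_size, emb_dim)   # used by both encoder and decoder inputs
generator = nn.Linear(emb_dim, vocab_size)             # decoder output projection
generator.weight = shared_embedding.weight             # tie output projection to the embedding

# Pretrained, fixed GloVe vectors (glove_vectors is a hypothetical (vocab_size, emb_dim) tensor):
# shared_embedding.weight.data.copy_(glove_vectors)
# shared_embedding.weight.requires_grad = False
```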
### Cons
Compared to OpenNMT-py, a state-of-the-art seq2seq library, a few things are not optimized in this codebase:
- Use cuDNN when possible (always on the encoder; on the decoder when `input_feed=0`); see the packing sketch below.
- Always avoid indexing / loops and use torch primitives.
- When possible, batch softmax operations across time (this is the second most complicated part of the code).
- Batch inference and beam search for translation (this is the most complicated part of the code)
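
As an illustration of the first point, letting cuDNN process the whole padded batch on the encoder usually comes down to packing the sequences (a generic sketch, not this repository's code):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Packing lets cuDNN run the whole padded batch through the encoder in one
# fused call instead of a Python loop over time steps.
encoder_rnn = nn.GRU(input_size=300, hidden_size=512, bidirectional=True, batch_first=True)

def encode(embedded, lengths):
    # embedded: (batch, max_len, 300); lengths: true lengths, sorted in descending order
    packed = pack_padded_sequence(embedded, lengths, batch_first=True)
    packed_outputs, hidden = encoder_rnn(packed)
    outputs, _ = pad_packed_sequence(packed_outputs, batch_first=True)  # (batch, max_len, 2 * 512)
    return outputs, hidden
```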
### How to speed up RNN training?
Several ways to speed up RNN training:
- Batching
- Static padding
- Dynamic padding
- Bucketing
- Truncated BPTT
See ["Sequence Models and the RNN API (TensorFlow Dev Summit 2017)"](https://www.youtube.com/watch?v=RIR_-Xlbp7s&t=490s) for understanding those techniques.
You can use [torchtext](http://torchtext.readthedocs.io/en/latest/index.html) or OpenNMT's data iterator to speed up training. It can be 7x faster (e.g., from 7 hours per epoch down to 1 hour)!
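
As a concrete example of dynamic padding, a `DataLoader` `collate_fn` can pad each batch only to the length of its longest sequence (again a generic sketch, with a hypothetical `PAD_IDX`):

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

PAD_IDX = 0  # hypothetical padding index

def collate(batch):
    # batch: list of (src_ids, tgt_ids) pairs of 1-D LongTensors with varying lengths
    srcs, tgts = zip(*batch)
    src_lengths = torch.tensor([len(s) for s in srcs])
    # Pad only to the longest sequence in *this* batch, not to a global maximum
    src_padded = pad_sequence(srcs, batch_first=True, padding_value=PAD_IDX)
    tgt_padded = pad_sequence(tgts, batch_first=True, padding_value=PAD_IDX)
    return src_padded, src_lengths, tgt_padded

# loader = DataLoader(dataset, batch_size=64, shuffle=True, collate_fn=collate)
```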
### Acknowledgement
Thanks to @srush, the author of OpenNMT-py, for answering my questions! See https://github.com/OpenNMT/OpenNMT-py/issues/552