https://github.com/soskek/efficient_softmax
BlackOut and Adaptive Softmax for language models by Chainer
adaptive-softmax blackout chainer rnn-language-model rnnlm softmax
- Host: GitHub
- URL: https://github.com/soskek/efficient_softmax
- Owner: soskek
- Created: 2017-10-15T07:03:17.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2017-10-20T05:00:55.000Z (over 8 years ago)
- Last Synced: 2025-03-28T20:21:18.270Z (about 1 year ago)
- Topics: adaptive-softmax, blackout, chainer, rnn-language-model, rnnlm, softmax
- Language: Python
- Homepage:
- Size: 43.9 KB
- Stars: 11
- Watchers: 2
- Forks: 3
- Open Issues: 0
# Efficient Softmax Approximation
Implementations of BlackOut and Adaptive Softmax for efficiently computing the output word distribution in language models with very large vocabularies.
The LSTM language models are derived from [rnnlm_chainer](https://github.com/soskek/rnnlm_chainer).
The available output layers are as follows:
- Linear + softmax with cross-entropy loss. The standard output layer (default).
- `--share-embedding`: A variant in which the output layer shares its weight matrix with the input word embedding.
- `--adaptive-softmax`: [Adaptive softmax](http://proceedings.mlr.press/v70/grave17a/grave17a.pdf)
- `--blackout`: [BlackOut](https://arxiv.org/pdf/1511.06909.pdf) (BlackOut is not faster on GPU.)
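
To make the `--share-embedding` idea concrete, here is a minimal sketch in plain Python. It is not the repository's Chainer code: the function names and the tiny random matrices are hypothetical, and it assumes the hidden state has the same dimension as the word embeddings so that the embedding matrix can be reused as the output weight matrix.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def tied_output_distribution(embedding, hidden):
    """Score each word by the dot product of its input embedding and the hidden state.

    No separate output weight matrix is stored; the embedding rows are reused.
    """
    scores = [sum(e * h for e, h in zip(row, hidden)) for row in embedding]
    return softmax(scores)


# Hypothetical toy sizes, for illustration only.
random.seed(0)
vocab_size, dim = 5, 3
embedding = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(vocab_size)]
hidden = [random.gauss(0, 1) for _ in range(dim)]

probs = tied_output_distribution(embedding, hidden)
```

Sharing the matrix removes one `vocab_size x dim` parameter block, which matters most when the vocabulary is large.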
### Adaptive Softmax
- Efficient softmax approximation for GPUs
- Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou, ICML 2017
- [paper](http://proceedings.mlr.press/v70/grave17a/grave17a.pdf)
- [authors' Lua code](https://github.com/facebookresearch/adaptive-softmax)
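
The core factorization behind adaptive softmax can be sketched as follows. This is a simplified illustration in plain Python, not the repository's or the authors' implementation: it assumes word ids are sorted by decreasing frequency, uses a single tail cluster, and omits the reduced-dimension projection that the paper applies to tail clusters. All names (`adaptive_log_prob`, `head_weights`, `tail_weights`, `cutoff`) are hypothetical.

```python
import math
import random


def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def adaptive_log_prob(word, hidden, head_weights, tail_weights, cutoff):
    """Log-probability of `word` under a two-level (head + one tail cluster) softmax.

    head_weights has cutoff + 1 rows: one per frequent word, plus one row
    standing for "the word is in the tail cluster".
    """
    head = log_softmax([dot(w, hidden) for w in head_weights])
    if word < cutoff:
        # Frequent word: only the small head softmax is needed.
        return head[word]
    # Rare word: P(tail cluster) * P(word | tail cluster).
    tail = log_softmax([dot(w, hidden) for w in tail_weights])
    return head[cutoff] + tail[word - cutoff]


# Hypothetical toy sizes, for illustration only.
random.seed(0)
vocab_size, cutoff, dim = 10, 4, 3
head_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(cutoff + 1)]
tail_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(vocab_size - cutoff)]
hidden = [random.gauss(0, 1) for _ in range(dim)]

log_probs = [
    adaptive_log_prob(w, hidden, head_weights, tail_weights, cutoff)
    for w in range(vocab_size)
]
total_prob = sum(math.exp(lp) for lp in log_probs)
```

Because most tokens in a corpus are frequent words, most predictions touch only the small head softmax, which is where the speedup comes from.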
### BlackOut
- BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies
- Shihao Ji, S. V. N. Vishwanathan, Nadathur Satish, Michael J. Anderson, Pradeep Dubey, ICLR 2016
- [paper](https://arxiv.org/pdf/1511.06909.pdf)
- [authors' C++ code](https://github.com/IntelLabs/rnnlm)
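
As a rough illustration of the BlackOut objective, the sketch below computes the loss for a single target word in plain Python. It is written from the paper's description, not taken from the repository or the authors' code, and makes several simplifying assumptions: negatives are drawn with replacement, the proposal is the unigram distribution raised to a power `alpha`, and the default `alpha=0.4` is only an illustrative value. The function and argument names are hypothetical.

```python
import math
import random


def blackout_loss(scores, target, unigram, num_samples, alpha=0.4, rng=random):
    """BlackOut loss for one target word, given output scores for the whole vocabulary.

    Only the target and the sampled negatives enter the normalizer, so a real
    implementation would compute scores for those words alone.
    """
    # Proposal distribution Q(w) proportional to unigram(w) ** alpha.
    powered = [p ** alpha for p in unigram]
    z = sum(powered)
    proposal = [p / z for p in powered]

    # Sample negatives from Q, excluding the target (with replacement: a simplification).
    candidates = [w for w in range(len(scores)) if w != target]
    weights = [proposal[w] for w in candidates]
    negatives = rng.choices(candidates, weights=weights, k=num_samples)

    # Importance-weighted softmax over the target and the sampled negatives.
    words = [target] + negatives
    m = max(scores[w] for w in words)
    terms = [math.exp(scores[w] - m) / proposal[w] for w in words]
    total = sum(terms)
    probs = [t / total for t in terms]

    # Discriminative objective: raise the target probability, lower the negatives'.
    return -math.log(probs[0]) - sum(math.log(1.0 - p) for p in probs[1:])


# Hypothetical toy inputs, for illustration only.
rng = random.Random(0)
vocab_size = 8
scores = [rng.gauss(0, 1) for _ in range(vocab_size)]
counts = [50, 30, 10, 4, 3, 1, 1, 1]
unigram = [c / sum(counts) for c in counts]

loss = blackout_loss(scores, target=2, unigram=unigram, num_samples=3, rng=rng)
```

The cost per target depends on `num_samples` rather than on the vocabulary size, which is the point of the method.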
# How to Run
```
python -u train.py -g 0
```
## Datasets
- Penn Treebank
- WikiText-2
- WikiText-103
For the WikiText datasets, run `prepare_wikitext.sh` to download them.