https://github.com/lucidrains/product-key-memory
Standalone Product Key Memory module in Pytorch - for augmenting Transformer models
artificial-intelligence deep-learning pytorch transformers
- Host: GitHub
- URL: https://github.com/lucidrains/product-key-memory
- Owner: lucidrains
- License: mit
- Created: 2020-06-06T22:28:54.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2024-07-30T14:36:33.000Z (4 months ago)
- Last Synced: 2024-11-06T02:23:52.585Z (11 days ago)
- Topics: artificial-intelligence, deep-learning, pytorch, transformers
- Language: Python
- Size: 34.1 MB
- Stars: 72
- Watchers: 3
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
## Product Key Memory
[![PyPI version](https://badge.fury.io/py/product-key-memory.svg)](https://badge.fury.io/py/product-key-memory)
Standalone Product Key Memory module for augmenting Transformer models
## Install
```bash
$ pip install product-key-memory
```

## Usage
Replace the feedforward layers in a Transformer with the following:
```python
import torch
from product_key_memory import PKM

pkm = PKM(
dim = 512,
heads = 4,
dim_head = 128, # keep at 128 for best results
num_keys = 256, # number of subkeys; the number of values will be num_keys ** 2 (65536 here)
topk = 32 # the top number of subkeys to select
)

x = torch.randn(1, 1024, 512)
mask = torch.ones((1, 1024)).bool()
values = pkm(x, input_mask = mask) # (1, 1024, 512)
```
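Since PKM is meant to stand in for a feedforward, here is a minimal sketch of a pre-norm Transformer block built around it. The block name and attention wiring are hypothetical illustration; only the `PKM` constructor arguments and the `input_mask` keyword come from the example above.

```python
import torch
from torch import nn
from product_key_memory import PKM

# hypothetical block name; the attention wiring is illustrative and not
# part of this library - only the PKM usage follows the example above
class PKMTransformerBlock(nn.Module):
    def __init__(self, dim = 512, heads = 8):
        super().__init__()
        self.attn_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first = True)
        self.pkm_norm = nn.LayerNorm(dim)
        self.pkm = PKM(
            dim = dim,
            heads = 4,
            dim_head = 128,
            num_keys = 256,
            topk = 32
        )

    def forward(self, x, input_mask = None):
        # pre-norm self-attention with a residual connection
        normed = self.attn_norm(x)
        attn_out, _ = self.attn(normed, normed, normed)
        x = x + attn_out
        # PKM sits where the feedforward would normally go
        x = x + self.pkm(self.pkm_norm(x), input_mask = input_mask)
        return x

block = PKMTransformerBlock()
x = torch.randn(1, 1024, 512)
mask = torch.ones((1, 1024)).bool()
out = block(x, input_mask = mask) # (1, 1024, 512)
```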
## Learning Rates

To give different learning rates to the value parameters of the product-key-memory network, use the following helper function.
```python
from torch.optim import Adam
from product_key_memory import fetch_pkm_value_parameters

# this helper function, given your root model, finds all the PKM modules and their embedding bag weight parameters
pkm_parameters, other_parameters = fetch_pkm_value_parameters(model)

optim = Adam([
{'params': other_parameters},
{'params': pkm_parameters, 'lr': 1e-2}
], lr=1e-3)
```

Or, if the product-key-memory value parameters are the only parameters you want on a different learning rate, there is a second helper:
```python
from torch.optim import Adam
from product_key_memory import fetch_optimizer_parameters

# automatically creates a list of parameter groups, with the learning rate for the PKM values set at 1e-2
parameters = fetch_optimizer_parameters(model)
optim = Adam(parameters, lr=1e-3)
```
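For intuition, below is a rough sketch of the kind of partitioning these helpers perform: per the comment above, the PKM values live in embedding bag weights, so one can collect those and give them their own parameter group. `split_pkm_values` is a hypothetical name for illustration only; prefer the provided helpers in practice.

```python
from torch import nn
from torch.optim import Adam
from product_key_memory import PKM

def split_pkm_values(model):
    # collect the EmbeddingBag (value) weights found inside PKM modules
    pkm_value_params = set()
    for module in model.modules():
        if isinstance(module, PKM):
            for submodule in module.modules():
                if isinstance(submodule, nn.EmbeddingBag):
                    pkm_value_params.update(submodule.parameters())
    # everything else stays in the default parameter group
    other_params = [p for p in model.parameters() if p not in pkm_value_params]
    return list(pkm_value_params), other_params

model = nn.Sequential(
    nn.Linear(512, 512),
    PKM(dim = 512, heads = 4, dim_head = 128, num_keys = 256, topk = 32)
)

pkm_values, others = split_pkm_values(model)
optim = Adam([
    {'params': others},
    {'params': pkm_values, 'lr': 1e-2}
], lr = 1e-3)
```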
## Appreciation

Special thanks go to Aran for encouraging me to look into this, and to Madison May for his educational blog post, which helped me understand this better.
## Todo
- [x] offer stochasticity with annealed gumbel noise. seen dramatic effects in vector-quantization setting
- [x] offer a way for smaller value dimensions + concat and linear combination of heads (like multi-head attention)
- [ ] get caught up on latest literature on product key memories, if any
- [ ] instead of additive scores, try multiplicative using coordinate descent routing (a sketch of the additive scoring appears below)
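For context on the additive scoring mentioned in the last item: following the Lample et al. paper cited below, the query is split into two halves, each half is scored against its own subkey table, and a candidate's score is the sum of its two half scores. A toy, simplified illustration (single query, no heads, made-up sizes):

```python
import torch

# two subkey tables of num_keys entries each address num_keys ** 2 values;
# a candidate (i, j) is scored by the SUM of its two half scores - this is
# the "additive scores" referred to above
dim, num_keys, topk = 8, 16, 4

q = torch.randn(dim)
subkeys1 = torch.randn(num_keys, dim // 2) # first half of each product key
subkeys2 = torch.randn(num_keys, dim // 2) # second half

scores1 = subkeys1 @ q[: dim // 2] # (num_keys,)
scores2 = subkeys2 @ q[dim // 2 :] # (num_keys,)

# search each half independently ...
top1 = scores1.topk(topk)
top2 = scores2.topk(topk)

# ... then combine: only topk x topk candidate sums are formed,
# instead of scoring all num_keys ** 2 keys directly
candidates = top1.values[:, None] + top2.values[None, :]
best = candidates.flatten().topk(topk)

# recover the flat indices into the num_keys ** 2 value table
rows = best.indices // topk
cols = best.indices % topk
value_indices = top1.indices[rows] * num_keys + top2.indices[cols]
print(value_indices) # indices of the topk selected memory values
```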
## Citations

```bibtex
@misc{lample2019large,
title = {Large Memory Layers with Product Keys},
author = {Guillaume Lample and Alexandre Sablayrolles and Marc'Aurelio Ranzato and Ludovic Denoyer and Hervé Jégou},
year = {2019},
eprint = {1907.05242},
archivePrefix = {arXiv}
}
```

```bibtex
@misc{liu2020evolving,
title = {Evolving Normalization-Activation Layers},
author = {Hanxiao Liu and Andrew Brock and Karen Simonyan and Quoc V. Le},
year = {2020},
eprint = {2004.02967},
archivePrefix = {arXiv}
}
```

```bibtex
@article{Shen2023ASO,
title = {A Study on ReLU and Softmax in Transformer},
author = {Kai Shen and Junliang Guo and Xu Tan and Siliang Tang and Rui Wang and Jiang Bian},
journal = {ArXiv},
year = {2023},
volume = {abs/2302.06461},
url = {https://api.semanticscholar.org/CorpusID:256827573}
}
```

```bibtex
@article{Csordas2023ApproximatingTF,
title = {Approximating Two-Layer Feedforward Networks for Efficient Transformers},
author = {Róbert Csordás and Kazuki Irie and Jürgen Schmidhuber},
journal = {ArXiv},
year = {2023},
volume = {abs/2310.10837},
url = {https://api.semanticscholar.org/CorpusID:264172384}
}
```