{"id":15601056,"url":"https://github.com/lucidrains/product-key-memory","last_synced_at":"2025-07-12T14:35:06.833Z","repository":{"id":62578105,"uuid":"270122105","full_name":"lucidrains/product-key-memory","owner":"lucidrains","description":"Standalone Product Key Memory module in Pytorch - for augmenting Transformer models","archived":false,"fork":false,"pushed_at":"2024-07-30T14:36:33.000Z","size":35788,"stargazers_count":80,"open_issues_count":0,"forks_count":5,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-06-25T08:39:50.720Z","etag":null,"topics":["artificial-intelligence","deep-learning","pytorch","transformers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lucidrains.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-06-06T22:28:54.000Z","updated_at":"2025-06-06T06:25:46.000Z","dependencies_parsed_at":"2024-10-23T05:35:05.417Z","dependency_job_id":null,"html_url":"https://github.com/lucidrains/product-key-memory","commit_stats":{"total_commits":49,"total_committers":1,"mean_commits":49.0,"dds":0.0,"last_synced_commit":"e96826c8608a8f0fc64a6e5ea41d5ac1b406ab5c"},"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"purl":"pkg:github/lucidrains/product-key-memory","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Fproduct-key-memory","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Fproduct-key-memory/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Fproduct-key-memory/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Fproduct-key-memory/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lucidrains","download_url":"https://codeload.github.com/lucidrains/product-key-memory/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Fproduct-key-memory/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265003936,"owners_count":23696326,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","deep-learning","pytorch","transformers"],"created_at":"2024-10-03T02:13:19.599Z","updated_at":"2025-07-12T14:35:06.799Z","avatar_url":"https://github.com/lucidrains.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg src=\"./pkm.png\" width=\"400px\"\u003e\u003c/img\u003e\n\n## Product Key Memory\n\n[![PyPI version](https://badge.fury.io/py/product-key-memory.svg)](https://badge.fury.io/py/product-key-memory)\n\nStandalone \u003ca href=\"https://arxiv.org/abs/1907.05242\"\u003eProduct Key Memory\u003c/a\u003e module for augmenting Transformer models\n\n## Install\n\n```bash\n$ pip install product-key-memory\n```\n\n## Usage\n\nReplace the feedforwards in a Transformer with the following\n\n```python\nimport torch\nfrom product_key_memory import PKM\n\npkm = PKM(\n    dim = 512,\n    heads = 4,\n    dim_head = 128,       # keep at 128 for best results\n    num_keys = 256,       # number of subkeys, # values will be num_keys ^ 2\n    topk = 32             # the top number of subkeys to select\n)\n\nx = torch.randn(1, 1024, 512)\nmask = torch.ones((1, 1024)).bool()\nvalues = pkm(x, input_mask = mask) # (1, 1024, 512)\n```\n\n## Learning Rates\n\nTo give different learning rates to the value parameters of the product-key-memory network, use the following helper function.\n\n```python\nfrom torch.optim import Adam\nfrom product_key_memory import fetch_pkm_value_parameters\n\n# this helper function, for your root model, finds all the PKM models and the embedding bag weight parameters\npkm_parameters, other_parameters = fetch_pkm_value_parameters(model)\n\noptim = Adam([\n    {'params': other_parameters},\n    {'params': pkm_parameters, 'lr': 1e-2}\n], lr=1e-3)\n```\n\nOr, if product-key-memory parameters are the only other parameters you have a different learning rate for\n\n```python\nfrom torch.optim import Adam\nfrom product_key_memory import fetch_optimizer_parameters\n\nparameters = fetch_optimizer_parameters(model) # automatically creates array of parameter settings with learning rate set at 1e-2 for pkm values\noptim = Adam(parameters, lr=1e-3)\n```\n\n## Appreciation\n\nSpecial thanks go to \u003ca href=\"https://github.com/AranKomat\"\u003eAran\u003c/a\u003e for encouraging me to look into this, and to \u003ca href=\"https://github.com/madisonmay\"\u003eMadison May\u003c/a\u003e for his \u003ca href=\"https://www.pragmatic.ml/large-memory-layers-with-product-keys/\"\u003eeducational blog post\u003c/a\u003e, which helped me understand this better.\n\n## Todo\n\n- [x] offer stochasticity with annealed gumbel noise. seen dramatic effects in vector-quantization setting\n- [x] offer a way for smaller value dimensions + concat and linear combination of heads (like multi-head attention)\n\n- [ ] get caught up on latest literature on product key memories, if any\n- [ ] instead of additive scores, try multiplicative using coordinate descent routing\n\n## Citations\n\n```bibtex\n@misc{lample2019large,\n    title   = {Large Memory Layers with Product Keys},\n    author  = {Guillaume Lample and Alexandre Sablayrolles and Marc'Aurelio Ranzato and Ludovic Denoyer and Hervé Jégou},\n    year    = {2019},\n    eprint  = {1907.05242},\n    archivePrefix = {arXiv}\n}\n```\n\n```bibtex\n@misc{liu2020evolving,\n    title   = {Evolving Normalization-Activation Layers},\n    author  = {Hanxiao Liu and Andrew Brock and Karen Simonyan and Quoc V. Le},\n    year    = {2020},\n    eprint  = {2004.02967},\n    archivePrefix = {arXiv}\n}\n```\n\n```bibtex\n@article{Shen2023ASO,\n    title   = {A Study on ReLU and Softmax in Transformer},\n    author  = {Kai Shen and Junliang Guo and Xuejiao Tan and Siliang Tang and Rui Wang and Jiang Bian},\n    journal = {ArXiv},\n    year    = {2023},\n    volume  = {abs/2302.06461},\n    url     = {https://api.semanticscholar.org/CorpusID:256827573}\n}\n```\n\n```bibtex\n@article{Csordas2023ApproximatingTF,\n    title   = {Approximating Two-Layer Feedforward Networks for Efficient Transformers},\n    author  = {R'obert Csord'as and Kazuki Irie and J{\\\"u}rgen Schmidhuber},\n    journal = {ArXiv},\n    year    = {2023},\n    volume  = {abs/2310.10837},\n    url     = {https://api.semanticscholar.org/CorpusID:264172384}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucidrains%2Fproduct-key-memory","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flucidrains%2Fproduct-key-memory","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucidrains%2Fproduct-key-memory/lists"}