{"id":19647063,"url":"https://github.com/idsia/lmtool-fwp","last_synced_at":"2025-04-28T15:31:14.927Z","repository":{"id":39338558,"uuid":"341226593","full_name":"IDSIA/lmtool-fwp","owner":"IDSIA","description":"PyTorch Language Modeling Toolkit for Fast Weight Programmers","archived":false,"fork":false,"pushed_at":"2023-08-24T14:12:12.000Z","size":188,"stargazers_count":17,"open_issues_count":0,"forks_count":4,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-05T09:23:13.762Z","etag":null,"topics":["fast-weight-programmers","fast-weights","language-model","pytorch","transformers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IDSIA.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-02-22T14:25:19.000Z","updated_at":"2025-03-18T14:53:30.000Z","dependencies_parsed_at":"2024-11-11T14:53:03.383Z","dependency_job_id":null,"html_url":"https://github.com/IDSIA/lmtool-fwp","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDSIA%2Flmtool-fwp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDSIA%2Flmtool-fwp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDSIA%2Flmtool-fwp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDSIA%2Flmtool-fwp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/IDSIA","download_
url":"https://codeload.github.com/IDSIA/lmtool-fwp/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251338612,"owners_count":21573584,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fast-weight-programmers","fast-weights","language-model","pytorch","transformers"],"created_at":"2024-11-11T14:42:14.107Z","updated_at":"2025-04-28T15:31:14.084Z","avatar_url":"https://github.com/IDSIA.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## PyTorch Language Modeling Toolkit (for Fast Weight Programmers)\n\nThis repository contains the official code used for language modeling experiments in the paper(s):\n* [Linear Transformers are Secretly Fast Weight Programmers (ICML 2021)](https://arxiv.org/abs/2102.11174)\n* [Going Beyond Linear Transformers with Recurrent Fast Weight Programmers](https://arxiv.org/abs/2106.06295)\n* ...\n\nMore generally, this can be used as a language modeling toolkit in PyTorch to experiment with:\n* [Standard Transformers](https://arxiv.org/abs/1808.04444)\n* [Transformer-XL](https://arxiv.org/abs/1901.02860)\n* **Fast Weight Programmers** with different **update rules** and **linear attention functions**:\n    * Update rules: \"sum\" and our \"delta\" rule (as proposed in our paper; Sec 4.2)\n    * Linear attention functions: \"ELU-based\" linear attention, \"FAVOR+\", \"deterministic parameter-free projection (DPFP)\"\n    \n    e.g. 
some combinations result in well-known models:\n    * [Linear Transformers](https://arxiv.org/abs/2006.16236) = \"sum\" update rule + \"ELU-based\" linear attention\n    * [Performers](https://arxiv.org/abs/2009.14794) = \"sum\" update rule + \"FAVOR+\"\n\n## Fast Weight Implementations\nThis repository contains two implementations of fast weights.\n* Custom CUDA kernel (see [utils/fast_fast_weight](https://github.com/IDSIA/lmtool-fwms/tree/master/src/utils/fast_fast_weight) and [utils/cuda_fast_weight_layer.py](https://github.com/IDSIA/lmtool-fwms/blob/master/src/utils/cuda_fast_weight_layer.py))\n* Custom `torch.autograd.Function` (see [utils/fast_weight.py](https://github.com/IDSIA/lmtool-fwms/blob/master/src/utils/fast_weight.py))\n\nWhile we used only the CUDA implementation for all our final experiments (faster, with much better GPU utilization),\nthe `torch.autograd.Function` version can be useful for quick prototyping of new extensions.\n\n## Requirements\nThis toolkit requires PyTorch `torch` and Ninja `ninja` (to compile the CUDA kernels).\n\nThe experiments for the paper were conducted with Python 3.6 and PyTorch 1.4.0 (note on Aug 24, 2023: the code also works with Python 3.11 and PyTorch 2.0.1+cu117).\n\nMore recent versions of PyTorch are not yet well supported by this toolkit, which still uses `torch.nn.DataParallel` for multi-GPU training.\nIf you really need to use a more recent version of PyTorch, check the [documentation](https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html)\nto use `torch.nn.parallel.DistributedDataParallel` instead. We will hopefully fix this soon, but we cannot tell exactly when.\n\nThe toolkit supports [Weights \u0026 Biases](https://docs.wandb.ai/) for monitoring jobs. 
If you use it, also install `wandb`.\n\n## Acknowledgements\nThis repository contains many lines of code taken and adapted from the following sources:\n* This repository was originally forked from the official implementation of Transformer-XL [kimiyoung/transformer-xl](https://github.com/kimiyoung/transformer-xl).\nThe code for Transformer-XL and standard Transformer models, as well as basic functionality needed for language modeling\n(including adaptive input and output embeddings) and data preparation (WikiText-103, enwik8, ...) is from the corresponding repository.\n* For Performers, helper functions from [lucidrains/performer-pytorch](https://github.com/lucidrains/performer-pytorch) are used.\n* For CUDA implementations of our fast weight programmers with the delta rule:\n    * Code from [idiap/fast-transformers](https://github.com/idiap/fast-transformers/tree/master/fast_transformers/causal_product) is used with minor changes for the sum update rule.\n    * We modified it to implement our update rule.\nSee comments in code for exact locations and modifications.\n\n## General Instructions\n\nPlease check files under `example_scripts` for general instructions and examples to train and evaluate models. \n\n\n## BibTeX\n\n```\n@inproceedings{schlag2021linear,\n      title={Linear Transformers Are Secretly Fast Weight Programmers}, \n      author={Imanol Schlag and Kazuki Irie and J\\\"urgen Schmidhuber},\n      booktitle={Proc. Int. Conf. 
on Machine Learning (ICML)},\n      address = {Virtual only},\n      month = jul,\n      year={2021}\n}\n```\n\n```\n@article{irie2021going,\n      title={Going Beyond Linear Transformers with Recurrent Fast Weight Programmers}, \n      author={Kazuki Irie and Imanol Schlag and R\\'obert Csord\\'as and J\\\"urgen Schmidhuber},\n      journal={Preprint arXiv:2106.06295},\n      year={2021}\n}\n```\n\n## Links\n* The code for synthetic retrieval experiments in the paper [\"Linear Transformers are Secretly Fast Weight Programmers\" (ICML 2021)](https://arxiv.org/abs/2102.11174) can be found at [ischlag/fast-weight-transformers](https://github.com/ischlag/fast-weight-transformers).\n* The full repository for the paper \"Going Beyond Linear Transformers with Recurrent Fast Weight Programmers\" can be found at: [IDSIA/recurrent-fwp](https://github.com/IDSIA/recurrent-fwp)\n* [Jürgen Schmidhuber's AI blog post on Fast Weight Programmers (March 26, 2021)](https://people.idsia.ch/~juergen/fast-weight-programmer-1991-transformer.html).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fidsia%2Flmtool-fwp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fidsia%2Flmtool-fwp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fidsia%2Flmtool-fwp/lists"}