{"id":22703603,"url":"https://github.com/myscience/x-lstm","last_synced_at":"2025-08-07T10:31:31.848Z","repository":{"id":238996201,"uuid":"797821032","full_name":"myscience/x-lstm","owner":"myscience","description":"Pytorch implementation of the xLSTM model by Beck et al. (2024)","archived":false,"fork":false,"pushed_at":"2024-08-12T10:06:01.000Z","size":124,"stargazers_count":95,"open_issues_count":2,"forks_count":10,"subscribers_count":9,"default_branch":"main","last_synced_at":"2024-08-12T11:29:00.334Z","etag":null,"topics":["lightning","llm","lstm","pytorch","pytorch-implementation","xlstm"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/myscience.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-08T15:26:18.000Z","updated_at":"2024-08-12T10:06:04.000Z","dependencies_parsed_at":"2024-05-11T13:30:22.505Z","dependency_job_id":"71ca2170-9888-40fd-afa7-7098336d724b","html_url":"https://github.com/myscience/x-lstm","commit_stats":null,"previous_names":["myscience/x-lstm"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/myscience%2Fx-lstm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/myscience%2Fx-lstm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/myscience%2Fx-lstm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/myscience%2Fx-lstm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/owners/myscience","download_url":"https://codeload.github.com/myscience/x-lstm/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":229026412,"owners_count":18008325,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["lightning","llm","lstm","pytorch","pytorch-implementation","xlstm"],"created_at":"2024-12-10T08:12:00.556Z","updated_at":"2024-12-10T08:12:02.199Z","avatar_url":"https://github.com/myscience.png","language":"Python","funding_links":[],"categories":["Implementations (Most of them are unofficial and under development)"],"sub_categories":[],"readme":"# xLSTM in Easy Pytorch\n\nThis repo contains an _unofficial_ implementation of the `xLSTM` model introduced in [Beck et al. (2024)](https://arxiv.org/abs/2405.04517). It is developed mainly for didactic purposes, to spell out the details of a modern `Long-Short Term Memory` whose performance is competitive with modern `Transformers` or `State-Space` models (e.g. `Mamba`).\n\nJust for fun, this repo tries to implement a basic LLM (see `📂 xlstm.llm`) using [Lightning](https://lightning.ai/docs/pytorch/stable/), so that multi-GPU training (should) be just one variable away.\n\n# Results\n\nJust for fun, I set out to train a small `xLSTM` LLM on the cute `TinyStories` dataset and logged its progress as it learned (I always find it amusing to read the incoherent first attempts, and was actually surprised by how quickly it got the general structure). 
Here is what I get for the highly original `Once upon a time` prompt:\n\n**At initialization**\n\n```text\nOnce upon a timeboro wit carryingabellaastered Greens intestinal Pil su128 configure Patentrowing SeventhNohs implies Burger ® Cities lowacommTYelligimilationbender Manual authored Comprehensivelow fightingrinasq intercourse377 gradientafe bluntlyaroo coats Witchhiba Jeff Flags ambassadors iT deleted Deals reassCruzka...(you get the idea)\n```\n\n**After 320 steps**\n\n```text\nOnce upon a time. She and took them. He is and they with something. She asked, a big dog on the park. Lily went to the park, ''That wanted it is not she is\nverv hanov into the around's mom man was a lot him to the \"Thank\nhe couldn't sad and. He is a time. \"What and not to go be careful. She was that the little girl, I will. Then it?''' Tom things. He took it they saw a bia.\"\n```\n\n![Validation Loss](res/loss.jpeg)\n\n**After 20K steps**\n\n```text\nOnce upon a time. Jack and ran across the hill. When she always a bit embarrassed and felt so much to play!\" And they couldn't know what you should always made of the park.\" One day she wanted to help make some new friends.\"\n\"The boy was so happy to a time.\n\"Lily's help. He was very sorry, there. Then, and it looked at how he saw the ball. When she was happy and had so excited to buy the ground. He used to fly was very happy and daddy was so excited and the car. Timmy went to go home.\"\n```\n\n# Usage\n\nThe `xlstm` module exposes both the `sLSTM` (scalar-LSTM) and the `mLSTM` (matrix-LSTM) modules. Both expect their input to have shape `(batch_size, d_input)`, since they consume an input sequence one step at a time. They output the model's current (projected) hidden state `h_t` (which is considered the module output and has the same shape as the input, see Figure 9 in the Appendix of [Beck et al. 
(2024)](https://arxiv.org/abs/2405.04517)), plus their updated hidden variables (a tuple of tensors).\n\n```python\nimport torch\n\nfrom xlstm import sLSTM\nfrom itertools import pairwise\n\nseq_len = 32\nbatch_size = 4\n\ninp_dim = 16\nhead_dim = 8\nhead_num = 4\n\n# Create a mock up input sequence\nseq = torch.randn(seq_len, batch_size, inp_dim)\n\nlstm = sLSTM(\n    inp_dim,        # Input sequence dimension\n    head_dim,       # Dimension of each head\n    head_num,       # Number of heads\n    p_factor=4/3,   # Tunable expansion factor\n)\n\n# Initialize the hidden states\nhid = lstm.init_hidden(batch_size)\n\ncriterion = ... # Pick some loss function, e.g. MSE\n\n# Iterate over consecutive pairs of time steps\nloss = 0\nfor prev, succ in pairwise(seq):\n    # Get the model prediction plus the updated hidden states\n    pred, hid = lstm(prev, hid)\n\n    # Target is the next sequence token\n    loss += criterion(pred, succ)\n\n# Compute gradients\nloss.backward()\n```\n\nThis repo also provides an implementation of an `xLSTM` LLM (simply a stack of `sLSTM` and `mLSTM` modules plus a prediction head) built using `Pytorch Lightning`, which unlocks easy training on multiple GPUs. To use it, one can simply run the following example:\n\n```python\nfrom lightning import Trainer\nfrom transformers import AutoTokenizer\n\nfrom xlstm import xLSTM\nfrom xlstm.stories import TinyStoriesLightning\n\nconfig = ... 
# path to YAML configuration file\n\n# Load an off-the-shelf tokenizer from HF\ntokenizer = AutoTokenizer.from_pretrained('openai-community/gpt2')\n\n# Load the xLSTM model from a config file\nmodel = xLSTM.from_config(config, key='llm')\n\n# Load the dataset\ndataset = TinyStoriesLightning.from_config(\n    config,\n    tokenizer,\n    key='dataset'\n)\n\ntrainer = Trainer(\n  max_epochs  = 500,\n  accelerator = 'gpu',\n  devices     = 4, # Piece of cake multi-gpu support!\n  strategy    = 'ddp_find_unused_parameters_false',\n)\n\n# Train the model\ntrainer.fit(model, dataset)\n```\n\nAlternatively, one can run the training script `run.py` directly. It leverages the `LightningCLI` API, which offers great flexibility for customization. The script expects a configuration file path (see the example configuration file in `📂 config/llm.yaml`) and accepts all the Trainer arguments (and more! See [LightningCLI](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.cli.LightningCLI.html#lightning.pytorch.cli.LightningCLI) for reference).\n\n```bash\npython run.py fit --config config/llm.yaml\n```\n\nA cool feature of the current `xLSTM` implementation is lazy (batched) inference, implemented via a generator. One can thus print tokens on screen as they are streamed by the model, with no need to wait for the whole inference to finish! 
A mock-up script would look like the following.\n\n```python\nfrom xlstm import xLSTM\nfrom transformers import AutoTokenizer\n\n# Get an off-the-shelf tokenizer\ntokenizer = AutoTokenizer.from_pretrained('openai-community/gpt2')\n\ntokenizer.add_special_tokens({'pad_token': '\u003c|pad|\u003e'})\n\n# Parameters for the LLM\nvocab_size = tokenizer.vocab_size + 1\nnum_layers = 8\nsignature = (7, 1)\ninp_dim = 16\nhead_dim = 8\nhead_num = 4\nker_size = 4\np_factor = (2, 4/3)\n\n# Plain script-level variables: there is no `self` outside a class\nmodel = xLSTM(\n    vocab_size = vocab_size,\n    num_layers = num_layers,\n    signature = signature,\n    inp_dim = inp_dim,\n    head_dim = head_dim,\n    head_num = head_num,\n    p_factor = p_factor,\n    ker_size = ker_size,\n)\n\n# Parameters for the inference\ntoken_lim = 16\nuse_top_k = 50\ntemperature = 0.7\n\n# Generate text\nstream = model.generate(\n  # We can provide more than one prompt!\n  prompt=[\n      'Once upon a time',\n      'In a galaxy far far away',\n  ],\n  tokenizer=tokenizer,\n  token_lim=token_lim,\n  use_top_k=use_top_k,\n  temperature=temperature,\n)\n\nfor token in stream:\n    # Each token is a dictionary indexed by the\n    # batch-id and contains the produced string\n    # as value, so we can print the first batch as:\n    print(token[0], end='')\n```\n\n# Roadmap\n\n- [x] Put all the essential pieces together (i.e. `sLSTM` \u0026 `mLSTM`)\n- [x] Add implementation for a full `xLSTM`\n- [x] Add functioning training script (Lightning)\n- [x] Show some results\n\n# Requirements\n\nCode was tested with Python 3.11+. 
To install the required dependencies simply run `pip install -r requirements.txt`.\n\n```\ntorch==2.3.0\nPyYAML==6.0.1\neinops==0.8.0\nlightning==2.2.4\nsetuptools==69.5.1\ntransformers==4.40.2\n```\n\n# Citations\n\n```bibtex\n@article{beck2024xlstm,\n  title={xLSTM: Extended Long Short-Term Memory},\n  author={Beck, Maximilian and P{\\\"o}ppel, Korbinian and Spanring, Markus and Auer, Andreas and Prudnikova, Oleksandra and Kopp, Michael and Klambauer, G{\\\"u}nter and Brandstetter, Johannes and Hochreiter, Sepp},\n  journal={arXiv preprint arXiv:2405.04517},\n  year={2024}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmyscience%2Fx-lstm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmyscience%2Fx-lstm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmyscience%2Fx-lstm/lists"}