{"id":19647073,"url":"https://github.com/idsia/fwp-formal-lang","last_synced_at":"2025-04-30T15:10:50.875Z","repository":{"id":203194125,"uuid":"707388600","full_name":"IDSIA/fwp-formal-lang","owner":"IDSIA","description":"Official repository for the paper \"Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions\" (EMNLP 2023)","archived":false,"fork":false,"pushed_at":"2023-10-24T14:32:12.000Z","size":46,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-11-11T14:47:59.618Z","etag":null,"topics":["fast-weight-programmers","fast-weights","formal-languages","linear-transformers","pytorch","self-referential-learning","self-referential-weight-matrix"],"latest_commit_sha":null,"homepage":"","language":"Cuda","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IDSIA.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-19T19:45:47.000Z","updated_at":"2024-07-09T14:01:23.000Z","dependencies_parsed_at":null,"dependency_job_id":"4bbb50b3-5c52-4e31-a9c5-688004030610","html_url":"https://github.com/IDSIA/fwp-formal-lang","commit_stats":null,"previous_names":["idsia/fwp-formal-lang"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDSIA%2Ffwp-formal-lang","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDSIA%2Ffwp-formal-lang/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDSI
A%2Ffwp-formal-lang/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDSIA%2Ffwp-formal-lang/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/IDSIA","download_url":"https://codeload.github.com/IDSIA/fwp-formal-lang/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":233265472,"owners_count":18650037,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fast-weight-programmers","fast-weights","formal-languages","linear-transformers","pytorch","self-referential-learning","self-referential-weight-matrix"],"created_at":"2024-11-11T14:42:16.206Z","updated_at":"2025-01-09T21:54:06.198Z","avatar_url":"https://github.com/IDSIA.png","language":"Cuda","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Formal Language Recognition with Recurrent and Self-Referential Linear Transformers\n\nThis is the official code repository for the paper:\n\n[Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions (EMNLP 2023, short paper)]() (Link coming soon)\n\nThis repository was originally forked from [IDSIA/recurrent-fwp/algorithmic](https://github.com/IDSIA/recurrent-fwp/tree/master/algorithmic), and contains code from [IDSIA/modern-srwm](https://github.com/IDSIA/modern-srwm) (SRWM implementation) and [IDSIA/rtrl-elstm](https://github.com/IDSIA/rtrl-elstm) (e-LSTM implementation).\n\nThe [`fast_transformers`](https://github.com/IDSIA/fwp-formal-lang/tree/main/fast_transformers) directory contains code taken and 
modified from [idiap/fast-transformers](https://github.com/idiap/fast-transformers/tree/master/fast_transformers/causal_product) (a specific license file is included).\n\n## Requirements\n* PyTorch. We used PyTorch `2.0.1+cu117` with Python 3.9 or 3.11\n* Ninja to compile custom CUDA kernels (`pip install ninja`)\n* Optionally: wandb for monitoring jobs (enable it by adding the `--use_wandb` flag; we did not use or need this option for our experiments)\n\n## Data\nWe obtained the datasets from [satwik77/Transformer-Formal-Languages](https://github.com/satwik77/Transformer-Formal-Languages) (for all datasets except `reset Dyck-1`, we downloaded their pre-generated datasets; for `reset Dyck-1`, we generated a dataset using their code).\n\nThe exact set of datasets we used can also be downloaded from this [Google Drive link](https://drive.google.com/file/d/1eyNGFJpw4lJEbq5HAs4SseaVAg151apR/view?usp=sharing).\n\n## Training\nA generic script to train a model is provided below.\n\n- `TORCH_EXTENSIONS_DIR` is where the compiled custom CUDA kernels will be stored (this way, you will not need to recompile them every time).\n- The script below assumes the dataset is available under `data/${task}`. The expected files are: `train_50.src`, `train_50.tgt`, `valid_50.src`, `valid_50.tgt`, `test_50.src`, and `test_50.tgt` (the `*.src`/`*.tgt` files contain the input/output sequences, respectively; \"test\" corresponds to \"valid2\"; see the examples in our files provided above, and see `main.py` if further details are needed).\n\n`modeltype` specifies the model type by its ID. 
\nThe IDs of the models used in the paper are as follows:\n* `0`: LSTM\n* `1`: (regular) Transformer\n* `2`: DeltaNet\n* `7`: Recurrent Delta\n* `8`: Linear Transformer\n* `10`: e-LSTM (called \"Quasi-LSTM\" in the code)\n* `12`: SRWM\n\nThe corresponding CUDA implementations can be found in the following directories:\n* `fast_transformers`: vanilla Linear Transformer\n* `fast_weight`: DeltaNet\n* `rec_update_fwm_tanh`: Recurrent Delta\n* `self_ref_v0`: SRWM\n\nWhen using our data files, we used `task` to specify the language to learn, which has to be consistent with the directory name:\n* `parity`\n* `aa-star`\n* `abab-star`\n* `counter-2`\n* `counter-3`\n* `shuffle-2`\n* `dyck-1`\n* `reset_dyck`\n\nOtherwise, the task is specified implicitly by the data directory passed via `--data_dir` (see below).\n\nThe `--level` flag can be ignored/removed. We initially thought about having various \"levels\" for each task, but in the end, we did not implement such an option, and we always have `level=50`.\nNevertheless, it should not be set to any value other than 50, as the dataset file names have to contain the `level` value (see file names above).\n\nOther flags/parameters should be self-explanatory (see hyper-parameter tables in the appendix of the paper).\n\n```\n#!/bin/bash\n\nexport CUDA_VISIBLE_DEVICES=0\n\nexport TORCH_EXTENSIONS_DIR=\"arbitrary_path/torch_extensions/formal_lang\"\n\ntask=\nmodeltype=\nlay=\nsize=\nhead=\nff=\nbatch=\nlr=\n\nDATA_DIR='data/'${task}\n\npython main.py \\\n  --data_dir ${DATA_DIR} \\\n  --level 50 \\\n  --model_type ${modeltype} \\\n  --num_layer ${lay} \\\n  --hidden_size ${size} \\\n  --n_head ${head} \\\n  --ff_factor ${ff} \\\n  --dropout 0.0 \\\n  --batch_size ${batch} \\\n  --learning_rate ${lr} \\\n  --seed 1 \\\n  --grad_cummulate 1 \\\n  --num_epoch 300 \\\n  --report_every 50 \\\n  --project_name \"2023--formal-lang\" \\\n  --remove_pos_enc\n```\n\n**NB:**\n* In training logs, `valid` corresponds to the validation set 
with the same distribution as the training set (\"Bin0\" in the paper),\nwhile `valid2` is the validation set with longer sequences (\"Bin1\" in the paper).\n* A training run will stop when either 100% accuracy is achieved on the `valid2` set or the maximum number of training epochs (`num_epoch`) is reached.\n* In logs, `no-op acc` should be ignored (this was originally relevant for other tasks used in [IDSIA/recurrent-fwp/algorithmic](https://github.com/IDSIA/recurrent-fwp/tree/master/algorithmic)).\n* Our final results are reported by consistently testing a single seed (`seed=1`) for each configuration described in the appendix.\nHowever, it is possible that certain good configurations still \"fail\" for some seeds; if some of the good configurations we report happen to fail in your setting (i.e., they do not achieve 100% accuracy on `valid2` using the script above), we recommend trying other seeds.\n\n## BibTex\n```\n@inproceedings{irie2023practical,\n      title={Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions}, \n      author={Kazuki Irie and R\\'obert Csord\\'as and J\\\"urgen Schmidhuber},\n      booktitle={Proc. Conf. on Empirical Methods in Natural Language Processing (EMNLP)},\n      address={Sentosa, Singapore},\n      year={2023}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fidsia%2Ffwp-formal-lang","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fidsia%2Ffwp-formal-lang","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fidsia%2Ffwp-formal-lang/lists"}