{"id":13444585,"url":"https://github.com/OpenNLPLab/HGRN2","last_synced_at":"2025-03-20T19:30:33.867Z","repository":{"id":232759780,"uuid":"783228142","full_name":"OpenNLPLab/HGRN2","owner":"OpenNLPLab","description":"HGRN2: Gated Linear RNNs with State Expansion","archived":false,"fork":false,"pushed_at":"2024-08-20T07:47:14.000Z","size":670,"stargazers_count":48,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-10-28T08:41:44.358Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenNLPLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-07T09:47:42.000Z","updated_at":"2024-09-30T23:27:41.000Z","dependencies_parsed_at":null,"dependency_job_id":"71f093b4-5cc4-496f-9f41-85eefc5c419d","html_url":"https://github.com/OpenNLPLab/HGRN2","commit_stats":null,"previous_names":["opennlplab/hgrn2"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenNLPLab%2FHGRN2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenNLPLab%2FHGRN2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenNLPLab%2FHGRN2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenNLPLab%2FHGRN2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenNLPLab","download_url":"https://codeload.github.com/OpenNLPLab/HGRN2/tar.gz/refs/he
ads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244676454,"owners_count":20491828,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T04:00:31.248Z","updated_at":"2025-03-20T19:30:32.868Z","avatar_url":"https://github.com/OpenNLPLab.png","language":"Python","funding_links":[],"categories":["On the replacement of transformer/attention by SSMs"],"sub_categories":[],"readme":"\n# HGRN2\n\n**20240820 Update**: HGRN2 has been accepted by COLM 2024.\n\nOfficial implementation of `HGRN2: Gated Linear RNNs with State Expansion`. This repo does not contain the implementation code itself, only scripts and instructions for reproducing the results of the paper. 
The table of contents is as follows:\n\n- [HGRN2](#hgrn2)\n  - [Main result](#main-result)\n  - [Overall Architecture](#overall-architecture)\n  - [Algorithm](#algorithm)\n  - [Standalone code](#standalone-code)\n  - [Experiments](#experiments)\n    - [Environment Preparation](#environment-preparation)\n      - [Env1(for lra experiments)](#env1for-lra-experiments)\n      - [Env2(for imagenet)](#env2for-imagenet)\n      - [Env3(for wikitext)](#env3for-wikitext)\n      - [Env4(for mqar)](#env4for-mqar)\n    - [Wikitext-103](#wikitext-103)\n      - [1) Download the data](#1-download-the-data)\n      - [2) Train the autoregressive language model](#2-train-the-autoregressive-language-model)\n    - [Image modeling](#image-modeling)\n    - [LRA](#lra)\n      - [1) Preparation](#1-preparation)\n      - [2) Training](#2-training)\n    - [Mqar](#mqar)\n  - [Citation](#citation)\n\n\n\n## Main result\n\nWe list the main experimental results in the tables below; for the complete experimental results, please refer to the paper.\n\n![lra](./figures/lra.png)\n\n![llm](./figures/llm.png)\n\n\n## Overall Architecture\n\nThe overall network architecture is as follows:\n\n\u003cdiv  align=\"center\"\u003e \u003cimg src=\"./figures/hgrn2.png\" width = \"100%\" height = \"100%\" alt=\"network\" align=center /\u003e\u003c/div\u003e\n\n## Algorithm\n\nHGRN2 is a very simple modification of HGRN1. The recurrence becomes:\n\n$$\n\\begin{equation}\n\\begin{aligned}\n    \\mathbf h_t \u0026= \\mathrm{Diag}\\{\\mathbf f_t\\} \\cdot \\mathbf h_{t-1} + (1 - \\mathbf f_t) \\otimes \\mathbf i_t \\in \\mathbb R^{d\\times d}, \\\\\n    \\mathbf y_t \u0026= \\mathbf o_t \\cdot \\mathbf h_t \\in \\mathbb R^{1 \\times d},\n\\end{aligned}\n\\end{equation}\n$$\n\nwhere $\\otimes$ denotes the outer product and $\\cdot$ denotes the inner product.\n\nKey insights:\n\n1. Expanding the memory is quite important.\n2. The outer product is a parameter-efficient way to expand it.\n3. 
The recurrence moves from a linear RNN to linear attention: the output gate plays the role of Q, (1 - forget gate) plays the role of K, and the input state plays the role of V.\n4. Unlike GLA/Mamba, no extra parameters are needed to represent the forget gate.\n\n## Standalone code\nSee [hgru2-pytorch](https://github.com/Doraemonzzz/hgru2-pytorch). To reproduce the experimental results, please use the `reproduce` branch.\nAnother implementation is available in [fla](https://github.com/sustcsonglin/flash-linear-attention/tree/main/fla/models/hgrn2), thanks to [yzhangcs](https://github.com/yzhangcs).\n\n## Experiments\n\n### Environment Preparation\n\nOur experiments use several conda environments.\n\n#### Env1(for lra experiments)\n\nFirst build the conda environment based on the yaml file:\n\n```\nconda env create --file lra.yaml\n```\n\nThen install `hgru2-pytorch`:\n```\nconda activate lra\ngit clone https://github.com/Doraemonzzz/hgru2-pytorch\ncd hgru2-pytorch\npip install .\n```\n\n#### Env2(for imagenet)\n\nBuild the conda environment based on the yaml file:\n\n```\nconda env create --file im.yaml\n```\n\nThen install `hgru2-pytorch`:\n```\nconda activate im\ngit clone https://github.com/Doraemonzzz/hgru2-pytorch\ncd hgru2-pytorch\npip install .\n```\n\n#### Env3(for wikitext)\nFor the wikitext-103 experiment, the main dependency versions are:\n```\ntorch==2.0.1\ntriton==2.0.0\ntriton-nightly==2.1.0.dev20230728172942\n```\nAfter setting up the basic environment, you need to use our version of fairseq:\n```\ngit clone https://github.com/OpenNLPLab/fairseq-evo.git\ncd fairseq-evo\npip install -e .\n```\n\n#### Env4(for mqar)\nFor the mqar experiment, the main dependency versions are:\n```\ntorch==2.0.1\ntriton==2.1.0\n```\nAfter setting up the basic environment, you also need fla:\n```\ngit clone https://github.com/sustcsonglin/flash-linear-attention\ncd flash-linear-attention\npip install -e .\n```\n\n\n### Wikitext-103\n\n#### 1) 
Download the data\n\nFirst download the wikitext-103 dataset:\n```\ngit clone https://huggingface.co/datasets/OpenNLPLab/wikitext-103\n```\n\n\n#### 2) Train the autoregressive language model\n\nUse the following command to train the language model:\n\n```\nbash script_lm.sh arch num_gpus data_dir\n```\nwhere `arch` is chosen from\n```\nhgrn2_lm_expand2\nhgrn2_lm_outproduct_1\nhgrn2_lm_outproduct_2\nhgrn2_lm_outproduct_4\nhgrn2_lm_outproduct_8\nhgrn2_lm_outproduct_16\nhgrn2_lm_outproduct_32\nhgrn2_lm_outproduct_64\nhgrn2_lm_outproduct_128\n```\n`num_gpus` is the number of GPUs and `data_dir` is the path to the wikitext-103 data.\n\n\n### Image modeling\n\nFirst clone the following codebase:\n```\ngit clone https://github.com/OpenNLPLab/im.git\n```\nThen change `PROG` and `DATA` in `script_im.sh`, and finally run the following script:\n```\npython run_im.py\n```\n\n\n### LRA\n\n#### 1) Preparation\n\nDownload the raw data:\n```\nwget https://storage.googleapis.com/long-range-arena/lra_release.gz\nmv lra_release.gz lra_release.tar.gz\ntar -xvf lra_release.tar.gz\n```\nOr download the preprocessed data:\n```\ngit clone https://huggingface.co/datasets/OpenNLPLab/lra\n```\n\nClone the following repo and check out the `release_torch2` branch:\n\n```\ngit clone https://github.com/OpenNLPLab/lra.git\ncd lra\ngit checkout release_torch2\n```\n\nChange `DATA_PATH` and `program_path` in `script_lra_others.sh` and `script_lra_image`.\n\n\n#### 2) Training\n\nUse the following script to run the experiments:\n\n```\npython run_lra.py\n```\n\n### Mqar\nFirst change `code_dir` and `cache_dir` in `script_mqar.sh`, then run the following script:\n```\nbash script_mqar.sh\n```\n\n\n## Citation\n\nIf you find our repository or paper valuable, please cite it using the following BibTeX.\n\n```\n@misc{2404.07904,\nAuthor = {Zhen Qin and Songlin Yang and Weixuan Sun and Xuyang Shen and Dong Li and Weigao Sun and Yiran Zhong},\nTitle = {HGRN2: Gated Linear RNNs with State Expansion},\nYear = {2024},\nEprint = 
{arXiv:2404.07904},\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOpenNLPLab%2FHGRN2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FOpenNLPLab%2FHGRN2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOpenNLPLab%2FHGRN2/lists"}