{"id":13444637,"url":"https://github.com/OpenNLPLab/HGRN","last_synced_at":"2025-03-20T19:30:37.901Z","repository":{"id":197515164,"uuid":"698798052","full_name":"OpenNLPLab/HGRN","owner":"OpenNLPLab","description":"[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Sequence Modeling","archived":false,"fork":false,"pushed_at":"2024-04-24T07:10:51.000Z","size":424,"stargazers_count":61,"open_issues_count":0,"forks_count":4,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-10-28T08:41:55.844Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenNLPLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-01T02:12:37.000Z","updated_at":"2024-10-09T17:11:00.000Z","dependencies_parsed_at":"2024-10-28T06:52:16.920Z","dependency_job_id":"12d61daa-3875-4b36-82ee-7c4087a0cef4","html_url":"https://github.com/OpenNLPLab/HGRN","commit_stats":null,"previous_names":["opennlplab/hgrn"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenNLPLab%2FHGRN","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenNLPLab%2FHGRN/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenNLPLab%2FHGRN/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenNLPLab%2FHGRN/manifests","owner_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/owners/OpenNLPLab","download_url":"https://codeload.github.com/OpenNLPLab/HGRN/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244676454,"owners_count":20491828,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T04:00:32.699Z","updated_at":"2025-03-20T19:30:37.293Z","avatar_url":"https://github.com/OpenNLPLab.png","language":"Python","funding_links":[],"categories":["NeurIPS 2023"],"sub_categories":[],"readme":"\n\u003ch1 align=\"center\"\u003e\n  HGRN - Hierarchically Gated Recurrent Neural Network for Sequence Modeling\n\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n🤗 \u003ca href=\"https://huggingface.co/OpenNLPLab/HGRN-150M\" target=\"_blank\"\u003eHF-150M\u003c/a\u003e •\n🤗 \u003ca href=\"https://huggingface.co/OpenNLPLab/HGRN-355M\" target=\"_blank\"\u003eHF-355M\u003c/a\u003e •\n🤗 \u003ca href=\"https://huggingface.co/OpenNLPLab/HGRN-1B\" target=\"_blank\"\u003eHF-1B\u003c/a\u003e \n\u003c/p\u003e\n\n\nOfficial implementation of Hierarchically Gated Recurrent Neural Network for Sequence Modeling. This repo does not contain the model code itself, only scripts and instructions for reproducing the results of the paper. 
The contents of this README are as follows:\n\n- [Overall Architecture](#overall-architecture)\n- [Algorithm](#algorithm)\n- [Experiments](#experiments)\n  - [Environment Preparation](#environment-preparation)\n    - [Env1](#env1)\n    - [Env2](#env2)\n  - [Autoregressive language model](#autoregressive-language-model)\n    - [1) Preprocess the data](#1-preprocess-the-data)\n    - [2) Train the autoregressive language model](#2-train-the-autoregressive-language-model)\n  - [Image modeling](#image-modeling)\n  - [LRA](#lra)\n    - [1) Preparation](#1-preparation)\n    - [2) Training](#2-training)\n- [Standalone code](#standalone-code)\n\n\n## Overall Architecture\n\nThe overall network architecture is as follows:\n\n\u003cdiv  align=\"center\"\u003e \u003cimg src=\"./hgrn.png\" width = \"100%\" height = \"100%\" alt=\"network\" align=center /\u003e\u003c/div\u003e\n\n## Algorithm\nThe input is $\\mathbf{x}_t \\in \\mathbb R^{d}$, where $d$ is the hidden dimension. First we compute the hidden states:\n\n```math\n\\begin{aligned}\n\\mathrm{Re}\\left(\\mathbf{c}_t\\right) = \\mathrm{SiLU} \\left(\\mathbf{x}_t \\mathbf{W}_{c r} + \\mathbf{b}_{c r}\\right) \\in \\mathbb{R}^{1 \\times d}, \\\\\n\\mathrm{Im}\\left(\\mathbf{c}_t\\right) = \\mathrm{SiLU} \\left(\\mathbf{x}_t \\mathbf{W}_{c i} + \\mathbf{b}_{c i}\\right) \\in \\mathbb{R}^{1 \\times d}.\n\\end{aligned}\n```\n\nThen we compute the layer-dependent lower bound as follows:\n\n```math\n\\begin{aligned}\n\\mathbf{P} \u0026 =\\mathrm{Softmax}(\\boldsymbol{\\Gamma}, \\mathrm{dim}=0) \\in \\mathbb{R}^{H \\times d}
,\\\\\n\\gamma^k \u0026 =[\\mathrm{Cumsum}(\\mathbf{P}, \\mathrm{dim}=0)]_k \\in \\mathbb{R}^{1 \\times d},\n\\end{aligned}\n```\n\nwhere $H$ is the number of layers and \n\n```math\n[\\mathrm{Cumsum}(\\mathbf{x})]_k=\\sum_{i=1}^{k-1} x_i.\n```\n\nWe use this lower bound to compute the forget gate:\n\n```math\n\\begin{aligned}\n\u0026 \\mu_t=\\mathrm{Sigmoid}\\left(\\mathbf{x}_t \\mathbf{W}_\\mu+\\mathbf{b}_\\mu\\right) \\in \\mathbb{R}^{1 \\times d}, \\\\\n\u0026 \\lambda_t=\\gamma^k+\\left(1-\\gamma^k\\right) \\odot \\mu_t \\in \\mathbb{R}^{1 \\times d}.\n\\end{aligned}\n```\n\nThe full recurrence (HRU) is as follows:\n\n```math\n\\mathbf{h}_t=\\lambda_t \\exp (i \\theta) \\cdot \\mathbf{h}_{t-1}+\\left(1-\\lambda_t\\right) \\cdot \\mathbf{c}_t \\in \\mathbb{C}^{1 \\times d}.\n```\n\nCombining $\\mathbf h_t$ with the output gate and projection, we get the final result:\n\n```math\n\\begin{aligned}\n\u0026 \\mathbf{g}_t=\\tau\\left(\\mathbf{x}_t \\mathbf{W}_g+\\mathbf{b}_g\\right) \\in \\mathbb{R}^{1 \\times 2 d} \\\\\n\u0026 \\mathbf{o}_t^{\\prime}=\\mathrm{LayerNorm}\\left(\\mathbf{g}_t \\odot\\left[\\mathrm{Re}\\left(\\mathbf{h}_t\\right), \\mathrm{Im}\\left(\\mathbf{h}_t\\right)\\right]\\right) \\in \\mathbb{R}^{1 \\times 2 d} \\\\\n\u0026 \\mathbf{o}_t=\\mathbf{o}_t^{\\prime} \\mathbf{W}_o+\\mathbf{b}_o \\in \\mathbb{R}^{1 \\times d}\n\\end{aligned}\n```\n\n## Experiments\n\n### Environment Preparation\n\nOur experiments use two conda environments: autoregressive language modeling requires the environment described in the Env1 part, and LRA requires the environment described in the Env2 part.\n\n#### Env1\n\nFirst build the conda environment based on the yaml file:\n\n```\nconda env create --file env1.yaml\n```\n\nIf you encounter an error when installing torch, remove torch and torchvision from the yaml file, rerun the above command, and then run the commands below:\n\n```\nconda activate hgrn\nwget 
https://download.pytorch.org/whl/cu111/torch-1.8.1%2Bcu111-cp36-cp36m-linux_x86_64.whl\npip install torch-1.8.1+cu111-cp36-cp36m-linux_x86_64.whl\npip install -r requirements_hgrn.txt\n```\n\nThen, install `hgru-pytorch`:\n```\nconda activate hgrn\ncd hgru-pytorch\npip install .\n```\n\nFinally, install our version of fairseq:\n\n```\ncd fairseq\npip install --editable ./\n```\n\n\n\n#### Env2\n\nBuild the conda environment based on the yaml file:\n\n```\nconda env create --file env2.yaml\n```\n\nIf you encounter difficulties in setting up the environment, you can install the conda environment first, and then use the following command to install the pip packages:\n```\npip install torch==1.10.0+cu111 torchvision==0.11.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html\npip install -r requirements_lra.txt\n```\n\nFinally, install `hgru-pytorch`:\n```\nconda activate lra\ncd hgru-pytorch\npip install .\n```\n\n\n### Autoregressive language model\n\n#### 1) Preprocess the data\n\nFirst download the [WikiText-103 dataset](https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset/):\n\n```\nwget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-raw-v1.zip\nunzip wikitext-103-raw-v1.zip\n```\n\nNext, encode it with the GPT-2 BPE:\n\n```\nmkdir -p gpt2_bpe\nwget -O gpt2_bpe/encoder.json https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json\nwget -O gpt2_bpe/vocab.bpe https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe\nfor SPLIT in train valid test; do \\\n    python -m examples.roberta.multiprocessing_bpe_encoder \\\n        --encoder-json gpt2_bpe/encoder.json \\\n        --vocab-bpe gpt2_bpe/vocab.bpe \\\n        --inputs wikitext-103-raw/wiki.${SPLIT}.raw \\\n        --outputs wikitext-103-raw/wiki.${SPLIT}.bpe \\\n        --keep-empty \\\n        --workers 60; \\\ndone\n```\n\nFinally, preprocess/binarize the data using the GPT-2 fairseq dictionary:\n\n```\nwget -O 
gpt2_bpe/dict.txt https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/dict.txt\nfairseq-preprocess \\\n    --only-source \\\n    --srcdict gpt2_bpe/dict.txt \\\n    --trainpref wikitext-103-raw/wiki.train.bpe \\\n    --validpref wikitext-103-raw/wiki.valid.bpe \\\n    --testpref wikitext-103-raw/wiki.test.bpe \\\n    --destdir data-bin/wikitext-103 \\\n    --workers 60\n```\n\nThis step comes from [fairseq](https://github.com/facebookresearch/fairseq/blob/main/examples/roberta/README.pretraining.md).\n\n\n\n\n#### 2) Train the autoregressive language model\n\nUse the following command to train the language model:\n\n```\nbash script_alm.sh\n```\n\nChange `data_dir` to the path of the preprocessed data.\n\n\n\n### Image modeling\nFirst clone the following codebase:\n```\ngit clone https://github.com/OpenNLPLab/im.git\n```\nThen change `code_dir` and `data_dir` in `script_im.sh`, and finally run the script:\n```\nbash script_im.sh\n```\n\n### LRA\n\n#### 1) Preparation\n\nDownload the codebase:\n\n```\ngit clone https://github.com/OpenNLPLab/lra.git\n```\n\nDownload the data:\n\n```\nwget https://storage.googleapis.com/long-range-arena/lra_release.gz\nmv lra_release.gz lra_release.tar.gz \ntar -xvf lra_release.tar.gz\n```\n\n\n#### 2) Training\n\nUse the following script to run the experiments. Change `PREFIX` to your lra path and `tasks` to a specific task:\n\n```\npython script_lra.py\n```\n\n\n\n## Standalone code\nSee [hgru-pytorch](https://github.com/Doraemonzzz/hgru-pytorch).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOpenNLPLab%2FHGRN","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FOpenNLPLab%2FHGRN","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOpenNLPLab%2FHGRN/lists"}