{"id":13717047,"url":"https://github.com/taolei87/sru","last_synced_at":"2026-01-10T03:52:18.661Z","repository":{"id":46741081,"uuid":"405450017","full_name":"taolei87/sru","owner":"taolei87","description":"Training RNNs as Fast as CNNs (https://arxiv.org/abs/1709.02755)","archived":false,"fork":true,"pushed_at":"2021-09-28T01:29:04.000Z","size":974,"stargazers_count":33,"open_issues_count":1,"forks_count":6,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-28T20:55:12.457Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"asappresearch/sru","license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/taolei87.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-09-11T18:08:32.000Z","updated_at":"2025-04-16T10:59:28.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/taolei87/sru","commit_stats":null,"previous_names":[],"tags_count":35,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taolei87%2Fsru","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taolei87%2Fsru/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taolei87%2Fsru/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taolei87%2Fsru/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/taolei87","download_url":"https://codeload.github.com/taolei87/sru/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252826898,"owners_count":21810200,
"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T00:01:17.295Z","updated_at":"2026-01-10T03:52:18.604Z","avatar_url":"https://github.com/taolei87.png","language":null,"funding_links":[],"categories":["Pytorch \u0026 related libraries｜Pytorch \u0026 相关库","Pytorch \u0026 related libraries"],"sub_categories":["Other libraries｜其他库:","Other libraries:"],"readme":"\n## News\nSRU++, a new SRU variant, is released. [[tech report](https://arxiv.org/pdf/2102.12459.pdf)] [[blog](https://www.asapp.com/blog/reducing-the-high-cost-of-training-nlp-models-with-sru/)]\n\nThe experimental code and SRU++ implementation are available on [the dev branch](https://github.com/asappresearch/sru/tree/3.0.0-dev/experiments/srupp_experiments) which will be merged into master later.\n\n## About\n\n**SRU** is a recurrent unit that can run over 10 times faster than cuDNN LSTM, without loss of accuracy tested on many tasks. \n\u003cp align=\"center\"\u003e\n\u003cimg width=620 src=\"https://raw.githubusercontent.com/taolei87/sru/master/imgs/speed.png\"\u003e\u003cbr\u003e\n\u003ci\u003eAverage processing time of LSTM, conv2d and SRU, tested on GTX 1070\u003c/i\u003e\u003cbr\u003e\n\u003c/p\u003e\nFor example, the figure above presents the processing time of a single mini-batch of 32 samples. 
SRU achieves 10 to 16 times speed-up compared to LSTM, and operates as fast as (or faster than) word-level convolution using conv2d.\n\n#### Reference:\nSimple Recurrent Units for Highly Parallelizable Recurrence [[paper](https://arxiv.org/abs/1709.02755)]\n```\n@inproceedings{lei2018sru,\n  title={Simple Recurrent Units for Highly Parallelizable Recurrence},\n  author={Tao Lei and Yu Zhang and Sida I. Wang and Hui Dai and Yoav Artzi},\n  booktitle={Empirical Methods in Natural Language Processing (EMNLP)},\n  year={2018}\n}\n```\n\nWhen Attention Meets Fast Recurrence: Training Language Models with Reduced Compute [[paper](https://arxiv.org/pdf/2102.12459)]\n```\n@article{lei2021srupp,\n  title={When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute},\n  author={Tao Lei},\n  journal={arXiv preprint arXiv:2102.12459},\n  year={2021}\n}\n```\n\u003cbr\u003e\n\n## Requirements\n - [PyTorch](http://pytorch.org/) \u003e=1.6 recommended\n - [ninja](https://ninja-build.org/)\n\nInstall requirements via `pip install -r requirements.txt`.\n\n\u003cbr\u003e\n\n## Installation\n\n#### From source:\nSRU can be installed as a regular package via `python setup.py install` or `pip install .`.\n\n#### From PyPI:\n`pip install sru`\n\n\n#### Directly use the source without installation:\nMake sure this repo and the CUDA library can be found by the system, e.g. \n```\nexport PYTHONPATH=path_to_repo/sru\nexport LD_LIBRARY_PATH=/usr/local/cuda/lib64\n```\n\n\u003cbr\u003e\n\n## Examples\nThe usage of SRU is similar to `nn.LSTM`. SRU likely requires more stacking layers than LSTM. 
We recommend starting with 2 layers and adding more if necessary (see our report for more experimental details).\n```python\nimport torch\nfrom sru import SRU, SRUCell\n\n# input has length 20, batch size 32 and dimension 128\nx = torch.FloatTensor(20, 32, 128).cuda()\n\ninput_size, hidden_size = 128, 128\n\nrnn = SRU(input_size, hidden_size,\n    num_layers = 2,          # number of stacking RNN layers\n    dropout = 0.0,           # dropout applied between RNN layers\n    bidirectional = False,   # bidirectional RNN\n    layer_norm = False,      # apply layer normalization on the output of each layer\n    highway_bias = -2,        # initial bias of highway gate (\u003c= 0)\n)\nrnn.cuda()\n\noutput_states, c_states = rnn(x)      # forward pass\n\n# output_states is (length, batch size, number of directions * hidden size)\n# c_states is (layers, batch size, number of directions * hidden size)\n\n```\n  \n\u003cbr\u003e\n\n## Contributing\nPlease read and follow the [guidelines](CONTRIBUTING.md).\n\n\n### Other Implementations\n\n[@musyoku](https://github.com/musyoku) had a very nice [SRU implementation](https://github.com/musyoku/chainer-sru) in Chainer.\n\n[@adrianbg](https://github.com/adrianbg) implemented the first [CPU version](https://github.com/taolei87/sru/pull/42).\n\n\u003cbr\u003e\n\n  \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftaolei87%2Fsru","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftaolei87%2Fsru","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftaolei87%2Fsru/lists"}