{"id":13754390,"url":"https://github.com/salesforce/pytorch-qrnn","last_synced_at":"2025-09-28T21:30:27.981Z","repository":{"id":44797029,"uuid":"105064915","full_name":"salesforce/pytorch-qrnn","owner":"salesforce","description":"PyTorch implementation of the Quasi-Recurrent Neural Network - up to 16 times faster than NVIDIA's cuDNN LSTM","archived":true,"fork":false,"pushed_at":"2022-02-12T14:34:07.000Z","size":74,"stargazers_count":1260,"open_issues_count":23,"forks_count":193,"subscribers_count":51,"default_branch":"master","last_synced_at":"2024-12-27T13:03:04.699Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/salesforce.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null}},"created_at":"2017-09-27T20:16:39.000Z","updated_at":"2024-11-21T03:15:26.000Z","dependencies_parsed_at":"2022-07-19T18:08:33.298Z","dependency_job_id":null,"html_url":"https://github.com/salesforce/pytorch-qrnn","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/salesforce%2Fpytorch-qrnn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/salesforce%2Fpytorch-qrnn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/salesforce%2Fpytorch-qrnn/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/salesforce%2Fpytorch-qrnn/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/salesforce","download_url":"https://codeload.github.com/salesforce/pyto
rch-qrnn/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":234563123,"owners_count":18853056,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T09:01:58.004Z","updated_at":"2025-09-28T21:30:22.505Z","avatar_url":"https://github.com/salesforce.png","language":"Python","funding_links":[],"categories":["其他_NLP自然语言处理","Python","Paper implementations｜论文实现","Paper implementations"],"sub_categories":["其他_文本生成、文本对话","Other libraries｜其他库:","Other libraries:"],"readme":"# Quasi-Recurrent Neural Network (QRNN) for PyTorch\n\nUpdated to support multi-GPU environments via `DataParallel` - see the `multigpu_dataparallel.py` example.\n\nThis repository contains a PyTorch implementation of [Salesforce Research](https://einstein.ai/)'s [Quasi-Recurrent Neural Networks](https://arxiv.org/abs/1611.01576) paper.\n\nThe QRNN provides similar accuracy to the LSTM but can be between 2 and 17 times faster than the highly optimized NVIDIA cuDNN LSTM implementation, depending on the use case.\n\nTo install, simply run:\n\n`pip install cupy pynvrtc git+https://github.com/salesforce/pytorch-qrnn`\n\nIf you use this code or our results in your research, please cite:\n\n```\n@article{bradbury2016quasi,\n  title={{Quasi-Recurrent Neural Networks}},\n  author={Bradbury, James and Merity, Stephen and Xiong, Caiming and Socher, Richard},\n  journal={International Conference on Learning Representations (ICLR 2017)},\n  year={2017}\n}\n```\n\n## Software Requirements\n\nThis codebase requires Python 3, 
[PyTorch](http://pytorch.org/), [pynvrtc](https://github.com/NVIDIA/pynvrtc) (NVIDIA's Python bindings to NVRTC), and [CuPy](https://cupy.chainer.org/).\nWhile the codebase contains a CPU implementation of the QRNN, the GPU QRNN implementation is used by default if possible.\nRequirements are provided in `requirements.txt`.\n\n## Example Usage\n\nWe've updated the previously released Salesforce Research [AWD-LSTM language modeling](https://github.com/salesforce/awd-lstm-lm) codebase to support use of the `AWD-QRNN`.\nWith the same number of parameters as the LSTM and less well-tuned hyperparameters, the QRNN model trains over twice as quickly and achieves nearly equivalent state-of-the-art language modeling results.\nFor full details, refer to the [AWD-LSTM-LM repository](https://github.com/salesforce/awd-lstm-lm).\n\n## Usage\n\nThe QRNN API is meant to be drop-in compatible with the [LSTM](http://pytorch.org/docs/master/_modules/torch/nn/modules/rnn.html#LSTM) for many standard use cases.\nAs such, the easiest approach is to replace any `GRU` or `LSTM` module with the `QRNN`.\n\nNote: bidirectional QRNN is not yet supported, though it will be in the near future.\n\n```python\nimport torch\nfrom torchqrnn import QRNN\n\nseq_len, batch_size, hidden_size = 7, 20, 256\nsize = (seq_len, batch_size, hidden_size)\nX = torch.autograd.Variable(torch.rand(size), requires_grad=True).cuda()\n\nqrnn = QRNN(hidden_size, hidden_size, num_layers=2, dropout=0.4)\nqrnn.cuda()\noutput, hidden = qrnn(X)\n\nprint(output.size(), hidden.size())\n```\n\nThe full documentation for the `QRNN` is listed below:\n\n```\nQRNN(input_size, hidden_size, num_layers, dropout=0):\n    Applies a multi-layer Quasi-Recurrent Neural Network (QRNN) to an input sequence.\n\n    Args:\n        input_size: The number of expected features in the input x.\n        hidden_size: The number of features in the hidden state h. 
If not specified, the input size is used.\n        num_layers: The number of QRNN layers to produce.\n        layers: List of preconstructed QRNN layers to use for the QRNN module (optional).\n        save_prev_x: Whether to store previous inputs for use in future convolutional windows (i.e. for a continuing sequence such as in language modeling). If true, you must call reset to remove cached previous values of x. Default: False.\n        window: Defines the size of the convolutional window (how many previous tokens to look at when computing the QRNN values). Supports 1 and 2. Default: 1.\n        zoneout: Whether to apply zoneout (i.e. failing to update elements in the hidden state) to the hidden state updates. Default: 0.\n        output_gate: If True, performs QRNN-fo (applying an output gate to the output). If False, performs QRNN-f. Default: True.\n        use_cuda: If True, uses fast custom CUDA kernel. If False, uses naive for loop. Default: True.\n\n    Inputs: X, hidden\n        - X (seq_len, batch, input_size): tensor containing the features of the input sequence.\n        - hidden (layers, batch, hidden_size): tensor containing the initial hidden state for the QRNN.\n\n    Outputs: output, h_n\n        - output (seq_len, batch, hidden_size): tensor containing the output of the QRNN for each timestep.\n        - h_n (layers, batch, hidden_size): tensor containing the hidden state for t=seq_len\n```\n\nThe included QRNN layer supports convolutional windows of size 1 or 2 but will be extended in the future to support arbitrary convolutions.\n\nIf you are using convolutional windows of size 2 (i.e. 
looking at the inputs from two previous timesteps to compute the input) and want to run over a long sequence in batches, such as when using BPTT, you can set `save_prev_x=True` and call `reset` when you wish to reset the cached previous inputs.\n\nIf you want flexibility in the definition of each QRNN layer, you can construct individual `QRNNLayer` modules and pass them to the `QRNN` module using the `layers` argument.\n\n## Speed\n\nThe QRNN is between 2 and 17 times faster than NVIDIA's cuDNN LSTM, depending on batch size and sequence length.\nThe largest gains are for small batch sizes or long sequence lengths, both of which highlight the LSTM's parallelization difficulty due to forced sequentiality.\nFor full information, refer to the [Quasi-Recurrent Neural Networks](https://arxiv.org/abs/1611.01576) paper.\n\n![Figure 4 from QRNN paper](images/qrnn_speed.png)\n\nPictured above is Figure 4 from the QRNN paper:  \n*Left: Training speed for two-layer 640-unit PTB LM on a batch of 20 examples of 105 timesteps. “RNN” and “softmax” include the forward and backward times, while “optimization overhead” includes gradient clipping, L2 regularization, and SGD computations.  \nRight: Inference speed advantage of a 320-unit QRNN layer alone over an equal-sized cuDNN LSTM layer for data with the given batch size and sequence length. 
Training results are similar.*\n\n## Extending the QRNN speed advantage to other recurrent architectures with ForgetMult\n\nThe QRNN architecture's speed advantage comes from two primary sources: the ability to batch all computations into a few large matrix multiplications and the use of a fast element-wise recurrence function.\nThis recurrence function, named `ForgetMult`, is general and can be used in other scenarios.\nThe `ForgetMult` takes two arguments - the candidate input `x` and forget gates `f` - and computes `h = f * x + (1 - f) * hm1` where `hm1` is the previous hidden state output.\n\nThe `QRNN` class is a thin wrapper around this that performs the large matrix multiplications for the candidate `x`, the forget gates `f`, and the output gates `o`.\nAny other operation that requires recurrence and can have precomputed values for the candidate `x` and forget gates `f` can use this fast form of recurrence.\n\nExample usage of the ForgetMult module: `output = ForgetMult()(f, x, hidden)`.\n\n```\n    ForgetMult computes a simple recurrent equation:\n    h_t = f_t * x_t + (1 - f_t) * h_{t-1}\n\n    This equation is equivalent to dynamic weighted averaging.\n\n    Inputs: F, X, hidden_init, cuda\n        - F (seq_len, batch, input_size): tensor containing the forget gate values, assumed in range [0, 1].\n        - X (seq_len, batch, input_size): tensor containing the features of the input sequence.\n        - hidden_init (batch, input_size): tensor containing the initial hidden state for the recurrence (h_{t-1}).\n        - cuda: If True, use the fast element-wise CUDA kernel for recurrence. If False, uses naive for loop. Default: True.\n```\n\n## Want to help out?\n\nFirst, thanks! :)\n\nOpen tasks that are interesting:\n\n+ Modify the `ForgetMult` CUDA kernel to produce a `BackwardForgetMult`. This will enable a bidirectional QRNN. 
The input should be the same - `f` and `x` - but the kernel should walk backwards through the inputs.\n+ Bidirectional QRNN support (requires the modification above)\n+ Support PyTorch's `PackedSequence` such that variable length sequences are correctly masked\n+ Show how to use the underlying fast recurrence operator `ForgetMult` in other generic ways\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsalesforce%2Fpytorch-qrnn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsalesforce%2Fpytorch-qrnn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsalesforce%2Fpytorch-qrnn/lists"}