{"id":25559803,"url":"https://github.com/lmnt-com/haste","last_synced_at":"2025-04-04T21:10:23.728Z","repository":{"id":44959414,"uuid":"237056148","full_name":"lmnt-com/haste","owner":"lmnt-com","description":"Haste: a fast, simple, and open RNN library","archived":false,"fork":false,"pushed_at":"2023-07-18T01:29:25.000Z","size":227,"stargazers_count":330,"open_issues_count":11,"forks_count":28,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-03-28T20:08:33.494Z","etag":null,"topics":["algorithm","api","cpp","cuda","deep-learning","gru","lstm","machine-learning","python","pytorch","rnn","rnn-implementations","rnn-layers","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lmnt-com.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-01-29T18:42:44.000Z","updated_at":"2025-03-11T08:05:07.000Z","dependencies_parsed_at":"2025-03-06T19:11:26.369Z","dependency_job_id":"07b4f710-7786-49b6-9335-8286d85bc36b","html_url":"https://github.com/lmnt-com/haste","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lmnt-com%2Fhaste","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lmnt-com%2Fhaste/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lmnt-com%2Fhaste/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lmnt-com%2Fhaste/manife
sts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lmnt-com","download_url":"https://codeload.github.com/lmnt-com/haste/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247249532,"owners_count":20908212,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithm","api","cpp","cuda","deep-learning","gru","lstm","machine-learning","python","pytorch","rnn","rnn-implementations","rnn-layers","tensorflow"],"created_at":"2025-02-20T17:20:02.432Z","updated_at":"2025-04-04T21:10:23.703Z","avatar_url":"https://github.com/lmnt-com.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://lmnt.com/assets/haste-logo_social_media.png\"\u003e\n\u003c/div\u003e\n\n--------------------------------------------------------------------------------\n[![GitHub release (latest SemVer including pre-releases)](https://img.shields.io/github/v/release/lmnt-com/haste?include_prereleases)](https://github.com/lmnt-com/haste/releases) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1hzYhcyvbXYMAUwa3515BszSkhx1UUFSt) [![GitHub](https://img.shields.io/github/license/lmnt-com/haste)](LICENSE)\n\n**We're hiring!**\nIf you like what we're building here, [come join us at LMNT](https://explore.lmnt.com).\n\nHaste is a CUDA implementation of fused RNN layers with built-in [DropConnect](http://proceedings.mlr.press/v28/wan13.html) and 
[Zoneout](https://arxiv.org/abs/1606.01305) regularization. These layers are exposed through C++ and Python APIs for easy integration into your own projects or machine learning frameworks.\n\nWhich RNN types are supported?\n- [GRU](https://en.wikipedia.org/wiki/Gated_recurrent_unit)\n- [IndRNN](http://arxiv.org/abs/1803.04831)\n- [LSTM](https://en.wikipedia.org/wiki/Long_short-term_memory)\n- [Layer Normalized GRU](https://arxiv.org/abs/1607.06450)\n- [Layer Normalized LSTM](https://arxiv.org/abs/1607.06450)\n\nWhat's included in this project?\n- a standalone C++ API (`libhaste`)\n- a TensorFlow Python API (`haste_tf`)\n- a PyTorch API (`haste_pytorch`)\n- examples for writing your own custom C++ inference / training code using `libhaste`\n- benchmarking programs to evaluate the performance of RNN implementations\n\nFor questions or feedback about Haste, please open an issue on GitHub or send us an email at [haste@lmnt.com](mailto:haste@lmnt.com).\n\n## Install\nHere's what you'll need to get started:\n- a [CUDA Compute Capability](https://developer.nvidia.com/cuda-gpus) 3.7+ GPU (required)\n- [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit) 10.0+ (required)\n- [TensorFlow GPU](https://www.tensorflow.org/install/gpu) 1.14+ or 2.0+ for TensorFlow integration (optional)\n- [PyTorch](https://pytorch.org) 1.3+ for PyTorch integration (optional)\n- [Eigen 3](http://eigen.tuxfamily.org/) to build the C++ examples (optional)\n- [cuDNN Developer Library](https://developer.nvidia.com/rdp/cudnn-archive) to build benchmarking programs (optional)\n\nOnce you have the prerequisites, you can install with pip or by building the source code.\n\n### Using pip\n```\npip install haste_pytorch\npip install haste_tf\n```\n\n### Building from source\n```\nmake               # Build everything\nmake haste         # ;) Build C++ API\nmake haste_tf      # Build TensorFlow API\nmake haste_pytorch # Build PyTorch API\nmake examples\nmake benchmarks\n```\n\nIf you built the 
TensorFlow or PyTorch API, install it with `pip`:\n```\npip install haste_tf-*.whl\npip install haste_pytorch-*.whl\n```\n\nIf the CUDA Toolkit that you're building against is not in `/usr/local/cuda`, you must specify the\n`$CUDA_HOME` environment variable before running make:\n```\nCUDA_HOME=/usr/local/cuda-10.2 make\n```\n\n## Performance\nOur LSTM and GRU benchmarks indicate that Haste has the fastest publicly available implementation for nearly all problem sizes. The following charts show our LSTM results, but the GRU results are qualitatively similar.\n\u003ctable\u003e\n  \u003ctr\u003e\u003ctd\u003e\u003cimg src=\"https://lmnt.com/assets/haste/benchmark/report_n=16_c=128.png\"\u003e\u003c/td\u003e\u003ctd\u003e\u003cimg src=\"https://lmnt.com/assets/haste/benchmark/report_n=32_c=256.png\"\u003e\u003c/td\u003e\u003c/tr\u003e\n  \u003ctr\u003e\u003c/tr\u003e\n  \u003ctr\u003e\u003ctd\u003e\u003cimg src=\"https://lmnt.com/assets/haste/benchmark/report_n=64_c=128.png\"\u003e\u003c/td\u003e\u003ctd\u003e\u003cimg src=\"https://lmnt.com/assets/haste/benchmark/report_n=128_c=256.png\"\u003e\u003c/td\u003e\u003c/tr\u003e\n\u003c/table\u003e\n\nHere is our complete LSTM benchmark result grid:\n\u003cbr\u003e\n[`N=1 C=64`](https://lmnt.com/assets/haste/benchmark/report_n=1_c=64.png)\n[`N=1 C=128`](https://lmnt.com/assets/haste/benchmark/report_n=1_c=128.png)\n[`N=1 C=256`](https://lmnt.com/assets/haste/benchmark/report_n=1_c=256.png)\n[`N=1 C=512`](https://lmnt.com/assets/haste/benchmark/report_n=1_c=512.png)\n\u003cbr\u003e\n[`N=32 C=64`](https://lmnt.com/assets/haste/benchmark/report_n=32_c=64.png)\n[`N=32 C=128`](https://lmnt.com/assets/haste/benchmark/report_n=32_c=128.png)\n[`N=32 C=256`](https://lmnt.com/assets/haste/benchmark/report_n=32_c=256.png)\n[`N=32 C=512`](https://lmnt.com/assets/haste/benchmark/report_n=32_c=512.png)\n\u003cbr\u003e\n[`N=64 C=64`](https://lmnt.com/assets/haste/benchmark/report_n=64_c=64.png)\n[`N=64 
C=128`](https://lmnt.com/assets/haste/benchmark/report_n=64_c=128.png)\n[`N=64 C=256`](https://lmnt.com/assets/haste/benchmark/report_n=64_c=256.png)\n[`N=64 C=512`](https://lmnt.com/assets/haste/benchmark/report_n=64_c=512.png)\n\u003cbr\u003e\n[`N=128 C=64`](https://lmnt.com/assets/haste/benchmark/report_n=128_c=64.png)\n[`N=128 C=128`](https://lmnt.com/assets/haste/benchmark/report_n=128_c=128.png)\n[`N=128 C=256`](https://lmnt.com/assets/haste/benchmark/report_n=128_c=256.png)\n[`N=128 C=512`](https://lmnt.com/assets/haste/benchmark/report_n=128_c=512.png)\n\n## Documentation\n### TensorFlow API\n```python\nimport tensorflow as tf\nimport haste_tf as haste\n\ngru_layer = haste.GRU(num_units=256, direction='bidirectional', zoneout=0.1, dropout=0.05)\nindrnn_layer = haste.IndRNN(num_units=256, direction='bidirectional', zoneout=0.1)\nlstm_layer = haste.LSTM(num_units=256, direction='bidirectional', zoneout=0.1, dropout=0.05)\nnorm_gru_layer = haste.LayerNormGRU(num_units=256, direction='bidirectional', zoneout=0.1, dropout=0.05)\nnorm_lstm_layer = haste.LayerNormLSTM(num_units=256, direction='bidirectional', zoneout=0.1, dropout=0.05)\n\n# `x` is a tensor with shape [N,T,C]\nx = tf.random.normal([5, 25, 128])\n\ny, state = gru_layer(x, training=True)\ny, state = indrnn_layer(x, training=True)\ny, state = lstm_layer(x, training=True)\ny, state = norm_gru_layer(x, training=True)\ny, state = norm_lstm_layer(x, training=True)\n```\n\nThe TensorFlow Python API is documented in [`docs/tf/haste_tf.md`](docs/tf/haste_tf.md).\n\n### PyTorch API\n```python\nimport torch\nimport haste_pytorch as haste\n\ngru_layer = haste.GRU(input_size=128, hidden_size=256, zoneout=0.1, dropout=0.05)\nindrnn_layer = haste.IndRNN(input_size=128, hidden_size=256, zoneout=0.1)\nlstm_layer = haste.LSTM(input_size=128, hidden_size=256, zoneout=0.1, dropout=0.05)\nnorm_gru_layer = haste.LayerNormGRU(input_size=128, hidden_size=256, zoneout=0.1, dropout=0.05)\nnorm_lstm_layer = haste.LayerNormLSTM(input_size=128, 
hidden_size=256, zoneout=0.1, dropout=0.05)\n\ngru_layer.cuda()\nindrnn_layer.cuda()\nlstm_layer.cuda()\nnorm_gru_layer.cuda()\nnorm_lstm_layer.cuda()\n\n# `x` is a CUDA tensor with shape [T,N,C]\nx = torch.rand([25, 5, 128]).cuda()\n\ny, state = gru_layer(x)\ny, state = indrnn_layer(x)\ny, state = lstm_layer(x)\ny, state = norm_gru_layer(x)\ny, state = norm_lstm_layer(x)\n```\n\nThe PyTorch API is documented in [`docs/pytorch/haste_pytorch.md`](docs/pytorch/haste_pytorch.md).\n\n### C++ API\nThe C++ API is documented in [`lib/haste/*.h`](lib/haste/) and there are code samples in [`examples/`](examples/).\n\n## Code layout\n- [`benchmarks/`](benchmarks): programs to evaluate performance of RNN implementations\n- [`docs/tf/`](docs/tf): API reference documentation for `haste_tf`\n- [`docs/pytorch/`](docs/pytorch): API reference documentation for `haste_pytorch`\n- [`examples/`](examples): examples for writing your own C++ inference / training code using `libhaste`\n- [`frameworks/tf/`](frameworks/tf): TensorFlow Python API and custom op code\n- [`frameworks/pytorch/`](frameworks/pytorch): PyTorch API and custom op code\n- [`lib/`](lib): CUDA kernels and C++ API\n- [`validation/`](validation): scripts to validate output and gradients of RNN layers\n\n## Implementation notes\n- the GRU implementation is based on `1406.1078v1` (same as cuDNN) rather than `1406.1078v3`\n- Zoneout on LSTM cells is applied to the hidden state only, and not the cell state\n- the layer normalized LSTM implementation uses [these equations](https://github.com/lmnt-com/haste/issues/1)\n\n## References\n1. Hochreiter, S., \u0026 Schmidhuber, J. (1997). Long Short-Term Memory. _Neural Computation_, _9_(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735\n1. Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., \u0026 Bengio, Y. (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. _arXiv:1406.1078 [cs, stat]_. 
http://arxiv.org/abs/1406.1078.\n1. Wan, L., Zeiler, M., Zhang, S., Cun, Y. L., \u0026 Fergus, R. (2013). Regularization of Neural Networks using DropConnect. In _International Conference on Machine Learning_ (pp. 1058–1066). Presented at the International Conference on Machine Learning. http://proceedings.mlr.press/v28/wan13.html.\n1. Krueger, D., Maharaj, T., Kramár, J., Pezeshki, M., Ballas, N., Ke, N. R., et al. (2017). Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations. _arXiv:1606.01305 [cs]_. http://arxiv.org/abs/1606.01305.\n1. Ba, J., Kiros, J.R., \u0026 Hinton, G.E. (2016). Layer Normalization. _arXiv:1607.06450 [cs, stat]_. https://arxiv.org/abs/1607.06450.\n1. Li, S., Li, W., Cook, C., Zhu, C., \u0026 Gao, Y. (2018). Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN. _arXiv:1803.04831 [cs]_. http://arxiv.org/abs/1803.04831.\n\n## Citing this work\nTo cite this work, please use the following BibTeX entry:\n```\n@misc{haste2020,\n  title  = {Haste: a fast, simple, and open RNN library},\n  author = {Sharvil Nanavati},\n  year   = 2020,\n  month  = \"Jan\",\n  howpublished = {\\url{https://github.com/lmnt-com/haste/}},\n}\n```\n\n## License\n[Apache 2.0](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flmnt-com%2Fhaste","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flmnt-com%2Fhaste","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flmnt-com%2Fhaste/lists"}