https://github.com/Hannibal046/RWKV-howto
Possibly useful materials for learning the RWKV language model.
- Host: GitHub
- URL: https://github.com/Hannibal046/RWKV-howto
- Owner: Hannibal046
- Created: 2023-05-21T08:54:14.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-06-08T15:54:11.000Z (almost 2 years ago)
- Last Synced: 2024-11-12T04:42:20.534Z (6 months ago)
- Size: 5.86 KB
- Stars: 25
- Watchers: 4
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-llm - RWKV Tutorial - Materials and tutorials for learning RWKV. (Other related papers)
- Awesome-LLM - RWKV-howto - possibly useful materials and tutorial for learning RWKV. (Other Papers)
README
# RWKV-howto
Possibly useful materials and tutorials for learning [RWKV](https://www.rwkv.com).
> RWKV: Parallelizable RNN with Transformer-level LLM Performance.
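As a quick orientation, here is a minimal numpy sketch of the "WKV" recurrence at the heart of RWKV's time mixing, written in its sequential RNN form (the same computation can also be unrolled over the sequence in parallel during training). It follows the formulation popularized by the RWKV_in_150_lines implementation linked under Code below; the names, shapes, and the omitted token-shift/receptance/output projections are simplifications for illustration, not the exact upstream code.

```python
import numpy as np

def wkv_step(k_t, v_t, num, den, decay, bonus):
    """One step of the (simplified, per-channel) WKV recurrence.

    k_t, v_t : key/value vectors for the current token, shape [d]
    num, den : numerator/denominator state carried between tokens, shape [d]
    decay    : per-channel decay rate (w > 0), shape [d]
    bonus    : extra weight (u) given to the current token, shape [d]
    """
    # Output: exponentially decayed weighted average of past values,
    # with the current token up-weighted by the bonus term.
    wkv = (num + np.exp(bonus + k_t) * v_t) / (den + np.exp(bonus + k_t))
    # State update: decay the history, then add the current token.
    num = np.exp(-decay) * num + np.exp(k_t) * v_t
    den = np.exp(-decay) * den + np.exp(k_t)
    return wkv, num, den

# Toy usage: scan a random sequence one token at a time (RNN mode).
d, T = 8, 16
rng = np.random.default_rng(0)
K, V = rng.normal(size=(T, d)), rng.normal(size=(T, d))
decay, bonus = np.full(d, 0.5), np.zeros(d)
num, den = np.zeros(d), np.zeros(d)
for t in range(T):
    out, num, den = wkv_step(K[t], V[t], num, den, decay, bonus)
```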
### Relevant Papers
- :star2:(2023-05) RWKV: Reinventing RNNs for the Transformer Era [arxiv](https://arxiv.org/abs/2305.13048)
- (2023-03) Resurrecting Recurrent Neural Networks for Long Sequences [arxiv](https://arxiv.org/abs/2303.06349)
- (2023-02) SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks [arxiv](https://arxiv.org/abs/2302.13939)
- (2022-08) Simplified State Space Layers for Sequence Modeling [ICLR2023](https://openreview.net/forum?id=Ai8Hw3AXqks)
- :star2:(2021-05) An Attention Free Transformer [arxiv](https://arxiv.org/abs/2105.14103)
- (2021-10) Efficiently Modeling Long Sequences with Structured State Spaces [ICLR2022](https://arxiv.org/abs/2111.00396)
- (2020-08) Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention [ICML2020](https://arxiv.org/abs/2006.16236)
- (2018) Parallelizing Linear Recurrent Neural Nets Over Sequence Length [ICLR2018](https://openreview.net/forum?id=HyUNwulC-)
- (2017-09) Simple Recurrent Units for Highly Parallelizable Recurrence [EMNLP2018](https://arxiv.org/abs/1709.02755)
- (2017-10) MinimalRNN: Toward More Interpretable and Trainable Recurrent Neural Networks [Neurips2017](https://arxiv.org/abs/1711.06788)
- (2017-06) Attention Is All You Need [Neurips2017](https://arxiv.org/abs/1706.03762)
- (2016-11) Quasi-Recurrent Neural Networks [ICLR2017](https://arxiv.org/abs/1611.01576)

### Resources
- Introducing RWKV - An RNN with the advantages of a transformer [Hugging Face](https://huggingface.co/blog/rwkv)
- Now that we have the Transformer framework, can RNNs be discarded entirely? [Zhihu](https://www.zhihu.com/question/302392659/answer/2954997969)
- What is the simplest effective form of an RNN? [Zhihu](https://zhuanlan.zhihu.com/p/616357772)
- :star2:The RNN/CNN duality of RWKV [Zhihu](https://zhuanlan.zhihu.com/p/614311961)
- Does an RNN's hidden layer need nonlinearity? [Zhihu](https://zhuanlan.zhihu.com/p/615672175)
- Google's new work tries to "resurrect" RNNs: can RNNs shine again? [Su Jianlin](https://kexue.fm/archives/9554)
- :star2:How the RWKV language model works [Johan Sokrates Wind](https://www.mn.uio.no/math/english/people/aca/johanswi/index.html)
- :star2:The RWKV language model: An RNN with the advantages of a transformer [Johan Sokrates Wind](https://johanwind.github.io/2023/03/23/rwkv_overview.html)
- The Unreasonable Effectiveness of Recurrent Neural Networks [Andrej Karpathy blog](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)

### Code
- [RWKV-LM](https://github.com/BlinkDL/RWKV-LM)
- [ChatRWKV](https://github.com/BlinkDL/ChatRWKV)
- [RWKV_in_150_lines](https://github.com/BlinkDL/ChatRWKV/blob/main/RWKV_in_150_lines.py)
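The inference code above ultimately runs RWKV in RNN mode: the model is evaluated one token at a time while a small recurrent state is carried forward, so decoding needs no growing key/value cache. A hedged sketch of that generation loop is below; `model.forward(token, state)` is an assumed interface for illustration, not the exact API of ChatRWKV or RWKV_in_150_lines.

```python
import numpy as np

def generate(model, prompt_tokens, n_new_tokens):
    """Greedy RNN-mode decoding with an explicitly carried state.

    `model.forward(token, state)` is a hypothetical interface: it should
    return next-token logits and the updated recurrent state.
    """
    state, logits = None, None
    for tok in prompt_tokens:          # ingest the prompt token by token
        logits, state = model.forward(tok, state)
    generated = []
    for _ in range(n_new_tokens):      # constant work and memory per decoded token
        tok = int(np.argmax(logits))   # greedy choice, for simplicity
        generated.append(tok)
        logits, state = model.forward(tok, state)
    return generated
```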