Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/BlinkDL/LinearAttentionArena
Here we will test various linear attention designs.
- Host: GitHub
- URL: https://github.com/BlinkDL/LinearAttentionArena
- Owner: BlinkDL
- License: apache-2.0
- Created: 2024-04-05T10:55:43.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-04-25T09:35:47.000Z (5 months ago)
- Last Synced: 2024-05-30T05:18:20.099Z (4 months ago)
- Language: Python
- Homepage:
- Size: 229 KB
- Stars: 50
- Watchers: 8
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# LinearAttentionArena
Here we will test various linear attention designs.
```
pip install pytorch-lightning==1.9.5 torch deepspeed wandb ninja --upgrade
```
RWKV-6.0b differences (vs RWKV-6.0): GroupNorm is replaced by LayerNorm, and the "gate" in TimeMix is removed, so the parameter count is lower.
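Below is a minimal sketch of what these two changes look like in the TimeMix output path, assuming a simplified RWKV-style block; the class names and the forward pass are illustrative only and do not reproduce the repo's actual code.
```
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch (not the repo's implementation) of the two changes:
#   RWKV-6.0 : per-head GroupNorm on the attention output, plus a learned gate
#   RWKV-6.0b: plain LayerNorm, no gate -> fewer parameters

class TimeMixOut_v6(nn.Module):
    def __init__(self, n_embd, n_head):
        super().__init__()
        self.gate = nn.Linear(n_embd, n_embd, bias=False)    # extra gate params
        self.ln_x = nn.GroupNorm(n_head, n_embd)             # per-head normalization
        self.output = nn.Linear(n_embd, n_embd, bias=False)

    def forward(self, x, wkv):  # x: (B, T, C) input, wkv: (B, T, C) attention result
        B, T, C = wkv.shape
        g = F.silu(self.gate(x))                              # gating branch
        wkv = self.ln_x(wkv.view(B * T, C)).view(B, T, C)
        return self.output(wkv * g)

class TimeMixOut_v6b(nn.Module):
    def __init__(self, n_embd, n_head):
        super().__init__()
        self.ln_x = nn.LayerNorm(n_embd)                      # GroupNorm -> LayerNorm
        self.output = nn.Linear(n_embd, n_embd, bias=False)   # gate branch removed

    def forward(self, x, wkv):
        return self.output(self.ln_x(wkv))
```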
```
# Example: RWKV-6.0b L12-D768 (189M params) on 4x4090, minipile 1.5B tokens, loss 2.812
./prepare.sh --model_type "x060b" --layer 12 --emb 768 --ctx_len 512 --suffix "-0"
./train.sh --model_type "x060b" --layer 12 --emb 768 --lr_init "6e-4" --lr_final "2e-4" --ctx_len 512 --n_gpu 4 --m_bsz 32 --grad_cp 0 --save_period 1000 --suffix "-0"
```
```
# Example: Mamba L12-D768 (191M params) on 4x4090, minipile 1.5B tokens, loss 2.885
./prepare.sh --model_type "mamba" --layer 12 --emb 768 --ctx_len 512 --suffix "-0"
./train.sh --model_type "mamba" --layer 12 --emb 768 --lr_init "6e-4" --lr_final "2e-4" --ctx_len 512 --n_gpu 4 --m_bsz 32 --grad_cp 0 --save_period 1000 --suffix "-0"
```
![rwkv-x060b-mamba](rwkv-x060b-mamba.png)