https://github.com/epfml/dynamic-sparse-flash-attention
- Host: GitHub
- URL: https://github.com/epfml/dynamic-sparse-flash-attention
- Owner: epfml
- License: other
- Created: 2023-05-24T06:16:16.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-06-02T12:28:57.000Z (over 2 years ago)
- Last Synced: 2025-04-28T12:38:49.594Z (10 months ago)
- Language: Jupyter Notebook
- Size: 177 KB
- Stars: 143
- Watchers: 7
- Forks: 6
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Dynamic Sparse FlashAttention
Code to reproduce results for the paper "Faster Causal Attention Over Large Sequences Through Sparse Flash Attention"
# Setup
To install the required Python dependencies, first run:
```bash
pip install -r ./requirements.txt
```
Then install Triton:
```bash
git clone https://github.com/openai/triton.git
cd triton
git checkout b2a757d00028fe844a93904036a18e8670bfe92f
cd python
pip install cmake
pip install -e .
```
The commands above pin Triton to the commit used in our experiments. Feel free to experiment with later Triton versions.
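As a quick sanity check (not part of this repository), you can verify that the Triton install compiles and runs a trivial kernel. The snippet below is a minimal sketch and assumes a CUDA-capable GPU is available.
```python
# Minimal smoke test for the Triton install (illustrative; not part of this repo).
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)
print("Triton", triton.__version__, "OK")
```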
# Reproducing our LM experiments on OpenWebText2
**GPU requirements:** Preferably, you need at least one A100. Some of our experiments use data parallelism across up to 3 A100s. You should have no problem running these experiments on any GPU supporting `bfloat16`, though you may have to adjust the model parameters to fit the available memory.
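Before launching, you can quickly check that your environment meets these requirements. The snippet below is a small illustrative check, not part of the repository scripts.
```python
# Quick environment check (illustrative; not part of the repository scripts).
import torch

assert torch.cuda.is_available(), "A CUDA GPU is required for these experiments."
print("GPUs available:", torch.cuda.device_count())
print("bfloat16 supported:", torch.cuda.is_bf16_supported())
print("Device:", torch.cuda.get_device_name(0))
```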
Go to the `openwebtext2-experiments` folder and run `script/train-LMs.sh`.
# Reproducing our runtime results
**GPU requirements:** We used one A100.
For the Hash-sparse and QK-sparse results, go to the `runtime-experiments` folder and run the `timeperf-hash-and-qk-sparse.ipynb` notebook.
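For context, kernel runtime comparisons of this kind are commonly measured with `triton.testing.do_bench`. The sketch below times a dense causal-attention baseline only; it is an illustration rather than code from the notebook, and the shapes are placeholders.
```python
# Illustrative timing harness (not taken from the notebook): benchmarks a dense
# causal-attention baseline with triton.testing.do_bench.
import torch
import torch.nn.functional as F
import triton

B, H, T, D = 1, 16, 8192, 64  # placeholder shapes
q = torch.randn(B, H, T, D, device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)

dense_ms = triton.testing.do_bench(
    lambda: F.scaled_dot_product_attention(q, k, v, is_causal=True)
)
print(f"dense causal attention: {dense_ms:.3f} ms")
```
Sparse variants would be timed the same way and compared against this dense baseline across sequence lengths.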
# Reproducing our Reformer results
Coming soon