# _tiny-flash-attention_

**[Flash Attention](https://github.com/Dao-AILab/flash-attention)** is a fast, memory-efficient, exact attention algorithm. It fuses the attention operations into a single kernel and computes the softmax online, so the full attention matrix is never materialised in GPU memory.

**Tiny Flash Attention** is a minimal implementation that expresses the forward pass in ~20 lines of CUDA.
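
The core trick is an **online softmax**: keys are processed one at a time while a running maximum $m$, a running denominator $\ell$, and an unnormalised output $o$ are rescaled on the fly, so the full $N \times N$ score matrix never exists. A sketch of the per-key update for one query row $q$ (notation loosely follows the paper):

$$
\begin{aligned}
s_j &= q \cdot k_j / \sqrt{d}, \qquad m' = \max(m, s_j) \\
\ell' &= \ell\, e^{m - m'} + e^{s_j - m'} \\
o' &= o\, e^{m - m'} + e^{s_j - m'}\, v_j
\end{aligned}
$$

After the last key, the row's output is $o / \ell$.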

---

### _Algorithm_

![image](https://github.com/user-attachments/assets/43ef0742-fbdd-49d5-86ea-c3ef2172772d)
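
As a concrete illustration, here is a minimal sketch of this kind of fused forward kernel, with one thread owning one query row. It is not the repo's exact code; the kernel name, the row-major $N \times d$ layout, and the `d <= 64` bound are assumptions made for the sketch:

```cuda
#include <math.h>

// Fused attention forward pass via online softmax: one thread per query row.
// Q, K, V, O are row-major [N, d] for a single head. Assumes d <= 64.
__global__ void flash_forward(const float* Q, const float* K, const float* V,
                              float* O, int N, int d) {
    int q = blockIdx.x * blockDim.x + threadIdx.x;  // query row this thread owns
    if (q >= N) return;

    float m = -INFINITY;            // running max of scores seen so far
    float l = 0.0f;                 // running softmax denominator
    float acc[64] = {0.0f};         // unnormalised output accumulator
    float scale = rsqrtf((float)d); // 1 / sqrt(d)

    for (int k = 0; k < N; ++k) {
        float s = 0.0f;             // score s_j = q . k_j / sqrt(d)
        for (int i = 0; i < d; ++i) s += Q[q * d + i] * K[k * d + i];
        s *= scale;

        float m_new = fmaxf(m, s);
        float corr  = expf(m - m_new);  // rescales previous state to the new max
        float p     = expf(s - m_new);  // weight of the current key
        l = l * corr + p;
        for (int i = 0; i < d; ++i)
            acc[i] = acc[i] * corr + p * V[k * d + i];
        m = m_new;
    }
    for (int i = 0; i < d; ++i)
        O[q * d + i] = acc[i] / l;  // final normalisation by the denominator
}
```

A launch like `flash_forward<<<(N + 255) / 256, 256>>>(dQ, dK, dV, dO, N, d);` covers every row. The real Flash Attention kernel goes further, tiling K and V through shared memory so each tile is read from HBM only once, but the rescaling logic above is the same.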

---

### _Running_

[Colab Demo](https://colab.research.google.com/drive/1qgFiS23-pCNx7MiHt5-Xycm-GdlBJ52R#scrollTo=zn9U4xkHiWzI)

---

### _Citation_

```bibtex
@misc{dao2022flashattentionfastmemoryefficientexact,
  title={FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness},
  author={Tri Dao and Daniel Y. Fu and Stefano Ermon and Atri Rudra and Christopher Ré},
  year={2022},
  eprint={2205.14135},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2205.14135},
}
```