https://github.com/jepeake/tiny-flash-attention
flash attention in ~20 lines
- Host: GitHub
- URL: https://github.com/jepeake/tiny-flash-attention
- Owner: jepeake
- Created: 2024-12-29T17:16:30.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-01-27T19:12:35.000Z (3 months ago)
- Last Synced: 2025-03-26T04:41:33.555Z (28 days ago)
- Language: Cuda
- Homepage:
- Size: 16.6 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-cuda-and-hpc - jepeake/tiny-flash-attention : flash attention in ~20 lines. (Frameworks)
README
# _tiny-flash-attention_
**[Flash Attention](https://github.com/Dao-AILab/flash-attention)** is a fast, memory-efficient exact attention algorithm that fuses the attention computation into a single kernel and avoids materialising the full attention matrix in GPU memory.
**Tiny Flash Attention** is a minimal implementation that expresses the forward pass in ~20 lines of CUDA.
---
### _Algorithm_

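The core idea is the online (streaming) softmax: the kernel walks over the keys and values, keeping a running row maximum `m`, a running softmax denominator `l`, and an unnormalised output accumulator, and rescales the old state whenever a larger score appears. This way the N×N score matrix is never written to memory. The sketch below is a minimal, single-head illustration of that recurrence, not necessarily the repo's actual kernel: it assumes one thread per query row, row-major `float` inputs of shape `[N, D]`, and a compile-time head dimension `D` small enough to keep the accumulator in per-thread local memory.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// One thread per query row: stream over all keys, keeping a running
// row-max (m), running denominator (l), and an output accumulator (acc),
// so the N x N score matrix is never materialised.
// D is the head dimension, assumed small enough for a per-thread array.
template <int D>
__global__ void flash_attention_forward(const float* Q, const float* K,
                                        const float* V, float* O, int N) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // query row handled by this thread
    if (i >= N) return;

    float m = -INFINITY;    // running max of scores seen so far
    float l = 0.0f;         // running softmax denominator
    float acc[D] = {0.0f};  // running (unnormalised) weighted sum of V rows
    const float scale = rsqrtf((float)D);

    for (int j = 0; j < N; ++j) {
        // s = (q_i . k_j) / sqrt(D)
        float s = 0.0f;
        for (int k = 0; k < D; ++k) s += Q[i * D + k] * K[j * D + k];
        s *= scale;

        // online softmax update: rescale the old state when a new max appears
        float m_new = fmaxf(m, s);
        float correction = __expf(m - m_new);
        float p = __expf(s - m_new);
        l = l * correction + p;
        for (int k = 0; k < D; ++k)
            acc[k] = acc[k] * correction + p * V[j * D + k];
        m = m_new;
    }

    // final normalisation by the softmax denominator
    for (int k = 0; k < D; ++k) O[i * D + k] = acc[k] / l;
}
```

The full Flash Attention kernel additionally tiles K/V through shared memory and parallelises each query block across a thread block, but the rescale-and-accumulate update is the same.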
---
### _Running_
[Colab Demo](https://colab.research.google.com/drive/1qgFiS23-pCNx7MiHt5-Xycm-GdlBJ52R#scrollTo=zn9U4xkHiWzI)
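For a quick local run (a hedged sketch; the file name, shapes, and host code below are assumptions, and the Colab notebook may do this differently), the kernel above and a small driver can live in a single `.cu` file built with `nvcc`:

```cuda
// hypothetical driver: compile together with the kernel above, e.g.
//   nvcc -O2 tiny_flash_attention.cu -o tiny_flash_attention
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

int main() {
    const int N = 256, D = 64;  // assumed sequence length / head dimension
    const size_t bytes = (size_t)N * D * sizeof(float);

    std::vector<float> hQ(N * D), hK(N * D), hV(N * D), hO(N * D);
    for (auto* h : {&hQ, &hK, &hV})
        for (float& x : *h) x = (float)rand() / RAND_MAX - 0.5f;  // random test data

    float *Q, *K, *V, *O;
    cudaMalloc(&Q, bytes); cudaMalloc(&K, bytes);
    cudaMalloc(&V, bytes); cudaMalloc(&O, bytes);
    cudaMemcpy(Q, hQ.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(K, hK.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(V, hV.data(), bytes, cudaMemcpyHostToDevice);

    // one thread per query row
    const int threads = 128;
    const int blocks = (N + threads - 1) / threads;
    flash_attention_forward<D><<<blocks, threads>>>(Q, K, V, O, N);
    cudaMemcpy(hO.data(), O, bytes, cudaMemcpyDeviceToHost);

    printf("O[0][0] = %f\n", hO[0]);  // sanity-check against a CPU reference
    cudaFree(Q); cudaFree(K); cudaFree(V); cudaFree(O);
    return 0;
}
```

On small `N`, a CPU reference that materialises the full softmax is a convenient way to verify the output.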
---
### _Citation_
```bibtex
@misc{dao2022flashattentionfastmemoryefficientexact,
  title={FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness},
  author={Tri Dao and Daniel Y. Fu and Stefano Ermon and Atri Rudra and Christopher Ré},
  year={2022},
  eprint={2205.14135},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2205.14135},
}
```