https://github.com/jepeake/tiny-flash-attention
flash attention in ~20 lines
- Host: GitHub
- URL: https://github.com/jepeake/tiny-flash-attention
- Owner: jepeake
- Created: 2024-12-29T17:16:30.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-01-27T19:12:35.000Z (3 months ago)
- Last Synced: 2025-03-26T04:41:33.555Z (28 days ago)
- Language: Cuda
- Homepage:
- Size: 16.6 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-cuda-and-hpc - jepeake/tiny-flash-attention : flash attention in ~20 lines. (Frameworks)
README
# _tiny-flash-attention_
**[Flash Attention](https://github.com/Dao-AILab/flash-attention)** is a fast, memory-efficient exact attention algorithm that fuses the attention computation into a single kernel and avoids materialising the full attention matrix in GPU memory.
**Tiny Flash Attention** is a minimal implementation that expresses the forward pass in ~20 lines of CUDA.
---
### _Algorithm_

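The core idea is the online (streaming) softmax: the kernel walks over the keys and values, keeping a running row maximum `m`, a running softmax denominator `l`, and an unnormalised output accumulator, and rescales the old state whenever a larger score appears. This way the N×N score matrix is never written to memory. The sketch below is a minimal, single-head illustration of that recurrence, not necessarily the repo's actual kernel: it assumes one thread per query row, row-major `float` inputs of shape `[N, D]`, and a compile-time head dimension `D` small enough to keep the accumulator in per-thread local memory.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// One thread per query row: stream over all keys, keeping a running
// row-max (m), running denominator (l), and an output accumulator (acc),
// so the N x N score matrix is never materialised.
// D is the head dimension, assumed small enough for a per-thread array.
template <int D>
__global__ void flash_attention_forward(const float* Q, const float* K,
                                        const float* V, float* O, int N) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // query row handled by this thread
    if (i >= N) return;

    float m = -INFINITY;    // running max of scores seen so far
    float l = 0.0f;         // running softmax denominator
    float acc[D] = {0.0f};  // running (unnormalised) weighted sum of V rows
    const float scale = rsqrtf((float)D);

    for (int j = 0; j < N; ++j) {
        // s = (q_i . k_j) / sqrt(D)
        float s = 0.0f;
        for (int k = 0; k < D; ++k) s += Q[i * D + k] * K[j * D + k];
        s *= scale;

        // online softmax update: rescale the old state when a new max appears
        float m_new = fmaxf(m, s);
        float correction = __expf(m - m_new);
        float p = __expf(s - m_new);
        l = l * correction + p;
        for (int k = 0; k < D; ++k)
            acc[k] = acc[k] * correction + p * V[j * D + k];
        m = m_new;
    }

    // final normalisation by the softmax denominator
    for (int k = 0; k < D; ++k) O[i * D + k] = acc[k] / l;
}
```

The full Flash Attention kernel additionally tiles K/V through shared memory and parallelises each query block across a thread block, but the rescale-and-accumulate update is the same.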
---
### _Running_
[Colab Demo](https://colab.research.google.com/drive/1qgFiS23-pCNx7MiHt5-Xycm-GdlBJ52R#scrollTo=zn9U4xkHiWzI)
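For a quick local run (a hedged sketch; the file name, shapes, and host code below are assumptions, and the Colab notebook may do this differently), the kernel above and a small driver can live in a single `.cu` file built with `nvcc`:

```cuda
// hypothetical driver: compile together with the kernel above, e.g.
//   nvcc -O2 tiny_flash_attention.cu -o tiny_flash_attention
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

int main() {
    const int N = 256, D = 64;  // assumed sequence length / head dimension
    const size_t bytes = (size_t)N * D * sizeof(float);

    std::vector<float> hQ(N * D), hK(N * D), hV(N * D), hO(N * D);
    for (auto* h : {&hQ, &hK, &hV})
        for (float& x : *h) x = (float)rand() / RAND_MAX - 0.5f;  // random test data

    float *Q, *K, *V, *O;
    cudaMalloc(&Q, bytes); cudaMalloc(&K, bytes);
    cudaMalloc(&V, bytes); cudaMalloc(&O, bytes);
    cudaMemcpy(Q, hQ.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(K, hK.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(V, hV.data(), bytes, cudaMemcpyHostToDevice);

    // one thread per query row
    const int threads = 128;
    const int blocks = (N + threads - 1) / threads;
    flash_attention_forward<D><<<blocks, threads>>>(Q, K, V, O, N);
    cudaMemcpy(hO.data(), O, bytes, cudaMemcpyDeviceToHost);

    printf("O[0][0] = %f\n", hO[0]);  // sanity-check against a CPU reference
    cudaFree(Q); cudaFree(K); cudaFree(V); cudaFree(O);
    return 0;
}
```

On small `N`, a CPU reference that materialises the full softmax is a convenient way to verify the output.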
---
### _Citation_
```bibtex
@misc{dao2022flashattentionfastmemoryefficientexact,
  title={FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness},
  author={Tri Dao and Daniel Y. Fu and Stefano Ermon and Atri Rudra and Christopher Ré},
  year={2022},
  eprint={2205.14135},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2205.14135},
}
```