Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Basic Flash Attention Implementation
https://github.com/weiyu0824/flash-attention-lite
attention cuda torch
JSON representation
- Host: GitHub
- URL: https://github.com/weiyu0824/flash-attention-lite
- Owner: weiyu0824
- Created: 2024-07-31T19:26:41.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-08-25T21:18:01.000Z (3 months ago)
- Last Synced: 2024-08-25T23:02:57.830Z (3 months ago)
- Topics: attention, cuda, torch
- Language: Cuda
- Homepage:
- Size: 43 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Flash Attention Lite
This repository provides a very basic implementation of FlashAttention-1, including both the forward and backward passes. The goal of this project is to deepen my understanding of the FlashAttention mechanism.

## Prerequisites
- CUDA / CUDA Toolkit
- Torch

## Run Code
- Check out the `kernels/` directory to review the minimal implementation.
- To compile the C++/CUDA extension (see the build sketch after this list), run:
```
bash local_build.sh
```
- To test flash-attention (see the verification sketch after this list), run:
```
python3 main.py
```
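The contents of `local_build.sh` are not shown here; as a point of reference, Torch C++/CUDA extensions are commonly built with setuptools via `torch.utils.cpp_extension`. The `setup.py` below is a hypothetical sketch with assumed file names, not the repository's actual build script.

```
# Hypothetical setup.py for a Torch CUDA extension; the source file
# names are assumptions, and the real build steps live in local_build.sh.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="flash_attention_lite",
    ext_modules=[
        CUDAExtension(
            name="flash_attention_lite",
            sources=[
                "kernels/flash_attention.cpp",  # assumed C++ binding file
                "kernels/flash_attention.cu",   # assumed CUDA kernel file
            ],
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```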
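A simple way to verify a kernel like this is to compare its output against PyTorch's built-in scaled dot-product attention. The module and function names below are assumptions for illustration, not the actual API exposed by `main.py`.

```
# Hypothetical correctness check; "flash_attention_lite" and its
# forward() entry point are assumed names, not the repository's API.
import torch
import flash_attention_lite  # the compiled extension (assumed name)

B, H, N, d = 1, 8, 128, 64
q = torch.randn(B, H, N, d, device="cuda", dtype=torch.float32)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attention_lite.forward(q, k, v)  # assumed entry point
ref = torch.nn.functional.scaled_dot_product_attention(q, k, v)
print("max abs diff:", (out - ref).abs().max().item())
```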
## Details
- Forward pass: Implements the dynamic block sizes described in the FlashAttention paper. The block sizes for rows and columns differ, and the number of threads per block equals the row block size (see the sketch after this list).
- Backward pass: Uses a fixed block size of 32 for both rows and columns, with the number of threads per block also set to 32 for simplicity. The author mentioned this simplification in [this issue](https://github.com/Dao-AILab/flash-attention/issues/618).
- Data Type: All tensors are fixed to the float data type for simplicity.
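For reference, the following is a minimal PyTorch sketch of the tiled forward pass with online softmax from the FlashAttention-1 paper (Algorithm 1). It illustrates the algorithm the CUDA kernel implements rather than mirroring the repository's code, and the SRAM budget `M_sram` is an assumed parameter.

```
import math
import torch

def flash_attention_forward(Q, K, V, M_sram=65536):
    """Tiled attention forward pass (FlashAttention-1), for 2D (N, d) inputs."""
    N, d = Q.shape
    # Block sizes from the paper: Bc = ceil(M / 4d), Br = min(Bc, d)
    Bc = math.ceil(M_sram / (4 * d))
    Br = min(Bc, d)
    scale = 1.0 / math.sqrt(d)

    O = torch.zeros_like(Q)
    l = torch.zeros(N, device=Q.device)                   # running softmax denominators
    m = torch.full((N,), -float("inf"), device=Q.device)  # running row maxima

    for j in range(0, N, Bc):      # outer loop over K/V blocks
        Kj, Vj = K[j:j + Bc], V[j:j + Bc]
        for i in range(0, N, Br):  # inner loop over Q blocks
            Qi = Q[i:i + Br]
            S = scale * Qi @ Kj.T           # (Br, Bc) score tile
            m_tilde = S.max(dim=1).values   # per-row max of this tile
            P = torch.exp(S - m_tilde[:, None])
            l_tilde = P.sum(dim=1)

            m_new = torch.maximum(m[i:i + Br], m_tilde)
            l_new = (torch.exp(m[i:i + Br] - m_new) * l[i:i + Br]
                     + torch.exp(m_tilde - m_new) * l_tilde)
            # Rescale the old partial output, add this tile's contribution, renormalize.
            O[i:i + Br] = ((l[i:i + Br] * torch.exp(m[i:i + Br] - m_new))[:, None] * O[i:i + Br]
                           + torch.exp(m_tilde - m_new)[:, None] * (P @ Vj)) / l_new[:, None]
            m[i:i + Br], l[i:i + Br] = m_new, l_new
    return O
```

Because each tile's statistics are folded into the running maxima `m` and denominators `l`, the full N×N score matrix is never materialized, which is the core memory saving of FlashAttention.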
## Reference
- https://github.com/tspeterkim/flash-attention-minimal