Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Basic Flash Attention Implementation
https://github.com/weiyu0824/flash-attention-lite
attention cuda torch
JSON representation
- Host: GitHub
- URL: https://github.com/weiyu0824/flash-attention-lite
- Owner: weiyu0824
- Created: 2024-07-31T19:26:41.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-08-25T21:18:01.000Z (3 months ago)
- Last Synced: 2024-08-25T23:02:57.830Z (3 months ago)
- Topics: attention, cuda, torch
- Language: Cuda
- Homepage:
- Size: 43 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Flash Attention Lite
This repository provides a very basic implementation of FlashAttention-1, including both the forward and backward passes. The goal of this project is to deepen my understanding of the FlashAttention mechanism.

## Prerequisites
- CUDA / CUDA Toolkit
- Torch

## Run Code
- Check out the `kernels/` directory to review the minimal implementation.
- To compile the C++/CUDA extension (see the build sketch after this list), run:
```
bash local_build.sh
```
- To test flash-attention (see the verification sketch after this list), run:
```
python3 main.py
```
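The contents of `local_build.sh` are not shown here; as a point of reference, Torch C++/CUDA extensions are commonly built with setuptools via `torch.utils.cpp_extension`. The `setup.py` below is a hypothetical sketch with assumed file names, not the repository's actual build script.

```
# Hypothetical setup.py for a Torch CUDA extension; the source file
# names are assumptions, and the real build steps live in local_build.sh.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="flash_attention_lite",
    ext_modules=[
        CUDAExtension(
            name="flash_attention_lite",
            sources=[
                "kernels/flash_attention.cpp",  # assumed C++ binding file
                "kernels/flash_attention.cu",   # assumed CUDA kernel file
            ],
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```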
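A simple way to verify a kernel like this is to compare its output against PyTorch's built-in scaled dot-product attention. The module and function names below are assumptions for illustration, not the actual API exposed by `main.py`.

```
# Hypothetical correctness check; "flash_attention_lite" and its
# forward() entry point are assumed names, not the repository's API.
import torch
import flash_attention_lite  # the compiled extension (assumed name)

B, H, N, d = 1, 8, 128, 64
q = torch.randn(B, H, N, d, device="cuda", dtype=torch.float32)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attention_lite.forward(q, k, v)  # assumed entry point
ref = torch.nn.functional.scaled_dot_product_attention(q, k, v)
print("max abs diff:", (out - ref).abs().max().item())
```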
## Details
- Forward pass: Implements the dynamic block sizes described in the FlashAttention paper. The block sizes for rows and columns differ, and the number of threads per block equals the row block size (see the sketch after this list).
- Backward pass: Uses a fixed block size of 32 for both rows and columns, with the number of threads per block also set to 32 for simplicity. The author mentioned this simplification in [this issue](https://github.com/Dao-AILab/flash-attention/issues/618).
- Data Type: All tensors are fixed to the float data type for simplicity.
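For reference, the following is a minimal PyTorch sketch of the tiled forward pass with online softmax from the FlashAttention-1 paper (Algorithm 1). It illustrates the algorithm the CUDA kernel implements rather than mirroring the repository's code, and the SRAM budget `M_sram` is an assumed parameter.

```
import math
import torch

def flash_attention_forward(Q, K, V, M_sram=65536):
    """Tiled attention forward pass (FlashAttention-1), for 2D (N, d) inputs."""
    N, d = Q.shape
    # Block sizes from the paper: Bc = ceil(M / 4d), Br = min(Bc, d)
    Bc = math.ceil(M_sram / (4 * d))
    Br = min(Bc, d)
    scale = 1.0 / math.sqrt(d)

    O = torch.zeros_like(Q)
    l = torch.zeros(N, device=Q.device)                   # running softmax denominators
    m = torch.full((N,), -float("inf"), device=Q.device)  # running row maxima

    for j in range(0, N, Bc):      # outer loop over K/V blocks
        Kj, Vj = K[j:j + Bc], V[j:j + Bc]
        for i in range(0, N, Br):  # inner loop over Q blocks
            Qi = Q[i:i + Br]
            S = scale * Qi @ Kj.T           # (Br, Bc) score tile
            m_tilde = S.max(dim=1).values   # per-row max of this tile
            P = torch.exp(S - m_tilde[:, None])
            l_tilde = P.sum(dim=1)

            m_new = torch.maximum(m[i:i + Br], m_tilde)
            l_new = (torch.exp(m[i:i + Br] - m_new) * l[i:i + Br]
                     + torch.exp(m_tilde - m_new) * l_tilde)
            # Rescale the old partial output, add this tile's contribution, renormalize.
            O[i:i + Br] = ((l[i:i + Br] * torch.exp(m[i:i + Br] - m_new))[:, None] * O[i:i + Br]
                           + torch.exp(m_tilde - m_new)[:, None] * (P @ Vj)) / l_new[:, None]
            m[i:i + Br], l[i:i + Br] = m_new, l_new
    return O
```

Because each tile's statistics are folded into the running maxima `m` and denominators `l`, the full N×N score matrix is never materialized, which is the core memory saving of FlashAttention.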
## Reference
- https://github.com/tspeterkim/flash-attention-minimal