https://github.com/DefTruth/ffpa-attn-mma
📚[WIP] FFPA: Yet another Faster Flash Prefill Attention with O(1)⚡️GPU SRAM complexity for headdim > 256, 1.8x~3x↑🎉 faster vs SDPA EA.
attention cuda flash-attention mlsys sdpa tensor-cores
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/DefTruth/ffpa-attn-mma
- Owner: DefTruth
- License: gpl-3.0
- Created: 2024-11-29T11:47:23.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-01-20T07:20:27.000Z (10 months ago)
- Last Synced: 2025-01-20T08:26:05.808Z (10 months ago)
- Topics: attention, cuda, flash-attention, mlsys, sdpa, tensor-cores
- Language: Cuda
- Homepage:
- Size: 4.08 MB
- Stars: 53
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-LLM-Inference - **FFPA**: [[ffpa-attn-mma]](https://github.com/DefTruth/ffpa-attn-mma) ⭐️⭐️ (📖Contents / 📖IO/FLOPs-Aware/Sparse Attention)
README
# Notes 👇👇
This project has been moved to [xlite-dev/ffpa-attn-mma](https://github.com/xlite-dev/ffpa-attn-mma); please check there for the latest updates! 👏👋
---
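For context, the "SDPA EA" baseline in the description refers to PyTorch's `scaled_dot_product_attention` running with the memory-efficient (efficient attention) backend, which is the usual fallback when head dims exceed what flash attention kernels support. Below is a minimal sketch of that baseline only; the shapes are illustrative and FFPA's own Python API is not shown here.

```python
# Hedged sketch of the SDPA "EA" (memory-efficient) baseline that FFPA's
# 1.8x~3x speedup claim is measured against. Requires PyTorch 2.x + CUDA.
import torch
import torch.nn.functional as F

B, H, N, D = 1, 8, 4096, 512  # batch, heads, seq len, head dim (> 256, FFPA's target regime)
q = torch.randn(B, H, N, D, device="cuda", dtype=torch.half)
k = torch.randn(B, H, N, D, device="cuda", dtype=torch.half)
v = torch.randn(B, H, N, D, device="cuda", dtype=torch.half)

# Restrict SDPA to the memory-efficient backend ("SDPA EA"); the flash
# backend typically rejects head dims this large, which is the gap FFPA fills.
with torch.backends.cuda.sdp_kernel(enable_flash=False,
                                    enable_math=False,
                                    enable_mem_efficient=True):
    out = F.scaled_dot_product_attention(q, k, v)
```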