https://github.com/torotoki/simple-paged-attention
A simple implementation of PagedAttention purely written in CUDA and C++.
attention cpp cuda llm transformer
- Host: GitHub
- URL: https://github.com/torotoki/simple-paged-attention
- Owner: torotoki
- License: mit
- Created: 2025-06-30T19:10:46.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-08-09T10:18:29.000Z (11 months ago)
- Last Synced: 2025-08-09T12:16:06.161Z (11 months ago)
- Topics: attention, cpp, cuda, llm, transformer
- Language: Cuda
- Homepage:
- Size: 44.9 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# simple-paged-attention
This is a CUDA and C++ implementation of PagedAttention.
The repo contains five attention implementations. The table shows which of them are available with and without a key-value (KV) cache:
| Method | Without KV cache | With KV cache |
|----------------------------------|:----------:|:----------:|
| Standard causal attention on CPU | ✅ | - |
| Standard causal attention on GPU | ✅ | - |
| Autoregressive attention (as used in inference) on CPU | ✅ | ✅ |
| Autoregressive attention (as used in inference) on GPU | ✅ | ✅ |
| PagedAttention on GPU | - | 🚧 |

✅ implemented · 🚧 work in progress · - not applicable
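PagedAttention, introduced with vLLM, stores the KV cache in fixed-size blocks that need not be contiguous in memory; a per-sequence block table maps each logical block to a physical block. The C++ sketch below illustrates only that idea, on the CPU, for one sequence and one head. It is not this repo's (still in-progress) GPU kernel, and the names `PagedKVCache` and `paged_attention`, as well as the pool layout, are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical paged KV cache for a single sequence and a single head.
// Keys/values live in a pool of fixed-size physical blocks; block_table maps
// logical block i of the sequence to a physical block id.
struct PagedKVCache {
    std::size_t block_size;        // tokens per block
    std::size_t head_dim;          // length of each key/value vector
    std::vector<float> k_pool;     // [num_blocks][block_size][head_dim]
    std::vector<float> v_pool;     // same layout as k_pool
    std::vector<int> free_blocks;  // physical blocks not yet in use
    std::vector<int> block_table;  // logical block -> physical block
    std::size_t num_tokens = 0;

    PagedKVCache(std::size_t num_blocks, std::size_t block_size_, std::size_t head_dim_)
        : block_size(block_size_), head_dim(head_dim_),
          k_pool(num_blocks * block_size_ * head_dim_),
          v_pool(num_blocks * block_size_ * head_dim_) {
        // Blocks are popped from the back, so logical and physical order differ.
        for (std::size_t b = 0; b < num_blocks; ++b) free_blocks.push_back(static_cast<int>(b));
    }

    // Offset of token t's vector inside a pool, resolved through the block table.
    std::size_t offset(std::size_t t) const {
        std::size_t physical = static_cast<std::size_t>(block_table[t / block_size]);
        return (physical * block_size + t % block_size) * head_dim;
    }

    // Append one token's key/value vectors, allocating a new block when the last one is full.
    void append(const std::vector<float>& k, const std::vector<float>& v) {
        assert(k.size() == head_dim && v.size() == head_dim);
        if (num_tokens % block_size == 0) {
            if (free_blocks.empty()) throw std::runtime_error("no free KV blocks");
            block_table.push_back(free_blocks.back());
            free_blocks.pop_back();
        }
        std::size_t off = offset(num_tokens);
        for (std::size_t d = 0; d < head_dim; ++d) {
            k_pool[off + d] = k[d];
            v_pool[off + d] = v[d];
        }
        ++num_tokens;
    }
};

// Scaled dot-product attention of one query over every cached token. The cache only
// holds past tokens and the current one, so the causal mask is implicit.
std::vector<float> paged_attention(const PagedKVCache& c, const std::vector<float>& q) {
    assert(q.size() == c.head_dim && c.num_tokens > 0);
    const float scale = 1.0f / std::sqrt(static_cast<float>(c.head_dim));
    std::vector<float> scores(c.num_tokens);
    float max_score = 0.0f;
    for (std::size_t t = 0; t < c.num_tokens; ++t) {
        const float* k = &c.k_pool[c.offset(t)];  // gather key through the block table
        float s = 0.0f;
        for (std::size_t d = 0; d < c.head_dim; ++d) s += q[d] * k[d];
        scores[t] = s * scale;
        if (t == 0 || scores[t] > max_score) max_score = scores[t];
    }
    float denom = 0.0f;  // numerically stable softmax: subtract the max before exp
    for (float& s : scores) { s = std::exp(s - max_score); denom += s; }
    std::vector<float> out(c.head_dim, 0.0f);
    for (std::size_t t = 0; t < c.num_tokens; ++t) {
        const float* v = &c.v_pool[c.offset(t)];  // gather value through the block table
        for (std::size_t d = 0; d < c.head_dim; ++d) out[d] += scores[t] / denom * v[d];
    }
    return out;
}
```

With a block size of 2, appending three tokens allocates two physical blocks. A zero query gives uniform weights, so the output is the mean of the cached values.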
## 📊 Benchmark Results
```
Command: attention_cpu
Averaged Time (msec): 3.42877
Command: attention_gpu
Averaged Time (msec): 1.26602
Command: attention_cpu_autoregressive
Enable KV cache: 0
Averaged Time (msec): 18.6311
Command: attention_cpu_autoregressive
Enable KV cache: 1
Averaged Time (msec): 3.65721
Command: attention_gpu_autoregressive
Enable KV cache: 0
Averaged Time (msec): 3.11079
Command: attention_gpu_autoregressive
Enable KV cache: 1
Averaged Time (msec): 2.88444
```
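The benchmark compares autoregressive decoding with the KV cache disabled (`Enable KV cache: 0`) and enabled (`1`). The C++ sketch below is illustrative and is not the repo's code. It shows where a KV cache generally saves work. Without a cache, step t recomputes the key and value projections of all t + 1 tokens seen so far, which totals n(n+1)/2 projection pairs over n steps. With a cache, only the new token's pair is computed and appended. The functions `decode`, `attend`, and `matvec`, the `DecodeResult` struct, and the toy weights are all hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;
using Mat = std::vector<Vec>;  // row-major: Mat[i] is row i

// y = W x for a small dense matrix.
Vec matvec(const Mat& w, const Vec& x) {
    Vec y(w.size(), 0.0f);
    for (std::size_t i = 0; i < w.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j) y[i] += w[i][j] * x[j];
    return y;
}

// Scaled dot-product attention of one query over the given keys/values.
Vec attend(const Vec& q, const Mat& keys, const Mat& values) {
    const float scale = 1.0f / std::sqrt(static_cast<float>(q.size()));
    Vec w(keys.size());
    float mx = 0.0f;
    for (std::size_t t = 0; t < keys.size(); ++t) {
        float s = 0.0f;
        for (std::size_t d = 0; d < q.size(); ++d) s += q[d] * keys[t][d];
        w[t] = s * scale;
        if (t == 0 || w[t] > mx) mx = w[t];
    }
    float denom = 0.0f;  // stable softmax
    for (float& s : w) { s = std::exp(s - mx); denom += s; }
    Vec out(values[0].size(), 0.0f);
    for (std::size_t t = 0; t < values.size(); ++t)
        for (std::size_t d = 0; d < out.size(); ++d) out[d] += w[t] / denom * values[t][d];
    return out;
}

struct DecodeResult {
    Mat outputs;                 // one attention output per decoding step
    std::size_t kv_projections;  // number of (K, V) projection pairs computed
};

// Step-by-step causal attention: at step t the query of token t attends to tokens 0..t.
DecodeResult decode(const Mat& x, const Mat& wq, const Mat& wk, const Mat& wv, bool use_cache) {
    DecodeResult r{{}, 0};
    Mat k_cache, v_cache;
    for (std::size_t t = 0; t < x.size(); ++t) {
        if (!use_cache) { k_cache.clear(); v_cache.clear(); }
        // Without a cache, K/V for tokens 0..t are recomputed; with one, only token t's.
        const std::size_t first = use_cache ? t : 0;
        for (std::size_t s = first; s <= t; ++s) {
            k_cache.push_back(matvec(wk, x[s]));
            v_cache.push_back(matvec(wv, x[s]));
            ++r.kv_projections;
        }
        r.outputs.push_back(attend(matvec(wq, x[t]), k_cache, v_cache));
    }
    return r;
}
```

Both modes produce the same outputs. With four tokens, the uncached run computes 1 + 2 + 3 + 4 = 10 projection pairs, while the cached run computes 4. The measured timings above also depend on costs this sketch does not model, so the counts should not be read as predicted speedups.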
## 📥 Get Started
Coming soon: installation, usage examples, and code walkthroughs.
Stay tuned and ⭐️ the repo to get updates!