https://github.com/torotoki/simple-paged-attention

A simple implementation of PagedAttention purely written in CUDA and C++.

Host: GitHub
URL: https://github.com/torotoki/simple-paged-attention
Owner: torotoki
License: MIT
Created: 2025-06-30T19:10:46.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-08-09T10:18:29.000Z (11 months ago)
Last Synced: 2025-08-09T12:16:06.161Z (11 months ago)
Topics: attention, cpp, cuda, llm, transformer
Language: Cuda
Homepage:
Size: 44.9 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# simple-paged-attention

This is a CUDA and C++ implementation of PagedAttention.

This repo contains five attention implementations, with and without a key-value (KV) cache, as follows:

| Method | Without KV cache | With KV cache |
|----------------------------------|:----------:|:----------:|
| Standard causal attention on CPU | ✅ | - |
| Standard causal attention on GPU | ✅ | - |
| Attention with autoregressive output (common in inference) on CPU | ✅ | ✅ |
| Attention with autoregressive output (common in inference) on GPU | ✅ | ✅ |
| PagedAttention on GPU | - | 🚧 |
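
PagedAttention stores the KV cache in fixed-size blocks and uses a per-sequence block table to map logical token positions to physical blocks. The following is a minimal CPU-side sketch of that bookkeeping in C++, not this repository's code: `PagedKVCache`, `Sequence`, `append`, `slot`, and the block size of 4 are hypothetical names and values chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All names below are hypothetical; they are not taken from the repository.
constexpr std::size_t kBlockSize = 4;  // tokens per physical block (assumed value)

struct PagedKVCache {
  std::size_t head_dim;
  std::vector<float> pool;               // num_blocks * kBlockSize * head_dim floats
  std::vector<std::size_t> free_blocks;  // stack of unused physical block ids

  PagedKVCache(std::size_t num_blocks, std::size_t dim)
      : head_dim(dim), pool(num_blocks * kBlockSize * dim, 0.0f) {
    // Push ids in reverse so that block 0 is handed out first.
    for (std::size_t b = num_blocks; b > 0; --b) free_blocks.push_back(b - 1);
  }
};

struct Sequence {
  std::vector<std::size_t> block_table;  // logical block index -> physical block id
  std::size_t length = 0;                // number of tokens stored so far
};

// Translate a logical token position into a pointer inside the physical pool.
float* slot(PagedKVCache& cache, const Sequence& seq, std::size_t pos) {
  assert(pos < seq.length);
  const std::size_t physical = seq.block_table[pos / kBlockSize];
  const std::size_t offset = pos % kBlockSize;
  return cache.pool.data() + (physical * kBlockSize + offset) * cache.head_dim;
}

// Store one token's vector. A new physical block is taken from the free list
// only when the sequence's last block is full. Returns false if the pool is exhausted.
bool append(PagedKVCache& cache, Sequence& seq, const std::vector<float>& vec) {
  assert(vec.size() == cache.head_dim);
  if (seq.length % kBlockSize == 0) {
    if (cache.free_blocks.empty()) return false;
    seq.block_table.push_back(cache.free_blocks.back());
    cache.free_blocks.pop_back();
  }
  ++seq.length;
  float* dst = slot(cache, seq, seq.length - 1);
  for (std::size_t i = 0; i < cache.head_dim; ++i) dst[i] = vec[i];
  return true;
}
```

In this sketch, several `Sequence` objects can share one `PagedKVCache`, and each sequence wastes at most one partially filled block. A GPU kernel would perform the same `pos / kBlockSize` and `pos % kBlockSize` lookup through a block table held in device memory.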

## 📊 Benchmark Results

```
Command: attention_cpu
Averaged Time (msec): 3.42877

Command: attention_gpu
Averaged Time (msec): 1.26602

Command: attention_cpu_autoregressive
Enable KV cache: 0
Averaged Time (msec): 18.6311

Command: attention_cpu_autoregressive
Enable KV cache: 1
Averaged Time (msec): 3.65721

Command: attention_gpu_autoregressive
Enable KV cache: 0
Averaged Time (msec): 3.11079

Command: attention_gpu_autoregressive
Enable KV cache: 1
Averaged Time (msec): 2.88444
```
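
In general, a KV cache keeps the key and value vectors of earlier tokens so that each autoregressive step only has to add the newest token's key and value before attending. The following single-head C++ sketch illustrates one such decode step. It is an illustration of the general technique, not the code that produced the timings above, and `KVCache`, `attend`, and `decode_step` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;

// Hypothetical cache type: one key and one value vector per past token.
struct KVCache {
  std::vector<Vec> keys;
  std::vector<Vec> values;
};

// Scaled dot-product attention of a single query over all given keys/values.
Vec attend(const Vec& q, const std::vector<Vec>& keys, const std::vector<Vec>& values) {
  assert(!keys.empty() && keys.size() == values.size());
  const float scale = 1.0f / std::sqrt(static_cast<float>(q.size()));

  // Raw scores: scaled dot product of the query with every key.
  std::vector<float> scores(keys.size());
  for (std::size_t i = 0; i < keys.size(); ++i) {
    float dot = 0.0f;
    for (std::size_t d = 0; d < q.size(); ++d) dot += q[d] * keys[i][d];
    scores[i] = dot * scale;
  }

  // Numerically stable softmax: subtract the maximum before exponentiating.
  const float max_score = *std::max_element(scores.begin(), scores.end());
  float denom = 0.0f;
  for (float& s : scores) {
    s = std::exp(s - max_score);
    denom += s;
  }

  // Weighted sum of the value vectors.
  Vec out(values[0].size(), 0.0f);
  for (std::size_t i = 0; i < values.size(); ++i)
    for (std::size_t d = 0; d < out.size(); ++d)
      out[d] += (scores[i] / denom) * values[i][d];
  return out;
}

// One decode step with a cache: append the new token's key and value once,
// then attend over everything stored so far. Causality holds because the
// cache only ever contains the current token and its predecessors.
Vec decode_step(KVCache& cache, const Vec& q, const Vec& k, const Vec& v) {
  cache.keys.push_back(k);
  cache.values.push_back(v);
  return attend(q, cache.keys, cache.values);
}
```

Without a cache, each step would have to rebuild the keys and values for the whole prefix before calling something like `attend`, so the work per generated token grows with the amount of recomputation rather than only with the attention itself.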

## 📥 Get Started

Coming soon: installation, usage examples, and code walkthroughs.

Stay tuned and ⭐️ the repo to stay updated!
