https://github.com/elinx/llm-mem-calculator

Interactive KV cache memory calculator for LLMs — supports MLA, GQA, hybrid attention, sliding window, and linear attention architectures. Estimate GPU memory for serving any model at any context length.

calculator gpu-memory kv-cache llm llm-serving vllm


Host: GitHub
URL: https://github.com/elinx/llm-mem-calculator
Owner: elinx
Created: 2026-06-06T12:49:48.000Z (about 1 month ago)
Default Branch: main
Last Pushed: 2026-06-06T15:01:13.000Z (about 1 month ago)
Last Synced: 2026-06-06T15:08:15.138Z (about 1 month ago)
Topics: calculator, gpu-memory, kv-cache, llm, llm-serving, vllm
Language: JavaScript
Homepage: https://elinx.github.io/llm-mem-calculator/
Size: 493 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# KV Cache Calculator

A web-based tool for estimating LLM KV cache memory requirements. Supports modern architectures including MLA, GQA, hybrid attention, sliding window, and linear attention models.

**Live Demo**: [elinx.github.io/llm-mem-calculator](https://elinx.github.io/llm-mem-calculator/)

## Calculator

Calculate KV cache size for a single model with customizable parameters — context length, batch size, KV precision, and more.

![Calculator](./assets/calculator.png)
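
The core arithmetic behind these parameters can be sketched as follows. This is a minimal, hypothetical sketch of the standard formula for GQA models. It assumes one cache entry per token per layer and ignores paging or fragmentation overhead. The function and field names are illustrative and are not taken from the calculator's code. The example uses the Llama 3.1 8B attention shape: 32 layers, 8 KV heads, and head dimension 128.

```javascript
// Minimal sketch (assumption): the textbook KV cache formula for a
// standard GQA model, not code taken from the calculator itself.

// Bytes of KV cache for a standard GQA (or MHA) model.
// The leading factor of 2 covers storing both keys (K) and values (V).
function gqaKvCacheBytes({ numLayers, numKvHeads, headDim, contextLength, batchSize, bytesPerElement }) {
  return 2 * numLayers * numKvHeads * headDim * contextLength * batchSize * bytesPerElement;
}

const GIB = 1024 ** 3;

// Llama 3.1 8B attention shape: 32 layers, 8 KV heads, head_dim 128.
const llamaBytes = gqaKvCacheBytes({
  numLayers: 32,
  numKvHeads: 8,
  headDim: 128,
  contextLength: 8192,
  batchSize: 1,
  bytesPerElement: 2, // BF16/FP16
});

console.log(`${(llamaBytes / GIB).toFixed(2)} GiB`); // prints "1.00 GiB"
```

Every factor multiplies, so doubling the context length or batch size doubles the cache. Moving from BF16 to FP8 halves it.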

## Compare

Compare KV cache memory across multiple models side-by-side with an interactive chart.

![Compare](./assets/compare.png)
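
A comparison chart amounts to evaluating each model's cache size over a shared set of context lengths. The sketch below is hypothetical. It describes each model by its KV bytes per token and scales that cost linearly with context length. This holds for full-attention layers, but not for sliding-window or linear-attention layers, whose cost is capped or constant. The two per-token figures follow the published Llama 3.1 8B and DeepSeek V3 attention shapes in BF16. The model labels and function name are illustrative.

```javascript
// Hypothetical sketch: tabulate KV cache size (GiB) for several models at
// shared context lengths, i.e. the data behind a side-by-side comparison chart.
// Assumes cost scales linearly with tokens (true only for full attention).

const models = [
  // 2 (K and V) × 32 layers × 8 KV heads × 128 head_dim × 2 bytes (BF16)
  { name: "Llama-3.1-8B (GQA)", bytesPerToken: 2 * 32 * 8 * 128 * 2 },
  // 61 layers × (512 latent + 64 RoPE key dims) × 2 bytes (BF16)
  { name: "DeepSeek-V3 (MLA)", bytesPerToken: 61 * (512 + 64) * 2 },
];

// One row per context length, with one GiB column per model.
function compareSeries(models, contextLengths, batchSize = 1) {
  return contextLengths.map((contextLength) => {
    const row = { contextLength };
    for (const m of models) {
      row[m.name] = (m.bytesPerToken * contextLength * batchSize) / 1024 ** 3; // GiB
    }
    return row;
  });
}

console.table(compareSeries(models, [4096, 32768, 131072]));
```

At 131,072 tokens, this gives 16 GiB for the GQA model and about 8.58 GiB for the MLA model, even though DeepSeek V3 is far larger in parameter count.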

## Supported Architectures

| Architecture | Example Models |
|---|---|
| Standard GQA (Grouped-Query Attention) | Qwen3, Llama 3.x, Qwen2.5, MiniMax M2.x |
| MLA (Multi-head Latent Attention) | DeepSeek V3, DeepSeek R1, Kimi K2.5/K2.6 |
| DSA+MLA (DeepSeek V4 Hybrid) | DeepSeek V4 Pro, DeepSeek V4 Flash, DeepSeek V3.2, GLM-5/5.1 |
| Mixed Full + Sliding Window | Gemma 4, Cohere Command, MiMo-V2.5 |
| Linear + Full Hybrid | Qwen3.5, Qwen3.6 |
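
The architecture families differ mainly in what each layer stores per token. The following hypothetical sketch shows one way to model three of the rows. The config field names are assumptions. The sliding-window and linear-hybrid numbers are illustrative and do not match any listed model's real configuration. The DSA+MLA row is not covered. Only the MLA example uses a published configuration: DeepSeek V3, with 61 layers, `kv_lora_rank` 512, and `qk_rope_head_dim` 64.

```javascript
// Hypothetical sketch of per-architecture KV cache formulas.
// Field names are illustrative assumptions, not the calculator's schema.

// MLA: each layer caches one compressed latent vector plus a shared
// decoupled RoPE key per token. There is no separate K and V, so no factor 2.
function mlaBytes({ numLayers, kvLoraRank, ropeHeadDim }, tokens, bytesPerElement) {
  return numLayers * (kvLoraRank + ropeHeadDim) * tokens * bytesPerElement;
}

// Mixed full + sliding window: sliding-window layers keep at most `windowSize`
// recent tokens, so their share stops growing once tokens exceed the window.
function mixedWindowBytes({ fullLayers, slidingLayers, windowSize, numKvHeads, headDim }, tokens, bytesPerElement) {
  const perTokenPerLayer = 2 * numKvHeads * headDim * bytesPerElement;
  return perTokenPerLayer * (fullLayers * tokens + slidingLayers * Math.min(tokens, windowSize));
}

// Linear + full hybrid: linear-attention layers hold a fixed-size recurrent
// state (given directly in bytes per layer), independent of context length.
function linearHybridBytes({ fullLayers, linearLayers, numKvHeads, headDim, linearStateBytesPerLayer }, tokens, bytesPerElement) {
  const fullKv = 2 * fullLayers * numKvHeads * headDim * tokens * bytesPerElement;
  return fullKv + linearLayers * linearStateBytesPerLayer;
}

// DeepSeek V3 in BF16, one token:
console.log(mlaBytes({ numLayers: 61, kvLoraRank: 512, ropeHeadDim: 64 }, 1, 2)); // 70272

// Illustrative (hypothetical) mixed model at 32K tokens in BF16:
const mixed = { fullLayers: 8, slidingLayers: 40, windowSize: 1024, numKvHeads: 8, headDim: 128 };
console.log(mixedWindowBytes(mixed, 32768, 2) / 1024 ** 3); // 1.15625 (GiB)

// Illustrative (hypothetical) linear hybrid: the linear share is constant.
const hybrid = { fullLayers: 12, linearLayers: 36, numKvHeads: 2, headDim: 256, linearStateBytesPerLayer: 1024 ** 2 };
console.log(linearHybridBytes(hybrid, 0, 2)); // 37748736 (linear state only)
```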

## Features

- **Precision options**: BF16/FP16, FP8/INT8, FP4/INT4
- **Draft KV cache**: Account for the KV layers of MTP (multi-token prediction) modules or draft models
- **Linear attention KV**: Include linear attention layer contributions
- **Context presets**: Quick-select from 1K to 1M tokens
- **Breakdown view**: Detailed per-layer KV cache breakdown
- **Formula display**: Shows the exact formula used for each model
- **Dark mode**: Toggle between light and dark themes
- **Chart export**: Download comparison charts as PNG or copy to clipboard
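
Several of these options map onto extra inputs to the formula. In this hypothetical sketch, precision becomes a bytes-per-element lookup. The context presets are assumed to be powers of two from 1,024 to 1,048,576 tokens. Draft/MTP layers are assumed to share the main model's per-layer shape. The calculator may model any of these differently. The returned object imitates a breakdown view and formula display for a standard GQA model. All names are illustrative.

```javascript
// Hypothetical sketch of the option handling described in the feature list.
// Option names, presets, and the modeling of draft layers are assumptions.

// Bytes per stored element for each precision option.
const BYTES_PER_ELEMENT = {
  "BF16/FP16": 2,
  "FP8/INT8": 1,
  "FP4/INT4": 0.5, // two 4-bit values per byte; ignores scale/zero-point overhead
};

// Context presets from 1K to 1M tokens, assuming powers of two (1K = 1024).
const CONTEXT_PRESETS = Array.from({ length: 11 }, (_, i) => 1024 * 2 ** i);

// Standard GQA cache plus optional draft/MTP layers, assumed here to have the
// same per-layer shape as the main model's layers.
function kvBreakdown({ numLayers, draftLayers = 0, numKvHeads, headDim }, contextLength, batchSize, precision) {
  const bytesPerElement = BYTES_PER_ELEMENT[precision];
  if (bytesPerElement === undefined) throw new Error(`unknown precision: ${precision}`);
  const perLayer = 2 * numKvHeads * headDim * contextLength * batchSize * bytesPerElement;
  return {
    formula: "2 × layers × kv_heads × head_dim × context × batch × bytes_per_element",
    perLayerBytes: perLayer,
    mainBytes: perLayer * numLayers,
    draftBytes: perLayer * draftLayers,
    totalBytes: perLayer * (numLayers + draftLayers),
  };
}

// Llama-3.1-8B-like shape with one hypothetical draft layer, 128K context, FP8.
const r = kvBreakdown({ numLayers: 32, draftLayers: 1, numKvHeads: 8, headDim: 128 }, 131072, 1, "FP8/INT8");
console.log(r.formula);
console.log(r.perLayerBytes / 1024 ** 2, "MiB per layer"); // 256
console.log(r.totalBytes / 1024 ** 3, "GiB total"); // 8.25
```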

## Development

No build step required — just open `index.html` in a browser or serve the directory with any static file server.

```bash
# Quick local server: run from the repository root, then open http://localhost:8765/
python3 -m http.server 8765
```

## License

MIT
