https://github.com/massif-01/vllm_auto_tune

Automated vLLM server parameter tuning tool. Finds optimal max-num-seqs and max-num-batched-tokens to maximize throughput. Includes presets for Llama/Qwen/Mixtral, batch processing, result analysis with visualizations, and environment checks. Supports TPU/GPU with latency constraints.



Host: GitHub
URL: https://github.com/massif-01/vllm_auto_tune
Owner: massif-01
License: other
Created: 2025-10-30T10:21:57.000Z (8 months ago)
Default Branch: master
Last Pushed: 2025-10-30T10:26:06.000Z (8 months ago)
Last Synced: 2026-04-08T05:29:39.934Z (2 months ago)
Topics: auto-tuning, autotune, benchmark, optimization, performance-tuning, vllm
Language: Shell
Homepage: https://docs.vllm.ai
Size: 30.3 KB
Stars: 2
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# vLLM Auto-Tune

> 🌐 [English](README.md) | [中文](README_zh.md)

An automated parameter tuning tool for vLLM servers. It searches for the `max-num-seqs` and `max-num-batched-tokens` configuration that maximizes throughput for your vLLM deployment.

## Features

- 🚀 **Automated Parameter Tuning**: Automatically tests parameter combinations to find optimal throughput
- 📊 **Result Analysis**: Built-in tools to analyze and visualize tuning results
- 🎯 **Model Presets**: Pre-configured scripts for popular models (Llama, Qwen, Mixtral)
- 🔄 **Batch Processing**: Run multiple tuning experiments sequentially
- 📈 **Visualization**: Generate charts and comparisons of different configurations
- ✅ **Environment Check**: Pre-flight checks to ensure your environment is ready

## Quick Start

### 1. Environment Check

```bash
bash scripts/server_check.sh
```

### 2. Single Model Tuning

```bash
# Llama models
bash scripts/tune_llama.sh meta-llama/Llama-3.1-8B-Instruct 1

# Qwen models
bash scripts/tune_qwen.sh Qwen/Qwen2.5-7B-Instruct 1

# Mixtral models
bash scripts/tune_mixtral.sh mistralai/Mixtral-8x7B-Instruct-v0.1 2
```

### 3. Batch Tuning (Multiple Models)

```bash
bash batch_auto_tune.sh configs/example_batch_config.json
```

## Configuration

Key environment variables:

| Variable | Description | Default |
|----------|-------------|---------|
| `VLLM_DIR` | Path to vLLM installation | `$HOME/vllm` |
| `MODEL` | Hugging Face model identifier | `meta-llama/Llama-3.1-8B-Instruct` |
| `SYSTEM` | Hardware platform (`TPU` or `GPU`) | `TPU` |
| `TP` | Tensor parallelism size | `1` |
| `INPUT_LEN` | Request input length | `4000` |
| `OUTPUT_LEN` | Request output length | `16` |
| `MAX_MODEL_LEN` | Maximum model length | `4096` |
| `MAX_LATENCY_ALLOWED_MS` | Max allowed P99 latency (ms); the default is large enough to effectively disable the constraint | `100000000000` |
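
Before launching a long tuning run, it can help to confirm that the configured request lengths fit inside the model length. The sketch below is a hypothetical pre-flight helper, not part of the repository; the function name `lengths_fit` is an assumption, and the fallback values are simply the defaults from the table above.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of the repository).

# Return 0 if a request of input_len + output_len tokens fits in max_model_len.
lengths_fit() {
  local input_len=$1 output_len=$2 max_model_len=$3
  (( input_len + output_len <= max_model_len ))
}

# Fall back to the documented defaults when a variable is not set.
INPUT_LEN=${INPUT_LEN:-4000}
OUTPUT_LEN=${OUTPUT_LEN:-16}
MAX_MODEL_LEN=${MAX_MODEL_LEN:-4096}

if lengths_fit "$INPUT_LEN" "$OUTPUT_LEN" "$MAX_MODEL_LEN"; then
  echo "ok: $((INPUT_LEN + OUTPUT_LEN)) tokens fit in MAX_MODEL_LEN=$MAX_MODEL_LEN"
else
  echo "error: INPUT_LEN + OUTPUT_LEN exceeds MAX_MODEL_LEN=$MAX_MODEL_LEN" >&2
fi
```

Assuming `auto_tune.sh` reads these values from the environment, as the table heading suggests, a run with overrides might then look like `SYSTEM=GPU TP=2 INPUT_LEN=1800 OUTPUT_LEN=20 MAX_MODEL_LEN=2048 bash auto_tune.sh`.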

## Result Analysis

```bash
# Analyze single result
python tools/analyze_results.py $BASE/auto-benchmark/YYYY_MM_DD_HH_MM/result.txt

# Compare multiple results
python tools/analyze_results.py $BASE/auto-benchmark/ --compare
```

## Output

Results are saved in `$BASE/auto-benchmark/YYYY_MM_DD_HH_MM/`:
- `result.txt`: Summary of all tested configurations
- `vllm_log_*.txt`: Server logs for each configuration
- `bm_log_*.txt`: Benchmark logs
- `profile/`: Profiler traces for the best configuration
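
Because run directories are named `YYYY_MM_DD_HH_MM`, sorting them by name also sorts them by time. The sketch below uses that to find the most recent run; `latest_run` is a hypothetical helper, not something the repository provides.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print the newest run
# directory under the given root, relying on the timestamped directory names.
latest_run() {
  local root=$1 dir latest=""
  # Glob results are sorted by name, so the last directory seen is the newest.
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue
    latest=${dir%/}
  done
  # Print nothing and return non-zero when no run directory exists.
  [ -n "$latest" ] && printf '%s\n' "$latest"
}
```

For example, `python tools/analyze_results.py "$(latest_run "$BASE/auto-benchmark")/result.txt"` would analyze the latest run, assuming `$BASE` is set to the same base directory the tuning script used.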

## Project Structure

```
vllm_auto_tune/
├── README.md                      # English documentation
├── README_zh.md                   # Chinese documentation
├── auto_tune.sh                   # Main tuning script
├── batch_auto_tune.sh             # Batch processing script
├── scripts/                       # Helper scripts
│   ├── server_check.sh            # Environment check
│   ├── tune_llama.sh              # Llama preset
│   ├── tune_qwen.sh               # Qwen preset
│   └── tune_mixtral.sh            # Mixtral preset
├── configs/                       # Configuration files
│   ├── models.json                # Model presets
│   └── example_batch_config.json  # Batch config example
└── tools/                         # Analysis tools
    └── analyze_results.py         # Result analysis & visualization
```

## Prerequisites

- vLLM installed and accessible via the `vllm` command
- Conda environment activated (e.g., `conda activate vllm`)
- System tools: `bc`, `jq`, `curl`
- Optional: `matplotlib`, `pandas` for visualization
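
For a quick manual check of the command-line prerequisites, something like the following could be used. It is a minimal sketch with a hypothetical `missing_tools` helper and is not taken from `scripts/server_check.sh`, whose actual checks may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print each command from
# the argument list that cannot be found on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the tools listed above.
missing=$(missing_tools vllm bc jq curl)
if [ -n "$missing" ]; then
  # Join the newline-separated names with commas for a one-line message.
  echo "Missing required tools: ${missing//$'\n'/, }" >&2
else
  echo "All required tools found."
fi
```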

## License

Apache-2.0 License

## Contributing

Contributions are welcome! Please feel free to submit Issues and Pull Requests.

---

⭐ **If this project helps you, please give us a Star!**
