https://github.com/massif-01/vllm_auto_tune

Automated vLLM server parameter tuning tool. Finds optimal max-num-seqs and max-num-batched-tokens to maximize throughput. Includes presets for Llama/Qwen/Mixtral, batch processing, result analysis with visualizations, and environment checks. Supports TPU/GPU with latency constraints.

# vLLM Auto-Tune

> 🌐 [English](README.md) | [中文](README_zh.md)

An automated parameter tuning tool for vLLM servers: it finds the `max-num-seqs` and `max-num-batched-tokens` configuration that maximizes throughput for your vLLM deployment.

## Features

- 🚀 **Automated Parameter Tuning**: Automatically tests parameter combinations to find optimal throughput
- 📊 **Result Analysis**: Built-in tools to analyze and visualize tuning results
- 🎯 **Model Presets**: Pre-configured scripts for popular models (Llama, Qwen, Mixtral)
- 🔄 **Batch Processing**: Run multiple tuning experiments sequentially
- 📈 **Visualization**: Generate charts and comparisons of different configurations
- ✅ **Environment Check**: Pre-flight checks to ensure your environment is ready
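
To make the tuning idea concrete, here is a minimal Python sketch of a grid sweep over `max-num-seqs` and `max-num-batched-tokens` that keeps the highest-throughput combination within a P99 latency limit. It is an illustration only and not the logic of `auto_tune.sh`: `grid_search`, `Measurement`, `Candidate`, and `fake_benchmark` are hypothetical names, and the toy benchmark stands in for starting a real server and measuring it.

```python
from itertools import product
from typing import Callable, Iterable, NamedTuple, Optional


class Measurement(NamedTuple):
    throughput: float       # higher is better; the unit is up to the benchmark
    p99_latency_ms: float   # P99 latency in milliseconds


class Candidate(NamedTuple):
    max_num_seqs: int
    max_num_batched_tokens: int
    measurement: Measurement


def grid_search(
    seqs_values: Iterable[int],
    tokens_values: Iterable[int],
    run_benchmark: Callable[[int, int], Measurement],
    max_latency_allowed_ms: float,
) -> Optional[Candidate]:
    """Return the best candidate within the latency limit, or None if none qualifies."""
    best: Optional[Candidate] = None
    for seqs, tokens in product(seqs_values, tokens_values):
        result = run_benchmark(seqs, tokens)
        if result.p99_latency_ms > max_latency_allowed_ms:
            continue  # violates the latency constraint, so skip it
        if best is None or result.throughput > best.measurement.throughput:
            best = Candidate(seqs, tokens, result)
    return best


def fake_benchmark(max_num_seqs: int, max_num_batched_tokens: int) -> Measurement:
    """Toy stand-in (assumption): a real run would launch vLLM and measure it."""
    return Measurement(
        throughput=max_num_seqs * max_num_batched_tokens / 1000.0,
        p99_latency_ms=max_num_seqs * 10.0,
    )


if __name__ == "__main__":
    best = grid_search([128, 256], [2048, 4096], fake_benchmark,
                       max_latency_allowed_ms=2000)
    print(best)
```

Passing the benchmark as a callable keeps the search separate from how a configuration is measured, so the same loop works with a real measurement function.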

## Quick Start

### 1. Environment Check

```bash
bash scripts/server_check.sh
```

### 2. Single Model Tuning

```bash
# Llama models
bash scripts/tune_llama.sh meta-llama/Llama-3.1-8B-Instruct 1

# Qwen models
bash scripts/tune_qwen.sh Qwen/Qwen2.5-7B-Instruct 1

# Mixtral models
bash scripts/tune_mixtral.sh mistralai/Mixtral-8x7B-Instruct-v0.1 2
```

### 3. Batch Tuning (Multiple Models)

```bash
bash batch_auto_tune.sh configs/example_batch_config.json
```

## Configuration

Key environment variables:

| Variable | Description | Default |
|----------|-------------|---------|
| `VLLM_DIR` | Path to vLLM installation | `$HOME/vllm` |
| `MODEL` | Hugging Face model identifier | `meta-llama/Llama-3.1-8B-Instruct` |
| `SYSTEM` | Hardware platform (`TPU` or `GPU`) | `TPU` |
| `TP` | Tensor parallelism size | `1` |
| `INPUT_LEN` | Request input length (tokens) | `4000` |
| `OUTPUT_LEN` | Request output length (tokens) | `16` |
| `MAX_MODEL_LEN` | Maximum model context length (tokens) | `4096` |
| `MAX_LATENCY_ALLOWED_MS` | Max allowed P99 latency (ms); the default is large enough to act as no limit | `100000000000` |
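
As an illustration of how these variables and defaults fit together, the following Python sketch reads them from an environment mapping and applies the defaults from the table. `TuneConfig` and `load_config` are hypothetical names, not part of this project. The check that `INPUT_LEN + OUTPUT_LEN` fits within `MAX_MODEL_LEN` is an added assumption; the project's scripts may validate differently or not at all.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TuneConfig:
    vllm_dir: str
    model: str
    system: str
    tp: int
    input_len: int
    output_len: int
    max_model_len: int
    max_latency_allowed_ms: int


def load_config(env: Optional[Mapping[str, str]] = None) -> TuneConfig:
    """Build a config from environment variables, using the documented defaults."""
    if env is None:
        env = os.environ
    home = env.get("HOME", "")
    cfg = TuneConfig(
        vllm_dir=env.get("VLLM_DIR", f"{home}/vllm"),
        model=env.get("MODEL", "meta-llama/Llama-3.1-8B-Instruct"),
        system=env.get("SYSTEM", "TPU"),
        tp=int(env.get("TP", "1")),
        input_len=int(env.get("INPUT_LEN", "4000")),
        output_len=int(env.get("OUTPUT_LEN", "16")),
        max_model_len=int(env.get("MAX_MODEL_LEN", "4096")),
        max_latency_allowed_ms=int(env.get("MAX_LATENCY_ALLOWED_MS", "100000000000")),
    )
    if cfg.system not in ("TPU", "GPU"):
        raise ValueError(f"SYSTEM must be TPU or GPU, got {cfg.system!r}")
    # Assumed sanity check: a request must fit in the model's context window.
    if cfg.input_len + cfg.output_len > cfg.max_model_len:
        raise ValueError("INPUT_LEN + OUTPUT_LEN exceeds MAX_MODEL_LEN")
    return cfg


if __name__ == "__main__":
    print(load_config())
```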

## Result Analysis

```bash
# Analyze single result
python tools/analyze_results.py $BASE/auto-benchmark/YYYY_MM_DD_HH_MM/result.txt

# Compare multiple results
python tools/analyze_results.py $BASE/auto-benchmark/ --compare
```

## Output

Results are saved in `$BASE/auto-benchmark/YYYY_MM_DD_HH_MM/`:
- `result.txt`: Summary of all tested configurations
- `vllm_log_*.txt`: Server logs for each configuration
- `bm_log_*.txt`: Benchmark logs
- `profile/`: Profiler traces for the best configuration
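
Because each run directory is named by timestamp, the most recent run can be found by parsing the directory names. The Python sketch below assumes only the `YYYY_MM_DD_HH_MM` naming shown above; `parse_run_time` and `latest_run` are hypothetical helpers, not part of this project.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

RUN_DIR_FORMAT = "%Y_%m_%d_%H_%M"  # matches YYYY_MM_DD_HH_MM


def parse_run_time(name: str) -> Optional[datetime]:
    """Return the timestamp encoded in a run directory name, or None if it does not match."""
    try:
        return datetime.strptime(name, RUN_DIR_FORMAT)
    except ValueError:
        return None


def latest_run(benchmark_root: Path) -> Optional[Path]:
    """Return the newest timestamped run directory under benchmark_root, if any."""
    newest_time: Optional[datetime] = None
    newest_path: Optional[Path] = None
    for entry in benchmark_root.iterdir():
        if not entry.is_dir():
            continue
        run_time = parse_run_time(entry.name)
        if run_time is None:
            continue  # ignore directories that are not timestamped runs
        if newest_time is None or run_time > newest_time:
            newest_time, newest_path = run_time, entry
    return newest_path


if __name__ == "__main__":
    import sys

    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    run = latest_run(root)
    print(run / "result.txt" if run else "no runs found")
```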

## Project Structure

```
vllm_auto_tune/
├── README.md                      # English documentation
├── README_zh.md                   # Chinese documentation
├── auto_tune.sh                   # Main tuning script
├── batch_auto_tune.sh             # Batch processing script
├── scripts/                       # Helper scripts
│   ├── server_check.sh            # Environment check
│   ├── tune_llama.sh              # Llama preset
│   ├── tune_qwen.sh               # Qwen preset
│   └── tune_mixtral.sh            # Mixtral preset
├── configs/                       # Configuration files
│   ├── models.json                # Model presets
│   └── example_batch_config.json  # Batch config example
└── tools/                         # Analysis tools
    └── analyze_results.py         # Result analysis & visualization
```

## Prerequisites

- vLLM installed and accessible via `vllm` command
- Conda environment activated (e.g., `conda activate vllm`)
- System tools: `bc`, `jq`, `curl`
- Optional: `matplotlib`, `pandas` for visualization
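
The prerequisites above can also be checked from Python. This is a minimal sketch and does not reproduce what `scripts/server_check.sh` does; `missing_commands` and `missing_modules` are hypothetical helpers built on the standard library's `shutil.which` and `importlib.util.find_spec`.

```python
import importlib.util
import shutil
from typing import Callable, Iterable, List, Optional

REQUIRED_COMMANDS = ("vllm", "bc", "jq", "curl")
OPTIONAL_MODULES = ("matplotlib", "pandas")


def missing_commands(
    commands: Iterable[str] = REQUIRED_COMMANDS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the commands that cannot be found on PATH."""
    return [cmd for cmd in commands if which(cmd) is None]


def missing_modules(modules: Iterable[str] = OPTIONAL_MODULES) -> List[str]:
    """Return the top-level Python modules that are not importable."""
    return [mod for mod in modules if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    commands = missing_commands()
    modules = missing_modules()
    if commands:
        print("Missing required commands:", ", ".join(commands))
    if modules:
        print("Missing optional modules (visualization only):", ", ".join(modules))
    if not commands and not modules:
        print("All prerequisites found.")
```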

## License

Apache-2.0 License

## Contributing

Contributions are welcome! Please feel free to submit Issues and Pull Requests.

---

⭐ **If this project helps you, please give us a Star!**