https://github.com/massif-01/vllm_auto_tune
Automated vLLM server parameter tuning tool. Finds optimal max-num-seqs and max-num-batched-tokens to maximize throughput. Includes presets for Llama/Qwen/Mixtral, batch processing, result analysis with visualizations, and environment checks. Supports TPU/GPU with latency constraints.
- Host: GitHub
- URL: https://github.com/massif-01/vllm_auto_tune
- Owner: massif-01
- License: other
- Created: 2025-10-30T10:21:57.000Z (8 months ago)
- Default Branch: master
- Last Pushed: 2025-10-30T10:26:06.000Z (8 months ago)
- Last Synced: 2026-04-08T05:29:39.934Z (2 months ago)
- Topics: auto-tuning, autotune, benchmark, optimization, performance-tuning, vllm
- Language: Shell
- Homepage: https://docs.vllm.ai
- Size: 30.3 KB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# vLLM Auto-Tune
> [English](README.md) | [中文](README_zh.md)
Automated vLLM Server Parameter Tuning Tool - Find optimal `max-num-seqs` and `max-num-batched-tokens` configurations to maximize throughput for your vLLM deployment.
## Features
- **Automated Parameter Tuning**: Automatically tests parameter combinations to find optimal throughput
- **Result Analysis**: Built-in tools to analyze and visualize tuning results
- **Model Presets**: Pre-configured scripts for popular models (Llama, Qwen, Mixtral)
- **Batch Processing**: Run multiple tuning experiments sequentially
- **Visualization**: Generate charts and comparisons of different configurations
- **Environment Check**: Pre-flight checks to ensure your environment is ready
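The selection step behind automated tuning can be pictured with a small sketch. It is not the code in `auto_tune.sh`. The function name `pick_best` and the four-column input format (`max-num-seqs`, `max-num-batched-tokens`, throughput, P99 latency in ms) are assumptions made for illustration, and the sample rows are invented numbers, not measurements.

```bash
#!/usr/bin/env bash
# Hypothetical helper: from rows of
#   <max-num-seqs> <max-num-batched-tokens> <throughput> <p99_latency_ms>
# on stdin, print the row with the highest throughput whose P99 latency
# does not exceed the limit given as $1. Prints nothing if no row qualifies.
pick_best() {
  awk -v limit="$1" '
    BEGIN { best = -1 }
    $4 <= limit && $3 > best { best = $3; row = $0 }
    END { if (row != "") print row }
  '
}

# Invented sample rows, for illustration only.
printf '%s\n' \
  '128 2048 10.5 900' \
  '256 4096 14.2 1500' \
  '256 2048 12.0 950' | pick_best 1000
# prints: 256 2048 12.0 950
```

With a very large limit the same input yields the `256 4096` row instead, which shows how a latency constraint can change which configuration counts as optimal.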
## Quick Start
### 1. Environment Check
```bash
bash scripts/server_check.sh
```
### 2. Single Model Tuning
```bash
# Llama models
bash scripts/tune_llama.sh meta-llama/Llama-3.1-8B-Instruct 1
# Qwen models
bash scripts/tune_qwen.sh Qwen/Qwen2.5-7B-Instruct 1
# Mixtral models
bash scripts/tune_mixtral.sh mistralai/Mixtral-8x7B-Instruct-v0.1 2
```
### 3. Batch Tuning (Multiple Models)
```bash
bash batch_auto_tune.sh configs/example_batch_config.json
```
## Configuration
Key environment variables:
| Variable | Description | Default |
|----------|-------------|---------|
| `VLLM_DIR` | Path to vLLM installation | `$HOME/vllm` |
| `MODEL` | Hugging Face model identifier | `meta-llama/Llama-3.1-8B-Instruct` |
| `SYSTEM` | Hardware platform (`TPU` or `GPU`) | `TPU` |
| `TP` | Tensor parallelism size | `1` |
| `INPUT_LEN` | Request input length, in tokens | `4000` |
| `OUTPUT_LEN` | Request output length, in tokens | `16` |
| `MAX_MODEL_LEN` | Maximum model length (context window), in tokens | `4096` |
| `MAX_LATENCY_ALLOWED_MS` | Maximum allowed P99 latency in ms; the default is large enough to act as no limit | `100000000000` |
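As a rough sketch of how these variables might be used, the snippet below applies the documented defaults while letting values already set in the caller's environment take precedence. It assumes that `auto_tune.sh` reads its settings from the environment, which the table implies but the README does not show. The length check is an added sanity check, not a rule taken from the tool.

```bash
#!/usr/bin/env bash
# Apply the defaults from the table; a value already set by the caller wins.
: "${VLLM_DIR:=$HOME/vllm}"
: "${MODEL:=meta-llama/Llama-3.1-8B-Instruct}"
: "${SYSTEM:=TPU}"
: "${TP:=1}"
: "${INPUT_LEN:=4000}"
: "${OUTPUT_LEN:=16}"
: "${MAX_MODEL_LEN:=4096}"
: "${MAX_LATENCY_ALLOWED_MS:=100000000000}"

# Assumed sanity check: prompt plus output should fit in the model length.
if [ $((INPUT_LEN + OUTPUT_LEN)) -gt "$MAX_MODEL_LEN" ]; then
  echo "warning: INPUT_LEN + OUTPUT_LEN exceeds MAX_MODEL_LEN" >&2
fi

# Export so that a child process (for example the tuning script) can see them.
export VLLM_DIR MODEL SYSTEM TP INPUT_LEN OUTPUT_LEN MAX_MODEL_LEN MAX_LATENCY_ALLOWED_MS
echo "model=$MODEL system=$SYSTEM tp=$TP max_latency_ms=$MAX_LATENCY_ALLOWED_MS"
```

Under the same assumption, a one-off override would look like `SYSTEM=GPU TP=2 MAX_LATENCY_ALLOWED_MS=500 bash auto_tune.sh`.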
## Result Analysis
```bash
# Analyze single result
python tools/analyze_results.py $BASE/auto-benchmark/YYYY_MM_DD_HH_MM/result.txt
# Compare multiple results
python tools/analyze_results.py $BASE/auto-benchmark/ --compare
```
## Output
Results are saved in `$BASE/auto-benchmark/YYYY_MM_DD_HH_MM/`:
- `result.txt`: Summary of all tested configurations
- `vllm_log_*.txt`: Server logs for each configuration
- `bm_log_*.txt`: Benchmark logs
- `profile/`: Profiler traces for the best configuration
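Because each run gets its own timestamped directory, it can be handy to locate the newest one. The sketch below is one way to do that. The helper name `latest_run_dir` is hypothetical, and it assumes the directory names are zero-padded as in `YYYY_MM_DD_HH_MM`, so that sorting by name matches sorting by time.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the newest run directory under "$1/auto-benchmark".
# Returns a non-zero status if no run directory exists.
latest_run_dir() {
  local root="$1/auto-benchmark" dir latest=""
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue   # the glob matched nothing
    latest="${dir%/}"           # globs expand in sorted order; keep the last match
  done
  [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Example use (left commented out because BASE must point at real results):
# run="$(latest_run_dir "$BASE")" && cat "$run/result.txt"
```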
## Project Structure
```
vllm_auto_tune/
├── README.md                     # English documentation
├── README_zh.md                  # Chinese documentation
├── auto_tune.sh                  # Main tuning script
├── batch_auto_tune.sh            # Batch processing script
├── scripts/                      # Helper scripts
│   ├── server_check.sh           # Environment check
│   ├── tune_llama.sh             # Llama preset
│   ├── tune_qwen.sh              # Qwen preset
│   └── tune_mixtral.sh           # Mixtral preset
├── configs/                      # Configuration files
│   ├── models.json               # Model presets
│   └── example_batch_config.json # Batch config example
└── tools/                        # Analysis tools
    └── analyze_results.py        # Result analysis & visualization
```
## Prerequisites
- vLLM installed and accessible via `vllm` command
- Conda environment activated (e.g., `conda activate vllm`)
- System tools: `bc`, `jq`, `curl`
- Optional: `matplotlib`, `pandas` for visualization
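The command-line prerequisites above can be verified by hand with a few lines of shell. This is a minimal sketch, not the contents of `scripts/server_check.sh`, and the function name `missing_tools` is made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each named command that is not found on PATH,
# one per line. Prints nothing when every command is available.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the tools listed in the prerequisites and report any that are absent.
missing="$(missing_tools vllm bc jq curl)"
if [ -n "$missing" ]; then
  printf 'Missing required tools: %s\n' "$(printf '%s' "$missing" | tr '\n' ' ')" >&2
fi
```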
## License
Apache-2.0 License
## Contributing
Contributions are welcome! Please feel free to submit Issues and Pull Requests.
---
**If this project helps you, please give us a Star!**