{"id":49666733,"url":"https://github.com/massif-01/vllm_auto_tune","last_synced_at":"2026-05-06T17:03:12.147Z","repository":{"id":349913106,"uuid":"1086368573","full_name":"massif-01/vllm_auto_tune","owner":"massif-01","description":"Automated vLLM server parameter tuning tool. Finds optimal max-num-seqs and max-num-batched-tokens to maximize throughput. Includes presets for Llama/Qwen/Mixtral, batch processing, result analysis with visualizations, and environment checks. Supports TPU/GPU with latency constraints.","archived":false,"fork":false,"pushed_at":"2025-10-30T10:26:06.000Z","size":31,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-04-08T05:29:39.934Z","etag":null,"topics":["auto-tuning","autotune","benchmark","optimization","performance-tuning","vllm"],"latest_commit_sha":null,"homepage":"https://docs.vllm.ai","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/massif-01.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-30T10:21:57.000Z","updated_at":"2025-11-23T18:10:05.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/massif-01/vllm_auto_tune","commit_stats":null,"previous_names":["massif-01/vllm_auto_tune"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/massif-01/vllm_auto_tune","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/massif-01%2Fvllm_auto_tune","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/massif-01%2Fvllm_auto_tune/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/massif-01%2Fvllm_auto_tune/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/massif-01%2Fvllm_auto_tune/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/massif-01","download_url":"https://codeload.github.com/massif-01/vllm_auto_tune/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/massif-01%2Fvllm_auto_tune/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32703532,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-06T08:33:17.875Z","status":"ssl_error","status_checked_at":"2026-05-06T08:33:17.221Z","response_time":117,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["auto-tuning","autotune","benchmark","optimization","performance-tuning","vllm"],"created_at":"2026-05-06T17:03:11.056Z","updated_at":"2026-05-06T17:03:12.136Z","avatar_url":"https://github.com/massif-01.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# vLLM Auto-Tune\n\n\u003e 🌐 [English](README.md) | [中文](README_zh.md)\n\nAutomated vLLM Server Parameter Tuning Tool - Find optimal `max-num-seqs` and `max-num-batched-tokens` configurations to maximize throughput for your vLLM deployment.\n\n## Features\n\n- 🚀 **Automated Parameter Tuning**: Automatically tests parameter combinations to find optimal throughput\n- 📊 **Result Analysis**: Built-in tools to analyze and visualize tuning results\n- 🎯 **Model Presets**: Pre-configured scripts for popular models (Llama, Qwen, Mixtral)\n- 🔄 **Batch Processing**: Run multiple tuning experiments sequentially\n- 📈 **Visualization**: Generate charts and comparisons of different configurations\n- ✅ **Environment Check**: Pre-flight checks to ensure your environment is ready\n\n## Quick Start\n\n### 1. Environment Check\n\n```bash\nbash scripts/server_check.sh\n```\n\n### 2. Single Model Tuning\n\n```bash\n# Llama models\nbash scripts/tune_llama.sh meta-llama/Llama-3.1-8B-Instruct 1\n\n# Qwen models\nbash scripts/tune_qwen.sh Qwen/Qwen2.5-7B-Instruct 1\n\n# Mixtral models\nbash scripts/tune_mixtral.sh mistralai/Mixtral-8x7B-Instruct-v0.1 2\n```\n\n### 3. Batch Tuning (Multiple Models)\n\n```bash\nbash batch_auto_tune.sh configs/example_batch_config.json\n```\n\n## Configuration\n\nKey environment variables:\n\n| Variable | Description | Default |\n|----------|-------------|---------|\n| `VLLM_DIR` | Path to vLLM installation | `$HOME/vllm` |\n| `MODEL` | Hugging Face model identifier | `meta-llama/Llama-3.1-8B-Instruct` |\n| `SYSTEM` | Hardware platform (`TPU` or `GPU`) | `TPU` |\n| `TP` | Tensor parallelism size | `1` |\n| `INPUT_LEN` | Request input length | `4000` |\n| `OUTPUT_LEN` | Request output length | `16` |\n| `MAX_MODEL_LEN` | Maximum model length | `4096` |\n| `MAX_LATENCY_ALLOWED_MS` | Max allowed P99 latency (ms) | `100000000000` |\n\n## Result Analysis\n\n```bash\n# Analyze single result\npython tools/analyze_results.py $BASE/auto-benchmark/YYYY_MM_DD_HH_MM/result.txt\n\n# Compare multiple results\npython tools/analyze_results.py $BASE/auto-benchmark/ --compare\n```\n\n## Output\n\nResults are saved in `$BASE/auto-benchmark/YYYY_MM_DD_HH_MM/`:\n- `result.txt`: Summary of all tested configurations\n- `vllm_log_*.txt`: Server logs for each configuration\n- `bm_log_*.txt`: Benchmark logs\n- `profile/`: Profiler traces for the best configuration\n\n## Project Structure\n\n```\nvllm_auto_tune/\n├── README.md                      # English documentation\n├── README_zh.md                   # Chinese documentation\n├── auto_tune.sh                   # Main tuning script\n├── batch_auto_tune.sh            # Batch processing script\n├── scripts/                       # Helper scripts\n│   ├── server_check.sh           # Environment check\n│   ├── tune_llama.sh            # Llama preset\n│   ├── tune_qwen.sh             # Qwen preset\n│   └── tune_mixtral.sh          # Mixtral preset\n├── configs/                       # Configuration files\n│   ├── models.json               # Model presets\n│   └── example_batch_config.json # Batch config example\n└── tools/                        # Analysis tools\n    └── analyze_results.py       # Result analysis \u0026 visualization\n```\n\n## Prerequisites\n\n- vLLM installed and accessible via `vllm` command\n- Conda environment activated (e.g., `conda activate vllm`)\n- System tools: `bc`, `jq`, `curl`\n- Optional: `matplotlib`, `pandas` for visualization\n\n## License\n\nApache-2.0 License\n\n## Contributing\n\nContributions are welcome! Please feel free to submit Issues and Pull Requests.\n\n---\n\n⭐ **If this project helps you, please give us a Star!**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmassif-01%2Fvllm_auto_tune","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmassif-01%2Fvllm_auto_tune","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmassif-01%2Fvllm_auto_tune/lists"}