{"id":28423181,"url":"https://github.com/neuralmagic/benchmark-compare","last_synced_at":"2025-06-24T20:31:08.461Z","repository":{"id":284458703,"uuid":"955003402","full_name":"neuralmagic/benchmark-compare","owner":"neuralmagic","description":"Fun with benchmarks","archived":false,"fork":false,"pushed_at":"2025-04-23T22:56:04.000Z","size":29,"stargazers_count":5,"open_issues_count":1,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-17T00:08:42.222Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/neuralmagic.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-03-26T00:35:23.000Z","updated_at":"2025-04-23T22:56:08.000Z","dependencies_parsed_at":"2025-03-26T02:25:28.876Z","dependency_job_id":"48d09b01-16f2-4bae-99bd-cc77b21cee6c","html_url":"https://github.com/neuralmagic/benchmark-compare","commit_stats":null,"previous_names":["robertgshaw2-redhat/benchmark-compare","neuralmagic/benchmark-compare"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/neuralmagic/benchmark-compare","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neuralmagic%2Fbenchmark-compare","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neuralmagic%2Fbenchmark-compare/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neuralmagic%2Fbenchmark-compare/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/neuralmagic%2Fbenchmark-compare/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/neuralmagic","download_url":"https://codeload.github.com/neuralmagic/benchmark-compare/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neuralmagic%2Fbenchmark-compare/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261751476,"owners_count":23204433,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-05T08:35:37.636Z","updated_at":"2025-06-24T20:31:08.452Z","avatar_url":"https://github.com/neuralmagic.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Benchmarking Comparison\n\n## Launch `vllm`\n\n### Install\n\n```bash\nuv venv venv-vllm --python 3.12\nsource venv-vllm/bin/activate\nuv pip install vllm==0.8.3\n```\n\n### Launch\n\n```bash\nMODEL=meta-llama/Llama-3.1-8B-Instruct\nvllm serve $MODEL --disable-log-requests\n```\n\n\u003e When inspecting logs, make sure prefix cache hit rate is low!\n\n## Launch `sglang`\n\n### Install\n\n```bash\nuv venv venv-sgl --python 3.12\nsource venv-sgl/bin/activate\nuv pip install \"sglang[all]==0.4.4.post1\" --find-links https://flashinfer.ai/whl/cu124/torch2.5/flashinfer-python\n```\n\n### Launch Server\n\n```bash\nMODEL=meta-llama/Llama-3.1-8B-Instruct\npython3 -m sglang.launch_server --model-path $MODEL  --host 0.0.0.0 --port 8000 # --enable-mixed-chunk --enable-torch-compile\n```\n\n\u003e When inspecting logs, make sure cached-tokens is small!\n\n## 
Benchmark\n\n### Install\n\n```bash\ngit clone https://github.com/vllm-project/vllm.git\ncd vllm\ngit checkout benchmark-output\nuv venv venv-vllm-src --python 3.12\nsource venv-vllm-src/bin/activate\nVLLM_USE_PRECOMPILED=1 uv pip install -e .\nuv pip install pandas datasets\ncd ..\n```\n\n### Run Benchmark\n\n```bash\nMODEL=meta-llama/Llama-3.1-8B-Instruct FRAMEWORK=vllm bash ./benchmark_1000_in_100_out.sh\nMODEL=meta-llama/Llama-3.1-8B-Instruct FRAMEWORK=sgl bash ./benchmark_1000_in_100_out.sh\npython3 convert_to_csv.py --input-path results.json --output-path results.csv\n```\n\n### Pull Into Local\n\n```bash\nscp rshaw@beaker:~/benchmark-compare/results.csv ~/Desktop/\n```\n\n### Running in a Container\n\nBuild the container image using the Containerfile located in the root of this repository. This example uses the Quay registry and the Podman runtime. Swap these for the container tools of your choice.\n\nRun the container build from the root of this repo.\n\n```bash\nQUAY_ORG=\u003cquay_account_here\u003e\npodman build -f Containerfile -t quay.io/$QUAY_ORG/vllm-benchmark:latest\n```\n\n- Start the inference engine you are testing as shown above, e.g. SGLang or vLLM. 
Update the endpoint in the run command below if you are not using the default HOST:PORT values.\n\n```bash\nMODEL=meta-llama/Llama-3.1-8B-Instruct\npodman run --rm -it \\\n    --network host \\\n    -e MODEL=\"${MODEL}\" \\\n    -e FRAMEWORK=vllm \\\n    -e HF_TOKEN=\u003cINSERT_HF_TOKEN\u003e \\\n    -e PORT=8000 \\\n    -e HOST=127.0.0.1 \\\n    -v \"$(pwd)\":/host:Z \\\n    -w /opt/benchmark \\\n    quay.io/$QUAY_ORG/vllm-benchmark:latest\n```\n\nThe JSON benchmark results are copied to the host machine, into the directory the container was run from.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneuralmagic%2Fbenchmark-compare","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fneuralmagic%2Fbenchmark-compare","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneuralmagic%2Fbenchmark-compare/lists"}