{"id":34577386,"url":"https://github.com/dsw7/local-llm-benchmark","last_synced_at":"2026-04-25T02:37:50.935Z","repository":{"id":328944000,"uuid":"1117454332","full_name":"dsw7/local-llm-benchmark","owner":"dsw7","description":"Miscellaneous scripts I use for benchmarking my locally hosted LLMs","archived":false,"fork":false,"pushed_at":"2026-02-01T11:07:09.000Z","size":47,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-02-01T20:53:54.384Z","etag":null,"topics":["llm","ollama","statistics"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dsw7.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-16T10:35:51.000Z","updated_at":"2026-02-01T11:07:08.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/dsw7/local-llm-benchmark","commit_stats":null,"previous_names":["dsw7/local-llm-benchmark"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dsw7/local-llm-benchmark","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dsw7%2Flocal-llm-benchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dsw7%2Flocal-llm-benchmark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dsw7%2Flocal-llm-benchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/dsw7%2Flocal-llm-benchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dsw7","download_url":"https://codeload.github.com/dsw7/local-llm-benchmark/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dsw7%2Flocal-llm-benchmark/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32248286,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-24T13:21:15.438Z","status":"online","status_checked_at":"2026-04-25T02:00:06.260Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llm","ollama","statistics"],"created_at":"2025-12-24T09:56:34.968Z","updated_at":"2026-04-25T02:37:50.928Z","avatar_url":"https://github.com/dsw7.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Local LLM benchmarking\nMiscellaneous utilities for benchmarking locally hosted LLMs (i.e. via\n[Ollama](https://ollama.com/)) for various platform/hardware permutations.\n\n**Note that I am not interested in benchmarking the models themselves. 
I am\ninterested in benchmarking model inference times on my particular hardware.**\nMany projects exist for benchmarking models themselves, such as\n[SuperGLUE](https://super.gluebenchmark.com/).\n\nI use this program to benchmark my infrastructure for the following cases:\n- When running [FuncGraft](https://github.com/dsw7/FuncGraft) in [local\n  mode](https://github.com/dsw7/FuncGraft?tab=readme-ov-file#toggling-between-llm-providers)\n- When running [GPTifier](https://github.com/dsw7/GPTifier) commands via the Ollama stream\n\n## Table of Contents\n- [About](#about)\n- [Setup](#setup)\n- [Benchmarking LLM performance](#benchmarking-llm-performance)\n  - [Step 1 - Run the benchmarks](#step-1---run-the-benchmarks)\n  - [Step 2 - Generate Gaussian distributions + boxplots for inference times](#step-2---generate-gaussian-distributions--boxplots-for-inference-times)\n  - [Step 3 - Generate a LaTeX report for the measurements](#step-3---generate-a-latex-report-for-the-measurements)\n\n## About\nThis program runs a dummy prompt against a specified LLM several times on each\nof several machines. The execution times are gathered and basic statistics are\ncomputed from them. This gives me a rough estimate of how variables such as GPU\nmodel and available VRAM affect the overall performance of my on-prem LLMs.\n\n## Setup\nCopy the example TOML file:\n```bash\ncp configs_example.toml configs.toml\n```\nThe `configs.toml` file is the \"production\" file and is excluded via\n`.gitignore`. Edit the file to match your specifications (e.g. set the dummy\nprompt and IP addresses).\n\n## Benchmarking LLM performance\n\n### Step 1 - Run the benchmarks\nSet up a Python virtual environment and run the bash script:\n```bash\n./benchmark\n```\nThen input \u003ckbd\u003e1\u003c/kbd\u003e when prompted. The program will gather `rounds`\ninference times (as specified via `configs.toml`) for `prompt` against\n`model` for each `host`. 
When complete, the program will output something akin\nto:\n```\nAll values are provided in seconds\n┌──────────────────┬───────────────┬──────────┬─────────┬──────────┬──────────┬──────────┬───────────────┐\n│ Host             │ Model         │     Mean │      SD │   Median │      Min │      Max │   Sample size │\n├──────────────────┼───────────────┼──────────┼─────────┼──────────┼──────────┼──────────┼───────────────┤\n│ localhost:11434  │ gemma3:latest │  2.18015 │ 0.16028 │  2.10775 │  2.09112 │  2.46496 │             5 │\n│ 10.0.0.115:11434 │ gemma3:latest │ 18.0551  │ 0.62221 │ 17.9943  │ 17.3745  │ 19.0215  │             5 │\n└──────────────────┴───────────────┴──────────┴─────────┴──────────┴──────────┴──────────┴───────────────┘\n```\nIf sufficient, one can stop here.\n\n### Step 2 - Generate Gaussian distributions + boxplots for inference times\nSet up a Python virtual environment as before and run the bash script:\n```bash\n./benchmark\n```\nThen input \u003ckbd\u003e2\u003c/kbd\u003e when prompted. The program will generate a set of\nGaussian distributions for the inference times obtained from each machine. For\nexample:\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=600 src=./docs/example_gemma_3_n_50.svg\u003e\n\u003c/p\u003e\n\nIn this example, 50 trials were performed. The mean inference time is around\n2.15 seconds. One value appears to be more than 3 standard deviations away from\nthe mean, and this value could be interpreted as an outlier (perhaps as a\nresult of a spike in GPU demand). The program will also generate boxplots for\nthe inference times across servers in the network. 
This can be useful for\nevaluating the performance of individual servers with respect to the network:\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=600 src=./docs/boxplot.svg\u003e\n\u003c/p\u003e\n\n### Step 3 - Generate a LaTeX report for the measurements\nAs before, run the bash script:\n```bash\n./benchmark\n```\nThen input \u003ckbd\u003e3\u003c/kbd\u003e when prompted. The program will generate a\ncomprehensive report of all the statistics gathered as part of the benchmarking\nprocess. Note that this requires steps 1 and 2 to have been completed first.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdsw7%2Flocal-llm-benchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdsw7%2Flocal-llm-benchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdsw7%2Flocal-llm-benchmark/lists"}