https://github.com/orion-zhen/llm-throughput-eval
evaluate llm's generation speed via API
https://github.com/orion-zhen/llm-throughput-eval
llm llm-evaluation throughput throughput-performance
Last synced: 4 months ago
JSON representation
evaluate llm's generation speed via API
- Host: GitHub
- URL: https://github.com/orion-zhen/llm-throughput-eval
- Owner: Orion-zhen
- License: gpl-3.0
- Created: 2024-10-09T07:04:57.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-10T15:05:07.000Z (about 1 year ago)
- Last Synced: 2025-06-22T09:38:46.790Z (about 1 year ago)
- Topics: llm, llm-evaluation, throughput, throughput-performance
- Language: Python
- Homepage:
- Size: 35.2 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# llm-throughput-eval
Evaluate llm's generation speed via API
## Quick Start
### Get the repository
Clone:
```shell
git clone https://github.com/Orion-zhen/llm-throughput-eval.git
```
Install dependencies:
```shell
pip install -r requirements.txt
```
### Go
```shell
python evel.py -m -u -n -c -t
```
Example:
```shell
python eval.py -m "qwq:32b"
```
This will send 16 requests with 4 concurrency to local ollama API (should be ) with model qwq:32b
Full arguments:
```shell
usage: eval.py [-h] [--concurrency CONCURRENCY] [--requests REQUESTS] [--url URL] [--model MODEL]
Async HTTP Benchmark Tool
options:
-h, --help show this help message and exit
--concurrency, -c CONCURRENCY
Maximum concurrent requests.
--requests, -n REQUESTS
Total number of requests to send.
--url, -u URL Base URL for the API (e.g., http://localhost:8000). '/v1/chat/completions' will be appended.
--model, -m MODEL Model name to use in the request payload.
--token, -t TOKEN Bearer token for API authentication.
```
## Credits
The code is inspired by [this article](https://blog.csdn.net/arkohut/article/details/139076652)