An open API service indexing awesome lists of open source software.

https://github.com/orion-zhen/llm-throughput-eval

evaluate llm's generation speed via API
https://github.com/orion-zhen/llm-throughput-eval

llm llm-evaluation throughput throughput-performance

Last synced: 4 months ago
JSON representation

evaluate llm's generation speed via API

Awesome Lists containing this project

README

          

# llm-throughput-eval

Evaluate llm's generation speed via API

## Quick Start

### Get the repository

Clone:

```shell
git clone https://github.com/Orion-zhen/llm-throughput-eval.git
```

Install dependencies:

```shell
pip install -r requirements.txt
```

### Go

```shell
python evel.py -m -u -n -c -t
```

Example:

```shell
python eval.py -m "qwq:32b"
```

This will send 16 requests with 4 concurrency to local ollama API (should be ) with model qwq:32b

Full arguments:

```shell
usage: eval.py [-h] [--concurrency CONCURRENCY] [--requests REQUESTS] [--url URL] [--model MODEL]

Async HTTP Benchmark Tool

options:
-h, --help show this help message and exit
--concurrency, -c CONCURRENCY
Maximum concurrent requests.
--requests, -n REQUESTS
Total number of requests to send.
--url, -u URL Base URL for the API (e.g., http://localhost:8000). '/v1/chat/completions' will be appended.
--model, -m MODEL Model name to use in the request payload.
--token, -t TOKEN Bearer token for API authentication.
```

## Credits

The code is inspired by [this article](https://blog.csdn.net/arkohut/article/details/139076652)