https://github.com/unifyai/aibench-llm-endpoints

Runner in charge of collecting metrics from LLM inference endpoints for the Unify Hub

benchmark endpoints llm llm-inference python



Host: GitHub
URL: https://github.com/unifyai/aibench-llm-endpoints
Owner: unifyai
License: apache-2.0
Created: 2024-02-01T03:00:30.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-02-05T15:06:18.000Z (over 2 years ago)
Last Synced: 2024-11-14T03:11:31.995Z (over 1 year ago)
Topics: benchmark, endpoints, llm, llm-inference, python
Language: Python
Homepage: https://unify.ai/hub
Size: 9.77 KB
Stars: 17
Watchers: 4
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


README

# AIBench LLM Endpoints

## Overview
This repository provides a benchmarking runner, `AIBench-LLM`, for evaluating the performance of large language model (LLM) inference endpoints. The benchmark measures metrics such as Time to First Token (TTFT), End-to-End Latency, Inter-Token Latency (ITL), and Output Tokens per Second; the full list is given under Metrics.
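
The README does not document how the runner computes these numbers, so the following is only a minimal sketch of how the per-request metrics can be derived from a streamed response. It assumes you have recorded the time the request was sent and the arrival time of every output token on the same clock, and that the request ends when its last token arrives. `RequestMetrics` and `compute_request_metrics` are hypothetical names, not part of this repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestMetrics:
    ttft: float                # seconds from sending the request to the first token
    e2e_latency: float         # seconds from sending the request to the last token
    itl: float                 # mean gap between consecutive output tokens, seconds
    output_tokens: int
    output_tks_per_sec: float


def compute_request_metrics(start: float, token_times: List[float]) -> RequestMetrics:
    """Derive per-request metrics from the time a request was sent and the
    arrival time of each streamed output token (same clock, in seconds)."""
    if not token_times:
        raise ValueError("no output tokens were received")
    n = len(token_times)
    ttft = token_times[0] - start
    # Assumption: the request ends when its last token arrives.
    e2e_latency = token_times[-1] - start
    # n tokens leave n - 1 gaps; a single-token reply has no inter-token gap.
    itl = (token_times[-1] - token_times[0]) / (n - 1) if n > 1 else 0.0
    # Throughput over the whole request, including the wait for the first token.
    output_tks_per_sec = n / e2e_latency if e2e_latency > 0 else float("inf")
    return RequestMetrics(ttft, e2e_latency, itl, n, output_tks_per_sec)


# Example: request sent at t=0.0 s, four tokens arrive every 0.5 s from t=0.5 s.
m = compute_request_metrics(0.0, [0.5, 1.0, 1.5, 2.0])
print(m.ttft, m.e2e_latency, m.itl, m.output_tks_per_sec)  # 0.5 2.0 0.5 2.0
```

Here output tokens per second is measured over the full end-to-end latency; some benchmarks instead divide by the decode time only (last token minus first token), which excludes TTFT and gives a higher figure.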

The AIBench Runner collects metrics from LLM inference endpoints for the [Unify Hub](https://unify.ai/hub). The full methodology is described in the [Unify benchmarks documentation 📑](https://unify.ai/docs/hub/concepts/benchmarks.html).

Contributions and discussions about the methodology and the runner are very welcome. If this sounds interesting, you can join the [Unify Discord](https://discord.com/invite/sXyFF8tDtm)!

## Metrics
The benchmark runner collects the following metrics:

- `load`: Number of concurrent requests.
- `input_policy`: Input policy used (short or long).
- `ttft`: Time to first token (TTFT) for each request.
- `e2e_latency`: End-to-end latency for each request.
- `itl`: Inter-token latency (ITL), i.e. the time between consecutive output tokens.
- `cold_start`: Cold start time (if applicable).
- `prompt_tokens`: Number of tokens in the input prompt.
- `output_tokens`: Number of tokens in the LLM output.
- `total_tokens`: Total number of tokens (input + output).
- `output_tks_per_sec`: Output tokens per second.
- `failed_queries`: Number of failed queries.
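
The `load` and `failed_queries` metrics describe behaviour under concurrency. As a minimal sketch, assuming an async endpoint call that returns its output token count and raises on failure, one load step can be measured with `asyncio`. `RequestFn`, `timed_call`, `run_load_step` and `fake_endpoint` are hypothetical names for illustration; the stand-in endpoint simply sleeps and fails every fourth call, and is not a real Unify or provider API.

```python
import asyncio
import itertools
import statistics
import time
from typing import Awaitable, Callable, Optional, Tuple

# Hypothetical endpoint interface: send a prompt, get back the number of
# output tokens produced; any exception counts as a failed query.
RequestFn = Callable[[str], Awaitable[int]]


async def timed_call(request_fn: RequestFn, prompt: str) -> Tuple[float, int]:
    """Run one request and return (end-to-end latency in seconds, output tokens)."""
    start = time.perf_counter()
    tokens = await request_fn(prompt)
    return time.perf_counter() - start, tokens


async def run_load_step(request_fn: RequestFn, prompt: str, load: int) -> dict:
    """Fire `load` requests concurrently and summarise the step."""
    outcomes = await asyncio.gather(
        *(timed_call(request_fn, prompt) for _ in range(load)),
        return_exceptions=True,  # one failure must not cancel the other requests
    )
    ok = [o for o in outcomes if not isinstance(o, BaseException)]
    latencies = [latency for latency, _ in ok]
    mean_latency: Optional[float] = statistics.mean(latencies) if latencies else None
    return {
        "load": load,
        "failed_queries": len(outcomes) - len(ok),
        "mean_e2e_latency": mean_latency,
        "output_tokens": sum(tokens for _, tokens in ok),
    }


# Hypothetical stand-in endpoint: every fourth call fails.
_call_ids = itertools.count(1)


async def fake_endpoint(prompt: str) -> int:
    call_id = next(_call_ids)
    await asyncio.sleep(0.01)  # simulated network and generation time
    if call_id % 4 == 0:
        raise RuntimeError("simulated server error")
    return 32


if __name__ == "__main__":
    print(asyncio.run(run_load_step(fake_endpoint, "Hello", load=8)))
```

Run as a script, 2 of the 8 concurrent calls fail, so the summary reports `failed_queries: 2` and `output_tokens: 192`; the latency depends on the machine. This sketch reads `load` as "send this many requests at once"; a runner could instead hold concurrency fixed (for example with an `asyncio.Semaphore`) over a longer run.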

## Usage and Examples
To be added this week!
