LLM-Benchmarks
===========================

A Benchmark Toolbox for LLM Performance (Inference and Evaluation).

[![license](https://img.shields.io/badge/license-Apache%202-blue)](./LICENSE)

---

## Latest News 🔥
- [2024/07/04] Support for evaluation with [vLLM](https://github.com/vllm-project/vllm/) backend using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).
- [2024/06/21] Added support for inference performance benchmark with [LMDeploy](https://github.com/InternLM/lmdeploy) and [vLLM](https://github.com/vllm-project/vllm/).
- [2024/06/14] Added support for inference performance benchmark with [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM).
- [2024/06/14] We officially released LLM-Benchmarks!

## LLM-Benchmarks Overview

LLM-Benchmarks is an easy-to-use toolbox for benchmarking the performance of Large Language Models (LLMs) on inference and evaluation.

- Inference Performance: Benchmarking LLM services deployed with inference frameworks (e.g., [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM), [lmdeploy](https://github.com/InternLM/lmdeploy), and [vLLM](https://github.com/vllm-project/vllm)) under different batch sizes and generation lengths.

- Task Evaluation: Few-shot evaluation of LLMs through APIs, including [OpenAI](https://openai.com/) and [Triton Inference Server](https://github.com/triton-inference-server), with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).

## Getting Started

### Download the ShareGPT dataset

You can download the dataset by running:

```bash
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
```
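
After downloading, a quick sanity check can confirm the file has the expected structure (a JSON array of conversation records). The commands below are a minimal sketch and assume `jq` is installed; the field names follow the common ShareGPT layout rather than anything specific to this repository.

```bash
# Sanity-check the downloaded dataset (assumes jq is installed).
# The "id"/"conversations" fields follow the usual ShareGPT layout.
jq 'length' ShareGPT_V3_unfiltered_cleaned_split.json       # number of conversation records
jq '.[0] | keys' ShareGPT_V3_unfiltered_cleaned_split.json  # keys of the first record
```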

### Prepare for Docker image and container environment

You can build Docker images locally by running:
```bash
# for tensorrt-llm
bash scripts/trt_llm/build_docker.sh all

# for lmdeploy
bash scripts/lmdeploy/build_docker.sh

# for vllm
bash scripts/vllm/build_docker.sh
```
or use the available images by `docker pull ${Image}:${Tag}`:

| Image | Tag |
|---------------------------------------------------------|----------------------------------|
| registry.cn-beijing.aliyuncs.com/devel-img/lmdeploy | 0.5.3-arch_808990 |
| registry.cn-beijing.aliyuncs.com/devel-img/vllm | 0.5.4-arch_70808990 |
| registry.cn-beijing.aliyuncs.com/devel-img/tensorrt-llm | 0.13.0.dev2024082000-arch_808990 |
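
For example, to pull the prebuilt vLLM image from the table above:

```bash
# Pull the prebuilt vLLM image listed in the table above.
docker pull registry.cn-beijing.aliyuncs.com/devel-img/vllm:0.5.4-arch_70808990
```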

### Run benchmarks

- Inference Performance
```bash
# Please confirm the version of the image used in the script
bash run_benchmark.sh model_path dataset_path sample_num device_id(like 0 or 0,1)
```
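
For illustration, an invocation might look like the sketch below; the model path, dataset path, sample count, and GPU ids are placeholders, not values shipped with the repository.

```bash
# Illustrative only: model path, dataset path, sample count, and GPU ids are placeholders.
bash run_benchmark.sh /models/Meta-Llama-3-8B-Instruct \
    ./ShareGPT_V3_unfiltered_cleaned_split.json \
    1000 \
    0,1
```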

- Task Evaluation
```bash
# Build evaluation image
bash scripts/evaluation/build_docker.sh vllm # (or lmdeploy or trt-llm)

# Evaluation with vLLM backend
bash run_eval.sh mode(fp16, fp8-kv-fp16, fp8-kv-fp8) model_path device_id(like 0 or 0,1)
```
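
As a sketch, an evaluation run with the vLLM backend could be invoked as below; the model path and GPU id are placeholders, and `fp8-kv-fp16` is simply one of the modes listed above.

```bash
# Illustrative only: the model path and GPU id are placeholders;
# fp8-kv-fp16 is one of the modes listed above.
bash run_eval.sh fp8-kv-fp16 /models/Meta-Llama-3-8B-Instruct 0
```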