https://github.com/huggingface/lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
- Host: GitHub
- URL: https://github.com/huggingface/lighteval
- Owner: huggingface
- License: mit
- Created: 2024-01-26T13:15:39.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-08T15:10:53.000Z (27 days ago)
- Last Synced: 2025-04-10T04:47:55.782Z (26 days ago)
- Topics: evaluation, evaluation-framework, evaluation-metrics, huggingface
- Language: Python
- Homepage:
- Size: 4.71 MB
- Stars: 1,398
- Watchers: 28
- Forks: 219
- Open Issues: 120
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-LLM-resourses - Lighteval - all-in-one toolkit for evaluating LLMs across multiple backends. (Evaluation)
- StarryDivineSky - huggingface/lighteval
- awesome-llm-eval - lighteval (Tools)
- Awesome-LLM - lighteval - a lightweight LLM evaluation suite that Hugging Face has been using internally. (LLM Evaluation)
- awesome-open-source-lms - Eval Code
- awesome-production-machine-learning - LightEval - LightEval is a lightweight LLM evaluation suite. (Evaluation and Monitoring)
- trackawesomelist - LightEval (⭐655)
- awesome-llm - lighteval
README
Your go-to toolkit for lightning-fast, flexible LLM evaluation, from Hugging Face's Leaderboard and Evals Team.

---
**Documentation**: Lighteval's Wiki
---
### Unlock the Power of LLM Evaluation with Lighteval 🚀
**Lighteval** is your all-in-one toolkit for evaluating LLMs across multiple
backends—whether it's
[transformers](https://github.com/huggingface/transformers),
[tgi](https://github.com/huggingface/text-generation-inference),
[vllm](https://github.com/vllm-project/vllm), or
[nanotron](https://github.com/huggingface/nanotron)—with
ease. Dive deep into your model’s performance by saving and exploring detailed,
sample-by-sample results to debug and see how your models stack up.

Customization at your fingertips: browse all our existing [tasks](https://huggingface.co/docs/lighteval/available-tasks) and [metrics](https://huggingface.co/docs/lighteval/metric-list), or effortlessly create your own [custom task](https://huggingface.co/docs/lighteval/adding-a-custom-task) and [custom metric](https://huggingface.co/docs/lighteval/adding-a-new-metric), tailored to your needs.
Seamlessly experiment, benchmark, and store your results on the Hugging Face
Hub, S3, or locally.

## 🔑 Key Features
- **Speed**: [Use vllm as backend for fast evals](https://huggingface.co/docs/lighteval/use-vllm-as-backend).
- **Completeness**: [Use the accelerate backend to launch any models hosted on Hugging Face](https://huggingface.co/docs/lighteval/quicktour#accelerate).
- **Seamless Storage**: [Save results in S3 or Hugging Face Datasets](https://huggingface.co/docs/lighteval/saving-and-reading-results).
- **Python API**: [Simple integration with the Python API](https://huggingface.co/docs/lighteval/using-the-python-api).
- **Custom Tasks**: [Easily add custom tasks](https://huggingface.co/docs/lighteval/adding-a-custom-task).
- **Versatility**: Tons of [metrics](https://huggingface.co/docs/lighteval/metric-list) and [tasks](https://huggingface.co/docs/lighteval/available-tasks) ready to go.

## ⚡️ Installation
```bash
pip install lighteval
```

Lighteval allows for many extras when installing; see [here](https://huggingface.co/docs/lighteval/installation) for a complete list.
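For instance, backend-specific dependencies can be pulled in as extras. A minimal sketch, assuming `accelerate` and `vllm` are among the extras listed on the installation page linked above (check that page for the names your version actually supports):

```bash
# Illustrative extras; the exact names are on the installation page linked above.
pip install lighteval[accelerate]
pip install lighteval[vllm]
```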
If you want to push results to the Hugging Face Hub, add your access token as
an environment variable:

```shell
huggingface-cli login
```
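Alternatively (an assumption based on standard Hugging Face Hub tooling rather than anything stated in this README), the token can be exported directly as an environment variable:

```shell
# Standard Hugging Face Hub convention; replace the placeholder with your token.
export HF_TOKEN=<your_access_token>
```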
## 🚀 Quickstart

Lighteval offers the following entry points for model evaluation:
- `lighteval accelerate` : evaluate models on CPU or one or more GPUs using [🤗
Accelerate](https://github.com/huggingface/accelerate)
- `lighteval nanotron`: evaluate models in distributed settings using [⚡️
Nanotron](https://github.com/huggingface/nanotron)
- `lighteval vllm`: evaluate models on one or more GPUs using [🚀
VLLM](https://github.com/vllm-project/vllm)
- `lighteval endpoint`
  - `inference-endpoint`: evaluate models on one or more GPUs using [🔗 Inference Endpoint](https://huggingface.co/inference-endpoints/dedicated)
  - `tgi`: evaluate models on one or more GPUs using [🔗 Text Generation Inference](https://huggingface.co/docs/text-generation-inference/en/index)
  - `openai`: evaluate models via the [🔗 OpenAI API](https://platform.openai.com/)

Here's a quick command to evaluate using the Accelerate backend:
```shell
lighteval accelerate \
"model_name=gpt2" \
"leaderboard|truthfulqa:mc|0|0"
```
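In the task string, the pipe-separated fields are the suite, the task, the number of few-shot examples, and a 0/1 flag that lets Lighteval reduce the few-shot count when the prompt gets too long. The other backends take the same arguments; here is a sketch for the vLLM backend, assuming the `vllm` extra is installed (the model name is just an illustrative placeholder):

```shell
# Same task syntax as above, run through the vLLM backend (illustrative model name).
lighteval vllm \
    "model_name=HuggingFaceH4/zephyr-7b-beta" \
    "leaderboard|truthfulqa:mc|0|0"
```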
## 🙏 Acknowledgements

Lighteval started as an extension of the fantastic [Eleuther AI
Harness](https://github.com/EleutherAI/lm-evaluation-harness) (which powers the
[Open LLM
Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard))
and draws inspiration from the amazing
[HELM](https://crfm.stanford.edu/helm/latest/) framework.

While evolving Lighteval into its own standalone tool, we are grateful to the
Harness and HELM teams for their pioneering work on LLM evaluations.

## 🌟 Contributions Welcome 💙💚💛💜🧡
Got ideas? Found a bug? Want to add a
[task](https://huggingface.co/docs/lighteval/adding-a-custom-task) or
[metric](https://huggingface.co/docs/lighteval/adding-a-new-metric)?
Contributions are warmly welcomed!

If you're adding a new feature, please open an issue first.
If you open a PR, don't forget to run the styling!
```bash
pip install -e .[dev]
pre-commit install
pre-commit run --all-files
```
## 📜 Citation

```bibtex
@misc{lighteval,
author = {Fourrier, Clémentine and Habib, Nathan and Kydlíček, Hynek and Wolf, Thomas and Tunstall, Lewis},
title = {LightEval: A lightweight framework for LLM evaluation},
year = {2023},
version = {0.8.0},
url = {https://github.com/huggingface/lighteval}
}
```