Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/huggingface/evaluate
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
- Host: GitHub
- URL: https://github.com/huggingface/evaluate
- Owner: huggingface
- License: apache-2.0
- Created: 2022-03-30T15:08:26.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-09-17T00:19:55.000Z (4 months ago)
- Last Synced: 2025-01-03T22:17:28.377Z (9 days ago)
- Topics: evaluation, machine-learning
- Language: Python
- Homepage: https://huggingface.co/docs/evaluate
- Size: 2.02 MB
- Stars: 2,081
- Watchers: 47
- Forks: 264
- Open Issues: 226
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Authors: AUTHORS
Awesome Lists containing this project
- awesome-ml-python-packages - Evaluate
- awesome-production-machine-learning - Evaluate - Evaluate is a library that makes evaluating and comparing models and reporting their performance easier and more standardized. (Evaluation and Monitoring)
- StarryDivineSky - huggingface/evaluate
- Awesome-RAG - Hugging Face Evaluate
- awesome-ai-papers - [evaluate - guidebook](https://github.com/huggingface/evaluation-guidebook), [Awesome-LLM-Eval](https://github.com/onejune2018/Awesome-LLM-Eval), [LLM-eval-survey](https://github.com/MLGroupJLU/LLM-eval-survey), [llm_benchmarks](https://github.com/leobeeson/llm_benchmarks), [Awesome-LLMs-Evaluation-Papers](https://github.com/tjunlp-lab/Awesome-LLMs-Evaluation-Papers) (NLP / 3. Pretraining)
README
> **Tip:** For more recent evaluation approaches, for example for evaluating LLMs, we recommend our newer and more actively maintained library [LightEval](https://github.com/huggingface/lighteval).
🤗 Evaluate is a library that makes evaluating and comparing models and reporting their performance easier and more standardized.
It currently contains:
- **implementations of dozens of popular metrics**: the existing metrics cover a variety of tasks spanning from NLP to Computer Vision, and include dataset-specific metrics. With a simple command like `accuracy = load("accuracy")`, you can get any of these metrics ready to use for evaluating an ML model in any framework (NumPy/Pandas/PyTorch/TensorFlow/JAX).
- **comparisons and measurements**: comparisons are used to measure the difference between models and measurements are tools to evaluate datasets.
- **an easy way of adding new evaluation modules to the 🤗 Hub**: you can create new evaluation modules and push them to a dedicated Space in the 🤗 Hub with `evaluate-cli create [metric name]`, which allows you to easily compare different metrics and their outputs for the same sets of references and predictions.
[🎓 **Documentation**](https://huggingface.co/docs/evaluate/)
🔎 **Find a [metric](https://huggingface.co/evaluate-metric), [comparison](https://huggingface.co/evaluate-comparison), [measurement](https://huggingface.co/evaluate-measurement) on the Hub**
[🌟 **Add a new evaluation module**](https://huggingface.co/docs/evaluate/)
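As a rough illustration of the `load("accuracy")` call mentioned in the list above, the sketch below loads the accuracy metric and computes it on a toy set of labels. The helper name `compute_accuracy` and the pure-Python fallback are assumptions added for this example (so that it still runs when the package, its dependencies, or network access are unavailable); they are not part of the library.

```python
from typing import Dict, Sequence


def compute_accuracy(predictions: Sequence[int], references: Sequence[int]) -> Dict[str, float]:
    """Return {"accuracy": value}, using 🤗 Evaluate when it is usable.

    Hypothetical helper for this sketch; the fallback branch is an
    assumption so the example also works offline.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if len(references) == 0:
        raise ValueError("at least one example is required")
    try:
        import evaluate  # pip install evaluate

        # load() fetches the metric module (from the Hub or a local cache).
        metric = evaluate.load("accuracy")
        # compute() takes keyword arguments and returns a dict of results.
        return metric.compute(predictions=list(predictions), references=list(references))
    except Exception:
        # Fallback (not library code): fraction of matching positions.
        # The broad except is deliberate here; real code should catch
        # narrower errors so genuine failures are not hidden.
        correct = sum(p == r for p, r in zip(predictions, references))
        return {"accuracy": correct / len(references)}


result = compute_accuracy(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
print(result)  # {'accuracy': 0.75}
```

Three of the four predictions match their references, hence the value of 0.75. In everyday use the two library calls inside the `try` block are usually all that is needed.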
🤗 Evaluate also has lots of useful features like:
- **Type checking**: the input types are checked to make sure that you are using the right input formats for each metric
- **Metric cards**: each metric comes with a card that describes its values, their ranges and its limitations, and provides examples of its usage and usefulness.
- **Community metrics**: metrics live on the Hugging Face Hub, and you can easily add your own metrics for your project or to collaborate with others.
# Installation
## With pip
🤗 Evaluate can be installed from PyPI and should be installed in a virtual environment (venv or conda, for instance):
```bash
pip install evaluate
```
# Usage
🤗 Evaluate's main methods are:
- `evaluate.list_evaluation_modules()` to list the available metrics, comparisons and measurements
- `evaluate.load(module_name, **kwargs)` to instantiate an evaluation module
- `results = module.compute(**kwargs)` to compute the result of an evaluation module
# Adding a new evaluation module
First install the necessary dependencies to create a new metric with the following command:
```bash
# Quoting keeps shells such as zsh from treating the brackets as a glob pattern
pip install "evaluate[template]"
```
Then you can get started with the following command which will create a new folder for your metric and display the necessary steps:
```bash
evaluate-cli create "Awesome Metric"
```
See this [step-by-step guide](https://huggingface.co/docs/evaluate/creating_and_sharing) in the documentation for detailed instructions.
## Credits
Thanks to [@marella](https://github.com/marella) for letting us use the `evaluate` namespace on PyPI previously used by his [library](https://github.com/marella/evaluate).