https://github.com/dustalov/evalica

Evalica, your favourite evaluation toolkit
arena bradley-terry elo evalica evals evaluation hacktoberfest leaderboard library llm pagerank pairwise-comparison pyo3 python ranking rating rust serbia statistics winrate

Host: GitHub
URL: https://github.com/dustalov/evalica
Owner: dustalov
License: apache-2.0
Created: 2024-06-15T13:56:08.000Z (7 months ago)
Default Branch: master
Last Pushed: 2025-01-03T21:26:24.000Z (18 days ago)
Last Synced: 2025-01-06T17:53:44.782Z (15 days ago)
Topics: arena, bradley-terry, elo, evalica, evals, evaluation, hacktoberfest, leaderboard, library, llm, pagerank, pairwise-comparison, pyo3, python, ranking, rating, rust, serbia, statistics, winrate
Language: Python
Homepage: https://dustalov.github.io/evalica/
Size: 590 KB
Stars: 24
Watchers: 3
Forks: 3
Open Issues: 3
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Citation: CITATION.cff

# Evalica, your favourite evaluation toolkit

[![Evalica](https://raw.githubusercontent.com/dustalov/evalica/master/Evalica.svg)](https://github.com/dustalov/evalica)

[![Tests][github_tests_badge]][github_tests_link]

[![Read the Docs][rtfd_badge]][rtfd_link]

[![PyPI Version][pypi_badge]][pypi_link]

[![Anaconda.org][conda_badge]][conda_link]

[![Codecov][codecov_badge]][codecov_link]

[![CodSpeed Badge][codspeed_badge]][codspeed_link]

[github_tests_badge]: https://github.com/dustalov/evalica/actions/workflows/test.yml/badge.svg?branch=master

[github_tests_link]: https://github.com/dustalov/evalica/actions/workflows/test.yml

[rtfd_badge]: https://readthedocs.org/projects/evalica/badge/

[rtfd_link]: https://evalica.readthedocs.io/

[pypi_badge]: https://badge.fury.io/py/evalica.svg

[pypi_link]: https://pypi.python.org/pypi/evalica

[conda_badge]: https://anaconda.org/conda-forge/evalica/badges/version.svg

[conda_link]: https://anaconda.org/conda-forge/evalica

[codecov_badge]: https://codecov.io/gh/dustalov/evalica/branch/master/graph/badge.svg

[codecov_link]: https://codecov.io/gh/dustalov/evalica

[codspeed_badge]: https://img.shields.io/endpoint?url=https://codspeed.io/badge.json

[codspeed_link]: https://codspeed.io/dustalov/evalica

**Evalica** [ɛˈʋalit͡sa] (eh-vah-lee-tsah) is a Python library that transforms pairwise comparisons into ranked lists of items. It offers convenient, high-performance Rust implementations of the corresponding methods via [PyO3](https://pyo3.rs/), and additionally provides naïve Python code for most of them. Evalica is fully compatible with [NumPy](https://numpy.org/) arrays and [pandas](https://pandas.pydata.org/) data frames.

- [Tutorial](https://dustalov.github.io/evalica/) (and [Tutorial.ipynb](Tutorial.ipynb))

- [Chatbot-Arena.ipynb](Chatbot-Arena.ipynb) [![Open in Colab][colab_badge]][colab_link] [![Binder][binder_badge]][binder_link]

- [Pair2Rank](https://huggingface.co/spaces/dustalov/pair2rank)

[colab_badge]: https://colab.research.google.com/assets/colab-badge.svg

[colab_link]: https://colab.research.google.com/github/dustalov/evalica/blob/master/Chatbot-Arena.ipynb

[binder_badge]: https://mybinder.org/badge_logo.svg

[binder_link]: https://mybinder.org/v2/gh/dustalov/evalica/HEAD?labpath=Chatbot-Arena.ipynb

The logo was created using [Recraft](https://www.recraft.ai/).

> [!NOTE]
> The demonstration paper describing Evalica has been accepted at the [COLING 2025](https://coling2025.org/) conference in Abu Dhabi!

## Installation

- [pip](https://pip.pypa.io/): `pip install evalica`

- [Anaconda](https://docs.conda.io/en/latest/): `conda install conda-forge::evalica`

## Usage

Suppose we want to rank several meals and have the following dataset of three comparisons produced by food experts.

| **Item X** | **Item Y** | **Winner** |
|:---:|:---:|:---:|
| `pizza` | `burger` | `x` |
| `burger` | `sushi` | `y` |
| `pizza` | `sushi` | `tie` |

Given this hypothetical example, Evalica takes these three columns and computes a score for each item according to the chosen model. Note that the first argument is the column `Item X`, the second argument is the column `Item Y`, and the third argument corresponds to the column `Winner`.

```pycon
>>> from evalica import elo, Winner
>>> result = elo(
...     ['pizza', 'burger', 'pizza'],
...     ['burger', 'sushi', 'sushi'],
...     [Winner.X, Winner.Y, Winner.Draw],
... )
>>> result.scores
pizza     1014.972058
burger     970.647200
sushi     1014.380742
Name: elo, dtype: float64
```

As a result, we obtain [Elo scores](https://en.wikipedia.org/wiki/Elo_rating_system) of our items. In this example, `pizza` was the most favoured item, `sushi` was the runner-up, and `burger` was the least preferred item.

| **Item** | **Score** |
|---|---:|
| `pizza` | 1014.97 |
| `burger` | 970.65 |
| `sushi` | 1014.38 |
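
To see where such numbers can come from, here is a minimal pure-Python sketch of a sequential Elo update. It is not Evalica's implementation: the function name `elo_scores` and the parameter values (initial rating 1000, K-factor 30, base 10, scale 400) are assumptions chosen for illustration, and winners are encoded as the strings from the table above rather than the `Winner` enum. With these assumed values, the sketch yields scores that agree with the ones above to two decimal places.

```python
def elo_scores(xs, ys, winners, initial=1000.0, k=30.0, base=10.0, scale=400.0):
    """Sequential Elo over (x, y, winner) triples; winner is 'x', 'y', or 'tie'.

    All default parameter values are assumptions made for this sketch.
    """
    scores = {}
    for x, y, winner in zip(xs, ys, winners):
        rx = scores.setdefault(x, initial)
        ry = scores.setdefault(y, initial)
        # Expected result for x under the logistic Elo curve.
        expected_x = 1.0 / (1.0 + base ** ((ry - rx) / scale))
        # Actual result for x: win = 1, loss = 0, tie = 0.5.
        actual_x = {"x": 1.0, "y": 0.0, "tie": 0.5}[winner]
        # The update is zero-sum: what x gains, y loses.
        delta = k * (actual_x - expected_x)
        scores[x] = rx + delta
        scores[y] = ry - delta
    return scores


scores = elo_scores(
    ["pizza", "burger", "pizza"],
    ["burger", "sushi", "sushi"],
    ["x", "y", "tie"],
)
for item, score in scores.items():
    print(f"{item:<8}{score:.2f}")
```

Because each comparison updates the ratings in place, the result of this kind of Elo computation depends on the order of the comparisons.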

## Command-Line Interface

Evalica also provides a simple command-line interface, allowing the use of these methods in shell scripts and for prototyping.

```console
$ evalica -i food.csv bradley-terry
item,score,rank
Tacos,2.509025136024378,1
Sushi,1.1011561298265815,2
Burger,0.8549063627182466,3
Pasta,0.7403814336665869,4
Pizza,0.5718366915548537,5
```

Refer to the [food.csv](food.csv) file as an input example.
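
Since the command prints plain CSV with `item`, `score`, and `rank` columns, its output is easy to consume from a script. The sketch below uses only the Python standard library and works on the output shown above pasted into a string; in practice the text would come from the command's standard output. The variable names are hypothetical.

```python
import csv
import io

# Output copied verbatim from the console session above.
output = """item,score,rank
Tacos,2.509025136024378,1
Sushi,1.1011561298265815,2
Burger,0.8549063627182466,3
Pasta,0.7403814336665869,4
Pizza,0.5718366915548537,5
"""

# Parse each CSV row into an (item, score, rank) tuple with proper types.
ranking = [
    (row["item"], float(row["score"]), int(row["rank"]))
    for row in csv.DictReader(io.StringIO(output))
]

# The item with the smallest rank value is the top-ranked one.
best_item = min(ranking, key=lambda entry: entry[2])[0]
print(best_item)
```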

## Web Application

Evalica has a built-in [Gradio](https://www.gradio.app/) application that can be launched with `python3 -m evalica.gradio`. Please ensure that the library was installed with the `gradio` extra: `pip install evalica[gradio]`.

## Implemented Methods

| **Method** | **In Python** | **In Rust** |
|---|:---:|:---:|
| Counting | ✅ | ✅ |
| Average Win Rate | ✅ | ✅ |
| [Bradley–Terry] | ✅ | ✅ |
| [Elo] | ✅ | ✅ |
| [Eigenvalue] | ✅ | ✅ |
| [PageRank] | ✅ | ✅ |
| [Newman] | ✅ | ✅ |

[Bradley–Terry]: https://doi.org/10.2307/2334029

[Elo]: https://isbnsearch.org/isbn/9780923891275

[Eigenvalue]: https://doi.org/10.1086/228631

[PageRank]: https://doi.org/10.1016/S0169-7552(98)00110-X

[Newman]: https://jmlr.org/papers/v24/22-1086.html
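
As an illustration of what one of these methods computes, the following is a minimal pure-Python sketch of the Bradley–Terry model fitted with the classic iterative update, in which each item's score is its number of wins divided by the sum of `n_ij / (p_i + p_j)` over its opponents. It is not Evalica's code: the function name `bradley_terry`, the `(winner, loser)` input format, the fixed iteration count, and the omission of ties are all assumptions made to keep the example short. It also assumes every item has at least one win; otherwise the scores degenerate.

```python
from collections import defaultdict


def bradley_terry(comparisons, iterations=100):
    """Fit Bradley-Terry scores from an iterable of (winner, loser) pairs."""
    wins = defaultdict(float)   # wins[i]: total number of wins of item i
    games = defaultdict(float)  # games[(i, j)]: comparisons between i and j
    items = set()
    for winner, loser in comparisons:
        wins[winner] += 1.0
        games[(winner, loser)] += 1.0
        games[(loser, winner)] += 1.0
        items.update((winner, loser))

    # Start from uniform scores that sum to one.
    scores = {i: 1.0 / len(items) for i in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            denominator = sum(
                n / (scores[i] + scores[j])
                for (a, j), n in games.items()
                if a == i
            )
            updated[i] = wins[i] / denominator
        # Normalise so that the scores keep summing to one.
        total = sum(updated.values())
        scores = {i: s / total for i, s in updated.items()}
    return scores


# A beats B twice and loses once, so A should get about 2/3 of the mass.
bt_scores = bradley_terry([("A", "B"), ("A", "B"), ("B", "A")])
print({item: round(score, 3) for item, score in sorted(bt_scores.items())})
```

Under this model the probability that item `i` beats item `j` is `p_i / (p_i + p_j)`, so in the two-item example the fitted scores reproduce the observed win rate of 2/3 for `A`.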

## Citation

- Ustalov, D. [Reliable, Reproducible, and Really Fast Leaderboards with Evalica](https://arxiv.org/abs/2412.11314). 2024. arXiv: [2412.11314 [cs.CL]](https://arxiv.org/abs/2412.11314).

```bibtex
@misc{Ustalov:25,
  author    = {Ustalov, Dmitry},
  title     = {{Reliable, Reproducible, and Really Fast Leaderboards with Evalica}},
  year      = {2025},
  eprint    = {2412.11314},
  eprinttype = {arxiv},
  eprintclass = {cs.CL},
  language  = {english},
}
```

The code for replicating the experiments is available in the [`coling2025`](coling2025/) directory.

## Copyright

Copyright (c) 2024 [Dmitry Ustalov](https://github.com/dustalov). See [LICENSE](LICENSE) for details.