{"id":17130520,"url":"https://github.com/dustalov/evalica","last_synced_at":"2026-03-11T01:09:04.256Z","repository":{"id":247597010,"uuid":"815570279","full_name":"dustalov/evalica","owner":"dustalov","description":"Evalica, your favourite evaluation toolkit","archived":false,"fork":false,"pushed_at":"2026-02-28T13:43:38.000Z","size":863,"stargazers_count":62,"open_issues_count":0,"forks_count":5,"subscribers_count":2,"default_branch":"master","last_synced_at":"2026-02-28T16:35:19.974Z","etag":null,"topics":["arena","bradley-terry","elo","evalica","evals","evaluation","hacktoberfest","leaderboard","library","llm","pagerank","pairwise-comparison","pyo3","python","ranking","rating","rust","serbia","statistics","winrate"],"latest_commit_sha":null,"homepage":"https://evalica.readthedocs.io/latest/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dustalov.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"dustalov"}},"created_at":"2024-06-15T13:56:08.000Z","updated_at":"2026-02-28T13:43:37.000Z","dependencies_parsed_at":null,"dependency_job_id":"d2b1248d-8aa0-4998-af84-1d98440ce16c","html_url":"https://github.com/dustalov/evalica","commit_stats":{"total_commits":311,"total_committers":2,"mean_commits":155.5,"dds":0.009646302250803873,"last_synced_commit":"b091831aa71f520e4bd14d5a3a90f77492a72636"},"previous_names":["dustalov/evalica"],"tags_count":20,"template":false,"template_full_name":null,"purl":"pkg:github/dustalov/evalica","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dustalov%2Fevalica","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dustalov%2Fevalica/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dustalov%2Fevalica/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dustalov%2Fevalica/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dustalov","download_url":"https://codeload.github.com/dustalov/evalica/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dustalov%2Fevalica/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30364974,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-10T21:41:54.280Z","status":"ssl_error","status_checked_at":"2026-03-10T21:40:59.357Z","response_time":106,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arena","bradley-terry","elo","evalica","evals","evaluation","hacktoberfest","leaderboard","library","llm","pagerank","pairwise-comparison","pyo3","python","ranking","rating","rust","serbia","statistics","winrate"],"created_at":"2024-10-14T19:12:34.950Z","updated_at":"2026-03-11T01:09:04.248Z","avatar_url":"https://github.com/dustalov.png","language":"Python","funding_links":["https://github.com/sponsors/dustalov"],"categories":["Python"],"sub_categories":[],"readme":"# Evalica, your favourite evaluation toolkit\n\n[![Evalica](https://raw.githubusercontent.com/dustalov/evalica/master/Evalica.svg)](https://github.com/dustalov/evalica)\n\n[![Tests][github_tests_badge]][github_tests_link]\n[![Read the Docs][rtfd_badge]][rtfd_link]\n[![PyPI Version][pypi_badge]][pypi_link]\n[![Anaconda.org][conda_badge]][conda_link]\n[![crates.io][crates_badge]][crates_link]\n[![Codecov][codecov_badge]][codecov_link]\n[![CodSpeed Badge][codspeed_badge]][codspeed_link]\n\n[github_tests_badge]: https://github.com/dustalov/evalica/actions/workflows/test.yml/badge.svg?branch=master\n[github_tests_link]: https://github.com/dustalov/evalica/actions/workflows/test.yml\n[rtfd_badge]: https://readthedocs.org/projects/evalica/badge/\n[rtfd_link]: https://evalica.readthedocs.io/\n[pypi_badge]: https://badge.fury.io/py/evalica.svg\n[pypi_link]: https://pypi.python.org/pypi/evalica\n[conda_badge]: https://anaconda.org/conda-forge/evalica/badges/version.svg\n[conda_link]: https://anaconda.org/conda-forge/evalica\n[crates_badge]: https://img.shields.io/crates/v/evalica\n[crates_link]: https://crates.io/crates/evalica\n[codecov_badge]: https://codecov.io/gh/dustalov/evalica/branch/master/graph/badge.svg\n[codecov_link]: https://codecov.io/gh/dustalov/evalica\n[codspeed_badge]: https://img.shields.io/endpoint?url=https://codspeed.io/badge.json\n[codspeed_link]: https://codspeed.io/dustalov/evalica\n\n**Evalica** [\u0026#x025b;\u0026#x02c8;\u0026#x028b;alit\u0026#x0361;sa] (eh-vah-lee-tsah) is an evaluation toolkit for statistical analysis, combining fast Rust implementations with Python APIs for ranking, reliability, and uncertainty estimation. Evalica is fully compatible with [NumPy](https://numpy.org/) arrays and [pandas](https://pandas.pydata.org/) data frames.\n\n- [Tutorial](docs/tutorial.ipynb)\n- [Chatbot-Arena.ipynb](Chatbot-Arena.ipynb) [![Open in Colab][colab_badge]][colab_link] [![Binder][binder_badge]][binder_link]\n- [Evalica Demo](https://huggingface.co/spaces/dustalov/evalica)\n\n[colab_badge]: https://colab.research.google.com/assets/colab-badge.svg\n[colab_link]: https://colab.research.google.com/github/dustalov/evalica/blob/master/Chatbot-Arena.ipynb\n[binder_badge]: https://mybinder.org/badge_logo.svg\n[binder_link]: https://mybinder.org/v2/gh/dustalov/evalica/HEAD?labpath=Chatbot-Arena.ipynb\n\nThe logo was created using [Recraft](https://www.recraft.ai/).\n\n## Installation\n\n- [pip](https://pip.pypa.io/): `pip install evalica`\n- [Anaconda](https://docs.conda.io/en/latest/): `conda install conda-forge::evalica`\n- [Cargo](https://crates.io/crates/evalica): `cargo add evalica`\n\n## Pairwise Comparisons\n\nImagine that we would like to rank the different meals and have the following dataset of three comparisons produced by food experts.\n\n| **Item X**| **Item Y** | **Winner** |\n|:---:|:---:|:---:|\n| `pizza` | `burger` | `x` |\n| `burger` | `sushi` | `y` |\n| `pizza` | `sushi` | `tie` |\n\nGiven this hypothetical example, Evalica takes these three columns and computes the outcome of the given pairwise comparison according to the chosen model. Note that the first argument is the column `Item X`, the second argument is the column `Item Y`, and the third argument corresponds to the column `Winner`.\n\n```pycon\n\u003e\u003e\u003e from evalica import elo, Winner\n\u003e\u003e\u003e result = elo(\n...     ['pizza', 'burger', 'pizza'],\n...     ['burger', 'sushi', 'sushi'],\n...     [Winner.X, Winner.Y, Winner.Draw],\n... )\n\u003e\u003e\u003e result.scores\npizza     1014.972058\nburger     970.647200\nsushi     1014.380742\nName: elo, dtype: float64\n```\n\nAs a result, we obtain [Elo scores](https://en.wikipedia.org/wiki/Elo_rating_system) of our items. In this example, `pizza` was the most favoured item, `sushi` was the runner-up, and `burger` was the least preferred item.\n\n| **Item**| **Score** |\n|---|---:|\n| `pizza` | 1014.97 |\n| `burger` | 970.65 |\n| `sushi` | 1014.38 |\n\n### Inter-Rater Reliability\n\nEvalica also supports computing [Krippendorff's alpha](https://en.wikipedia.org/wiki/Krippendorff%27s_alpha), a statistical measure of inter-rater reliability. Unlike pairwise comparisons, alpha accepts a matrix where rows represent raters (observers) and columns represent units (items being rated).\n\n```pycon\n\u003e\u003e\u003e import pandas as pd\n\u003e\u003e\u003e from evalica import alpha\n\u003e\u003e\u003e data = pd.DataFrame([\n...     [1, 1, None, 1],\n...     [2, 2, 3, 2],\n...     [3, 3, 3, 3],\n...     [3, 3, 3, 3],\n...     [2, 2, 2, 2],\n...     [1, 2, 3, 4],\n...     [4, 4, 4, 4],\n...     [1, 1, 2, 1],\n...     [2, 2, 2, 2],\n...     [None, 5, 5, 5],\n...     [None, None, 1, 1],\n... ]).T\n\u003e\u003e\u003e result = alpha(data, distance='nominal')\n\u003e\u003e\u003e result.alpha\n0.7434210526315788\n\u003e\u003e\u003e from evalica import alpha_bootstrap\n\u003e\u003e\u003e bootstrap_result = alpha_bootstrap(data, distance='nominal', n_resamples=1000, random_state=42)\n\u003e\u003e\u003e (bootstrap_result.low, bootstrap_result.high)\n(0.4431818181818182, 0.9411764705882353)\n```\n\nThis example demonstrates computing alpha and its [bootstrap confidence intervals](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)) with nominal distance for categorical ratings. Evalica supports multiple distance metrics: `nominal`, `ordinal`, `interval`, `ratio`, or custom distance functions.\n\n## Command-Line Interface\n\nEvalica also provides a simple command-line interface, allowing the use of these methods in shell scripts and for prototyping.\n\n### Pairwise Ranking\n\n```console\n$ evalica -i food.csv pairwise bradley-terry\nitem,score,rank\nTacos,2.509025136024378,1\nSushi,1.1011561298265815,2\nBurger,0.8549063627182466,3\nPasta,0.7403814336665869,4\nPizza,0.5718366915548537,5\n```\n\nRefer to the [food.csv](food.csv) file as an input example.\n\n### Krippendorff's Alpha\n\nFor Krippendorff's alpha, use a CSV file with ratings in a matrix format (no header):\n\n```console\n$ evalica -i codings.csv alpha --distance=nominal\nmetric,value\nalpha,0.743421052631579\nobserved,7.999999999999999\nexpected,31.179487179487182\n```\n\n## Web Application\n\nEvalica has a built-in [Gradio](https://www.gradio.app/) application that can be launched as `python3 -m evalica.gradio`. Please ensure that the library was installed as `pip install evalica[gradio]`.\n\n## Implemented Methods\n\n| **Method** | **In Python** | **In Rust** |\n|---|:---:|:---:|\n| Counting | \u0026#x2705; | \u0026#x2705; |\n| Average Win Rate | \u0026#x2705; | \u0026#x2705; |\n| [Bradley\u0026ndash;Terry] | \u0026#x2705; | \u0026#x2705; |\n| [Elo] | \u0026#x2705; | \u0026#x2705; |\n| [Eigenvalue] | \u0026#x2705; | \u0026#x2705; |\n| [PageRank] | \u0026#x2705; | \u0026#x2705; |\n| [Newman] | \u0026#x2705; | \u0026#x2705; |\n| [Krippendorff's Alpha] | \u0026#x2705; | \u0026#x2705; |\n\n\u003c!-- Present: \u0026#x2705; / Absent: \u0026#x274C; --\u003e\n\n[Bradley\u0026ndash;Terry]: https://doi.org/10.2307/2334029\n[Elo]: https://isbnsearch.org/isbn/9780923891275\n[Eigenvalue]: https://doi.org/10.1086/228631\n[PageRank]: https://doi.org/10.1016/S0169-7552(98)00110-X\n[Newman]: https://jmlr.org/papers/v24/22-1086.html\n[Krippendorff's Alpha]: https://en.wikipedia.org/wiki/Krippendorff%27s_alpha\n\n## Contributing\n\nEvalica is a mixed Rust/Python project that uses [PyO3](https://pyo3.rs/), so it requires setting up the [Maturin](https://www.maturin.rs/) build system.\n\nTo set up the environment, we recommend using the [uv](https://github.com/astral-sh/uv) package manager, as demonstrated in [our test suite](.github/workflows/test.yml):\n\n```console\n$ uv venv\n$ uv pip install maturin\n$ source .venv/bin/activate\n$ maturin develop --uv --extras dev,docs,gradio\n```\n\nIn case `uv` is not available, you can use the following workaround:\n\n```console\n$ python3 -m venv venv\n$ source venv/bin/activate\n$ pip install maturin\n$ maturin develop --extras dev,docs,gradio\n```\n\nIt is also possible to omit the Rust-accelerated routines via `pip install --no-binary evalica`.\n\nWe welcome pull requests on GitHub: \u003chttps://github.com/dustalov/evalica\u003e. To contribute, fork the repository, create a separate branch for your changes, and submit a pull request.\n\n## Citation\n\n- Ustalov, D. [Reliable, Reproducible, and Really Fast Leaderboards with Evalica](https://aclanthology.org/2025.coling-demos.6). 2025. Proceedings of the 31st International Conference on Computational Linguistics: System Demonstrations. 46\u0026ndash;53. arXiv: [2412.11314 [cs.CL]](https://arxiv.org/abs/2412.11314).\n\n```bibtex\n@inproceedings{Ustalov:25,\n  author    = {Ustalov, Dmitry},\n  title     = {{Reliable, Reproducible, and Really Fast Leaderboards with Evalica}},\n  year      = {2025},\n  booktitle = {Proceedings of the 31st International Conference on Computational Linguistics: System Demonstrations},\n  pages     = {46--53},\n  address   = {Abu Dhabi, UAE},\n  publisher = {Association for Computational Linguistics},\n  eprint    = {2412.11314},\n  eprinttype = {arxiv},\n  eprintclass = {cs.CL},\n  url       = {https://aclanthology.org/2025.coling-demos.6},\n  language  = {english},\n}\n```\n\nThe code for replicating the experiments is available in the [`coling2025`](coling2025/) directory.\n\n## Copyright\n\nCopyright (c) 2024\u0026ndash;2026 [Dmitry Ustalov](https://github.com/dustalov). See [LICENSE](LICENSE) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdustalov%2Fevalica","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdustalov%2Fevalica","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdustalov%2Fevalica/lists"}