{"id":13450950,"url":"https://github.com/huggingface/evaluate","last_synced_at":"2025-05-14T20:00:30.427Z","repository":{"id":37084890,"uuid":"475932672","full_name":"huggingface/evaluate","owner":"huggingface","description":"🤗 Evaluate: A library for easily evaluating machine learning models and datasets.","archived":false,"fork":false,"pushed_at":"2025-01-10T14:45:40.000Z","size":2108,"stargazers_count":2198,"open_issues_count":236,"forks_count":273,"subscribers_count":44,"default_branch":"main","last_synced_at":"2025-05-01T03:12:21.061Z","etag":null,"topics":["evaluation","machine-learning"],"latest_commit_sha":null,"homepage":"https://huggingface.co/docs/evaluate","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/huggingface.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-03-30T15:08:26.000Z","updated_at":"2025-04-29T16:17:42.000Z","dependencies_parsed_at":"2023-12-23T20:25:22.642Z","dependency_job_id":"467819d4-d6f4-4b35-9272-d8afbb77232c","html_url":"https://github.com/huggingface/evaluate","commit_stats":{"total_commits":904,"total_committers":128,"mean_commits":7.0625,"dds":0.872787610619469,"last_synced_commit":"344b8b45be3a5eb927bef6d897da876ba9b2f228"},"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Fevaluate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Fevaluate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Fevaluate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Fevaluate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/huggingface","download_url":"https://codeload.github.com/huggingface/evaluate/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252831125,"owners_count":21810779,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["evaluation","machine-learning"],"created_at":"2024-07-31T07:00:40.743Z","updated_at":"2025-05-07T06:43:24.799Z","avatar_url":"https://github.com/huggingface.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\r\n    \u003cbr\u003e\r\n    \u003cimg src=\"https://huggingface.co/datasets/evaluate/media/resolve/main/evaluate-banner.png\" width=\"400\"/\u003e\r\n    \u003cbr\u003e\r\n\u003c/p\u003e\r\n\r\n\u003cp align=\"center\"\u003e\r\n    \u003ca href=\"https://github.com/huggingface/evaluate/actions/workflows/ci.yml?query=branch%3Amain\"\u003e\r\n        \u003cimg alt=\"Build\" 
src=\"https://github.com/huggingface/evaluate/actions/workflows/ci.yml/badge.svg?branch=main\"\u003e\r\n    \u003c/a\u003e\r\n    \u003ca href=\"https://github.com/huggingface/evaluate/blob/master/LICENSE\"\u003e\r\n        \u003cimg alt=\"GitHub\" src=\"https://img.shields.io/github/license/huggingface/evaluate.svg?color=blue\"\u003e\r\n    \u003c/a\u003e\r\n    \u003ca href=\"https://huggingface.co/docs/evaluate/index\"\u003e\r\n        \u003cimg alt=\"Documentation\" src=\"https://img.shields.io/website/http/huggingface.co/docs/evaluate/index.svg?down_color=red\u0026down_message=offline\u0026up_message=online\"\u003e\r\n    \u003c/a\u003e\r\n    \u003ca href=\"https://github.com/huggingface/evaluate/releases\"\u003e\r\n        \u003cimg alt=\"GitHub release\" src=\"https://img.shields.io/github/release/huggingface/evaluate.svg\"\u003e\r\n    \u003c/a\u003e\r\n    \u003ca href=\"CODE_OF_CONDUCT.md\"\u003e\r\n        \u003cimg alt=\"Contributor Covenant\" src=\"https://img.shields.io/badge/Contributor%20Covenant-2.0-4baaaa.svg\"\u003e\r\n    \u003c/a\u003e\r\n\u003c/p\u003e\r\n\r\n\r\n\r\n\u003e **Tip:** For more recent evaluation approaches, for example for evaluating LLMs, we recommend our newer and more actively maintained library [LightEval](https://github.com/huggingface/lighteval).\r\n\r\n\r\n\r\n🤗 Evaluate is a library that makes evaluating and comparing models and reporting their performance easier and more standardized. \r\n\r\nIt currently contains:\r\n\r\n- **implementations of dozens of popular metrics**: the existing metrics cover a variety of tasks spanning from NLP to Computer Vision, and include dataset-specific metrics for datasets. With a simple command like `accuracy = load(\"accuracy\")`, get any of these metrics ready to use for evaluating a ML model in any framework (Numpy/Pandas/PyTorch/TensorFlow/JAX).\r\n- **comparisons and measurements**: comparisons are used to measure the difference between models and measurements are tools to evaluate datasets.\r\n- **an easy way of adding new evaluation modules to the 🤗 Hub**: you can create new evaluation modules and push them to a dedicated Space in the 🤗 Hub with `evaluate-cli create [metric name]`, which allows you to see easily compare different metrics and their outputs for the same sets of references and predictions.\r\n\r\n[🎓 **Documentation**](https://huggingface.co/docs/evaluate/)\r\n\r\n🔎 **Find a [metric](https://huggingface.co/evaluate-metric), [comparison](https://huggingface.co/evaluate-comparison), [measurement](https://huggingface.co/evaluate-measurement) on the Hub**\r\n\r\n[🌟 **Add a new evaluation module**](https://huggingface.co/docs/evaluate/)\r\n\r\n🤗 Evaluate also has lots of useful features like:\r\n\r\n- **Type checking**: the input types are checked to make sure that you are using the right input formats for each metric\r\n- **Metric cards**: each metrics comes with a card that describes the values, limitations and their ranges, as well as providing examples of their usage and usefulness.\r\n- **Community metrics:** Metrics live on the Hugging Face Hub and you can easily add your own metrics for your project or to collaborate with others.\r\n\r\n\r\n# Installation\r\n\r\n## With pip\r\n\r\n🤗 Evaluate can be installed from PyPi and has to be installed in a virtual environment (venv or conda for instance)\r\n\r\n```bash\r\npip install evaluate\r\n```\r\n\r\n# Usage\r\n\r\n🤗 Evaluate's main methods are:\r\n\r\n- `evaluate.list_evaluation_modules()` to list the available metrics, comparisons and 
# Adding a new evaluation module

First install the necessary dependencies to create a new metric with the following command:

```bash
pip install evaluate[template]
```

Then you can get started with the following command, which will create a new folder for your metric and display the necessary steps:

```bash
evaluate-cli create "Awesome Metric"
```

See this [step-by-step guide](https://huggingface.co/docs/evaluate/creating_and_sharing) in the documentation for detailed instructions.

## Credits

Thanks to [@marella](https://github.com/marella) for letting us use the `evaluate` namespace on PyPI, previously used by his [library](https://github.com/marella/evaluate).