{"id":13689335,"url":"https://github.com/paperswithcode/sotabench-eval","last_synced_at":"2025-09-14T15:38:38.445Z","repository":{"id":51144804,"uuid":"209061479","full_name":"paperswithcode/sotabench-eval","owner":"paperswithcode","description":"Easily evaluate machine learning models on public benchmarks","archived":false,"fork":false,"pushed_at":"2024-03-20T15:45:32.000Z","size":2659,"stargazers_count":171,"open_issues_count":4,"forks_count":27,"subscribers_count":15,"default_branch":"master","last_synced_at":"2025-06-06T16:50:52.645Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/paperswithcode.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-09-17T13:27:53.000Z","updated_at":"2025-05-15T11:27:25.000Z","dependencies_parsed_at":"2025-01-21T03:00:25.874Z","dependency_job_id":null,"html_url":"https://github.com/paperswithcode/sotabench-eval","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/paperswithcode/sotabench-eval","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paperswithcode%2Fsotabench-eval","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paperswithcode%2Fsotabench-eval/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paperswithcode%2Fsotabench-eval/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paperswithcode%2Fsotabench-eval/manifests","
owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/paperswithcode","download_url":"https://codeload.github.com/paperswithcode/sotabench-eval/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paperswithcode%2Fsotabench-eval/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275125930,"owners_count":25410019,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-14T02:00:10.474Z","response_time":75,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T15:01:43.992Z","updated_at":"2025-09-14T15:38:38.401Z","avatar_url":"https://github.com/paperswithcode.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\u003cimg width=500 src=\"/docs/docs/img/sotabencheval.png\"\u003e\u003c/p\u003e\n\n--------------------------------------------------------------------------------\n\n[![PyPI version](https://badge.fury.io/py/sotabencheval.svg)](https://badge.fury.io/py/sotabencheval) [![Generic badge](https://img.shields.io/badge/Documentation-Here-\u003cCOLOR\u003e.svg)](https://paperswithcode.github.io/sotabench-eval/)\n\n`sotabencheval` is a framework-agnostic library that contains a collection of deep learning benchmarks you can use to benchmark your models. 
It can be used in conjunction with the [sotabench](https://www.sotabench.com) service to record results for models, so the community can compare model performance on different tasks, as well as a continuous integration style service for your repository to benchmark your models on each commit.\n\n## Benchmarks Supported\n\n- [ADE20K](https://paperswithcode.github.io/sotabench-eval/ade20k/) (Semantic Segmentation)\n- [COCO](https://paperswithcode.github.io/sotabench-eval/coco/) (Object Detection)\n- [ImageNet](https://paperswithcode.github.io/sotabench-eval/imagenet/) (Image Classification)\n- [SQuAD](https://paperswithcode.github.io/sotabench-eval/squad/) (Question Answering)\n- [WikiText-103](https://paperswithcode.github.io/sotabench-eval/wikitext103/) (Language Modelling)\n- [WMT](https://paperswithcode.github.io/sotabench-eval/wmt/) (Machine Translation)\n\nPRs welcome for further benchmarks! \n\n## Installation\n\nRequires Python 3.6+. \n\n```bash\npip install sotabencheval\n```\n\n## Get Benching! 🏋️\n\nYou should read the [full documentation here](https://paperswithcode.github.io/sotabench-eval/index.html), which contains guidance on getting started and connecting to [sotabench](https://www.sotabench.com).\n\nIntegration is lightweight. 
For example, if you are evaluating an ImageNet model, you initialize an Evaluator object and (optionally) link it to the associated paper:\n\n```python\nfrom sotabencheval.image_classification import ImageNetEvaluator\nevaluator = ImageNetEvaluator(\n             model_name='FixResNeXt-101 32x48d',\n             paper_arxiv_id='1906.06423')\n```\n\nThen for each batch of predictions your model makes on ImageNet, pass a dictionary that maps image IDs to `np.ndarray`s of logits to the `evaluator.add` method:\n\n```python\nevaluator.add(output_dict=dict(zip(image_ids, batch_output)))\n```\n\nThe evaluation logic just needs to be written in a `sotabench.py` file and sotabench will run it on each commit and record the results:\n\n\u003ca href=\"https://sotabench.com/user/htvr/repos/TouvronHugo/FixRes#latest-results\"\u003e\u003cimg width=500 src=\"/docs/docs/img/results.png\"\u003e\u003c/a\u003e\n\n## Contributing\n\nAll contributions welcome!\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpaperswithcode%2Fsotabench-eval","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpaperswithcode%2Fsotabench-eval","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpaperswithcode%2Fsotabench-eval/lists"}