{"id":13738147,"url":"https://github.com/bethgelab/model-vs-human","last_synced_at":"2025-12-29T23:13:05.181Z","repository":{"id":46053281,"uuid":"373810483","full_name":"bethgelab/model-vs-human","owner":"bethgelab","description":"Benchmark your model on out-of-distribution datasets with carefully collected human comparison data (NeurIPS 2021 Oral)","archived":false,"fork":false,"pushed_at":"2025-04-17T13:23:10.000Z","size":27410,"stargazers_count":343,"open_issues_count":3,"forks_count":54,"subscribers_count":14,"default_branch":"master","last_synced_at":"2025-04-18T04:15:11.684Z","etag":null,"topics":["benchmark","pytorch","robustness","tensorflow","toolbox"],"latest_commit_sha":null,"homepage":"https://openreview.net/forum?id=QkljT4mrfs","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bethgelab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"licenses/CODE_LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-06-04T10:49:20.000Z","updated_at":"2025-04-17T13:23:15.000Z","dependencies_parsed_at":"2022-08-12T12:40:28.493Z","dependency_job_id":"e8fa20c6-885a-4ad1-a5b8-293a8dacd6a0","html_url":"https://github.com/bethgelab/model-vs-human","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bethgelab%2Fmodel-vs-human","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bethgelab%2Fmodel-vs-human/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bethgelab%2Fmodel-vs-human/releases","manifests_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bethgelab%2Fmodel-vs-human/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bethgelab","download_url":"https://codeload.github.com/bethgelab/model-vs-human/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253096472,"owners_count":21853606,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","pytorch","robustness","tensorflow","toolbox"],"created_at":"2024-08-03T03:02:12.430Z","updated_at":"2025-12-29T23:13:05.172Z","avatar_url":"https://github.com/bethgelab.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"![header](./assets/header/header.png \"header\")\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"#trophy-benchmark\"\u003eBenchmark\u003c/a\u003e •\n  \u003ca href=\"#wrench-installation\"\u003eInstallation\u003c/a\u003e •\n  \u003ca href=\"#microscope-user-experience\"\u003eUser experience\u003c/a\u003e •\n  \u003ca href=\"#camel-model-zoo\"\u003eModel zoo\u003c/a\u003e •\n  \u003ca href=\"#file_folder-datasets\"\u003eDatasets\u003c/a\u003e •\n  \u003ca href=\"#credit_card-credit\"\u003eCredit \u0026 citation\u003c/a\u003e\n\u003c/p\u003e\n\n# modelvshuman: Does your model generalise better than humans?\n\n``modelvshuman`` is a Python toolbox to benchmark the gap between human and machine vision. 
Using this library, both PyTorch and TensorFlow models can be evaluated on 17 out-of-distribution datasets with high-quality human comparison data.\n\n## :trophy: Benchmark\n\nThe top-10 models are listed here; training dataset size is indicated in brackets. Additionally, standard ResNet-50 is included as the last entry of the table for comparison. Model ranks are calculated across the full range of 52 models that we tested. If your model scores better than some (or even all) of the models here, please open a pull request and we'll be happy to include it here!\n\n### Most human-like behaviour\nwinner            | model                           | accuracy difference \u0026#8595;  | observed consistency \u0026#8593; | error consistency \u0026#8593;     | mean rank \u0026#8595;      |\n:----------------:|  ------------------------------ |-----------------------------:|-----------------------------:|------------------------------:|-----------------------:|\n:1st_place_medal: |  [ViT-22B-384](https://arxiv.org/abs/2302.05442): ViT-22B (4B)      |                     **.018** |                     **.783** |                          .258 |                 **1.67**|\n:2nd_place_medal: |  [CLIP](https://arxiv.org/abs/2103.00020): ViT-B (400M)             |                         .023 |                         .758 |                      **.281** |                        3|\n:3rd_place_medal: |  [ViT-22B-560](https://arxiv.org/abs/2302.05442): ViT-22B (4B)      |                         .022 |                         .739 |                      **.281** |                     3.33|\n:clap: |  [SWSL](https://arxiv.org/abs/1905.00546): ResNeXt-101 (940M)                  |                         .028 |                         .752 |                          .237 |                        6|\n:clap: |  [BiT-M](https://arxiv.org/abs/1912.11370): ResNet-101x1 (14M)                 |                         .034 |                         .733 |                          .252 |    
                    7|\n:clap:            |  [BiT-M](https://arxiv.org/abs/1912.11370): ResNet-152x2 (14M)      |                         .035 |                         .737 |                          .243 |                     7.67|\n:clap:            |  [ViT-L](https://openreview.net/forum?id=YicbFdNTTy) (1M)           |                         .033 |                         .738 |                          .222 |                     9.33|\n:clap:            |  [BiT-M](https://arxiv.org/abs/1912.11370): ResNet-152x4 (14M)      |                         .035 |                         .732 |                          .233 |                    10.33|\n:clap:            |  [BiT-M](https://arxiv.org/abs/1912.11370): ResNet-50x3 (14M)       |                         .040 |                         .726 |                          .228 |                       12|\n:clap:            |  [ViT-L](https://openreview.net/forum?id=YicbFdNTTy) (14M)          |                         .035 |                         .744 |                          .206 |                       12|\n...               
|  standard [ResNet-50](https://arxiv.org/abs/1502.01852) (1M)        |                         .087 |                         .665 |                          .208 |                    31.33|\n\n### Highest OOD (out-of-distribution) distortion robustness\n\nwinner            |  model                                                                       |   OOD accuracy \u0026#8593;    |   rank \u0026#8595;    |\n:----------------:|  ----------------------------------------------------------------------------| -------------------------:|------------------:|\n:1st_place_medal: |  [ViT-22B-224](https://arxiv.org/abs/2302.05442): ViT-22B (4B)               |                  **.837** |              **1**|\n:2nd_place_medal: |  [Noisy Student](https://arxiv.org/abs/1911.04252): EfficientNet-L2 (300M)   |                      .829 |                  2|\n:3rd_place_medal: |  [ViT-22B-384](https://arxiv.org/abs/2302.05442): ViT-22B (4B)               |                      .798 |                  3|\n:clap:            |  [ViT-L](https://openreview.net/forum?id=YicbFdNTTy) (14M)                   |                      .733 |                  4|\n:clap:            |  [CLIP](https://arxiv.org/abs/2103.00020): ViT-B (400M)                      |                      .708 |                  5|\n:clap:            |  [ViT-L](https://openreview.net/forum?id=YicbFdNTTy) (1M)                    |                      .706 |                  6|\n:clap:            |  [SWSL](https://arxiv.org/abs/1905.00546): ResNeXt-101 (940M)                |                      .698 |                  7|\n:clap:            |  [BiT-M](https://arxiv.org/abs/1912.11370): ResNet-152x2 (14M)               |                      .694 |                  8|\n:clap:            |  [BiT-M](https://arxiv.org/abs/1912.11370): ResNet-152x4 (14M)               |                      .688 |                  9|\n:clap:            |  [BiT-M](https://arxiv.org/abs/1912.11370): ResNet-101x3 (14M)               |      
                .682 |                 10|\n...               |  standard [ResNet-50](https://arxiv.org/abs/1502.01852) (1M)                 |                      .559 |                 34|\n\n## :wrench: Installation\n\nSimply clone the repository to a location of your choice and follow these steps (requires ``python3.8``):\n\n\n1. Set the repository home path by running the following from the command line:\n\n    ```\n    export MODELVSHUMANDIR=/absolute/path/to/this/repository/\n    ```\n\n2. Within the cloned repository, install the package:\n\n    ```\n    pip install -e .\n    ```\n    \n    (The ``-e`` flag installs the package in editable mode, so changes to the code are reflected without reinstalling. This is important, e.g., if you add your own model or make any other changes.)\n\n## :microscope: User experience\n\nSimply edit ``examples/evaluate.py`` as desired and run it. This tests a list of models on out-of-distribution datasets and generates plots. If you then compile ``latex-report/report.tex``, all the plots will be included in one convenient PDF report.\n\n\n\n## :camel: Model zoo\n\nThe following models are currently implemented:\n\n- [x] 20+ standard supervised models from the [torchvision model zoo](https://pytorch.org/docs/1.4.0/torchvision/models.html)\n- [x] 5 self-supervised contrastive models (InsDis, MoCo, MoCoV2, InfoMin, PIRL) from the [pycontrast repo](https://github.com/HobbitLong/PyContrast/)\n- [x] 3 self-supervised contrastive SimCLR model variants (simclr_resnet50x1, simclr_resnet50x2, simclr_resnet50x4) from the [ptrnets repo](https://github.com/sacadena/ptrnets)\n- [x] 3 vision transformer variants (vit_small_patch16_224, vit_base_patch16_224 and vit_large_patch16_224) from the [pytorch-image-models repo](https://github.com/rwightman/pytorch-image-models)\n- [x] 10 adversarially \"robust\" models from [robust-models-transfer](https://arxiv.org/abs/2007.08489) implemented via the [ptrnets repo](https://github.com/sacadena/ptrnets)\n- [x] 3 \"ShapeNet\" ResNet-50 models with 
different degrees of stylized training from the [texture-vs-shape repo](https://github.com/rgeirhos/texture-vs-shape)\n- [x] 3 BagNet models from the [BagNets repo](https://github.com/wielandbrendel/bag-of-local-features-models#bagnets)\n- [x] 1 semi-supervised ResNet-50 model pre-trained on 940M images from the [semi-supervised-ImageNet1K-models repo](https://github.com/facebookresearch/semi-supervised-ImageNet1K-models)\n- [x] 6 Big Transfer models from the [pytorch-image-models repo](https://github.com/rwightman/pytorch-image-models)\n\nIf you add or implement your own model, please compute its ImageNet accuracy as a sanity check.\n\n\n##### How to load a model\nIf you just want to load a model from the model zoo, this is what you can do:\n\n```python\n# loading a PyTorch model from the zoo\nfrom modelvshuman.models.pytorch.model_zoo import InfoMin\nmodel = InfoMin(\"InfoMin\")\n\n# loading a TensorFlow model from the zoo\nfrom modelvshuman.models.tensorflow.model_zoo import efficientnet_b0\nmodel = efficientnet_b0(\"efficientnet_b0\")\n```\n\nThen, if you have a custom set of images that you want to evaluate the model on, load them (called ``images`` in the example below) and evaluate via:\n\n```python\nimport torch\n\noutput_numpy = model.forward_batch(images)\n\n# by default, type(output_numpy) is numpy.ndarray, which can be converted to a tensor via:\noutput_tensor = torch.tensor(output_numpy)\n```\n\nHowever, if you simply want to run a model through the generalisation datasets provided by the toolbox, we recommend checking the section on User experience.\n\n##### How to list all available models\n\nAll implemented models are registered in the model registry, which can be used to list all available models for a given framework:\n\n```python\nfrom modelvshuman import models\n\nprint(models.list_models(\"pytorch\"))\nprint(models.list_models(\"tensorflow\"))\n```\n\n##### How 
to add a new model\nAdding a new model is possible for standard PyTorch and TensorFlow models. Depending on the framework (pytorch / tensorflow), open ``modelvshuman/models/\u003cframework\u003e/model_zoo.py``. Here, you can add your own model with a few lines of code, similar to how you would usually load it. If your model has a custom model definition, create a new subdirectory ``modelvshuman/models/\u003cframework\u003e/my_fancy_model/`` containing ``fancy_model.py``, which you can then import from ``model_zoo.py`` via ``from .my_fancy_model import fancy_model``.\n\n\n## :file_folder: Datasets\nIn total, 17 datasets with human comparison data collected under highly controlled laboratory conditions in the [Wichmannlab](http://www.wichmannlab.org) are available.\n\nTwelve datasets correspond to parametric or binary image distortions. Top row: colour/grayscale, contrast, high-pass, low-pass (blurring), phase noise, power equalisation. Bottom row: opponent colour, rotation, Eidolon I, II and III, uniform noise.\n![noise-stimuli](./assets/stimuli_visualizations/noise-stimuli-figure/all_noise-generalisation_stimuli.png  \"noise-stimuli\")\n\nThe remaining five datasets correspond to the following nonparametric image manipulations: sketch, stylized, edge, silhouette, texture-shape cue conflict.\n![nonparametric-stimuli](./assets/stimuli_visualizations/nonparametric-stimuli-figure/all_nonparametric_stimuli.png  \"nonparametric-stimuli\")\n\n##### How to load a dataset\nSimilarly, if you're interested in just loading a dataset, you can do this via:\n```python\nfrom modelvshuman.datasets import sketch\ndataset = sketch(batch_size=16, num_workers=4)\n```\nNote that the datasets are not downloaded when you install the toolbox. 
Instead, they are automatically downloaded the first time a model is evaluated on the dataset (see ``examples/evaluate.py``).\n\n##### How to list all available datasets\n```python\nfrom modelvshuman import datasets\n\nprint(list(datasets.list_datasets().keys()))\n```\n\n##### Download raw test images\nIf you'd like to download the test images yourself, they are available [here](https://github.com/bethgelab/model-vs-human/releases/tag/v0.1).\n\n## :credit_card: Credit\n\nPsychophysical data were collected by us in the vision laboratory of the [Wichmannlab](http://www.wichmannlab.org).\n\nThe images, however, come from existing datasets. 12 datasets were obtained from [Generalisation in humans and deep neural networks](http://papers.nips.cc/paper/7982-generalisation-in-humans-and-deep-neural-networks.pdf). 4 datasets were obtained from [ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness](https://openreview.net/forum?id=Bygh9j09KX). 
Additionally, we used 1 dataset from [Learning Robust Global Representations by Penalizing Local Predictive Power](https://arxiv.org/abs/1905.13549) (sketch images from ImageNet-Sketch).\n\nWe thank all model authors and repository maintainers for providing the models described above.\n\n### Citation\n\n    @inproceedings{geirhos2021partial,\n      title={Partial success in closing the gap between human and machine vision},\n      author={Geirhos, Robert and Narayanappa, Kantharaju and Mitzkus, Benjamin and Thieringer, Tizian and Bethge, Matthias and Wichmann, Felix A and Brendel, Wieland},\n      booktitle={{Advances in Neural Information Processing Systems 34}},\n      year={2021},\n    }\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbethgelab%2Fmodel-vs-human","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbethgelab%2Fmodel-vs-human","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbethgelab%2Fmodel-vs-human/lists"}