{"id":16832403,"url":"https://github.com/yanndubs/ssl-risk-decomposition","last_synced_at":"2025-04-11T04:32:45.708Z","repository":{"id":59653655,"uuid":"452885353","full_name":"YannDubs/SSL-Risk-Decomposition","owner":"YannDubs","description":"Benchmark and analysis of 165 pretrained SSL models. Code for \"Evaluating Self-Supervised Learning via Risk Decomposition\".","archived":false,"fork":false,"pushed_at":"2023-07-26T21:30:43.000Z","size":155806,"stargazers_count":14,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-25T02:39:57.534Z","etag":null,"topics":["benchmark","deep-learning","evaluation","machine-learning","model-zoo","pytorch","representation-learning","self-supervised-learning"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/YannDubs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-01-28T00:01:01.000Z","updated_at":"2025-03-14T02:03:24.000Z","dependencies_parsed_at":"2025-02-18T18:34:53.412Z","dependency_job_id":"cdad9781-d63c-48a5-b359-ce5b0eb504fb","html_url":"https://github.com/YannDubs/SSL-Risk-Decomposition","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YannDubs%2FSSL-Risk-Decomposition","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YannDubs%2FSSL-Risk-Decomposition/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/YannDubs%2FSSL-Risk-Decomposition/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YannDubs%2FSSL-Risk-Decomposition/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/YannDubs","download_url":"https://codeload.github.com/YannDubs/SSL-Risk-Decomposition/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248345202,"owners_count":21088231,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","deep-learning","evaluation","machine-learning","model-zoo","pytorch","representation-learning","self-supervised-learning"],"created_at":"2024-10-13T11:48:49.619Z","updated_at":"2025-04-11T04:32:45.641Z","avatar_url":"https://github.com/YannDubs.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Evaluating Self-Supervised Learning via Risk Decomposition [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/YannDubs/lossyless/blob/main/LICENSE) [![Python 3.8+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/release/python-390/)\n\nThis repository contains:\n- [simple API to load](#all-pretrained-models-hyperparameters-results) 169 pretrained SSL models, their pretraining hyperparameters, and all results.\n- [simple code](#computing-the-loss-decomposition) to compute the loss decomposition of any SSL model.\n- [the code](#reproducing-results) to reproduce all results and figures from [Evaluating Self-Supervised Learning via Risk 
Decomposition](https://arxiv.org/abs/2302.03068)\n\n  Other resources:\n  - [ICML 2023 Oral](https://youtu.be/otDrub-x5KY)\n  - [Paper](https://arxiv.org/abs/2302.03068)\n  - [Tweet](https://twitter.com/yanndubs/status/1684314179019087872)\n\n## All pretrained models, hyperparameters, results\n\nWe release all pretrained weights, hyperparameters, and results on `torch.hub`, which can be loaded using:\n\n```python\nimport torch\n\n# loads the desired pretrained model and preprocessing pipeline\nname = \"dino_rn50\" # example\nmodel, preprocessor = torch.hub.load('YannDubs/SSL-Risk-Decomposition:main', name, trust_repo=True)\n\n# gets all available models \navailable_names = torch.hub.list('YannDubs/SSL-Risk-Decomposition:main')\n\n# gets all results and hyperparameters as a dataframe \nresults_df = torch.hub.load('YannDubs/SSL-Risk-Decomposition:main', \"results_df\")\n```\n\nThe necessary dependencies are: \n- for **most models**: `pip install torch torchvision tqdm timm pandas`\n- for **all models**: `pip install torch torchvision tqdm timm dill open_clip_torch git+https://github.com/openai/CLIP.git`\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eDetails\u003c/b\u003e\u003c/summary\u003e\n    \n- `timm`: for any ViT architecture\n- `pandas`: for results_df, metadata_df\n- `dill`: for BYOL\n- `open-clip-torch`: for OpenCLIP\n- `git+https://github.com/openai/CLIP.git`: for CLIP \n\n\u003c/details\u003e\n\n## Computing the loss decomposition\nHere is minimal code to compute the loss decomposition:\n```python\n\ndef compute_risk_components(model_ssl, D_train, D_test, model_sup=None, n_sub=10000, **kwargs):\n    \"\"\"Computes the SSL risk decomposition for `model_ssl` using a given training and testing set.\n    \n    If we are given a supervised `model_sup` of the same architecture as model_ssl, we compute the \n    approximation error. 
Else we merge it with usability error given that approx error is negligible.\n    \"\"\"\n    errors = dict()\n    \n    # featurize data to make probing much faster. Optional.\n    # keep the raw D_train untouched so it can still be featurized by model_sup below\n    D_train_ssl = featurize_data(model_ssl, D_train)\n    D_test_ssl = featurize_data(model_ssl, D_test)\n    \n    D_comp, D_sub = data_split(D_train_ssl, n=n_sub)\n    \n    r_A_F = train_eval_probe(D_train_ssl, D_train_ssl, **kwargs)\n    r_A_S = train_eval_probe(D_comp, D_sub, **kwargs)\n    r_U_S = train_eval_probe(D_train_ssl, D_test_ssl, **kwargs)\n    \n    if model_sup is not None:\n        D_train_sup = featurize_data(model_sup, D_train)\n        errors[\"approx\"] = train_eval_probe(D_train_sup, D_train_sup, **kwargs)\n        errors[\"usability\"] = r_A_F - errors[\"approx\"]\n    else:\n        errors[\"usability\"] = r_A_F # merges both errors but approx is negligible\n        \n    errors[\"probe_gen\"] = r_A_S - r_A_F\n    errors[\"encoder_gen\"] = r_U_S - r_A_S \n    errors[\"agg_risk\"] = r_U_S\n    return errors\n\ndef featurize_data(model, dataset):\n    \"\"\"Featurize a dataset using the model.\"\"\"\n    ...\n\n\ndef train_eval_probe(D_train, D_test, **kwargs):\n    \"\"\"Trains a model (encoder and probe) on D_train and evaluates it on D_test\"\"\"\n    ...\n\ndef data_split(dataset, n):\n    \"\"\"Split a dataset into a set of size n and its complement\"\"\"\n    ...\n```\n\nFor a minimal notebook computing the loss decomposition, with specific implementations of the above functions, see: [![Minimal training of DISSL](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/YannDubs/SSL-Risk-Decomposition/blob/main/notebooks/minimal.ipynb).\n\nFor the actual code that we used (includes hyperparameter tuning) see: [main_fullshot.py](https://github.com/YannDubs/SSL-Risk-Decomposition/blob/main/main_fullshot.py)\n\n## Reproducing results\n\nSteps to reproduce the entire paper:\n0. 
Install `requirements_running.txt` (pip) or `environment_running.yml` (conda) to compute all risk components.\n1. To recompute all risk components, run: `scripts/run_all.sh`. To recompute specific models, run the corresponding script in `scripts/` with the correct server (see `config/server`), e.g. `scripts/simsiam.sh -s nlprun`.\n2. To recompute all few-shot evaluations, run: `script_sk/run_all.sh` (we use sklearn instead of PyTorch for that).\n3. Install `requirements_analyzing.txt` (pip) or `environment_analyzing.yml` (conda) to analyze all results.\n4. To reproduce all the analyses and plots from the main paper, run `notebooks/main_paper.ipynb`.\n5. To reproduce all the analyses and plots from the appendices, run `notebooks/appcs.ipynb`.\n\n## Contributing\n\nIf you have a pretrained model that you would like to add, please open a PR with the following:\n1. In `hub/`, add the files and code to load your model. Then in `hubconf.py`, add a one-line function that loads the desired model. The name of that function will be the name of the model in `torch.hub`. Make sure that you load everything from `hub/` using a leading underscore. Follow previous examples.\n2. Add all the hyperparameters and metadata in `metadata.yaml`. Documentation of every field can be found at the top of that file. Follow previous examples.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyanndubs%2Fssl-risk-decomposition","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyanndubs%2Fssl-risk-decomposition","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyanndubs%2Fssl-risk-decomposition/lists"}