{"id":48577092,"url":"https://github.com/rgklab/detectron","last_synced_at":"2026-04-08T15:46:22.281Z","repository":{"id":65850811,"uuid":"600479817","full_name":"rgklab/detectron","owner":"rgklab","description":"Official repository for the ICLR 2023 paper \"A Learning Based Hypothesis Test for Harmful Covariate Shift\"","archived":false,"fork":false,"pushed_at":"2024-01-22T20:19:39.000Z","size":18526,"stargazers_count":10,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-05-15T09:47:48.954Z","etag":null,"topics":["distribution-shift","machine-learning","pytroch","two-sample-test","xgboost"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rgklab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-02-11T16:16:43.000Z","updated_at":"2024-04-14T17:54:44.000Z","dependencies_parsed_at":"2024-01-22T22:09:06.666Z","dependency_job_id":null,"html_url":"https://github.com/rgklab/detectron","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/rgklab/detectron","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rgklab%2Fdetectron","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rgklab%2Fdetectron/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rgklab%2Fdetectron/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rgklab%2Fdetectron/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/owners/rgklab","download_url":"https://codeload.github.com/rgklab/detectron/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rgklab%2Fdetectron/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31562696,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-08T14:31:17.711Z","status":"ssl_error","status_checked_at":"2026-04-08T14:31:17.202Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["distribution-shift","machine-learning","pytroch","two-sample-test","xgboost"],"created_at":"2026-04-08T15:46:21.623Z","updated_at":"2026-04-08T15:46:22.268Z","avatar_url":"https://github.com/rgklab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![](media/logo.svg)\n___\n**Official implementation of the ICLR 2023 paper [A Learning Based Hypothesis Test for Harmful Covariate Shift\n](https://arxiv.org/abs/2212.02742)**\n\n![](media/dark_figure.png#gh-dark-mode-only)\n![](media/figure.png#gh-light-mode-only)\n\n## Intro\nWe introduce the **Detectron**, a learning based hypothesis test for harmful covariate shift. 
Given a pretrained model $f: X\\to Y$ and an unlabeled dataset $Q=\\\\{x_i\\\\}_{i=1}^n$, Detectron aims to automatically decide if $Q$ is similar enough to $f$'s training domain that we can trust $f$ to make reliable predictions on it.  \n\nThe algorithm works in two major steps:\n\nFirst, we estimate the distribution of the test statistic $\\phi$, which is computed as the *empirical disagreement rate* of a classifier $g(x)$ trained to explicitly disagree with a pretrained model $f(x)$ on i.i.d. samples from the training set.  In practice, we create $g(x)$ by finetuning $f(x)$ using the _disagreement cross entropy_ defined formally in the paper. It is also important to limit the hypothesis space for $g(x)$ by forcing it to agree with $f(x)$ on the original training set while giving it a limited compute budget to prevent overfitting. Conceptually, we can interpret $\\phi$ as the degree of underspecification $f(x)$ admits on its training domain.\n\n![](media/gif1.gif)\n\nNext, we train another classifier $g^\\star(x)$ in exactly the same way as $g(x)$, but on the unlabeled data $Q$. We detect covariate shift at a significance level $\\alpha$ by comparing the empirical disagreement rate of $g^\\star(x)$ on $Q$ (denoted $\\phi^\\star$) to the estimated distribution of $\\phi$.\n\n![](media/gif2.gif)\n\nIn our paper, we further show how to boost the power of the test using ensembling and by replacing the disagreement statistic $\\phi$ with the related predictive entropy.  \n\n## Benchmarks \nTest power at $5\\%$ significance level for Detectron and baselines. We use a very small sample size of $|Q|=10$. Results for other sample sizes can be found in the paper.\n\n| | CIFAR 10.1 [[Recht et al.]](https://arxiv.org/abs/1806.00451) |\tCamelyon 17 |\tUCI Heart Disease |\n|---| :---: | :---: | :---: |\n|Black Box Shift Detection [[Lipton et al.]](https://arxiv.org/abs/1802.03916)\t|$.07\\pm.03$ | $.05 \\pm .02$ | $.12 \\pm .03$ |\n| Rel. 
Mahalanobis Distance [[Ren et al.]](https://arxiv.org/abs/2106.09022) | $.05 \\pm .02$ | $.03 \\pm .03$ | $.04 \\pm .02$ |\n|Deep Ensemble (Disagreement) [Ablation]\t| $.05 \\pm .02$ | $.03 \\pm .03$ | $.04 \\pm .02$ |\n|Deep Ensemble (Entropy) [Ablation]\t| $\\mathit{.33 \\pm .05}$ | $\\mathit{.52 \\pm .05}$ | $.68 \\pm .05$ |\n|Classifier Two Sample Test (CTST) [[Lopez-Paz et al.]](https://arxiv.org/abs/1610.06545)|\t $.03 \\pm .02$  |  $.04 \\pm .02$  |   $.04 \\pm .02$ |\n|Deep Kernel MMD [[Liu et al.]](https://arxiv.org/abs/2002.09116)\t| $.24 \\pm .04$ |  $.10 \\pm .03$ |  $.05 \\pm .02$ |\n|H-Divergence [[Zhao et al.]](https://openreview.net/forum?id=KB5onONJIAU)|\t$.02\\pm .01$   |  $.05\\pm .02$ |  $.04\\pm .02$ |\n|**Detectron (Disagreement)** [[Ours]](https://arxiv.org/abs/2212.02742) | $\\mathbf{.37 \\pm .05}$  |  $\\underline{.54 \\pm .05}$  |   $.83 \\pm .04$ |\n|**Detectron (Entropy)** [[Ours]](https://arxiv.org/abs/2212.02742) | $\\underline{.35 \\pm .05}$  |  $\\mathbf{.56 \\pm .05}$  |   $\\mathbf{.92 \\pm .03}$|\n\n The **best** result for each column is bolded, results that are within \u003cins\u003e2% of the best\u003c/ins\u003e are underlined and the _best baseline_ method is italicized.\n\n## Setup\n\n### Environment\n\n`detectron` requires a working build of `pytorch` with the cudatoolkit enabled.\nA simple environment setup using `conda` is provided below.\n\n```shell\n# create and activate conda environment using a python version \u003e= 3.9\nconda create -n detectron python=3.9\nconda activate detectron\n\n# install the latest stable release of pytorch (tested for \u003e= 1.9.0)\nconda install pytorch torchvision cudatoolkit=11.3 -c pytorch\n\n# install additional dependencies with pip\npip install -r requirements.txt\n```\n\n### Datasets\n\nWe provide a simple config system to store dataset path mappings in the file `detectron/config.yml`\n\n```yaml\ndatasets:\n  default: /datasets\n  cifar10_1: /datasets/cifar-10-1\n  camelyon17: 
/datasets/camelyon17\n```\n\nFor more information on downloading datasets, see `detectron/data/sample_data/README.md`.\n\n### Running Detectron\n\nThere is work in progress to package Detectron in a robust and easy-to-deploy system.\nFor now, all the code needed to reproduce our experiments is located in the `experiments` directory\nand can be run as in the following example.\n\n```shell\n# run the cifar experiment using the standard config\n# use python -m experiments.detectron_cifar --help for a documented list of options\n❯ python -m experiments.detectron_cifar --run_name cifar\n```\n\n### Evaluating Detectron\n\nThe scratch files will write the output for each seed to a `.pt` file in a directory named `results/\u003crun_name\u003e`.\n\nThe script in `experiments/analysis.py` will read these files and produce a summary of the results for each test\ndescribed in the paper.\n\n```shell\n❯ python -m experiments.analysis --run_name cifar\n# Output\n→ 600 runs loaded\n→ Running Disagreement Test\nN = 10, 20, 50\nTPR: .37 ± .05 AUC: 0.799 | TPR: .54 ± .05 AUC: 0.902 | TPR: .83 ± .04 AUC: 0.981\n→ Running Entropy Test\nN = 10, 20, 50\nTPR: .35 ± .05 AUC: 0.712 | TPR: .56 ± .05 AUC: 0.866 | TPR: .92 ± .03 AUC: 0.981\n\n```\n\n## Citation\n\nPlease use the following citation if you use this code or methods in your own work.\n\n```bibtex\n@inproceedings{\n    ginsberg2023a,\n    title = {A Learning Based Hypothesis Test for Harmful Covariate Shift},\n    author = {Tom Ginsberg and Zhongyuan Liang and Rahul G Krishnan},\n    booktitle = {The Eleventh International Conference on Learning Representations },\n    year = {2023},\n    url = {https://openreview.net/forum?id=rdfgqiwz7lZ}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frgklab%2Fdetectron","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frgklab%2Fdetectron","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frgklab%2Fdetectron/lists"}