{"id":13737559,"url":"https://github.com/RuntianZ/doro","last_synced_at":"2025-05-08T14:33:03.970Z","repository":{"id":74128266,"uuid":"375755201","full_name":"RuntianZ/doro","owner":"RuntianZ","description":"Distributional and Outlier Robust Optimization (ICML 2021)","archived":false,"fork":false,"pushed_at":"2021-07-10T01:58:49.000Z","size":673,"stargazers_count":27,"open_issues_count":0,"forks_count":5,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-11-15T06:32:04.637Z","etag":null,"topics":["distributional-shift","fairness","machine-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RuntianZ.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2021-06-10T16:00:43.000Z","updated_at":"2024-04-17T01:20:48.000Z","dependencies_parsed_at":"2024-01-27T23:43:10.164Z","dependency_job_id":"34e9e3b1-e4d3-47bd-9499-2300a454b8a9","html_url":"https://github.com/RuntianZ/doro","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RuntianZ%2Fdoro","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RuntianZ%2Fdoro/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RuntianZ%2Fdoro/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RuntianZ%2Fdoro/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RuntianZ","download_url":"https://codeload.github.com/RuntianZ/doro/tar.gz/refs/hea
ds/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253085766,"owners_count":21851696,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["distributional-shift","fairness","machine-learning"],"created_at":"2024-08-03T03:01:53.314Z","updated_at":"2025-05-08T14:33:03.104Z","avatar_url":"https://github.com/RuntianZ.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# DORO: Distributional and Outlier Robust Optimization\n**Runtian Zhai\\*, Chen Dan\\*, J. Zico Kolter, Pradeep Ravikumar**  \nIn ICML 2021  \nPaper: [Link](http://proceedings.mlr.press/v139/zhai21a/zhai21a.pdf)\n\n## Table of Contents\n- [Quick Start](#quick-start)\n- [Introduction](#introduction)\n- [DRO is Sensitive to Outliers](#dro-is-sensitive-to-outliers)\n- [DORO](#doro)\n  - [CelebA](#celeba)\n  - [CivilComments-Wilds](#civilcomments-wilds)\n- [Citation and Contact](#citation-and-contact)\n\n## Quick Start\nFor a demonstration on the sensitivity of DRO to outliers, see [this Jupyter notebook](https://drive.google.com/file/d/1z-ugawAr-2rFYPavMksHohEVN_scZwSR/view?usp=sharing) (you can view it online with Google Colab).\n\nTo install the required packages, use\n```shell\npip install -r requirements.txt\n```\nTo run experiments on the CivilComments-Wilds dataset, you need to manually install `torch-scatter` and `torch-geometric` (see instructions [here](#civilcomments-wilds)). \n\nThe algorithms we implement are included in `dro.py`. 
To run these algorithms on CelebA, use\n```shell\npython celeba.py --data_root [ROOT] --alg [ALG] --alpha [ALPHA] --eps [EPSILON] --seed [SEED] --download\n```\nHere `[ROOT]` is the path to the dataset, `[ALG]` is the algorithm (one of `erm`, `cvar`, `cvar_doro`, `chisq` or `chisq_doro`), `[ALPHA]` and `[EPSILON]` are the hyperparameters described in the paper, and `[SEED]` is the random seed.\n\n\n## Introduction\nWhile DRO has been shown to be effective against subpopulation shift, its performance degrades significantly when the dataset contains outliers. DORO enhances the outlier robustness of DRO by filtering out a small fraction of the instances with the highest training loss, since these are potential outliers. First, we present some intriguing experimental results on COMPAS showing that DRO is sensitive to outliers. Then we conduct large-scale experiments on COMPAS, CelebA and CivilComments-Wilds. Our strong theoretical and empirical results demonstrate the effectiveness of DORO.\n\n\n## DRO is Sensitive to Outliers\nIn Section 3 of our paper, we use experimental results on the COMPAS dataset to demonstrate that the original DRO algorithms are not robust to outliers, which are common in real datasets. We have prepared a [Jupyter notebook](https://drive.google.com/file/d/1z-ugawAr-2rFYPavMksHohEVN_scZwSR/view?usp=sharing) that includes all experiments in this section; you can view it online with Google Colab.\n\n\n## DORO\n\nIn Section 6, we conduct large-scale experiments on modern datasets. Here we describe how to run the experiments on CelebA and CivilComments-Wilds.\n\n### CelebA\nCelebA official website: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html\n\nWe run the experiments on a single NVIDIA GTX 1080 Ti. 
To reproduce the results in the paper, please use the following command with the hyperparameters listed in Appendix B.3 of our paper:\n```shell\npython celeba.py --data_root [ROOT] --alg [ALG] --alpha [ALPHA] --eps [EPSILON] --seed [SEED]\n```\nIf you are running for the first time, add `--download` to download the dataset.\n\n### CivilComments-Wilds\nWe use the CivilComments dataset from the `wilds` package. Please follow the instructions at https://wilds.stanford.edu/get_started/ to use this dataset. Our code is in the `wilds-exp` folder and is based on https://github.com/p-lambda/wilds/tree/main/examples.\n\nWe run the experiments on four NVIDIA Tesla V100s. To reproduce the results in the paper, please use the following command with the hyperparameters listed in Appendix B.3 of our paper:\n```shell\ncd wilds-exp\npython run_expt.py --dataset civilcomments --algorithm doro --root_dir [ROOT] --doro_alg [ALG] --alpha [ALPHA] --eps [EPSILON] --batch_size 128 --data_parallel --evaluate_steps 500 --seed [SEED]\n```\nIf you are running for the first time, add `--download` to download the dataset.\n\n## Citation and Contact\nPlease use the following BibTeX entry to cite this paper:\n```\n@InProceedings{pmlr-v139-zhai21a,\n  title = \t {DORO: Distributional and Outlier Robust Optimization},\n  author =       {Zhai, Runtian and Dan, Chen and Kolter, Zico and Ravikumar, Pradeep},\n  booktitle = \t {Proceedings of the 38th International Conference on Machine Learning},\n  pages = \t {12345--12355},\n  year = \t {2021},\n  editor = \t {Meila, Marina and Zhang, Tong},\n  volume = \t {139},\n  series = \t {Proceedings of Machine Learning Research},\n  month = \t {18--24 Jul},\n  publisher =    {PMLR},\n  pdf = \t {http://proceedings.mlr.press/v139/zhai21a/zhai21a.pdf},\n  url = \t {http://proceedings.mlr.press/v139/zhai21a.html}\n}\n```\n\nTo contact us, please email the following address:\n`Runtian Zhai 
\u003crzhai@cmu.edu\u003e`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FRuntianZ%2Fdoro","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FRuntianZ%2Fdoro","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FRuntianZ%2Fdoro/lists"}