{"id":19603892,"url":"https://github.com/divelab/good","last_synced_at":"2025-04-12T14:58:49.482Z","repository":{"id":37264675,"uuid":"500164429","full_name":"divelab/GOOD","owner":"divelab","description":"GOOD: A Graph Out-of-Distribution Benchmark [NeurIPS 2022 Datasets and Benchmarks]","archived":false,"fork":false,"pushed_at":"2025-02-21T23:23:52.000Z","size":17732,"stargazers_count":194,"open_issues_count":0,"forks_count":19,"subscribers_count":2,"default_branch":"GOODv1","last_synced_at":"2025-04-12T14:58:44.832Z","etag":null,"topics":["deep-learning","distribution-shift","graph-neural-networks","graph-ood","invariant-learning","out-of-distribution-generalization","pytorch","pytorch-geometric"],"latest_commit_sha":null,"homepage":"https://good.readthedocs.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/divelab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-06-05T17:24:58.000Z","updated_at":"2025-04-10T04:10:26.000Z","dependencies_parsed_at":"2024-11-15T07:29:42.049Z","dependency_job_id":"6a92be20-969f-4800-804f-fcdf867be61c","html_url":"https://github.com/divelab/GOOD","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/divelab%2FGOOD","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/divelab%2FGOOD/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/divelab%2FGOOD/releases","manifests_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/divelab%2FGOOD/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/divelab","download_url":"https://codeload.github.com/divelab/GOOD/tar.gz/refs/heads/GOODv1","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248586249,"owners_count":21128997,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","distribution-shift","graph-neural-networks","graph-ood","invariant-learning","out-of-distribution-generalization","pytorch","pytorch-geometric"],"created_at":"2024-11-11T09:33:26.646Z","updated_at":"2025-04-12T14:58:49.416Z","avatar_url":"https://github.com/divelab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# :sparkles: GOOD: A Graph Out-of-Distribution Benchmark :sparkles:\n\n[license-url]: https://github.com/divelab/GOOD/blob/main/LICENSE\n[license-image]:https://img.shields.io/badge/license-GPL3.0-green.svg\n[contributing-image]:https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat\n[contributing-url]:https://good.readthedocs.io/en/latest/contributing.html\n\n![Last Commit](https://img.shields.io/github/last-commit/divelab/DIG)\n[![License][license-image]][license-url]\n[![codecov](https://codecov.io/gh/divelab/GOOD/branch/main/graph/badge.svg?token=W41HSP0XCY)](https://codecov.io/gh/divelab/GOOD)\n[![CircleCI](https://circleci.com/gh/divelab/GOOD/tree/main.svg?style=svg)](https://circleci.com/gh/divelab/GOOD/tree/main)\n[![GOOD 
stars](https://img.shields.io/github/stars/divelab/GOOD?style=social)](https://github.com/divelab/GOOD)\n[![Contributing][contributing-image]][contributing-url]\n\n[**Documentation**](https://good.readthedocs.io) | [**NeurIPS 2022 Paper**](https://openreview.net/forum?id=8hHg-zs_p-h) | [Preprint](https://arxiv.org/abs/2206.08452) \n\u003c!-- \u003e We are actively building the document. --\u003e\n\n\u003c!-- [**GOOD: A Graph Out-of-Distribution Benchmark.**](https://arxiv.org/abs/2206.08452) Shurui Gui*, Xiner Li*, Limei Wang, and Shuiwang Ji. --\u003e\n\n\u003c!-- :fire:**New! The GOOD is now also parts of the software library [DIG](https://github.com/divelab/DIG)! If you wish to use the GOOD datasets with DIG features, you can directly use the [DIG](https://github.com/divelab/DIG) library!** --\u003e\n\nThis repo maintains and updates GOOD benchmark which is accepted by NeurIPS 2022 Datasets and Benchmarks Track. :smile:\n\n\u003c!-- For the original code used in the paper, please check branch [GOOD version 0](https://github.com/divelab/GOOD/tree/GOODv0). All new features, datasets and methods will be updated in this branch. --\u003e\n\n## News\n- Algorithm GIL added: [Learning Invariant Graph Representations for Out-of-Distribution Generalization (NeurIPS 2022)](https://openreview.net/forum?id=acKK8MQe2xc) [Mar 11th, 2024]\n- Our new graph OOD work on graph-level tasks: [Joint Learning of Label and Environment Causal Independence for Graph Out-of-Distribution Generalization (NeurIPS 2023)](https://github.com/divelab/LECI).\n\n## Roadmap\n\n### Tutorial\n- [x] More detailed tutorial to add new algorithms. 
Please refer to [Add a new algorithm](#add-a-new-algorithm).\n### Algorithms\n\n\\* denotes a method reproduced by its authors.\n\n- [x] [Beta: feedback is welcome] [Learning Invariant Graph Representations for Out-of-Distribution Generalization](https://openreview.net/forum?id=acKK8MQe2xc)\n- [x] [Learning Causally Invariant Representations for Out-of-Distribution Generalization on Graphs](https://arxiv.org/pdf/2202.05441.pdf) [[the official implementation](https://github.com/LFhase/CIGA)]*\n- [x] [Interpretable and Generalizable Graph Learning via Stochastic Attention Mechanism](https://arxiv.org/abs/2201.12987)\n\n### Datasets\nWe are planning to include more graph out-of-distribution datasets for your convenience.\n- [x] Twitter from [this survey](https://ieeexplore.ieee.org/abstract/document/9875989/citations?tabFilter=papers#citations), GOOD style splits shared by [LECI](https://github.com/divelab/LECI).\n- [x] Parts of [DrugOOD](https://github.com/tencent-ailab/DrugOOD) (Task: LBAP, Noise level: core)\n\n### Features\n\n- [x] Updated final result output for easier result gathering. [Feb 20th updates]\n\n### Leaderboard [Feb 20th updates]\n- [ ] The leaderboard 1.1.0 on the latest datasets will have **larger hyperparameter spaces** and **more runs for hyperparameter sweeping**.\n- [ ] Results will be posted on this [leaderboard](https://good.readthedocs.io/en/latest/leaderboard.html) gradually.\n\n## Table of contents\n\n* [Overview](#overview)\n* [Why GOOD?](#why-good)\n* [Installation](#installation)\n* [Quick tutorial](#quick-tutorial)\n* [Add a new algorithm](#add-a-new-algorithm)\n* [Citing GOOD](#citing-good)\n* [License](#license)\n* [Contact](#contact)\n\n## Overview\n\n**GOOD** (Graph OOD) is a graph out-of-distribution (OOD) algorithm benchmarking library built on PyTorch and PyG\nthat makes it easy to develop and benchmark OOD algorithms.\n\nCurrently, GOOD contains 11 datasets with 17 domain selections. 
When combined with covariate, concept, and no shifts, we obtain 51 different splits.\nWe provide performance results on 12 commonly used baseline methods (ERM, IRM, VREx, GroupDRO, Coral, DANN, MixupForGraph, DIR, GSAT, CIGA, EERM, SRGNN) including 6 graph-specific methods with 10 random runs.\n\nThe GOOD dataset summaries are shown in the following figure.\n\n![Dataset](/../../blob/main/docs/source/imgs/Datasets.png)\n\n## Why GOOD?\n\nWhether you are an experienced researcher of graph out-of-distribution problems or a first-time learner of graph deep learning, \nhere are several reasons to use GOOD as your Graph OOD research, study, and development toolkit.\n\n* **Easy-to-use APIs:** GOOD provides simple APIs for loading OOD algorithms, graph neural networks, and datasets so that you can get started with only a few lines of code.\n* **Flexibility:** Full OOD split generalization code is provided for extensions and any new graph OOD dataset contributions.\nThe OOD algorithm base class can be easily overridden to create new OOD methods.\n* **Easy-to-extend architecture:** In addition to serving as a package, GOOD is also an integrated and well-organized project ready to be further developed.\nAll algorithms, models, and datasets can be easily registered by `register` and automatically embedded into the designed pipeline like a breeze!\nAll you need to do is write your own OOD algorithm class, model class, or dataset class.\nThen you can compare your results with the leaderboard.\n* **Easy comparisons with the leaderboard:** We provide insightful comparisons from multiple perspectives. Any research or study can use\nour leaderboard results for comparison. Note that this is a growing project, so we will include new OOD algorithms gradually.\nBesides, if you hope to include your algorithms in the leaderboard, please contact us or contribute to this project. 
A big welcome!\n\n\n## Installation \n\n- Ubuntu \u003e= 18.04\n\n### Conda dependencies\n\nGOOD depends on [PyTorch (\u003e=1.6.0)](https://pytorch.org/get-started/previous-versions/), [PyG (\u003e=2.0)](https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html), and\n[RDKit (\u003e=2020.09.5)](https://www.rdkit.org/docs/Install.html). For more details: [conda environment](/../../blob/main/environment.yml)\n\n\u003e Note that we currently test on PyTorch (==1.10.1), PyG (==2.0.4), RDKit (==2020.09.5); thus we strongly encourage installing these versions.\n\n\u003e **Warning**: Please install with CUDA \u003e= 11.3 to avoid unexpected CUDA errors.\n\nRecommended installation examples:\n- PyTorch 1.10.1, PyG 2.0.4, RDKit 2020.09.5, CUDA 11.3\n```shell\n# Create your own conda environment, then...\nconda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge\nconda install pyg -c pyg\nconda install -c conda-forge rdkit==2020.09.5\n```\n- PyTorch 2.1.2, PyG 2.5.0, RDKit 2020.09.5, CUDA 11.8\n```shell\nconda install -y pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=11.8 -c pytorch -c nvidia\nconda install -y  pyg -c pyg\npip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.1.0+cu118.html\nconda install -c conda-forge rdkit==2020.09.5 # If a newer version is used, the dataset generation may fail or results will be different.\n```\n\n### Pip\n\n#### Installation for Project usages (recommended)\n\n```shell\ngit clone https://github.com/divelab/GOOD.git \u0026\u0026 cd GOOD\npip install -e .\n```\n\n## Quick Tutorial\n\n### Run an algorithm\n\nA good first step is to run GOOD directly. 
Here, we provide the CLI `goodtg` (GOOD to go) to \naccess the main function located at `GOOD.kernel.main:goodtg`.\nChoosing a config file in `configs/GOOD_configs`, we can start a task:\n\n```shell\ngoodtg --config_path GOOD_configs/GOODCMNIST/color/concept/DANN.yaml\n```\n\n### Hyperparameter sweeping\n\nTo perform automatic hyperparameter sweeping and job launching, you can use `goodtl` (GOOD to launch):\n\n```shell\ngoodtl --sweep_root sweep_configs --launcher MultiLauncher --allow_datasets GOODMotif --allow_domains basis --allow_shifts covariate --allow_algs GSAT --allow_devices 0 1 2 3\n```\n\n* `--sweep_root` is a config folder located at `configs/sweep_configs`, where we provide a GSAT algorithm hyperparameter sweeping setting example (on GOODMotif dataset, basis domain, and covariate shift). \n  * Each hyperparameter search range is specified by a list of values. [Example](/../../blob/GOODv1/configs/sweep_configs/GSAT/base.yaml)\n  * These hyperparameter configs are expanded into CLI argument combinations.\n  * Note that hyperparameters in inner config files override the outer ones.\n* `--launcher` denotes the chosen job launcher. Available launchers:\n  * `Launcher`: Dummy launcher; only prints.\n  * `SingleLauncher`: Sequential job launcher. Uses the first device in `--allow_devices`.\n  * `MultiLauncher`: Multi-GPU job launcher. Launches on all GPUs specified by `--allow_devices`.\n* `--allow_XXX` denotes the job scope. Note that for each \"allow\" combination (e.g. 
GSAT GOODMotif basis covariate),\nthere should be a corresponding sweeping config: `GSAT/GOODMotif/basis/covariate/base.yaml` in the folder specified\nby `--sweep_root`.\n* `--allow_devices` specifies the GPU devices used to launch jobs.\n\n### Sweeping result collection and config update.\n\nTo harvest all fruits you have grown (collect all results you have run), please use `goodtl` with a special launcher `HarvestLauncher`:\n\n```shell\ngoodtl --sweep_root sweep_configs --final_root final_configs --launcher HarvestLauncher --allow_datasets GOODMotif --allow_domains basis --allow_shifts covariate --allow_algs GSAT\n```\n\n* `--sweep_root`: We still need it to specify the experiments that can be harvested.\n* `--final_root`: The config folder that stores the best config settings. \nWe will update the best configurations (according to the sweeping) into the config files in it.\n\n(Experimental function.)\n\nThe output numpy array:\n* Rows: In-distribution train/In-distribution test/Out-of-distribution train/Out-of-distribution test/Out-of-distribution validation\n* Columns: Mean/Std.\n\n### Final runs\n\nIt is sometimes not practical to run 10 rounds for hyperparameter sweeping, especially when the search space is huge.\nTherefore, we can generally run hyperparameter sweeping for 2~3 rounds, then perform all rounds after selecting the best hyperparameters.\nNow, remove the `--sweep_root`, set `--config_root` to your updated best config saving location, and set the `--allow_rounds`.\n\n```shell\ngoodtl --config_root final_configs --launcher MultiLauncher --allow_datasets GOODMotif --allow_domains basis --allow_shifts covariate --allow_algs GSAT --allow_devices 0 1 2 3 --allow_rounds 1 2 3 4 5 6 7 8 9 10\n```\n\nNote that the results are valid only after at least 3 rounds of experiments in this benchmark.\n\n### Final result collection\n\n```shell\ngoodtl --config_root final_configs --launcher HarvestLauncher --allow_datasets GOODMotif --allow_domains basis 
--allow_shifts covariate --allow_algs GSAT --allow_rounds 1 2 3 4 5 6 7 8 9 10\n```\n\nOutput: \n**Markdown format table.** (This table is also saved in the file: \u003cProject_root\u003e/result_table.md).\n\nYou can customize your own launcher at `GOOD/kernel/launchers/`.\n\n## Add a new algorithm\n\nPlease follow [this documentation](https://good.readthedocs.io/en/latest/custom.html#practical-steps-to-add-a-new-ood-algorithm) to add a new algorithm.\n\nContributions are welcome! Please refer to [contributing](https://good.readthedocs.io/en/latest/contributing.html) for adding your algorithm into GOOD.\n\n[//]: # (## Test)\n\n[//]: # ()\n[//]: # (### Dataset regeneration test)\n\n[//]: # ()\n[//]: # (This test regenerates all datasets again and compares them with the datasets used in the original training process locates.)\n\n[//]: # (Test details can be found at [test_regenerate_datasets.py]\u0026#40;/../../blob/main/test/test_reproduce_full/test_regenerate_datasets.py\u0026#41;.)\n\n[//]: # (For a quick review, we provide a [full regeneration test report]\u0026#40;https://drive.google.com/file/d/1jIShh3eBXAQ_oQCFL9AVU3OpUlVprsbo/view?usp=sharing\u0026#41;.)\n\n[//]: # ()\n[//]: # (### Sampled tests)\n\n[//]: # ()\n[//]: # (In order to keep the validity of our code all the time, we link our project with circleci service and provide several )\n\n[//]: # (sampled tests to go through \u0026#40;because of the limitation of computational resources in CI platforms\u0026#41;.)\n\n## Leaderboard\n\nThe initial leaderboard results are listed in the paper. 
The validation of these results is described [here](/../../tree/GOODv0#reproducibility).\n\nLeaderboard 1.1.0 with updated datasets will be available [here](https://good.readthedocs.io/en/latest/leaderboard.html).\n\n## Citing GOOD\nIf you find this repository helpful, please cite our [paper](https://arxiv.org/abs/2206.08452).\n```\n@inproceedings{\ngui2022good,\ntitle={{GOOD}: A Graph Out-of-Distribution Benchmark},\nauthor={Shurui Gui and Xiner Li and Limei Wang and Shuiwang Ji},\nbooktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},\nyear={2022},\nurl={https://openreview.net/forum?id=8hHg-zs_p-h}\n}\n```\n\n## License\n\nThe GOOD datasets are under [MIT license](https://drive.google.com/file/d/1xA-5q3YHXLGLz7xV2tT69a9dcVmiJmiV/view?usp=sharing).\nThe GOOD code is under [GPLv3 license](https://github.com/divelab/GOOD/blob/main/LICENSE).\n\n## Discussion\n\nPlease submit [new issues](/../../issues/new) or start [a new discussion](/../../discussions/new) for any technical or other questions.\n\n## Contact\n\nPlease feel free to contact [Shurui Gui](mailto:shurui.gui@tamu.edu), [Xiner Li](mailto:lxe@tamu.edu), or [Shuiwang Ji](mailto:sji@tamu.edu)!\n\n## Acknowledgements\n\nWe thank Jundong Li and Jing Ma for insightful discussions. This work was supported in part by National Science Foundation grants IIS-1955189, IIS-1908198, and IIS-1908220.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdivelab%2Fgood","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdivelab%2Fgood","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdivelab%2Fgood/lists"}