{"id":15688053,"url":"https://github.com/zhiningliu1998/bat","last_synced_at":"2025-05-05T21:13:12.266Z","repository":{"id":238950273,"uuid":"797477627","full_name":"ZhiningLiu1998/BAT","owner":"ZhiningLiu1998","description":"[ICML'24] BAT: 🚀 Boost Class-imbalanced Node Classification with \u003c10 lines of Code | 从拓扑视角出发10行代码改善类别不平衡节点分类","archived":false,"fork":false,"pushed_at":"2024-11-27T22:27:59.000Z","size":125,"stargazers_count":25,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-05T21:13:11.366Z","etag":null,"topics":["data-augmentation","graph-algorithms","graph-machine-learning","graph-mining","imbalanced-data","imbalanced-learning","machine-learning","node-classification"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ZhiningLiu1998.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-05-07T23:09:53.000Z","updated_at":"2025-04-23T12:26:04.000Z","dependencies_parsed_at":"2024-06-04T23:43:36.480Z","dependency_job_id":"027fc624-34f3-43e2-9c76-9241c43e8c15","html_url":"https://github.com/ZhiningLiu1998/BAT","commit_stats":null,"previous_names":["zhiningliu1998/bat"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZhiningLiu1998%2FBAT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZhiningLiu1998%2FBAT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Z
hiningLiu1998%2FBAT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZhiningLiu1998%2FBAT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ZhiningLiu1998","download_url":"https://codeload.github.com/ZhiningLiu1998/BAT/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252577022,"owners_count":21770721,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-augmentation","graph-algorithms","graph-machine-learning","graph-mining","imbalanced-data","imbalanced-learning","machine-learning","node-classification"],"created_at":"2024-10-03T17:54:11.318Z","updated_at":"2025-05-05T21:13:12.245Z","avatar_url":"https://github.com/ZhiningLiu1998.png","language":"Python","readme":"![](https://raw.githubusercontent.com/ZhiningLiu1998/figures/master/bat/header.png)\r\n\r\n\u003cp align=\"center\"\u003e\r\n  \u003ca href=\"https://github.com/ZhiningLiu1998/BAT\"\u003e\r\n    \u003cimg src=\"https://img.shields.io/badge/ICML-2024-orange\"\u003e\r\n  \u003c/a\u003e\r\n  \u003ca href=\"https://github.com/ZhiningLiu1998/BAT/blob/master/LICENSE\"\u003e\r\n    \u003cimg src=\"https://img.shields.io/github/license/ZhiningLiu1998/BAT\"\u003e\r\n  \u003c/a\u003e\r\n  \u003ca href=\"https://github.com/ZhiningLiu1998/BAT/issues\"\u003e\r\n    \u003cimg src=\"https://img.shields.io/github/issues/ZhiningLiu1998/BAT\"\u003e\r\n  \u003c/a\u003e\r\n  \u003ca href=\"https://github.com/ZhiningLiu1998/BAT/stargazers\"\u003e\r\n    \u003cimg 
src=\"https://img.shields.io/github/stars/ZhiningLiu1998/BAT\"\u003e\r\n  \u003c/a\u003e\r\n  \u003ca href=\"https://github.com/ZhiningLiu1998/BAT/network/members\"\u003e\r\n    \u003cimg src=\"https://img.shields.io/github/forks/ZhiningLiu1998/BAT\"\u003e\r\n  \u003c/a\u003e\r\n\u003c/p\u003e\r\n\r\n\u003ch2 align=\"center\"\u003e\r\nBAT: BAlanced Topological augmentation\r\n\u003c/h2\u003e\r\n\u003ch4 align=\"center\"\u003e\r\n\"Class-Imbalanced Graph Learning without Class Rebalancing\" [ICML'24]\u003cbr\u003e\r\nLinks: [\u003ca href=\"https://arxiv.org/abs/2308.14181\"\u003earXiv\u003c/a\u003e] [\u003ca href=\"https://arxiv.org/pdf/2308.14181\"\u003ePDF\u003c/a\u003e]\r\n\u003c/h4\u003e\r\n\r\n🦇 **BAT** (BAlanced Topological augmentation) is a **lightweight, plug-and-play** augmentation technique for **class-imbalanced node classification**. It mitigates the **class-imbalance bias** introduced by ambivalent and distant message-passing on graph topology with PURE topological manipulation. \r\n\r\nBeing model-agnostic and orthogonal to class-balancing techniques (e.g., reweighting/resampling), 🦇 **BAT** can be seamlessly integrated with various GNN architectures and BOOST regular class-balancing-based imbalance-handling methods.\r\n\r\n### 🌈 **BAT Key Features:**\r\n\r\n- 📈 **Scalability**: Linear complexity w.r.t. the number of nodes/edges.\r\n- 🔌 **Plug-and-play**: Directly integrates into the training loop with ~10 lines of code.\r\n- 🚀 **Performance**: Up to 46.27% performance boost and 72.74% predictive bias reduction.\r\n- 🪐 **Versatility**: Works with various GNN backbones ON TOP OF other imbalance-handling techniques.\r\n- 🧑‍💻 **Ease-of-use**: Unified, concise, and extensible API design. 
No additional hyperparameters.\r\n\r\n### ✂️ **Integrate [`BatAugmenter`](https://github.com/ZhiningLiu1998/BAT/blob/main/bat.py#L170) (BAT) into your training loop with \u003c10 lines of code:**\r\n```python\r\nfrom bat import BatAugmenter\r\n\r\naugmenter = BatAugmenter().init_with_data(data) # Initialize with graph data\r\n\r\nfor epoch in range(epochs):\r\n    # Augmentation\r\n    x, edge_index, _ = augmenter.augment(model, x, edge_index)\r\n    y, train_mask = augmenter.adapt_labels_and_train_mask(y, train_mask)\r\n    # Original training code\r\n    model.update(x, y, edge_index, train_mask)\r\n```\r\n\r\n### 🤗 Citing BAT\r\nWe appreciate your citation if you find our work helpful! 🍻 BibTeX entry:\r\n```bibtex\r\n@misc{liu2024classimbalanced,\r\n      title={Class-Imbalanced Graph Learning without Class Rebalancing}, \r\n      author={Zhining Liu and Ruizhong Qiu and Zhichen Zeng and Hyunsik Yoo and David Zhou and Zhe Xu and Yada Zhu and Kommy Weldemariam and Jingrui He and Hanghang Tong},\r\n      year={2024},\r\n      eprint={2308.14181},\r\n      archivePrefix={arXiv},\r\n      primaryClass={cs.LG}\r\n}\r\n```\r\n\r\n## Table of Contents\r\n- [Table of Contents](#table-of-contents)\r\n- [Usage Example](#usage-example)\r\n  - [Python Scripts](#python-scripts)\r\n  - [Jupyter Notebook](#jupyter-notebook)\r\n  - [Combining BAT with other CIGL baselines](#combining-bat-with-other-cigl-baselines)\r\n- [API reference](#api-reference)\r\n  - [class: `BatAugmenter`](#class-bataugmenter)\r\n    - [Parameters](#parameters)\r\n    - [Methods](#methods)\r\n  - [class: `NodeClassificationTrainer`](#class-nodeclassificationtrainer)\r\n    - [Methods](#methods-1)\r\n- [Empirical Results](#empirical-results)\r\n  - [Experimental Setup](#experimental-setup)\r\n  - [BAT brings significant and universal performance BOOST](#bat-brings-significant-and-universal-performance-boost)\r\n  - [BAT is robust to extreme class-imbalance](#bat-is-robust-to-extreme-class-imbalance)\r\n- 
[References](#references)\r\n\r\n## Usage Example\r\n\r\n### Python Scripts\r\n\r\n[`train.py`](https://github.com/ZhiningLiu1998/BAT/blob/main/train.py) provides a simple way to test BAT under different settings: datasets, imbalance types, imbalance ratios, GNN architectures, etc. For example, to test BAT's effectiveness on the Cora dataset with a 10:1 step imbalance ratio using the GCN architecture, simply run:\r\n```bash\r\npython train.py --dataset cora --imb_type step --imb_ratio 10 --gnn_arch GCN --bat_mode all\r\n```\r\n\r\nExample Output:\r\n```\r\n================= Dataset [Cora] - StepIR [10] - BAT [dummy] =================\r\nBest Epoch:   97 | train/val/test | ACC: 100.0/67.20/67.50 | BACC: 100.0/61.93/60.55 | MACRO-F1: 100.0/59.65/59.29 | upd/aug time: 4.67/0.00ms | node/edge ratio: 100.00/100.00% \r\nBest Epoch:   67 | train/val/test | ACC: 100.0/65.20/65.00 | BACC: 100.0/60.04/57.70 | MACRO-F1: 100.0/57.21/55.09 | upd/aug time: 3.36/0.00ms | node/edge ratio: 100.00/100.00% \r\nBest Epoch:  131 | train/val/test | ACC: 100.0/66.80/67.90 | BACC: 100.0/63.78/61.71 | MACRO-F1: 100.0/62.26/60.08 | upd/aug time: 3.37/0.00ms | node/edge ratio: 100.00/100.00% \r\nBest Epoch:   60 | train/val/test | ACC: 100.0/66.40/66.30 | BACC: 100.0/61.60/60.74 | MACRO-F1: 100.0/58.04/59.09 | upd/aug time: 3.34/0.00ms | node/edge ratio: 100.00/100.00% \r\nBest Epoch:  151 | train/val/test | ACC: 100.0/63.40/63.70 | BACC: 100.0/58.00/55.99 | MACRO-F1: 100.0/53.57/51.88 | upd/aug time: 3.19/0.00ms | node/edge ratio: 100.00/100.00% \r\nAvg Test Performance (5 runs):  | ACC: 66.08 ± 0.70 | BACC: 59.34 ± 0.96 | MACRO-F1: 57.09 ± 1.40\r\n\r\n================== Dataset [Cora] - StepIR [10] - BAT [bat1] ==================\r\nBest Epoch:   72 | train/val/test | ACC: 100.0/72.00/72.20 | BACC: 100.0/69.65/68.93 | MACRO-F1: 100.0/66.88/67.10 | upd/aug time: 3.12/4.10ms | node/edge ratio: 100.26/101.43% \r\nBest Epoch:  263 | train/val/test | ACC: 100.0/72.80/71.70 | BACC: 
100.0/72.59/69.01 | MACRO-F1: 100.0/72.05/68.70 | upd/aug time: 3.51/4.10ms | node/edge ratio: 100.26/101.75% \r\nBest Epoch:  186 | train/val/test | ACC: 100.0/74.00/73.70 | BACC: 100.0/74.37/73.10 | MACRO-F1: 100.0/71.61/71.04 | upd/aug time: 3.36/4.15ms | node/edge ratio: 100.26/101.56% \r\nBest Epoch:   71 | train/val/test | ACC: 100.0/72.40/72.10 | BACC: 100.0/69.50/67.75 | MACRO-F1: 100.0/68.11/66.80 | upd/aug time: 3.31/4.12ms | node/edge ratio: 100.26/101.55% \r\nBest Epoch:   77 | train/val/test | ACC: 100.0/76.20/77.60 | BACC: 100.0/78.03/77.92 | MACRO-F1: 100.0/75.06/76.42 | upd/aug time: 3.34/4.10ms | node/edge ratio: 100.26/101.58% \r\nAvg Test Performance (5 runs):  | ACC: 73.46 ± 0.97 | BACC: 71.34 ± 1.68 | MACRO-F1: 70.01 ± 1.58\r\n```\r\nWe can observe great performance gain brought about by BAT:\r\n\r\n| Metric  | Accuracy | Balanced Accuracy | Macro-F1 Score |\r\n| ------- | -------- | ----------------- | -------------- |\r\n| w/o BAT | 66.08    | 59.34             | 57.09          |\r\n| w/ BAT  | 73.46    | 71.34             | 70.01          |\r\n| Gain    | +7.38    | +12.00            | +12.92         |\r\n\r\n\r\nFull argument list of [`train.py`](https://github.com/ZhiningLiu1998/BAT/blob/main/train.py) and descriptions are as follows:\r\n\r\n```\r\n--gpu_id | int, default=0\r\n    Specify which GPU to use for training. Set to -1 to use the CPU.\r\n\r\n--seed | int, default=42\r\n    Random seed for reproducibility in training.\r\n\r\n--n_runs | int, default=5\r\n    The number of independent runs for training.\r\n\r\n--debug | bool, default=False\r\n    Enable debug mode if set to True.\r\n\r\n--dataset | str, default=\"cora\"\r\n    Name of the dataset to use for training.\r\n    Supports \"cora,\" \"citeseer,\" \"pubmed,\" \"cs\", \"physics\".\r\n\r\n--imb_type | str, default=\"step\", choices=[\"step\", \"natural\"]\r\n    Type of imbalance to handle in the dataset. 
Choose from \"step\" or \"natural\".\r\n\r\n--imb_ratio | int, default=10\r\n    Imbalance ratio for handling imbalanced datasets.\r\n\r\n--gnn_arch | str, default=\"GCN\", choices=[\"GCN\", \"GAT\", \"SAGE\"]\r\n    Graph neural network architecture to use. Choose from \"GCN,\" \"GAT,\" or \"SAGE.\"\r\n\r\n--n_layer | int, default=3\r\n    The number of layers in the GNN architecture.\r\n\r\n--hid_dim | int, default=256\r\n    Hidden dimension size for the GNN layers.\r\n\r\n--lr | float, default=0.01\r\n    Initial learning rate for training.\r\n\r\n--weight_decay | float, default=5e-4\r\n    Weight decay for regularization during training.\r\n\r\n--epochs | int, default=2000\r\n    The number of training epochs.\r\n\r\n--early_stop | int, default=200\r\n    Patience for early stopping during training.\r\n\r\n--tqdm | bool, default=False\r\n    Enable a tqdm progress bar during training if set to True.\r\n\r\n--bat_mode | str, default=\"all\", choices=[\"dummy\", \"pred\", \"topo\", \"all\"]\r\n    Mode of the BAT. 
Choose from \"dummy,\" \"pred,\" \"topo,\" or \"all.\"\r\n    if \"dummy,\" BAT is disabled.\r\n    if \"pred,\" BAT is enabled with only prediction-based augmentation.\r\n    if \"topo,\" BAT is enabled with only topology-based augmentation.\r\n    if \"all,\" will run all modes and report the results for comparison.\r\n```\r\n\r\n### Jupyter Notebook\r\n\r\nWe also provide an example Jupyter notebook [train_example.ipynb](https://github.com/ZhiningLiu1998/BAT/blob/main/train_example.ipynb) with more experimental results on:\r\n- 3 Datasets:        ['cora', 'citeseer', 'pubmed']\r\n- 3 BAT modes:      ['dummy', 'pred', 'topo']\r\n- 4 Imbalance type-rate combinations: \r\n  - 'step': [10, 20]\r\n  - 'natural': [50, 100]\r\n\r\n### Combining BAT with other CIGL baselines\r\n\r\nWe developed the experiment code that combines BAT and CIGL techniques based on the official implementations of [GraphENS](https://github.com/JoonHyung-Park/GraphENS) and [GraphSMOTE](https://github.com/TianxiangZhao/GraphSmote). The code is available in the `dev` folder, and the main script is [`experiment_example.ipynb`](https://github.com/ZhiningLiu1998/BAT/blob/main/dev/experiment_example.ipynb).\r\n\r\nNote: due to the tightly coupled implementation design of many baselines, it would take considerable effort to decouple them and integrate BAT with each baseline behind a clean and concise API. There is much room for improvement in the code design. 
Please use the dev code as a reference to integrate BAT with other CIGL baselines, and all kinds of feedback are welcome.\r\n\r\n\r\n## API reference\r\n\r\n### class: `BatAugmenter`\r\n\r\nhttps://github.com/ZhiningLiu1998/BAT/blob/main/bat.py#L170\r\n\r\nMain class that implements the BAT augmentation algorithm, inheriting from [`BaseGraphAugmenter`](https://github.com/ZhiningLiu1998/BAT/blob/main/bat.py#L11).\r\n\r\n```python\r\nclass BatAugmenter(BaseGraphAugmenter):\r\n    \"\"\"\r\n    Balanced Topological (BAT) augmentation for graph data.\r\n\r\n    Parameters:\r\n    - mode: str, optional (default: \"bat1\")\r\n        The augmentation mode. Must be one of [\"dummy\", \"bat0\", \"bat1\"].\r\n        - 'dummy': no augmentation.\r\n        - 'bat0': BAT with 0th order posterior likelihood estimation, linear to #nodes.\r\n        - 'bat1': BAT with 1st order posterior likelihood estimation, linear to #edges\r\n           and generally performs better (recommended).\r\n    - random_state: int or None, optional (default: None)\r\n        Random seed for reproducibility.\r\n    \"\"\"\r\n```\r\n\r\n#### Parameters\r\n- `mode`: str, optional (default: \"bat1\")\r\n  - The augmentation mode. Must be one of [\"dummy\", \"bat0\", \"bat1\"].\r\n    - 'dummy': no augmentation.\r\n    - 'bat0': BAT with 0th order posterior likelihood estimation, linear to #nodes.\r\n    - 'bat1': BAT with 1st order posterior likelihood estimation, linear to #edges and generally performs better (recommended).\r\n- `random_state`: int or None, optional (default: None)\r\n  - Random seed for reproducibility.\r\n\r\n#### Methods\r\n- `init_with_data(data)`: initialize the augmenter with graph data.\r\n  - Parameters: \r\n    - `data` : PyG data object. 
Expected attributes: `x`, `edge_index`, `y`, `train_mask`, `val_mask`, `test_mask`.\r\n  - Return: \r\n    - `self` : TopoBalanceAugmenter\r\n- `augment(model, x, edge_index)`: perform topology-aware graph augmentation.\r\n  - Parameters: \r\n    - `model` : torch.nn.Module, node classification model\r\n    - `x` : torch.Tensor, node feature matrix\r\n    - `edge_index` : torch.Tensor, sparse edge index\r\n  - Return: \r\n    - `x_aug` : torch.Tensor, augmented node feature matrix\r\n    - `edge_index_aug`: torch.Tensor, augmented sparse edge index\r\n    - `info` : dict, augmentation info\r\n- `adapt_labels_and_train_mask(y, train_mask)`: adapt labels and training mask after augmentation.\r\n  - Parameters: \r\n    - `y` : torch.Tensor, node label vector\r\n    - `train_mask` : torch.Tensor, training mask\r\n  - Return: \r\n    - `new_y` : torch.Tensor, adapted node label vector\r\n    - `new_train_mask` : torch.Tensor, adapted training mask\r\n\r\n### class: `NodeClassificationTrainer`\r\n\r\nhttps://github.com/ZhiningLiu1998/BAT/blob/main/trainer.py#L14\r\n\r\nTrainer class for node classification tasks, centralizing the training workflow: \r\n- (1) model preparation and selection\r\n- (2) performance evaluation\r\n- (3) BAT data augmentation\r\n- (4) verbose logging.\r\n\r\n```python\r\nclass NodeClassificationTrainer:\r\n    \"\"\"\r\n    A trainer class for node classification with Graph Augmenter.\r\n\r\n    Parameters:\r\n    -----------\r\n    - model: torch.nn.Module\r\n        The node classification model.\r\n    - data: pyg.data.Data\r\n        PyTorch Geometric data object containing graph data.\r\n    - device: str or torch.device\r\n        Device to use for computations (e.g., 'cuda' or 'cpu').\r\n    - augmenter: BaseGraphAugmenter, optional\r\n        Graph augmentation strategy.\r\n    - learning_rate: float, optional\r\n        Learning rate for optimization.\r\n    - weight_decay: float, optional\r\n        Weight decay (L2 penalty) for 
optimization.\r\n    - train_epoch: int, optional\r\n        Number of training epochs.\r\n    - early_stop_patience: int, optional\r\n        Number of epochs with no improvement to trigger early stopping.\r\n    - eval_freq: int, optional\r\n        Frequency of evaluation during training.\r\n    - eval_metrics: dict, optional\r\n        Dictionary of evaluation metrics and associated functions.\r\n    - verbose_freq: int, optional\r\n        Frequency of verbose logging.\r\n    - verbose_config: dict, optional\r\n        Configuration for verbose logging.\r\n    - save_model_dir: str, optional\r\n        Directory to save model checkpoints.\r\n    - save_model_name: str, optional\r\n        Name of the saved model checkpoint.\r\n    - enable_tqdm: bool, optional\r\n        Whether to enable tqdm progress bar.\r\n    - random_state: int, optional\r\n        Seed for random number generator.\r\n    \"\"\"\r\n```\r\n\r\n#### Methods\r\n\r\n- `train`: train the node classification model and perform evaluation.\r\n  - Parameters:\r\n    - `train_epoch`: int, optional. Number of training epochs.\r\n    - `eval_freq`: int, optional. Frequency of evaluation during training.\r\n    - `verbose_freq`: int, optional. 
Frequency of verbose logging.\r\n  - Return:\r\n    - `model`: torch.nn.Module, trained node classification model.\r\n- `print_best_results`: print the evaluation results of the best model.\r\n\r\n## Empirical Results\r\n\r\n### Experimental Setup\r\n\r\nTo fully validate **BAT**'s performance and compatibility with existing (graph) imbalance-handling techniques and GNN backbones, we test 6 imbalance-handling methods with 5 popular GNN backbone architectures in our experiments, and apply BAT with them under all possible combinations:\r\n\r\n- **Datasets**: Cora, Citeseer, Pubmed, CS, Physics\r\n- **Imbalance-handling techniques**: \r\n  - Reweighting [1]\r\n  - ReNode [2]\r\n  - Oversample [3]\r\n  - SMOTE [4]\r\n  - GraphSMOTE [5]\r\n  - GraphENS [6]\r\n- **GNN backbones**:\r\n  - GCN [7]\r\n  - GAT [8]\r\n  - SAGE [9]\r\n  - APPNP [10]\r\n  - GPRGNN [11]\r\n- **Imbalance types \u0026 ratios**: \r\n  - **Step imbalance**: 10:1, 20:1\r\n  - **Natural imbalance**: 50:1, 100:1\r\n\r\nFor more details on the experimental setup, please refer to our paper: https://arxiv.org/abs/2308.14181.\r\n\r\n### BAT brings significant and universal performance BOOST\r\n\r\nWe first report the detailed empirical results of applying **BAT** with 6 IGL baselines and 5 GNN backbones on 3 imbalanced graphs (Cora, CiteSeer, and PubMed) with IR=10 in Table 1. We highlight the improvement brought by BAT to the average/best test performance of the 6 IGL baselines. \r\n\r\nResults show that **BAT brings a significant and universal performance boost** to all IGL baselines and GNN backbones. In addition to its superior classification performance, **BAT** also greatly reduces the model's predictive bias.\r\n\r\n![](https://raw.githubusercontent.com/ZhiningLiu1998/figures/master/bat/results.png)\r\n\r\n### BAT is robust to extreme class-imbalance\r\n\r\nWe now test **BAT**'s robustness to varying types of extreme class-imbalance. 
In this experiment, we consider a more challenging scenario with IR = 20, as well as the natural (long-tail) class imbalance that is commonly observed in real-world graphs with IR of 50 and 100. Datasets from (*CS, Physics*) are also included to test **BAT**'s performance on large-scale tasks. Results show that **BAT** consistently demonstrates superior performance in boosting classification and reducing predictive bias.\r\n\r\n![](https://raw.githubusercontent.com/ZhiningLiu1998/figures/master/bat/results_varyimb.png)\r\n\r\nPlease refer to our paper: https://arxiv.org/abs/2308.14181 for more details.\r\n\r\n## References\r\n\r\n| #    | Reference                                                                                                                                                                                                                                  |\r\n| ---- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |\r\n| [1]  | Nathalie Japkowicz and Shaju Stephen. The class imbalance problem: A systematic study. Intelligent data analysis, 6(5):429–449, 2002.                                                                                                      |\r\n| [2]  | Deli Chen, Yankai Lin, Guangxiang Zhao, Xuancheng Ren, Peng Li, Jie Zhou, and Xu Sun. Topology-imbalance learning for semi-supervised node classification. Advances in Neural Information Processing Systems, 34:29885–29897, 2021.        |\r\n| [3]  | Nathalie Japkowicz and Shaju Stephen. The class imbalance problem: A systematic study. Intelligent data analysis, 6(5):429–449, 2002.                                                                                                      |\r\n| [4]  | Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. 
Smote: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16:321–357, 2002.                                               |\r\n| [5]  | Tianxiang Zhao, Xiang Zhang, and Suhang Wang. Graphsmote: Imbalanced node classification on graphs with graph neural networks. In Proceedings of the 14th ACM international conference on web search and data mining, pages 833–841, 2021. |\r\n| [6]  | Joonhyung Park, Jaeyun Song, and Eunho Yang. Graphens: Neighbor-aware ego network synthesis for class-imbalanced node classification. In International Conference on Learning Representations, 2022.                                       |\r\n| [7]  | Max Welling and Thomas N Kipf. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR 2017), 2016.                                                                |\r\n| [8]  | Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In International Conference on Learning Representations, 2018.                                              |\r\n| [9]  | Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. Advances in neural information processing systems, 30, 2017.                                                                             |\r\n| [10] | Johannes Gasteiger, Aleksandar Bojchevski, and Stephan Günnemann. Predict then propagate: Graph neural networks meet personalized pagerank. arXiv preprint arXiv:1810.05997, 2018.                                                         |\r\n| [11] | Eli Chien, Jianhao Peng, Pan Li, and Olgica Milenkovic. Adaptive universal generalized pagerank graph neural network. arXiv preprint arXiv:2006.07988, 2020.                                                                               
|\r\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzhiningliu1998%2Fbat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzhiningliu1998%2Fbat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzhiningliu1998%2Fbat/lists"}