https://github.com/lirongwu/graphmixup

Code for ECML-PKDD 2022 paper "GraphMixup: Improving Class-Imbalanced Node Classification by Reinforcement Mixup and Self-supervised Context Prediction"

graph-algorithms graph-self-supervised-learning imbalanced-classification imbalanced-data reinforcement-learning

Host: GitHub
URL: https://github.com/lirongwu/graphmixup
Owner: LirongWu
License: mit
Created: 2022-06-21T03:28:14.000Z (about 4 years ago)
Default Branch: main
Last Pushed: 2023-06-07T07:29:54.000Z (about 3 years ago)
Last Synced: 2025-04-13T16:56:28.745Z (about 1 year ago)
Topics: graph-algorithms, graph-self-supervised-learning, imbalanced-classification, imbalanced-data, reinforcement-learning
Language: Python
Homepage:
Size: 30.9 MB
Stars: 23
Watchers: 2
Forks: 6
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# GraphMixup

This is a PyTorch implementation of GraphMixup. The code includes the following modules:

* Dataset Loader (Cora, BlogCatalog, and Wiki-CS)

* Various Architectures (GCN, SAGE, GAT, and SEM)

* Five compared baselines (Origin, Over-Sampling, Re-weight, SMOTE, and Embed-SMOTE)

* Training paradigms (joint learning, pre-training, and fine-tuning) for node classification on three datasets

* Visualization and evaluation metrics

## Main Requirements

* networkx==2.5
* numpy==1.19.2
* scikit-learn==0.24.1
* scipy==1.5.2
* torch==1.6.0
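Because the requirements above are exact pins, it can help to confirm that an environment actually matches them before training. The following is a minimal sketch using the standard-library `importlib.metadata` (Python 3.8+); it is not part of the repository, and the names `PINNED` and `find_mismatches` are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact versions pinned in "Main Requirements" (PyPI distribution names).
PINNED = {
    "networkx": "2.5",
    "numpy": "1.19.2",
    "scikit-learn": "0.24.1",
    "scipy": "1.5.2",
    "torch": "1.6.0",
}


def find_mismatches(pinned, get_version=version):
    """Return {name: (expected, installed or None)} for every unmet pin."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu101" when comparing versions.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

The `get_version` parameter exists only so the check can be exercised without the real packages installed; in normal use it defaults to `importlib.metadata.version`.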

## Description

* train.py
    * train() -- Train a new model for the node classification task on the *Cora, BlogCatalog, and Wiki-CS* datasets
    * test() -- Test the learned model for the node classification task on the *Cora, BlogCatalog, and Wiki-CS* datasets
    * save_model() -- Save the pre-trained model
    * load_model() -- Load a model for fine-tuning
* data_load.py
    * load_cora() -- Load the Cora dataset
    * load_BlogCatalog() -- Load the BlogCatalog dataset
    * load_wiki_cs() -- Load the Wiki-CS dataset
* models.py
    * GraphConvolution() -- GCN layer
    * SageConv() -- SAGE layer
    * SemanticLayer() -- Semantic feature layer
    * GraphAttentionLayer() -- GAT layer
    * PairwiseDistance() -- Perform self-supervised Local-Path Prediction
    * DistanceCluster() -- Perform self-supervised Global-Path Prediction
* utils.py
    * src_upsample() -- Perform interpolation in the input space
    * src_smote() -- Perform interpolation in the embedding space
    * mixup() -- Perform mixup in the semantic relation space
* QLearning.py
    * GNN_env() -- Calculate rewards, perform actions, and update states
    * isTerminal() -- Determine whether the termination conditions have been met
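QLearning.py is described only as computing rewards, performing actions, updating states, and checking termination. As a rough illustration of how those responsibilities fit together, here is a generic tabular Q-learning loop. It is a sketch under stated assumptions, not the repository's `GNN_env`: the `ToyEnv` class, its integer "level" state, the three increment/keep/decrement actions, the placeholder reward, and the fixed step budget are all hypothetical.

```python
import random

ACTIONS = (-1, 0, 1)  # decrease, keep, or increase the controlled integer setting


class ToyEnv:
    """Hypothetical stand-in for GNN_env: the state is an integer level in [0, n_levels)."""

    def __init__(self, n_levels=10, best_level=6, max_steps=20):
        self.n_levels = n_levels
        self.best_level = best_level
        self.max_steps = max_steps

    def reset(self):
        self.state, self.steps = 0, 0
        return self.state

    def reward(self, state):
        # Placeholder reward that peaks at best_level; a real environment
        # would derive the reward from model performance instead.
        return -abs(state - self.best_level)

    def step(self, action):
        # Perform the action (clamped to the valid range) and update the state.
        self.state = min(max(self.state + action, 0), self.n_levels - 1)
        self.steps += 1
        return self.state, self.reward(self.state), self.is_terminal()

    def is_terminal(self):
        # Stand-in for isTerminal(): end the episode after a fixed step budget.
        return self.steps >= self.max_steps


def greedy(q_row):
    # Index of the highest-valued action (ties go to the lowest index).
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_learning(env, episodes=300, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(env.n_levels)]
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection.
            a = rng.randrange(len(ACTIONS)) if rng.random() < epsilon else greedy(q[state])
            next_state, r, done = env.step(ACTIONS[a])
            # The step budget only truncates episodes (time is not part of the
            # state), so the target always bootstraps from the next state.
            target = r + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_rollout(env, q):
    # Follow the learned greedy policy for one episode; return the final state.
    state, done = env.reset(), False
    while not done:
        state, _, done = env.step(ACTIONS[greedy(q[state])])
    return state
```

With `env = ToyEnv()` and `q = q_learning(env)`, `greedy_rollout(env, q)` should climb from level 0 to `best_level` and stay there. Treating the step budget as truncation rather than a true terminal state keeps the value estimates consistent, since the state carries no notion of elapsed time.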

## Running the code

1. Install the required dependency packages

2. To get the results on a specific *dataset*, first run pre-training with appropriate hyperparameters:

```
python train.py --dataset data_name --setting pre-train
```

where *data_name* is one of the three datasets (Cora, BlogCatalog, and Wiki-CS). The pre-trained model will be saved to the corresponding checkpoint folder in **./checkpoint** for evaluation.

3. To fine-tune the pre-trained model, run

```
python train.py --dataset data_name --setting fine-tune --load model_path
```

where *model_path* is the path to the saved pre-trained model.

4. We provide five compared baselines in this code. Each one is selected via the *'--setting'* argument:

- Origin: Vanilla backbone models with *'--setting raw'*
- Over-Sampling: Repeat nodes in the minority classes with *'--setting over-sampling'*
- Re-weight: Give samples from minority classes a larger weight when calculating the loss with *'--setting re-weight'*
- SMOTE: Interpolation in the input space with *'--setting smote'*
- Embed-SMOTE: Perform SMOTE in the intermediate embedding space with *'--setting embed_smote'*
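The Over-Sampling, Re-weight, and SMOTE baselines listed here can each be sketched in a few lines of plain Python. This is an illustrative sketch, not the repository's `src_upsample()` or `src_smote()`: the helper names, the use of a single nearest same-class neighbour for SMOTE, and rescaling the class weights to mean 1 are assumptions. Embed-SMOTE applies the same interpolation to intermediate node embeddings rather than to input features.

```python
import random
from collections import Counter


def oversample_indices(labels, minority_classes, repeats=2):
    """Over-Sampling: keep every node once and repeat minority-class nodes `repeats` times in total."""
    indices = []
    for i, y in enumerate(labels):
        indices.extend([i] * (repeats if y in minority_classes else 1))
    return indices


def inverse_frequency_weights(labels):
    """Re-weight: per-class loss weights proportional to 1 / class size, rescaled to mean 1."""
    counts = Counter(labels)
    raw = {c: 1.0 / n for c, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {c: w / mean for c, w in raw.items()}


def smote_sample(features, labels, node, rng):
    """SMOTE: synthesize a point between `node` and its nearest same-class neighbour."""
    candidates = [j for j, y in enumerate(labels) if y == labels[node] and j != node]

    def sq_dist(j):
        return sum((a - b) ** 2 for a, b in zip(features[node], features[j]))

    nearest = min(candidates, key=sq_dist)
    lam = rng.random()  # interpolation coefficient in [0, 1)
    return [a + lam * (b - a) for a, b in zip(features[node], features[nearest])]


labels = [0, 0, 0, 0, 1, 1]  # class 1 is the minority class
features = [[0, 0], [1, 0], [0, 1], [1, 1], [5, 5], [6, 5]]
print(oversample_indices(labels, {1}, repeats=3))  # [0, 1, 2, 3, 4, 4, 4, 5, 5, 5]
print(inverse_frequency_weights(labels))
print(smote_sample(features, labels, node=4, rng=random.Random(0)))
```

In this toy example the minority class receives twice the loss weight of the majority class, and the synthetic SMOTE node lies on the segment between nodes 4 and 5.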

For example, to run *Embed-SMOTE* on Cora:

```
python train.py --dataset cora --setting embed_smote
```
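To run all five baselines on one dataset in turn, a small driver can generate the corresponding commands. This is a hypothetical helper: it assumes only the `--dataset` and `--setting` flags and the setting values shown above, and that `train.py` is invoked from the repository root.

```python
import shlex

# Values of --setting listed above for the five baselines.
BASELINE_SETTINGS = ("raw", "over-sampling", "re-weight", "smote", "embed_smote")


def baseline_commands(dataset="cora"):
    """Build one train.py command line (as an argument list) per baseline setting."""
    return [
        ["python", "train.py", "--dataset", dataset, "--setting", setting]
        for setting in BASELINE_SETTINGS
    ]


if __name__ == "__main__":
    for cmd in baseline_commands():
        # Print the command; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
        print(shlex.join(cmd))
```

The dataset value `cora` comes from the example above; the exact spellings accepted for BlogCatalog and Wiki-CS are not documented here.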

## Citation

If you find this project useful for your research, please cite it using the following BibTeX entry.

```
@inproceedings{wu2023graphmixup,
title={Graphmixup: Improving class-imbalanced node classification by reinforcement mixup and self-supervised context prediction},
author={Wu, Lirong and Xia, Jun and Gao, Zhangyang and Lin, Haitao and Tan, Cheng and Li, Stan Z},
booktitle={Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2022, Grenoble, France, September 19--23, 2022, Proceedings, Part IV},
pages={519--535},
year={2023},
organization={Springer}
}
```