
# Deep AUC Maximization on Graph Property Prediction
This repo contains our code submission for the OGB challenge. We focus on [**ogbg-molhiv**](https://ogb.stanford.edu/docs/leader_graphprop/), a binary classification task for predicting a target molecular property, e.g., whether a molecule inhibits HIV virus replication. The evaluation metric is **AUROC**. To the best of our knowledge, this is the first solution that directly optimizes the AUC score on this task. Our [**AUC-Margin loss**](https://arxiv.org/abs/2012.03173) improves the DeeperGCN baseline to **0.8159** and achieves SOTA performance of **0.8352** when jointly trained with Neural FingerPrints. Our approaches are implemented in **[LibAUC](https://github.com/Optimization-AI/LibAUC)**, a machine-learning library for AUC optimization.

## Results on ogbg-molhiv
**Our method ranks 1st place on the leaderboard as of 10/11/2021!** We compare our results on the ogbg-molhiv dataset against several strong baselines:

| Method |Test AUROC |Validation AUROC | Parameters | Hardware |
| ------------------ |------------------- | ----------------- | -------------- |----------|
| DeeperGCN | 0.7858±0.0117 | 0.8427±0.0063 | 531,976 | Tesla V100 (32GB) |
| DeeperGCN+FLAG | 0.7942±0.0120 | 0.8425±0.0061 | 531,976 | Tesla V100 (32GB) |
| Neural FingerPrints| 0.8232±0.0047 | 0.8331±0.0054 | 2,425,102 | Tesla V100 (32GB) |
| Graphormer | 0.8051±0.0053 | 0.8310±0.0089 | 47,183,040 | Tesla V100 (16GB) |
| **DeepAUC (Ours)** | **0.8159±0.0059** | 0.8054±0.0080 | 1,019,407 | Tesla V100 (32GB) |
| **DeepAUC+FPs (Ours)** | **0.8352±0.0054** | 0.8238±0.0061 | 1,019,407** | Tesla V100 (32GB) |

- ** This parameter count does not include the Random Forest model.

## Requirements
1. Install base packages:
```bash
Python>=3.7
Pytorch>=1.9.0
tensorflow>=2.0.0
pytorch_geometric>=1.6.0
ogb>=1.3.2
dgl>=0.5.3
numpy==1.20.3
pandas==1.2.5
scikit-learn==0.24.2
deep_gcns_torch
```
2. Install [**LibAUC**](https://github.com/Optimization-AI/LibAUC), from which we use the **AUC-Margin** loss and **PESG** optimizer (a usage sketch follows the install command):
```bash
pip install LibAUC
```
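Below is a minimal sketch of how the loss and optimizer plug into a PyTorch training step, assuming the LibAUC 1.1-style API that was current at submission time; the `model` is a hypothetical stand-in for the graph encoder, and you should check the LibAUC documentation for the exact signatures of your installed version:

```python
import torch
from libauc.losses import AUCMLoss
from libauc.optimizers import PESG

# Hypothetical stand-in for the DeeperGCN graph encoder.
model = torch.nn.Linear(16, 1)

imratio = 0.035  # fraction of positive samples (assumed value for illustration)
loss_fn = AUCMLoss(margin=1.0, imratio=imratio)
optimizer = PESG(model,
                 a=loss_fn.a, b=loss_fn.b, alpha=loss_fn.alpha,
                 imratio=imratio, lr=0.1, gamma=500,
                 margin=1.0, weight_decay=1e-5)

# One training step on a dummy batch; AUCM expects scores in [0, 1].
x, y = torch.randn(512, 16), torch.randint(0, 2, (512, 1)).float()
scores = torch.sigmoid(model(x))
loss = loss_fn(scores, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The optimizer is constructed with references to the loss's internal variables because AUC-Margin maximization is a min-max problem: PESG descends on the model weights and the auxiliary variables `a`, `b`, while ascending on `alpha`.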

## Training
The training process has two steps: (1) train a DeeperGCN model from scratch using our **[AUC-Margin loss](https://arxiv.org/abs/2012.03173)**; (2) jointly finetune the pretrained model from step (1) with the FingerPrints models.
### Training from scratch using the AUC-Margin loss
- Train the [DeeperGCN](https://github.com/lightaime/deep_gcns_torch) model with the AUC-Margin loss and PESG optimizer using the default parameters:
```bash
python main.py --use_gpu --conv_encode_edge --num_layers 14 --block res+ --gcn_aggr softmax --t 1.0 --learn_t --dropout 0.2 \
--dataset ogbg-molhiv \
--loss auroc \
--optimizer pesg \
--batch_size 512 \
--lr 0.1 \
--gamma 500 \
--margin 1.0 \
--weight_decay 1e-5 \
--random_seed 0 \
--epochs 300
```
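Results are scored with the official OGB evaluator, which computes the AUROC reported on the leaderboard. A minimal sketch, where the random arrays are placeholders for the trained model's predictions on the validation or test split:

```python
import numpy as np
from ogb.graphproppred import Evaluator

# Placeholder predictions; in practice these come from the trained model
# on the validation/test split (shape: [num_graphs, num_tasks]).
y_true = np.random.randint(0, 2, size=(128, 1))
y_pred = np.random.rand(128, 1)

evaluator = Evaluator(name='ogbg-molhiv')
result = evaluator.eval({'y_true': y_true, 'y_pred': y_pred})
print(result['rocauc'])  # AUROC, the ogbg-molhiv leaderboard metric
```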

### Jointly training with the FingerPrints model
- Extract fingerprints and train a Random Forest by following [PaddleHelix](https://github.com/PaddlePaddle/PaddleHelix/tree/dev/competition/ogbg_molhiv) (a sketch of what these scripts roughly do follows the commands):
```bash
python extract_fingerprint.py
python random_forest.py
```
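For orientation, here is a rough sketch of the recipe these two scripts follow: compute molecular fingerprints (Morgan fingerprints via RDKit are used here as one common choice) and fit a scikit-learn Random Forest on them. The fingerprint types, features, and hyperparameters in the actual PaddleHelix scripts may differ:

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

def morgan_fp(smiles, radius=2, n_bits=2048):
    """Morgan (ECFP-like) bit-vector fingerprint for one SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    arr = np.zeros((n_bits,), dtype=np.int8)
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr

# Toy molecules for illustration; the real script reads the ogbg-molhiv SMILES.
smiles_list = ['CCO', 'c1ccccc1', 'CC(=O)O', 'CCN']
labels = np.array([0, 1, 0, 1])

X = np.stack([morgan_fp(s) for s in smiles_list])
rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(X, labels)
rf_probs = rf.predict_proba(X)[:, 1]  # positive-class probabilities, used downstream
```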
- Finetune the pretrained model together with the FingerPrints model using the **[AUC-Margin loss](https://arxiv.org/abs/2012.03173)** and the default parameters; note the smaller `--lr` and `--gamma` than in step (1). An illustrative score-fusion sketch follows the command:
```bash
python finetune.py --use_gpu --conv_encode_edge --num_layers 14 --block res+ --gcn_aggr softmax --t 1.0 --learn_t --dropout 0.2 \
--dataset ogbg-molhiv \
--loss auroc \
--optimizer pesg \
--batch_size 512 \
--lr 0.01 \
--gamma 300 \
--margin 1.0 \
--weight_decay 1e-5 \
--random_seed 0 \
--epochs 100
```
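One simple way to combine the two models at evaluation time is a weighted average of the GNN score and the Random Forest probability, with the weight chosen on the validation split. This is only an illustrative sketch with placeholder data; the joint training in `finetune.py` may combine the models differently:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Placeholder per-molecule scores for illustration.
gnn_scores = np.random.rand(100)  # sigmoid outputs of the finetuned DeeperGCN
rf_scores = np.random.rand(100)   # RandomForest positive-class probabilities
y_true = np.random.randint(0, 2, size=100)

# Sweep the fusion weight; in practice, pick w on the validation split.
for w in np.linspace(0.0, 1.0, 11):
    fused = w * gnn_scores + (1.0 - w) * rf_scores
    print(f'w={w:.1f}  AUROC={roc_auc_score(y_true, fused):.4f}')
```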

## Results
Step (1) improves the original DeeperGCN baseline to **0.8159** test AUROC, a ~**3%** improvement. Step (2) reaches a new SOTA of **0.8352**, a ~**1%** improvement over the previous best baseline. For each stage, we train the model 10 times with different random seeds (0 to 9).
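The mean±std numbers in the table aggregate those 10 seeded runs; computing them is a one-liner once the per-seed test AUROCs are collected. The values below are placeholders, not our actual run results:

```python
import numpy as np

# Placeholder per-seed test AUROCs (seeds 0-9); substitute the real run outputs.
aurocs = np.array([0.83, 0.84, 0.83, 0.82, 0.84, 0.83, 0.84, 0.83, 0.84, 0.83])
print(f'Test AUROC: {aurocs.mean():.4f} \u00b1 {aurocs.std():.4f}')
```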

Citation
---------
If you have any questions, please open a new issue in this repo or contact us at [Zhuoning Yuan](https://homepage.divms.uiowa.edu/~zhuoning/) [[email protected]]. If you find this work useful, please cite the following paper for our method and library:
```
@inproceedings{yuan2021robust,
  title={Large-scale Robust Deep AUC Maximization: A New Surrogate Loss and Empirical Studies on Medical Image Classification},
  author={Yuan, Zhuoning and Yan, Yan and Sonka, Milan and Yang, Tianbao},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2021}
}
```

Reference
---------
- https://libauc.org/
- https://github.com/Optimization-AI/LibAUC
- https://github.com/PaddlePaddle/PaddleHelix/tree/dev/competition/ogbg_molhiv
- https://github.com/lightaime/deep_gcns_torch/
- https://ogb.stanford.edu/