https://github.com/camille-004/gnn-long-range

Exploring and visualizing limitations of message-passing paradigm for GNNs. 📉

gnn graph-neural-networks graphs oversmoothing oversquashing pytorch

# gnn-long-range
This project contains customizable baseline SoTA message-passing graph neural network (GNN) architectures. Training functionality is provided by `pytorch_lightning`, and logging by [Weights & Biases](https://wandb.ai/site). It also includes experiments that visualize measures of **oversmoothing** (Rayleigh quotient, Dirichlet energy) and **oversquashing** (embedding Jacobian) for node classification. The main purpose of this repository is to explore the limitations of message-passing neural networks (MPNNs) for long-range interactions.
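
For intuition about the two oversmoothing metrics named above, here is a minimal, dependency-free sketch. It assumes the unnormalized graph Laplacian, so the Dirichlet energy is the sum of squared embedding differences across undirected edges, and the Rayleigh quotient is that energy divided by the squared Frobenius norm of the embeddings. The function names and the toy graph are illustrative only, and the repository's own metric utilities may use a different normalization.

```python
def dirichlet_energy(x, edges):
    """Sum of squared embedding differences over undirected edges.

    x: list of per-node embedding vectors (lists of floats).
    edges: list of (i, j) pairs, each undirected edge listed once.
    Equals trace(X^T L X) for the unnormalized Laplacian L = D - A.
    """
    return sum(
        sum((a - b) ** 2 for a, b in zip(x[i], x[j]))
        for i, j in edges
    )


def rayleigh_quotient(x, edges):
    """Dirichlet energy normalized by the squared Frobenius norm of x."""
    norm_sq = sum(v ** 2 for row in x for v in row)
    return dirichlet_energy(x, edges) / norm_sq if norm_sq > 0 else 0.0


# Toy path graph 0 - 1 - 2 with 2-dimensional embeddings.
edges = [(0, 1), (1, 2)]
distinct = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
smoothed = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]

print(dirichlet_energy(distinct, edges), rayleigh_quotient(distinct, edges))
print(dirichlet_energy(smoothed, edges), rayleigh_quotient(smoothed, edges))
```

Identical embeddings on every node give zero energy, which is the signature of oversmoothing that per-layer plots of these quantities are meant to expose.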

## Project Structure

```shell
├── config # --> Contains all config files.
├── logs # --> Contains results.csv from experiments.
├── notebooks # --> Notebooks, to be used for presentation.
├── references # --> List of papers and codebases referenced.
├── reports # --> Reports and figures generated by run script.
├── scripts # --> Run shell scripts for experiments.
└── src
    ├── data
    │   ├── add_edges.py # --> Custom PyG transform for data augmentation.
    │   └── data_module.py # --> NodeDataModule definition.
    ├── models
    │   ├── train.py # --> Training script, based on PyTorch Lightning's `Trainer`.
    │   └── utils.py # --> Utility functions for computing oversmoothing and oversquashing metrics.
    ├── data_module.py # --> Definitions of graph and node `LightningDataModule`s.
    ├── utils.py # --> Utility function for loading a configuration.
    └── visualize.py # --> Model graphs to save to reports/figures.
```
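
The `add_edges.py` transform listed above, together with the `--add_edges_thres` option described under Usage, adds random edges in proportion to the original edge count. A minimal, dependency-free sketch of that idea follows. The function name `add_random_edges`, the plain edge-list representation, and the reading of the threshold as a fraction in [0, 1] are assumptions made for illustration; the repository's PyG transform may behave differently.

```python
import random


def add_random_edges(edges, num_nodes, thres, seed=0):
    """Return edges plus round(thres * len(edges)) new random undirected edges.

    edges: list of (i, j) pairs, each undirected edge listed once.
    thres: fraction of the original edge count to add (assumed in [0, 1]).
    New edges are never self-loops or duplicates of existing edges.
    """
    rng = random.Random(seed)
    existing = {tuple(sorted(e)) for e in edges}
    max_edges = num_nodes * (num_nodes - 1) // 2
    # Cap the request so the loop terminates even on nearly complete graphs.
    n_new = min(round(thres * len(edges)), max_edges - len(existing))
    added = []
    while len(added) < n_new:
        i, j = rng.sample(range(num_nodes), 2)  # two distinct nodes
        edge = (min(i, j), max(i, j))
        if edge not in existing:
            existing.add(edge)
            added.append(edge)
    return list(edges) + added


# Ring of 10 nodes (10 edges); a threshold of 0.2 adds 2 random edges.
ring = [(i, (i + 1) % 10) for i in range(10)]
augmented = add_random_edges(ring, num_nodes=10, thres=0.2)
print(len(ring), len(augmented))
```

Random shortcuts like these shorten paths between distant nodes, which is why such augmentation is a natural probe for long-range limitations.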

## Prerequisites

This project is built on `conda` and Python 3.10.

**GPU tools:** These models are built using `torch` v1.13.0 and CUDA v11.7, and this is reflected in `environment_gpu.yaml`. You may change your CUDA and `torch` versions in `environment_gpu.yaml`.

To install all necessary dependencies, create a new `conda` environment from the file that matches your hardware:
```shell
conda env create -f environment_cpu.yaml # CPU environment
conda env create -f environment_gpu.yaml # GPU environment
```

If the `pip` installations hang when running the above, run the matching command below after all conda dependencies are installed:
```shell
pip install -r requirements_cpu.txt # For CPU environment
pip install -r requirements_gpu.txt # For GPU environment
```

## Usage

To run the experiments from the report, execute `scripts/run.sh`. The script starts by emptying `logs/results.csv` and `reports/figures`, the latter being where the oversmoothing and oversquashing plots are stored. Note that the results will differ slightly from those presented in the report, as the experiment has been updated to address the footnotes.

Alternatively, to run your own experiment, execute `run.py` with the following parameters:
- `model` - Name of the chosen model. `gin_jk` is supported only for the graph classification task.
- `-e, --max_epochs` - *optional*, Maximum number of epochs to train for if early stopping does not trigger first.
- `-d, --dataset` - *optional*, Name of the dataset on which to train the model.
- `-a, --activation` - *optional*, Activation function used by the neural network.
- `-nh, --n_hidden_layers` - *optional*, Number of hidden layers to include in the neural network.
- `-t, --add_edges_thres` - *optional*, Threshold, as a percentage of the original edge cardinality, for the number of new random edges to add.
- `--n_heads` - *optional*, Number of heads for multi-head attention. Applies to GAT models only.
- `--jk_mode` - *optional*, Jumping knowledge mode for the graph classification model `gin_jk`.
- `--plot_energy` - *optional*, Plot Dirichlet energy of each layer.
- `--plot_rayleigh` - *optional*, Plot Rayleigh quotient of each layer.
- `--plot_influence` - *optional*, Plot up to r-th-order neighborhood influence on a random node.
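
As a rough illustration of how the options above fit together, the following `argparse` sketch declares the same flags. It is a hypothetical reconstruction from this list only, not the repository's `run.py`: the types, the defaults, and the `build_parser` name are all assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags of run.py."""
    p = argparse.ArgumentParser(description="Train a GNN and plot diagnostics.")
    p.add_argument("model", help="Name of the chosen model, e.g. gin")
    p.add_argument("-e", "--max_epochs", type=int, default=None)
    p.add_argument("-d", "--dataset", default=None)
    p.add_argument("-a", "--activation", default=None)
    p.add_argument("-nh", "--n_hidden_layers", type=int, default=None)
    p.add_argument("-t", "--add_edges_thres", type=float, default=None)
    p.add_argument("--n_heads", type=int, default=None)  # GAT models only
    p.add_argument("--jk_mode", default=None)  # gin_jk only
    # Assumption: the plot switches are boolean flags that default to off.
    # The real --plot_influence may instead accept the neighborhood order r.
    p.add_argument("--plot_energy", action="store_true")
    p.add_argument("--plot_rayleigh", action="store_true")
    p.add_argument("--plot_influence", action="store_true")
    return p


# Parse the flags used in the README's example command.
args = build_parser().parse_args(
    ["gin", "-d", "pubmed", "-nh", "2", "--plot_energy", "--plot_rayleigh"]
)
print(args.model, args.dataset, args.n_hidden_layers, args.plot_energy)
```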

You may edit any model hyperparameters, as well as data and training parameters, in the files in the `config` directory.

**Note**: If you get an empty Jacobian when computing the influence scores (resulting in an empty plot), the randomly chosen node is most likely isolated. For now, a simple workaround is to change the `seed` in `global_config`.
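
One way to sidestep the isolated-node case described in the note, instead of changing the seed, would be to sample the target node only from nodes that have at least one edge. The sketch below illustrates that idea under stated assumptions: the `pick_connected_node` helper and the plain edge-list input are hypothetical and do not reflect the repository's current behavior.

```python
import random


def pick_connected_node(edges, seed=0):
    """Pick a random node that has at least one incident non-loop edge.

    edges: list of (i, j) pairs; direction is ignored.
    Raises ValueError if no node has a neighbor.
    """
    # Collect every endpoint of a non-loop edge; sorting keeps runs reproducible.
    connected = sorted({n for i, j in edges if i != j for n in (i, j)})
    if not connected:
        raise ValueError("graph has no edges; every node is isolated")
    return random.Random(seed).choice(connected)


# In a 5-node graph with these edges, nodes 3 and 4 are isolated,
# so they can never be selected.
edges = [(0, 1), (1, 2)]
node = pick_connected_node(edges)
print(node)
```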

### Example
```shell
python run.py gin -d pubmed -nh 2 --plot_energy --plot_rayleigh
```
will log performance to `logs/results.csv` and save the following graphs to `reports/figures`: