https://github.com/amazon-science/tgl
- Host: GitHub
- URL: https://github.com/amazon-science/tgl
- Owner: amazon-science
- License: apache-2.0
- Created: 2022-02-20T19:55:34.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-12-25T09:27:39.000Z (9 months ago)
- Last Synced: 2024-04-30T19:32:23.603Z (5 months ago)
- Language: Python
- Size: 21.8 MB
- Stars: 173
- Watchers: 8
- Forks: 31
- Open Issues: 14
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
# TGL: A General Framework for Temporal Graph Training on Billion-Scale Graphs
## Overview
This repo contains the open-source code for our work *TGL: A General Framework for Temporal Graph Training on Billion-Scale Graphs*.
## Requirements
- python >= 3.6.13
- pytorch >= 1.8.1
- pandas >= 1.1.5
- numpy >= 1.19.5
- dgl >= 0.6.1
- pyyaml >= 5.4.1
- tqdm >= 4.61.0
- pybind11 >= 2.6.2
- g++ >= 7.5.0
- openmp >= 201511

Our temporal sampler is implemented in C++. Please compile the sampler first with the following command:
> python setup.py build_ext --inplace

## Dataset
[2022/06/29] We noticed that we had uploaded the wrong version of the GDELT dataset and have now uploaded the correct version. Please re-download all the files in the GDELT folder. Sorry for any inconvenience.
The four datasets used in our paper are available for download from an AWS S3 bucket using the `down.sh` script. The total download size is around 350GB.
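For reference, a toy `edges.csv` in the layout described below can be generated with pandas (every node index, timestamp, and the split used here is made up purely for illustration):

```python
import pandas as pd

# Hypothetical 3-node temporal graph with 6 edges, purely for illustration.
# Columns: src, dst, time, ext_roll (0 = train, 1 = validation, 2 = test);
# the unnamed index column written by to_csv serves as the edge index.
edges = pd.DataFrame({
    "src":      [0, 1, 2, 0, 1, 2],
    "dst":      [1, 2, 0, 2, 0, 1],
    "time":     [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "ext_roll": [0, 0, 0, 0, 1, 2],  # chronological train/val/test split
})
edges = edges.sort_values("time").reset_index(drop=True)
edges.to_csv("edges.csv")  # header becomes ",src,dst,time,ext_roll"
```

Note that the CSV must already be sorted by time, and that `to_csv` with the default settings emits the leading unnamed index column that the expected header starts with.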
To use your own dataset, you need to put the following files in the folder `/DATA/<NAME>/`:
1. `edges.csv`: The file that stores the temporal edge information. The CSV should have the header `,src,dst,time,ext_roll`, where the columns are: edge index (starting from zero), source node index (starting from zero), destination node index, timestamp, and extrapolation roll (0 for training edges, 1 for validation edges, 2 for test edges). The CSV should be sorted by time in ascending order.
2. `ext_full.npz`: The T-CSR representation of the temporal graph. We provide a script to generate this file from `edges.csv`. You can use the following command to run the script:
>python gen_graph.py --data <NAME>
3. `edge_features.pt` (optional): The torch tensor that stores the edge features row-wise, with shape (num edges, dim edge features). *Note: at least one of `edge_features.pt` or `node_features.pt` must be present.*
4. `node_features.pt` (optional): The torch tensor that stores the node features row-wise, with shape (num nodes, dim node features). *Note: at least one of `edge_features.pt` or `node_features.pt` must be present.*
5. `labels.csv` (optional): The file that contains node labels for the dynamic node classification task. The CSV should have the header `,node,time,label,ext_roll`, where the columns are: node label index (starting from zero), node index (starting from zero), timestamp, node label, and extrapolation roll (0 for training labels, 1 for validation labels, 2 for test labels). The CSV should be sorted by time in ascending order.

## Configuration Files
We provide example configuration files for five temporal GNN methods: JODIE, DySAT, TGAT, TGN, and APAN. The configuration files for single-GPU training are located in `/config/`, while the multi-GPU training configuration files are located in `/config/dist/`.
The provided configuration files have all been tested to work. If you want to use your own network architecture, please refer to `/config/readme.yml` for the meaning of each entry in the YAML configuration file. As our framework is still under development, some combinations of configurations may lead to bugs.
## Run
Currently, our framework only supports the extrapolation setting (inference on the future).
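Concretely, extrapolation means the `ext_roll` splits are chronological: every validation edge is later than all training edges, and every test edge is later than all validation edges. A minimal sketch of this invariant (toy timestamps and a hypothetical 70/15/15 split; the real datasets define their own splits):

```python
import numpy as np

# Toy sorted timestamps standing in for the `time` column of edges.csv.
rng = np.random.default_rng(0)
times = np.sort(rng.uniform(0, 100, size=20))

# Hypothetical 70/15/15 chronological split encoded as ext_roll values.
n = len(times)
ext_roll = np.zeros(n, dtype=int)
ext_roll[int(0.70 * n):] = 1  # validation edges follow all training edges
ext_roll[int(0.85 * n):] = 2  # test edges follow all validation edges

# Every evaluation edge lies in the "future" of the training data.
assert times[ext_roll == 0].max() <= times[ext_roll == 1].min()
assert times[ext_roll == 1].max() <= times[ext_roll == 2].min()
```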
### Single GPU Link Prediction
>python train.py --data <NAME> --config <path to config file>

### Multi-GPU Link Prediction
>python -m torch.distributed.launch --nproc_per_node=<number of GPUs> train_dist.py --data <NAME> --config <path to config file> --num_gpus <number of GPUs>

### Dynamic Node Classification
Currently, TGL only supports performing dynamic node classification using the dynamic node embeddings generated during link prediction.
For single-GPU models, directly run
>python train_node.py --data <NAME> --config <path to config file> --model <path to model file>

For multi-GPU models, you need to first generate the dynamic node embeddings:
>python -m torch.distributed.launch --nproc_per_node=<number of GPUs> extract_node_dist.py --data <NAME> --config <path to config file> --num_gpus <number of GPUs> --model <path to model file>

After generating the node embeddings for multi-GPU models, run
>python train_node.py --data <NAME> --model <path to model file>

## Security
See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.
## Cite
If you use TGL in a scientific publication, we would appreciate citations to the following paper:
```
@article{zhou2022tgl,
title={{TGL}: A General Framework for Temporal GNN Training on Billion-Scale Graphs},
author={Zhou, Hongkuan and Zheng, Da and Nisa, Israt and Ioannidis, Vasileios and Song, Xiang and Karypis, George},
year = {2022},
journal = {Proc. VLDB Endow.},
volume = {15},
number = {8},
}
```

## License
This project is licensed under the Apache-2.0 License.