{"id":22229794,"url":"https://github.com/graph-com/surel","last_synced_at":"2025-07-27T19:31:53.246Z","repository":{"id":40710911,"uuid":"454108165","full_name":"Graph-COM/SUREL","owner":"Graph-COM","description":"[VLDB'22] SUREL is a novel walk-based computation framework for efficient subgraph-based graph representation learning.","archived":false,"fork":false,"pushed_at":"2023-01-28T04:26:54.000Z","size":47,"stargazers_count":19,"open_issues_count":2,"forks_count":4,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-04T07:02:13.101Z","etag":null,"topics":["graph-representation-learning","open-graph-benchmark"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Graph-COM.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-01-31T17:44:03.000Z","updated_at":"2024-10-16T07:35:17.000Z","dependencies_parsed_at":"2023-02-15T14:31:47.985Z","dependency_job_id":null,"html_url":"https://github.com/Graph-COM/SUREL","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Graph-COM/SUREL","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Graph-COM%2FSUREL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Graph-COM%2FSUREL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Graph-COM%2FSUREL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Graph-COM%2FSUREL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Graph-COM","download_url":"https://codeload.githu
b.com/Graph-COM/SUREL/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Graph-COM%2FSUREL/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267413733,"owners_count":24083484,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-27T02:00:11.917Z","response_time":82,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["graph-representation-learning","open-graph-benchmark"],"created_at":"2024-12-03T01:12:27.396Z","updated_at":"2025-07-27T19:31:53.237Z","avatar_url":"https://github.com/Graph-COM.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003eSUREL: \u003cins\u003eSu\u003c/ins\u003ebgraph-based Graph \u003cins\u003eRe\u003c/ins\u003epresentation \u003cins\u003eL\u003c/ins\u003eearning Framework\u003c/h1\u003e\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://www.vldb.org/pvldb/vol15/p2788-yin.pdf\"\u003e\u003cimg src=\"https://img.shields.io/badge/arXiv-2202.13538-b31b1b.svg\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/Graph-COM/SUREL\"\u003e\u003cimg src=\"https://img.shields.io/badge/-Github-grey?logo=github\" alt=\"Github\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/Graph-COM/SUREL/blob/main/LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/badge/License-BSD%202--Clause-red.svg\"\u003e\u003c/a\u003e\n    
\u003ca href=\"https://ogb.stanford.edu/docs/leader_linkprop/\"\u003e\u003cimg src=\"https://img.shields.io/badge/OGB-LinkPred-blue\" alt=\"OGBL\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/Graph-COM/SUREL/tree/main/subg_acc\"\u003e\u003cimg src=\"https://img.shields.io/badge/SubGAcc-v1.1-orange\" alt=\"Version\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\nSUREL is a novel walk-based computation framework for efficient large-scale subgraph-based graph representation learning (SGRL). Details on how SUREL works can be found in our VLDB'22 paper [Algorithm and System Co-design for Efficient Subgraph-based Graph Representation Learning](https://arxiv.org/pdf/2202.13538.pdf).\n\nCurrently, we support:\n- Large-scale graph ML tasks: link prediction / relation-type prediction / higher-order pattern prediction\n- Preprocessing of and training on datasets in OGB format\n- Python API ([SubG Library](https://github.com/VeritasYin/subg_acc)) for subgraph sampling and joining procedures\n- Single GPU training and evaluation\n- Structural (Relative Position) Encoding + Node Features\n- [VesselGraph](https://paperswithcode.com/dataset/vesselgraph) Dataset\n\nWe are working on expanding the functionality of SUREL to include:\n- Multi-GPU training\n\n## Requirements ##\n(Other versions may work, but are untested)\n* Ubuntu 20.04\n* CUDA \u003e= 10.2\n* Python \u003e= 3.8\n* 1.8 \u003c= PyTorch \u003c= 1.12\n\n## Datasets\n\nThe SGRL datasets (`mag-write (P-A)`, `mag-cite (P-P)`, `tags-math`, `DBLP-coauthor`) for relation and higher-order prediction are available on [Zenodo](https://zenodo.org/records/15186012). 
\n\n## SGRL Environment Setup ##\n\nRequirements: Python \u003e= 3.8, [Anaconda3](https://www.anaconda.com/)\n\n- Update conda:\n```bash\nconda update -n base -c defaults conda\n```\n\n- Install the basic dependencies into a virtual environment and activate it:\n```bash\nconda env create -f environment.yml\nconda activate sgrl-env\n```\n- **SUREL** now supports PyTorch 1.12.1 and PyG 2.2.0. To install them, run:\n```bash\nconda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch\nconda install pyg -c pyg\n```\nFor more details, please refer to the [PyTorch](https://pytorch.org/) and [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html) installation guides.\nThe code in this repository was most recently tested with Python 3.10.9 + PyTorch 1.12.1 (CUDA 11.3) + torch-geometric 2.2.0.\n\n- Example installation commands for PyTorch 1.8.0 (CUDA 10.2) and torch-geometric 1.6.3:\n```bash\nconda install pytorch==1.8.0 torchvision torchaudio cudatoolkit=10.2\npip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.8.0+cu102.html\npip install torch-sparse -f https://pytorch-geometric.com/whl/torch-1.8.0+cu102.html\npip install torch-cluster -f https://pytorch-geometric.com/whl/torch-1.8.0+cu102.html\npip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.8.0+cu102.html\npip install torch-geometric==1.6.3\n```\n\n## Quick Start\n\n1. Install a version of PyTorch that is compatible with your CUDA driver.\n\n2. Clone the repository: `git clone https://github.com/Graph-COM/SUREL`\n\n3. 
Build and install the [SubG](https://github.com/Graph-COM/SUREL/tree/main/subg_acc) library (v1.1): `cd subg_acc;python3 setup.py install`\n\n- To train **SUREL** for link prediction on Collab:\n```bash\npython main.py --dataset ogbl-collab --metric hit --num_step 4 --num_walk 200 --use_val\n```\n\n- To train **SUREL** for link prediction on Citation2:\n```bash\npython main.py --dataset ogbl-citation2 --metric mrr --num_step 4 --num_walk 100\n```\n\n- To train **SUREL** for relation prediction on MAG(A-P):\n```bash\npython main_hetro.py --dataset mag --relation write --metric mrr --num_step 3 --num_walk 100 --k 10\n```\n\n- To train **SUREL** for higher-order prediction on DBLP:\n```bash\npython main_horder.py --dataset DBLP-coauthor --metric mrr --num_step 3 --num_walk 100\n```\n\n- All detailed training logs can be found at `\u003clog_dir\u003e/\u003cdataset\u003e/\u003ctraining-time\u003e.log`.\n\n## Result Reproduction\nThis section supplements our SUREL paper accepted at VLDB'22. To reproduce the results of SUREL reported in Tables 3 and 4, use the following commands:\n* OGBL - Link Prediction\n```bash\npython3 main.py --dataset \u003cdataset\u003e --metric \u003cmetric\u003e --num_step \u003cnum_step\u003e --num_walk \u003cnum_walk\u003e --k \u003ck\u003e\n```\nwhere `dataset` can be one of `ogbl-citation2`, `ogbl-collab`, or `ogbl-ppa`; `metric` can be either `mrr` or `hit`.\n* Relation Type Prediction\n```bash\npython main_hetro.py --dataset mag --relation \u003crelation\u003e --metric mrr --num_step \u003cnum_step\u003e --num_walk \u003cnum_walk\u003e --k \u003ck\u003e\n```\nwhere `relation` can be either `write` or `cite`. 
\n* Higher-order Pattern Prediction\n```bash\npython main_horder.py --dataset \u003cdataset\u003e --metric mrr --num_step \u003cnum_step\u003e --num_walk \u003cnum_walk\u003e --k \u003ck\u003e\n```\nwhere `dataset` can be either `DBLP-coauthor` or `tags-math`.\n\nThe detailed parameter configurations are provided in Table 8, Appendix D of the [arXiv version](https://arxiv.org/abs/2202.13538) of this work. To profile SUREL as in Table 4 and Fig. 4 (a-b), please use the parameter settings provided in Appendix D.3. \n\nTo test the scaling performance of the Walk Sampler and RPE Joining, the functions `run_walk` and `sjoin` can be imported from the module `surel_gacc` and called directly. Please adjust the values of `num_walk`, `num_step` and `nthread` as shown in Fig. 4 (c-d).\n\nTo perform hyper-parameter analysis of the number of walks 𝑀, the number of steps per walk 𝑚, and the hidden dimension 𝑑, please adjust the values of `num_walk`, `num_step` and `hidden_dim` as shown in Fig. 5. 
\n\n\u003cdetails\u003e\n  \u003csummary\u003eSample Output\u003c/summary\u003e\n  \n```text\n2022-03-25 15:57:16,677 - root - INFO - Create log file at ./log/ogbl-citation2/032522_155716.log\n2022-03-25 15:57:16,677 - root - INFO - Command line executed: python main.py --gpu 2 --patience 5 --hidden_dim 64 --seed 0\n2022-03-25 15:57:16,677 - root - INFO - Full args parsed:\n2022-03-25 15:57:16,677 - root - INFO - Namespace(B_size=1500, batch_num=2000, batch_size=32, data_usage=1.0, dataset='ogbl-citation2', debug=False, directed=False, dropout=0.1, eval_steps=100, gpu_id=2, hidden_dim=64, k=50, l2=0.0, layers=2, load_dict=False, load_model=False, log_dir='./log/', lr=0.001, memo=None, metric='mrr', model='RNN', norm='all', nthread=16, num_step=4, num_walk=100, optim='adam', patience=5, repeat=1, res_dir='./dataset/save', rtest=499, save=False, seed=0, stamp='032522_155716', summary_file='result_summary.log', test_ratio=1.0, train_ratio=0.05, use_degree=False, use_feature=False, use_htype=False, use_val=False, use_weight=False, valid_ratio=0.1, x_dim=0)\n2022-03-25 15:57:16,727 - root - INFO - torch num_threads 16\n2022-03-25 15:57:26,536 - root - INFO - eval metric                                                            mrr\ntask type                                                  link prediction\ndownload_name                                                  citation-v2\nversion                                                                  1\nurl                      http://snap.stanford.edu/ogb/data/linkproppred...\nadd_inverse_edge                                                     False\nhas_node_attr                                                         True\nhas_edge_attr                                                        False\nsplit                                                                 time\nadditional node files                                            node_year\nadditional edge files                                               
  None\nis hetero                                                            False\nbinary                                                               False\nName: ogbl-citation2, dtype: object\nKeys: ['x', 'edge_index', 'node_year']\n2022-03-25 15:57:26,536 - root - INFO - node size 2927963, feature dim 128, edge size 30387995 with mask ratio 0.05\n2022-03-25 15:57:26,536 - root - INFO - use_weight False, use_coalesce False, use_degree False, use_val False\n2022-03-25 15:57:45,775 - root - INFO - Sparsity of loaded graph 6.727197221716796e-06\n2022-03-25 15:57:45,782 - root - INFO - Observed subgraph with 2918932 nodes and 28836021 edges;\n2022-03-25 15:57:45,789 - root - INFO - Training subgraph with 1394162 nodes and 1519315 edges.\n2022-03-25 15:57:50,400 - root - INFO - #Model Params 79617\n2022-03-25 15:59:14,643 - root - INFO - Samples: valid 8659 by 1000 test 86596 by 1000 metric: mrr\n2022-03-25 15:59:15,405 - root - INFO - Running Round 1\n2022-03-25 15:59:29,229 - root - INFO - Batch 1\tW1502/D1394162\tLoss: 0.1971, AUC: 0.5049\n2022-03-25 15:59:42,266 - root - INFO - Batch 2\tW2991/D1394162\tLoss: 0.1097, AUC: 0.4975\n2022-03-25 15:59:56,187 - root - INFO - Batch 3\tW4431/D1394162\tLoss: 0.1024, AUC: 0.4976\n2022-03-25 16:00:09,070 - root - INFO - Batch 4\tW5761/D1394162\tLoss: 0.1030, AUC: 0.4980\n2022-03-25 16:00:23,285 - root - INFO - Batch 5\tW7215/D1394162\tLoss: 0.1013, AUC: 0.5053\n...\n```\n\u003c/details\u003e\n\n## Usage\n```\nusage: Interface for SUREL framework [-h]\n                                     [--dataset {ogbl-ppa,ogbl-citation2,ogbl-collab,mag,DBLP-coauthor,tags-math}]\n                                     [--model {RNN,MLP,Transformer,GNN}]\n                                     [--layers LAYERS]\n                                     [--hidden_dim HIDDEN_DIM] [--x_dim X_DIM]\n                                     [--data_usage DATA_USAGE]\n                                     [--train_ratio TRAIN_RATIO]\n                          
           [--valid_ratio VALID_RATIO]\n                                     [--test_ratio TEST_RATIO]\n                                     [--metric {auc,mrr,hit}] [--seed SEED]\n                                     [--gpu_id GPU_ID] [--nthread NTHREAD]\n                                     [--B_size B_SIZE] [--num_walk NUM_WALK]\n                                     [--num_step NUM_STEP] [--k K]\n                                     [--directed DIRECTED] [--use_feature]\n                                     [--use_weight] [--use_degree]\n                                     [--use_htype] [--use_val] [--norm NORM]\n                                     [--optim OPTIM] [--rtest RTEST]\n                                     [--eval_steps EVAL_STEPS]\n                                     [--batch_size BATCH_SIZE]\n                                     [--batch_num BATCH_NUM] [--lr LR]\n                                     [--dropout DROPOUT] [--l2 L2]\n                                     [--patience PATIENCE] [--repeat REPEAT]\n                                     [--log_dir LOG_DIR] [--res_dir RES_DIR]\n                                     [--stamp STAMP]\n                                     [--summary_file SUMMARY_FILE] [--debug]\n                                     [--abs] [--save] [--load_dict]\n                                     [--load_model] [--memo MEMO]\n```\n\n\u003cdetails\u003e\n  \u003csummary\u003eOptional Arguments\u003c/summary\u003e\n\n```\noptional arguments:\n  -h, --help            show this help message and exit\n  --dataset {mag}       dataset name\n  --relation {write,cite}\n                        relation type\n  --model {RNN,MLP,Transformer,GNN}\n                        base model to use\n  --layers LAYERS       number of layers\n  --hidden_dim HIDDEN_DIM\n                        hidden dimension\n  --x_dim X_DIM         dim of raw node features\n  --data_usage DATA_USAGE\n                        use partial dataset\n  --train_ratio 
TRAIN_RATIO\n                        mask partial edges for training\n  --valid_ratio VALID_RATIO\n                        use partial valid set\n  --test_ratio TEST_RATIO\n                        use partial test set\n  --metric {auc,mrr,hit}\n                        metric for evaluating performance\n  --seed SEED           seed to initialize all the random modules\n  --gpu_id GPU_ID       gpu id\n  --nthread NTHREAD     number of thread\n  --B_size B_SIZE       set size of train sampling\n  --num_walk NUM_WALK   total number of random walks\n  --num_step NUM_STEP   total steps of random walk\n  --k K                 number of paired negative queries\n  --directed DIRECTED   whether to treat the graph as directed\n  --use_feature         whether to use raw features as input\n  --use_weight          whether to use edge weight as input\n  --use_degree          whether to use node degree as input\n  --use_htype           whether to use node type as input\n  --use_val             whether to use val as input\n  --norm NORM           method of normalization\n  --optim OPTIM         optimizer to use\n  --rtest RTEST         step start to test\n  --eval_steps EVAL_STEPS\n                        number of steps to test\n  --batch_size BATCH_SIZE\n                        mini-batch size (train)\n  --batch_num BATCH_NUM\n                        mini-batch size (test)\n  --lr LR               learning rate\n  --dropout DROPOUT     dropout rate\n  --l2 L2               l2 regularization (weight decay)\n  --patience PATIENCE   early stopping steps\n  --repeat REPEAT       number of training instances to repeat\n  --log_dir LOG_DIR     log directory\n  --res_dir RES_DIR     resource directory\n  --stamp STAMP         time stamp\n  --summary_file SUMMARY_FILE\n                        brief summary of training results\n  --debug               whether to use debug mode\n  --save                whether to save RPE to files\n  --load_dict           whether to load RPE from files\n  
--load_model          whether to load saved model from files\n  --memo MEMO           notes\n```\n\u003c/details\u003e\n\n## Citation\nPlease cite our paper if you are interested in our work.\n```\n@article{yin2022algorithm,\n  title={Algorithm and System Co-design for Efficient Subgraph-based Graph Representation Learning},\n  author={Yin, Haoteng and Zhang, Muhan and Wang, Yanbang and Wang, Jianguo and Li, Pan},\n  journal={Proceedings of the VLDB Endowment},\n  volume={15},\n  number={11},\n  pages={2788-2796},\n  year={2022}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgraph-com%2Fsurel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgraph-com%2Fsurel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgraph-com%2Fsurel/lists"}