{"id":29248858,"url":"https://github.com/hkuds/opengraph","last_synced_at":"2025-10-07T17:08:24.018Z","repository":{"id":224914892,"uuid":"761026112","full_name":"HKUDS/OpenGraph","owner":"HKUDS","description":"[EMNLP'2024] \"OpenGraph: Towards Open Graph Foundation Models\"","archived":false,"fork":false,"pushed_at":"2024-10-11T04:10:40.000Z","size":63293,"stargazers_count":314,"open_issues_count":0,"forks_count":34,"subscribers_count":15,"default_branch":"main","last_synced_at":"2025-07-04T00:09:15.344Z","etag":null,"topics":["data-augmentation","fundation-models","graph-algorithms","graph-fundation-models","graph-generation","graph-learning","graph-neural-networks","large-langage-models","large-language-models","transformer"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2403.01121","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HKUDS.png","metadata":{"files":{"readme":"README.md","changelog":"History/pretrn_gen0.his","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-21T05:11:26.000Z","updated_at":"2025-07-01T01:06:37.000Z","dependencies_parsed_at":"2024-06-05T17:27:13.127Z","dependency_job_id":null,"html_url":"https://github.com/HKUDS/OpenGraph","commit_stats":null,"previous_names":["hkuds/opengraph"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/HKUDS/OpenGraph","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKUDS%2FOpenGraph","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKUDS%2FOpenGraph/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/HKUDS%2FOpenGraph/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKUDS%2FOpenGraph/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HKUDS","download_url":"https://codeload.github.com/HKUDS/OpenGraph/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKUDS%2FOpenGraph/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278811851,"owners_count":26050183,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-07T02:00:06.786Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-augmentation","fundation-models","graph-algorithms","graph-fundation-models","graph-generation","graph-learning","graph-neural-networks","large-langage-models","large-language-models","transformer"],"created_at":"2025-07-04T00:09:15.414Z","updated_at":"2025-10-07T17:08:24.013Z","avatar_url":"https://github.com/HKUDS.png","language":"Python","readme":"# OpenGraph: Towards Open Graph Foundation Models\n\n\u003cdiv align='center'\u003e\n\u003ca href='https://arxiv.org/abs/2403.01121'\u003e\u003cimg src='https://img.shields.io/badge/Paper-green' /\u003e\u003c/a\u003e\n\u003ca href='https://mp.weixin.qq.com/s/nughdr2OQUGevdDzAQphjw'\u003e\u003cimg src='https://img.shields.io/badge/公众号-blue' /\u003e\u003c/a\u003e\n\u003ca 
href='https://blog.csdn.net/weixin_43902773/article/details/136680880'\u003e\u003cimg src='https://img.shields.io/badge/CSDN-orange' /\u003e\u003c/a\u003e\n\u003cimg src=\"https://badges.pufler.dev/visits/hkuds/opengraph?style=flat-square\u0026logo=github\"\u003e\n\u003cimg src='https://img.shields.io/github/stars/hkuds/opengraph?color=green\u0026style=social' /\u003e\n\n\u003ca href='https://akaxlh.github.io/'\u003eLianghao Xia\u003c/a\u003e, \u003ca href='https://scholar.google.com/citations?user=TwSParMAAAAJ'\u003eBen Kao\u003c/a\u003e, and \u003ca href='https://sites.google.com/view/chaoh/group-join-us'\u003eChao Huang*\u003c/a\u003e (*Correspondence)\n\n\u003cimg src='imgs/opengraph_article_cover_full.png' /\u003e\n\n\n\nPresenting OpenGraph, a foundation graph model \u003cb\u003e\u003ci\u003edistilling zero-shot graph generalizability from LLMs\u003c/i\u003e\u003c/b\u003e.\n\n\u003cimg src='imgs/intro.png' width=60% /\u003e\n\n\u003c/div\u003e\n\nTo achieve this goal, OpenGraph addresses several key technical challenges:\n- We propose a unified graph tokenizer to adapt our graph model to generalize well on unseen graph data, even when the underlying graph properties differ significantly from those encountered during training. \n- We develop a scalable graph transformer as the foundational encoder, which effectively and efficiently captures node-wise dependencies within the global topological context. \n- We introduce a data augmentation mechanism enhanced by a large language model (LLM) to alleviate the limitations of data scarcity in real-world scenarios.\n\n\u003cimg src='imgs/framework.png' /\u003e\n\nExtensive experiments validate the effectiveness of our framework. By adapting OpenGraph to new graph characteristics and comprehending the nuances of diverse graphs, our approach achieves remarkable zero-shot graph learning performance across various settings and domains.\n\n## Environment Setup\nYou need to unzip some of the data files in `datasets/`. 
Download the pre-trained models using the link in `Models/readme`. Our experiments were conducted with the following package versions:\n* python==3.10.13\n* torch==1.13.0\n* numpy==1.23.4\n* scipy==1.9.3\n\n## Brief Code Structure\nHere is a brief overview of the code structure. The explanation for each directory is enclosed between ## markers (##...##). For a more detailed version, please refer to the full listing at the end of this readme.\n```\n./\n│   └── README.md\n│   ├── History/ ## Training history of pre-trained models ##\n│   ├── Models/ ## Pre-trained models ##\n│   ├── datasets/\n│   ├── graph_generation/ ## Code and examples for graph generation ##\n│   ├── imgs/ ## Images used in readme ##\n│   ├── link_prediction/ ## code for link prediction and pre-training ##\n│   │   ├── data_handler.py\n│   │   ├── main.py\n│   │   ├── model.py\n│   │   └── params.py\n│   │   ├── Utils/\n│   │   │   └── TimeLogger.py\n│   ├── node_classification/ ## code for testing on node classification ##\n│   │   ├── data_handler.py\n│   │   ├── main.py\n│   │   ├── model.py\n│   │   └── params.py\n│   │   ├── Utils/\n│   │   │   └── TimeLogger.py\n```\n\n## Usage\n#### To reproduce the test performance reported in the paper, run the following commands:\n```\ncd link_prediction/\npython main.py --load pretrn_gen1 --epoch 0 # test on OGBL-Collab, ML-1M, ML-10M\npython main.py --load pretrn_gen0 --tstdata amazon-book --epoch 0 # test on Amazon-Book\npython main.py --load pretrn_gen2 --tstdata ddi --epoch 0 # test on OGBL-ddi\ncd ../node_classification/\npython main.py --load pretrn_gen1 --tstdata cora # test on Cora\npython main.py --load pretrn_gen1 --tstdata citeseer # test on Citeseer\npython main.py --load pretrn_gen1 --tstdata pubmed # test on Pubmed\n```\n\n#### To re-pretrain OpenGraph by yourself, run the following commands:\n```\ncd ../link_prediction/\npython main.py --save pretrn_gen1\npython main.py --trndata gen0 --tstdata amazon-book --save 
pretrn_gen0\npython main.py --trndata gen2 --tstdata ddi --save pretrn_gen2\n```\n\n#### To explore pretraining with multiple different pre-training and testing datasets, modify `trn_datasets` and `tst_datasets` in line 241 of `link_prediction/main.py`.\n\n## Graph Data Generation\nThe graph generation code is in `graph_generation/`. A small toy dataset is provided. You need to fill in your OpenAI key in `Utils.py` and `itemCollecting_dfsIterator.py` first. To generate your own dataset, modify the `descs` and `hyperparams` dicts, and follow this procedure:\n```\ncd graph_generation/\npython itemCollecting_dfsIterator.py\npython instance_number_estimation_hierarchical.py\npython embedding_generation.py\npython human_item_generation_gibbsSampling_embedEstimation.py\npython make_adjs.py\n```\n\nBelow we show our prompt template, along with examples of prompt configurations and generated nodes.\n\n\u003cimg src='imgs/prompt.png' width=60% /\u003e\n\n## Evaluation Results\n\n### Overall Generalization Performance\nOpenGraph achieves the best performance under the 0-shot setting, compared to baselines trained/tuned with 1-shot and 5-shot data.\n\u003cimg src='imgs/performance.png' /\u003e\n\n### Pre-training Dataset Study\nWe studied the influence of using different pre-training datasets. The results below indicate that:\n- The generation techniques (Norm, Loc, Topo) have positive effects on performance.\n- Real-world datasets (Yelp2018, Gowalla) may yield worse results compared to our generated ones.\n- A relevant pre-training dataset (ML-10M for test data ML-1M and ML-10M) results in superior performance.\n\n\u003cimg src='imgs/impact of datasets.png' width=60% /\u003e\n\n### Graph Tokenizer Study\nWe tuned the configuration of our unified graph tokenizer by adjusting the order of graph smoothing and replacing our topology-aware projection with alternatives. 
Our findings include:\n- **Adjacency smoothing is important**, as OpenGraph with 0-order smoothing yields inferior performance.\n- **Topology-aware projection is superior in performance**. Alternatives include *One-hot*, which learns a large unified representation table for all datasets; *Random*, which makes no assumption about node-wise relations and distributes them uniformly; and *Degree*, a widely-used method for non-attributed graphs that seems applicable to cross-graph scenarios.\n\n\u003cimg src='imgs/graph tokenizer.png' width=60% /\u003e\n\n### Sampling Techniques Study\nWe ablated the two sampling techniques in the graph transformer and show their positive effects on both memory and time costs below. Surprisingly, token sequence sampling also shows a positive effect on model performance.\n\n\u003cimg src='imgs/sampling.png' width=60% /\u003e\n\n## Citation\nIf you find this work useful for your research, please consider citing our paper:\n```\n@inproceedings{xia2024opengraph,\n  title={OpenGraph: Towards Open Graph Foundation Models},\n  author={Xia, Lianghao and Kao, Ben and Huang, Chao},\n  booktitle={EMNLP},\n  year={2024}\n}\n```\n\n## Detailed Code Structures\n```\n./\n│   └── README.md\n│   ├── History/ ## Training history of pre-trained models ##\n│   │   ├── pretrn_gen0.his\n│   │   ├── pretrn_gen2.his\n│   │   └── pretrn_gen1.his\n│   ├── Models/ ## Pre-trained models ##\n│   │   └── readme ## Download pre-trained models using the link inside ##\n│   ├── datasets/\n│   │   ├── amazon-book/\n│   │   │   ├── fewshot_mat_1.pkl\n│   │   │   ├── trn_mat.pkl.zip ## Unzip it manually ##\n│   │   │   ├── tst_mat.pkl\n│   │   │   └── fewshot_mat_5.pkl\n│   │   ├── citeseer/\n│   │   │   ├── adj_-1.pkl\n│   │   │   ├── adj_1.pkl\n│   │   │   ├── adj_5.pkl\n│   │   │   ├── feats.pkl.zip ## Unzip it manually ##\n│   │   │   ├── label.pkl\n│   │   │   ├── mask_-1.pkl\n│   │   │   ├── mask_1.pkl\n│   │   │   └── mask_5.pkl\n│   │   ├── collab/\n│   │   
│   ├── fewshot_mat_5.pkl\n│   │   │   ├── trn_mat.pkl.zip ## Unzip it manually ##\n│   │   │   ├── tst_mat.pkl\n│   │   │   ├── val_mat.pkl\n│   │   │   └── fewshot_mat_1.pkl\n│   │   ├── cora/\n│   │   │   ├── adj_-1.pkl\n│   │   │   ├── adj_1.pkl\n│   │   │   ├── adj_5.pkl\n│   │   │   ├── feats.pkl\n│   │   │   ├── label.pkl\n│   │   │   ├── mask_-1.pkl\n│   │   │   ├── mask_1.pkl\n│   │   │   └── mask_5.pkl\n│   │   ├── ddi/\n│   │   │   ├── fewshot_mat_1.pkl\n│   │   │   ├── trn_mat.pkl.zip ## Unzip it manually ##\n│   │   │   ├── tst_mat.pkl\n│   │   │   ├── val_mat.pkl\n│   │   │   └── fewshot_mat_5.pkl\n│   │   ├── gen0/\n│   │   │   ├── trn_mat.pkl\n│   │   │   ├── val_mat.pkl\n│   │   │   └── tst_mat.pkl\n│   │   ├── gen1/\n│   │   │   ├── trn_mat.pkl\n│   │   │   ├── tst_mat.pkl\n│   │   │   └── val_mat.pkl\n│   │   ├── gen2/\n│   │   │   ├── trn_mat.pkl\n│   │   │   ├── val_mat.pkl\n│   │   │   └── tst_mat.pkl\n│   │   ├── ml10m/\n│   │   │   ├── fewshot_mat_1.pkl\n│   │   │   ├── trn_mat.pkl.zip ## Unzip it manually ##\n│   │   │   ├── tst_mat.pkl.zip ## Unzip it manually ##\n│   │   │   └── fewshot_mat_5.pkl\n│   │   ├── ml1m/\n│   │   │   ├── fewshot_mat_5.pkl\n│   │   │   ├── trn_mat.pkl\n│   │   │   ├── tst_mat.pkl\n│   │   │   └── fewshot_mat_1.pkl\n│   │   ├── pubmed/\n│   │   │   ├── adj_-1.pkl\n│   │   │   ├── adj_1.pkl\n│   │   │   ├── feats.pkl.zip ## Unzip it manually ##\n│   │   │   ├── label.pkl\n│   │   │   ├── mask_-1.pkl\n│   │   │   ├── mask_1.pkl\n│   │   │   ├── mask_5.pkl\n│   │   │   └── adj_5.pkl\n│   ├── graph_generation/ ## Code and examples for graph generation ##\n│   │   ├── embedding_generation.py ## Node embedding generation ##\n│   │   ├── human_item_generation_gibbsSampling_embedEstimation.py ## Edge generation ##\n│   │   ├── instance_number_estimation_hierarchical.py ## Estimate amount for each node. Not mentioned in the paper. 
##\n│   │   ├── itemCollecting_dfsIterator.py ## Node generation ##\n│   │   ├── make_adjs.py ## Making datasets for generated graphs ##\n│   │   └── Utils.py\n│   │   ├── Exp_Utils/\n│   │   │   ├── Emailer.py ## A tool to send warning emails for experiments ##\n│   │   │   └── TimeLogger.py\n│   │   ├── gen_results/\n│   │   │   ├── tree_wInstanceNum_products_e-commerce platform like Amazon.pkl ## Tree data structure ##\n│   │   │   └── products_e-commerce platform like Amazon.txt ## Node list ##\n│   │   │   ├── datasets/\n│   │   │   │   ├── gen_data_ecommerce/\n│   │   │   │   │   ├── embedding_dict.pkl\n│   │   │   │   │   ├── item_list.pkl\n│   │   │   │   │   └── interaction_base-0_iter-0.pkl ## generated edges ##\n│   │   │   │   │   ├── res/\n│   │   │   │   │   │   ├── iter-0_imap.pkl ## Id map for nodes ##\n│   │   │   │   │   │   ├── iter-0_test.pkl\n│   │   │   │   │   │   ├── iter-0_train.pkl\n│   │   │   │   │   │   ├── iter-0_valid.pkl\n│   │   │   │   │   │   └── interaction_fuse_iter-0.pkl\n│   │   │   ├── tem/ ## Temporary files for node generation ##\n│   │   │   │   ├── e-commerce platform like Amazon_depth1_products\n│   │   │   │   ├── e-commerce platform like Amazon_depth2_products, Automotive\n│   │   │   │   ├── e-commerce platform like Amazon_depth2_products, Baby\n│   │   │   │   ├── e-commerce platform like Amazon_depth2_products, Beauty\n│   │   │   │   ├── e-commerce platform like Amazon_depth2_products, Books\n│   │   │   │   ├── e-commerce platform like Amazon_depth2_products, Clothing\n│   │   │   │   ├── e-commerce platform like Amazon_depth2_products, Electronics\n│   │   │   │   ├── e-commerce platform like Amazon_depth2_products, Handmade\n│   │   │   │   ├── e-commerce platform like Amazon_depth2_products, Health and Personal Care\n│   │   │   │   ├── e-commerce platform like Amazon_depth2_products, Home Improvement\n│   │   │   │   ├── e-commerce platform like Amazon_depth2_products, Industrial and Scientific\n│   │   │   │   
├── e-commerce platform like Amazon_depth2_products, Jewelry\n│   │   │   │   ├── e-commerce platform like Amazon_depth2_products, Musical Instruments\n│   │   │   │   ├── e-commerce platform like Amazon_depth2_products, Office Products\n│   │   │   │   ├── e-commerce platform like Amazon_depth2_products, Pet Supplies\n│   │   │   │   ├── e-commerce platform like Amazon_depth2_products, Tools and Home Improvement\n│   │   │   │   ├── e-commerce platform like Amazon_depth2_products, Toys\n│   │   │   │   └── e-commerce platform like Amazon_depth2_products, Sports and Outdoors\n│   ├── imgs/ ## Images used in readme ##\n│   │   ├── framework.png\n│   │   ├── intro.png\n│   │   ├── performance.png\n│   │   └── article cover.jpg\n│   ├── link_prediction/ ## code for link prediction and pre-training ##\n│   │   ├── data_handler.py\n│   │   ├── main.py\n│   │   ├── model.py\n│   │   └── params.py\n│   │   ├── Utils/\n│   │   │   └── TimeLogger.py\n│   ├── node_classification/ ## code for testing on node classification ##\n│   │   ├── data_handler.py\n│   │   ├── main.py\n│   │   ├── model.py\n│   │   └── params.py\n│   │   ├── Utils/\n│   │   │   └── TimeLogger.py\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhkuds%2Fopengraph","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhkuds%2Fopengraph","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhkuds%2Fopengraph/lists"}