{"id":29248867,"url":"https://github.com/hkuds/graphedit","last_synced_at":"2026-02-12T06:32:57.272Z","repository":{"id":223614320,"uuid":"761024374","full_name":"HKUDS/GraphEdit","owner":"HKUDS","description":"\"GraphEdit: Large Language Models for Graph Structure Learning\"","archived":false,"fork":false,"pushed_at":"2024-06-24T12:47:40.000Z","size":2426,"stargazers_count":134,"open_issues_count":3,"forks_count":15,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-04T00:09:16.651Z","etag":null,"topics":["graph-learning","graph-neural-networks","graph-structure-learning","instruction-tuning","large-language-models"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2402.15183","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HKUDS.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-21T05:05:09.000Z","updated_at":"2025-05-20T06:01:14.000Z","dependencies_parsed_at":"2024-04-15T11:53:54.820Z","dependency_job_id":null,"html_url":"https://github.com/HKUDS/GraphEdit","commit_stats":null,"previous_names":["hkuds/graphedit"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/HKUDS/GraphEdit","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKUDS%2FGraphEdit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKUDS%2FGraphEdit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKUDS%2FGraphEdit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKU
DS%2FGraphEdit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HKUDS","download_url":"https://codeload.github.com/HKUDS/GraphEdit/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKUDS%2FGraphEdit/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29360644,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-12T01:03:07.613Z","status":"online","status_checked_at":"2026-02-12T02:00:06.911Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["graph-learning","graph-neural-networks","graph-structure-learning","instruction-tuning","large-language-models"],"created_at":"2025-07-04T00:09:16.566Z","updated_at":"2026-02-12T06:32:57.267Z","avatar_url":"https://github.com/HKUDS.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# **GraphEdit: Large Language Models for Graph Structure Learning**\n\n\n\u003cimg src='GraphEdit_article_cover.png' /\u003e\n\n\u003ca href='https://github.com/HKUDS/GraphEdit'\u003e\u003cimg src='https://img.shields.io/badge/Project-Page-Green'\u003e\u003c/a\u003e\n\u003ca href='https://arxiv.org/abs/2402.15183'\u003e\u003cimg src='https://img.shields.io/badge/arXiv-2402.15183-b31b1b'\u003e\u003c/a\u003e\n![在这里插入图片描述](https://img-blog.csdnimg.cn/direct/b9f51e64959e4999ad469c2ca437373a.png#pic_center)\n## Code Structure\n```\n.\n├── 
README.md\n├── GNN\n│   ├── GNNs\n│   │   ├── GCN\n│   │   │   └── model.py\n│   │   ├── MLP\n│   │   │   └── model.py\n│   │   ├── RevGAT\n│   │   │   ├── eff_gcn_modules/rev\n│   │   │   │   ├── __init__.py\n│   │   │   │   ├── gcn_revop.py\n│   │   │   │   ├── memgcn.py\n│   │   │   │   └── rev_layer.py\n│   │   │   ├── __init__.py\n│   │   │   └── model.py\n│   │   ├── SAGE\n│   │   │   └── model.py\n│   │   ├── gnn_trainer.py\n│   │   └── gnn_utils.py\n│   ├── datasets\n│   │   ├── dataset.py\n│   │   ├── load.py\n│   │   ├── load_citeseer.py\n│   │   ├── load_cora.py\n│   │   ├── load_pubmed.py\n│   │   └── utils.py\n│   ├── main.py\n│   ├── predict_edge.py\n│   ├── train_edge_predictor.py\n│   └── utils.py\n└── LLM\n    ├── graphedit\n    │   ├── data\n    │   │   ├──__init__.py\n    │   │   ├──clean_sharegpt.py\n    │   │   ├──convert_alpaca.py\n    │   │   ├──extract_gpt4_only.py\n    │   │   ├──extract_single_round.py\n    │   │   ├──filter_wrong_format.py\n    │   │   ├──get_stats.py\n    │   │   ├──hardcoded_questions.py\n    │   │   ├──inspect_data.py\n    │   │   ├──merge.py\n    │   │   ├──optional_clean.py\n    │   │   ├──optional_replace.py\n    │   │   ├──prepare_all.py\n    │   │   ├──pretty_json.py\n    │   │   ├──sample.py\n    │   │   ├──split_long_conversation.py\n    │   │   └── split_train_test.py\n    │   ├── eval   \n    │   │   └── eval_model.py\n    │   ├── model\n    │   │   ├── GraphEdit.py\n    │   │   ├── __init__.py\n    │   │   ├── apply_delta.py\n    │   │   ├── apply_lora.py\n    │   │   ├── compression.py\n    │   │   ├── convert_fp16.py\n    │   │   ├── llama_condense_monkey_patch.py\n    │   │   ├── make_delta.py\n    │   │   ├── model_adapter.py\n    │   │   ├── model_chatglm.py\n    │   │   ├── model_codet5p.py\n    │   │   ├── model_exllama.py\n    │   │   ├── model_falcon.py\n    │   │   ├── model_registry.py\n    │   │   ├── monkey_patch_non_inplace.py\n    │   │   ├── rwkv_model.py\n    │   │   └── upload_hub.py\n    │  
 ├── modules\n    │   │   ├── __init__.py\n    │   │   ├── awq.py\n    │   │   ├── exllama.py\n    │   │   └── gptq.py\n    │   ├── protocol\n    │   │   ├── api_protocol.py\n    │   │   └── openai_api_protocol.py\n    │   ├── serve\n    │   │   ├── gateway\n    │   │   │   ├── README.md\n    │   │   │   └── nginx.conf\n    │   │   ├── monitor\n    │   │   │   ├── dataset_release_scripts\n    │   │   │   │   ├── arena_33k\n    │   │   │   │   │   ├── count_unique_users.py\n    │   │   │   │   │   ├── filter_bad_conv.py\n    │   │   │   │   │   ├── merge_field.py\n    │   │   │   │   │   ├── sample.py\n    │   │   │   │   │   └── upload_hf_dataset.py\n    │   │   │   │   └── lmsys_chat_1m\n    │   │   │   │       ├── approve_all.py\n    │   │   │   │       ├── compute_stats.py\n    │   │   │   │       ├── filter_bad_conv.py\n    │   │   │   │       ├── final_post_processing.py\n    │   │   │   │       ├── instructions.md\n    │   │   │   │       ├── merge_oai_tag.py\n    │   │   │   │       ├── process_all.sh\n    │   │   │   │       ├── sample.py\n    │   │   │   │       └── upload_hf_dataset.py\n    │   │   │   ├── basic_stats.py\n    │   │   │   ├── clean_battle_data.py\n    │   │   │   ├── clean_chat_data.py\n    │   │   │   ├── elo_analysis.py\n    │   │   │   ├── inspect_conv.py\n    │   │   │   ├── intersect_conv_file.py\n    │   │   │   ├── leaderboard_csv_to_html.py\n    │   │   │   ├── monitor.py\n    │   │   │   ├── summarize_cluster.py\n    │   │   │   ├── tag_openai_moderation.py\n    │   │   │   └── topic_clustering.py\n    │   │   ├── __init__.py\n    │   │   ├── api_provider.py\n    │   │   ├── base_model_worker.py\n    │   │   ├── cli.py\n    │   │   ├── controller.py\n    │   │   ├── gradio_block_arena_anony.py\n    │   │   ├── gradio_block_arena_named.py\n    │   │   ├── gradio_web_server.py\n    │   │   ├── gradio_web_server_multi.py\n    │   │   ├── huggingface_api.py\n    │   │   ├── huggingface_api_worker.py\n    │   │   ├── inference.py\n    
│   │   ├── launch_all_serve.py\n    │   │   ├── model_worker.py\n    │   │   ├── multi_model_worker.py\n    │   │   ├── openai_api_server.py\n    │   │   ├── register_worker.py\n    │   │   ├── shutdown_serve.py\n    │   │   ├── test_message.py\n    │   │   ├── test_throughput.py\n    │   │   └── vllm_worker.py\n    │   ├── train\n    │   │   ├── GraphEdit_trainer.py\n    │   │   ├── llama2_flash_attn_monkey_patch.py\n    │   │   ├── llama_flash_attn_monkey_patch.py\n    │   │   ├── llama_xformers_attn_monkey_patch.py\n    │   │   ├── train.py\n    │   │   ├── train_baichuan.py\n    │   │   ├── train_flant5.py\n    │   │   ├── train_lora.py\n    │   │   ├── train_lora_t5.py\n    │   │   ├── train_mem.py\n    │   │   └── train_xformers.py\n    │   ├── __init__.py\n    │   ├── constants.py\n    │   ├── conversation.py\n    │   └── utils.py\n    ├── playground\n    │   ├── test_embedding\n    │   │   ├── README.md\n    │   │   ├── test_classification.py\n    │   │   ├── test_semantic_search.py\n    │   │   └── test_sentence_similarity.py\n    │   ├── deepspeed_config_s2.json\n    │   └── deepspeed_config_s3.json\n    ├── scripts\n    │   ├── apply_lora.py\n    │   ├── create_ins.py\n    │   ├── eval.sh\n    │   ├── get_embs.py\n    │   ├── result2np.py\n    │   └── train_lora.sh\n    ├── tests\n    │   ├── killall_python.sh    \n    │   ├── launch_openai_api_test_server.py\n    │   ├── test_cli.py\n    │   ├── test_cli_inputs.txt\n    │   ├── test_openai_api.py\n    │   └── test_openai_langchain.py\n    ├── .pylintrc\n    ├── LICENSE\n    ├── format.sh\n    └── pyproject.toml\n```\n## 0. 
Python Environment Setup\n* Packed conda environment is provided [here](https://drive.google.com/file/d/1eeLKFiDU4CbOjb3uzl1Ur0jHXAEUyh5j/view?usp=drive_link) (NVIDIA GeForce RTX 3090)\n```bash\nconda create --name GraphEdit python=3.8\nconda activate GraphEdit\n\npip install torch==1.13.0 torchvision==0.14.0 torchaudio==0.13.0\npip install torch_geometric\npip install dgl\npip install transformers==4.31.0\npip install flash_attn==1.0.4\n```\n\n## 1. Download TAG datasets\n| Dataset | Description |\n|--|--|\n| Pubmed | Download the dataset [here](https://drive.google.com/file/d/11OVDmP_DaM3urAswIlMLjiby28X8-8_Z/view?usp=drive_link), unzip and move it to `GNN/datasets/pubmed` |\n| Citeseer | Download the dataset [here](https://drive.google.com/file/d/1KtFjg95p3tPRWQ5XCqTtjJh9nXLylcbQ/view?usp=drive_link), unzip and move it to `GNN/datasets/citeseer` |\n| Cora | Download the dataset [here](https://drive.google.com/file/d/1fO9tAX2yUoQ74WBE25bAw943nRCKaBqj/view?usp=drive_link), unzip and move it to `GNN/datasets/cora` |\n\n## 2. 
Getting Started\n\n* Replace the system path in `eval_model.py`, `train_lora.py` and `get_embs.py` with your own path.\n### Stage-1: Instruction tuning the LLM\n* Vicuna-7b can be downloaded from [Hugging Face](https://huggingface.co/lmsys/vicuna-7b-v1.5-16k).\n* Trained LoRA models are provided [here](https://drive.google.com/drive/folders/15MO09sVetHaEPBAYM2M2kZ4eyuPdL-Ng?usp=drive_link).\n```bash\ncd GraphEdit/LLM/\nsh scripts/train_lora.sh\n\npython scripts/apply_lora.py\n```\n### Stage-2: Get the candidate structure\n* Trained edge predictors are provided [here](https://drive.google.com/drive/folders/1bJ5rArLRa-MMbqytioZFQt2HRWMquIHl?usp=drive_link).\n```bash\npython scripts/get_embs.py\n\ncd ../GNN/\npython train_edge_predictor.py\npython predict_edge.py --combine True\n```\n### Stage-3: Refine the candidate structure\n```bash\ncd ../LLM/\npython scripts/create_ins.py\nsh scripts/eval.sh\n\npython scripts/result2np.py\n```\n\n### Stage-4: Evaluate the refined structure\n* Refined structures are provided [here](https://drive.google.com/drive/folders/1EeggwedsQraVVIqxkqQDOGBOH4qwVvLU?usp=drive_link).\n```bash\ncd ../GNN/\npython main.py\n```\n\n## 3. Instruction Template\n\u003e Pubmed\n\n```\nBased on the title and abstract of the two papers. Do they belong to the same category among Diabetes Mellitus Type 1, Diabetes Mellitus Type 2, or Diabetes Mellitus, Experimental? If the answer is \\\"True\\\", answer \\\"True\\\" and the category, otherwise answer \\\"False\\\". The first paper: {pubmed.raw_texts[paperID_0]} The second paper: {pubmed.raw_texts[paperID_1]}.\n```\n\n\u003e Citeseer\n\n```\nBased on the title and abstract of the two papers. Do they belong to the same category among Agent, ML, IR, DB, HCI and AI? If the answer is \\\"True\\\", answer \\\"True\\\" and the category, otherwise answer \\\"False\\\". 
The first paper: {citeseer.raw_texts[paperID_0]} The second paper: {citeseer.raw_texts[paperID_1]}.\n```\n\u003e Cora\n\n```\nBased on the title and abstract of the two papers. Do they belong to the same category among Rule_Learning, Neural_Networks, Case_Based, Genetic_Algorithms, Theory, Reinforcement_Learning or Probabilistic_Methods? If the answer is \\\"True\\\", answer \\\"True\\\" and the category, otherwise answer \\\"False\\\". If there is insufficient text information, answer \\\"True\\\". The first paper: Title: {cora.raw_text[paperID_0].split(':')[0]}  Abstract: {cora.raw_text[paperID_0].split(':')[1]}  The second paper: Title: {cora.raw_text[paperID_1].split(':')[0]}  Abstract: {cora.raw_text[paperID_1].split(':')[1]}.\n```\n## Citation\n\n```\n@article{guo2024graphedit,\ntitle={GraphEdit: Large Language Models for Graph Structure Learning}, \nauthor={Zirui Guo and Lianghao Xia and Yanhua Yu and Yuling Wang and Zixuan Yang and Wei Wei and Liang Pang and Tat-Seng Chua and Chao Huang},\nyear={2024},\neprint={2402.15183},\narchivePrefix={arXiv},\nprimaryClass={cs.CL}\n}\n```\n\n## Acknowledgement\nThe LLM code in this repository is largely based on [FastChat](https://github.com/lm-sys/FastChat). The original TAG datasets are provided by [Graph-LLM](https://github.com/CurryTang/Graph-LLM). We thank the authors of both projects for their work.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhkuds%2Fgraphedit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhkuds%2Fgraphedit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhkuds%2Fgraphedit/lists"}