{"id":18954059,"url":"https://github.com/canqin001/efficient_graph_similarity_computation","last_synced_at":"2025-10-31T11:11:00.004Z","repository":{"id":160187487,"uuid":"419773996","full_name":"canqin001/Efficient_Graph_Similarity_Computation","owner":"canqin001","description":"[NeurIPS-2021] Slow Learning and Fast Inference: Efficient Graph Similarity Computation via Knowledge Distillation ","archived":false,"fork":false,"pushed_at":"2023-03-24T02:38:39.000Z","size":3173,"stargazers_count":40,"open_issues_count":0,"forks_count":4,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-29T06:01:42.246Z","etag":null,"topics":["graph","knowledge-distillation","regression"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/canqin001.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-21T15:16:05.000Z","updated_at":"2024-12-24T11:01:45.000Z","dependencies_parsed_at":null,"dependency_job_id":"25502238-2d8d-4416-828a-472d95edd359","html_url":"https://github.com/canqin001/Efficient_Graph_Similarity_Computation","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/canqin001%2FEfficient_Graph_Similarity_Computation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/canqin001%2FEfficient_Graph_Similarity_Computation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/canqin001%2FEfficient_Graph_
Similarity_Computation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/canqin001%2FEfficient_Graph_Similarity_Computation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/canqin001","download_url":"https://codeload.github.com/canqin001/Efficient_Graph_Similarity_Computation/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249184596,"owners_count":21226419,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["graph","knowledge-distillation","regression"],"created_at":"2024-11-08T13:43:00.062Z","updated_at":"2025-10-31T11:10:54.972Z","avatar_url":"https://github.com/canqin001.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Efficient Graph Similarity Computation - (EGSC)\n\nThis repo contains the source code and dataset for our NeurIPS 2021 paper:\n\n[**Slow Learning and Fast Inference: Efficient Graph Similarity Computation via Knowledge Distillation**](https://papers.nips.cc/paper/2021/file/75fc093c0ee742f6dddaa13fff98f104-Paper.pdf)\n\u003cbr\u003e\n35th Conference on Neural Information Processing Systems (NeurIPS 2021)\n\u003cbr\u003e\n[paper](https://papers.nips.cc/paper/2021/file/75fc093c0ee742f6dddaa13fff98f104-Paper.pdf)\n\n\u003cdiv\u003e\n    \u003cdiv style=\"display: none;\" id=\"egsc2021\"\u003e\n      \u003cpre class=\"bibtex\"\u003e@inproceedings{qin2021slow,\n              title={Slow Learning and Fast Inference: Efficient Graph Similarity Computation via Knowledge Distillation},\n              
author={Qin, Can and Zhao, Handong and Wang, Lichen and Wang, Huan and Zhang, Yulun and Fu, Yun},\n              booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},\n              year={2021}\n            }\n    \u003c/pre\u003e\n  \u003c/div\u003e\n  \u003cbr\u003e\n\u003c/div\u003e\n\n![EGSC](/Figs/our-setting.png)\n\n## Introduction\n\u003cdiv\u003e\n    \u003cbr\u003e\nGraph Similarity Computation (GSC) is essential to wide-ranging graph applications such as retrieval, plagiarism/anomaly detection, etc. The exact computation of graph similarity, e.g., Graph Edit Distance (GED), is an NP-hard problem that cannot be exactly solved within an adequate time given large graphs. Thanks to the strong representation power of graph neural network (GNN), a variety of GNN-based inexact methods emerged. To capture the subtle difference across graphs, the key success is designing the dense interaction with features fusion at the early stage, which, however, is a trade-off between speed and accuracy. For Slow Learning of graph similarity, this paper proposes a novel early-fusion approach by designing a co-attention-based feature fusion network on multilevel GNN features. To further improve the speed without much accuracy drop, we introduce an efficient GSC solution by distilling the knowledge from the slow early-fusion model to the student one for Fast Inference. Such a student model also enables the offline collection of individual graph embeddings, speeding up inference by orders of magnitude. To address the instability through knowledge transfer, we decompose the dynamic joint embedding into the static pseudo individual ones for precise teacher-student alignment. The experimental analysis on the real-world datasets demonstrates the superiority of our approach over the state-of-the-art methods on both accuracy and efficiency. 
Particularly, we speed up the prior art by more than 10x on the benchmark AIDS data.\n    \u003cbr\u003e\n\u003c/div\u003e\n\n## Dataset\nWe have used the standard dataloader, i.e., ‘GEDDataset’, directly provided in [PyG](https://pytorch-geometric.readthedocs.io/en/latest/_modules/torch_geometric/datasets/ged_dataset.html#GEDDataset).\n\n```  AIDS700nef:  ``` https://drive.google.com/uc?export=download\u0026id=10czBPJDEzEDI2tq7Z7mkBjLhj55F-a2z\n\n```  LINUX:  ``` https://drive.google.com/uc?export=download\u0026id=1nw0RRVgyLpit4V4XFQyDy0pI6wUEXSOI\n\n```  ALKANE:  ``` https://drive.google.com/uc?export=download\u0026id=1-LmxaWW3KulLh00YqscVEflbqr0g4cXt\n\n```  IMDBMulti:  ``` https://drive.google.com/uc?export=download\u0026id=12QxZ7EhYA7pJiF4cO-HuE8szhSOWcfST\n\n\n\u003cp align=\"justify\"\u003e\nThe code takes pairs of graphs for training from an input folder where each pair of graphs is stored as a JSON file. Pairs of graphs used for testing are also stored as JSON files. Every node id and node label has to be indexed from 0. Keys of dictionaries are stored as strings in order to make JSON serialization possible.\u003c/p\u003e\n\nEvery JSON file has the following key-value structure:\n\n```javascript\n{\"graph_1\": [[0, 1], [1, 2], [2, 3], [3, 4]],\n \"graph_2\":  [[0, 1], [1, 2], [1, 3], [3, 4], [2, 4]],\n \"labels_1\": [2, 2, 2, 2],\n \"labels_2\": [2, 3, 2, 2, 2],\n \"ged\": 1}\n```\n\u003cp align=\"justify\"\u003e\nThe **graph_1** and **graph_2** keys have edge list values which describe the connectivity structure. Similarly, the **labels_1** and **labels_2** keys hold the labels for each node, stored as a list whose positions correspond to node identifiers. The **ged** key has an integer value which is the raw graph edit distance for the pair of graphs.\u003c/p\u003e\n\n## Requirements\nThe codebase is implemented in Python 3.6.12. 
Package versions used for development are listed below.\n```\nmatplotlib        3.3.4\nnetworkx          2.4\nnumpy             1.19.5\npandas            1.1.2\nscikit-learn      0.23.2\nscipy             1.4.1\ntexttable         1.6.3\ntorch             1.6.0\ntorch-cluster     1.5.9\ntorch-geometric   1.7.0\ntorch-scatter     2.0.6\ntorch-sparse      0.6.9\ntqdm              4.60.0\n```\n\n## File Structure\n```\n.\n├── README.md\n├── LICENSE                            \n├── EGSC-T\n│   ├── src\n│   │    ├── egsc.py \n│   │    ├── layers.py\n│   │    ├── main.py\n│   │    ├── model.py\n│   │    ├── parser.py        \n│   │    └── utils.py                             \n│   ├── README.md                      \n│   └── train.sh\n├── EGSC-KD\n│   ├── src\n│   │    ├── egsc_kd.py \n│   │    ├── egsc_nonkd.py \n│   │    ├── layers.py\n│   │    ├── main_kd.py\n│   │    ├── main_nonkd.py\n│   │    ├── model_kd.py\n│   │    ├── parser.py    \n│   │    ├── trans_modules.py    \n│   │    └── utils.py                             \n│   ├── README.md  \n│   ├── train_kd.md                     \n│   └── train_nonkd.sh \n├── Checkpoints\n│   ├── G_EarlyFusion_Disentangle_LINUX_gin_checkpoint.pth\n│   ├── G_EarlyFusion_Disentangle_IMDBMulti_gin_checkpoint.pth\n│   ├── G_EarlyFusion_Disentangle_ALKANE_gin_checkpoint.pth\n│   └── G_EarlyFusion_Disentangle_AIDS700nef_gin_checkpoint.pth                         \n└── GSC_datasets\n    ├── AIDS700nef\n    ├── ALKANE\n    ├── IMDBMulti\n    └── LINUX\n```\n\n## To Do\n- [x] GED Datasets Processing\n- [x] Teacher Model Training\n- [x] Student Model Training\n- [x] Knowledge Distillation\n- [ ] Online Inference\n\n## Acknowledgement\nWe would like to thank the authors of [SimGNN](https://github.com/benedekrozemberczki/SimGNN) and [Extended-SimGNN](https://github.com/gospodima/Extended-SimGNN), which we used for this implementation.\n\n## Hint\nOn some datasets, the results are not quite stable. 
We suggest running the experiments multiple times and reporting the average result.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcanqin001%2Fefficient_graph_similarity_computation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcanqin001%2Fefficient_graph_similarity_computation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcanqin001%2Fefficient_graph_similarity_computation/lists"}