{"id":13478338,"url":"https://github.com/qitianwu/DIFFormer","last_synced_at":"2025-03-27T07:31:02.465Z","repository":{"id":114300394,"uuid":"608014608","full_name":"qitianwu/DIFFormer","owner":"qitianwu","description":"The official implementation for ICLR23 spotlight paper \"DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion\"","archived":false,"fork":false,"pushed_at":"2025-03-10T23:41:18.000Z","size":106,"stargazers_count":295,"open_issues_count":0,"forks_count":32,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-03-11T00:27:38.629Z","etag":null,"topics":["attention","diffusion","diffusion-equation","geometric-deep-learning","graph-neural-networks","graph-transformer","iclr2023","image-classification","large-graph","node-classification","pytorch","pytorch-geometric","pytorch-geometric-temporal","spatial-temporal-forecasting","text-classification","transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/qitianwu.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-01T06:15:15.000Z","updated_at":"2025-03-06T17:14:15.000Z","dependencies_parsed_at":"2024-10-30T11:41:49.381Z","dependency_job_id":null,"html_url":"https://github.com/qitianwu/DIFFormer","commit_stats":{"total_commits":14,"total_committers":1,"mean_commits":14.0,"dds":0.0,"last_synced_commit":"d65842bc42532d13d526ab21b3ff2591c61b5a0e"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/repositories/qitianwu%2FDIFFormer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qitianwu%2FDIFFormer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qitianwu%2FDIFFormer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qitianwu%2FDIFFormer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/qitianwu","download_url":"https://codeload.github.com/qitianwu/DIFFormer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245802357,"owners_count":20674645,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["attention","diffusion","diffusion-equation","geometric-deep-learning","graph-neural-networks","graph-transformer","iclr2023","image-classification","large-graph","node-classification","pytorch","pytorch-geometric","pytorch-geometric-temporal","spatial-temporal-forecasting","text-classification","transformer"],"created_at":"2024-07-31T16:01:55.704Z","updated_at":"2025-03-27T07:31:01.636Z","avatar_url":"https://github.com/qitianwu.png","language":"Python","readme":"# DIFFormer: Diffusion-based (Graph) Transformers\n\nThe official implementation for ICLR23 paper \"DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion\".\n\nRelated material: [[Paper](https://arxiv.org/pdf/2301.09474.pdf)], [Blog [Chinese](https://zhuanlan.zhihu.com/p/622970740) | [English](https://medium.com/towards-data-science/how-to-build-graph-transformers-with-o-n-complexity-d507e103d30a)], 
[[Video](https://www.bilibili.com/video/BV1dP411C7Ti/?share_source=copy_web\u0026vd_source=28f0a1823e05d5df3685cb9737bba371)] \n\nDIFFormer is a general-purpose encoder that computes instance representations while accommodating their latent/observed interactions.\n\nThis work is built upon [NodeFormer](https://github.com/qitianwu/NodeFormer) (NeurIPS22), a scalable Transformer for large graphs with linear complexity. \n\n## What's new\n\n[2023.03.01] We release an early version of our code for node classification.\n\n[2023.03.09] We release the code for image/text classification and spatial-temporal prediction.\n\n[2023.07.03] I gave a talk at the LOG seminar about scalable graph Transformers. See the online video [here](https://www.bilibili.com/video/BV1dP411C7Ti/?share_source=copy_web\u0026vd_source=28f0a1823e05d5df3685cb9737bba371).\n\n## Model Overview\n\nDIFFormer is motivated by an energy-constrained diffusion process that encodes a batch of instances into structured representations. At each step, the model first estimates pair-wise influence (i.e., attention) among arbitrary instance pairs (regardless of whether they are connected by an input graph) and then updates instance embeddings by feature propagation. The feed-forward process can be treated as a diffusion process that minimizes the global energy.\n\n\u003cimg width=\"700\" alt=\"image\" src=\"https://user-images.githubusercontent.com/22075007/232401434-e433a273-2083-4ac8-ad82-e9e15dd51d49.png\"\u003e\n\nSpecifically, DIFFormer's architecture is depicted in the following figure, where one DIFFormer layer comprises global attention, a GCN convolution, and a residual link. The global attention is our key design, with two instantiations: DIFFormer-s and DIFFormer-a.\n\n\u003cimg width=\"700\" alt=\"image\" src=\"https://files.mdnice.com/user/23982/0f71e990-acbc-4706-aca3-680628f8ac92.png\"\u003e\n\nWe implement the model in `difformer.py`, where DIFFormer-s (resp. 
DIFFormer-a) corresponds to `kernel = 'simple'` (resp. `'sigmoid'`). The two model versions differ in the global attention computation: DIFFormer-s requires only $O(N)$ complexity while DIFFormer-a requires $O(N^2)$, as illustrated by the figure below, where red marks the computation bottleneck. \n\n\u003cimg width=\"700\" alt=\"image\" src=\"https://files.mdnice.com/user/23982/3c433a8d-faf4-45f7-a4bd-c599e3288077.png\"\u003e\n\n## Where can DIFFormer be used?\n\nWe focus on three types of tasks in our experiments: graph-based node classification, image and text classification, and spatial-temporal prediction. Beyond these scenarios, DIFFormer can be used as a general-purpose encoder for various applications, including but not limited to:\n\n- Encoding node features and graph structures: given node features $X$ and graph adjacency $A$, output node embeddings $Z$ or predictions $\\hat Y$\n\n```python\nmodel = DIFFormer(in_channels, hidden_channels, out_channels, use_graph=True)\nz = model(x, edge_index) # x: [num_nodes, in_channels], edge_index: [2, E], z: [num_nodes, out_channels]\n```\n\n- Encoding instances (w/o graph structures): given instance features $X$ that are independent samples, output instance embeddings $Z$ or predictions $\\hat Y$\n\n```python\nmodel = DIFFormer(in_channels, hidden_channels, out_channels, use_graph=False)\nz = model(x, edge_index=None) # x: [num_inst, in_channels], z: [num_inst, out_channels]\n```\n\n- As a plug-in encoder backbone for computing latent representations within a larger framework for various downstream tasks (generation, prediction, decision-making, etc.).\n\n## Dependencies\n\nOur implementation is based on PyTorch and PyTorch Geometric.\nPlease refer to `requirements.txt` in each folder for the required packages.\n\n## Datasets\n\nWe apply our model to three different tasks and consider different datasets.\n\n- For ***node classification*** and ***image/text 
classification***, we provide easy access to the datasets used via [Google Drive](https://drive.google.com/drive/folders/1sWIlpeT_TaZstNB5MWrXgLmh522kx4XV?usp=sharing), \nexcept for two large graph datasets, OGBN-Proteins and Pokec, which are automatically downloaded by running the training/evaluation code. \n\n*(For the two image datasets, CIFAR and STL, we use a self-supervised pretrained model (ResNet-18) to obtain image embeddings as input features.)*\n\n- For ***spatial-temporal prediction***, the datasets are automatically downloaded from PyTorch Geometric Temporal.\n\nSee [here](https://github.com/qitianwu/DIFFormer#how-to-run-our-codes) for how to get the datasets ready for running our code.\n\n## How to run our codes\n\n1. Install the required packages according to `requirements.txt` in each folder (note that the required packages differ across tasks).\n\n2. Create a folder `../data` and download the datasets from [here](https://drive.google.com/drive/folders/1sWIlpeT_TaZstNB5MWrXgLmh522kx4XV?usp=sharing).\n(OGBN-Proteins, Pokec, and the three spatial-temporal datasets are downloaded automatically.)\n\n3. To train the model from scratch and evaluate on specific datasets, refer to the scripts `run.sh` in each folder.\n\n4. To directly reproduce the results on the two large datasets (training can be time-consuming), we also provide the [checkpoints](https://drive.google.com/drive/folders/1sKIMSS9KrTsWazO_QLY7t84kcjrRNuxo?usp=sharing) of DIFFormer on OGBN-Proteins and Pokec.\nDownload the trained models into `../model/` and run the scripts in `node classification/run_test_large_graph.sh` to reproduce the results. 
\n\n- For Pokec, to reproduce our reported results, one needs to download the fixed splits from [here](https://drive.google.com/drive/folders/1in2___ubLLCo4f9resuM8qln6gsz7sS4?usp=sharing) to `../data/pokec/split_0.5_0.25`.\n\n## Citation\n\nIf you find our code useful, please consider citing our work:\n\n```bibtex\n@inproceedings{wu2023difformer,\n  title={{DIFFormer}: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion},\n  author={Qitian Wu and Chenxiao Yang and Wentao Zhao and Yixuan He and David Wipf and Junchi Yan},\n  booktitle={International Conference on Learning Representations (ICLR)},\n  year={2023}\n}\n```\n\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqitianwu%2FDIFFormer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqitianwu%2FDIFFormer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqitianwu%2FDIFFormer/lists"}