{"id":14977118,"url":"https://github.com/gordicaleksa/pytorch-gat","last_synced_at":"2025-05-15T15:02:15.330Z","repository":{"id":37237176,"uuid":"324526009","full_name":"gordicaleksa/pytorch-GAT","owner":"gordicaleksa","description":"My implementation of the original GAT paper (Veličković et al.). I've additionally included the playground.py file for visualizing the Cora dataset, GAT embeddings, an attention mechanism, and entropy histograms. I've supported both Cora (transductive) and PPI (inductive) examples!","archived":false,"fork":false,"pushed_at":"2022-11-17T14:21:18.000Z","size":26404,"stargazers_count":2524,"open_issues_count":13,"forks_count":335,"subscribers_count":47,"default_branch":"main","last_synced_at":"2025-04-07T20:08:53.788Z","etag":null,"topics":["attention","attention-mechanism","deep-learning","gat","gat-tutorial","graph-attention-network","graph-attention-networks","jupyter","python","pytorch","pytorch-gat","pytorch-implementation","self-attention"],"latest_commit_sha":null,"homepage":"https://youtube.com/c/TheAIEpiphany","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gordicaleksa.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"patreon":"theaiepiphany"}},"created_at":"2020-12-26T09:54:52.000Z","updated_at":"2025-04-07T07:28:43.000Z","dependencies_parsed_at":"2023-01-21T04:02:42.969Z","dependency_job_id":null,"html_url":"https://github.com/gordicaleksa/pytorch-GAT","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gordicaleksa%2Fpytorch-GAT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gordicaleksa%2Fpytorch-GAT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gordicaleksa%2Fpytorch-GAT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gordicaleksa%2Fpytorch-GAT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gordicaleksa","download_url":"https://codeload.github.com/gordicaleksa/pytorch-GAT/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247721898,"owners_count":20985084,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["atten
tion","attention-mechanism","deep-learning","gat","gat-tutorial","graph-attention-network","graph-attention-networks","jupyter","python","pytorch","pytorch-gat","pytorch-implementation","self-attention"],"created_at":"2024-09-24T13:55:07.155Z","updated_at":"2025-04-07T20:09:02.301Z","avatar_url":"https://github.com/gordicaleksa.png","language":"Jupyter Notebook","funding_links":["https://patreon.com/theaiepiphany"],"categories":[],"sub_categories":[],"readme":"## GAT - Graph Attention Network (PyTorch) :computer: + graphs + :mega: = :heart:\nThis repo contains a PyTorch implementation of the original GAT paper (:link: [Veličković et al.](https://arxiv.org/abs/1710.10903)). \u003cbr/\u003e\nIt's aimed at making it **easy to start playing and learning** about GAT and GNNs in general. \u003cbr/\u003e\n\n## Table of Contents\n* [What are graph neural networks and GAT?](#what-are-gnns)\n* [Visualizations (Cora and PPI, attention, t-SNE embeddings, entropy histograms)](#cora-visualized)\n* [Setup](#setup)\n* [Usage](#usage)\n    * [Training GAT](#training-gat)\n    * [Tips for understanding the code](#tip-for-understanding-the-code)\n    * [Profiling GAT](#profiling-gat)\n    * [Visualization tools](#visualization-tools)\n* [Hardware requirements](#hardware-requirements)\n* [Learning material](#learning-material)\n    \n## What are GNNs?\n\nGraph neural networks are a family of neural networks that are dealing with signals defined over graphs!\n\nGraphs can model many interesting natural phenomena, so you'll see them used everywhere from:\n* Computational biology - predicting potent [antibiotics like halicin](https://www.nature.com/articles/d41586-020-00018-3)\n* Computational pharmacology - predicting [drug side effects](https://arxiv.org/abs/1802.00543)\n* Traffic forecasting - e.g. 
it's used in [Google Maps](https://deepmind.com/blog/article/traffic-prediction-with-advanced-graph-neural-networks)\n* Recommendation systems (used at [Pinterest](https://medium.com/pinterest-engineering/pinsage-a-new-graph-convolutional-neural-network-for-web-scale-recommender-systems-88795a107f48), [Uber](https://eng.uber.com/uber-eats-graph-learning/), [Twitter](https://towardsdatascience.com/temporal-graph-networks-ab8f327f2efe), etc.) \n\nand all the way to [particle physics](https://news.fnal.gov/2020/09/the-next-big-thing-the-use-of-graph-neural-networks-to-discover-particles/) at the Large Hadron Collider [(LHC)](https://en.wikipedia.org/wiki/Large_Hadron_Collider), [fake news detection](https://arxiv.org/abs/1902.06673) and the list goes on and on!\n\nGAT is a representative of spatial (convolutional) GNNs. Since CNNs had tremendous success in the field of computer vision,\nresearchers decided to generalize them to graphs and so here we are! :nerd_face:\n\nHere is a schematic of GAT's structure:\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"data/readme_pics/GAT_schematic.PNG\" width=\"600\"/\u003e\n\u003c/p\u003e\n\n## Cora visualized\n\nYou can't just start talking about GNNs without mentioning the single most famous graph dataset - **Cora**.\n\nNodes in Cora represent research papers and the links are, you guessed it, citations between those papers.\n\nI've added a utility for visualizing Cora and doing basic network analysis. Here is what Cora looks like:\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"data/readme_pics/cora_graph_jupyter.PNG\"/\u003e\n\u003c/p\u003e\n\nNode size corresponds to its degree (i.e. the number of in/outgoing edges). 
Edge thickness roughly corresponds\nto how \"popular\" or \"connected\" that edge is (**edge betweennesses** is the nerdy term [check out the code](https://github.com/gordicaleksa/pytorch-GAT/blob/main/utils/visualizations.py#L104).)\n\nAnd here is a plot showing the degree distribution on Cora:\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"data/readme_pics/cora_degree_statistics.PNG\" width=\"850\"/\u003e\n\u003c/p\u003e\n\nIn and out degree plots are the same since we're dealing with an undirected graph. \n\nOn the bottom plot (degree distribution) you can see an interesting peak happening in the `[2, 4]` range.\nThis means that the majority of nodes have a small number of edges but there is 1 node that has 169 edges! (the big green node)\n\n## Attention visualized\n\nOnce we have a fully-trained GAT model we can visualize the attention that certain \"nodes\" have learned. \u003cbr/\u003e\nNodes use attention to decide how to aggregate their neighborhood, enough talk, let's see it:\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"data/readme_pics/attention1.jpg\" width=\"600\"/\u003e\n\u003c/p\u003e\n\nThis is one of Cora's nodes that has the most edges (citations). The colors represent the nodes of the same class.\nYou can clearly see 2 things from this plot:\n* The graph is [homophilic](https://en.wikipedia.org/wiki/Homophily) meaning similar nodes (nodes with same class) tend to cluster together.\n* Edge thickness on this chart is a function of attention, and since they are all of the same thickness, GAT basically learned to do something similar to [GCN!](https://www.youtube.com/watch?v=VyIOfIglrUM)\n\nSimilar rules hold for smaller neighborhoods. 
Also notice the self edges:\n\n\u003cp align=\"left\"\u003e\n\u003cimg src=\"data/readme_pics/attention2.jpg\" width=\"400\"/\u003e\n\u003cimg src=\"data/readme_pics/attention4.jpg\" width=\"400\"/\u003e\n\u003c/p\u003e\n\nOn the other hand PPI is learning much more interesting attention patterns:\n\n\u003cp align=\"left\"\u003e\n\u003cimg src=\"data/readme_pics/neighborhood_attention_ppi/3.jpg\" width=\"400\"/\u003e\n\u003cimg src=\"data/readme_pics/neighborhood_attention_ppi/2.jpg\" width=\"400\"/\u003e\n\u003c/p\u003e\n\nOn the left we can see that 6 neighbors are receiving a non-negligible amount of attention and on the right we can\nsee that all of the attention is **focused onto a single neighbor**.\n\nFinally 2 more interesting patterns - a **strong self edge** on the left and on the right we can see that a single neighbor\nis receiving a bulk of attention whereas the rest is **equally distributed** across the rest of the neighborhood:\n\n\u003cp align=\"left\"\u003e\n\u003cimg src=\"data/readme_pics/neighborhood_attention_ppi/4.jpg\" width=\"400\"/\u003e\n\u003cimg src=\"data/readme_pics/neighborhood_attention_ppi/1.jpg\" width=\"400\"/\u003e\n\u003c/p\u003e\n\n**Important note:** all of the `PPI` visualizations are only possible for the first GAT layer. \nFor some reason the attention coefficients for the second and third layers are almost all 0s (even though [I achieved](#training-gat) the published results).\n\n## Entropy histograms\n\nAnother way to understand that GAT isn't learning interesting attention patterns on Cora (i.e. that it's learning const attention)\nis by treating the node neighborhood's attention weights as a probability distribution, calculating the entropy, and\naccumulating the info across every node's neighborhood.\n\nWe'd love GAT's attention distributions to be skewed. 
You can see in orange how the histogram looks like for ideal uniform distributions,\nand you can see in light blue the learned distributions - they are exactly the same!\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"data/readme_pics/entropy_histograms/layer_0_head_0.jpg\" width=\"400\"/\u003e\n\u003cimg src=\"data/readme_pics/entropy_histograms/layer_1_head_0.jpg\" width=\"400\"/\u003e\n\u003c/p\u003e\n\nI've plotted only a single attention head from the first layer (out of 8) because they're all the same!\n\nOn the other hand PPI is learning much more interesting attention patterns:\n\n\u003cp align=\"left\"\u003e\n\u003cimg src=\"data/readme_pics/entropy_histograms_ppi/layer_0_head_0.jpg\" width=\"400\"/\u003e\n\nAs expected, the uniform distribution entropy histogram lies to the right (orange) since uniform distributions have the highest entropy.\n\n## Analyzing Cora's embedding space (t-SNE)\n\nOk, we've seen attention! What else is there to visualize? Well, let's visualize the learned embeddings from GAT's\nlast layer. The output of GAT is a tensor of shape = (2708, 7) where 2708 is the number of nodes in Cora and 7 is\nthe number of classes. Once we project those 7-dim vectors into 2D, using [t-SNE](https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding), we get this:\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"data/readme_pics/t-sne.PNG\" width=\"600\"/\u003e\n\u003c/p\u003e\n\nWe can see that the nodes with the same label/class are roughly **clustered together** - with these representations it's easy\nto train a simple classifier on top that will tell us which class the node belongs to.\n\n*Note: I've tried UMAP as well but didn't get nicer results + it has a lot of dependencies if you want to use their plot util.*\n\n## Setup\n\nSo we talked about what GNNs are, and what they can do for you (among other things). \u003cbr/\u003e\nLet's get this thing running! Follow the next steps:\n\n1. 
`git clone https://github.com/gordicaleksa/pytorch-GAT`\n2. Open Anaconda console and navigate into project directory `cd path_to_repo`\n3. Run `conda env create` from project directory (this will create a brand new conda environment).\n4. Run `activate pytorch-gat` (for running scripts from your console or set up the interpreter in your IDE)\n\nThat's it! It should work out-of-the-box, since the environment.yml file deals with the dependencies. \u003cbr/\u003e\n\n-----\n\nThe PyTorch pip package comes bundled with some version of CUDA/cuDNN,\nbut it is highly recommended that you install a system-wide CUDA beforehand, mostly because of the GPU drivers. \nI also recommend using the Miniconda installer as a way to get conda on your system.\nFollow through points 1 and 2 of [this setup](https://github.com/Petlja/PSIML/blob/master/docs/MachineSetup.md)\nand use the most up-to-date versions of Miniconda and CUDA/cuDNN for your system.\n\n## Usage\n\n#### Option 1: Jupyter Notebook\n\nJust run `jupyter notebook` from your Anaconda console and it will open up a session in your default browser. \u003cbr/\u003e\nOpen `The Annotated GAT.ipynb` and you're ready to play!\n\n---\n\n**Note:** if you get `DLL load failed while importing win32api: The specified module could not be found` \u003cbr/\u003e\nJust do `pip uninstall pywin32` and then either `pip install pywin32` or `conda install pywin32` [should fix it](https://github.com/jupyter/notebook/issues/4980)!\n\n#### Option 2: Use your IDE of choice\n\nYou just need to link the Python environment you created in the [setup](#setup) section.\n\n### Training GAT\n\nFYI, my GAT implementation achieves the published results:\n* On Cora I get the `82-83%` accuracy on test nodes\n* On PPI I achieved the `0.973` micro-F1 score (and actually even higher)\n\n---\n\nEverything needed to train GAT on Cora is already set up. 
To run it (from console) just call: \u003cbr/\u003e\n`python training_script_cora.py`\n\nYou could also potentially:\n* add the `--should_visualize` - to visualize your graph data\n* add the `--should_test` - to evaluate GAT on the test portion of the data\n* add the `--enable_tensorboard` - to start saving metrics (accuracy, loss)\n\nThe code is well commented so you can (hopefully) understand how the training itself works. \u003cbr/\u003e\n\nThe script will:\n* Dump checkpoint *.pth models into `models/checkpoints/`\n* Dump the final *.pth model into `models/binaries/`\n* Save metrics into `runs/`, just run `tensorboard --logdir=runs` from your Anaconda to visualize it\n* Periodically write some training metadata to the console\n\nSame goes for training on PPI, just run `python training_script_ppi.py`. PPI is much more GPU-hungry so if\nyou don't have a strong GPU with at least 8 GBs you'll need to add the `--force_cpu` flag to train GAT on CPU.\nYou can alternatively try reducing the batch size to 1 or making the model slimmer.\n\nYou can visualize the metrics during the training, by calling `tensorboard --logdir=runs` from your console\nand pasting the `http://localhost:6006/` URL into your browser:\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"data/readme_pics/val_loss.PNG\" height=\"290\"/\u003e\n\u003cimg src=\"data/readme_pics/val_acc.PNG\" height=\"290\"/\u003e\n\u003c/p\u003e\n\n*Note: Cora's train split seems to be much harder than the validation and test splits looking at the loss and accuracy metrics.*\n\nHaving said that most of the fun actually lies in the `playground.py` script.\n\n### Tip for understanding the code\n\nI've added 3 GAT implementations - some are conceptually easier to understand some are more efficient.\nThe most interesting and hardest one to understand is implementation 3.\nImplementation 1 and implementation 2 differ in subtle details but basically do the same thing.\n\n**Advice on how to approach the code:**\n* Understand 
the implementation #2 first\n* Check out the differences it has compared to implementation #1\n* Finally, tackle the implementation #3\n\n### Profiling GAT\n\nIf you want to profile the 3 implementations just set the `playground_fn` variable to `PLAYGROUND.PROFILE_GAT` in `playground.py`.\n\nThere are 2 params you may care about:\n* `store_cache` - set to `True` if you wish to save the memory/time profiling results after you've run it\n* `skip_if_profiling_info_cached` - set to `True` if you want to pull the profiling info from cache\n\nThe results will get stored in `data/` in `memory.dict` and `timing.dict` dictionaries (pickle).\n\n*Note: implementation #3 is by far the most optimized one - you can see the details in the code.*\n\n---\n\nI've also added `profile_sparse_matrix_formats` if you want to get some familiarity with different sparse matrix formats\nlike `COO`, `CSR`, `CSC`, `LIL`, etc.\n\n### Visualization tools\n\nIf you want to visualize attention, t-SNE embeddings or entropy histograms set the `playground_fn` variable to `PLAYGROUND.VISUALIZE_GAT` and\nset the `visualization_type` to:\n* `VisualizationType.ATTENTION` - if you wish to visualize attention across node neighborhoods\n* `VisualizationType.EMBEDDING` - if you wish to visualize the embeddings (via t-SNE)\n* `VisualizationType.ENTROPY` - if you wish to visualize the entropy histograms\n\nAnd you'll get crazy visualizations like these ones (`VisualizationType.ATTENTION` option):\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"data/readme_pics/attention3.jpg\" width=\"410\"/\u003e\n\u003cimg src=\"data/readme_pics/kk_layout.jpg\" width=\"410\"/\u003e\n\u003c/p\u003e\n\nOn the left you can see the node with the highest degree in the whole Cora dataset.\n\nIf you're wondering about why these look like a circle, it's because I've used the `layout_reingold_tilford_circular` layout \nwhich is particularly well suited for tree-like graphs (since we're visualizing a node and its neighbors 
this\nsubgraph is effectively a `m-ary` tree).\n\nBut you can also use different drawing algorithms like `kamada kawai` (on the right), etc.\n\nFeel free to go through the code and play with plotting attention from different GAT layers, plotting different node\nneighborhoods or attention heads. You can also easily change the number of layers in your GAT, although [shallow GNNs](https://towardsdatascience.com/do-we-need-deep-graph-neural-networks-be62d3ec5c59)\ntend to perform the best on [small-world](https://en.wikipedia.org/wiki/Small-world_network), homophilic graph datasets.\n\n---\n\nIf you want to visualize Cora/PPI just set the `playground_fn` to `PLAYGROUND.VISUALIZE_DATASET` and you'll get the results [from this README](#cora-visualized).\n\n## Hardware requirements\n\nHW requirements are highly dependent on the graph data you'll use. If you just want to play with `Cora`, you're good to go with a **2+ GBs** GPU.\n\nIt takes (on Cora citation network):\n* ~10 seconds to train GAT on my RTX 2080 GPU\n* 1.5 GBs of VRAM memory is *reserved* (PyTorch's caching overhead - far less is allocated for the actual tensors)\n* The model itself has only 365 KBs!\n\nCompare this to hardware needed even for the smallest of [transformers](https://github.com/gordicaleksa/pytorch-original-transformer#hardware-requirements)!\n\nOn the other hand the `PPI` dataset is much more GPU-hungry. 
You'll need a GPU with **8+ GBs** of VRAM, or you\ncan reduce the batch size to 1 and make the model \"slimmer\" and thus try to reduce the VRAM consumption.\n\n### Future todos:\n\n* Figure out why the *attention coefficients are equal to 0* (for the PPI dataset, second and third layer)\n* Potentially add an implementation leveraging PyTorch's `sparse API`\n\nIf you have an idea of how to implement GAT using PyTorch's sparse API please feel free to submit a PR.\nI personally had difficulties with their API, it's in beta, and it's questionable whether it's at all possible\nto make an implementation as efficient as my implementation 3 using it.\n\nSecondly, I'm still not sure why GAT is achieving the reported results on PPI while there are some obvious numeric\nproblems in deeper layers as manifested by all attention coefficients being equal to 0.\n\n## Learning material\n\nIf you're having difficulties understanding GAT I did an in-depth overview of the paper [in this video:](https://www.youtube.com/watch?v=uFLeKkXWq2c\u0026ab_channel=TheAIEpiphany)\n\n\u003cp align=\"left\"\u003e\n\u003ca href=\"https://www.youtube.com/watch?v=uFLeKkXWq2c\" target=\"_blank\"\u003e\u003cimg src=\"https://img.youtube.com/vi/uFLeKkXWq2c/0.jpg\" \nalt=\"The GAT paper explained\" width=\"480\" height=\"360\" border=\"10\" /\u003e\u003c/a\u003e\n\u003c/p\u003e\n\nI also made a [walk-through video](https://www.youtube.com/watch?v=364hpoRB4PQ) of this repo (focusing on the potential pain points), \nand a blog post for [getting started with Graph ML](https://gordicaleksa.medium.com/how-to-get-started-with-graph-machine-learning-afa53f6f963a) in general! 
:heart:\n\nI have some more videos which could further help you understand GNNs:\n* [My overview of the GCN paper](https://www.youtube.com/watch?v=VyIOfIglrUM)\n* [My overview of the GraphSAGE paper](https://www.youtube.com/watch?v=vinQCnizqDA)\n* [My overview of the PinSage paper](https://www.youtube.com/watch?v=ed0NJdqwEyg)\n* [My overview of Temporal Graph Networks (TGN)](https://www.youtube.com/watch?v=0tw66aTfWaI)\n\n## Acknowledgements\n\nI found these repos useful (while developing this one):\n\n* [official GAT](https://github.com/PetarV-/GAT) and [GCN](https://github.com/tkipf/gcn)\n* [PyTorch Geometric](https://github.com/rusty1s/pytorch_geometric)\n* [DeepInf](https://github.com/xptree/DeepInf) and [pyGAT](https://github.com/Diego999/pyGAT)\n\n## Citation\n\nIf you find this code useful, please cite the following:\n\n```\n@misc{Gordić2020PyTorchGAT,\n  author = {Gordić, Aleksa},\n  title = {pytorch-GAT},\n  year = {2020},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https://github.com/gordicaleksa/pytorch-GAT}},\n}\n```\n\n## Licence\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/gordicaleksa/pytorch-GAT/blob/master/LICENCE)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgordicaleksa%2Fpytorch-gat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgordicaleksa%2Fpytorch-gat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgordicaleksa%2Fpytorch-gat/lists"}