{"id":25215617,"url":"https://github.com/deezer/linear_graph_autoencoders","last_synced_at":"2025-10-25T14:31:20.740Z","repository":{"id":40643770,"uuid":"212307771","full_name":"deezer/linear_graph_autoencoders","owner":"deezer","description":"Source code from the NeurIPS 2019 workshop article \"Keep It Simple: Graph Autoencoders Without Graph Convolutional Networks\" (G. Salha, R. Hennequin, M. Vazirgiannis) + k-core framework implementation from IJCAI 2019 article \"A Degeneracy Framework for Scalable Graph Autoencoders\" (G. Salha, R. Hennequin, V.A. Tran, M. Vazirgiannis)","archived":false,"fork":false,"pushed_at":"2020-10-12T08:19:36.000Z","size":5858,"stargazers_count":131,"open_issues_count":0,"forks_count":14,"subscribers_count":12,"default_branch":"master","last_synced_at":"2024-04-16T11:27:17.728Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/deezer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-10-02T10:08:31.000Z","updated_at":"2024-01-23T16:39:46.000Z","dependencies_parsed_at":"2022-09-10T19:51:15.498Z","dependency_job_id":null,"html_url":"https://github.com/deezer/linear_graph_autoencoders","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2Flinear_graph_autoencoders","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2Flinear_graph_autoencoders/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2Flinear_graph_autoencoders/releases
","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2Flinear_graph_autoencoders/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/deezer","download_url":"https://codeload.github.com/deezer/linear_graph_autoencoders/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238161490,"owners_count":19426669,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-02-10T18:15:09.420Z","updated_at":"2025-10-25T14:31:19.687Z","avatar_url":"https://github.com/deezer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Linear Graph Autoencoders\n\nThis repository provides Python (Tensorflow) code to reproduce experiments from the article [Keep It Simple: Graph Autoencoders Without Graph Convolutional Networks](https://arxiv.org/pdf/1910.00942.pdf) presented at the **NeurIPS 2019** Workshop on Graph Representation Learning. \n\n\n***Update**: an extended conference version of this article is now available here: [Simple and Effective Graph Autoencoders with One-Hop Linear Models](https://arxiv.org/pdf/2001.07614.pdf) (accepted at **ECML-PKDD 2020**).*\n\n***Update 2**: do you prefer **PyTorch**? An implementation of Linear Graph AE and VAE is now available in the [pytorch_geometric](https://github.com/rusty1s/pytorch_geometric) project! 
See the example [here](https://github.com/rusty1s/pytorch_geometric/blob/master/examples/autoencoder.py).* \n\n## Introduction\n\nWe release Tensorflow implementations of the following **two graph embedding models** from the paper:\n - Linear Graph Autoencoders\n - Linear Graph Variational Autoencoders\n\ntogether with standard Graph Autoencoders (AE) and Graph Variational Autoencoders (VAE) models (with 2-layer or 3-layer Graph Convolutional Networks encoders) from [Kipf and Welling (2016)](https://arxiv.org/pdf/1611.07308.pdf). \n\nWe evaluate all models on the **link prediction** and **node clustering** tasks introduced in the paper. We provide the **Cora**, **Citeseer** and **Pubmed** datasets in the `data` folder, and refer to section 4 of the paper for direct links to the additional datasets used in our experiments.\n\nOur code builds upon Thomas Kipf's [original Tensorflow implementation](https://github.com/tkipf/gae) of standard Graph AE/VAE.\n\n![Linear AE and VAE](figures/linearsummary.png)\n\n#### Scaling-Up Graph AE and VAE\n\nStandard Graph AE and VAE models suffer from scalability issues. In order to scale them to **large graphs** with millions of nodes and edges, we also provide an implementation of our framework from the article [A Degeneracy Framework for Scalable Graph Autoencoders](https://arxiv.org/pdf/1902.08813.pdf) (IJCAI 2019). In this paper, we propose to train the graph AE/VAE only on a dense subset of nodes, namely the [k-core or k-degenerate](https://networkx.github.io/documentation/stable/reference/algorithms/core.html) subgraph. 
Then, we propagate embedding representations to the remaining nodes using faster heuristics.\n\n***Update**: in [this other repository](https://github.com/deezer/fastgae), we provide an implementation of **FastGAE**, a new (and more effective) method from our group to scale Graph AE and VAE.*\n\n\n![Degeneracy Framework](figures/ijcaisummary.png)\n\n## Installation\n\n```bash\npython setup.py install\n```\n\nRequirements: tensorflow (1.X), networkx, numpy, scikit-learn, scipy\n\n\n## Run Experiments\n\n```bash\ncd linear_gae\npython train.py --model=gcn_vae --dataset=cora --task=link_prediction\npython train.py --model=linear_vae --dataset=cora --task=link_prediction\n```\n\nThe above commands train a *standard Graph VAE with 2-layer GCN encoders (line 2)* and a *Linear Graph VAE (line 3)* on the *Cora* dataset and evaluate the embeddings on the *Link Prediction* task, with all parameters set to default values.\n\n```bash\npython train.py --model=gcn_vae --dataset=cora --task=link_prediction --kcore=True --k=2\npython train.py --model=gcn_vae --dataset=cora --task=link_prediction --kcore=True --k=3\npython train.py --model=gcn_vae --dataset=cora --task=link_prediction --kcore=True --k=4\n```\n\nWith `--kcore=True`, the model is trained only on the k-core subgraph instead of the entire graph. 
Here, k is a parameter (from 0 to the maximal core number of the graph) that you set with the `--k` flag.\n\n#### Complete list of parameters\n\n\n| Parameter        | Type           | Description  | Default Value |\n| :-------------: |:-------------:| :-------------------------------|:-------------: |\n| `model`     | string | Name of the model, among:\u003cbr\u003e - `gcn_ae`: Graph AE from Kipf and Welling (2016), with 2-layer GCN encoder and inner product decoder\u003cbr\u003e - `gcn_vae`: Graph VAE from Kipf and Welling (2016), with Gaussian distributions, 2-layer GCN encoders for mu and sigma, and inner product decoder \u003cbr\u003e - `linear_ae`: Linear Graph AE, as introduced in section 3 of the NeurIPS workshop paper, with linear encoder, and inner product decoder \u003cbr\u003e - `linear_vae`: Linear Graph VAE, as introduced in section 3 of the NeurIPS workshop paper, with Gaussian distributions, linear encoders for mu and sigma, and inner product decoder \u003cbr\u003e - `deep_gcn_ae`: Deeper version of Graph AE, with 3-layer GCN encoder, and inner product decoder \u003cbr\u003e - `deep_gcn_vae`: Deeper version of Graph VAE, with Gaussian distributions, 3-layer GCN encoders for mu and sigma, and inner product decoder| `gcn_ae` |\n| `dataset`    | string      | Name of the dataset, among:\u003cbr\u003e - `cora`: scientific publications citation network \u003cbr\u003e - `citeseer`: scientific publications citation network  \u003cbr\u003e - `pubmed`: scientific publications citation network \u003cbr\u003e \u003cbr\u003e We provide the preprocessed versions, coming from the [tkipf/gae](https://github.com/tkipf/gae/) repository. 
Please check the [LINQS](https://linqs.soe.ucsc.edu/data) website for raw data  \u003cbr\u003e \u003cbr\u003e You can specify any additional graph dataset, in *edgelist* format,\u003cbr\u003e by editing `input_data.py`| `cora`|\n| `task` | string |Name of the Machine Learning evaluation task, among: \u003cbr\u003e - `link_prediction`: Link Prediction \u003cbr\u003e - `node_clustering`: Node Clustering \u003cbr\u003e \u003cbr\u003e See section 4 and supplementary material of the NeurIPS 2019 workshop paper for details about tasks| `link_prediction`|\n| `dropout`| float | Dropout rate | `0.` |\n| `epochs`| int | Number of epochs in model training | `200` |\n| `features`| boolean | Whether to include node features in encoder | `False` |\n| `learning_rate`| float | Initial learning rate (with Adam optimizer) | `0.01` |\n| `hidden`| int | Number of units in GCN encoder hidden layer(s) | `32` |\n| `dimension`| int | Dimension of encoder output, i.e. embedding dimension | `16` |\n| `kcore`| boolean | Whether to run k-core decomposition and use the degeneracy framework from the IJCAI paper. If `False`, the AE/VAE will be trained on the entire graph | `False` |\n| `k`| int | Which k-core to use. 
Higher k =\u003e smaller graphs and faster (but possibly less accurate) training | `2` |\n| `nb_run`| int | Number of model runs + tests | `1` |\n| `prop_val`| float | Percentage of edges in validation set (for Link Prediction) | `5.` |\n| `prop_test`| float | Percentage of edges in test set (for Link Prediction) | `10.` |\n| `validation`| boolean | Whether to report validation results at each epoch (for Link Prediction) | `False` |\n| `verbose`| boolean | Whether to print detailed comments | `True` |\n\n#### Models from the paper\n\n**Cora**\n\n```Bash\npython train.py --dataset=cora --model=linear_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --dimension=16 --nb_run=5\npython train.py --dataset=cora --model=linear_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --dimension=16 --nb_run=5\npython train.py --dataset=cora --model=gcn_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5\npython train.py --dataset=cora --model=gcn_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5\npython train.py --dataset=cora --model=deep_gcn_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5\npython train.py --dataset=cora --model=deep_gcn_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5\n```\n\n**Cora** - with features\n\n```Bash\npython train.py --dataset=cora --features=True --model=linear_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --dimension=16 --nb_run=5\npython train.py --dataset=cora --features=True --model=linear_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --dimension=16 --nb_run=5\npython train.py --dataset=cora --features=True --model=gcn_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5\npython train.py --dataset=cora --features=True --model=gcn_vae 
--task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5\npython train.py --dataset=cora --features=True --model=deep_gcn_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5\npython train.py --dataset=cora --features=True --model=deep_gcn_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5\n```\n\n**Citeseer**\n\n```Bash\npython train.py --dataset=citeseer --model=linear_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --dimension=16 --nb_run=5\npython train.py --dataset=citeseer --model=linear_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --dimension=16 --nb_run=5\npython train.py --dataset=citeseer --model=gcn_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5\npython train.py --dataset=citeseer --model=gcn_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5\npython train.py --dataset=citeseer --model=deep_gcn_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5\npython train.py --dataset=citeseer --model=deep_gcn_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5\n```\n\n**Citeseer** - with features\n\n```Bash\npython train.py --dataset=citeseer --features=True --model=linear_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --dimension=16 --nb_run=5\npython train.py --dataset=citeseer --features=True --model=linear_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --dimension=16 --nb_run=5\npython train.py --dataset=citeseer --features=True --model=gcn_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5\npython train.py --dataset=citeseer --features=True --model=gcn_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 
--nb_run=5\npython train.py --dataset=citeseer --features=True --model=deep_gcn_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5\npython train.py --dataset=citeseer --features=True --model=deep_gcn_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5\n```\n\n**Pubmed**\n\n```Bash\npython train.py --dataset=pubmed --model=linear_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --dimension=16 --nb_run=5\npython train.py --dataset=pubmed --model=linear_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --dimension=16 --nb_run=5\npython train.py --dataset=pubmed --model=gcn_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5\npython train.py --dataset=pubmed --model=gcn_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5\npython train.py --dataset=pubmed --model=deep_gcn_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5\npython train.py --dataset=pubmed --model=deep_gcn_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5\n```\n\n**Pubmed** - with features\n\n```Bash\npython train.py --dataset=pubmed --features=True --model=linear_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --dimension=16 --nb_run=5\npython train.py --dataset=pubmed --features=True --model=linear_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --dimension=16 --nb_run=5\npython train.py --dataset=pubmed --features=True --model=gcn_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5\npython train.py --dataset=pubmed --features=True --model=gcn_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5\npython train.py --dataset=pubmed --features=True --model=deep_gcn_ae --task=link_prediction 
--epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5\npython train.py --dataset=pubmed --features=True --model=deep_gcn_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5\n```\n\nNotes:\n - Set `--task=node_clustering` with same hyperparameters to evaluate models on node clustering (as in Table 4) instead of link prediction\n - Set `--nb_run=100` to report mean AUC and AP along with standard errors over 100 runs, as in the paper\n - We recommend GPU usage for faster learning\n\n## Cite\n\n**1** - Please cite the following paper(s) if you use linear graph AE/VAE code in your own work.\n\nNeurIPS 2019 workshop version:\n\n```BibTeX\n@misc{salha2019keep,\n  title={Keep It Simple: Graph Autoencoders Without Graph Convolutional Networks},\n  author={Salha, Guillaume and Hennequin, Romain and Vazirgiannis, Michalis},\n  howpublished={Workshop on Graph Representation Learning, 33rd Conference on Neural Information Processing Systems (NeurIPS)},\n  year={2019}\n}\n```\n\nand/or the extended conference version:\n\n```BibTeX\n@inproceedings{salha2020simple,\n  title={Simple and Effective Graph Autoencoders with One-Hop Linear Models},\n  author={Salha, Guillaume and Hennequin, Romain and Vazirgiannis, Michalis},\n  booktitle={European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)},\n  year={2020}\n}\n```\n\n**2** - Please cite the following paper if you use the k-core framework for scalability in your own work.\n\n```BibTeX\n@inproceedings{salha2019degeneracy,\n  title={A Degeneracy Framework for Scalable Graph Autoencoders},\n  author={Salha, Guillaume and Hennequin, Romain and Tran, Viet Anh and Vazirgiannis, Michalis},\n  booktitle={28th International Joint Conference on Artificial Intelligence (IJCAI)},\n  
year={2019}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeezer%2Flinear_graph_autoencoders","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdeezer%2Flinear_graph_autoencoders","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeezer%2Flinear_graph_autoencoders/lists"}