{"id":20216274,"url":"https://github.com/thudm/gatne","last_synced_at":"2025-04-05T05:09:21.468Z","repository":{"id":37664715,"uuid":"185018536","full_name":"THUDM/GATNE","owner":"THUDM","description":"Source code and dataset for KDD 2019 paper \"Representation Learning for Attributed Multiplex Heterogeneous Network\"","archived":false,"fork":false,"pushed_at":"2022-01-11T12:55:18.000Z","size":9766,"stargazers_count":528,"open_issues_count":37,"forks_count":142,"subscribers_count":15,"default_branch":"master","last_synced_at":"2025-03-29T04:11:19.676Z","etag":null,"topics":["attributed-networks","heterogeneous-network","multiplex-networks","network-embedding","representation-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/THUDM.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-05-05T10:50:32.000Z","updated_at":"2025-02-24T08:12:04.000Z","dependencies_parsed_at":"2022-07-10T18:00:36.298Z","dependency_job_id":null,"html_url":"https://github.com/THUDM/GATNE","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FGATNE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FGATNE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FGATNE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FGATNE/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/THUDM","download_url":"https://codeload.github.com/THUDM/GAT
NE/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247289429,"owners_count":20914464,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["attributed-networks","heterogeneous-network","multiplex-networks","network-embedding","representation-learning"],"created_at":"2024-11-14T06:27:12.453Z","updated_at":"2025-04-05T05:09:21.444Z","avatar_url":"https://github.com/THUDM.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GATNE\n\n### [Project](https://sites.google.com/view/gatne) | [Arxiv](https://arxiv.org/abs/1905.01669)\n\nRepresentation Learning for Attributed Multiplex Heterogeneous Network.\n\n[Yukuo Cen](https://sites.google.com/view/yukuocen), Xu Zou, Jianwei Zhang, [Hongxia Yang](https://sites.google.com/site/hystatistics/home), [Jingren Zhou](http://www.cs.columbia.edu/~jrzhou/), [Jie Tang](http://keg.cs.tsinghua.edu.cn/jietang/)\n\nAccepted to KDD 2019 Research Track!\n\n## ❗ News\n\nRecent Updates (Nov. 2020):\n- Use multiprocessing to speed up the random walk procedure (via `--num-workers`)\n- Support saving/loading a walk file (via `--walk-file`)\n- The PyTorch version now supports node features (via `--features`)\n\nSome Tips:\n- The PyTorch version may not reproduce the paper results (especially on the Twitter dataset). 
Please use the original TensorFlow version (`src/main.py`) to reproduce the paper results.\n- When running on large-scale datasets, set a larger value for `batch-size` to speed up training (e.g., several hundred or several thousand).\n- If an **out of memory (OOM)** error occurs, you may need to decrease the values of `dimensions` and `att-dim`.\n\nOur GATNE models have been implemented in several popular graph toolkits:\n- Deep Graph Library ([DGL](https://github.com/dmlc/dgl)): see https://github.com/dmlc/dgl/tree/master/examples/pytorch/GATNE-T \n- Paddle Graph Learning ([PGL](https://github.com/PaddlePaddle/PGL)): see https://github.com/PaddlePaddle/PGL/tree/main/examples/GATNE\n- [CogDL](https://github.com/THUDM/cogdl): see https://github.com/THUDM/cogdl/blob/master/cogdl/models/emb/gatne.py\n\nSome recent papers have listed GATNE models as a strong baseline:\n- [Deep Adversarial Completion for Sparse Heterogeneous Information Network Embedding](https://dl.acm.org/doi/pdf/10.1145/3366423.3380134) (WWW'20)\n- [Decoupled Graph Convolution Network for Inferring Substitutable and Complementary Items](https://dl.acm.org/doi/pdf/10.1145/3340531.3412695) (CIKM'20)\n- [Graph Attention Networks over Edge Content-Based Channels](https://dl.acm.org/doi/pdf/10.1145/3394486.3403233) (KDD'20)\n- [Temporal heterogeneous interaction graph embedding for next-item recommendation](http://shichuan.org/doc/84.pdf) (PKDD'20)\n- [Link Inference via Heterogeneous Multi-view Graph Neural Networks](https://link.springer.com/chapter/10.1007/978-3-030-59410-7_48) (DASFAA 2020)\n- [Multi-View Collaborative Network Embedding](https://arxiv.org/pdf/2005.08189.pdf) (Arxiv, May 2020)\n\nPlease let me know if your toolkit includes GATNE models or your paper uses GATNE models as baselines. 
\n\n## Prerequisites\n\n- Python 3\n- TensorFlow \u003e= 1.8 or PyTorch\n\n## Getting Started\n\n### Installation\n\nClone this repo.\n\n```bash\ngit clone https://github.com/THUDM/GATNE\ncd GATNE\n```\n\nPlease first install TensorFlow or PyTorch, and then install the other dependencies with\n\n```bash\npip install -r requirements.txt\n```\n\n### Dataset\n\nThese datasets are sampled from the original datasets.\n\n- Amazon contains 10,166 nodes and 148,865 edges. [Source](http://jmcauley.ucsd.edu/data/amazon)\n- Twitter contains 10,000 nodes and 331,899 edges. [Source](https://snap.stanford.edu/data/higgs-twitter.html)\n- YouTube contains 2,000 nodes and 1,310,617 edges. [Source](http://socialcomputing.asu.edu/datasets/YouTube)\n- Alibaba contains 6,163 nodes and 17,865 edges.\n\n### Training\n\n#### Training on the existing datasets\n\nYou can use `./scripts/run_example.sh`, `python src/main.py --input data/example`, or `python src/main_pytorch.py --input data/example` to train the GATNE-T model on the example data. (If you share the server with others or want to use specific GPU(s), you may need to set `CUDA_VISIBLE_DEVICES`.) \n\nTo train on the Amazon dataset, run `python src/main.py --input data/amazon` or `python src/main.py --input data/amazon --features data/amazon/feature.txt` to train the GATNE-T model or the GATNE-I model, respectively. \n\nYou can use the following commands to train GATNE-T on the Twitter and YouTube datasets: `python src/main.py --input data/twitter --eval-type 1` or `python src/main.py --input data/youtube`. We only evaluate the edges of the first edge type on the Twitter dataset, as the number of edges of the other edge types is too small.\n\nAs the Twitter and YouTube datasets do not have node attributes, you can generate heuristic features for them, such as DeepWalk embeddings. 
Then you can train the GATNE-I model on these two datasets by adding the `--features` argument.\n\n#### Training on your own datasets\n\nIf you want to train GATNE-T/I on your own dataset, you should prepare the following three (or four) files:\n- train.txt: Each line represents an edge and contains three tokens `\u003cedge_type\u003e \u003cnode1\u003e \u003cnode2\u003e`, where each token can be either a number or a string.\n- valid.txt: Each line represents an edge or a non-edge and contains four tokens `\u003cedge_type\u003e \u003cnode1\u003e \u003cnode2\u003e \u003clabel\u003e`, where `\u003clabel\u003e` is 1 for an edge and 0 for a non-edge.\n- test.txt: The same format as valid.txt.\n- feature.txt (optional): The first line contains two numbers `\u003cnum\u003e \u003cdim\u003e`, the number of nodes and the feature dimension. Starting from the second line, each line describes the features of one node, i.e., `\u003cnode\u003e \u003cf_1\u003e \u003cf_2\u003e ... \u003cf_dim\u003e`.\n\nIf your dataset contains several node types and you want to use meta-path-based random walks, you should also provide an additional file:\n- node_type.txt: Each line contains two tokens `\u003cnode\u003e \u003cnode_type\u003e`, where `\u003cnode_type\u003e` should be consistent with the meta-path schema in the training command, i.e., `--schema node_type_1-node_type_2-...-node_type_k-node_type_1`. (Note that the first node type in the schema must equal the last node type.)\n\n\nIf you have ANY difficulty getting the above steps to work, feel free to open an issue. 
You can expect a reply within 24 hours.\n\n## Cite\n\nPlease cite our paper if you find this code useful for your research:\n\n```\n@inproceedings{cen2019representation,\n  title = {Representation Learning for Attributed Multiplex Heterogeneous Network},\n  author = {Cen, Yukuo and Zou, Xu and Zhang, Jianwei and Yang, Hongxia and Zhou, Jingren and Tang, Jie},\n  booktitle = {Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},\n  year = {2019},\n  pages = {1358--1368},\n  publisher = {ACM},\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthudm%2Fgatne","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthudm%2Fgatne","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthudm%2Fgatne/lists"}