{"id":13628124,"url":"https://github.com/zhenv5/atp","last_synced_at":"2025-04-22T15:09:30.733Z","repository":{"id":71125486,"uuid":"169783064","full_name":"zhenv5/atp","owner":"zhenv5","description":"ATP: Directed Graph Embedding with Asymmetric Transitivity Preservation","archived":false,"fork":false,"pushed_at":"2019-04-18T15:26:22.000Z","size":7088,"stargazers_count":10,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-22T15:09:23.355Z","etag":null,"topics":["aaai2019","asymmetric-transitivity","atp","directed-acyclic-graph","directed-graph","directed-graph-embedding","graph-embedding"],"latest_commit_sha":null,"homepage":"","language":"Cuda","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zhenv5.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2019-02-08T18:50:40.000Z","updated_at":"2024-07-02T20:37:08.000Z","dependencies_parsed_at":"2023-03-11T09:57:29.589Z","dependency_job_id":null,"html_url":"https://github.com/zhenv5/atp","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zhenv5%2Fatp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zhenv5%2Fatp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zhenv5%2Fatp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zhenv5%2Fatp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zhenv5","download_url":"https://c
odeload.github.com/zhenv5/atp/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250264909,"owners_count":21402004,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aaai2019","asymmetric-transitivity","atp","directed-acyclic-graph","directed-graph","directed-graph-embedding","graph-embedding"],"created_at":"2024-08-01T22:00:46.251Z","updated_at":"2025-04-22T15:09:30.706Z","avatar_url":"https://github.com/zhenv5.png","language":"Cuda","readme":"# ATP: Directed Graph Embedding with Asymmetric Transitivity Preservation\n\n## Required Packages\n\n\n* networkx \n* numpy\n* scipy\n* pandas\n\n## Optional Packages\n\nSome packages for matrix factorization are optional. And you can use your own package to do the matrix factorization or simply use SVD (supported by numpy and scipy) to generate the embeddings.\n\n* sklearn\n* nimfa\n* cumf_ccd (already included, please compile the code if you use it)\n* libpmf (already included, please compile the code if you use it)\n\nCheck section ```Using other Matrix Factorization Algorithms``` for more details.\n\n## Introduction\n\n\u003e Directed graphs have been widely used in Community Question Answering services (CQAs) to model asymmetric relationships among different types of nodes in CQA graphs, e.g., question, answer, user. Asymmetric transitivity is an essential property of directed graphs, since it can play an important role in downstream graph inference and analysis. Question difficulty and user expertise follow the characteristic of asymmetric transitivity. 
Maintaining such properties, while reducing the graph to a lower dimensional vector embedding space, has been the focus of much recent research. In this paper, we tackle the challenge of directed graph embedding with asymmetric transitivity preservation and then leverage the proposed embedding method to solve a fundamental task in CQAs: how to appropriately route and assign newly posted questions to users with the suitable expertise and interest in CQAs. The technique incorporates graph hierarchy and reachability information naturally by relying on a non-linear transformation that operates on the core reachability and implicit hierarchy within such graphs. Subsequently, the methodology levers a factorization-based approach to generate two embedding vectors for each node within the graph, to capture the asymmetric transitivity. Extensive experiments show that our framework consistently and significantly outperforms the state-of-the-art baselines on two diverse real-world tasks: link prediction, and question difficulty estimation and expert finding in online forums like Stack Exchange. Particularly, our framework can support inductive embedding learning for newly posted questions (unseen nodes during training), and therefore can properly route and assign these kinds of questions to experts in CQAs.\n* The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019), acceptance rate: 1150/7095 = 16.2%\n* [arXiv](https://arxiv.org/abs/1811.00839)\n* [Slides for AAAI 2019 Presentation](https://www.dropbox.com/s/jk6auc7bvuw1dvb/Slides_AAAI_2019_ATP.pdf?dl=0)\n\n\n## How does it work \n\n\n### First Step: Break Cycles\n\nIf the input directed graph is not a directed acyclic graph (DAG), we must delete some cycle edges to turn it into a DAG.\n\nThe corresponding code is available at: [breaking_cycles_in_noisy_hierarchies](https://github.com/zhenv5/breaking_cycles_in_noisy_hierarchies)\n\nThat code saves the edges that should be deleted to break cycles to a file. 
We then use ```remove_cycle_edges_to_DAGs.py``` to save the corresponding DAG to a file.\n\nFor example, run:\n\n* ```python remove_cycle_edges_to_DAGs.py --original_graph dataset/demo.edges --deleted_edges dataset/demo_deleted_edges.edges```\n\nInput:\n\n* ```--original_graph dataset/demo.edges```: original graph with cycles\n* ```--deleted_edges dataset/demo_deleted_edges.edges```: edges that should be deleted to break cycles\n\nThe corresponding DAG is saved at:\n\n* ```dataset/demo_DAG.edges```\n\nThe DAG file (```dataset/demo_DAG.edges```) will be our input for generating embeddings. \n\n### Generate Embeddings\n\nGiven a DAG, we can run ```main_atp.py``` to generate the required embeddings.\n\nParameters of ```main_atp.py```:\n\n* ```--dag```: input directed acyclic graph (DAG) (format can be *.gpickle, *.edges)\n* ```--rank```: number of latent factors\n* ```--strategy```: strategy used to build the hierarchical matrix: constant, linear, harmonic, ln (log)\n* ```--id_mapping```: remap node IDs so that they start at 0 (flag, no value)\n* ```--using_GPU```: use the GPU for the matrix factorization (cumf/cumf_ccd) (flag, no value)\n* ```--dense_M```: use a dense representation of M (flag, no value)\n* ```--using_SVD```: use SVD to generate embeddings from M (flag, no value)\n\nOutput:\n\n* ```S``` is saved at: ```*_W.pkl```\n* ```T``` is saved at: ```*_H.pkl```\n\nSome examples:\n\nTo use CPU-based matrix factorization to generate the embeddings when the node IDs of ```demo_DAG.edges``` already start at 0, run:\n\n* ```python main_atp.py --dag dataset/demo_DAG.edges --rank 2 --strategy ln --dense_M```\n\nSpecify ```--id_mapping``` if the node IDs are not integers or do not start at 0. For example, run:\n\n* ```python main_atp.py --dag dataset/demo_DAG_String.edges --rank 2 --strategy ln --id_mapping --dense_M```\n\nTo use GPU-based matrix factorization, specify ```--using_GPU```:\n\n* ```python main_atp.py 
--dag dataset/demo_DAG_String.edges --rank 2 --strategy ln --id_mapping --using_GPU```\n\nCheck [cumf_ccd](https://github.com/zhenv5/atp/tree/master/cumf/cumf_ccd) for more details about matrix factorization on GPU. We use ```prepare_cumf_data.py``` and ```load_cumf_ccd_matrices.py``` to prepare the inputs for ```cumf_ccd``` and to process its outputs, respectively.\n\n## Using other Matrix Factorization Algorithms\n\n```ATP``` can easily use other matrix factorization based methods. \n\nSimply modify ```graph_embedding.py``` to add new matrix factorization based methods.\n\nFor example:\n\nWe can use ```NMF``` from ```sklearn``` to generate corresponding embeddings:\n\n```\nfrom sklearn.decomposition import NMF\nmodel = NMF(n_components=rank, init='random', random_state=0)\nW = model.fit_transform(matrix)\nH = model.components_\n```\n\nThe input ```matrix``` is a dense matrix, so we have to specify ```--dense_M``` when we run ```main_atp.py```.\n\nHowever, when we use ```libpmf``` to do the matrix factorization, the input matrix should use a sparse representation, so it is not necessary to specify ```--dense_M```.\n\n\n## Using SVD to generate embeddings\n\n* Using dense representation of ```M```: ```python main_atp.py --dag dataset/demo_DAG.edges --rank 2 --using_SVD --dense_M```\n* Using sparse representation of ```M```: ```python main_atp.py --dag dataset/demo_DAG.edges --rank 2 --using_SVD```\n\n## Using ```M``` as an input for other applications\n\n```M``` is the matrix that incorporates graph hierarchy and reachability. It is saved as ```train_ranking_differences.dat```.\nYou can use ```M``` as an input for other applications. 
\n\n\n\n## Datasets\n\nThere are three different types of datasets used in our paper:\n\n* Synthetic datasets (randomly generated): See [breaking_cycles_in_noisy_hierarchies](https://github.com/zhenv5/breaking_cycles_in_noisy_hierarchies) for details\n* Data from Stack Exchange sites: See [PyStack](https://github.com/zhenv5/PyStack) for more details\n* Other datasets: Check our paper to access the download links\n\n## Citation\n\nIf you use this code, please consider citing ATP:\n\n```\n@article{DBLP:journals/corr/abs-1811-00839,\n  author    = {Jiankai Sun and\n               Bortik Bandyopadhyay and\n               Armin Bashizade and\n               Jiongqian Liang and\n               P. Sadayappan and\n               Srinivasan Parthasarathy},\n  title     = {{ATP:} Directed Graph Embedding with Asymmetric Transitivity Preservation},\n  journal   = {CoRR},\n  volume    = {abs/1811.00839},\n  year      = {2018},\n  url       = {http://arxiv.org/abs/1811.00839},\n  archivePrefix = {arXiv},\n  eprint    = {1811.00839},\n  timestamp = {Thu, 22 Nov 2018 17:58:30 +0100},\n  biburl    = {https://dblp.org/rec/bib/journals/corr/abs-1811-00839},\n  bibsource = {dblp computer science bibliography, https://dblp.org}\n}\n```\n\nDownload as ```bib``` file: [https://dblp.uni-trier.de/rec/bibtex/journals/corr/abs-1811-00839](https://dblp.uni-trier.de/rec/bibtex/journals/corr/abs-1811-00839)\n\n\n","funding_links":[],"categories":["Uncategorized"],"sub_categories":["Uncategorized"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzhenv5%2Fatp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzhenv5%2Fatp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzhenv5%2Fatp/lists"}