# EFO<sub>k</sub>-CQA: Towards Knowledge Graph Complex Query Answering beyond Set Operation

This repository contains the implementation for the paper "EFO<sub>k</sub>-CQA: Towards Knowledge Graph Complex Query Answering beyond Set Operation".

## 1. Preparation

### 1.1 Environment

We use the CSP solver provided by the `python-constraint` package. Install it with:
```
pip install python-constraint
```

We also use the PyTorch Geometric and NetworkX packages. Install them with:
```
conda install pyg -c pyg
conda install networkx
```

### 1.2 Data Preparation

Download the EFO<sub>k</sub>-CQA dataset from [here](https://drive.google.com/drive/folders/1kqnRdpcnVdBfbY8eVRIoVUgXdgkU8qd4?usp=sharing). The data for each of the three knowledge graphs can be downloaded separately and should be placed in the `data` folder. The download also includes a file named `DNF_EFO2_23_4123166.csv`, which stores the abstract query graphs (query types) for the EFOX experiment; put this file in the `data` folder as well.

After unzipping the query data, a data folder should look like this:
```
data/FB15k-237-EFOX-final/
  - kgindex.json
  - train_kg.tsv
  - valid_kg.tsv
  - test_kg.tsv
  - test-type0000-EFOX-qaa.json
  - test-type0001-EFOX-qaa.json
  - ......
```

Here `test-type0000-EFOX-qaa.json` holds the EFOX experiment data for query type `type0000`.

`kgindex.json` is the index file of the knowledge graph and `train_kg.tsv` is its training graph; `valid_kg.tsv` and `test_kg.tsv` are the validation and test graphs. These files are used for data generation.

Each query file, such as `test-type0000-EFOX-qaa.json`, contains a dict whose key is the formula and whose value is the list of queries: `{formula: [query1, query2, ...]}`.

### 1.3 Checkpoint Preparation

To reproduce the experiments in the paper, we provide checkpoints of six representative models (BetaE, LogicE, ConE, CQD, LMPNN, FIT) for each knowledge graph. They can be downloaded from [here](https://drive.google.com/drive/folders/13S3wpcsZ9t02aOgA11Qd8lvO0JGGENZ2?usp=sharing).

Unzip them into the `ckpt` folder.

For example, the `ckpt` subfolder containing the models trained on FB15k-237 should look like this:
```
ckpt/FB15k-237
  - BetaE_full/checkpoint
  - LogicE_full/450000.ckpt
  - ConE_full/300000.ckpt
  - CQD/FB15k-237-model-rank-1000-epoch-100-1602508358.pt
  - LMPNN/lmpnn-FB15k-237.ckpt
  - FIT/torch_0.005_0.001.ckpt
```

Each subfolder holds the checkpoint of one model and is named after that model.

## 2. Sample the data yourself

Our framework supports several key functionalities for complex query answering, so you can also sample the queries yourself by following the instructions in this section.

If you have downloaded the EFO<sub>k</sub>-CQA dataset, you can skip this section.

### 2.1 Enumerate the abstract query graphs

To enumerate the abstract query graphs, run:
```
python data_preparation/create_qg.py
```
The script writes the abstract query graphs (query types) for the EFOX experiment to the `data` folder, in a file named `DNF_EFO2_23_4123166.csv_filtered.csv`. You can also change the hyperparameters in this script to explore combinatorial spaces other than the one used in the paper.

### 2.2 Sample the query graphs

Grounding an abstract query graph into a concrete query graph requires two steps:
1. grounding its entities and relations;
2. computing the answers to the grounded query graph.

The following command samples query graphs for FB15k-237, drawing 1000 queries for each query type without negation and 500 for each query type with negation:

```
python data_preparation/sample_query.py --output_folder data/FB15k-237-EFOX-final --data_folder data/FB15k-237-EFOX-final --num_positive 1000 --num_negative 500
```

## 3. Reproduce the results of the paper

### 3.1 Query embedding methods

For the query embedding methods (BetaE, LogicE, ConE), run:

```
python QG_EFOX.py --config config/LogicE_FB15k-237_EFOX.yaml
```

This example runs LogicE on FB15k-237. The config files in the `config` folder specify the model and the knowledge graph used in each experiment.

### 3.2 Query graph methods: CQD + LMPNN

LMPNN is built upon CQD, so you need to download both the CQD checkpoint and the LMPNN checkpoint. To run LMPNN on FB15k-237:

```
python evaluate_lmpnn.py \
  --task_folder data/FB15k-237-EFOX-final \
  --checkpoint_path ckpt/FB15k-237/CQD/FB15k-237-model-rank-1000-epoch-100-1602508358.pt \
  --checkpoint_path_lmpnn ckpt/FB15k-237/LMPNN/lmpnn-FB15K-237.ckpt \
  --embedding_dim 1000
```

For LMPNN on FB15k:
```
python evaluate_lmpnn.py \
  --task_folder data/FB15k-EFOX-final \
  --checkpoint_path ckpt/FB15k/CQD/FB15k-model-rank-1000-epoch-100-1602520745.pt \
  --checkpoint_path_lmpnn ckpt/FB15k/LMPNN/lmpnn-FB15K.ckpt \
  --hidden_dim 8192
```

For LMPNN on NELL:

```
python3 evaluate_lmpnn.py \
  --task_folder data/NELL-EFOX-final \
  --checkpoint_path ckpt/NELL/CQD/NELL-model-rank-1000-epoch-100-1602499096.pt \
  --checkpoint_path_lmpnn ckpt/NELL/LMPNN/lmpnn-NELL.ckpt \
  --hidden_dim 8192 \
  --temp 0.1
```

For CQD on FB15k-237:
```
python evaluate_lmpnn.py \
  --task_folder data/FB15k-237-EFOX-final \
  --reasoner gradient \
  --checkpoint_path ckpt/FB15k-237/CQD/FB15k-237-model-rank-1000-epoch-100-1602508358.pt
```

For CQD on FB15k:

```
python evaluate_lmpnn.py \
  --task_folder data/FB15k-EFOX-final \
  --reasoner gradient \
  --checkpoint_path ckpt/FB15k/CQD/FB15k-model-rank-1000-epoch-100-1602520745.pt \
  --hidden_dim 8192
```

For CQD on NELL:

```
python3 evaluate_lmpnn.py \
  --task_folder data/NELL-EFOX-final \
  --reasoner gradient \
  --checkpoint_path ckpt/NELL/CQD/NELL-model-rank-1000-epoch-100-1602499096.pt \
  --hidden_dim 8192
```

### 3.3 Query graph method: FIT

To run the FIT experiment on FB15k-237:

```
python solve_EFOX.py
```

To run FIT on FB15k or NELL, use the following commands:

```
python solve_EFOX.py --ckpt ckpt/FB15k/FIT/torch_0.005_0.001.ckpt --data_folder data/FB15k-EFOX-final
```

```
python solve_EFOX.py --ckpt ckpt/NELL/FIT/torch_0.0002_0.001.ckpt --data_folder data/NELL-EFOX-final
```

As mentioned in the paper, FIT may run out of memory on FB15k and NELL, which reflects its limited scalability.

## 4. Aggregate the final results

Since there are numerous abstract query graphs (query types), the results of the individual query types are aggregated to obtain the final result.

Create a folder to store the benchmark results, structured as follows:
```
result/FB15k-237_result
  - BetaE_test
  - LogicE_test
  - ConE_test
  - CQD_test
  - LMPNN_test
  - FIT_test
```

Then run the following command to build the result table for FB15k-237 on queries with one free variable:
```
python construct_and_analyze.py --out_folder result --dataset FB15k-237 --model LogicE --variable 1 --construct 1
```

The `--variable` argument specifies the number of free variables in the query graph and can be set to 1 or 2.
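To make the aggregation step concrete, here is a minimal sketch. It is not taken from `construct_and_analyze.py`, and the names `aggregate` and `per_type` are hypothetical. It assumes each query type yields a single metric value (for example MRR) together with its number of queries, and it shows two common ways to combine them: an unweighted mean over query types and a mean weighted by query count. Which of these the benchmark tables report is not stated here.

```python
from typing import Dict, Tuple


def aggregate(per_type: Dict[str, Tuple[float, int]]) -> Dict[str, float]:
    """Combine per-query-type scores into overall scores (hypothetical helper).

    per_type maps a query-type id (e.g. "type0000") to a pair
    (metric value for that type, number of queries of that type).
    """
    if not per_type:
        raise ValueError("no query types to aggregate")
    # Unweighted mean: every query type counts equally.
    macro = sum(score for score, _ in per_type.values()) / len(per_type)
    # Weighted mean: types with more sampled queries count more.
    total = sum(count for _, count in per_type.values())
    if total == 0:
        raise ValueError("query counts must not all be zero")
    weighted = sum(score * count for score, count in per_type.values()) / total
    return {"macro": macro, "weighted": weighted}


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not results from the paper.
    example = {"type0000": (0.30, 1000), "type0001": (0.10, 500)}
    print(aggregate(example))
```

With these made-up numbers the unweighted mean is 0.2 while the weighted mean is about 0.233, because the type with 1000 queries dominates. Since types without negation are sampled with more queries than types with negation, the two choices can give noticeably different numbers.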