{"id":21023982,"url":"https://github.com/linxueyuanstdio/tflex","last_synced_at":"2025-05-15T08:33:07.626Z","repository":{"id":97885517,"uuid":"456135441","full_name":"LinXueyuanStdio/TFLEX","owner":"LinXueyuanStdio","description":"[NeurIPS 2023] TFLEX: Temporal Feature-Logic Embedding Framework for Complex Reasoning over Temporal Knowledge Graph","archived":false,"fork":false,"pushed_at":"2024-06-18T15:57:44.000Z","size":67162,"stargazers_count":35,"open_issues_count":1,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-03T06:51:12.250Z","etag":null,"topics":["knowledge-graph","knowledge-graph-reasoning","temporal-knowledge-graph","temporal-logic"],"latest_commit_sha":null,"homepage":"https://xichen.pub/project-TFLEX/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LinXueyuanStdio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-02-06T11:47:56.000Z","updated_at":"2025-03-28T16:53:27.000Z","dependencies_parsed_at":"2024-06-18T19:57:52.140Z","dependency_job_id":null,"html_url":"https://github.com/LinXueyuanStdio/TFLEX","commit_stats":null,"previous_names":[],"tags_count":0,"template":true,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LinXueyuanStdio%2FTFLEX","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LinXueyuanStdio%2FTFLEX/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LinXueyuanStdio%2FTFLEX/releases","manifests_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/LinXueyuanStdio%2FTFLEX/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LinXueyuanStdio","download_url":"https://codeload.github.com/LinXueyuanStdio/TFLEX/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254304716,"owners_count":22048449,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["knowledge-graph","knowledge-graph-reasoning","temporal-knowledge-graph","temporal-logic"],"created_at":"2024-11-19T11:21:56.768Z","updated_at":"2025-05-15T08:33:02.610Z","avatar_url":"https://github.com/LinXueyuanStdio.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv id=\"top\"\u003e\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n# TFLEX: Temporal Feature-Logic Embedding Framework for Complex Reasoning over Temporal Knowledge Graph #\n\n[![PyTorch](https://img.shields.io/badge/PyTorch_1.8+-ee4c2c?logo=pytorch\u0026logoColor=white)](https://pytorch.org/get-started/locally/)\n![acceptance](https://img.shields.io/badge/Conference-NeurIPS2023-blue.svg?\u0026labelColor=gray)\n![license](https://img.shields.io/badge/License-Apache2.0-green.svg?labelColor=gray)\n\n\u003c/div\u003e\n\nCode for \"[TFLEX: Temporal Feature-Logic Embedding Framework for Complex Reasoning over Temporal Knowledge Graph](https://openreview.net/forum?id=oaGdsgB18L)\" accepted to NeurIPS 2023.\n\n[[OpenReview]](https://openreview.net/forum?id=oaGdsgB18L) [[arXiv]](https://arxiv.org/abs/2205.14307) [[Dataset: Google 
Drive]](https://drive.google.com/drive/folders/1ddkJoUBKxgonD8rYTIL_Tb3Pei_Mtvdb?usp=sharing)\n\nMulti-hop logical reasoning over knowledge graphs (KGs) plays a fundamental role in many artificial intelligence tasks. Recent complex query embedding (CQE) methods for reasoning focus on static KGs, while temporal knowledge graphs (TKGs) have not been fully explored. Reasoning over TKGs poses two challenges: 1. queries may ask for entities or timestamps; 2. operators must handle both set logic on entity sets and temporal logic on timestamp sets. To bridge this gap, we define the multi-hop logical reasoning problem on TKGs. We generate three datasets and propose the first temporal CQE, the Temporal Feature-Logic Embedding framework (TFLEX), to answer temporal complex queries. We utilize vector logic to compute the logic part of Temporal Feature-Logic embeddings, thus naturally modeling all First-Order Logic (FOL) operations on entity sets. In addition, our framework extends vector logic to timestamp sets to cope with three extra temporal operators (After, Before and Between). Experiments on numerous query patterns demonstrate the effectiveness of our method.\n\nBelow is a typical multi-hop temporal complex query and its computation graph: \"While François Hollande was the president of France, which countries did Xi Jinping visit but Barack Obama did not visit?\". The computation graph contains entity sets (blue circles), timestamp sets (green triangles), time set projections (green arrows), entity set projections (blue arrows) and logical operators (red rectangles).\n\n![](assets/TemporalComplexQuery.png)\n\n## 🔔 News\n\n- **`May. 5, 2024`: Datasets are also hosted on 🤗 HuggingFace: [ICEWS14](https://huggingface.co/datasets/linxy/ICEWS14), [ICEWS05_15](https://huggingface.co/datasets/linxy/ICEWS05_15), [GDELT](https://huggingface.co/datasets/linxy/GDELT)**\n- **`May. 1, 2024`: The ICEWS14 dataset has been converted to a JSON list for academic exploration.**\n- **`Oct. 
15, 2023`: Accepted to NeurIPS 2023! We have released the TFLEX datasets on [Google Drive](https://drive.google.com/drive/folders/1ddkJoUBKxgonD8rYTIL_Tb3Pei_Mtvdb?usp=sharing).**\n\n\n## 🌍 Contents\n\n- [1. Install](#-1-install)\n- [2. Get Started](#-2-get-started)\n- [3. Results](#-3-results)\n- [4. Visualization](#-4-visualization)\n- [5. Interpreter](#-5-interpreter)\n- [6. Dataset](#-6-dataset)\n\n### 🔬 1. Install\n\n- Python (\u003e= 3.7)\n- [PyTorch](http://pytorch.org/) (\u003e= 1.8.0)\n- numpy (\u003e= 1.19.2)\n\n```sh\npip install -r requirements.txt\ncd assistance\npip install -e .\ncd ..\n```\n\n### 🚀 2. Get Started\n\n❗NOTE: Download the datasets from [Google Drive](https://drive.google.com/drive/folders/1ddkJoUBKxgonD8rYTIL_Tb3Pei_Mtvdb?usp=sharing) (~5 GB) and place them in the `data` folder.\n\n```\n./data\n  - ICEWS14\n    - cache\n      - cache_xxx.pkl\n      - cache_xxx.pkl\n    - train\n    - test\n    - valid\n  - ICEWS05-15\n    - cache\n      - cache_xxx.pkl\n      - cache_xxx.pkl\n    - train\n    - test\n    - valid\n  - GDELT\n    - cache\n      - cache_xxx.pkl\n      - cache_xxx.pkl\n    - train\n    - test\n    - valid\n```\nThen run the following command to train TFLEX on ICEWS14:\n\n```sh\n$ python train_TCQE_TFLEX.py --name=\"TFLEX_dim800_gamma15\" --hidden_dim=800 --test_batch_size=32 --every_test_step=10000 --dataset=\"ICEWS14\" --data_home=\"./data\"\n\n$ python train_TCQE_TFLEX.py --help\nUsage: train_TCQE_TFLEX.py [OPTIONS]\n\nOptions:\n  --data_home TEXT                The folder path to dataset.\n  --dataset TEXT                  Which dataset to use: ICEWS14, ICEWS05_15,\n                                  GDELT.\n  --name TEXT                     Name of the experiment.\n  --start_step INTEGER            start step.\n  --max_steps INTEGER             Number of steps.\n  --every_test_step INTEGER       test every k steps\n  --every_valid_step INTEGER      validation every k steps.\n  --batch_size INTEGER            Batch size.\n  --test_batch_size 
INTEGER       Test batch size. Scoring to all is memory\n                                  consuming. We need small test batch size.\n  --negative_sample_size INTEGER  negative entities sampled per query\n  --train_device TEXT             choice: cuda:0, cuda:1, cpu.\n  --test_device TEXT              choice: cuda:0, cuda:1, cpu.\n  --resume BOOLEAN                Resume from output directory.\n  --resume_by_score FLOAT         Resume by score from output directory.\n                                  Resume best if it is 0. Default: 0\n  --lr FLOAT                      Learning rate.\n  --cpu_num INTEGER               used to speed up torch.dataloader\n  --hidden_dim INTEGER            embedding dimension\n  --input_dropout FLOAT           Input layer dropout.\n  --gamma FLOAT                   margin in the loss\n  --center_reg FLOAT              center_reg for ConE, center_reg balances the\n                                  in_cone dist and out_cone dist\n  --train_tasks TEXT              the tasks for training\n  --train_all BOOLEAN             if training all, it will use all tasks in\n                                  data.train_queries_answers\n  --eval_tasks TEXT               the tasks for evaluation\n  --eval_all BOOLEAN              if evaluating all, it will use all tasks in\n                                  data.test_queries_answers\n  --help                          Show this message and exit.\n```\n\n\n\u003cdetails\u003e\n  \u003csummary\u003e👈 🔎 Full commands for reproducing all results in the paper\u003c/summary\u003e\n\n\n```shell\n# ICEWS14\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_TFLEX.py --name=\"TFLEX_dim800_gamma15\" --hidden_dim=800 --test_batch_size=32 --every_test_step=10000 --dataset=\"ICEWS14\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_X+ConE.py --name=\"X+ConE_dim800_gamma15\" --hidden_dim=800 --test_batch_size=32 --every_test_step=10000 --dataset=\"ICEWS14\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_X-1F.py 
--name=\"X-1F_dim800_gamma15\" --hidden_dim=800 --test_batch_size=32 --every_test_step=10000 --dataset=\"ICEWS14\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_X_without_entity_logic.py --name=\"X_without_entity_logic_dim800_gamma15\" --hidden_dim=800 --test_batch_size=32 --every_test_step=10000 --dataset=\"ICEWS14\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_X_without_time_logic.py --name=\"X_without_time_logic_dim800_gamma15\" --hidden_dim=800 --test_batch_size=32 --every_test_step=10000 --dataset=\"ICEWS14\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_X_without_logic.py --name=\"X_without_logic_dim800_gamma15\" --hidden_dim=800 --test_batch_size=32 --every_test_step=10000 --dataset=\"ICEWS14\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_Query2box.py --name=\"Query2box_dim800_gamma15\" --hidden_dim=800 --test_batch_size=32 --every_test_step=10000 --dataset=\"ICEWS14\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_BetaE.py --name=\"BetaE_dim800_gamma15\" --hidden_dim=800 --test_batch_size=32 --every_test_step=10000 --dataset=\"ICEWS14\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_ConE.py --name=\"ConE_dim800_gamma15\" --hidden_dim=800 --test_batch_size=32 --every_test_step=10000 --dataset=\"ICEWS14\"\n\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_Query2box.py --name=\"Query2box_dim800_gamma15\" --hidden_dim=800 --test_batch_size=32 --every_test_step=10000 --dataset=\"ICEWS14\" --resume=True --eval_tasks=\"Pe,Pe2,Pe3,e2i,e3i\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_BetaE.py --name=\"BetaE_dim800_gamma15\" --hidden_dim=800 --test_batch_size=32 --every_test_step=10000 --dataset=\"ICEWS14\" --resume=True --eval_tasks=\"Pe,Pe2,Pe3,e2i,e3i,e2i_N,e3i_N,Pe_e2i_Pe_NPe,e2i_PeN,e2i_NPe,e2u,Pe_e2u\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_ConE.py --name=\"ConE_dim800_gamma15\" --hidden_dim=800 --test_batch_size=16 --every_test_step=10000 --dataset=\"ICEWS14\" --resume=True --eval_tasks=\"Pe,Pe2,Pe3,e2i,e3i,e2i_N,e3i_N,Pe_e2i_Pe_NPe,e2i_PeN,e2i_NPe,e2u,Pe_e2u\"\n\n# 
ICEWS05-15\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_TFLEX.py --name=\"TFLEX_dim800_gamma15\" --hidden_dim=800 --test_batch_size=16 --every_test_step=10000 --dataset=\"ICEWS05_15\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_X+ConE.py --name=\"X+ConE_dim800_gamma15\" --hidden_dim=800 --test_batch_size=16 --every_test_step=10000 --dataset=\"ICEWS05_15\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_X-1F.py --name=\"X-1F_dim800_gamma15\" --hidden_dim=800 --test_batch_size=16 --every_test_step=10000 --dataset=\"ICEWS05_15\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_X_without_entity_logic.py --name=\"X_without_entity_logic_dim800_gamma15\" --hidden_dim=800 --test_batch_size=16 --every_test_step=10000 --dataset=\"ICEWS05_15\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_X_without_time_logic.py --name=\"X_without_time_logic_dim800_gamma15\" --hidden_dim=800 --test_batch_size=16 --every_test_step=10000 --dataset=\"ICEWS05_15\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_X_without_logic.py --name=\"X_without_logic_dim800_gamma15\" --hidden_dim=800 --test_batch_size=16 --every_test_step=10000 --dataset=\"ICEWS05_15\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_Query2box.py --name=\"Query2box_dim800_gamma15\" --hidden_dim=800 --test_batch_size=16 --every_test_step=10000 --dataset=\"ICEWS05_15\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_BetaE.py --name=\"BetaE_dim800_gamma15\" --hidden_dim=800 --test_batch_size=16 --every_test_step=10000 --dataset=\"ICEWS05_15\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_ConE.py --name=\"ConE_dim800_gamma15\" --hidden_dim=800 --test_batch_size=16 --every_test_step=10000 --dataset=\"ICEWS05_15\"\n\n# GDELT\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_TFLEX.py --name=\"TFLEX_dim800_gamma15\" --hidden_dim=800 --test_batch_size=64 --every_test_step=100000 --dataset=\"GDELT\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_X+ConE.py --name=\"X+ConE_dim800_gamma15\" --hidden_dim=800 --test_batch_size=64 --every_test_step=100000 --dataset=\"GDELT\"\nCUDA_VISIBLE_DEVICES=0 
python train_TCQE_X-1F.py --name=\"X-1F_dim800_gamma15\" --hidden_dim=800 --test_batch_size=64 --every_test_step=100000 --dataset=\"GDELT\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_X_without_entity_logic.py --name=\"X_without_entity_logic_dim800_gamma15\" --hidden_dim=800 --test_batch_size=64 --every_test_step=100000 --dataset=\"GDELT\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_X_without_time_logic.py --name=\"X_without_time_logic_dim800_gamma15\" --hidden_dim=800 --test_batch_size=64 --every_test_step=100000 --dataset=\"GDELT\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_X_without_logic.py --name=\"X_without_logic_dim800_gamma15\" --hidden_dim=800 --test_batch_size=64 --every_test_step=100000 --dataset=\"GDELT\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_Query2box.py --name=\"Query2box_dim800_gamma15\" --hidden_dim=800 --test_batch_size=64 --every_test_step=100000 --dataset=\"GDELT\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_BetaE.py --name=\"BetaE_dim800_gamma15\" --hidden_dim=800 --test_batch_size=64 --every_test_step=100000 --dataset=\"GDELT\"\nCUDA_VISIBLE_DEVICES=0 python train_TCQE_ConE.py --name=\"ConE_dim800_gamma15\" --hidden_dim=800 --test_batch_size=64 --every_test_step=100000 --dataset=\"GDELT\"\n```\n\n\u003c/details\u003e\n\n\u003cbr/\u003e\n\n### 🎯 3. Results\n\n\u003cdetails open\u003e\n  \u003csummary\u003e👈 🔎 Reported results\u003c/summary\u003e\n\n![table1_main_results](assets/table1_main_results.png)\n\n\u003c/details\u003e\n\nTo support your research, we also open source some of our LaTeX files. Full LaTeX files can be found in [arXiv](https://arxiv.org/abs/2205.14307).\n\n- [Table 1: Main Results](./assets/table1_main_results.tex)\n- [Table 2: Full Detail Results](./assets/table2_full_results.tex)\n- [Table 3: TKGC Results](./assets/table3_TKGC.tex)\n\n### 🔬 4. Visualization\n\nPlease refer to `notebook/Draw.ipynb` to visualize the inference process of temporal complex queries.\n\n![](assets/TimeInferenceVisualization.png)\n\n### 🤖 5. 
Interpreter\n\nTo launch an interactive interpreter, please run `python run_reasoning_interpreter.py`\n\n![](assets/interpreter.png)\n\n```python\nuse_dataset(data_home=\"/data/TFLEX/data\"); use_embedding_reasoning_interpreter(\"TFLEX_dim800_gamma15\", device=\"cuda:1\");\nsample(task_name=\"e2i\", k=1);\nemb_e1=entity_token(); emb_r1=relation_token(); emb_t1=timestamp_token();\nemb_e2=entity_token(); emb_r2=relation_token(); emb_t2=timestamp_token();\nemb_q1 = Pe(emb_e1, emb_r1, emb_t1)\nemb_q2 = Pe(emb_e2, emb_r2, emb_t2)\nemb_q = And(emb_q1, emb_q2)\nembedding_answer_entities(emb_q, topk=3)\nuse_groundtruth_reasoning_interpreter()\ngroundtruth_answer()\nOK. The bot correctly predict the hard answer which only exists in the test set!\n```\n\n### 📚 6. Dataset\n\n\u003cdetails\u003e\n  \u003csummary\u003e👈 🔎 Data directory structure\u003c/summary\u003e\n\n```\n./data\n  - ICEWS14\n    - cache\n      - cache_xxx.pkl\n      - cache_xxx.pkl\n    - train\n    - test\n    - valid\n  - ICEWS05-15\n    - cache\n      - cache_xxx.pkl\n      - cache_xxx.pkl\n    - train\n    - test\n    - valid\n  - GDELT\n    - cache\n      - cache_xxx.pkl\n      - cache_xxx.pkl\n    - train\n    - test\n    - valid\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003e👈 🔎 Dataset statistics: queries_count\u003c/summary\u003e\n\n| query | ICEWS14|     |       | ICEWS05_15|   |      | GDELT |       |      |\n| :---- | :---- | :---- | :--- | :---- | :---- | :--- | :---- | :---- | :--- |\n|  | train | valid | test | train | valid | test | train | valid | test |\n| Pe | 66783 | 8837 | 8848 | 344042 | 45829 | 45644 | 1115102 | 273842 | 273432 |\n| Pe2 | 72826 | 3482 | 4037 | 368962 | 10000 | 10000 | 2215309 | 10000 | 10000 |\n| Pe3 | 72826 | 3492 | 4083 | 368962 | 10000 | 10000 | 2215309 | 10000 | 10000 |\n| e2i | 72826 | 3305 | 3655 | 368962 | 10000 | 10000 | 2215309 | 10000 | 10000 |\n| e3i | 72826 | 2966 | 3023 | 368962 | 10000 | 10000 | 2215309 | 10000 | 10000 |\n| Pt | 
42690 | 7331 | 7419 | 142771 | 28795 | 28752 | 687326 | 199780 | 199419 |\n| aPt | 13234 | 4411 | 4411 | 68262 | 10000 | 10000 | 221530 | 10000 | 10000 |\n| bPt | 13234 | 4411 | 4411 | 68262 | 10000 | 10000 | 221530 | 10000 | 10000 |\n| Pe_Pt | 7282 | 3385 | 3638 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |\n| Pt_sPe_Pt | 13234 | 5541 | 6293 | 68262 | 10000 | 10000 | 221530 | 10000 | 10000 |\n| Pt_oPe_Pt | 13234 | 5480 | 6242 | 68262 | 10000 | 10000 | 221530 | 10000 | 10000 |\n| t2i | 72826 | 5112 | 6631 | 368962 | 10000 | 10000 | 2215309 | 10000 | 10000 |\n| t3i | 72826 | 3094 | 3296 | 368962 | 10000 | 10000 | 2215309 | 10000 | 10000 |\n| e2i_N | 7282 | 2949 | 2975 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |\n| e3i_N | 7282 | 2913 | 2914 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |\n| Pe_e2i_Pe_NPe | 7282 | 2968 | 3012 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |\n| e2i_PeN | 7282 | 2971 | 3031 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |\n| e2i_NPe | 7282 | 3061 | 3192 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |\n| t2i_N | 7282 | 3135 | 3328 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |\n| t3i_N | 7282 | 2924 | 2944 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |\n| Pe_t2i_PtPe_NPt | 7282 | 3031 | 3127 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |\n| t2i_PtN | 7282 | 3300 | 3609 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |\n| t2i_NPt | 7282 | 4873 | 5464 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |\n| e2u | - | 2913 | 2913 | - | 10000 | 10000 | - | 10000 | 10000 |\n| Pe_e2u | - | 2913 | 2913 | - | 10000 | 10000 | - | 10000 | 10000 |\n| t2u | - | 2913 | 2913 | - | 10000 | 10000 | - | 10000 | 10000 |\n| Pe_t2u | - | 2913 | 2913 | - | 10000 | 10000 | - | 10000 | 10000 |\n| t2i_Pe | - | 2913 | 2913 | - | 10000 | 10000 | - | 10000 | 10000 |\n| Pe_t2i | - | 2913 | 2913 | - | 10000 | 10000 | - | 10000 | 10000 |\n| e2i_Pe | - | 2913 | 2913 | - | 10000 | 10000 | - | 10000 | 10000 |\n| Pe_e2i | - | 2913 | 2913 
| - | 10000 | 10000 | - | 10000 | 10000 |\n| between | 7282 | 2913 | 2913 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |\n| Pe_aPt | 7282 | 4134 | 4733 | 68262 | 10000 | 10000 | 221530 | 10000 | 10000 |\n| Pe_bPt | 7282 | 3970 | 4565 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |\n| Pt_sPe | 7282 | 4976 | 5608 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |\n| Pt_oPe | 7282 | 3321 | 3621 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |\n| Pt_se2i | 7282 | 3226 | 3466 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |\n| Pt_oe2i | 7282 | 3236 | 3485 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |\n| Pe_at2i | 7282 | 4607 | 5338 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |\n| Pe_bt2i | 7282 | 4583 | 5386 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003e👈 🔎 Dataset statistics: avg_answers_count\u003c/summary\u003e\n\n| query | ICEWS14|     |       | ICEWS05_15|   |      | GDELT |       |      |\n| :---- | :---- | :---- | :--- | :---- | :---- | :--- | :---- | :---- | :--- |\n|  | train | valid | test | train | valid | test | train | valid | test |\n|Pe | 1.09 | 1.01 | 1.01 | 1.07 | 1.01 | 1.01 | 2.07 | 1.21 | 1.21|\n|Pe2 | 1.03 | 2.19 | 2.23 | 1.02 | 2.15 | 2.19 | 2.61 | 6.51 | 6.13|\n|Pe3 | 1.04 | 2.25 | 2.29 | 1.02 | 2.18 | 2.21 | 5.11 | 10.86 | 10.70|\n|e2i | 1.02 | 2.76 | 2.84 | 1.01 | 2.36 | 2.52 | 1.05 | 2.30 | 2.32|\n|e3i | 1.00 | 1.57 | 1.59 | 1.00 | 1.26 | 1.26 | 1.00 | 1.20 | 1.35|\n|Pt | 1.71 | 1.22 | 1.21 | 2.58 | 1.61 | 1.60 | 3.36 | 1.66 | 1.66|\n|aPt | 177.99 | 176.09 | 175.89 | 2022.16 | 2003.85 | 1998.71 | 156.48 | 155.38 | 153.41|\n|bPt | 181.20 | 179.88 | 179.26 | 1929.98 | 1923.75 | 1919.83 | 160.38 | 159.29 | 157.42|\n|Pe_Pt | 1.58 | 7.90 | 8.62 | 2.84 | 18.11 | 20.63 | 26.56 | 42.54 | 41.33|\n|Pt_sPe_Pt | 1.79 | 7.26 | 7.47 | 2.49 | 13.51 | 10.86 | 4.92 | 14.13 | 12.80|\n|Pt_oPe_Pt | 1.75 | 7.27 | 7.48 | 2.55 | 13.01 | 14.34 | 4.62 | 14.47 | 12.90|\n|t2i | 
1.19 | 6.29 | 6.38 | 3.07 | 29.45 | 25.61 | 1.97 | 8.98 | 7.76|\n|t3i | 1.01 | 2.88 | 3.14 | 1.08 | 10.03 | 10.22 | 1.06 | 3.79 | 3.52|\n|e2i_N | 1.02 | 2.10 | 2.14 | 1.01 | 2.05 | 2.08 | 2.04 | 4.66 | 4.58|\n|e3i_N | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.02 | 1.19 | 1.37|\n|Pe_e2i_Pe_NPe | 1.04 | 2.21 | 2.25 | 1.02 | 2.16 | 2.19 | 3.67 | 8.54 | 8.12|\n|e2i_PeN | 1.04 | 2.22 | 2.26 | 1.02 | 2.17 | 2.21 | 3.67 | 8.66 | 8.36|\n|e2i_NPe | 1.18 | 3.03 | 3.11 | 1.12 | 2.87 | 2.99 | 4.00 | 8.15 | 7.81|\n|t2i_N | 1.15 | 3.31 | 3.44 | 1.21 | 4.06 | 4.20 | 2.91 | 8.78 | 7.56|\n|t3i_N | 1.00 | 1.02 | 1.03 | 1.01 | 1.02 | 1.02 | 1.15 | 3.19 | 3.20|\n|Pe_t2i_PtPe_NPt | 1.08 | 2.59 | 2.70 | 1.08 | 2.47 | 2.62 | 4.10 | 12.02 | 11.37|\n|t2i_PtN | 1.41 | 5.22 | 5.47 | 1.70 | 8.10 | 8.11 | 4.56 | 12.56 | 11.32|\n|t2i_NPt | 8.14 | 25.96 | 26.23 | 66.99 | 154.01 | 147.34 | 17.58 | 35.60 | 32.22|\n|e2u | 0.00 | 3.12 | 3.17 | 0.00 | 2.38 | 2.40 | 0.00 | 5.04 | 5.41|\n|Pe_e2u | 0.00 | 2.38 | 2.44 | 0.00 | 1.24 | 1.25 | 0.00 | 9.39 | 10.78|\n|t2u | 0.00 | 4.35 | 4.53 | 0.00 | 5.57 | 5.92 | 0.00 | 9.70 | 10.51|\n|Pe_t2u | 0.00 | 2.72 | 2.83 | 0.00 | 1.24 | 1.28 | 0.00 | 9.90 | 11.27|\n|t2i_Pe | 0.00 | 1.03 | 1.03 | 0.00 | 1.01 | 1.02 | 0.00 | 1.34 | 1.44|\n|Pe_t2i | 0.00 | 1.14 | 1.16 | 0.00 | 1.07 | 1.08 | 0.00 | 2.01 | 2.20|\n|e2i_Pe | 0.00 | 1.00 | 1.00 | 0.00 | 1.00 | 1.00 | 0.00 | 1.07 | 1.10|\n|Pe_e2i | 0.00 | 2.18 | 2.24 | 0.00 | 1.32 | 1.33 | 0.00 | 5.08 | 5.49|\n|between | 122.61 | 120.94 | 120.27 | 1407.87 | 1410.39 | 1404.76 | 214.16 | 210.99 | 207.85|\n|Pe_aPt | 4.67 | 16.73 | 16.50 | 18.68 | 43.80 | 46.23 | 49.31 | 66.21 | 68.88|\n|Pe_bPt | 4.53 | 17.07 | 16.80 | 18.70 | 45.81 | 48.23 | 67.67 | 84.79 | 83.00|\n|Pt_sPe | 8.65 | 28.86 | 29.22 | 71.51 | 162.36 | 155.46 | 27.55 | 45.83 | 43.73|\n|Pt_oPe | 1.41 | 5.23 | 5.46 | 1.68 | 8.36 | 8.21 | 3.84 | 11.31 | 10.06|\n|Pt_se2i | 1.31 | 5.72 | 6.19 | 1.37 | 9.00 | 9.30 | 2.76 | 8.72 | 7.66|\n|Pt_oe2i | 1.32 | 6.51 | 7.00 | 
1.44 | 10.49 | 10.89 | 2.55 | 8.17 | 7.27|\n|Pe_at2i | 7.26 | 22.63 | 21.98 | 30.40 | 60.03 | 53.18 | 88.77 | 101.60 | 101.88|\n|Pe_bt2i | 7.27 | 21.92 | 21.23 | 30.31 | 61.59 | 64.98 | 88.80 | 100.64 | 100.67|\n\u003c/details\u003e\n\n\u003cbr/\u003e\n\n**📚 Explore the dataset**\n\nTo speed up training, we have preprocessed the datasets and cached the data in `./data/{dataset_name}/cache/`.\nWe also aim to provide a unified, human-friendly interface for accessing the dataset.\nThat is, each data object in the dataset carries a type annotation and can be accessed as an attribute.\nType annotations are IDE-friendly and help avoid bugs; without them, we would not know an object's type until it was loaded.\n\nTo inspect the dataset in a Jupyter notebook, we can use the following code:\n```py\nfrom ComplexTemporalQueryData import ICEWS14, ICEWS05_15, GDELT\nfrom ComplexTemporalQueryData import ComplexTemporalQueryDatasetCachePath, TemporalComplexQueryData\n\ndata_home = \"./data\"\ndataset_name = \"ICEWS14\"  # one of: ICEWS14, ICEWS05_15, GDELT\nif dataset_name == \"ICEWS14\":\n    dataset = ICEWS14(data_home)\nelif dataset_name == \"ICEWS05_15\":\n    dataset = ICEWS05_15(data_home)\nelif dataset_name == \"GDELT\":\n    dataset = GDELT(data_home)\ncache = ComplexTemporalQueryDatasetCachePath(dataset.cache_path)\ndata = TemporalComplexQueryData(dataset, cache_path=cache)\ndata.preprocess_data_if_needed()\ndata.load_cache([\n    \"meta\",\n    \"all_timestamps\",  # -\u003e ./data/{dataset_name}/cache/all_timestamps.pkl\n    \"idx2entity\",\n    \"test_queries_answers\",\n])\nprint(data.entity_count)  # available once \"meta\" is loaded\nprint(data.all_timestamps)  # accessed directly as an attribute once cache \"all_timestamps\" is loaded\nprint(data.test_queries_answers)  # all caches live in dir \"./data/{dataset_name}/cache\", as specified in class ComplexTemporalQueryDatasetCachePath\n```\n\n\u003cdetails\u003e\n  \u003csummary\u003e👈 🔎 Available attributes and caches\u003c/summary\u003e\n\n```py\n# (s, r, o, t)\nself.all_triples: 
List[Tuple[str, str, str, str]]\nself.train_triples: List[Tuple[str, str, str, str]]\nself.test_triples: List[Tuple[str, str, str, str]]\nself.valid_triples: List[Tuple[str, str, str, str]]\n\n# (s, r, o, t)\nself.all_triples_ids: List[Tuple[int, int, int, int]]\nself.train_triples_ids: List[Tuple[int, int, int, int]]\nself.test_triples_ids: List[Tuple[int, int, int, int]]\nself.valid_triples_ids: List[Tuple[int, int, int, int]]\n\nself.all_relations: List[str]  # name\nself.all_entities: List[str]\nself.all_timestamps: List[str]\nself.entities_ids: List[int]  # id, starting from 0\nself.relations_ids: List[int]  # original relations in [0, relation_count), reversed relations in [relation_count, 2*relation_count)\nself.timestamps_ids: List[int]\n\nself.entity2idx: Dict[str, int]\nself.idx2entity: Dict[int, str]\nself.relation2idx: Dict[str, int]\nself.idx2relation: Dict[int, str]\nself.timestamp2idx: Dict[str, int]\nself.idx2timestamp: Dict[int, str]\n\n# Dict[str, Dict[str, Union[List[str], List[Tuple[List[int], Set[int]]]]]]\n#       |                       |                     |          |\n#     structure name      args name list              |          |\n#                                    ids corresponding to args   |\n#                                                          answers id set\n# 1. `structure name` is the name of a function (a named query function), which is parsed to an AST and evaluated to get the results.\n# 2. `args name list` is the argument list of the query function.\n# 3. train_queries_answers, valid_queries_answers and test_queries_answers are heavy to load (~10 GB+ of memory);\n#    we suggest loading them by query task, e.g. 
load_cache_by_tasks([\"Pe\", \"Pe2\", \"Pe3\", \"e2i\", \"e3i\"], \"train\")\nself.train_queries_answers: TYPE_train_queries_answers = {\n    # \"Pe_aPt\": {\n    #     \"args\": [\"e1\", \"r1\", \"e2\", \"r2\", \"e3\"],\n    #     \"queries_answers\": [\n    #         ([1, 2, 3, 4, 5], {2, 3}),\n    #         ([1, 2, 3, 4, 5], {2, 3}),\n    #         ([1, 2, 3, 4, 5], {2, 3}),\n    #     ]\n    # }\n    # \u003e\u003e\u003e answers = Pe_aPt(1, 2, 3, 4, 5)\n    # then, answers == {2, 3}\n}\nself.valid_queries_answers: TYPE_test_queries_answers = {\n    # \"Pe_aPt\": {\n    #     \"args\": [\"e1\", \"r1\", \"e2\", \"r2\", \"e3\"],\n    #     \"queries_answers\": [\n    #         ([1, 2, 3, 4, 5], {2, 3}, {2, 3, 5}),\n    #         ([1, 2, 3, 4, 5], {2, 3}, {2, 3, 5}),\n    #         ([1, 2, 3, 4, 5], {2, 3}, {2, 3, 5}),\n    #     ]\n    # }\n    # \u003e\u003e\u003e answers = Pe_aPt(1, 2, 3, 4, 5)\n    # in training set, answers == {2, 3}\n    # in validation set, answers == {2, 3, 5}, harder and more complete\n}\nself.test_queries_answers: TYPE_test_queries_answers = {\n    # \"Pe_aPt\": {\n    #     \"args\": [\"e1\", \"r1\", \"e2\", \"r2\", \"e3\"],\n    #     \"queries_answers\": [\n    #         ([1, 2, 3, 4, 5], {2, 3, 5}, {2, 3, 5, 6}),\n    #         ([1, 2, 3, 4, 5], {2, 3, 5}, {2, 3, 5, 6}),\n    #         ([1, 2, 3, 4, 5], {2, 3, 5}, {2, 3, 5, 6}),\n    #     ]\n    # }\n    # \u003e\u003e\u003e answers = Pe_aPt(1, 2, 3, 4, 5)\n    # in training and validation sets, answers == {2, 3, 5}\n    # in testing set, answers == {2, 3, 5, 6}, harder and more complete\n}\n\n# meta info\n# `load_cache([\"meta\"])` will load all of the following.\nself.query_meta = {\n    # \"Pe_aPt\": {\n    #     \"queries_count\": 1,\n    #     \"avg_answers_count\": 1\n    # }\n}\nself.entity_count: int\nself.relation_count: int\nself.timestamp_count: int\nself.valid_triples_count: int\nself.test_triples_count: int\nself.train_triples_count: int\nself.triple_count: 
int\n```\n\n\u003c/details\u003e\n\nAlternatively, we can load or save a cache directly with `pickle`, bypassing the `load_cache` method:\n```py\nimport pickle\nfrom pathlib import Path\nfrom typing import Union\n\n\ndef cache_data(data, cache_path: Union[str, Path]):\n    # serialize any Python object to a .pkl file\n    with open(str(cache_path), 'wb') as f:\n        pickle.dump(data, f)\n\n\ndef read_cache(cache_path: Union[str, Path]):\n    # deserialize a .pkl file written by cache_data\n    with open(str(cache_path), 'rb') as f:\n        return pickle.load(f)\n\n# or we can use\n# from toolbox.data.functional import read_cache, cache_data\ndataset_name = \"ICEWS14\"  # one of: ICEWS14, ICEWS05_15, GDELT\nidx2entity = read_cache(f\"./data/{dataset_name}/cache/idx2entity.pkl\")\nprint(type(idx2entity))\ncache_data(idx2entity, f\"./data/{dataset_name}/cache/idx2entity.pkl\")\n```\n\n**📚 Customize your own TKG complex query dataset**\n\nTo build a complex query dataset from another temporal knowledge graph, we need to provide the initial data files and define a custom dataset schema class:\n```py\n\"\"\"\n./data\n  - ICEWS14\n    - cache\n      - cache_xxx.pkl\n      - cache_xxx.pkl\n    - train\n    - test\n    - valid\n\"\"\"\nfrom pathlib import Path\nfrom typing import Dict, Union\n\nfrom toolbox.data.DatasetSchema import RelationalTripletDatasetSchema\n\nclass ICEWS14(RelationalTripletDatasetSchema):\n    def __init__(self, home: Union[Path, str] = \"data\"):\n        super(ICEWS14, self).__init__(\"ICEWS14\", home)\n\n    def get_data_paths(self) -\u003e Dict[str, Path]:\n        return {\n            # provided initial data files\n            # txt utf-8 format, each line is\n            # \"{subject_name}\\t{relation_name}\\t{object_name}\\t{timestamp_name}\\n\"\n            'train': self.get_dataset_path_child('train'),  # data/ICEWS14/train\n            'test': self.get_dataset_path_child('test'),  # data/ICEWS14/test\n            'valid': self.get_dataset_path_child('valid'),  # data/ICEWS14/valid\n        }\n\n    def get_dataset_path(self):\n        return self.root_path  # data root path = \"data\"\n\ndataset = ICEWS14(\"./data\")\nprint(dataset.root_path)  # data\nprint(dataset.dataset_path)  # data/ICEWS14, as specified in 
get_dataset_path()\nprint(dataset.cache_path)  # data/ICEWS14/cache\n\n# then use it as introduced above\ncache = ComplexTemporalQueryDatasetCachePath(dataset.cache_path)\ndata = TemporalComplexQueryData(dataset, cache_path=cache)\n...\n```\n\nTo generate temporal complex queries (TCQs), we provide a command-line tool: `python run_sampling_TCQs.py`.\n```shell\n$ python run_sampling_TCQs.py --help\nUsage: run_sampling_TCQs.py [OPTIONS]\n\nOptions:\n  --data_home TEXT  The folder path to dataset.\n  --dataset TEXT    Which dataset to use: ICEWS14, ICEWS05_15, GDELT.\n  --help            Show this message and exit.\n\n$ python run_sampling_TCQs.py --data_home data --dataset ICEWS14\npreparing data\nentities_ids 7128\nrelations_ids 230\ntimestamps_ids 365\nPe train 66783 valid 8837 test 8848\nPt train 42690 valid 7331 test 7419\n...\n```\n\nTo show the metadata of the generated dataset, run `python run_meta.py`.\n```shell\n$ python run_meta.py --help\nUsage: run_meta.py [OPTIONS]\n\nOptions:\n  --data_home TEXT  The folder path to dataset.\n  --help            Show this message and exit.\n```\n\n## 🤝 Citation\n\nPlease consider citing this paper if you use the `code` or `data` from our work. 
Thanks a lot :)\n\n(`Xueyuan et al., 2023` preferred, instead of `Lin et al., 2023`)\n\n```bibtex\n@inproceedings{\n  xueyuan2023tflex,\n  title={TFLEX: Temporal Feature-Logic Embedding Framework for Complex Reasoning over Temporal Knowledge Graph},\n  author={Lin Xueyuan and Haihong E and Chengjin Xu and Gengxian Zhou and Haoran Luo and Tianyi Hu and Fenglong Su and Ningyuan Li and Mingzhi Sun},\n  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},\n  year={2023},\n  url={https://openreview.net/forum?id=oaGdsgB18L}\n}\n```\n\n---\n\nTFLEX is released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0) license.\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinxueyuanstdio%2Ftflex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flinxueyuanstdio%2Ftflex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinxueyuanstdio%2Ftflex/lists"}