{"id":13605401,"url":"https://github.com/DimitrisAlivas/StarQE","last_synced_at":"2025-04-12T05:32:51.481Z","repository":{"id":70294591,"uuid":"377767122","full_name":"DimitrisAlivas/StarQE","owner":"DimitrisAlivas","description":"Code for the paper \"Query Embedding on Hyper-relational Knowledge Graphs\"","archived":false,"fork":false,"pushed_at":"2024-05-18T18:07:26.000Z","size":550,"stargazers_count":31,"open_issues_count":0,"forks_count":5,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-11T14:12:49.116Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DimitrisAlivas.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-06-17T08:58:31.000Z","updated_at":"2024-11-30T19:12:19.000Z","dependencies_parsed_at":null,"dependency_job_id":"fe60090d-216c-4745-9c00-dd85650a78cc","html_url":"https://github.com/DimitrisAlivas/StarQE","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DimitrisAlivas%2FStarQE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DimitrisAlivas%2FStarQE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DimitrisAlivas%2FStarQE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DimitrisAlivas%2FStarQE/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DimitrisAlivas","download_u
rl":"https://codeload.github.com/DimitrisAlivas/StarQE/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248524204,"owners_count":21118609,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T19:00:58.320Z","updated_at":"2025-04-12T05:32:51.421Z","avatar_url":"https://github.com/DimitrisAlivas.png","language":"Python","funding_links":[],"categories":[":wrench: Implementations"],"sub_categories":["Dataset tools"],"readme":"# Query Embedding on Hyper-Relational Knowledge Graphs\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"https://img.shields.io/badge/python-3.8-blue.svg\"\u003e\n\u003ca href=\"https://github.com/DimitrisAlivas/mphrqe/blob/master/LICENSE\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/License-MIT-blue.svg\"\u003e\u003c/a\u003e\n\u003ca href=\"https://openreview.net/pdf?id=4rLw09TgRw9\"\u003e\u003cimg src=\"http://img.shields.io/badge/Paper-PDF-red.svg\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\n\u003cimg src=\"./the_qual_effect.svg\" width=\"100%\"\u003e\n\nThis repository contains the code used for the experiments in the paper\n\n```\nQuery Embedding on Hyper-Relational Knowledge Graphs.\nDimitrios Alivanistos and Max Berrendorf and Michael Cochez and Mikhail Galkin\nInternational Conference on Learning Representations, 2022\n```\n\nIf you encounter any problems, or have suggestions on how to improve this code, open an issue.\n\n**Abstract:**\n\u003e Multi-hop logical reasoning is an established problem in the field of representation learning on knowledge graphs 
(KGs).\nIt subsumes both one-hop link prediction as well as other more complex types of logical queries.\nExisting algorithms operate only on classical, triple-based graphs, whereas modern KGs often employ a hyper-relational modeling paradigm.\nIn this paradigm, typed edges may have several key-value pairs known as qualifiers that provide fine-grained context for facts.\nIn queries, this context modifies the meaning of relations, and usually reduces the answer set.\nHyper-relational queries are often observed in real-world KG applications, and existing approaches for approximate query answering cannot make use of qualifier pairs.\nIn this work, we bridge this gap and extend the multi-hop reasoning problem to hyper-relational KGs allowing to tackle this new type of complex queries.\nBuilding upon recent advancements in Graph Neural Networks and query embedding techniques, we study how to embed and answer hyper-relational conjunctive queries.\nBesides that, we propose a method to answer such queries and demonstrate in our experiments that qualifiers improve query answering on a diverse set of query patterns.\n\n## Requirements\n\nWe developed our repository using Python 3.8.5. Other versions may also work.\n\nFirst, please ensure that you have properly installed\n\n* torch 1.11.0, cf. [here](https://pytorch.org/get-started/locally/)\n* torch [scatter, sparse, and geometric](https://github.com/rusty1s/pytorch_geometric), cf. [here](https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html)\n\nin your environment - the setup does not take care of these. Running experiments is possible on both CPU and GPU. On a GPU, training should be noticeably faster. If you are using a GPU, please make sure that the installed versions match your CUDA version.\n\nWe recommend the use of virtual environments, be it `virtualenv` or `conda`.\n\nNow, clone the repository and install other dependencies using pip. 
After moving to the root of the repo (and with your virtual env activated) type:\n\n```bash\npip install .\n```\n\nIf you want to change code, we suggest using the *editable* mode of the pip installation:\n\n```bash\npip install -e .\n```\n\nTo log results, we suggest using [wandb](https://wandb.ai/). Instructions on installation and setting up can be found here: https://docs.wandb.ai/quickstart\n\n## Installing additional packages\n\nApart from the base packages installed with `pip install -e .`, we provide extra functionality that requires additional packages.\n\n```bash\npip install -e '.[wandb]' # For integration with wandb\npip install -e '.[test]'  # For running tests (for further instructions see below)\npip install -e '.[dev]'   # For development\npip install -e '.[rdf]'   # For working with RDF data + triplestore integration\npip install -e '.[vis]'   # For visualisations (qualifier importance etc.)\n```\n\n## Running tests (optional)\n\nYou can run the tests by installing the test dependencies (skip this step if you already installed them in the previous section)\n```bash\npip install -e '.[test]'\n```\nand then executing them\n```bash\npytest\n```\nRun both commands from the root of the project.\n\nIt is normal to see some skipped tests.\n\n## Running experiments\n\nThe easiest way to start experiments is via the command line interface. The command line also provides more information on the options available for each command. You can show the help by typing\n\n```bash\nhqe --help\n```\n\ninto a terminal within your active python environment. Some IDEs, e.g. PyCharm, require you to start from a file if you want to enable the debugger. 
To this end, we also provide a thin wrapper in `executables`, which you can start by\n\n```bash\npython executables/main.py\n```\n\n### Downloading the data\n\nAs a first step, download the WD50K dataset:\n```bash\nhqe preprocess download-wd50k\n```\n\nThen, to run experiments, we offer the preprocessed queries for download using the command below.\nIt is also possible to run the preprocessing steps yourself, cf. the [data preprocessing README](src/mphrqe/data/README.md).\n\n```bash\nhqe preprocess skip-and-download-binary\n```\n\n### Training a model\n\nMany options are available for model training.\nFor an overview of options, run\n```bash\nhqe train --help\n```\n\n*Some examples:*\n\n---\nTrain with default settings, using at most 10000 reified 1hop queries with a qualifier for training and 5000 reified triples from the validation set for validation.\nDetails on how to specify the number of samples can be found in the `Sample` class (`src/mphrqe/data/loader.Sample`).\nNote that the data loading takes care of using only data from the correct data split.\n\n```bash\nhqe train \\\n    -tr /1hop/1qual:atmost10000:reify \\\n    -va /1hop/1qual:5000:reify\n```\n\n---\n\nTrain with the same data, but with custom parameters for the model.\nThe example below uses target pooling to get the embedding of the query graph, uses a dropout of 0.5 in the layers, uses cosine similarity instead of the dot product to compute similarity when ranking answers to the query, and enables wandb for logging the metrics.\nFinally, the trained model is stored as a file `training-example-model.pt`, which can then be used in the evaluation.\n\n```bash\nhqe train \\\n    -tr /1hop/1qual:atmost10000:reify \\\n    -va /1hop/1qual:5000:reify \\\n    --graph-pooling TargetPooling \\\n    --dropout 0.5 \\\n    --similarity CosineSimilarity \\\n    --use-wandb --wandb-name \"training-example\" \\\n    --save \\\n    --model-path \"training-example-model.pt\"\n```\n\nBy default, the model path is relative to the current 
working directory.\nProviding an absolute path to a different directory can change that.\n\n\n### Performing hyperparameter optimization\n\nTo find optimal parameters for a dataset, one can run a hyperparameter optimization.\nUnder the hood, this uses the [optuna](https://optuna.readthedocs.io/en/stable/) framework.\n\nAll options for the hyperparameter optimization can be seen with\n```bash\nhqe optimize --help\n```\n\n*Some examples:*\n\n---\nRun hyperparameter optimization.\nThis will result in a set of runs with different hyperparameters from which the user can pick the best.\n\n```bash\nhqe optimize \\\n    -tr \"/1hop/1qual-per-triple:*\" \\\n    -tr \"/2i/1qual-per-triple:atmost40000\" \\\n    -tr \"/2hop/1qual-per-triple:40000\" \\\n    -tr \"/3hop/1qual-per-triple:40000\" \\\n    -tr \"/3i/1qual-per-triple:40000\" \\\n    -va \"/1hop/1qual-per-triple:atmost3500\" \\\n    -va \"/2i/1qual-per-triple:atmost3500\" \\\n    -va \"/2hop/1qual-per-triple:atmost3500\" \\\n    -va \"/3hop/1qual-per-triple:atmost3500\" \\\n    -va \"/3i/1qual-per-triple:atmost3500\" \\\n    --use-wandb \\\n    --wandb-name \"hpo-query2box-style\"\n```\n\n### Evaluating model performance\n\nTo evaluate a model's performance on the test set, we provide an example below:\n\n```bash\nhqe evaluate \\\n    --test-data \"/1hop/1qual:5000:reify\" \\\n    --use-wandb \\\n    --wandb-name \"test-example\" \\\n    --model-path \"training-example-model.pt\"\n```\n\n\n## Citation\n\nIf you find this work useful, please consider citing\n\n```bibtex\n@inproceedings{\n  alivanistos2022query,\n  title={Query Embedding on Hyper-Relational Knowledge Graphs},\n  author={Dimitrios Alivanistos and Max Berrendorf and Michael Cochez and Mikhail Galkin},\n  booktitle={International Conference on Learning Representations},\n  year={2022},\n  
url={https://openreview.net/forum?id=4rLw09TgRw9}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDimitrisAlivas%2FStarQE","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FDimitrisAlivas%2FStarQE","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDimitrisAlivas%2FStarQE/lists"}