{"id":13605335,"url":"https://github.com/miselico/graph_query_sampler","last_synced_at":"2025-07-04T12:07:10.355Z","repository":{"id":40318685,"uuid":"479516863","full_name":"miselico/graph_query_sampler","owner":"miselico","description":"Implementation of graph query sampling","archived":false,"fork":false,"pushed_at":"2024-11-07T17:37:38.000Z","size":269,"stargazers_count":5,"open_issues_count":19,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-27T02:47:47.062Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/miselico.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-04-08T19:37:23.000Z","updated_at":"2024-07-18T14:07:32.000Z","dependencies_parsed_at":"2023-10-11T17:35:38.373Z","dependency_job_id":"77ae03a8-c86b-458d-ad14-3d5ac1ec8064","html_url":"https://github.com/miselico/graph_query_sampler","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/miselico/graph_query_sampler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miselico%2Fgraph_query_sampler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miselico%2Fgraph_query_sampler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miselico%2Fgraph_query_sampler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miselico%2Fgraph_query_sampler/manifests","owner_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/owners/miselico","download_url":"https://codeload.github.com/miselico/graph_query_sampler/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miselico%2Fgraph_query_sampler/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263507899,"owners_count":23477406,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T19:00:57.506Z","updated_at":"2025-07-04T12:07:10.338Z","avatar_url":"https://github.com/miselico.png","language":"Python","funding_links":[],"categories":["📈 Datasets and Benchmarking"],"sub_categories":["Dataset tools"],"readme":"Graph Query Sampler (gqs)\n========================\n\nGraph Query Sampler provides an implementation for extracting queries from a graph. These queries are used to train and evaluate approximate graph query answering (also called multi-hop reasoning) systems.\n\nTo install, clone the repository.\n\nWe recommend creating a virtual environment using conda:\n\n`conda create --name gqs_env python=3.11`\n\nThen, from the root of this repository, activate it:\n\n`conda activate gqs_env`\n\nand run:\n\n    pip install -e .\n\nTo run the tests, install the test dependencies using\n\n`pip install -e .[test]`\n\nOn macOS, quote the argument:\n\n`pip install -e '.[test]'`\n\nand then execute the tests with\n\n    pytest\n\n\n## Creating a query dataset\n\nTo create a new query dataset, follow these steps. 
We assume a dataset named hp, which must be in N-Triples format.\n\nWhen using the command line tool, you can always see more information and options by adding `--help` to a command.\n\n1. Install GraphDB. Configure it with a large amount of memory, which the query sampler needs.\n2. Initialize the folder for your dataset. Specify your N-Triples (.nt) file and the name you want to use for your dataset, which can only contain lowercase characters.\n```bash\ngqs init RDF --input resources/harrypotter.nt --dataset hp --blank-node-strategy convert\n```\nThis will create a new folder with the name of your dataset under the folder called datasets. All data related to the query sampling will be stored in that folder.\n\n3. Split the dataset into train, validation, and test sets. There are several options for splitting; here we simply use round-robin:\n```bash\ngqs split round-robin --dataset hp\n```\n\n4. Store the splits in the triple store:\n```bash\ngqs store graphdb --dataset hp\n```\n\n5. Create the mapping for your dataset. This is the mapping between identifiers in the RDF file and the indices that will be used in the tensor representations.\n```bash\ngqs mapping create --dataset hp\n```\n\n6. Configure the formulas you want to use for sampling.\nMake sure that the formulas are adapted to what you need; check the shapes and configurations.\nThen copy them as follows. The `--formula-root` argument specifies the directory with formulas, and the glob pattern specifies the files within that directory.\n```bash\ngqs formulas copy --formula-root ./resources/formulas_example/ --formula-glob '**/0qual//**/*'  --dataset hp\n```\n\n7. Apply the constraints to the queries with:\n```bash\ngqs formulas add-constraints --dataset hp\n```\n\n8. Sample the queries from the triple store:\n```bash\ngqs sample create --dataset hp\n```\n\n9. To use the queries, convert them to protocol buffers:\n```bash\ngqs convert csv-to-proto --dataset hp\n```\n\nDone! 
Now the queries can be loaded with the provided data loader.\n\nAlternatively, you can export the queries to a format that can be loaded by the KGReasoning framework: https://github.com/pminervini/KGReasoning/\n\n```bash\ngqs export to-kgreasoning --dataset hp\n```\nThe result of the export will be placed in `/datasets/{datasetname}/export/kgreasoning`. These files can then be used as a dataset in the KGReasoning framework.\n\n## Compilation of the protocol buffer file\n\n* Download the protocol buffer binary. We used 3.20 and have the same version in setup.cfg. It is most likely possible to use a newer version together with a correspondingly newer version of the Python package.\n* `protoc-3.20.0-linux-x86_64/bin/protoc  -I=./src/gqs/query_represenation/ --python_out=./src/gqs/query_represenation/ --pyi_out=./src/gqs/query_represenation/ ./src/gqs/query_represenation/query.prot`\n\nThe version used above generated stubs that mypy complains about on the GitHub CI; for some reason, mypy does not process the exclude directives correctly there. Hence, some Mapping types without parameters were changed to Mapping[Any,Any].\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmiselico%2Fgraph_query_sampler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmiselico%2Fgraph_query_sampler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmiselico%2Fgraph_query_sampler/lists"}