{"id":13754292,"url":"https://github.com/JanKalo/KnowlyBERT","last_synced_at":"2025-05-09T22:31:46.751Z","repository":{"id":37624059,"uuid":"247922721","full_name":"JanKalo/KnowlyBERT","owner":"JanKalo","description":null,"archived":false,"fork":false,"pushed_at":"2023-03-24T22:17:23.000Z","size":654,"stargazers_count":9,"open_issues_count":3,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-11-16T07:33:19.636Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JanKalo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-03-17T08:58:39.000Z","updated_at":"2023-04-28T00:51:47.000Z","dependencies_parsed_at":"2024-08-03T09:17:19.106Z","dependency_job_id":null,"html_url":"https://github.com/JanKalo/KnowlyBERT","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JanKalo%2FKnowlyBERT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JanKalo%2FKnowlyBERT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JanKalo%2FKnowlyBERT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JanKalo%2FKnowlyBERT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JanKalo","download_url":"https://codeload.github.com/JanKalo/KnowlyBERT/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","ki
nd":"github","repositories_count":253335750,"owners_count":21892727,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T09:01:53.417Z","updated_at":"2025-05-09T22:31:41.691Z","avatar_url":"https://github.com/JanKalo.png","language":"Python","funding_links":[],"categories":["知识图谱"],"sub_categories":["其他_文本生成、文本对话"],"readme":"# KnowlyBERT - Hybrid Query Processing over Language Models and Knowledge Graphs\n\nThis repository contains the code needed to reproduce the results in our paper.\n\n## System Requirements\n- Linux\n- 128GB RAM recommended\n- a CUDA-enabled GPU with at least 11GB memory (the software also runs on CPU, but it is extremely slow)\n\n## Dependencies\n- python3.6\n- python3-pip\n- unixodbc-dev\n- PyPI packages\n    - matplotlib==3.1.2\n    - cython==0.29.2\n    - numpy==1.15.1\n    - torch==1.0.1\n    - pytorch-pretrained-bert==0.6.1\n    - allennlp==0.8.5\n    - spacy==2.1.8\n    - tqdm==4.26.0\n    - termcolor==1.1.0\n    - pandas==0.23.4\n    - fairseq==0.8.0\n    - colorama==0.4.1\n    - simplejson==3.17.2\n    - pyodbc==4.0.30\n    - dill==0.2.9\n    - tensorflow==1.14.0 (select GPU support in `requirements.txt` manually!)\n\n## RUN IN DOCKER\n\nWe provide Dockerfiles for building a Docker image that runs our code with only a few commands.\n\n### Create Docker Images\n\nThere are two Dockerfiles in this repository:\n\n### `Dockerfile`\n\nCreates an image that reproduces ALL results, including the results of our HolE Embedding.\nWe highly recommend installing the [NVIDIA Container Toolkit][1] for Docker to 
enable GPU acceleration.\nRunning this image without GPU acceleration will be extremely time-consuming.\nIf you don't want to set up GPU acceleration, you can instead create an image that skips the computation of our HolE results.\n(See `Dockerfile_no-hole`)\n\n```shell\n$ docker build --file Dockerfile --tag knowlybert:all .\n$ docker run -it --volume /path/on/host:/opt/KnowlyBERT/evaluation knowlybert:all\n```\n\nSet `/path/on/host` to any non-existent location on your host system where the container should store our evaluation results.\n\n### `Dockerfile_no-hole`\n\nCreates an image that reproduces all results, EXCEPT the results of our HolE Embedding.\nYou can run this image without GPU acceleration, and it should finish in a few hours.\n\n```shell\n$ docker build --file Dockerfile_no-hole --tag knowlybert:no-hole .\n$ docker run -it --volume /path/on/host:/opt/KnowlyBERT/evaluation knowlybert:no-hole\n```\n\nSet `/path/on/host` to any non-existent location on your host system where the container should store our evaluation results.\n\n## FIRST STEPS\n\nIf you don't want to use Docker to reproduce our results, you have to set up the required environment manually.\n\n### Install Python requirements\n\n```shell\n$ python3 -m pip install -r requirements.txt\n```\n\n### Clone RelAlign Repository\n\n```shell\n$ cd kb_embeddings/\n$ git clone https://github.com/JanKalo/RelAlign.git\n$ cd ..\n```\n\n### Install LAMA\nDo not clone the LAMA repository again. Only install it as an editable package.\n\n```shell\n$ cd LAMA/\n$ pip install --editable .\n$ cd ..\n```\n\n## Repository Structure\n\n### /LAMA/\n\nThis is mainly the repository of Petroni et al. 
(https://github.com/facebookresearch/LAMA), with some scripts added and edited to enable this hybrid system: 1) multi-token results of the language model 2) automatically extracted templates\n\n### /baseline/\n\nThis directory includes the script that evaluates the results of the language model for a specific query file. It can also evaluate two baselines for comparison with the language model: 1) relation extraction model 2) knowledge base embedding. For more information, see the README.md file located in the directory `baseline/`.\n\n### /kb\\_embeddings/\n\nThis directory includes the script that integrates the knowledge base embedding *HolE* to get the loss of a given triple.\n\n### /threshold\\_method/\n\nThis directory includes the script for calculating the threshold of the language model probabilities.\n\n## Python Files\n\nThis section only lists the files that are needed to reproduce the results.\n\n### 1) get\\_results.py\n\nThis script saves the results of the language model for the given queries and hybrid-system parameters. The parameters can be changed in `get_results.py` starting from line 343. For each evaluation with the given parameters, a result directory (e.g. `\u003cchosen_result_directory\u003e` = 21.05.\\_03:18:34\\_tmc\\_tprank2\\_ts5\\_trmmax\\_ps1\\_kbe-1\\_cpTrue\\_mmd0.6) is saved to `evaluation/`. 
\n\n```shell\n$ python3 get_results.py\n$ cd evaluation/\u003cchosen_result_directory\u003e/\n```\n\n### 2) baseline/evaluate.py\n\nThis script evaluates the results of the language model by reading the result files in `evaluation/\u003cchosen_result_directory\u003e/`.\nIt produces the following twelve files:\n- evaluation\\_all.json \u0026rarr; all given queries\n- evaluation\\_object.json \u0026rarr; only queries based on the triple (s, p, ?x)\n- evaluation\\_subject.json \u0026rarr; only queries based on the triple (?x, p, o)\n- evaluation\\_single.json \u0026rarr; only queries with exclusively one-token results\n- evaluation\\_multi.json \u0026rarr; only queries with one-token AND multi-token results\n- evaluation\\_1-1.json \u0026rarr; only queries with 1-1 properties\n- evaluation\\_1-n.json \u0026rarr; only queries with 1-n properties\n- evaluation\\_n-m.json \u0026rarr; only queries with n-m properties\n- evaluation_cardinality-1.json \u0026rarr; only queries with one result\n- evaluation_cardinality-1-10.json \u0026rarr; only queries with two to ten results\n- evaluation_cardinality-10-100.json \u0026rarr; only queries with eleven to 100 results\n- evaluation_cardinality-100-inf.json \u0026rarr; only queries with more than 100 results\n\n```shell\n$ python3 ../../baseline/evaluate.py --missing-data ../../baseline/missing_data.json --query-groups *query_groups.json ../../baseline/query_propmap.json ../../baseline/gold_dataset.json ../../baseline/ContextWeighted2017.json ../../baseline/hole_baseline.json data/\n```\n\n### 3) baseline/get\\_precision\\_recall.py\n\nThis script saves files with precision and recall values by reading the output files of `baseline/evaluate.py`.\nFor each given evaluation file, it produces one file with precision and recall averaged over all queries and one file with precision and recall averaged over the queries of each property.\n\n```shell\n$ python3 ../../baseline/get_precision_recall.py evaluation_all.json 
evaluation_object.json evaluation_subject.json evaluation_single.json evaluation_multi.json evaluation_1-1.json evaluation_1-n.json evaluation_n-m.json evaluation_cardinality-1.json evaluation_cardinality-1-10.json evaluation_cardinality-10-100.json evaluation_cardinality-100-inf.json\n```\n\n[1]: https://github.com/NVIDIA/nvidia-docker\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJanKalo%2FKnowlyBERT","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FJanKalo%2FKnowlyBERT","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJanKalo%2FKnowlyBERT/lists"}