# Mixing Context Granularities for Improved Entity Linking on Question Answering Data across Entity Categories

## Entity linking with the Wikidata knowledge base

This is an accompanying repository for our **\*SEM 2018 paper** (co-located with NAACL 2018, [.pdf](https://www.aclweb.org/anthology/S18-2007)).
It contains the code to replicate the experiments and train the models described in the paper.

> This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Please use the following citation:

```bibtex
@inproceedings{TUD-CS-2018-01,
    title = {Mixing Context Granularities for Improved Entity Linking on Question Answering Data across Entity Categories},
    author = {Sorokin, Daniil and Gurevych, Iryna},
    publisher = {Association for Computational Linguistics},
    booktitle = {Proceedings of the 7th Joint Conference on Lexical and Computational Semantics (*SEM 2018)},
    pages = {to appear},
    month = jun,
    year = {2018},
    location = {New Orleans, LA, U.S.}
}
```

### Paper abstract:
> The first stage of every knowledge base question answering approach is to link entities in the input question.
> We investigate entity linking in the context of a question answering task and present a jointly optimized neural architecture for entity mention detection and entity disambiguation that models the surrounding context on different levels of granularity.

> We use the Wikidata knowledge base and available question answering datasets to create benchmarks for entity linking on question answering data.
> Our approach outperforms the previous state-of-the-art system on this data, resulting in an average 8% improvement of the final score.
> We further demonstrate that our model delivers a strong performance across different entity categories.

Please refer to the paper for more details on the model and the training procedure.

### Contacts:
If you have any questions regarding the code, please don't hesitate to contact the authors or report an issue.
  * Daniil Sorokin, [personal page](https://daniilsorokin.github.io)
  * https://www.ukp.tu-darmstadt.de
  * https://www.tu-darmstadt.de

### Project structure:

<table>
    <tr>
        <th>File</th><th>Description</th>
    </tr>
    <tr>
        <td>configs/</td><td>Configuration files for the experiments</td>
    </tr>
    <tr>
        <td>entitylinking/core</td><td>Mention extraction and candidate retrieval</td>
    </tr>
    <tr>
        <td>entitylinking/datasets</td><td>Datasets IO</td>
    </tr>
    <tr>
        <td>entitylinking/evaluation</td><td>Evaluation measures and scripts</td>
    </tr>
    <tr>
        <td>entitylinking/mlearning</td><td>Model definition and training scripts</td>
    </tr>
    <tr>
        <td>entitylinking/wikidata</td><td>Retrieving information from Wikidata</td>
    </tr>
    <tr>
        <td>resources/</td><td>Necessary resources</td>
    </tr>
    <tr>
        <td>trainedmodels/</td><td>Trained models</td>
    </tr>
</table>

#### Requirements:
* Python 3.6
* PyTorch 0.3.x (the installation steps below use 0.3.1) - [installation instructions](http://pytorch.org/)
* See `requirements.txt` for the full list of packages

### QA data for benchmarking entity linking systems

- Download the pre-processed data sets (WebQSP and GraphQuestions) for evaluating entity linkers on QA data with Wikidata: https://public.ukp.informatik.tu-darmstadt.de/starsem18-entity-linking/EntityLinkingForQADatasets.zip
- Read our [paper](https://www.aclweb.org/anthology/S18-2007) for the evaluation details.

### Installation:

1. Download and install Anaconda (https://www.anaconda.com/).
2. Create an Anaconda environment with `conda create -n qa-env python=3.6` and activate it with `conda activate qa-env`.
3. Install PyTorch 0.3.1 with `conda install pytorch=0.3.1 -c pytorch` (with CUDA support if you want to use a GPU).
4. Install the remaining dependencies from `requirements.txt` with `conda install --yes --file requirements.txt`.
5. Install `pycorenlp` and `SPARQLWrapper` with `pip install pycorenlp SPARQLWrapper`.
6. Create a local copy of the Wikidata knowledge base in RDF format. We use the [Virtuoso Opensource Server](https://github.com/openlink/virtuoso-opensource) and wrote an installation guide [here](https://github.com/UKPLab/coling2018-graph-neural-networks-question-answering/blob/master/WikidataHowTo.md) (in a different repository). This step takes a lot of time! Currently, this is the only way to run the models at test time; we are working on providing a smaller Wikidata dump just for training and evaluation on the data sets.

### Using the pre-trained model:

Follow the steps below to use this project as an external entity-linking tool. `FeatureModel_Baseline` is part of the repository; the `VCG` model can be downloaded [here](https://public.ukp.informatik.tu-darmstadt.de/starsem18-entity-linking/VectorModel_VCG.zip).

For the VCG model, you also need KB embeddings produced by [Fast-TransX](https://github.com/thunlp/Fast-TransX).
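As background on these KB embeddings: Fast-TransX is a toolkit for training translation-based knowledge-base embeddings such as TransE. In TransE, a relation is modelled as a translation vector, so that for a true triple (head, relation, tail) the head vector plus the relation vector should lie close to the tail vector. The sketch below illustrates only that scoring idea, using a hypothetical helper `transe_distance` and invented toy vectors. It does not read the released embedding file (its format is not described here) and is not meant to show how the VCG model uses the embeddings.

```python
import math


def transe_distance(head, relation, tail, norm=1):
    """Hypothetical helper: TransE-style distance ||head + relation - tail||.

    Lower values mean a more plausible triple.
    norm=1 computes the L1 distance, norm=2 the L2 (Euclidean) distance.
    """
    diffs = [h + r - t for h, r, t in zip(head, relation, tail)]
    if norm == 1:
        return sum(abs(d) for d in diffs)
    if norm == 2:
        return math.sqrt(sum(d * d for d in diffs))
    raise ValueError("norm must be 1 or 2")


# Toy 3-dimensional vectors, invented for illustration only;
# they are NOT taken from the released Wikidata embeddings.
barack_obama = [0.2, 0.1, 0.0]
united_states = [0.9, 0.4, 0.1]
france = [-0.5, 0.8, 0.3]
citizen_of = [0.7, 0.3, 0.1]

true_score = transe_distance(barack_obama, citizen_of, united_states)
false_score = transe_distance(barack_obama, citizen_of, france)

# Rank candidate tail entities for (barack_obama, citizen_of, ?) by distance.
candidates = {"United States": united_states, "France": france}
ranked = sorted(
    candidates,
    key=lambda name: transe_distance(barack_obama, citizen_of, candidates[name]),
)
print(ranked)  # ['United States', 'France']
```

Whether the L1 or L2 distance is used is a training choice in TransE; the helper supports both.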
You can download them [here](https://public.ukp.informatik.tu-darmstadt.de/starsem18-entity-linking/Wikidata_TransE_50.zip).

1. Clone or download the project.
2. Extract a pre-trained model into the `trainedmodels/` folder in the main directory of the project.
3. Download the [GloVe embeddings, glove.6B.zip](https://nlp.stanford.edu/projects/glove/) and put them into the folder `resources/glove/` in the main directory of the project.
4. Update the path to the word embeddings in the model's configuration file, e.g. `trainedmodels/FeatureModel_Baseline.param`.
5. Make sure that the project folder is on your Python path (e.g. add it to `PYTHONPATH`).
6. Use the following code to initialize an entity linker and apply it to new data:

```python
from entitylinking import core

# Load the pre-trained baseline model from the trainedmodels/ folder
entitylinker = core.MLLinker(path_to_model="trainedmodels/FeatureModel_Baseline.torchweights")
# Link the entities mentioned in a raw input sentence
output = entitylinker.link_entities_in_raw_input("Barack Obama is a president.")
# Print the linked entities
print(output.entities)
```

### Running the experiments from the paper:

1. Download and install the pre-trained models as described above.
2. Download the pre-processed data sets for evaluating entity linkers on QA data [here](https://public.ukp.informatik.tu-darmstadt.de/starsem18-entity-linking/EntityLinkingForQADatasets.zip).
3. If you use the provided config files and the precomputed candidates for the training and test sets, you should not need the local Wikidata endpoint.
4. See `run_experiments.sh`.

### License:
* Apache License Version 2.0