https://github.com/ukplab/starsem2018-entity-linking
Accompanying code for our *SEM 2018 @ NAACL 2018 paper "Mixing Context Granularities for Improved Entity Linking on Question Answering Data across Entity Categories"
- Host: GitHub
- URL: https://github.com/ukplab/starsem2018-entity-linking
- Owner: UKPLab
- License: apache-2.0
- Created: 2018-04-16T13:29:37.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2020-02-12T13:05:43.000Z (over 6 years ago)
- Last Synced: 2025-06-18T03:10:02.778Z (12 months ago)
- Language: Python
- Size: 10 MB
- Stars: 58
- Watchers: 23
- Forks: 16
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# Mixing Context Granularities for Improved Entity Linking on Question Answering Data across Entity Categories
## Entity linking with the Wikidata knowledge base
This is an accompanying repository for our **\*SEM 2018 paper** ([.pdf](https://www.aclweb.org/anthology/S18-2007)).
It contains the code to replicate the experiments and train the models described in the paper.
> This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
Please use the following citation:
```
@inproceedings{TUD-CS-2018-01,
title = {Mixing Context Granularities for Improved Entity Linking on Question Answering Data across Entity Categories},
author = {Sorokin, Daniil and Gurevych, Iryna},
publisher = {Association for Computational Linguistics},
booktitle = {Proceedings of the 7th Joint Conference on Lexical and Computational Semantics (*SEM 2018) },
pages = {to appear},
month = jun,
year = {2018},
location = {New Orleans, LA, U.S.}
}
```
### Paper abstract:
> The first stage of every knowledge base question answering approach is to link entities in the input question.
We investigate entity linking in the context of a question answering task and present a jointly optimized neural architecture for entity mention detection and entity disambiguation that models the surrounding context on different levels of granularity.
> We use the Wikidata knowledge base and available question answering datasets to create benchmarks for entity linking on question answering data.
Our approach outperforms the previous state-of-the-art system on this data, resulting in an average 8% improvement of the final score. We further demonstrate that our model delivers a strong performance across different entity categories.
Please refer to the paper for more details on the model and its training.
### Contacts:
If you have any questions regarding the code, please don't hesitate to contact the authors or report an issue.
* Daniil Sorokin, [personal page](https://daniilsorokin.github.io)
* https://www.ukp.tu-darmstadt.de
* https://www.tu-darmstadt.de
### Project structure:
| File | Description |
|------|-------------|
| configs/ | Configuration files for the experiments |
| entitylinking/core | Mention extraction and candidate retrieval |
| entitylinking/datasets | Datasets IO |
| entitylinking/evaluation | Evaluation measures and scripts |
| entitylinking/mlearning | Model definition and training scripts |
| entitylinking/wikidata | Retrieving information from Wikidata |
| resources/ | Necessary resources |
| trainedmodels/ | Trained models |
#### Requirements:
* Python 3.6
* PyTorch 0.3.x (the installation steps below use 0.3.1) - [read here about installation](http://pytorch.org/)
* See `requirements.txt` for the full list of packages
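The versions above are old, so a quick sanity check of the active environment can save debugging time. The following is a minimal sketch that is not part of the repository: the helper names `major_minor` and `check_versions` are hypothetical, and accepting any 0.3.x PyTorch release is an assumption.

```python
import platform
import re


def major_minor(version):
    """Return (major, minor) from a version string such as '3.6.13' or '0.3.1.post2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def check_versions(python_version, torch_version=None):
    """Return a list of problems; an empty list means the versions look fine."""
    problems = []
    if major_minor(python_version) != (3, 6):
        problems.append("Python %s found, the README asks for Python 3.6" % python_version)
    if torch_version is None:
        problems.append("PyTorch is not installed")
    elif major_minor(torch_version) != (0, 3):
        # Assumption: any 0.3.x release is acceptable
        problems.append("PyTorch %s found, the README asks for a 0.3.x release" % torch_version)
    return problems


if __name__ == "__main__":
    try:
        import torch
        torch_version = torch.__version__
    except ImportError:
        torch_version = None
    for problem in check_versions(platform.python_version(), torch_version):
        print("WARNING:", problem)
```

Running it as a script inside the activated environment prints one warning per mismatch and nothing when the versions match.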
### QA data for benchmarking entity linking systems
- Download the pre-processed data sets (WebQSP and GraphQuestions) for evaluating entity linkers on QA data with Wikidata: https://public.ukp.informatik.tu-darmstadt.de/starsem18-entity-linking/EntityLinkingForQADatasets.zip.
- Read our [paper](https://www.aclweb.org/anthology/S18-2007) to learn the evaluation details.
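The archive can also be fetched and unpacked from Python. This is a minimal sketch using only the standard library; the helper names `download` and `extract`, and unpacking into a local `data/` folder, are illustrative assumptions, since this README does not say where the experiment configs expect the data.

```python
import os
import shutil
import urllib.request
import zipfile

DATASETS_URL = ("https://public.ukp.informatik.tu-darmstadt.de/"
                "starsem18-entity-linking/EntityLinkingForQADatasets.zip")


def download(url, destination):
    """Stream the file at `url` to `destination`, skipping the download if it already exists."""
    if not os.path.exists(destination):
        with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
            shutil.copyfileobj(response, out)
    return destination


def extract(zip_path, target_dir):
    """Extract the archive into `target_dir` and return the names of its members."""
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()
```

For example, `extract(download(DATASETS_URL, "EntityLinkingForQADatasets.zip"), "data")` downloads the archive once and returns the list of extracted files.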
### Installation:
1. Download and install Anaconda (https://www.anaconda.com/)
2. Create an Anaconda environment with `conda create -n qa-env python=3.6` and activate it with `conda activate qa-env`
3. Install PyTorch 0.3.1: `conda install pytorch=0.3.1 -c pytorch` (with CUDA if you want to use GPU)
4. Install the rest of the dependencies from the `requirements.txt` with: `conda install --yes --file requirements.txt`.
5. Install `pycorenlp` and `SPARQLWrapper` with `pip install pycorenlp SPARQLWrapper`.
6. Create a local copy of the Wikidata knowledge base in RDF format. We use the [Virtuoso Opensource Server](https://github.com/openlink/virtuoso-opensource) and wrote an installation guide [here](https://github.com/UKPLab/coling2018-graph-neural-networks-question-answering/blob/master/WikidataHowTo.md) (in a different repository). This step takes a lot of time! Right now, this is the only way to run the models at test time; we are working on providing a smaller Wikidata dump just for training and evaluation on the data sets.
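Once the Virtuoso server is running, it is worth checking that the endpoint answers queries before running the models. The sketch below uses only the standard library (`SPARQLWrapper` from step 5 offers a higher-level client). The endpoint URL assumes Virtuoso's default HTTP port and `/sparql` path, and the example query assumes the dump stores English labels under `rdfs:label`; both depend on your setup, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: local Virtuoso instance on its default port with the /sparql endpoint
ENDPOINT = "http://localhost:8890/sparql"

# English label of Q76 (Barack Obama); assumes labels are stored under rdfs:label
LABEL_QUERY = """
SELECT ?label WHERE {
  <http://www.wikidata.org/entity/Q76> <http://www.w3.org/2000/01/rdf-schema#label> ?label .
  FILTER(lang(?label) = "en")
}
"""


def build_query_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query as a GET URL using the SPARQL 1.1 Protocol 'query' parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def parse_bindings(results):
    """Flatten SPARQL 1.1 JSON results into a list of {variable: value} dicts."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # Unbound variables are simply absent from a binding, so skip them
        rows.append({var: binding[var]["value"] for var in variables if var in binding})
    return rows


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query and return parsed rows; raises URLError if the endpoint is unreachable."""
    request = urllib.request.Request(
        build_query_url(query, endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_bindings(json.loads(response.read().decode("utf-8")))
```

With the endpoint up, `run_query(LABEL_QUERY)` would be expected to return rows holding the English label of Q76; a `URLError` means the server is not reachable at `ENDPOINT`.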
### Using the pre-trained model:
Follow these steps to use this project as an external entity-linking tool. `FeatureModel_Baseline` is part of the repository; you can download the `VCG` model [here](https://public.ukp.informatik.tu-darmstadt.de/starsem18-entity-linking/VectorModel_VCG.zip).
For the VCG model, you also need KB embeddings produced by [Fast-TransX](https://github.com/thunlp/Fast-TransX). Download them [here](https://public.ukp.informatik.tu-darmstadt.de/starsem18-entity-linking/Wikidata_TransE_50.zip).
1. Clone/Download the project
2. Take a pre-trained model and extract it into a `trainedmodels/` folder in the main directory of the project
3. Download the [GloVe embeddings, glove.6B.zip](https://nlp.stanford.edu/projects/glove/)
and put them into the folder `resources/glove/` in the main directory of the project
4. Modify the path to the word embeddings in the configuration file for the model: `trainedmodels/FeatureModel_Baseline.param`
5. Make sure that the project folder is on your `PYTHONPATH`
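6. Use the following code to initialize an entity linker and apply it to new data:
```python
# Run from the project root, or put the project folder on your PYTHONPATH (step 5)
from entitylinking import core

# Load the baseline feature model that ships with the repository
entitylinker = core.MLLinker(path_to_model="trainedmodels/FeatureModel_Baseline.torchweights")
# Detect entity mentions in a raw sentence and link them to Wikidata
output = entitylinker.link_entities_in_raw_input("Barack Obama is a president.")
print(output.entities)
```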
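Steps 2 to 5 can be checked before importing the package. This is a minimal sketch, not part of the repository: `prepare_project` and `glove_dimension` are hypothetical helpers, `REQUIRED_PATHS` only lists the baseline model files named above (the file names inside the VCG archive are not listed in this README), and adding the root to `sys.path` from Python is an alternative to setting `PYTHONPATH` in the shell.

```python
import os
import sys

# Paths named in the steps above (baseline model only)
REQUIRED_PATHS = [
    "trainedmodels/FeatureModel_Baseline.param",
    "trainedmodels/FeatureModel_Baseline.torchweights",
    "resources/glove",
]


def prepare_project(project_root):
    """Put the project root on sys.path (step 5) and return the required paths that are missing."""
    project_root = os.path.abspath(project_root)
    if project_root not in sys.path:
        sys.path.insert(0, project_root)
    return [p for p in REQUIRED_PATHS
            if not os.path.exists(os.path.join(project_root, p))]


def glove_dimension(glove_file):
    """Read the first line of a GloVe text file ('word v1 ... vn') and return n."""
    with open(glove_file, encoding="utf-8") as f:
        return len(f.readline().rstrip().split(" ")) - 1
```

For example, call `prepare_project(".")` from the project root and import `entitylinking` only once it returns an empty list. `glove_dimension` shows which of the files from `glove.6B.zip` (50, 100, 200 or 300 dimensions) the `.param` file points at; which dimensionality the model expects is not stated here.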
### Running the experiments from the paper:
1. Download and install the pre-trained models as described above.
2. Download the pre-processed data sets for evaluating entity linkers on QA data [here](https://public.ukp.informatik.tu-darmstadt.de/starsem18-entity-linking/EntityLinkingForQADatasets.zip).
3. If you use the given config files and the precomputed candidates for the train and the test set, you should not need the Wikidata local endpoint.
4. See `run_experiments.sh`
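For a quick look at linker output outside the provided scripts, entity linking is commonly scored with precision, recall and F1 over the set of entities predicted for each question. The following is a generic micro-averaged sketch under that assumption; it is not necessarily the exact measure implemented in `entitylinking/evaluation` or reported in the paper, and `precision_recall_f1` is a hypothetical name.

```python
def precision_recall_f1(gold, predicted):
    """Micro-averaged precision, recall and F1 over per-question sets of entity IDs.

    `gold` and `predicted` map a question ID to a set of Wikidata entity IDs.
    Predictions for question IDs that are not in `gold` are ignored.
    """
    true_positives = predicted_total = gold_total = 0
    for question_id, gold_entities in gold.items():
        predicted_entities = predicted.get(question_id, set())
        true_positives += len(gold_entities & predicted_entities)
        predicted_total += len(predicted_entities)
        gold_total += len(gold_entities)
    precision = true_positives / predicted_total if predicted_total else 0.0
    recall = true_positives / gold_total if gold_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Summing the counts over the whole data set before dividing (micro-averaging) weights questions with many entities more heavily than averaging per-question scores would.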
### License:
* Apache License Version 2.0