https://github.com/ukplab/starsem2018-entity-linking

Accompanying code for our *SEM 2018 @ NAACL 2018 paper "Mixing Context Granularities for Improved Entity Linking on Question Answering Data across Entity Categories"

# Mixing Context Granularities for Improved Entity Linking on Question Answering Data across Entity Categories

## Entity linking with the Wikidata knowledge base

This is an accompanying repository for our **\*SEM 2018 paper** ([.pdf](https://www.aclweb.org/anthology/S18-2007)).
It contains the code to replicate the experiments and train the models described in the paper.

> This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Please use the following citation:

```
@inproceedings{TUD-CS-2018-01,
title = {Mixing Context Granularities for Improved Entity Linking on Question Answering Data across Entity Categories},
author = {Sorokin, Daniil and Gurevych, Iryna},
publisher = {Association for Computational Linguistics},
booktitle = {Proceedings of the 7th Joint Conference on Lexical and Computational Semantics (*SEM 2018) },
pages = {to appear},
month = jun,
year = {2018},
location = {New Orleans, LA, U.S.}
}
```

### Paper abstract:
> The first stage of every knowledge base question answering approach is to link entities in the input question.
We investigate entity linking in the context of a question answering task and present a jointly optimized neural architecture for entity mention detection and entity disambiguation that models the surrounding context on different levels of granularity.

> We use the Wikidata knowledge base and available question answering datasets to create benchmarks for entity linking on question answering data.
Our approach outperforms the previous state-of-the-art system on this data, resulting in an average 8% improvement of the final score. We further demonstrate that our model delivers a strong performance across different entity categories.

Please refer to the paper for the model description and training details.

### Contacts:
If you have any questions regarding the code, please don't hesitate to contact the authors or report an issue.
* Daniil Sorokin, [personal page](https://daniilsorokin.github.io)
* https://www.ukp.tu-darmstadt.de
* https://www.tu-darmstadt.de

### Project structure:


| File | Description |
|---|---|
| `configs/` | Configuration files for the experiments |
| `entitylinking/core` | Mention extraction and candidate retrieval |
| `entitylinking/datasets` | Datasets IO |
| `entitylinking/evaluation` | Evaluation measures and scripts |
| `entitylinking/mlearning` | Model definition and training scripts |
| `entitylinking/wikidata` | Retrieving information from Wikidata |
| `resources/` | Necessary resources |
| `trainedmodels/` | Trained models |

#### Requirements:
* Python 3.6
* PyTorch 0.3.x (the installation steps use 0.3.1) - [read here about installation](http://pytorch.org/)
* See `requirements.txt` for the full list of packages

### QA data for benchmarking entity linking systems

- Download the pre-processed data sets (WebQSP and GraphQuestions) for evaluating entity linkers on QA data with Wikidata: https://public.ukp.informatik.tu-darmstadt.de/starsem18-entity-linking/EntityLinkingForQADatasets.zip.
- Read our [paper](https://www.aclweb.org/anthology/S18-2007) to learn the evaluation details.
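
The download step can also be scripted. The following is a minimal sketch that uses only the Python standard library; the helper name `download_and_extract` and the target folder in the usage note are illustrative assumptions and not part of the repository.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the list above.
DATASETS_URL = (
    "https://public.ukp.informatik.tu-darmstadt.de/"
    "starsem18-entity-linking/EntityLinkingForQADatasets.zip"
)


def download_and_extract(url, target_dir):
    """Download a zip archive and unpack it into target_dir.

    Hypothetical helper for illustration; returns the archive member names.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # Name the local copy after the last path segment of the URL.
    archive_path = target / url.rsplit("/", 1)[-1]
    # urlretrieve saves the remote file to the given local path.
    urllib.request.urlretrieve(url, str(archive_path))
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return archive.namelist()
```

Calling `download_and_extract(DATASETS_URL, "data")` would place the archive and its content under a `data/` folder (an assumed location; any writable folder works).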

### Installation:

1. Download and install Anaconda (https://www.anaconda.com/)
2. Create an Anaconda environment with `conda create -n qa-env python=3.6` and activate it with `conda activate qa-env`
3. Install PyTorch 0.3.1: `conda install pytorch=0.3.1 -c pytorch` (with CUDA if you want to use GPU)
4. Install the rest of the dependencies from the `requirements.txt` with: `conda install --yes --file requirements.txt`.
5. Install `pycorenlp` and `SPARQLWrapper` with `pip install pycorenlp SPARQLWrapper`.
6. Create a local copy of the Wikidata knowledge base in RDF format. We use the [Virtuoso Opensource Server](https://github.com/openlink/virtuoso-opensource) and wrote an installation guide [here](https://github.com/UKPLab/coling2018-graph-neural-networks-question-answering/blob/master/WikidataHowTo.md) (in a different repository). This step takes a lot of time! Right now this is the only way to run the models at test time; we are working on providing a smaller Wikidata dump just for training and evaluation on the data sets.
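
Once the local Wikidata copy is loaded, a quick sanity check is to ask the endpoint for the label of a known entity. The sketch below uses `SPARQLWrapper` (installed in step 5). The endpoint URL `http://localhost:8890/sparql` is an assumption based on Virtuoso's usual default, the entity IRI prefix assumes the standard Wikidata RDF dump, and the helper names are hypothetical.

```python
# Full IRIs are used so the query does not rely on predefined prefixes.
WD_ENTITY = "http://www.wikidata.org/entity/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def build_label_query(entity_id, lang="en"):
    """Build a SPARQL query that fetches the labels of one Wikidata entity."""
    return (
        "SELECT ?label WHERE { "
        f"<{WD_ENTITY}{entity_id}> <{RDFS_LABEL}> ?label . "
        f'FILTER(lang(?label) = "{lang}") '
        "}"
    )


def extract_labels(result):
    """Pull the label strings out of a SPARQL JSON result document."""
    return [b["label"]["value"] for b in result["results"]["bindings"]]


def fetch_labels(entity_id, endpoint="http://localhost:8890/sparql"):
    """Query the endpoint for the English labels of entity_id.

    The default endpoint URL is an assumption; adjust it to your setup.
    """
    # Imported here so the helpers above work without SPARQLWrapper installed.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(build_label_query(entity_id))
    sparql.setReturnFormat(JSON)
    return extract_labels(sparql.query().convert())
```

If the dump is loaded correctly, `fetch_labels("Q76")` should return a non-empty list; an empty list or a connection error suggests that the endpoint is not reachable or the data is missing.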

### Using the pre-trained model:

Follow these steps to use this project as an external entity-linking tool. `FeatureModel_Baseline` is part of the repository; you can download the `VCG` model [here](https://public.ukp.informatik.tu-darmstadt.de/starsem18-entity-linking/VectorModel_VCG.zip).

For the VCG model you also need KB embeddings produced by [Fast-TransX](https://github.com/thunlp/Fast-TransX). Download them [here](https://public.ukp.informatik.tu-darmstadt.de/starsem18-entity-linking/Wikidata_TransE_50.zip).

1. Clone/Download the project
2. Take a pre-trained model and extract it into a `trainedmodels/` folder in the main directory of the project
3. Download the [GloVe embeddings, glove.6B.zip](https://nlp.stanford.edu/projects/glove/) and put them into the folder `resources/glove/` in the main directory of the project
4. Modify the path to the word embeddings in the configuration file for the model: `trainedmodels/FeatureModel_Baseline.param`
5. Make sure that the project folder is on your Python path (for example, via the `PYTHONPATH` environment variable)
6. Use the following code to initialize an entity linker and apply it to new data:

```python
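from entitylinking import core

# Load the pre-trained baseline model from its weights file.
entitylinker = core.MLLinker(path_to_model="trainedmodels/FeatureModel_Baseline.torchweights")
# Detect and link the entity mentions in a raw input sentence.
output = entitylinker.link_entities_in_raw_input("Barack Obama is a president.")
# Show the linked entities.
print(output.entities)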
```
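
Steps 5 and 6 can also be handled from within a script. The sketch below is one possible way to do it, not part of the repository: `add_project_to_path` and `link_questions` are hypothetical helpers. The only repository API they rely on is `link_entities_in_raw_input` and the `entities` attribute shown above.

```python
import sys
from pathlib import Path


def add_project_to_path(project_dir):
    """Put the cloned project folder on sys.path so `entitylinking` imports.

    Raises FileNotFoundError if the folder has no `entitylinking` package.
    """
    root = Path(project_dir).resolve()
    if not (root / "entitylinking").is_dir():
        raise FileNotFoundError(f"no entitylinking/ package under {root}")
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root


def link_questions(linker, questions):
    """Apply an entity linker to several raw questions.

    `linker` is any object exposing `link_entities_in_raw_input`; the result
    maps each question to the entities linked in it.
    """
    return {q: linker.link_entities_in_raw_input(q).entities for q in questions}
```

A possible usage, run from the main directory of the project: call `add_project_to_path(".")`, import `core` from `entitylinking`, create the linker with `core.MLLinker(path_to_model="trainedmodels/FeatureModel_Baseline.torchweights")`, and pass it to `link_questions` together with a list of question strings.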

### Running the experiments from the paper:

1. Download and install the pre-trained models as described above.
2. Download the pre-processed data sets for evaluating entity linkers on QA data [here](https://public.ukp.informatik.tu-darmstadt.de/starsem18-entity-linking/EntityLinkingForQADatasets.zip).
3. If you use the given config files and the precomputed candidates for the train and the test set, you should not need the Wikidata local endpoint.
4. See `run_experiments.sh`

### License:
* Apache License Version 2.0