Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Context-Aware Representations for Knowledge Base Relation Extraction
https://github.com/UKPLab/emnlp2017-relation-extraction
- Host: GitHub
- URL: https://github.com/UKPLab/emnlp2017-relation-extraction
- Owner: UKPLab
- License: apache-2.0
- Created: 2017-08-24T14:39:29.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2022-11-21T22:17:56.000Z (almost 2 years ago)
- Last Synced: 2024-07-17T01:56:42.889Z (4 months ago)
- Language: Python
- Homepage:
- Size: 557 KB
- Stars: 288
- Watchers: 39
- Forks: 71
- Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
- awesome-digitization - Context-Aware Representations for Knowledge Base Relation Extraction (UKP Lab)
README
# Context-Aware Representations for Knowledge Base Relation Extraction
## Relation extraction on an open-domain knowledge base
Accompanying repository for our **EMNLP 2017 paper** ([full paper](http://aclweb.org/anthology/D17-1188)). It contains the code to replicate the experiments and the pre-trained models for sentence-level relation extraction.
See [below](#ukp-labs-work-on-knowledge-bases) for links to other work on knowledge bases, question answering and graph neural networks.

> This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Please use the following citation:

```
@inproceedings{TUD-CS-2017-0119,
title = {{Context-Aware Representations for Knowledge Base Relation Extraction}},
author = {Sorokin, Daniil and Gurevych, Iryna},
booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
pages = {1784-1789},
year = {2017},
location = {Copenhagen, Denmark},
publisher = {Association for Computational Linguistics},
doi = {10.18653/v1/D17-1188}
}
```

### Paper abstract:
> We demonstrate that for sentence-level relation extraction it is beneficial to consider other relations in the sentential context while predicting the target relation. Our architecture uses an LSTM-based encoder to jointly learn representations for all relations in a single sentence.
We combine the context representations with an attention mechanism to make the final prediction.
> We use the Wikidata knowledge base to construct a dataset of multiple relations per sentence and to evaluate our approach. Compared to a baseline system, our method results in an average error reduction of 24% on a held-out set of relations.

Please refer to the paper for more details.
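As a rough illustration of the idea, here is a schematic NumPy sketch: each context relation representation is scored against the target representation, the scores are normalized with a softmax, and the weighted context summary is combined with the target before classification. The dot-product scoring and all names below are illustrative assumptions, not the repository's implementation; the actual models live in `relation_extraction/core/keras_models.py`.

```
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def context_weighted(target, context):
    """Attend over context relation representations and combine
    the weighted summary with the target representation."""
    scores = context @ target            # one score per context relation
    alphas = softmax(scores)             # attention weights
    summary = alphas @ context           # weighted sum of context vectors
    return np.concatenate([target, summary])  # input to the classifier

rng = np.random.default_rng(0)
target = rng.normal(size=64)             # encoding of the target relation
context = rng.normal(size=(3, 64))       # other relations in the sentence
combined = context_weighted(target, context)
print(combined.shape)  # (128,)
```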
The dataset described in the paper can be found here:
* https://www.informatik.tu-darmstadt.de/ukp/research_6/data/lexical_resources/wikipedia_wikidata_relations/
### Contacts:
If you have any questions regarding the code, please don't hesitate to contact the authors or, preferably, report an issue here.
* Daniil Sorokin, [personal page](https://daniilsorokin.github.io)
* https://www.informatik.tu-darmstadt.de/ukp/
* https://www.tu-darmstadt.de

### Demo:
You can try out the relation extraction model on single sentences in our demo:
http://semanticparsing.ukp.informatik.tu-darmstadt.de:5000/relation-extraction/
### UKP Lab's work on knowledge bases:
If you came here looking for our other work on linking text to Wikidata, you may also find the following links useful:

* Wikidata Entity Linking: https://github.com/UKPLab/starsem2018-entity-linking
* Graph Neural Networks for Knowledge Base Question Answering: https://github.com/UKPLab/coling2018-graph-neural-networks-question-answering
* Question Answering Demo UI: https://github.com/UKPLab/emnlp2018-question-answering-interface

### Wikipedia-Wikidata sentence-level relation data set
* Download the data set from the paper [here](https://www.informatik.tu-darmstadt.de/ukp/research_6/data/lexical_resources/wikipedia_wikidata_relations/). See the data set ReadMe for more information on the format and see the [paper](http://aclweb.org/anthology/D17-1188) on data set construction.
### Project structure:
```
relation_extraction/
├── eval.py
├── model-train-and-test.py
├── notebooks
├── optimization_space.py
├── core
│ ├── parser.py
│ ├── embeddings.py
│ ├── entity_extraction.py
│ └── keras_models.py
├── relextserver
│ └── server.py
├── graph
│ ├── graph_utils.py
│ ├── io.py
│ └── vis_utils.py
├── stanford_tag_dataset.py
└── evaluation
└── metrics.py
resources/
├── properties-with-labels.txt
└── property_blacklist.txt
```
| File | Description |
|------|-------------|
| `relation_extraction/` | Main Python module |
| `relation_extraction/core` | Models for joint relation extraction |
| `relation_extraction/relextserver` | The code for the web demo |
| `relation_extraction/graph` | IO and processing for relation graphs |
| `relation_extraction/evaluation` | Evaluation metrics |
| `resources/` | Necessary resources |
| `data/curves/` | The precision-recall curves for each model on the held-out data |
### Setup:
1. We recommend that you set up a new virtual environment first: http://docs.python-guide.org/en/latest/dev/virtualenvs/
2. Check out the repository and run:
```
pip3 install -r requirements.txt
```

3. Set the Keras (deep learning library) backend to TensorFlow with the following command:
```
export KERAS_BACKEND=tensorflow
```
You can also permanently change the Keras backend (read more: https://keras.io/backend/).
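As a per-process alternative to the shell export, the backend can also be selected from Python before Keras is first imported; a minimal sketch:

```
import os

# Must be set before the first `import keras`; Keras reads the
# KERAS_BACKEND variable once at import time.
# Use "theano" here to reproduce the paper's experiments (see the note below).
os.environ["KERAS_BACKEND"] = "tensorflow"

import keras  # prints e.g. "Using TensorFlow backend."
```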
Note that in order to reproduce the experiments in the paper you have to use Theano as the backend instead.

4. Download the [data](https://www.informatik.tu-darmstadt.de/ukp/research_6/data/lexical_resources/wikipedia_wikidata_relations/) if you want to replicate the experiments from the paper.
Extract the archive inside `emnlp2017-relation-extraction/data/wikipedia-wikidata/`. The data was preprocessed using Stanford CoreNLP 3.7.0 models. See `stanford_tag_dataset.py` for more information.

5. Download the [GloVe embeddings, glove.6B.zip](https://nlp.stanford.edu/projects/glove/) and put them into the folder `emnlp2017-relation-extraction/resources/glove/`. You can change the path to word embeddings in the `model_params.json` file if needed.
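If you want to inspect the embeddings, here is a minimal sketch of reading the plain-text GloVe format into a dictionary. The 50-dimensional file from `glove.6B.zip` is assumed; the repository's own loading logic presumably lives in `relation_extraction/core/embeddings.py`.

```
import numpy as np

def load_glove(path):
    """Read GloVe's plain-text format: one token per line,
    followed by its vector components."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            token, *values = line.rstrip().split(" ")
            embeddings[token] = np.asarray(values, dtype="float32")
    return embeddings

glove = load_glove("resources/glove/glove.6B.50d.txt")
print(glove["relation"].shape)  # -> (50,)
```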
### Pre-trained models:
* You can download the models that were used in the experiments [here](https://fileserver.ukp.informatik.tu-darmstadt.de/emnlp2017-relation-extraction/EMNLP2017_DS_IG_relation_extraction_trained_models.zip)
* See `Using pre-trained models.ipynb` for a detailed example of how to use the pre-trained models in your code.

#### Reproducing the experiments from the paper
To reproduce the experiments please refer to the version of the code that was published with the paper:
[tag emnlp17](https://github.com/UKPLab/emnlp2017-relation-extraction/tree/emnlp17).

In any other case, we recommend using the most recent version.
1. Complete the setup above
2. Run `python model_train.py` in `emnlp2017-relation-extraction/relation_extraction/` to see the list of parameters
3. If you put the data into the default folders you can train the `ContextWeighted` model with the following command:
```
python model_train.py model_ContextWeighted train ../data/wikipedia-wikidata/enwiki-20160501/semantic-graphs-filtered-training.02_06.json ../data/wikipedia-wikidata/enwiki-20160501/semantic-graphs-filtered-validation.02_06.json
```

4. Run the following command to compute the precision-recall curves:
```
python precision_recall_curves.py model_ContextWeighted ../data/wikipedia-wikidata/enwiki-20160501/semantic-graphs-filtered-held-out.02_06.json
```
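For reference, a precision-recall curve of this kind is computed from confidence-ranked predictions; below is a generic scikit-learn illustration with made-up labels and scores, not the repository's evaluation code.

```
import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical data: 1 marks a correctly predicted relation,
# scores are the model's confidence for each prediction.
y_true = np.array([1, 0, 1, 1, 0, 1])
y_score = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.3])

precision, recall, _ = precision_recall_curve(y_true, y_score)
for p, r in zip(precision, recall):
    print(f"precision={p:.2f}  recall={r:.2f}")
```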
#### Notes
- The web demo code is provided for information only. It is not meant to be run elsewhere.
#### Requirements:
* Python 3.6
* Keras 2.1.5
* TensorFlow 1.6.0
* See requirements.txt for library requirements.

### License:
* Apache License Version 2.0