https://github.com/thunlp/jointnre

Joint Neural Relation Extraction with Text and KGs
https://github.com/thunlp/jointnre

relation-extraction

Last synced: about 1 year ago
JSON representation

Joint Neural Relation Extraction with Text and KGs

Host: GitHub
URL: https://github.com/thunlp/jointnre
Owner: thunlp
License: mit
Created: 2017-11-20T02:25:47.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2022-11-03T06:52:50.000Z (over 3 years ago)
Last Synced: 2025-03-31T23:51:08.550Z (about 1 year ago)
Topics: relation-extraction
Language: Python
Homepage:
Size: 259 KB
Stars: 187
Watchers: 9
Forks: 35
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          # JointNRE

This repository is a subproject of THU-OpenSK, and all subprojects of THU-OpenSK are as follows.

- [OpenNE](https://www.github.com/thunlp/OpenNE)

- [OpenKE](https://www.github.com/thunlp/OpenKE)

  - [KB2E](https://www.github.com/thunlp/KB2E)

  - [TensorFlow-Transx](https://www.github.com/thunlp/TensorFlow-Transx)

  - [Fast-TransX](https://www.github.com/thunlp/Fast-TransX)

- [OpenNRE](https://www.github.com/thunlp/OpenNRE)

  - [JointNRE](https://www.github.com/thunlp/JointNRE)

Codes and datasets for our paper "Neural Knowledge Acquisition via Mutual Attention between Knowledge Graph and Text"

Some Introduction

===

This implementation is a fast and stable version. 

We have made some simplifications for the original model so that to train a joint model just needs around 15min.

We also encapsulate more neural architectures into our framework to encode sentences.

The code and datasets mainly for the task relation extraction.

Data

==========

We provide the datasets used for the task relation extraction.

New York Times Corpus: The data used in relation extraction from text is published by "Modeling relations and their mentions without labeled text". The data should be obtained from [[LDC]](https://catalog.ldc.upenn.edu/LDC2008T19) first.

Datasets are required in the folder data/ in the following format, containing at least 4 files:

+ kg/train.txt: the knowledge graph for training, format (e1, e2, rel).

+ text/relation2id.txt: the relation needed to be predicted for RE, format (rel, id).

+ text/train.txt: the text for training, format (e1, e2, name1, name2, rel, sentence).

+ text/vec.txt: the initial word embeddings.

+ [[Download (Baidu Cloud)]](https://pan.baidu.com/s/1q7rctsoJ_YdlLa55yckwbQ)

+ [[Download (Tsinghua Cloud)]](https://cloud.tsinghua.edu.cn/f/28ba8ac5262349dd9622/?dl=1)

For FB15K-NYT, we directly give the data for our code [[Download (Tsinghua Cloud)]](https://cloud.tsinghua.edu.cn/f/384836aacb1f4aee9fa3/?dl=1), as we cannot release the original data limited by the license of LDC.

Run the experiments

==========

### To run the experiments, unpack the datasets first:

```

unzip origin_data.zip -d origin_data/

mkdir data/

python initial.py

```

### Run the corresponding python scripts to train models:

```

cd jointE

bash make.sh

python train.py

```

### Change the corresponding python code to set hyperparameters:

```

tf.app.flags.DEFINE_float('nbatch_kg',100,'entity numbers used each training time')

tf.app.flags.DEFINE_float('margin',1.0,'entity numbers used each training time')

tf.app.flags.DEFINE_float('learning_rate_kg',0.001,'learning rate for kg')

tf.app.flags.DEFINE_float('ent_total',lib.getEntityTotal(),'total of entities')

tf.app.flags.DEFINE_float('rel_total',lib.getRelationTotal(),'total of relations')

tf.app.flags.DEFINE_float('tri_total',lib.getTripleTotal(),'total of triples')

tf.app.flags.DEFINE_float('katt_flag', 1, '1 for katt, 0 for att')

tf.app.flags.DEFINE_string('model', 'cnn', 'neural models to encode sentences')

tf.app.flags.DEFINE_float('max_length',config['fixlen'],'maximum of number of words in one sentence')

tf.app.flags.DEFINE_float('pos_num', config['maxlen'] * 2 + 1,'number of position embedding vectors')

tf.app.flags.DEFINE_float('num_classes', config['textual_rel_total'],'maximum of relations')

tf.app.flags.DEFINE_float('hidden_size',230,'hidden feature size')

tf.app.flags.DEFINE_float('pos_size',5,'position embedding size')

tf.app.flags.DEFINE_float('max_epoch',20,'maximum of training epochs')

tf.app.flags.DEFINE_float('batch_size',160,'entity numbers used each training time')

tf.app.flags.DEFINE_float('learning_rate',0.5,'learning rate for nn')

tf.app.flags.DEFINE_float('weight_decay',0.00001,'weight_decay')

tf.app.flags.DEFINE_float('keep_prob',0.5,'dropout rate')

tf.app.flags.DEFINE_string('model_dir','./model/','path to store model')

tf.app.flags.DEFINE_string('summary_dir','./summary','path to store summary_dir')

```

### Run the corresponding python scripts to test models:

```

cd jointE

bash make.sh

python test.py

```

Note that the hyperparameters in the train.py and the test.py must be the same.

### Run the corresponding python script to get PR-curve results:

```

cd jointE

python pr_plot.py

```

Citation

===

```

 @inproceedings{han2018neural,

   title={Neural Knowledge Acquisition via Mutual Attention between Knowledge Graph and Text},

   author={Han, Xu and Liu, Zhiyuan and Sun, Maosong},

   booktitle={Proceedings of AAAI},

   year={2018}

 }

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/thunlp/jointnre

Awesome Lists containing this project

README