https://github.com/yuhaozhang/tacred-relation
PyTorch implementation of the position-aware attention model for relation extraction
- Host: GitHub
- URL: https://github.com/yuhaozhang/tacred-relation
- Owner: yuhaozhang
- License: other
- Created: 2017-09-05T03:16:21.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2019-02-09T00:00:29.000Z (almost 6 years ago)
- Last Synced: 2023-11-07T14:14:40.346Z (about 1 year ago)
- Topics: information-extraction, natural-language-processing, nlp, relation-extraction
- Language: Python
- Homepage:
- Size: 42 KB
- Stars: 345
- Watchers: 13
- Forks: 97
- Open Issues: 9
Metadata Files:
- Readme: README.md
- License: LICENSE
README
Position-aware Attention RNN Model for Relation Extraction
==========================================================

This repo contains the *PyTorch* code for the paper [Position-aware Attention and Supervised Data Improve Slot Filling](https://nlp.stanford.edu/pubs/zhang2017tacred.pdf).
**The TACRED dataset**: Details on the TAC Relation Extraction Dataset can be found on [this dataset website](https://nlp.stanford.edu/projects/tacred/).
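The position-aware attention model conditions its attention weights on each token's relative position to the subject and object entities. As a rough illustration of the idea (not the repo's actual code), the position sequence for a span can be computed as a signed distance that is zero inside the span:

```python
# Hypothetical sketch: signed distance of each token to an entity span.
# Tokens inside the span get 0; tokens before/after get negative/positive
# offsets. The model embeds these positions and feeds them to the attention.
def positions(length, span):
    start, end = span
    return [i - start if i < start else (0 if i <= end else i - end)
            for i in range(length)]

print(positions(5, (0, 1)))  # [0, 0, 1, 2, 3]
```

The exact encoding used by the paper may differ in details (e.g. clipping distances to a maximum range before embedding).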
## Requirements
- Python 3 (tested on 3.6.2)
- PyTorch (tested on 1.0.0)
- unzip, wget (for downloading only)

## Preparation
First, download and unzip GloVe vectors from the Stanford website, with:
```
chmod +x download.sh; ./download.sh
```

Then prepare the vocabulary and initial word vectors with:
```
python prepare_vocab.py dataset/tacred dataset/vocab --glove_dir dataset/glove
```

This will write the vocabulary and word vectors as a numpy matrix into the directory `dataset/vocab`.
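Conceptually, this preparation step maps each vocabulary word to a row of an embedding matrix, copying the pretrained GloVe vector when one exists and falling back to random initialization otherwise. A minimal sketch of that logic (the function name and details here are illustrative, not the script's actual API):

```python
import numpy as np

# Hypothetical sketch: build an embedding matrix for a vocabulary.
# Words present in the pretrained GloVe dict keep their vectors;
# all other rows are randomly initialized.
def build_embeddings(vocab, glove, dim=4, seed=0):
    rng = np.random.RandomState(seed)
    matrix = rng.uniform(-1, 1, (len(vocab), dim))
    for i, word in enumerate(vocab):
        if word in glove:
            matrix[i] = glove[word]
    return matrix

vocab = ["<PAD>", "<UNK>", "relation", "extraction"]
glove = {"relation": np.ones(4)}
emb = build_embeddings(vocab, glove)
```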
## Training
Train a position-aware attention RNN model with:
```
python train.py --data_dir dataset/tacred --vocab_dir dataset/vocab --id 00 --info "Position-aware attention model"
```

Use `--topn N` to fine-tune only the top N word vectors. The script performs the preprocessing (word dropout, entity masking, etc.) automatically.
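Entity masking replaces the subject and object tokens with special placeholder tokens carrying the entity's NER type, so the model generalizes across specific entity names. A rough sketch of the idea, with illustrative names that may differ from the repo's implementation:

```python
# Hypothetical sketch of entity masking: tokens inside the subject/object
# spans are replaced by mask tokens tagged with the entity's NER type
# (e.g. SUBJ-PERSON), as described in the paper.
def mask_entities(tokens, subj_span, subj_type, obj_span, obj_type):
    masked = list(tokens)
    s0, s1 = subj_span
    o0, o1 = obj_span
    masked[s0:s1 + 1] = ["SUBJ-" + subj_type] * (s1 - s0 + 1)
    masked[o0:o1 + 1] = ["OBJ-" + obj_type] * (o1 - o0 + 1)
    return masked

tokens = ["Bill", "Gates", "founded", "Microsoft", "."]
masked = mask_entities(tokens, (0, 1), "PERSON", (3, 3), "ORGANIZATION")
```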
Train an LSTM model with:
```
python train.py --data_dir dataset/tacred --vocab_dir dataset/vocab --no-attn --id 01 --info "LSTM model"
```

Model checkpoints and logs will be saved to `./saved_models/<id>` (e.g., `./saved_models/00`).
## Evaluation
Run evaluation on the test set with:
```
python eval.py saved_models/00 --dataset test
```

This uses `best_model.pt` by default. Use `--model checkpoint_epoch_10.pt` to specify a different model checkpoint file. Add `--out saved_models/out/test1.pkl` to write the model's probability output to a file (for ensembling, etc.).
## Ensemble
Please see the example script `ensemble.sh`.
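A common way to ensemble such models is to average the per-example probability vectors written by `eval.py --out` and take the argmax of the average. The sketch below illustrates that scheme with plain Python lists; the actual format of the pickled output and the logic in `ensemble.sh` may differ:

```python
# Hypothetical sketch of probability-averaging ensembling: each model
# contributes a list of per-example probability vectors; the ensemble
# prediction is the argmax of the element-wise average.
def ensemble_predict(prob_lists):
    n = len(prob_lists)
    preds = []
    for probs in zip(*prob_lists):          # per-example group of vectors
        avg = [sum(p) / n for p in zip(*probs)]
        preds.append(avg.index(max(avg)))   # argmax of the averaged vector
    return preds

model_a = [[0.7, 0.2, 0.1], [0.1, 0.5, 0.4]]
model_b = [[0.3, 0.6, 0.1], [0.2, 0.2, 0.6]]
print(ensemble_predict([model_a, model_b]))  # [0, 2]
```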
## License
All work contained in this package is licensed under the Apache License, Version 2.0. See the included LICENSE file.