https://github.com/yuhaozhang/tacred-relation
PyTorch implementation of the position-aware attention model for relation extraction
- Host: GitHub
- URL: https://github.com/yuhaozhang/tacred-relation
- Owner: yuhaozhang
- License: other
- Created: 2017-09-05T03:16:21.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2019-02-09T00:00:29.000Z (almost 6 years ago)
- Last Synced: 2023-11-07T14:14:40.346Z (about 1 year ago)
- Topics: information-extraction, natural-language-processing, nlp, relation-extraction
- Language: Python
- Homepage:
- Size: 42 KB
- Stars: 345
- Watchers: 13
- Forks: 97
- Open Issues: 9
Metadata Files:
- Readme: README.md
- License: LICENSE
README
Position-aware Attention RNN Model for Relation Extraction
==========================================================

This repo contains the *PyTorch* code for the paper [Position-aware Attention and Supervised Data Improve Slot Filling](https://nlp.stanford.edu/pubs/zhang2017tacred.pdf).
**The TACRED dataset**: Details on the TAC Relation Extraction Dataset can be found on [this dataset website](https://nlp.stanford.edu/projects/tacred/).
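The position-aware attention model conditions its attention weights on each token's relative position to the subject and object entities. As a rough illustration of the idea (not the repo's actual code), the position sequence for a span can be computed as a signed distance that is zero inside the span:

```python
# Hypothetical sketch: signed distance of each token to an entity span.
# Tokens inside the span get 0; tokens before/after get negative/positive
# offsets. The model embeds these positions and feeds them to the attention.
def positions(length, span):
    start, end = span
    return [i - start if i < start else (0 if i <= end else i - end)
            for i in range(length)]

print(positions(5, (0, 1)))  # [0, 0, 1, 2, 3]
```

The exact encoding used by the paper may differ in details (e.g. clipping distances to a maximum range before embedding).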
## Requirements
- Python 3 (tested on 3.6.2)
- PyTorch (tested on 1.0.0)
- unzip, wget (for downloading only)

## Preparation
First, download and unzip GloVe vectors from the Stanford website, with:
```
chmod +x download.sh; ./download.sh
```

Then prepare the vocabulary and initial word vectors with:
```
python prepare_vocab.py dataset/tacred dataset/vocab --glove_dir dataset/glove
```

This will write the vocabulary and word vectors as a numpy matrix into the directory `dataset/vocab`.
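Conceptually, this preparation step maps each vocabulary word to a row of an embedding matrix, copying the pretrained GloVe vector when one exists and falling back to random initialization otherwise. A minimal sketch of that logic (the function name and details here are illustrative, not the script's actual API):

```python
import numpy as np

# Hypothetical sketch: build an embedding matrix for a vocabulary.
# Words present in the pretrained GloVe dict keep their vectors;
# all other rows are randomly initialized.
def build_embeddings(vocab, glove, dim=4, seed=0):
    rng = np.random.RandomState(seed)
    matrix = rng.uniform(-1, 1, (len(vocab), dim))
    for i, word in enumerate(vocab):
        if word in glove:
            matrix[i] = glove[word]
    return matrix

vocab = ["<PAD>", "<UNK>", "relation", "extraction"]
glove = {"relation": np.ones(4)}
emb = build_embeddings(vocab, glove)
```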
## Training
Train a position-aware attention RNN model with:
```
python train.py --data_dir dataset/tacred --vocab_dir dataset/vocab --id 00 --info "Position-aware attention model"
```

Use `--topn N` to fine-tune only the top N word vectors. The script performs the preprocessing (word dropout, entity masking, etc.) automatically.
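Entity masking replaces the subject and object tokens with special placeholder tokens carrying the entity's NER type, so the model generalizes across specific entity names. A rough sketch of the idea, with illustrative names that may differ from the repo's implementation:

```python
# Hypothetical sketch of entity masking: tokens inside the subject/object
# spans are replaced by mask tokens tagged with the entity's NER type
# (e.g. SUBJ-PERSON), as described in the paper.
def mask_entities(tokens, subj_span, subj_type, obj_span, obj_type):
    masked = list(tokens)
    s0, s1 = subj_span
    o0, o1 = obj_span
    masked[s0:s1 + 1] = ["SUBJ-" + subj_type] * (s1 - s0 + 1)
    masked[o0:o1 + 1] = ["OBJ-" + obj_type] * (o1 - o0 + 1)
    return masked

tokens = ["Bill", "Gates", "founded", "Microsoft", "."]
masked = mask_entities(tokens, (0, 1), "PERSON", (3, 3), "ORGANIZATION")
```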
Train an LSTM model with:
```
python train.py --data_dir dataset/tacred --vocab_dir dataset/vocab --no-attn --id 01 --info "LSTM model"
```

Model checkpoints and logs will be saved to `./saved_models/<id>` (e.g., `./saved_models/00`).
## Evaluation
Run evaluation on the test set with:
```
python eval.py saved_models/00 --dataset test
```

This uses `best_model.pt` by default. Use `--model checkpoint_epoch_10.pt` to specify a different model checkpoint file. Add `--out saved_models/out/test1.pkl` to write the model's probability output to a file (for ensembling, etc.).
## Ensemble
Please see the example script `ensemble.sh`.
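A common way to ensemble such models is to average the per-example probability vectors written by `eval.py --out` and take the argmax of the average. The sketch below illustrates that scheme with plain Python lists; the actual format of the pickled output and the logic in `ensemble.sh` may differ:

```python
# Hypothetical sketch of probability-averaging ensembling: each model
# contributes a list of per-example probability vectors; the ensemble
# prediction is the argmax of the element-wise average.
def ensemble_predict(prob_lists):
    n = len(prob_lists)
    preds = []
    for probs in zip(*prob_lists):          # per-example group of vectors
        avg = [sum(p) / n for p in zip(*probs)]
        preds.append(avg.index(max(avg)))   # argmax of the averaged vector
    return preds

model_a = [[0.7, 0.2, 0.1], [0.1, 0.5, 0.4]]
model_b = [[0.3, 0.6, 0.1], [0.2, 0.2, 0.6]]
print(ensemble_predict([model_a, model_b]))  # [0, 2]
```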
## License
All work contained in this package is licensed under the Apache License, Version 2.0. See the included LICENSE file.