https://github.com/talschuster/tokenmasker
Masking tokens to modify the predictions of a pretrained sentence classifier
fact-checking masker nlp rational
- Host: GitHub
- URL: https://github.com/talschuster/tokenmasker
- Owner: TalSchuster
- License: mit
- Created: 2019-11-12T21:42:12.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-02-04T18:40:25.000Z (over 5 years ago)
- Last Synced: 2025-03-29T08:51:09.118Z (about 2 months ago)
- Topics: fact-checking, masker, nlp, rational
- Language: Python
- Size: 323 KB
- Stars: 16
- Watchers: 2
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# TokenMasker
This repository contains the code for the masker module from the paper [Automatic Fact-guided Sentence Modification](https://arxiv.org/pdf/1909.13838.pdf) (AAAI 2020). The multiple-encoder pointer-generator is available [here](https://github.com/darsh10/split_encoder_pointer_summarizer).
## Description
The goal of the masker is to find the minimal set of tokens that can be removed from a sentence in order to change its relation to another sentence. For example, given a claim and an evidence sentence, it finds the words to delete from the evidence so that the evidence becomes neutral with respect to the claim. Neutrality is determined by a pretrained classifier. In the following example, `$` denotes a masked token:
* Claim: *Eddie Vedder sings.*
* Evidence: *He is known for his powerful baritone vocals.*
* Model's output: *He is known for $ powerful $ $.*

[Figure: illustration of the model]

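The masked output above is simply the evidence tokens with the selected positions replaced by `$`. A minimal sketch of that rendering step follows; the function names `apply_mask` and `detokenize` and the whitespace tokenization are illustrative assumptions, not code from this repository.

```python
from typing import Iterable, List

MASK = "$"  # placeholder for a masked token, as in the example above


def apply_mask(tokens: List[str], masked: Iterable[int]) -> List[str]:
    """Replace every token whose index is in `masked` with MASK."""
    masked_set = set(masked)
    return [MASK if i in masked_set else tok for i, tok in enumerate(tokens)]


def detokenize(tokens: List[str]) -> str:
    """Join tokens with spaces, attaching simple punctuation to the previous token."""
    text = " ".join(tokens)
    for punct in (".", ",", "!", "?"):
        text = text.replace(" " + punct, punct)
    return text


evidence = ["He", "is", "known", "for", "his", "powerful", "baritone", "vocals", "."]
print(detokenize(apply_mask(evidence, [4, 6, 7])))  # He is known for $ powerful $ $.
```

Representing the mask as a set of token indices keeps the masker's decision separate from how the masked sentence is displayed.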
## Setup
You'll need AllenNLP (version 0.8.3). Install the requirements with:
```
pip install -r requirements.txt
```

## Training
### Neutrality classifier
Note: to train a masker with our pretrained neutrality classifier, skip to the next step (the masker config file already contains the path to our trained model).
```
allennlp train allen_configs/esim_fever_wmask.jsonnet -s trained_neutrality_classifier --include-package masker_allen_pkg
```

### Masker
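The masker looks for a small set of evidence tokens whose removal makes the pretrained neutrality classifier predict a neutral relation. For intuition only, the sketch below solves the same search problem with a plain greedy loop instead of a learned model; it is not the repository's method. `greedy_neutral_mask`, the `threshold` parameter, and the `toy_p_neutral` scorer are hypothetical, and the toy scorer stands in for the real classifier.

```python
from typing import Callable, List, Sequence, Set

MASK = "$"  # placeholder used for masked tokens, as in the README example


def _masked_view(tokens: Sequence[str], masked: Set[int]) -> List[str]:
    """Return a copy of `tokens` with the indices in `masked` replaced by MASK."""
    return [MASK if i in masked else tok for i, tok in enumerate(tokens)]


def greedy_neutral_mask(
    tokens: Sequence[str],
    p_neutral: Callable[[List[str]], float],
    threshold: float = 0.5,
) -> List[int]:
    """Greedy baseline: mask tokens one at a time until p_neutral >= threshold.

    Each round masks the remaining token whose masking raises p_neutral the most.
    Returns the masked indices in the order they were chosen, and stops early if
    every token is masked without reaching the threshold.
    """
    masked: Set[int] = set()
    chosen: List[int] = []
    while p_neutral(_masked_view(tokens, masked)) < threshold:
        candidates = [i for i in range(len(tokens)) if i not in masked]
        if not candidates:
            break  # nothing left to mask
        # max() keeps the first (leftmost) index among equally good candidates
        best = max(candidates, key=lambda i: p_neutral(_masked_view(tokens, masked | {i})))
        masked.add(best)
        chosen.append(best)
    return chosen


def toy_p_neutral(tokens: List[str]) -> float:
    """Hypothetical stand-in classifier: more neutral as the cue words disappear."""
    cues = {"baritone", "vocals"}
    remaining = sum(1 for tok in tokens if tok in cues)
    return 1.0 - remaining / len(cues)


evidence = ["He", "is", "known", "for", "his", "powerful", "baritone", "vocals", "."]
print(greedy_neutral_mask(evidence, toy_p_neutral, threshold=0.9))  # [6, 7]
```

This search calls the classifier on the order of n² times for an n-token sentence, so it only illustrates the objective; the repository's masker is a trained model, produced by the command below.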
```
allennlp train allen_configs/mask_generator.jsonnet -s trained_mask_generator --include-package masker_allen_pkg
```

## Extracting masks
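Both the preprocessed input data and the predictions in this section are `.jsonl` files, which presumably hold one JSON object per line. A minimal standard-library sketch for loading such a file, for example to inspect the predictions written by `model_predictions.py`; the helper name `read_jsonl` is hypothetical, and the record fields are whatever the script emits (not documented here).

```python
import json
from typing import Any, Dict, Iterator


def read_jsonl(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from err
```

For example, `sorted(next(read_jsonl("predictions/fever_train_no_nei.jsonl")).keys())` lists the fields of the first prediction after running the extraction command.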
### Trained model
To get the trained masker model and the preprocessed FEVER training data:
```
wget https://www.dropbox.com/s/do5jptwmgroencn/model.tar.gz
wget https://www.dropbox.com/s/o53i6urucny7q03/fever.train_no_nei.tokenized.jsonl
```

### Command
To create masks for the data (add `-c` to use a GPU):
```
python model_predictions.py -f model.tar.gz \
-i fever.train_no_nei.tokenized.jsonl \
-out predictions/fever_train_no_nei.jsonl
```

## Citation
If you find this repository helpful, please cite our paper:
```
@inproceedings{shah2020automatic,
title={Automatic Fact-guided Sentence Modification},
author={Darsh J Shah and Tal Schuster and Regina Barzilay},
booktitle={Association for the Advancement of Artificial Intelligence ({AAAI})},
year={2020},
url={https://arxiv.org/pdf/1909.13838.pdf}
}
```