Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/thakur-nandan/poison-texts
CS 886 Project on Adversarial Attacks on NLP models
https://github.com/thakur-nandan/poison-texts
Last synced: 22 days ago
JSON representation
CS 886 Project on Adversarial Attacks on NLP models
- Host: GitHub
- URL: https://github.com/thakur-nandan/poison-texts
- Owner: thakur-nandan
- License: apache-2.0
- Created: 2022-07-08T14:11:08.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-07-25T05:06:09.000Z (over 2 years ago)
- Last Synced: 2024-07-20T03:39:52.132Z (4 months ago)
- Language: Python
- Size: 36 MB
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# poison-texts
CS 886 Project on Adversarial Attacks on NLP Models: Sentiment analysis by BERT in PyTorch
The starter code is retrieved from [this repository](https://github.com/vonsovsky/bert-sentiment).BERT is state-of-the-art natural language processing model from Google. Using its latent space, it can be repurpossed for various NLP tasks, such as sentiment analysis.
This simple wrapper based on [Transformers](https://github.com/huggingface/transformers) (for managing BERT model) and PyTorch achieves 92% accuracy on guessing positivity / negativity on IMDB reviews.
We will extend the model to two other datasets: Yelp (5 classes) and Amazon Review (5 classes).
# How to use
## Prepare data
### IMDB
First, you need to prepare IMDB data which are publicly available. Format used here is one review per line, with first 12500 lines being positive, followed by 12500 negative lines. Or you can simply download dataset on my Google Drive [here](https://drive.google.com/drive/folders/1FiRODwhfJt6MpCqdfM7GgHwHqQ9VXFSJ?usp=sharing). Default folder read by script is `data/`.### Yelp
Yelp CSV files can be downloaded [here](https://s3.amazonaws.com/fast-ai-nlp/yelp_review_full_csv.tgz).### Amazon Review
Amazon Review CSV files can be downloaded [here](https://drive.google.com/uc?id=0Bz8a_Dbh9QhbZVhsUnRWRDhETzA).## Train weights
Training with default parameters on IMDB can be performed simply by.
`python script.py --train --dataset imdb`
Optionally, you can change output dir for weights.
To train using a pre-trained model:
`python script.py --train --use_pretrained --dataset imdb`
## Evaluate weights
You can find out how great you are (until your grandma gets her hands on BERT as well) simply by running
`python script.py --evaluate --dataset imdb`
Of course, you need to train your data first or get them from my drive.
## Predict text
`python script.py --predict "It was truly amazing experience." --dataset imdb`
or
`python script.py --predict "It was so terrible and disgusting as coffee topped with ketchup." --dataset imdb`