https://github.com/pliang279/sent_debias
[ACL 2020] Towards Debiasing Sentence Representations
fairness-ai machine-learning natural-language-processing representation-learning
- Host: GitHub
- URL: https://github.com/pliang279/sent_debias
- Owner: pliang279
- License: MIT
- Created: 2020-01-27T22:11:58.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-11-21T05:02:08.000Z (almost 3 years ago)
- Last Synced: 2025-04-09T03:04:36.158Z (6 months ago)
- Topics: fairness-ai, machine-learning, natural-language-processing, representation-learning
- Language: Python
- Size: 47.9 MB
- Stars: 64
- Watchers: 3
- Forks: 19
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
# Towards Debiasing Sentence Representations
> PyTorch implementation for debiasing sentence representations.

This repository contains code for removing bias from BERT sentence representations and for evaluating how much bias those representations encode.
Correspondence to:
- Paul Liang (pliang@cs.cmu.edu)
- Irene Li (mengzeli@cs.cmu.edu)

## Paper
[**Towards Debiasing Sentence Representations**](https://www.aclweb.org/anthology/2020.acl-main.488/)
[Paul Pu Liang](http://www.cs.cmu.edu/~pliang/), [Irene Li](https://www.linkedin.com/in/mengze-irene-li-114592130/), [Emily Zheng](https://www.linkedin.com/in/emily-zheng-348190128/), [Yao Chong Lim](https://scholar.google.com/citations?user=R-upoxQAAAAJ&hl=en), [Ruslan Salakhutdinov](https://www.cs.cmu.edu/~rsalakhu/), and [Louis-Philippe Morency](https://www.cs.cmu.edu/~morency/)
ACL 2020

If you find this repository useful, please cite our paper:
```
@inproceedings{liang-etal-2020-towards,
title = "Towards Debiasing Sentence Representations",
author = "Liang, Paul Pu and
Li, Irene Mengze and
Zheng, Emily and
Lim, Yao Chong and
Salakhutdinov, Ruslan and
Morency, Louis-Philippe",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.488",
doi = "10.18653/v1/2020.acl-main.488",
pages = "5502--5515",
}
```

## Installation
First, check that the requirements are satisfied:

- Python 3.6
- torch 1.2.0
- huggingface transformers
- numpy 1.18.1
- scikit-learn 0.20.0
- matplotlib 3.1.2
- gensim 3.8.0
- tqdm 4.45.0
- regex 2.5.77
- pattern3

The next step is to clone the repository:
```bash
git clone https://github.com/pliang279/sent_debias.git
```

To install the BERT models, go to `debias-BERT/` and run `pip install .`.
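For convenience, the pinned versions listed above could be collected into a `requirements.txt`; this is a sketch based on that list (the repository may not ship this exact file):

```
torch==1.2.0
transformers
numpy==1.18.1
scikit-learn==0.20.0
matplotlib==3.1.2
gensim==3.8.0
tqdm==4.45.0
regex==2.5.77
pattern3
```

It would then be installable with `pip install -r requirements.txt`.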
## Data
Download the [GLUE data](https://gluebenchmark.com/tasks) by running this [script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e):
```bash
python download_glue_data.py --data_dir glue_data --tasks SST,QNLI,CoLA
```
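The download script drops each task under the directory passed to `--data_dir`. Pointing the later scripts at that data might look like the following (the `glue_data` name comes from the command above; everything else is an assumption):

```bash
# Point the fine-tuning and evaluation scripts at the downloaded data.
# SST-2 is one of the three tasks used in the paper (also CoLA, QNLI).
export GLUE_DIR="$PWD/glue_data"
export TASK_NAME=SST-2
echo "task data expected at: $GLUE_DIR/$TASK_NAME"
```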
Unpack it to some directory `$GLUE_DIR`.

## Precomputed models and embeddings (optional)
1. Models
* Download https://drive.google.com/file/d/1cAN49-HDHFdNP1GJZn83s2-mEGCeZWjh/view?usp=sharing to `debias-BERT/experiments`.
* Unpack it:
```
tar -xvf acl2020-results.tar.gz
```

2. Embeddings
* Download https://drive.google.com/file/d/1ubKn8SCjwnp9pYjQa9SmKWxFojX9a6Bz/view?usp=sharing to `debias-BERT/experiments`.
* Unpack it:
```
tar -xvf saved_embs.tar.gz
```

## Usage
If you choose to use the precomputed models and embeddings, skip to step B. Otherwise, follow steps A and B in order.
### A. Fine-tune BERT
1. Go to `debias-BERT/experiments`.
2. Run `export TASK_NAME=SST-2` (task can be one of SST-2, CoLA, and QNLI).
3. Fine-tune BERT on `$TASK_NAME`.
* With debiasing
```
python run_classifier.py \
--data_dir $GLUE_DIR/$TASK_NAME/ \
--task_name $TASK_NAME \
--output_dir path/to/results_directory \
--do_train \
--do_eval \
--do_lower_case \
--debias \
--normalize \
--tune_bert
```
* Without debiasing
```
python run_classifier.py \
--data_dir $GLUE_DIR/$TASK_NAME/ \
--task_name $TASK_NAME \
--output_dir path/to/results_directory \
--do_train \
--do_eval \
--do_lower_case \
--normalize \
--tune_bert
```
The fine-tuned model and dev set evaluation results will be stored under the specified `output_dir`.

### B. Evaluate bias in BERT representations
1. Go to `debias-BERT/experiments`.
2. Run `export TASK_NAME=SST-2` (task can be one of SST-2, CoLA, and QNLI).
3. Evaluate fine-tuned BERT on bias level.
* Evaluate debiased fine-tuned BERT.
```
python eval_bias.py \
--debias \
--model_path path/to/model \
--model $TASK_NAME \
--results_dir path/to/results_directory \
--output_name debiased
```
If using precomputed models, set `model_path` to `acl2020-results/$TASK_NAME/debiased`.
* Evaluate biased fine-tuned BERT.
```
python eval_bias.py \
--model_path path/to/model \
--model $TASK_NAME \
--results_dir path/to/results_directory \
--output_name biased
```
If using precomputed models, set `model_path` to `acl2020-results/$TASK_NAME/biased`.

The evaluation results will be stored in the file `results_dir/output_name`.
Note: The argument `model_path` should be specified as the `output_dir` corresponding to the fine-tuned model you want to evaluate. Specifically, `model_path` should be a directory containing the following files: `config.json`, `pytorch_model.bin` and `vocab.txt`.
4. Evaluate pretrained BERT on bias level.
* Evaluate debiased pretrained BERT.
```
python eval_bias.py \
--debias \
--model pretrained \
--results_dir path/to/results_directory \
--output_name debiased
```
* Evaluate biased pretrained BERT.
```
python eval_bias.py \
--model pretrained \
--results_dir path/to/results_directory \
--output_name biased
```
Again, the bias evaluation results will be stored in the file `results_dir/output_name`.
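The `model_path` requirement from step 3 (a directory containing `config.json`, `pytorch_model.bin`, and `vocab.txt`) can be sanity-checked before running `eval_bias.py`. The helper and the scratch directory below are illustrative, not part of the repository; only the three filenames come from the note above:

```bash
# Illustrative helper: verify a directory has the three files eval_bias.py
# expects before passing it as --model_path.
check_model_dir() {
  for f in config.json pytorch_model.bin vocab.txt; do
    [ -f "$1/$f" ] || { echo "missing: $1/$f"; return 1; }
  done
  echo "ok: $1"
}

# Demo against a scratch directory (a real run would point at your output_dir):
demo=$(mktemp -d)
touch "$demo/config.json" "$demo/pytorch_model.bin" "$demo/vocab.txt"
check_model_dir "$demo"
```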