https://github.com/pliang279/sent_debias
[ACL 2020] Towards Debiasing Sentence Representations
fairness-ai machine-learning natural-language-processing representation-learning
- Host: GitHub
- URL: https://github.com/pliang279/sent_debias
- Owner: pliang279
- License: MIT
- Created: 2020-01-27T22:11:58.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-11-21T05:02:08.000Z (almost 3 years ago)
- Last Synced: 2025-04-09T03:04:36.158Z (6 months ago)
- Topics: fairness-ai, machine-learning, natural-language-processing, representation-learning
- Language: Python
- Size: 47.9 MB
- Stars: 64
- Watchers: 3
- Forks: 19
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
# Towards Debiasing Sentence Representations
> PyTorch implementation for debiasing sentence representations.

This repository contains code for removing bias from BERT sentence representations and for evaluating how much bias those representations encode.
Correspondence to:
- Paul Liang (pliang@cs.cmu.edu)
- Irene Li (mengzeli@cs.cmu.edu)

## Paper
[**Towards Debiasing Sentence Representations**](https://www.aclweb.org/anthology/2020.acl-main.488/)
[Paul Pu Liang](http://www.cs.cmu.edu/~pliang/), [Irene Li](https://www.linkedin.com/in/mengze-irene-li-114592130/), [Emily Zheng](https://www.linkedin.com/in/emily-zheng-348190128/), [Yao Chong Lim](https://scholar.google.com/citations?user=R-upoxQAAAAJ&hl=en), [Ruslan Salakhutdinov](https://www.cs.cmu.edu/~rsalakhu/), and [Louis-Philippe Morency](https://www.cs.cmu.edu/~morency/)
ACL 2020

If you find this repository useful, please cite our paper:
```
@inproceedings{liang-etal-2020-towards,
title = "Towards Debiasing Sentence Representations",
author = "Liang, Paul Pu and
Li, Irene Mengze and
Zheng, Emily and
Lim, Yao Chong and
Salakhutdinov, Ruslan and
Morency, Louis-Philippe",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.488",
doi = "10.18653/v1/2020.acl-main.488",
pages = "5502--5515",
}
```

## Installation
First, check that the requirements are satisfied:

- Python 3.6
- torch 1.2.0
- huggingface transformers
- numpy 1.18.1
- scikit-learn 0.20.0
- matplotlib 3.1.2
- gensim 3.8.0
- tqdm 4.45.0
- regex 2.5.77
- pattern3

The next step is to clone the repository:
```bash
git clone https://github.com/pliang279/sent_debias.git
```

To install the BERT models, go to `debias-BERT/` and run `pip install .`.
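For convenience, the pinned versions listed above could be collected into a `requirements.txt`; this is a sketch based on that list (the repository may not ship this exact file):

```
torch==1.2.0
transformers
numpy==1.18.1
scikit-learn==0.20.0
matplotlib==3.1.2
gensim==3.8.0
tqdm==4.45.0
regex==2.5.77
pattern3
```

It would then be installable with `pip install -r requirements.txt`.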
## Data
Download the [GLUE data](https://gluebenchmark.com/tasks) by running this [script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e):
```bash
python download_glue_data.py --data_dir glue_data --tasks SST,QNLI,CoLA
```
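The download script drops each task under the directory passed to `--data_dir`. Pointing the later scripts at that data might look like the following (the `glue_data` name comes from the command above; everything else is an assumption):

```bash
# Point the fine-tuning and evaluation scripts at the downloaded data.
# SST-2 is one of the three tasks used in the paper (also CoLA, QNLI).
export GLUE_DIR="$PWD/glue_data"
export TASK_NAME=SST-2
echo "task data expected at: $GLUE_DIR/$TASK_NAME"
```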
Unpack it to some directory `$GLUE_DIR`.

## Precomputed models and embeddings (optional)
1. Models
* Download https://drive.google.com/file/d/1cAN49-HDHFdNP1GJZn83s2-mEGCeZWjh/view?usp=sharing to `debias-BERT/experiments`.
* Unpack it:
```
tar -xvf acl2020-results.tar.gz
```

2. Embeddings
* Download https://drive.google.com/file/d/1ubKn8SCjwnp9pYjQa9SmKWxFojX9a6Bz/view?usp=sharing to `debias-BERT/experiments`.
* Unpack it:
```
tar -xvf saved_embs.tar.gz
```

## Usage
If you choose to use the precomputed models and embeddings, skip to step B. Otherwise, follow steps A and B in order.
### A. Fine-tune BERT
1. Go to `debias-BERT/experiments`.
2. Run `export TASK_NAME=SST-2` (task can be one of SST-2, CoLA, and QNLI).
3. Fine-tune BERT on `$TASK_NAME`.
* With debiasing
```
python run_classifier.py \
--data_dir $GLUE_DIR/$TASK_NAME/ \
--task_name $TASK_NAME \
--output_dir path/to/results_directory \
--do_train \
--do_eval \
--do_lower_case \
--debias \
--normalize \
--tune_bert
```
* Without debiasing
```
python run_classifier.py \
--data_dir $GLUE_DIR/$TASK_NAME/ \
--task_name $TASK_NAME \
--output_dir path/to/results_directory \
--do_train \
--do_eval \
--do_lower_case \
--normalize \
--tune_bert
```
The fine-tuned model and dev set evaluation results will be stored under the specified `output_dir`.

### B. Evaluate bias in BERT representations
1. Go to `debias-BERT/experiments`.
2. Run `export TASK_NAME=SST-2` (task can be one of SST-2, CoLA, and QNLI).
3. Evaluate fine-tuned BERT on bias level.
* Evaluate debiased fine-tuned BERT.
```
python eval_bias.py \
--debias \
--model_path path/to/model \
--model $TASK_NAME \
--results_dir path/to/results_directory \
--output_name debiased
```
If using precomputed models, set `model_path` to `acl2020-results/$TASK_NAME/debiased`.
* Evaluate biased fine-tuned BERT.
```
python eval_bias.py \
--model_path path/to/model \
--model $TASK_NAME \
--results_dir path/to/results_directory \
--output_name biased
```
If using precomputed models, set `model_path` to `acl2020-results/$TASK_NAME/biased`.

The evaluation results will be stored in the file `results_dir/output_name`.
Note: The argument `model_path` should be specified as the `output_dir` corresponding to the fine-tuned model you want to evaluate. Specifically, `model_path` should be a directory containing the following files: `config.json`, `pytorch_model.bin` and `vocab.txt`.
4. Evaluate pretrained BERT on bias level.
* Evaluate debiased pretrained BERT.
```
python eval_bias.py \
--debias \
--model pretrained \
--results_dir path/to/results_directory \
--output_name debiased
```
* Evaluate biased pretrained BERT.
```
python eval_bias.py \
--model pretrained \
--results_dir path/to/results_directory \
--output_name biased
```
Again, the bias evaluation results will be stored in the file `results_dir/output_name`.
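The `model_path` requirement from step 3 (a directory containing `config.json`, `pytorch_model.bin`, and `vocab.txt`) can be sanity-checked before running `eval_bias.py`. The helper and the scratch directory below are illustrative, not part of the repository; only the three filenames come from the note above:

```bash
# Illustrative helper: verify a directory has the three files eval_bias.py
# expects before passing it as --model_path.
check_model_dir() {
  for f in config.json pytorch_model.bin vocab.txt; do
    [ -f "$1/$f" ] || { echo "missing: $1/$f"; return 1; }
  done
  echo "ok: $1"
}

# Demo against a scratch directory (a real run would point at your output_dir):
demo=$(mktemp -d)
touch "$demo/config.json" "$demo/pytorch_model.bin" "$demo/vocab.txt"
check_model_dir "$demo"
```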