https://github.com/eric11eca/common-bench
EPFL Machine Learning course project 2. Associated with NLP Lab. Commonsense reasoning benchmark and probing for large language models.
https://github.com/eric11eca/common-bench
Last synced: over 1 year ago
JSON representation
EPFL Machine Learning course project 2. Associated with NLP Lab. Commonsense reasoning benchmark and probing for large language models.
- Host: GitHub
- URL: https://github.com/eric11eca/common-bench
- Owner: eric11eca
- License: mit
- Created: 2022-12-08T12:55:48.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-12-22T12:48:19.000Z (over 3 years ago)
- Last Synced: 2025-01-22T06:48:20.519Z (over 1 year ago)
- Language: Python
- Size: 24.1 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# ML Project 2: Human-centerd Commonsense Benchmark
EPFL Machine Learning course project 2. Associated with NLP Lab. Commonsense reasoning benchmark and probing for large language models.
### Baseline Models
We employed [T5](https://arxiv.org/pdf/1910.10683.pdf) based models.
* [UnifiedQA](https://arxiv.org/abs/2005.00700)
* [Macaw](https://arxiv.org/abs/2109.02593)
* [FLAN](https://ai.googleblog.com/2021/10/introducing-flan-more-generalizable.html)
* [T0++](https://huggingface.co/bigscience/T0pp)
Also, we employed large language models.
* [OPT66B](https://huggingface.co/facebook/opt-66b/tree/main)
* [GPT3](https://openai.com/api/)
### Human-centered Commonsense Benchmark
We employed 5 different commonsense benchmarks from social interaction to ethical judgment that human could face in every real-life.
* [Theory of Mind Task Dataset](https://arxiv.org/abs/1808.09352)
* [Social Interaction QA](https://arxiv.org/abs/1904.09728)
* [Complementary Commonsense](https://arxiv.org/abs/2106.00969)
* [SCRUPLES](https://paperswithcode.com/paper/scruples-a-corpus-of-community-ethical)
* [COmmonsense Dataset Adversarially-authored by Humans](https://arxiv.org/abs/1904.04365)
## Download Datasets from: [https://drive.google.com/drive/folders/1eSjhEyg7w4wZJS39ptEimIi-4H2stT7h?usp=sharing](https://drive.google.com/drive/folders/1eSjhEyg7w4wZJS39ptEimIi-4H2stT7h?usp=sharing)
## Installation
```
pip install -r requirement.txt
```
We tested our python codes on the interactive mode of RunAI @ EPFL cluster. Please look through if you are new user of [RunAI](https://github.com/sori424/runLLM).
#### WANDB dataset/model versioning and loading
This repo is designed to work with wandb for dataset and model versioning, experimental visualization, etc.. Assuming that you have a [**wandb**](https://wandb.ai/home) account you first need to set your *WANDB_API_KEY*
```bash
export WANDB_API_KEY=XXXXXXXXXXXXXXXX
```
In the code above you can then specify: `--wandb_entity`, `--wandb_project` (the target project), `--wandb_name` (name of experiment), `--wandb_data` (for automatic loading of data), `--wandb_model` (for automatic loading of models). In **RunAI** wandb can be used by adding `WANDB_API_KEY` to the `env` variables.
## Quickstart
To run the code, simply execute the main bash script:
```
bash run.sh
```
For running setup, you can change the configurations below.
```
DATASET="socialiqa"
TASK="socialiqa"
MODEL_TYPE="opt" <-- select from ["t5", "opt", "bloom", "gpt"]
MODEL_NAME_OR_PATH="facebook/opt-66b" <-- volume directory with model checkpoints (.bin) or hugginface download ('facebook/opt-66b').
TRAIN_BATCH_SIZE=4 <-- training batch size
PREDICT_BATCH_SIZE=1 <-- prediction batch size
N_GPU=8 <-- number of GPUs to use
```
## In-context Learning
To run the code for vinalla **In-context Learning**, first modify the running command in `run.sh`:
```
accelerate launch main.py \
--do_inference \
--dataset ${DATASET} \
--task ${TASK} \
--model_type ${MODEL_TYPE} \
--model_name_or_path ${MODEL_NAME_OR_PATH} \
--predict_batch_size ${PREDICT_BATCH_SIZE} \
--wandb_name ${MODEL_NAME_OR_PATH}-${DATASET}-icl-4-rand \
--n_gpu ${N_GPU} \
--max_data 0 \
--do_icl \ <-- **Add this flag**
--num_examples 2 <-- **Number of demonstrations used**
```
Then, execute the script. To use examples pre-selected by the KNN method, modify the running command:
```
accelerate launch main.py \
--do_inference \
--dataset ${DATASET} \
--task ${TASK} \
--model_type ${MODEL_TYPE} \
--model_name_or_path ${MODEL_NAME_OR_PATH} \
--predict_batch_size ${PREDICT_BATCH_SIZE} \
--wandb_name ${MODEL_NAME_OR_PATH}-${DATASET}-icl-4-rand \
--n_gpu ${N_GPU} \
--max_data 0 \
--do_icl \
--num_examples 2
--search <-- **Add this flag**
--encoder simcse <-- **Name of the sentence encoder for embedding**
```
Then, execute the script.
## KNN Example Selection
```
python dynamic_icl.py \
--dataset $DATASET_NAME \
--task $TASK_NAME \
--encoder_name simcse \ <-- nli_mean or simcse
--metric cosine \ <-- cosine or euclidean
--num_neighbors 16
```
The output file will be under the name `$DATA_DIR/$DATASET/train_$ENCODER_NAME.json`