https://github.com/dunzeng/MORE
Code for EMNLP'24 paper - On Diversified Preferences of Large Language Model Alignment
- Host: GitHub
- URL: https://github.com/dunzeng/MORE
- Owner: dunzeng
- Created: 2024-02-18T07:13:50.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-08-06T15:24:19.000Z (over 1 year ago)
- Last Synced: 2025-04-14T11:07:42.004Z (8 months ago)
- Language: Python
- Size: 344 KB
- Stars: 16
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project:
- awesome-RLHF - official
README
# MORE (Multi-objective Reward Modeling)
[Code License](https://github.com/Linear95/DSP/blob/main/LICENSE)
[Data License](https://github.com/Linear95/DSP/blob/main/DATA_LICENSE)
[Python 3.8+](https://www.python.org/downloads/release/python-380/)
Code for paper "[On Diversified Preferences of Large Language Model Alignment](https://arxiv.org/abs/2312.07401)".
## Preparation
### 1. Install dependencies:
```pip install -r requirement.txt```
### 2. Download data:
Please download [data.zip](https://drive.google.com/drive/folders/10Mja3DRiXrFrp9Zg_arOR3wMOD6p8GsG?usp=drive_link) and unzip it to replace the `data` directory.
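If you want to sanity-check the download, the snippet below prints the size and record keys of each training split. It makes no assumption about field names, but it does assume each `*.train.json` file is a JSON list of dicts, which is only a guess about the layout.
```python
# Hypothetical sanity check on the unzipped data. It assumes each *.train.json
# file is a JSON list of dicts, which may not match the actual layout -- it
# simply prints whatever keys the first record contains.
import json
from pathlib import Path

DATA_DIR = Path("./data")

for name in ["helpful", "harmless", "oaast1", "webgpt", "summ"]:
    path = DATA_DIR / f"{name}.train.json"
    if not path.exists():
        print(f"missing: {path}")
        continue
    records = json.loads(path.read_text())
    if isinstance(records, list) and records:
        print(f"{name}: {len(records)} examples, first record keys: {sorted(records[0])}")
    else:
        print(f"{name}: unexpected top-level structure ({type(records).__name__})")
```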
## Run
Our experiments consist of 5 main steps:
### 1. Reward model training
```
REPO_DIR=./
DATA_DIR=./data
TRAIN_DATA_LIST="${DATA_DIR}/helpful.train.json \
${DATA_DIR}/harmless.train.json \
${DATA_DIR}/oaast1.train.json \
${DATA_DIR}/webgpt.train.json \
${DATA_DIR}/summ.train.json "
TEST_DATA_LIST="${DATA_DIR}/helpful.test.json \
${DATA_DIR}/harmless.test.json \
${DATA_DIR}/oaast1.test.json \
${DATA_DIR}/webgpt.test.json \
${DATA_DIR}/summ.test.json "
OUTPUT_DIR=""
deepspeed --num_gpus 8 train.py \
--do_train True \
--report_to tensorboard \
--eval_at_start False \
--model_name_or_path \
--train_data_path ${TRAIN_DATA_LIST} \
--eval_data_path ${TEST_DATA_LIST} \
--remove_unused_columns false \
--output_dir ${OUTPUT_DIR} \
--logging_dir ${OUTPUT_DIR} \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 8 \
--gradient_accumulation_steps 2 \
--evaluation_strategy steps \
--padding_side right \
--truncation_side left \
--pooling_type last \
--max_length 512 \
--save_strategy steps \
--learning_rate 1e-6 \
--eval_steps 50 \
--logging_steps 50 \
--save_steps 1000 \
--deepspeed ${REPO_DIR}/configs/default_offload_opt_param.json \
--tf32 false --fp16 false \
--model_type "" \
--gradient_checkpointing True \
--resampling True \
--resampling_size 40000 \
--shuffle True \
--more True \
--task_num 5 \
--reweight True \
--normalize l2 \
--alpha 0.99 \
--debug_mode False
```
Note:
- To run the `MultiTask` baseline, set `--more False` and change `per_device_train_batch_size` from 1 to 5.
- `--resampling True` subsamples the raw datasets; `--resampling_size` sets the number of examples drawn from each preference dataset.
- `--alpha` is the momentum parameter used to stabilize optimization (a minimal sketch of this kind of reweighting follows below); see `trainer.py` or the [paper](https://arxiv.org/pdf/2310.02702.pdf) for details.
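For intuition, here is a minimal, self-contained sketch of the kind of objective these flags describe: a pairwise Bradley-Terry reward loss per preference dataset, combined through task weights smoothed by an exponential moving average with momentum `alpha` and, under `--normalize l2`, rescaled to unit l2 norm. This is an illustration only, not the exact update rule in `trainer.py`; all function and variable names are ours.
```python
# Illustrative sketch of momentum-smoothed multi-task reweighting
# (NOT the exact algorithm in trainer.py -- see the paper for the real rule).
import torch
import torch.nn.functional as F

def pairwise_rm_loss(chosen_rewards, rejected_rewards):
    """Bradley-Terry ranking loss: -log sigmoid(r_chosen - r_rejected)."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

def combine_task_losses(task_losses, ema_losses, alpha=0.99, normalize="l2"):
    """Blend per-task losses into an EMA, derive task weights, return a scalar loss."""
    losses = torch.stack(task_losses)                        # shape: (task_num,)
    ema = alpha * ema_losses + (1.0 - alpha) * losses.detach()
    weights = ema / ema.norm(p=2) if normalize == "l2" else ema / ema.sum()
    return (weights * losses).sum(), ema

# Toy usage with task_num = 5 preference datasets and a batch of 4 pairs each.
ema = torch.ones(5)
task_losses = [
    pairwise_rm_loss(torch.randn(4, requires_grad=True), torch.randn(4, requires_grad=True))
    for _ in range(5)
]
total_loss, ema = combine_task_losses(task_losses, ema, alpha=0.99)
total_loss.backward()  # gradients flow through the weighted per-task losses
```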
### 2. Rejection Sampling Inference
```
REPO_DIR=./
accelerate launch --config_file ${REPO_DIR}/configs/inference.yaml ${REPO_DIR}/reward_model_inference.py \
--model_type pythia \
--model_name_or_path \
--data_path ${REPO_DIR}/data/hh_split_rm_alpaca_v0.sample.json \
--save_path ${REPO_DIR}/data/inference_data/all_data.json
```
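Conceptually, this step scores sampled responses with the trained reward model so that the best-scoring response per prompt can be kept for the next stage. The sketch below shows that best-of-n selection on toy data; the `prompt`/`answers`/`rewards` field names are hypothetical and are not the schema actually written by `reward_model_inference.py`.
```python
# Hypothetical best-of-n selection over reward-model scores.
# The "prompt"/"answers"/"rewards" fields are illustrative only.
import json

def best_of_n(record):
    """Return the highest-reward answer for one prompt."""
    best_idx = max(range(len(record["answers"])), key=lambda i: record["rewards"][i])
    return {"prompt": record["prompt"], "response": record["answers"][best_idx]}

records = [
    {"prompt": "How do I stay safe online?",
     "answers": ["Use strong, unique passwords.", "Share your password freely."],
     "rewards": [1.7, -2.3]},
]
selected = [best_of_n(r) for r in records]
print(json.dumps(selected, indent=2))
```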
### 3. Rejection Sampling Training
```
REPO_DIR=./
DATA_DIR="./data/"
TRAIN_DATA_LIST="${DATA_DIR}/helpful.train.json \
${DATA_DIR}/harmless.train.json"
deepspeed --num_gpus 8 rjs_training.py \
--model_type llama \
--do_train True \
--train_data_path ${TRAIN_DATA_LIST} \
--model_name_or_path ${REPO_DIR}/lm_base/alpaca-7b \
--output_dir ${REPO_DIR}/paper_final_checkpoints/alpaca-hh-sft \
--remove_unused_columns False \
--max_length 512 \
--report_to none \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 8 \
--gradient_accumulation_steps 8 \
--logging_strategy steps \
--logging_steps 1 \
--save_strategy epoch \
--num_train_epochs 1 \
--learning_rate 1e-6 \
--lr_scheduler_type cosine \
--evaluation_strategy no \
--warmup_ratio 0.05 \
--gradient_checkpointing True \
--deepspeed ${REPO_DIR}/configs/default_offload_opt_param.json
```
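For readers unfamiliar with rejection-sampling fine-tuning: the selected responses are used as ordinary supervised targets for the base model. The sketch below shows the common recipe of masking prompt tokens so that only the response contributes to the causal-LM cross-entropy; `rjs_training.py` may implement the details differently, and the numbers here are toy values.
```python
# Common SFT recipe (a sketch, not necessarily what rjs_training.py does):
# hide the prompt tokens from the loss so only the response is learned.
import torch
import torch.nn.functional as F

def response_only_labels(input_ids, prompt_len, ignore_index=-100):
    """Copy input_ids as labels but mask the prompt portion out of the loss."""
    labels = input_ids.clone()
    labels[:prompt_len] = ignore_index
    return labels

# Toy example: 6 prompt tokens followed by 4 response tokens.
input_ids = torch.arange(10)
labels = response_only_labels(input_ids, prompt_len=6)
logits = torch.randn(10, 32000, requires_grad=True)                  # (seq_len, vocab_size)
loss = F.cross_entropy(logits[:-1], labels[1:], ignore_index=-100)   # next-token shift
print(float(loss))
```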
### 4. Language Model Inference
```
REPO_DIR=./
MODEL_PATH=
accelerate launch --config_file configs/inference_fp16.yaml llm_inferencing.py \
--model_type llama \
--model_name_or_path ${MODEL_PATH} \
--data_path ${REPO_DIR}/data/ \
--save_path ${REPO_DIR}/data/gpt-eval-data/ \
--data_type helpful_and_harmless \
--max_length 512 \
--chunk_size 64 \
--sample_num 1
```
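The sketch below illustrates one plausible reading of `--chunk_size` (prompts processed per batch) and `--sample_num` (completions decoded per prompt); check `llm_inferencing.py` for the authoritative behavior. The generation call is stubbed out so the snippet runs standalone.
```python
# A guess at what --chunk_size and --sample_num control: prompts are processed
# chunk_size at a time, and each prompt is decoded sample_num times.
def iter_chunks(items, chunk_size):
    """Yield consecutive slices of at most chunk_size items."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]

def generate_stub(prompt, sample_idx):
    """Stand-in for the actual model.generate() call."""
    return f"<sample {sample_idx} for: {prompt}>"

prompts = [f"prompt-{i}" for i in range(130)]
chunk_size, sample_num = 64, 1
outputs = []
for chunk in iter_chunks(prompts, chunk_size):
    for prompt in chunk:
        outputs.extend(generate_stub(prompt, k) for k in range(sample_num))
print(len(outputs), "generations")  # 130 with sample_num = 1
```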
### 5. GPT Evaluation
```
REPO_DIR=./
TYPE=
DATA_PATH_A=${REPO_DIR}/data/gpt-eval-data/more_alpaca_${TYPE}.jsonl
DATA_PATH_B=${REPO_DIR}/data/gpt-eval-data/multitask_alpaca_${TYPE}.jsonl
SAVE_PATH=${REPO_DIR}/data/gpt-eval-data/win-rate/multitask-more-${TYPE}.jsonl
python evaluate.py \
--data_path_A ${DATA_PATH_A} \
--data_path_B ${DATA_PATH_B} \
--save_path ${SAVE_PATH} \
--task_type win_rate \
--prompt_type ${TYPE} \
--model_A_name MORE \
--model_B_name Multitask
```
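The win rate reported by this step can be aggregated as sketched below. The `winner` field and its values are assumptions about the JSONL that `evaluate.py` writes; the snippet only shows the arithmetic, with ties split evenly between the two models.
```python
# Hypothetical aggregation of pairwise GPT judgments into a win rate.
# The "winner" field and its values are assumed, not evaluate.py's real schema.
import json

def win_rate(path, model_a="MORE", model_b="Multitask"):
    wins_a = wins_b = ties = 0
    with open(path) as f:
        for line in f:
            winner = json.loads(line).get("winner")
            if winner == model_a:
                wins_a += 1
            elif winner == model_b:
                wins_b += 1
            else:
                ties += 1
    total = wins_a + wins_b + ties
    # Split ties evenly between the two models.
    return (wins_a + 0.5 * ties) / total if total else 0.0

# Usage: win_rate("<SAVE_PATH from the command above>")
```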
## Acknowledgement
Some codes of this repo are modified from: [DSP](https://github.com/Linear95/DSP) and [llm_codebase](https://github.com/underwoodnoble/llm_codebase).
## Citation
Please cite our paper if you find the code useful.
```
@misc{zeng2024diversified,
      title={On Diversified Preferences of Large Language Model Alignment},
      author={Dun Zeng and Yong Dai and Pengyu Cheng and Longyue Wang and Tianhao Hu and Wanshun Chen and Nan Du and Zenglin Xu},
      year={2024},
      eprint={2312.07401},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}
```