https://github.com/pipilurj/MLLM-protector

The official repository for paper "MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance"
# MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance


[Image: MLLM-Protector. Generated by DALL·E 3]


This repository contains the code for the paper titled "MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance". [[Link to our paper](https://arxiv.org/abs/2401.02906)]

## Install Packages

```
conda create -n mllm_protector python=3.10 -y
conda activate mllm_protector
pip install -e .
```

## Download pretrained LLM
Obtain the weights for OpenLLaMA 3B v2 from [here](https://huggingface.co/openlm-research/open_llama_3b_v2).

## Download checkpoints for the harm detector and detoxifier
Obtain the LoRA checkpoint for the harm detector based on open-llama-3b from [here](https://huggingface.co/renjiepi/protector_detector_3b_lora).

Obtain the LoRA checkpoint for the harm detector based on llama2-7b from [here](https://huggingface.co/renjiepi/protector_detector_7b_lora).

Obtain the LoRA checkpoint for the detoxifier from [here](https://huggingface.co/renjiepi/mllm_protector_detoxifier).

You can use the harm detector to check whether the responses generated by the MLLM are harmful. It can also serve as a proxy for GPT-4 API calls.

## Merge LoRA
```
python scripts/merge_peft_adapter.py --base_model_name path-to-llama_3b_v2 --adapter_model_name path-to-lora --output_name path-to-merged-model
```
## Download augmented training data
You may obtain the augmented dataset from [here](https://huggingface.co/datasets/renjiepi/harmful_vs_unharmful)
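
The downloads listed in the sections above can also be fetched from the command line. The following is a minimal sketch, assuming the `huggingface-cli` tool from the `huggingface_hub` package is installed. The helper functions (`run`, `download_model`, `download_dataset`), the `DRY_RUN` switch, and the local `checkpoints/` and `data/` directories are illustrative choices, not part of this repository. By default the script only prints the commands it would run.

```bash
# Assumption: the Hugging Face CLI is available, e.g. via
#   pip install -U "huggingface_hub[cli]"
# The local target directories below are placeholders.
DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 to actually download

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# usage: download_model <repo_id> <local_dir>
download_model() {
  run huggingface-cli download "$1" --local-dir "$2"
}

# usage: download_dataset <repo_id> <local_dir>
download_dataset() {
  run huggingface-cli download "$1" --repo-type dataset --local-dir "$2"
}

# Base LLM, LoRA checkpoints, and augmented training data linked in this README.
download_model openlm-research/open_llama_3b_v2 checkpoints/open_llama_3b_v2
download_model renjiepi/protector_detector_3b_lora checkpoints/protector_detector_3b_lora
download_model renjiepi/protector_detector_7b_lora checkpoints/protector_detector_7b_lora
download_model renjiepi/mllm_protector_detoxifier checkpoints/mllm_protector_detoxifier
download_dataset renjiepi/harmful_vs_unharmful data/harmful_vs_unharmful
```

Running the script with `DRY_RUN=0` performs the downloads. The resulting local paths can then be passed to the merge and evaluation commands in place of the `path-to-...` placeholders.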

## Prepare evaluation data

```
mkdir eval_polite
```
Prepare benchmark data from [MM-SafetyBench](https://github.com/isXinLiu/MM-SafetyBench).

Here is the data structure:

```
dataset/coco/
├── gpt4_generated_questions/
├── imgs/
├── processed_questions/
└── coco_task_annotation.json
```
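Before launching evaluation, it may help to confirm that the layout above is in place. The sketch below is one way to do so. The function name `check_layout` is hypothetical and not part of this repository; the entry names are taken from the listing above.

```bash
# Hypothetical helper: report which expected entries are missing under a dataset root.
# Returns 0 when the layout is complete and 1 otherwise.
check_layout() {
  local root="$1" missing=0 d
  for d in gpt4_generated_questions imgs processed_questions; do
    if [ ! -d "$root/$d" ]; then
      echo "missing directory: $root/$d" >&2
      missing=1
    fi
  done
  if [ ! -f "$root/coco_task_annotation.json" ]; then
    echo "missing file: $root/coco_task_annotation.json" >&2
    missing=1
  fi
  return "$missing"
}
```

For example, `check_layout dataset/coco && echo "layout OK"` prints `layout OK` only when all four entries exist.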
## Train Harm Detector

```
bash scripts/train_harm_detector.sh
```

## Train Detoxifier

```
bash scripts/train_detoxifier.sh
```

## Generate responses in parallel
```
bash llava/eval/eval_multi_safeguard.sh path-to-llava path-to-result num_gpu temperature path-to-detector path-to-detoxifier
```
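The script takes six positional arguments, so naming them in variables can make an invocation easier to read. The following is a sketch only: the values of `NUM_GPU` and `TEMPERATURE` are placeholders rather than recommended settings, and the `launch_eval` wrapper and `DRY_RUN` switch are hypothetical additions, not part of this repository.

```bash
# Placeholder values: replace each with your own paths and settings.
LLAVA_PATH="path-to-llava"
RESULT_PATH="path-to-result"
NUM_GPU=4          # assumed example value
TEMPERATURE=0      # assumed example value
DETECTOR_PATH="path-to-detector"
DETOXIFIER_PATH="path-to-detoxifier"
DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 to actually launch

# Hypothetical wrapper: assemble the command in the argument order shown above,
# then print it (dry run) or execute it.
launch_eval() {
  set -- bash llava/eval/eval_multi_safeguard.sh \
    "$LLAVA_PATH" "$RESULT_PATH" "$NUM_GPU" "$TEMPERATURE" \
    "$DETECTOR_PATH" "$DETOXIFIER_PATH"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

launch_eval
```

Printing the command first makes it easy to confirm that the detector and detoxifier paths are in the right positions before committing GPU time.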

## Evaluation
We evaluate on MM-SafetyBench, a recently proposed MLLM jailbreak benchmark. Please follow their [instructions](https://github.com/isXinLiu/MM-SafetyBench) to set up the evaluation benchmark. Thanks for the great work!

## Acknowledgement
The project is built on top of the amazing multimodal large language model [LLaVA](https://github.com/haotian-liu/LLaVA).
Thanks for this great work!

If you find our work useful for your research or applications, please cite using this BibTeX:
```bibtex
@misc{pi2024mllmprotector,
title={MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance},
author={Renjie Pi and Tianyang Han and Yueqi Xie and Rui Pan and Qing Lian and Hanze Dong and Jipeng Zhang and Tong Zhang},
year={2024},
eprint={2401.02906},
archivePrefix={arXiv},
primaryClass={cs.CR}
}
```