# MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance


*MLLM-Protector (logo generated by DALL·E 3)*


This repository contains the code for the paper titled "MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance". [[Link to our paper](https://arxiv.org/abs/2401.02906)]

## Install Packages

```
conda create -n mllm_protector python=3.10 -y
conda activate mllm_protector
pip install -e .
```

## Download pretrained LLM
Obtain the weights for open-llama-3b-v2 from [here](https://huggingface.co/openlm-research/open_llama_3b_v2)
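
If you prefer to script the download, a minimal sketch using `huggingface_hub` is shown below; the local directory is just an example path, not one expected by the repo.

```python
# Sketch: download the open_llama_3b_v2 weights from the Hugging Face Hub.
# The local_dir is an arbitrary example path.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="openlm-research/open_llama_3b_v2",
    local_dir="checkpoints/open_llama_3b_v2",
)
```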

## Download checkpoints for harm detector and detoxifier
Obtain the LoRA checkpoint for the harm detector based on open-llama-3b from [here](https://huggingface.co/renjiepi/protector_detector_3b_lora)

Obtain the LoRA checkpoint for the harm detector based on llama2-7b from [here](https://huggingface.co/renjiepi/protector_detector_7b_lora)

Obtain the LoRA checkpoint for the detoxifier from [here](https://huggingface.co/renjiepi/mllm_protector_detoxifier)
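
The released LoRA checkpoints can be fetched the same way; again, the local directories below are arbitrary examples.

```python
# Sketch: fetch the released LoRA checkpoints from the Hugging Face Hub.
# The local directory names are arbitrary examples.
from huggingface_hub import snapshot_download

for repo_id in [
    "renjiepi/protector_detector_3b_lora",
    "renjiepi/protector_detector_7b_lora",
    "renjiepi/mllm_protector_detoxifier",
]:
    snapshot_download(repo_id=repo_id, local_dir=f"checkpoints/{repo_id.split('/')[-1]}")
```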

You may use the harm detector to check the harmfulness of responses generated by the MLLM; it also serves as a proxy for GPT-4 API calls.
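
As a rough illustration of that use, the sketch below loads a merged harm detector (see the Merge LoRA step below) and asks it to judge a single response. The yes/no prompt and the decoding are assumptions for illustration only, not the exact template used by the released checkpoints; consult the repo's evaluation scripts for the real format.

```python
# Sketch: query the merged harm detector about a candidate MLLM response.
# ASSUMPTIONS: the detector is used as a causal LM and the yes/no prompt
# below is purely illustrative; see the repo's eval scripts for the exact
# template and parsing used in the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

detector_path = "path-to-merged-model"  # output of the LoRA merge step below

tokenizer = AutoTokenizer.from_pretrained(detector_path)
model = AutoModelForCausalLM.from_pretrained(
    detector_path, torch_dtype=torch.float16, device_map="auto"
)

response = "I'm sorry, I can't help with that."
prompt = (
    "Decide whether the following response is harmful. Answer yes or no.\n"
    f"Response: {response}\nAnswer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=5)
answer = tokenizer.decode(
    output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer.strip())
```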
## Merge LoRA
```
python scripts/merge_peft_adapter.py --base_model_name path-to-llama_3b_v2 --adapter_model_name path-to-lora --output_name path-to-merged-model
```
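
For reference, a merge of this kind typically follows the standard `peft` pattern sketched below; this is not the exact contents of `scripts/merge_peft_adapter.py`, and the paths are placeholders.

```python
# Sketch of the standard peft LoRA-merge flow; paths are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("path-to-llama_3b_v2")
tokenizer = AutoTokenizer.from_pretrained("path-to-llama_3b_v2")

model = PeftModel.from_pretrained(base, "path-to-lora")
model = model.merge_and_unload()  # fold the LoRA weights into the base model

model.save_pretrained("path-to-merged-model")
tokenizer.save_pretrained("path-to-merged-model")
```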
## Download augmented training data
You may obtain the augmented dataset from [here](https://huggingface.co/datasets/renjiepi/harmful_vs_unharmful)
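
The dataset can also be loaded directly with the `datasets` library; the sketch below does not assume particular split or column names, so inspect the loaded object (or the dataset card) to see them.

```python
# Sketch: pull the augmented harmful-vs-unharmful data from the Hugging Face Hub.
from datasets import load_dataset

data = load_dataset("renjiepi/harmful_vs_unharmful")
print(data)                            # available splits and columns
first_split = next(iter(data.values()))
print(first_split[0])                  # peek at one example
```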

## Prepare evaluation data

```
mkdir eval_polite
```
Prepare benchmark data from [MM-SafetyBench](https://github.com/isXinLiu/MM-SafetyBench).

Here is the data structure:

```
dataset/coco/
├── gpt4_generated_questions/
├── imgs/
├── processed_questions/
├── coco_task_annotation.json
```
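
A quick way to sanity-check that the data was placed as expected (using the example root shown above):

```python
# Sketch: verify that the MM-SafetyBench data matches the layout above.
from pathlib import Path

root = Path("dataset/coco")  # example root from this README
expected = [
    "gpt4_generated_questions",
    "imgs",
    "processed_questions",
    "coco_task_annotation.json",
]
for name in expected:
    status = "ok" if (root / name).exists() else "MISSING"
    print(f"{name}: {status}")
```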
## Train Harm Detector

```
bash scripts/train_harm_detector.sh
```

## Train Detoxifier

```
bash scripts/train_detoxifier.sh
```

## Generate responses in parallel
```
bash llava/eval/eval_multi_safeguard.sh path-to-llava path-to-result num_gpu temperature path-to-detector path-to-detoxifier
```

## Evaluation
We adopt the recently proposed MLLM jailbreak benchmark MM-SafetyBench for evaluation; please follow its [instructions](https://github.com/isXinLiu/MM-SafetyBench) to set up the evaluation benchmark. Thanks for the great work!
## Acknowledgement
The project is built on top of the amazing multimodal large language model [LLaVA](https://github.com/haotian-liu/LLaVA).
Thanks for their great work!

If you find our work useful for your research or applications, please cite using this BibTeX:
```bibtex
@misc{pi2024mllmprotector,
      title={MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance},
      author={Renjie Pi and Tianyang Han and Yueqi Xie and Rui Pan and Qing Lian and Hanze Dong and Jipeng Zhang and Tong Zhang},
      year={2024},
      eprint={2401.02906},
      archivePrefix={arXiv},
      primaryClass={cs.CR}
}
```