Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/pipilurj/MLLM-protector

The official repository for the paper "MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance"
- Host: GitHub
- URL: https://github.com/pipilurj/MLLM-protector
- Owner: pipilurj
- License: apache-2.0
- Created: 2024-01-05T08:43:19.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-04-21T04:08:28.000Z (8 months ago)
- Last Synced: 2024-08-12T08:13:07.284Z (4 months ago)
- Language: Python
- Size: 1.92 MB
- Stars: 29
- Watchers: 1
- Forks: 2
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-LVLM-Attack - GitHub
- Awesome-MLLM-Safety - GitHub (Defense)
README
# MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance
This repository contains the code for the paper titled "MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance". [[Link to our paper](https://arxiv.org/abs/2401.02906)]
## Install Packages
```
conda create -n mllm_protector python=3.10 -y
conda activate mllm_protector
pip install -e .
```
## Download pretrained LLM
Obtain weights for open-llama-3b from [here](https://huggingface.co/openlm-research/open_llama_3b_v2).

## Download checkpoints for harm detector and detoxifier

Obtain the LoRA checkpoint for the harm detector based on open-llama-3b from [here](https://huggingface.co/renjiepi/protector_detector_3b_lora).

Obtain the LoRA checkpoint for the harm detector based on llama2-7b from [here](https://huggingface.co/renjiepi/protector_detector_7b_lora).

Obtain the LoRA checkpoint for the detoxifier from [here](https://huggingface.co/renjiepi/mllm_protector_detoxifier).
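If you prefer to script these downloads, the same repositories can be fetched with `huggingface_hub` (a minimal sketch; the `local_dir` values are arbitrary local paths, not locations the training scripts expect):

```python
# A sketch for fetching the weights programmatically with huggingface_hub.
# The repo IDs are the ones linked above; local_dir names are arbitrary.
from huggingface_hub import snapshot_download

snapshot_download("openlm-research/open_llama_3b_v2", local_dir="open_llama_3b_v2")
snapshot_download("renjiepi/protector_detector_3b_lora", local_dir="protector_detector_3b_lora")
snapshot_download("renjiepi/mllm_protector_detoxifier", local_dir="mllm_protector_detoxifier")
```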
You may use the harm detector to check responses generated by the MLLM for harmfulness; it also serves as an inexpensive proxy for GPT-4 API calls.
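As a rough illustration, a harm detector checkpoint could be queried like any Hugging Face classifier once the LoRA weights are merged (next section). This is a sketch only; the model class, prompt format, and label mapping below are assumptions, so consult the repo's evaluation scripts for the actual interface:

```python
# Hypothetical usage sketch: score an MLLM response with the merged harm detector.
# Assumes a sequence-classification head where label 1 means "harmful";
# the real model class and label order may differ.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

detector_path = "path-to-merged-model"  # output of merge_peft_adapter.py
tokenizer = AutoTokenizer.from_pretrained(detector_path)
model = AutoModelForSequenceClassification.from_pretrained(detector_path)
model.eval()

response = "Here is how to build ..."  # candidate MLLM output to check
inputs = tokenizer(response, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
harm_prob = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"harmful probability: {harm_prob:.3f}")
```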
## Merge LoRA
```
python scripts/merge_peft_adapter.py --base_model_name path-to-llama_3b_v2 --adapter_model_name path-to-lora --output_name path-to-merged-model
```
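For reference, the merge step is roughly equivalent to the following PEFT calls (a sketch, not the actual script, which may additionally handle dtypes and tokenizer saving):

```python
# Rough equivalent of scripts/merge_peft_adapter.py (a sketch for illustration).
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("path-to-llama_3b_v2")
model = PeftModel.from_pretrained(base, "path-to-lora")
merged = model.merge_and_unload()  # fold the LoRA weights into the base model
merged.save_pretrained("path-to-merged-model")
AutoTokenizer.from_pretrained("path-to-llama_3b_v2").save_pretrained("path-to-merged-model")
```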
## Download augmented training data
You may obtain the augmented dataset from [here](https://huggingface.co/datasets/renjiepi/harmful_vs_unharmful).
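For a quick sanity check, the dataset can be loaded with the `datasets` library (a minimal sketch; split and column names are whatever the dataset card defines):

```python
# A sketch: inspect the augmented training data with the datasets library.
from datasets import load_dataset

ds = load_dataset("renjiepi/harmful_vs_unharmful")
print(ds)  # prints the available splits and columns
```

## Prepare evaluation data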
```
mkdir eval_polite
```
Prepare the benchmark data from [MM-SafetyBench](https://github.com/isXinLiu/MM-SafetyBench). Here is the data structure:
```
dataset/coco/
├── gpt4_generated_questions/
├── imgs/
├── processed_questions/
└── coco_task_annotation.json
```
## Train Harm Detector

```
bash scripts/train_harm_detector.sh
```

## Train Detoxifier
```
bash scripts/train_detoxifier.sh
```

## Generate responses in parallel
```
bash llava/eval/eval_multi_safeguard.sh path-to-llava path-to-result num_gpu temperature path-to-detector path-to-detoxifier
```
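For example, a concrete invocation might look like the following (every path and value here is a placeholder for illustration, not a file shipped with this repo):

```
bash llava/eval/eval_multi_safeguard.sh \
    checkpoints/llava-v1.5-7b \
    results/safeguard_outputs \
    4 \
    0.2 \
    checkpoints/harm_detector_merged \
    checkpoints/detoxifier_merged
```

## Evaluation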
We adopt the newly proposed MLLM jailbreak benchmark for evaluation; please follow the [instructions](https://github.com/isXinLiu/MM-SafetyBench) to set up the evaluation bench. Thanks for the great work!
## Acknowledgement
The project is built on top of the amazing multimodal large language model [LLaVA](https://github.com/haotian-liu/LLaVA).
Thanks for their great work!

## Citation

If you find our work useful for your research or applications, please cite using this BibTeX:
```bibtex
@misc{pi2024mllmprotector,
      title={MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance},
      author={Renjie Pi and Tianyang Han and Yueqi Xie and Rui Pan and Qing Lian and Hanze Dong and Jipeng Zhang and Tong Zhang},
      year={2024},
      eprint={2401.02906},
      archivePrefix={arXiv},
      primaryClass={cs.CR}
}
```