Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning
- Host: GitHub
- URL: https://github.com/GasolSun36/MVP
- Owner: GasolSun36
- Created: 2024-04-12T13:56:59.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-05T03:33:49.000Z (10 months ago)
- Last Synced: 2024-08-05T04:44:50.271Z (10 months ago)
- Language: Python
- Homepage:
- Size: 7.94 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
Awesome Lists containing this project
- awesome-mllm-hallucination
README
# Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning
This repository is the official implementation of paper [Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning](https://arxiv.org/pdf/2408.17150).
## Pipeline of MVP

## Requirements
To install requirements:
```setup
pip install -r requirements.txt
```

### Datasets
Datasets are in `MVP/benchmark`. Before inference, you need to download the corresponding images into the `MVP/data` folder (a layout sketch follows).
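The image download sources are not pinned down in this README, so the following is only a sketch; the directory names are inferred from the commands below (e.g. `MVP/data/coco` for the POPE/coco benchmark), and `output`/`logs` are the folders the scripts write to:

```bash
# Sketch only: folder names are inferred from the commands in this README.
# Place the downloaded images (e.g. COCO images for the POPE/coco benchmark)
# under MVP/data/<dataset>.
mkdir -p MVP/data/coco MVP/output MVP/logs

# The question files already ship with the repository, e.g.:
#   MVP/benchmark/POPE/coco/coco_pope_popular.json
```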
## Image Caption
In the MVP framework, the images need to be captioned first; you can use the following command from `caption.sh`:
```bash
python caption/llava_caption.py \
--model-path liuhaotian/llava-v1.5-7b \
--image-folder MVP/data/coco \
--question-file MVP/benchmark/POPE/coco/coco_pope_popular.json \
--answers-file MVP/output/coco_pope_popular_caption_llava_bottom-up.jsonl \
--perspective bottom-up \
--temperature 0.7 \
--top_p 0.95 \
--max_new_tokens 512 \
--num_beams 1 --seed 336
```
This will create a file under the `output` folder that stores all the captions. Note that you need to run the command once for each value of the `--perspective` argument (`bottom-up`, `top-down`, `regular`); a loop that does this is sketched below. **We have prepared the caption files, and they can be used directly from the `output` folder.**
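For convenience, the three runs can be wrapped in a small loop. This is only a sketch (not a script shipped with the repository); it reuses exactly the arguments shown above and only varies `--perspective` and the answers file name:

```bash
#!/bin/bash
# Sketch: run the captioning step once per perspective (bottom-up, top-down, regular).
for perspective in bottom-up top-down regular; do
  python caption/llava_caption.py \
    --model-path liuhaotian/llava-v1.5-7b \
    --image-folder MVP/data/coco \
    --question-file MVP/benchmark/POPE/coco/coco_pope_popular.json \
    --answers-file "MVP/output/coco_pope_popular_caption_llava_${perspective}.jsonl" \
    --perspective "$perspective" \
    --temperature 0.7 \
    --top_p 0.95 \
    --max_new_tokens 512 \
    --num_beams 1 --seed 336
done
```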
### MVP
To employ MVP, you can use the following command in `MVP_llava.sh`:
```bash
#!/bin/bash

declare -a files=("MVP_llava.py")
declare -a perspectives=("bottom-up" "top-down" "regular")
declare -a question_files=("coco")
declare -a question_types=("popular")

for file in "${files[@]}"; do
  for perspective in "${perspectives[@]}"; do
    for dataset in "${question_files[@]}"; do
      for type in "${question_types[@]}"; do
        question_file="MVP/benchmark/POPE/${dataset}/${dataset}_pope_${type}.json"
        output_file="MVP/output/$(basename "$file" .py)_${perspective}_${dataset}_${type}_pope.jsonl"
        log_file="MVP/logs/$(basename "$file" .py)_${perspective}_${dataset}_${type}_pope.log"

        nohup srun -p -n1 -N1 --gres=gpu:1 --quotatype=reserved python "MVP/$file" \
          --model-path liuhaotian/llava-v1.5-7b \
          --image-folder "MVP/data/${dataset}" \
          --question-file "$question_file" \
          --perspective "$perspective" \
          --answers-file "$output_file" \
          --temperature 0.7 \
          --top_p 1.0 --topk 3 \
          --max_new_tokens 50 \
          --num_beams 1 --seed 336 \
          1>"$log_file" 2>&1 &

        sleep 3
      done
    done
  done
done
```
After that, you can obtain the result files in the `output` folder. If you only need a single configuration without the Slurm/`nohup` wrapper, see the sketch below.
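As a minimal sketch of what a single run boils down to (the entry point `MVP/MVP_llava.py` is inferred from the `files` array above; the arguments are the same as in the script):

```bash
# Sketch: one MVP run for a single perspective on the POPE/coco popular split.
python MVP/MVP_llava.py \
  --model-path liuhaotian/llava-v1.5-7b \
  --image-folder MVP/data/coco \
  --question-file MVP/benchmark/POPE/coco/coco_pope_popular.json \
  --perspective bottom-up \
  --answers-file MVP/output/MVP_llava_bottom-up_coco_popular_pope.jsonl \
  --temperature 0.7 \
  --top_p 1.0 --topk 3 \
  --max_new_tokens 50 \
  --num_beams 1 --seed 336
```

Repeat this for the other two perspectives so that the evaluation step below has all three result files.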
### Important arguments
- `--perspective`: the caption perspective.
- `--topk`: use the top-k reasoning paths.
## Evaluation
To evaluate the performance of MVP, you can use the following command in `eval_pope.sh`:
```bash
python eval/eval_pope.py \
--gt_files MVP/benchmark/POPE/coco/coco_pope_popular.json \
--gen_files_bottom_up MVP/output/MVP_llava_bottom-up_coco_popular_pope.jsonl \
--gen_files_top_down MVP/output/MVP_llava_top-down_coco_popular_pope.jsonl \
--gen_files_regular MVP/output/MVP_llava_regular_coco_popular_pope.jsonl \
--a 0.4 --b 0.4 --c 0.2
```

### Important arguments
- `--a`: the weight of the bottom-up path.
- `--b`: the weight of the top-down path.
- `--c`: the weight of the regular path.
## Experiment Results
MVP's performance on POPE:

MVP's performance on MME:

## Case Study

## How to cite
If you are interested in or inspired by this work, you can cite us with:
```bibtex
@misc{qu2024lookcomparedecidealleviating,
  title={Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning},
  author={Xiaoye Qu and Jiashuo Sun and Wei Wei and Yu Cheng},
  year={2024},
  eprint={2408.17150},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2408.17150},
}
```