# 🔎 VL-Uncertainty

[Ruiyang Zhang](https://ruiyang-061x.github.io/), [Hu Zhang](https://huzhangcs.github.io/), [Zhedong Zheng*](https://www.zdzheng.xyz/)

**[Website](https://vl-uncertainty.github.io/)** | **[Paper](https://arxiv.org/abs/2411.11919)** | **[Code](https://github.com/Ruiyang-061X/VL-Uncertainty)**

## 🔥 News

- 2025.3.16: ✨ Check out our newest work: [Uncertainty-o](https://github.com/Ruiyang-061X/Uncertainty-o), which unveils uncertainty in Large Multimodal Models (LMMs) in a model-agnostic manner, supporting both large comprehension models and large generation models.
- 2024.12.19: 🐣 Source code of [VL-Uncertainty](https://arxiv.org/abs/2411.11919) is released!

## ⚡ Overview

![](.asset/img/overview.png)

## 🛠️ Install

- Create conda environment.

```
conda create -n VL-Uncertainty python=3.11;

conda activate VL-Uncertainty;
```

- Install dependencies.

```
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121;

pip install transformers datasets flash-attn accelerate timm numpy sentencepiece protobuf qwen_vl_utils;
```

(Tested on NVIDIA H100 PCIe-80G, NVIDIA A100-PCIE-40GB, and A6000-48G)
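Before running anything, it can help to confirm that the installed packages actually resolve. The helper below is a hypothetical convenience, not part of this repository; it only checks importability and does not touch the GPU:

```python
import importlib.util

# Packages installed in the step above.
REQUIRED = ["torch", "torchvision", "transformers", "datasets", "accelerate"]

def missing_packages(names):
    """Return the subset of names that cannot be imported in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies found.")
```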

## 🚀 Quick Start

- Run our demo code.

```
python demo.py;
```

- This should produce the output below. **VL-Uncertainty** successfully estimates high uncertainty for the wrong LVLM answer and thereby detects the hallucination!

```
--------------------------------------------------
- Demo image: .asset/img/titanic.png
- Question: What is the name of this movie?
- GT answer: Titanic.
--------------------------------------------------
- LVLM answer: The movie in the image is "Coco."
- LVLM answer accuracy: Wrong
--------------------------------------------------
- Estimated uncertainty: 2.321928094887362
- Uncertainty threshold: 1.0
--------------------------------------------------
- Hallucination prediction: Is hallucination
- Hallucination detection: Success!
--------------------------------------------------
```
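The estimated uncertainty in the demo equals log2(5), which is what an entropy over answer clusters yields when five sampled answers all land in different clusters. A minimal sketch of that arithmetic, with clustering simplified to exact string matching (the actual method groups answers by meaning):

```python
import math
from collections import Counter

def cluster_entropy(answers):
    """Shannon entropy (base 2) over clusters of equivalent answers.
    Clustering is simplified here to exact string matching."""
    counts = Counter(answers)
    n = len(answers)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Five sampled answers that all disagree -> maximal uncertainty log2(5).
answers = ["Coco", "Up", "Frozen", "Moana", "Brave"]
u = cluster_entropy(answers)          # ~2.3219, matching the demo output
is_hallucination = u > 1.0            # demo threshold -> flagged as hallucination
```

With the demo's threshold of 1.0, any entropy above it is flagged as hallucination, which is exactly the decision shown above.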

## 📈 Run

- For MM-Vet (Free-form benchmark)

```
bash run/run_MMVet.sh;
```

- For LLaVABench (Free-form benchmark)

```
bash run/run_LLaVABench.sh;
```

- For MMMU (Multi-choice benchmark)

```
bash run/run_MMMU.sh;
```

- For ScienceQA (Multi-choice benchmark)

```
bash run/run_ScienceQA.sh;
```

## πŸ„ Examples

- VL-Uncertainty successfully detects LVLM hallucination:

![](.asset/img/example_1.png)

- VL-Uncertainty also assigns low uncertainty to correct answers and identifies them as non-hallucinatory:

![](.asset/img/example_2.png)

- VL-Uncertainty generalizes effectively to physical-world scenarios. (The picture below is my laptop, captured with an iPhone.)

![](.asset/img/example_3.png)

## ⌨️ Code Structure

- The code structure of this repository is as follows:

```
├── VL-Uncertainty/
│ ├── .asset/
│ │ ├── img/
│ │ │ ├── logo.png
│ │ │ ├── titanic.png # For demo
│ ├── benchmark/
│ │ ├── LLaVABench.py # Free-form benchmark
│ │ ├── MMMU.py # Multi-choice benchmark
│ │ ├── MMVet.py # Free-form benchmark
│ │ ├── ScienceQA.py # Multi-choice benchmark
│ ├── llm/
│ │ ├── Qwen.py # LLM class
│ ├── lvlm/
│ │ ├── InternVL.py # Supports 26B, 8B, and 1B
│ │ ├── LLaVA.py # Supports 13B, 7B
│ │ ├── LLaVANeXT.py # Supports 13B, 7B
│ │ ├── Qwen2VL.py # Supports 72B, 7B, 2B
│ ├── run/
│ │ ├── run_LLaVABench.sh # Benchmark VL-Uncertainty on LLaVABench
│ │ ├── run_MMMU.sh # Benchmark VL-Uncertainty on MMMU
│ │ ├── run_MMVet.sh # Benchmark VL-Uncertainty on MMVet
│ │ ├── run_ScienceQA.sh # Benchmark VL-Uncertainty on ScienceQA
│ ├── util/
│ │ ├── misc.py # Helper functions
│ │ ├── textual_perturbation.py # Various textual perturbations
│ │ ├── visual_perturbation.py # Various visual perturbations
│ ├── .gitignore
│ ├── README.md
│ ├── VL-Uncertainty.py # Includes semantic-equivalent perturbation, uncertainty estimation, and hallucination detection
│ ├── demo.py # Quick start demo
```
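The two perturbation modules can be pictured with a toy sketch. Everything below is illustrative: the rewording templates and the brightness jitter are hypothetical stand-ins, not the strategies actually implemented in `util/textual_perturbation.py` and `util/visual_perturbation.py`:

```python
def textual_perturbations(question, n=4):
    """Semantic-equivalent rewordings of a question (hypothetical templates)."""
    templates = [
        "{q}",
        "Please answer: {q}",
        "{q} Answer concisely.",
        "Look at the image carefully. {q}",
    ]
    return [t.format(q=question) for t in templates[:n]]

def visual_perturbation(pixels, strength):
    """Toy brightness jitter on a grayscale pixel grid, clamped to [0, 255]."""
    return [[min(255, max(0, int(p * (1 + strength)))) for p in row]
            for row in pixels]

prompts = textual_perturbations("What is the name of this movie?")
brighter = visual_perturbation([[100, 200], [30, 255]], 0.5)
```

The idea in both cases is the same: each perturbation should preserve the question's meaning and the image's content, so a reliable model should keep answering consistently while an unreliable one scatters.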

## ✨ Acknowledgement

- [LLaVA](https://github.com/haotian-liu/LLaVA), [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT), [InternVL](https://github.com/OpenGVLab/InternVL), [Qwen2-VL](https://github.com/QwenLM/Qwen2-VL): Thanks a lot for these foundational efforts!
- [semantic_uncertainty](https://github.com/jlko/semantic_uncertainty): Our work draws great inspiration from this project!

## 📎 Citation

If you find our work useful for your research and application, please cite using this BibTeX:

```bibtex
@article{zhang2024vl,
  title={VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation},
  author={Zhang, Ruiyang and Zhang, Hu and Zheng, Zhedong},
  journal={arXiv preprint arXiv:2411.11919},
  year={2024}
}
```