Official code for our paper: "VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation".
- Host: GitHub
- URL: https://github.com/ruiyang-061x/vl-uncertainty
- Owner: Ruiyang-061X
- Created: 2024-11-07T08:28:25.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-03-18T13:11:12.000Z (7 months ago)
- Last Synced: 2025-04-10T00:06:30.490Z (6 months ago)
- Topics: hallucination, hallucination-detection, hallucination-evaluation, large-vision-language-model, multi-modal, multi-modal-large-language-model, uncertainty, uncertainty-analysis, uncertainty-estimation, uncertainty-quantification, vision-language, vision-language-model
- Language: Python
- Homepage: https://vl-uncertainty.github.io/
- Size: 7.12 MB
- Stars: 31
- Watchers: 2
- Forks: 2
- Open Issues: 4
Metadata Files:
- Readme: README.md
# VL-Uncertainty
[Ruiyang Zhang](https://ruiyang-061x.github.io/), [Hu Zhang](https://huzhangcs.github.io/), [Zhedong Zheng*](https://www.zdzheng.xyz/)
**[Website](https://vl-uncertainty.github.io/)** | **[Paper](https://arxiv.org/abs/2411.11919)** | **[Code](https://github.com/Ruiyang-061X/VL-Uncertainty)**
## 🔥 News
- 2025.3.16: ✨ Check out our latest work: [Uncertainty-o](https://github.com/Ruiyang-061X/Uncertainty-o), which unveils uncertainty in Large Multimodal Models (LMMs) in a model-agnostic manner, supporting both Large Comprehension Models and Large Generation Models.
- 2024.12.19: 📣 Source code of [VL-Uncertainty](https://arxiv.org/abs/2411.11919) is released!

## ⚡ Overview

## 🛠️ Install
- Create and activate the conda environment.
```
conda create -n VL-Uncertainty python=3.11
conda activate VL-Uncertainty
```
- Install dependencies.
```
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install transformers datasets flash-attn accelerate timm numpy sentencepiece protobuf qwen_vl_utils
```
(Tested on NVIDIA H100 PCIe-80G, NVIDIA A100-PCIE-40GB, and A6000-48G.)
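As a quick sanity check of the environment (this snippet is not part of the repository), verify that the core libraries import and that PyTorch sees the GPU:
```python
# Quick environment sanity check (not part of the repository).
import torch
import transformers

print("torch:", torch.__version__, "| transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())  # expect True on a supported GPU
```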
## Quick Start
- Run our demo code.
```
python demo.py
```
- This should produce the results below. **VL-Uncertainty** estimates high uncertainty for the wrong LVLM answer and thereby detects the hallucination!
```
--------------------------------------------------
- Demo image: .asset/img/titanic.png
- Question: What is the name of this movie?
- GT answer: Titanic.
--------------------------------------------------
- LVLM answer: The movie in the image is "Coco."
- LVLM answer accuracy: Wrong
--------------------------------------------------
- Estimated uncertainty: 2.321928094887362
- Uncertainty threshold: 1.0
--------------------------------------------------
- Hallucination prediction: Is hallucination
- Hallucination detection: Success!
--------------------------------------------------
```
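The uncertainty reported by the demo is, in spirit, an entropy over answers sampled under semantic-equivalent perturbations of the prompt: the more the sampled answers disagree, the higher the entropy. Below is a minimal sketch of this style of computation, with hypothetical sampled answers and without the semantic clustering the full pipeline performs (see `VL-Uncertainty.py` for the actual logic):
```python
# Minimal sketch of entropy-based hallucination detection
# (illustrative only; the repo's full pipeline is in VL-Uncertainty.py).
import math
from collections import Counter

def answer_entropy(answers: list[str]) -> float:
    """Shannon entropy (in bits) of the empirical answer distribution."""
    total = len(answers)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(answers).values()
    )

# Hypothetical answers sampled under semantic-equivalent perturbations.
sampled = ["Coco.", "Titanic.", "Up.", "Frozen.", "Moana."]
uncertainty = answer_entropy(sampled)
threshold = 1.0
print(f"Uncertainty: {uncertainty:.4f}")
print("Hallucination!" if uncertainty > threshold else "Reliable.")
```
Five mutually distinct answers give log2(5) ≈ 2.3219 bits, matching the demo output above; any value above the 1.0 threshold is flagged as hallucination.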
## Run

- For MM-Vet (Free-form benchmark)
```
bash run/run_MMVet.sh
```
- For LLaVABench (Free-form benchmark)
```
bash run/run_LLaVABench.sh
```
- For MMMU (Multi-choice benchmark)
```
bash run/run_MMMU.sh
```
- For ScienceQA (Multi-choice benchmark)
```
bash run/run_ScienceQA.sh
```

## Examples
- VL-Uncertainty successfully detects LVLM hallucination:

- VL-Uncertainty also assigns low uncertainty to correct answers and identifies them as non-hallucinatory:

- VL-Uncertainty generalizes effectively to physical-world scenarios (the picture below is my laptop, captured with an iPhone):

## ⌨️ Code Structure
- The code structure of this repository is as follows:
```
├── VL-Uncertainty/
│   ├── .asset/
│   │   └── img/
│   │       ├── logo.png
│   │       └── titanic.png              # For demo
│   ├── benchmark/
│   │   ├── LLaVABench.py                # Free-form benchmark
│   │   ├── MMMU.py                      # Multi-choice benchmark
│   │   ├── MMVet.py                     # Free-form benchmark
│   │   └── ScienceQA.py                 # Multi-choice benchmark
│   ├── llm/
│   │   └── Qwen.py                      # LLM class
│   ├── lvlm/
│   │   ├── InternVL.py                  # Supports 26B, 8B, and 1B
│   │   ├── LLaVA.py                     # Supports 13B, 7B
│   │   ├── LLaVANeXT.py                 # Supports 13B, 7B
│   │   └── Qwen2VL.py                   # Supports 72B, 7B, 2B
│   ├── run/
│   │   ├── run_LLaVABench.sh            # Benchmark VL-Uncertainty on LLaVABench
│   │   ├── run_MMMU.sh                  # Benchmark VL-Uncertainty on MMMU
│   │   ├── run_MMVet.sh                 # Benchmark VL-Uncertainty on MMVet
│   │   └── run_ScienceQA.sh             # Benchmark VL-Uncertainty on ScienceQA
│   ├── util/
│   │   ├── misc.py                      # Helper functions
│   │   ├── textual_perturbation.py      # Various textual perturbations
│   │   └── visual_perturbation.py       # Various visual perturbations
│   ├── .gitignore
│   ├── README.md
│   ├── VL-Uncertainty.py                # Semantic-equivalent perturbation, uncertainty estimation, and hallucination detection
│   └── demo.py                          # Quick start demo
```
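For intuition on the perturbation modules: a semantic-equivalent visual perturbation alters an image's appearance while preserving its meaning, so a non-hallucinating LVLM should answer consistently across the perturbed copies. Below is a hypothetical sketch of one such perturbation, Gaussian blur at increasing strength; the repository's actual implementations live in `util/visual_perturbation.py` and `util/textual_perturbation.py`:
```python
# Hypothetical semantic-equivalent visual perturbation (Gaussian blur).
# The repo's real perturbations live in util/visual_perturbation.py.
from PIL import Image, ImageFilter

def blur(image: Image.Image, radius: float) -> Image.Image:
    """Blurring softens appearance but leaves image semantics intact."""
    return image.filter(ImageFilter.GaussianBlur(radius=radius))

# Perturbed copies of increasing strength, for repeated LVLM sampling.
image = Image.open(".asset/img/titanic.png")
perturbed = [blur(image, r) for r in (0.5, 1.0, 1.5, 2.0, 2.5)]
```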
## ✨ Acknowledgement

- [LLaVA](https://github.com/haotian-liu/LLaVA), [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT), [InternVL](https://github.com/OpenGVLab/InternVL), [Qwen2-VL](https://github.com/QwenLM/Qwen2-VL): Thanks a lot for these foundational efforts!
- [semantic_uncertainty](https://github.com/jlko/semantic_uncertainty): We are greatly inspired by this work!

## Citation
If you find our work useful for your research or applications, please cite it using this BibTeX:
```bibtex
@article{zhang2024vl,
title={VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation},
author={Zhang, Ruiyang and Zhang, Hu and Zheng, Zhedong},
journal={arXiv preprint arXiv:2411.11919},
year={2024}
}
```