https://github.com/yuhui-zh15/autoconverter

Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 2025)

computer-vision machine-learning natural-language-processing vision-language vision-language-model

Host: GitHub
URL: https://github.com/yuhui-zh15/autoconverter
Owner: yuhui-zh15
Created: 2024-09-16T18:29:20.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2025-03-17T20:34:53.000Z (over 1 year ago)
Last Synced: 2025-03-17T21:36:47.358Z (over 1 year ago)
Topics: computer-vision, machine-learning, natural-language-processing, vision-language, vision-language-model
Language: Python
Homepage: https://yuhui-zh15.github.io/AutoConverter-Website/
Size: 46.7 MB
Stars: 24
Watchers: 1
Forks: 2
Open Issues: 1
Metadata Files:
- Readme: README.md

# Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation

[![MIT license](https://img.shields.io/badge/License-MIT-blue.svg)](https://lbesson.mit-license.org/)
[![Python](https://img.shields.io/badge/python-3.11-blue.svg)](https://www.python.org/downloads/release/python-311/)
[![Pytorch](https://img.shields.io/badge/Pytorch-2.5-red.svg)](https://pytorch.org/get-started/previous-versions/#v25)
[![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black)

This repo provides the PyTorch source code of our paper: [Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation](https://arxiv.org/abs/2501.03225) (**CVPR 2025**). Check out the project page [here](https://yuhui-zh15.github.io/AutoConverter-Website/)!

## 🔮 Abstract

The rapid development of vision language models (VLMs) demands rigorous and reliable evaluation. However, current visual question answering (VQA) benchmarks often depend on open-ended questions, making accurate evaluation difficult due to the variability in natural language responses. To address this, we introduce AutoConverter, an agentic framework that automatically converts these open-ended questions into multiple-choice format, enabling objective evaluation while reducing the costly question creation process. Our experiments demonstrate that AutoConverter can generate correct and challenging multiple-choice questions, with VLMs demonstrating consistently similar or lower accuracy on these questions compared to human-created ones. Using AutoConverter, we construct VMCBench, a benchmark created by transforming 20 existing VQA datasets into a unified multiple-choice format, totaling 9,018 questions. We comprehensively evaluate 28 state-of-the-art VLMs on VMCBench, setting a new standard for scalable, consistent, and reproducible VLM evaluation.

**Overview.** *(Left)* We analyze existing open-ended VQA evaluation metrics, underscoring their limitations in providing accurate and reproducible assessments. *(Middle)* We introduce AutoConverter, a multi-agent system that automatically converts open-ended questions into multiple-choice format, enabling objective assessment while reducing the costly question creation process. *(Right)* Using AutoConverter, we convert and refine 20 existing VQA datasets into a unified multiple-choice benchmark to support future VLM research.

## 🛠️ Method: AutoConverter

Check out [main.py](main.py) for the implementation of AutoConverter.
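To illustrate the target format only, here is a minimal sketch of how an open-ended question, its reference answer, and a set of distractors could be assembled into a shuffled multiple-choice item. It is not the pipeline in `main.py`: the distractors are passed in by hand rather than generated, and `MultipleChoiceQuestion`, `to_multiple_choice`, and `format_prompt` are hypothetical names introduced for this example.

```python
import random
from dataclasses import dataclass

LETTERS = "ABCDEFGH"


@dataclass
class MultipleChoiceQuestion:
    question: str
    options: dict[str, str]  # option letter -> option text
    answer: str  # letter of the correct option


def to_multiple_choice(question, correct_answer, distractors, seed=0):
    """Combine a correct answer with distractors into a shuffled item.

    Assumption: distractors are supplied by the caller; generating and
    refining them is the part this sketch does not reproduce.
    """
    choices = [correct_answer, *distractors]
    if len(set(choices)) != len(choices):
        raise ValueError("options must be distinct")
    if len(choices) > len(LETTERS):
        raise ValueError("too many options")
    # A seeded shuffle keeps the position of the correct answer reproducible.
    rng = random.Random(seed)
    rng.shuffle(choices)
    options = dict(zip(LETTERS, choices))
    answer = next(k for k, text in options.items() if text == correct_answer)
    return MultipleChoiceQuestion(question, options, answer)


def format_prompt(item):
    """Render the item as a plain-text prompt with lettered options."""
    lines = [item.question]
    lines += [f"{letter}. {text}" for letter, text in item.options.items()]
    lines.append("Answer with the option's letter.")
    return "\n".join(lines)
```

For example, `to_multiple_choice("What color is the car?", "red", ["blue", "green", "black"])` returns an item with four lettered options whose `answer` field points at `"red"`.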

## 💎 Dataset: VMCBench

The dataset is available on [Hugging Face](https://huggingface.co/datasets/suyc21/VMCBench).
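A minimal sketch for fetching the dataset with the Hugging Face `datasets` library, assuming it is installed (`pip install datasets`) and the Hub is reachable. The helper names `load_vmcbench` and `summarize` are hypothetical, and the sketch makes no assumption about split names or column names.

```python
def load_vmcbench(repo_id="suyc21/VMCBench"):
    """Download VMCBench from the Hugging Face Hub and return its splits."""
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset

    return load_dataset(repo_id)


def summarize(dataset_dict):
    """Map each split name to its number of rows."""
    return {name: len(split) for name, split in dataset_dict.items()}
```

Calling `summarize(load_vmcbench())` downloads the data on first use and reports how many questions each split contains.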

## 📈 Evaluation of VMCBench

VMCBench is officially supported by [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) and [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval). The following commands run the evaluation, using LLaVA-1.5-7B as an example model:

- VLMEvalKit: `python run.py --data VMCBench_DEV --model llava_v1.5_7b`
- lmms-eval: `python -m accelerate.commands.launch -m lmms_eval --model llava --model_args pretrained="liuhaotian/llava-v1.5-7b" --tasks vmcbench`
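Because every question has a single correct letter, scoring reduces to extracting a letter from each model response and comparing it with the reference. The sketch below shows one simple way to do that. It is an illustration, not the scoring logic used by VLMEvalKit or lmms-eval, and `extract_choice` and `accuracy` are hypothetical helpers.

```python
import re


def extract_choice(response, letters="ABCD"):
    """Return the option letter chosen in a model response, or None.

    Heuristic only: it handles responses that start with a letter
    ("B", "(B)", "B. a red car") or say "the answer is C". Other
    phrasings are treated as unanswered.
    """
    text = response.strip()
    # Case 1: leading letter followed by punctuation or end of string.
    match = re.match(rf"\(?([{letters}])(?:[.:)]|$)", text)
    if match:
        return match.group(1)
    # Case 2: an explicit "answer is X" phrase anywhere in the response.
    match = re.search(rf"[Aa]nswer\s+is\s*:?\s*\(?([{letters}])\b", text)
    return match.group(1) if match else None


def accuracy(responses, answers):
    """Fraction of responses whose extracted letter equals the reference."""
    if len(responses) != len(answers):
        raise ValueError("responses and answers must have the same length")
    if not answers:
        return 0.0
    correct = sum(extract_choice(r) == a for r, a in zip(responses, answers))
    return correct / len(answers)
```

A response from which no letter can be extracted counts as incorrect, so the metric never rewards an ambiguous answer.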

## 🎯 Citation

If you use this repo in your research, please cite it as follows:
```
@inproceedings{AutoConverter,
title={Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation},
author={Yuhui Zhang and Yuchang Su and Yiming Liu and Xiaohan Wang and James Burgess and Elaine Sui and Chenyu Wang and Josiah Aklilu and Alejandro Lozano and Anjiang Wei and Ludwig Schmidt and Serena Yeung-Levy},
booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2025}
}
```
