https://tiger-ai-lab.github.io/ImagenWorld/

Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks [ICLR 2026]

Host: GitHub
URL: https://tiger-ai-lab.github.io/ImagenWorld/
Owner: TIGER-AI-Lab
License: mit
Created: 2025-09-28T17:40:49.000Z (9 months ago)
Default Branch: main
Last Pushed: 2026-04-02T03:00:29.000Z (3 months ago)
Last Synced: 2026-04-02T15:36:05.968Z (3 months ago)
Topics: genai, generation, image
Language: Python
Homepage: https://tiger-ai-lab.github.io/ImagenWorld/
Size: 18.1 MB
Stars: 31
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# 🖼️ ImagenWorld
[![arXiv](https://img.shields.io/badge/arXiv-2603.27862-b31b1b.svg)](https://arxiv.org/abs/2603.27862)

ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks

[ICLR Paper](https://openreview.net/forum?id=bld9g6jFh9)

**ImagenWorld** is a large-scale, human-centric benchmark designed to stress-test image generation models in real-world scenarios.
- **Broad coverage across 6 domains:** Artworks, Photorealistic Images, Information Graphics, Textual Graphics, Computer Graphics, and Screenshots.
- **Rich supervision:** ~3.6K condition sets and ~20K fine-grained human annotations enable comprehensive, reproducible evaluation.
- **Explainable evaluation pipeline:** We decompose generated outputs via object/segment extraction to identify entities (objects, fine-grained regions), supporting both scalar ratings and object-/segment-level failure tags.
- **Diverse model suite:** We evaluate **14 models** in total — **4 unified** (GPT-Image-1, Gemini 2.0 Flash, BAGEL, OmniGen2) and **10 task-specific** baselines (SDXL, Flux.1-Krea-dev, Flux.1-Kontext-dev, Qwen-Image, Infinity, Janus Pro, UNO, Step1X-Edit, IC-Edit, InstructPix2Pix).

[🌐 Project Page] [📄 Paper] [💾 Datasets] [🏛️ ImagenWorld-Visualizer]

## 📰 News
* 2026 Jan 25: Accepted to ICLR 2026!
* 2025 Oct 16: Featured on the ComfyUI blog: [https://blog.comfy.org/p/introducing-imagenworld](https://blog.comfy.org/p/introducing-imagenworld)
* 2025 Oct 13: Preprint released on GitHub.

## 📖 Introduction

This repository contains the code for the paper [ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks](https://arxiv.org/abs/2603.27862).
In this paper, we introduce **ImagenWorld**, a large-scale, human-centric benchmark designed to stress-test image generation models in real-world scenarios. Unlike prior evaluations that focus on isolated tasks or narrow domains, ImagenWorld spans six domains (Artworks, Photorealistic Images, Information Graphics, Textual Graphics, Computer Graphics, and Screenshots) and six tasks: Text-to-Image Generation (TIG), Single-Reference Image Generation (SRIG), Multi-Reference Image Generation (MRIG), Text-to-Image Editing (TIE), Single-Reference Image Editing (SRIE), and Multi-Reference Image Editing (MRIE). The benchmark includes ~3.6K condition sets and ~20K fine-grained human annotations, providing a comprehensive testbed for generative models. To support explainable evaluation, ImagenWorld applies object- and segment-level extraction to generated outputs, identifying entities such as objects and fine-grained regions. This structured decomposition lets human annotators provide not only scalar ratings but also detailed tags for object-level and segment-level failures.
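
To make the benchmark's structure concrete, here is a minimal Python sketch of how the six domains, the six tasks, and the two kinds of human feedback could be modeled. The domain and task names come from the description above. The `Annotation` class, its field names, and the rating scale are hypothetical and are not taken from the repository or the released dataset schema.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List

# Domain and task names as listed in the introduction.
DOMAINS = [
    "Artworks", "Photorealistic Images", "Information Graphics",
    "Textual Graphics", "Computer Graphics", "Screenshots",
]
TASKS = ["TIG", "SRIG", "MRIG", "TIE", "SRIE", "MRIE"]


@dataclass
class Annotation:
    """Hypothetical record for one human judgment of one generated image."""
    domain: str
    task: str
    model: str
    rating: float  # scalar rating; the scale is an assumption
    # Hypothetical free-form tags naming the objects or segments that failed.
    failure_tags: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject records that fall outside the benchmark's domain/task grid.
        if self.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {self.domain}")
        if self.task not in TASKS:
            raise ValueError(f"unknown task: {self.task}")


# Every (domain, task) pair the benchmark is organized around.
GRID = list(product(DOMAINS, TASKS))

example = Annotation("Screenshots", "TIE", "UNO", 0.5, ["object: toolbar icon"])
print(len(GRID), example.failure_tags)
```

Keeping the scalar rating and the failure tags in one record reflects the idea that each judgment carries both a score and an explanation of what went wrong.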


## 🚀 Quick Start — Inference

**Tasks:** `TIG` (Text→Image Generation), `TIE` (Text→Image Editing), `SRIG`, `SRIE`, `MRIG`, `MRIE`
**Datasets:** assumes `ImagenWorld/<TASK>/...` layout (adjust `--task_path` as needed)

---

### Open-Source Models

**Directory:** `inference/open-source/`
**Entrypoint:** `main.py`
**Model registry:** `inference/open-source/config.py`
**Batch helper:** `open_models.sh`

All open-source and closed-source runners follow a unified CLI:
```bash
python main.py --task <TASK> --model <MODEL> --task_path <TASK_PATH> --limit <N> --verbose
```
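
The flags in the unified CLI suggest an argument parser along the following lines. This is a hypothetical sketch for illustration, not the repository's actual `main.py`. The flag names come from the command above, while the types, defaults, choices, and help strings are assumptions.

```python
import argparse

# Task abbreviations listed under Quick Start.
TASKS = ["TIG", "TIE", "SRIG", "SRIE", "MRIG", "MRIE"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run one model on one ImagenWorld task")
    parser.add_argument("--task", required=True, choices=TASKS, help="task abbreviation")
    parser.add_argument("--model", required=True, help="model key looked up in config.py")
    parser.add_argument("--task_path", required=True, help="directory holding the task's samples")
    # Assumed semantics: process at most this many samples; None means all of them.
    parser.add_argument("--limit", type=int, default=None)
    parser.add_argument("--verbose", action="store_true", help="print per-sample logs")
    return parser


# Parse the same arguments as the UNO example instead of reading sys.argv.
args = build_parser().parse_args(
    ["--task", "TIG", "--model", "UNO",
     "--task_path", "/path/to/ImagenWorld/TIG", "--limit", "5", "--verbose"]
)
print(args.task, args.model, args.limit, args.verbose)
```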

#### 🔹 Example: TIG (Text→Image Generation) with UNO
```bash
cd inference/open-source

python main.py --task TIG --model UNO --task_path /path/to/ImagenWorld/TIG --limit 5 --verbose
```

**Explanation**
- Loads the **UNO** open-source generator from the registry (`config.py`)
- Runs the **TIG** (Text→Image Generation) task using samples from `/path/to/ImagenWorld/TIG`
- Saves results to `model_outputs/model_name.png`
- Prints per-sample logs if `--verbose` is enabled

---

### Closed-Source Models

**Directory:** `inference/closed-source/`
**Entrypoint:** `main.py`
**Model registry:** `inference/closed-source/config.py`
**Batch helper:** `closed_models.sh`

Available closed-source APIs and outputs:
- `GPT-Image-1` → saves `gpt-image-1.png`
- `Gemini2Flash` → saves `gemini.png`

#### 🔧 Setup Environment
Set your API keys before running:
```bash
export OPENAI_API_KEY="sk-..." # for GPT-Image-1
export GEMINI_API_KEY="..." # for Gemini 2.5 Flash Image Preview
```
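
A missing key usually surfaces only when the first API call fails, so it can help to check the environment up front. The sketch below is an illustrative helper, not part of the repository. The variable names come from the `export` commands above and the model keys from the list of closed-source models, while `REQUIRED_KEYS` and `missing_key` are hypothetical names.

```python
import os
from typing import Mapping, Optional

# Model key -> environment variable, following the setup commands above.
REQUIRED_KEYS = {
    "GPT-Image-1": "OPENAI_API_KEY",
    "Gemini2Flash": "GEMINI_API_KEY",
}


def missing_key(model: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the unset (or empty) variable for `model`, or None if it is set."""
    name = REQUIRED_KEYS[model]
    return None if env.get(name) else name


for model in REQUIRED_KEYS:
    name = missing_key(model)
    if name is not None:
        print(f"{model}: set {name} before running")
```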

#### 🔹 Example: TIE (Text→Image Editing) with Gemini 2.5 Flash
```bash
cd inference/closed-source

python main.py --task TIE --model Gemini2Flash --task_path /path/to/ImagenWorld/TIE --limit 5 --verbose
```

**Explanation**
- Loads the selected **closed-source API model** (via OpenAI or Gemini)
- Runs the specified task on samples from `/path/to/ImagenWorld/`
- Stores generated images (e.g., `gpt-image-1.png`, `gemini.png`)

---

### Batch Execution (Optional)

Each inference type includes a shell helper for multi-task runs:

```bash
# open-source batch
cd inference/open-source
bash open_models.sh

# closed-source batch
cd inference/closed-source
bash closed_models.sh
```

In both scripts:
- Set `BASE_PATH` → dataset root (e.g., `/path/to/ImagenWorld`)
- Define `TASK_MODELS` to map each task to a model
- Set API keys for closed-source models
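
The contents of the shell helpers are not shown here, so the following Python sketch only mirrors the idea described above: one `main.py` invocation per entry in a task-to-model mapping, all rooted at a shared base path. The names `BASE_PATH` and `TASK_MODELS` follow the bullets above. The function `build_commands` and the example mapping are hypothetical, and the sketch prints the commands rather than running them.

```python
import shlex
from pathlib import PurePosixPath

BASE_PATH = "/path/to/ImagenWorld"  # dataset root
# Hypothetical mapping using the closed-source model keys listed earlier.
TASK_MODELS = {"TIG": "GPT-Image-1", "TIE": "Gemini2Flash"}


def build_commands(base_path, task_models, limit=None):
    """Build one main.py command (as an argument list) per task/model pair."""
    commands = []
    for task, model in task_models.items():
        cmd = ["python", "main.py", "--task", task, "--model", model,
               "--task_path", str(PurePosixPath(base_path) / task)]
        if limit is not None:
            cmd += ["--limit", str(limit)]
        commands.append(cmd)
    return commands


for cmd in build_commands(BASE_PATH, TASK_MODELS, limit=5):
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

With this mapping, the printed commands are meant to be run from `inference/closed-source`. Passing each argument list to `subprocess.run` would execute them in sequence.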

## Citation

If you find our work useful for your research, please consider citing our paper:

```bibtex
@inproceedings{
sani2026imagenworld,
title={ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks},
author={Samin Mahdizadeh Sani and Max Ku and Nima Jamali and Matina Mahdizadeh Sani and Paria Khoshtab and Wei-Chieh Sun and Parnian Fazel and Zhi Rui Tam and Thomas Chong and Edisy Kin Wai Chan and Donald Wai Tong Tsang and Chiao-Wei Hsu and Lam Ting Wai and Ho Yin Sam Ng and Chiafeng Chu and Chak-Wing Mak and Keming Wu and Hiu Tung Wong and Yik Chun Ho and Chi Ruan and Zhuofeng Li and I-Sheng Fang and Shih-Ying Yeh and Ho Kei Cheng and Ping Nie and Wenhu Chen},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=bld9g6jFh9}
}

@misc{sani2026imagenworldstresstestingimagegeneration,
title={ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks},
author={Samin Mahdizadeh Sani and Max Ku and Nima Jamali and Matina Mahdizadeh Sani and Paria Khoshtab and Wei-Chieh Sun and Parnian Fazel and Zhi Rui Tam and Thomas Chong and Edisy Kin Wai Chan and Donald Wai Tong Tsang and Chiao-Wei Hsu and Ting Wai Lam and Ho Yin Sam Ng and Chiafeng Chu and Chak-Wing Mak and Keming Wu and Hiu Tung Wong and Yik Chun Ho and Chi Ruan and Zhuofeng Li and I-Sheng Fang and Shih-Ying Yeh and Ho Kei Cheng and Ping Nie and Wenhu Chen},
year={2026},
eprint={2603.27862},
archivePrefix={arXiv},
primaryClass={cs.GR},
url={https://arxiv.org/abs/2603.27862},
}
```
