https://github.com/aimagelab/awesome-captioning-evaluation
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
https://github.com/aimagelab/awesome-captioning-evaluation
List: awesome-captioning-evaluation
Last synced: about 1 month ago
JSON representation
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
- Host: GitHub
- URL: https://github.com/aimagelab/awesome-captioning-evaluation
- Owner: aimagelab
- Created: 2025-03-18T13:29:44.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-03-18T19:32:11.000Z (3 months ago)
- Last Synced: 2025-03-18T20:32:34.167Z (3 months ago)
- Homepage:
- Size: 1.35 MB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- ultimate-awesome - awesome-captioning-evaluation - Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives. (Other Lists / Julia Lists)
README
# Image Captioning Evaluation [](https://awesome.re)
This repository contains a curated list of research papers and resources focusing on image captioning evaluation.
![]()
β Latest Update: 18 March 2025.
βThis repo is a work in progress. New updates coming soon, stay tuned!! :construction:## π©βπ» π Code for Reproducing Metric Scores
We leverage publicly available codes and have designed a unified framework that enables the reproduction of all metrics within a single repository.
**π Coming soon! π**
## π₯π₯ Our Survey
Image Captioning Evaluation in the Age of Multimodal LLMs:
Challenges and Future Perspectives**Authors:**
[**Sara Sarto**](https://scholar.google.com/citations?user=Y-DaVvEAAAAJ&hl=it),
[**Marcella Cornia**](https://scholar.google.com/citations?user=DzgmSJEAAAAJ&hl=it&oi=ao),
[**Rita Cucchiara**](https://scholar.google.com/citations?user=OM3sZEoAAAAJ&hl=it&oi=ao)
![]()
[](https://arxiv.org/abs/2503.14604)Please cite with the following BibTeX:
```
@inproceedings{sarto2025image,
title={{Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives}},
author={Sarto, Sara and Cornia, Marcella and Cucchiara, Rita},
booktitle={arxiv}
year={2025}
}
```
![]()
![]()
# π Table of Contents
- **The Evolution of Captioning Metrics**
-
Rule-based Metrics
| **Year** | **Conference / Journal** | **Title** | **Authors** | **Links** |
|:--------:|:----------------------:|:-----------------------------------------------:|:-----------------:|:------------------------------------------------------------:|
| 2002 | ACL | BLEU: A method for automatic evaluation of machine translation | *Kishore Papineni et al.* | [π Paper](https://aclanthology.org/P02-1040.pdf) |
| 2004 | ACLW | ROUGE: A package for automatic evaluation of summaries | *Chin-Yew Lin* | [π Paper](https://aclanthology.org/W04-1013.pdf) |
| 2005 | ACLW | METEOR: An automatic metric for MT evaluation with improved correlation with human judgments | *Satanjeev Banerjee et al.* | [π Paper](https://aclanthology.org/W05-0909.pdf) |
| 2015 | CVPR | CIDEr: Consensus-based Image Description Evaluation | *Ramakrishna Vedantam et al.* | [π Paper](https://openaccess.thecvf.com/content_cvpr_2015/papers/Vedantam_CIDEr_Consensus-Based_Image_2015_CVPR_paper.pdf) |
| 2016 | ECCV | SPICE: Semantic Propositional Image Caption Evaluation | *Peter Anderson et al.* | [π Paper](https://arxiv.org/pdf/1607.08822) |
- **Learnable Metrics**
-
Unsupervised Metrics
| **Year** | **Conference / Journal** | **Title** | **Authors** | **Links** |
|:--------:|:----------------------:|:-----------------------------------------------:|:-----------------:|:------------------------------------------------------------:|
| 2019 | EMNLP | TIGEr: Text-to-Image Grounding for Image Caption Evaluation | *Ming Jiang et al.* | [π Paper](https://arxiv.org/pdf/1909.02050) |
| 2020 | ICLR | BERTScore: Evaluating Text Generation with BERT | *Tianyi Zhang et al.* | [π Paper](https://openreview.net/pdf?id=SkeHuCVFDr) |
| 2020 | ACL | Improving Image Captioning Evaluation by Considering Inter References Variance | *Yanzhi Yi et al.* | [π Paper](https://aclanthology.org/2020.acl-main.93.pdf) |
| 2020 | EMNLP | ViLBERTScore: Evaluating Image Caption Using Vision-and-Language BERT | *Hwanhee Lee et al.* | [π Paper](https://aclanthology.org/2020.eval4nlp-1.4.pdf) |
| 2021 | EMNLP | CLIPScore: A Reference-free Evaluation Metric for Image Captioning | *Jack Hessel et al.* | [π Paper](https://arxiv.org/pdf/2104.08718) |
| 2021 | CVPR | FAIEr: Fidelity and Adequacy Ensured Image Caption Evaluation | *Sijin Wang et al.* | [π Paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Wang_FAIEr_Fidelity_and_Adequacy_Ensured_Image_Caption_Evaluation_CVPR_2021_paper.pdf) |
| 2021 | ACL | UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning | *Hwanhee Lee et al.* | [π Paper](https://arxiv.org/pdf/2106.14019) |
| 2022 | NeurIPS | Mutual Information Divergence: A Unified Metric for Multimodal Generative Models | *Jin-Hwa Kim et al.* | [π Paper](https://arxiv.org/pdf/2205.13445) |
| 2023 | CVPR | PAC-S: Improving CLIP for Image Caption Evaluation via Positive Augmentations | *Sara Sarto et al.* | [π Paper](https://arxiv.org/pdf/2303.12112) |
-
Supervised Metrics
| **Year** | **Conference / Journal** | **Title** | **Authors** | **Links** |
|:--------:|:----------------------:|:-----------------------------------------------:|:-----------------:|:------------------------------------------------------------:|
| 2024 | CVPR | Polos: Multimodal Metric Learning from Human Feedback for Image Captioning | *Yuiga Wada et al.* | [π Paper](https://arxiv.org/pdf/2402.18091) |
| 2024 | ACCV | DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning | *Kazuki Matsuda et al.* | [π Paper](https://arxiv.org/pdf/2409.19255) |
-
Fine-grained Oriented Metrics
| **Year** | **Conference / Journal** | **Title** | **Authors** | **Links** |
|:--------:|:----------------------:|:-----------------------------------------------:|:-----------------:|:------------------------------------------------------------:|
| 2023 | ACL | InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation | *Anwen Hu et al.* | [π Paper](https://arxiv.org/pdf/2305.06002) |
| 2024 | ACM MM | HICEScore: A Hierarchical Metric for Image Captioning Evaluation | *Zequn Zeng et al.* | [π Paper](https://arxiv.org/pdf/2407.18589) |
| 2024 | ECCV | BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues | *Sara Sarto et al.* | [π Paper](https://arxiv.org/pdf/2407.20341) |
| 2024 | ECCV | HiFi-Score: Fine-Grained Image Description Evaluation with Hierarchical Parsing Graphs | *Ziwei Yao et al.* | [π Paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07957.pdf) |
-
LLMs-based Metrics
| **Year** | **Conference / Journal** | **Title** | **Authors** | **Links** |
|:--------:|:----------------------:|:-----------------------------------------------:|:-----------------:|:------------------------------------------------------------:|
| 2023 | EMNLP | CLAIR: Evaluating Image Captions with Large Language Models | *David Chan et al.* | [π Paper](https://arxiv.org/pdf/2310.12971) |
| 2024 | ACL | FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model | *Yebin Lee et al.* | [π Paper](https://arxiv.org/pdf/2406.06004) |
-
Hallucinations-based Metrics
| **Year** | **Conference / Journal** | **Title** | **Authors** | **Links** |
|:--------:|:----------------------:|:-----------------------------------------------:|:-----------------:|:------------------------------------------------------------:|
| 2018 | EMNLP | Object Hallucination in Image Captioning | *Anna Rohrbach et al.* | [π Paper](https://arxiv.org/pdf/1809.02156) |
| 2024 | NAACL | ALOHa: A New Measure for Hallucination in Captioning Models | *Suzanne Petryk et al.* | [π Paper](https://arxiv.org/pdf/2404.02904) |
- **Datasets & Benchmarks ππ**
-
Correlation with Human Judgment
- [Flickr8k-Expert and Flickr8k-CF](https://www.ijcai.org/Proceedings/15/Papers/593.pdf)
- [Composite](https://arxiv.org/pdf/1511.03292)
- [Polaris](https://huggingface.co/datasets/yuwd/Polaris)
- [Nebula](https://arxiv.org/pdf/2409.19255)
-
Pairwise Ranking
- [Pascal50-S](https://arxiv.org/pdf/1411.5726)
-
Sensitivity to Object Hallucinations
- [FOIL dataset](https://foilunitn.github.io/)
# How to Contribute π
1. Fork this repository and clone it locally.
2. Create a new branch for your changes: `git checkout -b feature-name`.
3. Make your changes and commit them: `git commit -m 'Description of the changes'`.
4. Push to your fork: `git push origin feature-name`.
5. Open a pull request on the original repository by providing a description of your changes.This project is in constant development, and we welcome contributions to include the latest research papers in the field or report issues π₯.