Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/davidmchan/aloha
A new reliable, localizable, and generalizable metric for hallucination detection in image captioning models.
- Host: GitHub
- URL: https://github.com/davidmchan/aloha
- Owner: DavidMChan
- Created: 2024-03-30T04:25:27.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-07-22T22:27:57.000Z (7 months ago)
- Last Synced: 2024-07-23T02:12:39.934Z (7 months ago)
- Language: Python
- Size: 15.6 MB
- Stars: 4
- Watchers: 4
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# ALOHa: A New Measure for Hallucination in Captioning Models
### [Project](https://davidmchan.github.io/aloha/) | [Paper](https://arxiv.org/abs/2404.02904)
Official implementation of the paper: ["ALOHa: A New Measure for Hallucination in Captioning Models"](https://arxiv.org/abs/2404.02904).
Despite recent advances in multimodal pre-training for visual description, state-of-the-art models still produce captions containing errors, such as hallucinating objects not present in a scene. The existing prominent metric for object hallucination, CHAIR, is limited to a fixed set of MS COCO objects and synonyms. In this work, we propose a modernized open-vocabulary metric, ALOHa, which leverages large language models (LLMs) to measure object hallucinations. Specifically, we use an LLM to extract groundable objects from a candidate caption, measure their semantic similarity to reference objects from captions and object detections, and use Hungarian matching to produce a final hallucination score. We show that ALOHa correctly identifies 13.6% more hallucinated objects than CHAIR on HAT, a new gold-standard subset of MS COCO Captions annotated for hallucinations, and 30.8% more on nocaps, where objects extend beyond MS COCO categories.
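To make the matching step concrete, here is a minimal sketch of Hungarian matching over object similarities with made-up values (illustrative only; the repo's actual implementation lives in `aloha.metrics` and uses LLM-parsed objects and embedding-based similarities):

```python
# Minimal sketch of ALOHa-style object matching (illustrative values only).
import numpy as np
from scipy.optimize import linear_sum_assignment

target_objects = ["cat", "table"]              # parsed from the candidate caption
reference_objects = ["dog", "hound", "table"]  # from references and detections

# Stand-in similarity matrix; ALOHa computes these with an embedding model.
similarity = np.array([
    [0.61, 0.55, 0.10],  # cat   vs. dog / hound / table
    [0.05, 0.04, 1.00],  # table vs. dog / hound / table
])

# Hungarian matching: maximize total similarity by minimizing its negation.
rows, cols = linear_sum_assignment(-similarity)
matched = similarity[rows, cols]  # [0.61, 1.0]: cat -> dog, table -> table

# A low matched similarity flags a likely hallucinated object.
print(min(matched))  # 0.61
```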
## Getting started
### Setup
```bash
# Install this package from GitHub
pip install git+https://github.com/DavidMChan/aloha.git

# Install the spaCy model if you haven't already
pip install -U spacy
python -m spacy download en_core_web_lg
```
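Note that the default object parser in the examples below is GPT-3.5 Turbo, so you will presumably also need an OpenAI API key (assuming the standard `OPENAI_API_KEY` environment variable). A quick, illustrative check that the spaCy model installed correctly:

```python
# Illustrative sanity check (not part of the repo): load the spaCy model.
import spacy

nlp = spacy.load("en_core_web_lg")
print([token.text for token in nlp("A cat is sitting on a table")])
```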
### Usage

To compute the ALOHa score for a single caption:
```python
from aloha.metrics import ALOHa
from aloha.object_parser import GPT35TurboObjectParser
from aloha.string_similarity import MPNetSimilarity

# Initialize the ALOHa metric
evaluator = ALOHa(
    name="aloha",
    object_parser=GPT35TurboObjectParser,
    similarity_measure=MPNetSimilarity,
    num_reference_examples=3,
    num_target_examples=3,
    detect_objects=True,
)

candidate_caption = "A cat is sitting on a table"
reference_captions = ["A dog is sitting on a table", "A hound is sitting on a table"]
optional_image_path = None
optional_precomputed_detections = None

# Compute the ALOHa score
score, matches = evaluator(
    target=candidate_caption,
    references=reference_captions,
    image_path=optional_image_path,
    object_detections=optional_precomputed_detections,
)

print(score)
# 0.6081229448318481

print(matches)
# {'matches': [{'ref_word': 'table', 'similarity': 1.0, 'target_word': 'table'},
# {'ref_word': 'dog',
# 'similarity': 0.6081229448318481,
# 'target_word': 'cat'}],
# 'reference_objects': [['dog'],
# ['dog'],
# ['table'],
# ['table'],
# ['hound'],
# ['hound']],
# 'target_objects': [['cat'], ['table']],
# 'unparsed_reference_objects': '- dog\n- table\n- hound',
# 'unparsed_target_objects': '- cat\n- table'}
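
# A lower ALOHa score signals a likely hallucination; in this example the
# final score is the lowest matched similarity (cat vs. dog, ~0.608).
# Illustrative only (the threshold below is hypothetical, not from the repo):
HALLUCINATION_THRESHOLD = 0.7
if score < HALLUCINATION_THRESHOLD:
    print("Caption likely contains a hallucinated object")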
```

To compute the ALOHa score for a full dataset of samples, you can use the `evaluate-dataset` script. First, prepare your dataset in a JSON file with the following format:

```json
[
    {
        "caption": "A caption",
        "references": ["Ref 1", "Ref 2", ...],
        "image_path": "path/to/image.jpg",
    },
    ...
]
```
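This file is plain JSON, so a small script like the sketch below (a hypothetical helper, not part of the repo) can produce it from your own samples:

```python
# Build a dataset file in the format above (hypothetical helper, not part
# of the repo). Whether image_path may be null is an assumption, mirroring
# the single-caption example where image_path=None.
import json

samples = [
    {
        "caption": "A cat is sitting on a table",
        "references": ["A dog is sitting on a table", "A hound is sitting on a table"],
        "image_path": None,  # or "path/to/image.jpg"
    },
]

with open("dataset.json", "w") as f:
    json.dump(samples, f, indent=2)
```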
Then, run the following command:

```bash
aloha evaluate-dataset -m aloha path/to/dataset.json
```

The above command has many options to customize the evaluation. You can see them by running:
```bash
aloha evaluate-dataset --help
```

## Citation
If you find this repository useful, please cite our paper:
```bibtex
@inproceedings{petryk2024aloha,
    title = "ALOHa: A New Measure for Hallucination in Captioning Models",
    author = "Petryk, Suzanne and
      Chan, David M and
      Kachinthaya, Anish and
      Zou, Haodi and
      Canny, John and
      Gonzalez, Joseph E and
      Darrell, Trevor",
    booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    year = "2024",
    publisher = "Association for Computational Linguistics",
}
```