https://github.com/lyan62/FoodieQA

Official Repo for FoodieQA paper (EMNLP 2024)
Host: GitHub
URL: https://github.com/lyan62/FoodieQA
Owner: lyan62
Created: 2024-09-26T08:49:46.000Z (8 months ago)
Default Branch: main
Last Pushed: 2024-11-19T16:42:36.000Z (6 months ago)
Last Synced: 2024-11-19T17:44:38.172Z (6 months ago)
Language: Python
Size: 18.7 MB
Stars: 11
Watchers: 1
Forks: 0
Open Issues: 0
# FoodieQA

Official repo for EMNLP 2024 paper [FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture](https://aclanthology.org/2024.emnlp-main.1063/)

Evaluate Chinese food culture understanding of LLMs/VLMs with the FoodieQA benchmark

## Benchmark

![](foodie-img.jpeg)

🤗 Available on HuggingFace [lyan62/FoodieQA](https://huggingface.co/datasets/lyan62/FoodieQA)

License: [CC-BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en)

## Models

### Models and results for the VQA tasks

| Evaluation          | Multi-image VQA (ZH) | Multi-image VQA (EN) | Single-image VQA (ZH) | Single-image VQA (EN) |
|---------------------|:--------------------:|:--------------------:|:---------------------:|:---------------------:|
| **Human**           | 91.69                | 77.22†               | 74.41                 | 46.53†                |
| **Phi-3-vision-4.2B** | 29.03              | 33.75                | 42.58                 | 44.53                 |
| **Idefics2-8B**     | **50.87**            | 41.69                | 46.87                 | **52.73**             |
| **Mantis-8B**       | 46.65                | **43.67**            | 41.80                 | 47.66                 |
| **Qwen-VL-12B**     | 32.26                | 27.54                | 48.83                 | 42.97                 |
| **Yi-VL-6B**        | -                    | -                    | **49.61**             | 41.41                 |
| **Yi-VL-34B**       | -                    | -                    | 52.73                 | 48.05                 |
| **GPT-4V**          | 78.92                | 69.23                | 63.67                 | 60.16                 |
| **GPT-4o**          | **86.35**            | **80.64**            | **72.66**             | **67.97**             |

### Models and results for the TextQA task

| Model               | Best Accuracy | Prompt |
|---------------------|:-------------:|:------:|
| Phi-3-medium        | 41.28         | 1      |
| Mistral-7B-instruct | 35.18         | 1      |
| Llama3-8B-Chinese   | 47.38         | 1      |
| YI-6B               | 25.53         | 3      |
| YI-34B              | 46.38         | 3      |
| Qwen2-7B-instruct   | 68.23         | 3      |
| GPT-4               | 60.99         | 1      |
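
The table does not spell out how "Best Accuracy" is derived. One plausible reading is that accuracy is computed per prompt and the highest value is reported together with its prompt index. A minimal sketch under that assumption (the function names and the toy predictions are hypothetical, not taken from the repo):

```python
def accuracy(predictions, gold):
    """Percentage of predictions that exactly match the gold labels."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and equally long")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


def best_prompt(predictions_by_prompt, gold):
    """Return (prompt_id, accuracy) for the prompt with the highest accuracy."""
    scores = {pid: accuracy(preds, gold) for pid, preds in predictions_by_prompt.items()}
    best_id = max(scores, key=scores.get)
    return best_id, round(scores[best_id], 2)


if __name__ == "__main__":
    # Toy data for illustration only; not FoodieQA results.
    gold = ["A", "B", "C", "D"]
    runs = {0: ["A", "B", "A", "A"], 1: ["A", "B", "C", "A"]}
    print(best_prompt(runs, gold))
```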

## UI for creating and validating questions

`cd ui`

- See the [UI README](ui/README.md) for how to use the UI to create questions and ground-truth labels for the multi-image VQA task.

`cd val-ui`

- See the [Val-UI README](val-ui/README_Validation) for how to use the UI to validate questions.

## Model evaluation

- `mkdir `

- Download the dataset from Hugging Face: [lyan62/FoodieQA](https://huggingface.co/datasets/lyan62/FoodieQA). We use a gated repo to prevent data contamination from web crawlers, so please submit an access request before downloading.

    - Data structure:
        - `/images`
        - `sivqa_tidy.json`
        - `mivqa_tidy.json`
        - `textqa_tidy.json`
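
Before running an evaluation, it can help to confirm that the download matches the layout above. The sketch below only checks that the `images` folder and the three JSON files exist and counts the top-level entries in each file. It assumes nothing about the schema inside the files, and `check_data_dir` is a hypothetical helper, not part of the repo.

```python
import json
from pathlib import Path

# File names are taken from the data structure listed above.
EXPECTED_FILES = ("sivqa_tidy.json", "mivqa_tidy.json", "textqa_tidy.json")


def check_data_dir(data_dir):
    """Return {file name: number of top-level entries}; raise if the layout is incomplete."""
    root = Path(data_dir)
    if not (root / "images").is_dir():
        raise FileNotFoundError(f"missing images/ folder under {root}")
    counts = {}
    for name in EXPECTED_FILES:
        # open() raises FileNotFoundError if a JSON file is absent.
        with open(root / name, encoding="utf-8") as f:
            counts[name] = len(json.load(f))
    return counts
```

Calling `check_data_dir("path/to/your/data")` returns the entry counts, or raises if something is missing.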

### Evaluation scripts

- See evaluation scripts in `model-eval/scripts/`

- See example bash file in `model-eval/run`

### Run open-source models

- Set up the environment:

    ```
    conda create -n foodie python=3.9
    conda activate foodie
    pip install -r requirements.txt
    ```

- Multi-image VQA:

    For example, to evaluate the mantis_idefics model (pass a path after each of `--data_dir`, `--out_dir`, and `--cache_dir`):

    ```
    cd model-eval
    python scripts/eval_mantis_idefics.py --data_dir  --out_dir  --cache_dir  --prompt 0
    ```

- Single-image VQA:

    Evaluate idefics2-8b or mantis-idefics2:

    ```
    python eval_idefics_sivqa.py --data_dir  --out_dir  --cache_dir  --template 0 --model-name 
    ```

    - `--model-name` can be "TIGER-Lab/Mantis-8B-Idefics2" or "HuggingFaceM4/idefics2-8b".
    - `--template` can be 0-3 and selects which prompt to use.

           

    Similarly, run the Qwen-VL model with:

    ```
    python eval_qwen_sivqa.py --data_dir  --out_dir  --cache_dir  --template 0 --eval_file sivqa_tidy.json
    ```

    To evaluate the Yi models, `cd model-eval/Yi/VL`, then run:

    ```
    python foodie_inference.py --model_path models/Yi-VL-6B --data_dir  --out_dir  --cache_dir  --template 0 --eval_file sivqa_tidy.json
    ```
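
Because `--template` accepts 0-3, one way to compare prompts is to generate one command per template. The sketch below only builds the command lines for `eval_qwen_sivqa.py` using the flags shown above and does not run them. The directory arguments are placeholders you supply, and writing each run to its own `template_<n>` subfolder is an assumption made here to keep outputs apart, not documented behaviour of the script.

```python
import shlex


def template_sweep_commands(data_dir, out_dir, cache_dir,
                            eval_file="sivqa_tidy.json", templates=range(4)):
    """Build one argv list per prompt template for eval_qwen_sivqa.py."""
    commands = []
    for t in templates:
        commands.append([
            "python", "eval_qwen_sivqa.py",
            "--data_dir", str(data_dir),
            # Assumption: a separate output folder per template.
            "--out_dir", f"{out_dir}/template_{t}",
            "--cache_dir", str(cache_dir),
            "--template", str(t),
            "--eval_file", eval_file,
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical paths; replace with your own.
    for cmd in template_sweep_commands("data", "out", "cache"):
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or the argv lists can be passed to `subprocess.run` one at a time.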

### Prompts

#### Multi-image VQA:

Chinese prompts:

```
pgeneral = 请从给定选项ABCD中选择一个最合适的答案。

prompt 0
根据以上四张图回答问题，他们分别为图A, 图B, 图C, 图D, (pgeneral), 问题：{}, 答案为：图

prompt 1
图A
图B
图C
图D
根据以上四张图回答问题, (pgeneral), 问题：{}, 答案为：图

prompt 2
根据以下四张图回答问题,(pgeneral),
图A
图B
图C
图D
问题：{}, 答案为：图

prompt 3
Human: 问题{}，选项有: 
图A
图B
图C
图D
Assistant: 如果从给定选项ABCD中选择一个最合适的答案， 答案为：图
```

English prompts:

```
prompt 0
"Answer the following question according to the provided four images, they corresponds 
to Option (A), Option (B), Option (C), Option (D). Choose one best answer from the given options.
Question: {}, your answer is: Option ("

prompt 1
"Answer the following question according to the provided four images which corresponds 
to Option (A), Option (B), Option (C), Option (D). Choose one best answer from the given options.
The options are:
Option (A)
Option (B)
Option (C)
Option (D)
Question: {}, your answer is: Option ("

prompt 2
"Answer the following question according to the provided four images, 
and choose one best answer from the given options.
The options are:
Option (A)
Option (B)
Option (C)
Option (D)
Question: {}, your answer is: Option ("

prompt 3
"Human: Question{} The options are: 
Option (A)
Option (B)
Option (C)
Option (D)
Assistant: If I have to choose one best answer from the given options， the answer is：Option ("
```
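
Each template carries a single `{}` slot for the question and ends with an open `Option (`, so the model is expected to continue with the answer letter. The sketch below fills English prompt 0 with `str.format` and reads the letter off the start of the continuation. It is illustrative only: joining the template lines with spaces, and the helper names `build_prompt` and `extract_choice`, are assumptions rather than the repo's actual code.

```python
import re

# English multi-image prompt 0, wording copied verbatim from the block above.
# Assumption: the line breaks shown there are joined with single spaces.
PROMPT_0_EN = (
    "Answer the following question according to the provided four images, they corresponds "
    "to Option (A), Option (B), Option (C), Option (D). "
    "Choose one best answer from the given options. "
    "Question: {}, your answer is: Option ("
)


def build_prompt(question, template=PROMPT_0_EN):
    """Insert the question into the template's single {} slot."""
    return template.format(question)


def extract_choice(generation):
    """Return the option letter if the continuation starts with A-D, else None."""
    match = re.match(r"\s*\(?([ABCD])\b", generation)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical question, for illustration only.
    print(build_prompt("Which dish is typically served cold?"))
    print(extract_choice("B) because it is a cold appetizer"))
```

Anchoring the match at the start of the continuation keeps a capital letter later in a free-form explanation from being read as the answer.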

#### Single-image VQA

See `format_text_prompt()` in `model-eval/scripts/sivqa_utils.py` 

https://github.com/lyan62/foodie-eval/blob/76a22ee16fb58bb090c0ad3eb1f35e39fc71687e/model-eval/scripts/sivqa_utils.py#L30

## Citation

```
@inproceedings{li-etal-2024-foodieqa,
    title = "{F}oodie{QA}: A Multimodal Dataset for Fine-Grained Understanding of {C}hinese Food Culture",
    author = "Li, Wenyan  and
      Zhang, Crystina  and
      Li, Jiaang  and
      Peng, Qiwei  and
      Tang, Raphael  and
      Zhou, Li  and
      Zhang, Weijia  and
      Hu, Guimin  and
      Yuan, Yifei  and
      S{\o}gaard, Anders  and
      Hershcovich, Daniel  and
      Elliott, Desmond",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    year = "2024",
    url = "https://aclanthology.org/2024.emnlp-main.1063",
    }
```