Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/mbzuai-oryx/camel-bench

CAMEL-Bench is an Arabic benchmark for evaluating multimodal models across eight domains with 29,000 questions.
https://github.com/mbzuai-oryx/camel-bench

arabic benchmark large-multimodal-models mbzuai multimodal-learning visual-question-answering vqa

Last synced: about 1 month ago
JSON representation

CAMEL-Bench is an Arabic benchmark for evaluating multimodal models across eight domains with 29,000 questions.

Host: GitHub
URL: https://github.com/mbzuai-oryx/camel-bench
Owner: mbzuai-oryx
License: mit
Created: 2024-10-23T15:48:46.000Z (about 2 months ago)
Default Branch: main
Last Pushed: 2024-10-27T13:02:21.000Z (about 2 months ago)
Last Synced: 2024-10-27T23:10:51.463Z (about 2 months ago)
Topics: arabic, benchmark, large-multimodal-models, mbzuai, multimodal-learning, visual-question-answering, vqa
Language: Python
Homepage: https://mbzuai-oryx.github.io/Camel-Bench/
Size: 14 MB
Stars: 13
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# CAMEL-Bench: A Comprehensive Arabic LMM Benchmark

[![GitHub issues](https://img.shields.io/github/issues/mbzuai-oryx/Camel-Bench)](https://github.com/mbzuai-oryx/Camel-Bench/issues) [![GitHub stars](https://img.shields.io/github/stars/mbzuai-oryx/Camel-Bench)](https://github.com/mbzuai-oryx/Camel-Bench/stargazers) [![GitHub license](https://img.shields.io/github/license/mbzuai-oryx/Camel-Bench)](https://github.com/mbzuai-oryx/Camel-Bench/blob/main/LICENSE) [![Dataset on HuggingFace](https://img.shields.io/badge/HuggingFace-Dataset-orange)](https://huggingface.co/collections/ahmedheakl/camel-bench-670750f3998395452cd3b7b1)

## Overview

CAMEL-Bench is a **Comprehensive Arabic LMM Benchmark** designed to evaluate and improve the capabilities of **Large Multimodal Models (LMMs)** in the Arabic language. Our benchmark aims to bridge the gap in multimodal model evaluation for Arabic, which represents a large population of over 400 million speakers worldwide.

![CAMEL-Bench Diversity](assets/CAMEL-B.png)

The benchmark includes **eight diverse domains** and **38 sub-domains** to rigorously assess the performance of LMMs in visual reasoning and understanding tasks. It comprises over **29K questions**, curated by native Arabic speakers, ensuring high-quality evaluation.

## Key Features

- **Eight Domains of Evaluation:** Multimodal Understanding and Reasoning, OCR and Document Understanding, Chart and Diagram Understanding, Video Understanding, Cultural-Specific Understanding, Medical Imaging, Agricultural Image Understanding, and Remote Sensing Understanding.
- **Over 29,000 Questions:** Carefully curated by native Arabic speakers to ensure quality and accuracy.
- **Broad Scope:** Evaluates models in domains such as **medical imaging**, **cultural-specific understanding**, and **remote sensing**.
- **Open and Closed Source Evaluation:** We provide a leaderboard featuring results from both closed-source models (e.g., GPT-4o) and open-source LMMs.

## Leaderboard

[![Leaderboard on HuggingFace](https://img.shields.io/badge/HuggingFace-Leaderboard-yellow)](https://huggingface.co/spaces/ahmedheakl/CAMEL-Bench-leaderboard)

Our leaderboard provides a performance comparison of different models evaluated on CAMEL-Bench. Current top performers include **GPT-4o** with an overall score of **62%** and other notable models such as **Gemini-1.5-Pro**.

![Leaderboard Snapshot](assets/eval-samples-figure.png)

## Installation

To get started with CAMEL-Bench, clone the repository and install the dependencies:

```sh
$ git clone https://github.com/mbzuai-oryx/Camel-Bench.git
$ cd Camel-Bench
$ pip install -r requirements.txt
```

## Getting Started

The benchmark can be easily executed using the provided scripts:

```sh
$ python scripts/eval_qwen.py
```

To evaluate on your model, just modify the `generate_qwen` function in `scripts/eval_qwen.py`.

## Dataset

Our dataset is hosted on HuggingFace, and can be accessed here: [CAMEL-Bench Dataset 🤗](https://huggingface.co/collections/ahmedheakl/camel-bench-670750f3998395452cd3b7b1).

## Citation

If you use CAMEL-Bench in your research, please consider citing:

```bibtex
@article{ghaboura2024camelbench,
title={CAMEL-Bench: A Comprehensive Arabic LMM Benchmark},
author={Sara Ghaboura, Ahmed Heakl, Omkar Thawakar, Ali Alharthi, Ines Riahi, Abduljalil Saif, Jorma Laaksonen, Fahad S. Khan, Salman Khan, Rao M. Anwer},
journal={arXiv preprint arXiv:2410.18976},
year={2024}
}
```

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Join Us!

We welcome contributions to CAMEL-Bench! Just push a pull request or issue to get started.

[![GitHub forks](https://img.shields.io/github/forks/mbzuai-oryx/Camel-Bench?style=social)](https://github.com/mbzuai-oryx/Camel-Bench/network/members)
[![GitHub stars](https://img.shields.io/github/stars/mbzuai-oryx/Camel-Bench?style=social)](https://github.com/mbzuai-oryx/Camel-Bench/stargazers)

## Contact
- Ahmed Heakl: [[email protected]](mailto:[email protected])

For questions or suggestions, feel free to reach out to us on [GitHub Discussions](https://github.com/mbzuai-oryx/Camel-Bench/discussions).