https://github.com/vila-lab/mobile-mmlu

Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark. Paper at: https://arxiv.org/abs/2503.20786

# Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark

## Overview

Mobile-MMLU is a benchmark for evaluating mobile-compatible Large Language Models (LLMs) across 80 diverse fields, including Education, Healthcare, and Technology. It focuses on real-world applicability and on the performance metrics that matter in mobile environments.

We also introduce **Mobile-MMLU-Pro**, a more compact and sophisticated version of Mobile-MMLU. Both datasets are available on Hugging Face ([Mobile-MMLU](https://huggingface.co/datasets/MBZUAI-LLM/Mobile-MMLU), [Mobile-MMLU-Pro](https://huggingface.co/datasets/MBZUAI-LLM/Mobile-MMLU-Pro)).

## Key Features

- **Comprehensive Coverage**: Spans 80 distinct fields with carefully curated questions
- **Mobile-Optimized**: Specifically designed for evaluating mobile-compatible LLMs
- **16,186 Questions**: Extensive dataset including scenario-based questions
- **Rigorous Evaluation**: Systematic assessment of performance, efficiency, and accuracy
- **Real-world Applications**: Focus on practical use cases in everyday scenarios

## Leaderboard

Visit our [live leaderboard](https://huggingface.co/spaces/SondosMB/Mobile-MMLU) to see the latest performance rankings of various mobile LLMs across different categories and metrics.

## Getting Started

### Backends

We currently support the following `backends` for model inference:

* `hf`: [HF Transformers](https://github.com/huggingface/transformers)
* `gptqmodel`: [GPTQModel](https://github.com/ModelCloud/GPTQModel) for GPTQ-quantized models

### Response Generation

1. Install required packages:
```bash
pip install torch transformers datasets pandas tqdm
```

2. Generate responses using your model:
```bash
python generate_answers.py \
--model_name your_model_name \
--batch_size 32 \
--device cuda
```

The script supports various arguments:
- `--model_name`: Name or path of the model (required)
- `--batch_size`: Batch size for processing (default: 32)
- `--device`: Device to run the model on (default: `auto`, which uses CUDA if available and falls back to CPU otherwise)
- `--backend`: Backend used to load the model (default: `hf`). Use `gptqmodel` for GPTQ-quantized models.
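
The arguments listed above could be wired up roughly as follows. This is a minimal sketch based only on the documented flags and defaults, not the actual contents of `generate_answers.py`. The helper names `build_parser` and `resolve_device` are hypothetical, and restricting `--backend` to the two documented values is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(description="Generate Mobile-MMLU answers")
    parser.add_argument("--model_name", required=True,
                        help="Name or path of the model")
    parser.add_argument("--batch_size", type=int, default=32,
                        help="Batch size for processing")
    parser.add_argument("--device", default="auto",
                        help="Device to run on; 'auto' picks cuda if available, else cpu")
    # Assumption: only the two backends named in this README are accepted.
    parser.add_argument("--backend", default="hf", choices=["hf", "gptqmodel"],
                        help="Backend used to load the model")
    return parser


def resolve_device(device: str, cuda_available: bool) -> str:
    """Map 'auto' to a concrete device; pass any explicit choice through unchanged."""
    if device == "auto":
        return "cuda" if cuda_available else "cpu"
    return device


# Parse an example command line instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--model_name", "your_model_name"])
print(args.batch_size, args.backend, resolve_device(args.device, cuda_available=False))
```

In a real script, the `cuda_available` flag would typically come from `torch.cuda.is_available()`; it is passed in explicitly here so the example does not depend on PyTorch being installed.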

### Response Format

The script will generate a CSV file with the following format:
```csv
question_id,predicted_answer
q1,A
q2,B
q3,C
...
```

Each row contains:
- `question_id`: The unique identifier for each question
- `predicted_answer`: The model's prediction (A, B, C, or D)
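
Before submitting, it can help to check that a predictions file matches the format described above. The following is a minimal sketch of such a check; the function name `validate_predictions` is hypothetical and not part of this repository, and the duplicate-ID check is an assumption drawn from the description of `question_id` as a unique identifier.

```python
import csv
import io

VALID_ANSWERS = {"A", "B", "C", "D"}
EXPECTED_HEADER = ["question_id", "predicted_answer"]


def validate_predictions(csv_text: str) -> list[str]:
    """Return a list of problems found in the CSV text; an empty list means none were found."""
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        problems.append(f"unexpected header: {header}")
        return problems
    seen = set()
    # Data rows start on line 2, right after the header.
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            problems.append(f"line {line_no}: expected 2 columns, got {len(row)}")
            continue
        question_id, answer = row
        if question_id in seen:
            problems.append(f"line {line_no}: duplicate question_id {question_id}")
        seen.add(question_id)
        if answer not in VALID_ANSWERS:
            problems.append(f"line {line_no}: invalid answer {answer!r}")
    return problems


# Illustrative data only: the third prediction is deliberately invalid.
sample = "question_id,predicted_answer\nq1,A\nq2,B\nq3,E\n"
print(validate_predictions(sample))
```

To check a file on disk, read its contents with `open(path, newline="").read()` and pass the resulting string to the function.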

### Submission

1. After generating the CSV file with your model's predictions, submit it through our [evaluation portal](https://huggingface.co/spaces/SondosMB/Mobile-MMLU).