https://github.com/vila-lab/mobile-mmlu
Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark. Paper at: https://arxiv.org/abs/2503.20786
- Host: GitHub
- URL: https://github.com/vila-lab/mobile-mmlu
- Owner: VILA-Lab
- License: mit
- Created: 2024-12-25T15:06:55.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-27T08:39:18.000Z (about 1 year ago)
- Last Synced: 2025-04-04T08:19:34.112Z (about 1 year ago)
- Topics: ai, benchmark, llm, mobile
- Language: Python
- Homepage: https://vila-lab.github.io/Mobile_MMLU/
- Size: 18.4 MB
- Stars: 10
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark
## Overview
Mobile-MMLU is a comprehensive benchmark designed to evaluate mobile-compatible Large Language Models (LLMs) across 80 diverse fields including Education, Healthcare, and Technology. Our benchmark is redefining mobile intelligence evaluation for a smarter future, with a focus on real-world applicability and performance metrics that matter in mobile environments.
We also introduce **Mobile-MMLU-Pro**, a more compact and sophisticated version of Mobile-MMLU. Both datasets are available on Hugging Face ([Mobile-MMLU](https://huggingface.co/datasets/MBZUAI-LLM/Mobile-MMLU), [Mobile-MMLU-Pro](https://huggingface.co/datasets/MBZUAI-LLM/Mobile-MMLU-Pro)).
## Key Features
- **Comprehensive Coverage**: Spans 80 distinct fields with carefully curated questions
- **Mobile-Optimized**: Specifically designed for evaluating mobile-compatible LLMs
- **16,186 Questions**: Extensive dataset including scenario-based questions
- **Rigorous Evaluation**: Systematic assessment of performance, efficiency, and accuracy
- **Real-world Applications**: Focus on practical use cases in everyday scenarios
## Leaderboard
Visit our [live leaderboard](https://huggingface.co/spaces/SondosMB/Mobile-MMLU) to see the latest performance rankings of various mobile LLMs across different categories and metrics.
## Getting Started
### Backends
We currently support the following `backends` for model inference:
* `hf`: [HF Transformers](https://github.com/huggingface/transformers)
* `gptqmodel`: [GPTQModel](https://github.com/ModelCloud/GPTQModel) for GPTQ-quantized models
### Response Generation
1. Install required packages:
```bash
pip install torch transformers datasets pandas tqdm
```
2. Generate responses using your model:
```bash
python generate_answers.py \
--model_name your_model_name \
--batch_size 32 \
--device cuda
```
The script supports various arguments:
- `--model_name`: Name or path of the model (required)
- `--batch_size`: Batch size for processing (default: 32)
- `--device`: Device to run the model on (default: `auto`, which uses CUDA if available and falls back to CPU otherwise)
- `--backend`: Backend used to load the model (default: `hf`). Use `gptqmodel` for GPTQ-quantized models.
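To make the defaults above concrete, here is a minimal sketch of how flags like these could be declared with `argparse`. It is not the code of `generate_answers.py`: the function names `build_parser` and `resolve_device` are hypothetical, and the restriction of `--backend` to the two listed values is an assumption based on the Backends section.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the script's real code)."""
    parser = argparse.ArgumentParser(description="Generate Mobile-MMLU answers (sketch)")
    parser.add_argument("--model_name", required=True, help="Name or path of the model")
    parser.add_argument("--batch_size", type=int, default=32, help="Batch size for processing")
    parser.add_argument("--device", default="auto", help="Device to run on, or 'auto'")
    # Assumption: only the two backends listed in the README are accepted.
    parser.add_argument("--backend", default="hf", choices=["hf", "gptqmodel"])
    return parser


def resolve_device(requested: str, cuda_available: bool) -> str:
    """Map 'auto' to cuda when available, else cpu; pass explicit values through."""
    if requested == "auto":
        return "cuda" if cuda_available else "cpu"
    return requested


# Only --model_name is required; everything else falls back to its default.
args = build_parser().parse_args(["--model_name", "your_model_name"])
device = resolve_device(args.device, cuda_available=False)
```

In a real run, `cuda_available` would typically come from `torch.cuda.is_available()`; it is passed in explicitly here so the sketch runs without PyTorch installed.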
### Response Format
The script will generate a CSV file with the following format:
```csv
question_id,predicted_answer
q1,A
q2,B
q3,C
...
```
Each row contains:
- `question_id`: The unique identifier for each question
- `predicted_answer`: The model's prediction (A, B, C, or D)
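Before submitting, it can help to check that a predictions file matches this format. The following is a minimal sketch using only the standard library; `validate_predictions` is a hypothetical helper, not part of the repository, and the duplicate-ID check is an assumption about what the evaluation expects.

```python
import csv
import io

VALID_ANSWERS = {"A", "B", "C", "D"}
EXPECTED_HEADER = ["question_id", "predicted_answer"]


def validate_predictions(csv_text: str) -> list[str]:
    """Return a list of format problems; an empty list means none were found."""
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        return [f"unexpected header: {header}"]
    seen_ids = set()
    # Data rows start on line 2, right after the header.
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # csv.reader yields an empty list for blank lines
        if len(row) != 2:
            problems.append(f"line {line_no}: expected 2 columns, got {len(row)}")
            continue
        question_id, answer = row
        if question_id in seen_ids:
            # Assumption: each question_id should appear at most once.
            problems.append(f"line {line_no}: duplicate question_id {question_id!r}")
        seen_ids.add(question_id)
        if answer not in VALID_ANSWERS:
            problems.append(f"line {line_no}: invalid answer {answer!r}")
    return problems


good = "question_id,predicted_answer\nq1,A\nq2,B\nq3,C\n"
bad = "question_id,predicted_answer\nq1,E\nq1,A\n"
good_problems = validate_predictions(good)
bad_problems = validate_predictions(bad)
```

To check a file on disk, read it with `open(path, newline="").read()` and pass the text to the helper.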
### Submission
After generating the CSV file with your model's predictions, submit it through our [evaluation portal](https://huggingface.co/spaces/SondosMB/Mobile-MMLU).