https://github.com/dvampire/multimodalmath2


Host: GitHub
URL: https://github.com/dvampire/multimodalmath2
Owner: DVampire
License: apache-2.0
Created: 2025-02-26T04:07:15.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-03-11T10:47:57.000Z (about 1 year ago)
Last Synced: 2025-03-11T11:36:35.034Z (about 1 year ago)
Language: Python
Size: 310 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework

This project is a clean fork of the original [veRL](https://github.com/volcengine/verl) project, extended to support vision language models. We thank all the authors for providing such a high-performance RL training framework.

EasyR1 is efficient and scalable thanks to the design of **[HybridEngine](https://arxiv.org/abs/2409.19256)** and the SPMD mode in the latest release of **[vLLM](https://github.com/vllm-project/vllm)**.

## Features

- Supported models

  - Qwen2/Qwen2.5 language models

  - Qwen2/Qwen2.5-VL vision language models

  - DeepSeek-R1 distill models

- Supported algorithms

  - GRPO

  - Other RL algorithms (coming soon)

- Supported datasets

  - Any text or vision-text dataset in a [specific format](#custom-dataset).
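
For readers new to GRPO, the sketch below illustrates its core idea: the rewards of several responses sampled for the same prompt are normalized against that group's mean and standard deviation, so no separate value model is needed. This is a minimal illustration and not EasyR1's implementation. The function name `group_relative_advantages`, the `eps` constant, and the use of the population standard deviation are assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's sampled responses within the group.

    `rewards` holds one scalar reward per sampled response. `eps` (an
    assumed constant) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a sample std is also plausible
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers to one problem: two correct (1.0), two wrong (0.0).
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Correct responses receive positive advantages and incorrect ones negative advantages. When every response in a group gets the same reward, all advantages are zero and that group contributes no learning signal.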

## Requirements

### Software Requirements

- Python 3.9+

- transformers>=4.49.0

- flash-attn>=2.4.3

- vllm>=0.7.3

We provide a [Dockerfile](./Dockerfile) to easily build environments.
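
If you are not using the Dockerfile, a quick check like the following can confirm that the installed packages meet the minimum versions listed above. It is a minimal sketch that uses only the standard library. The helper names (`parse_version`, `meets_minimum`, `check_environment`) are made up for this example, and the version parsing is deliberately simple: it compares only the leading numeric components.

```python
from importlib import metadata

# Minimum versions copied from the software requirements list.
MINIMUMS = {"transformers": "4.49.0", "flash-attn": "2.4.3", "vllm": "0.7.3"}


def parse_version(text):
    """Turn '4.49.0' or '0.7.3.post1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'post1'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after zero-padding to equal length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUMS):
    """Return {package: problem description} for every unmet requirement."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if not meets_minimum(installed, minimum):
            problems[name] = f"found {installed}, need >= {minimum}"
    return problems


if __name__ == "__main__":
    for name, problem in check_environment().items():
        print(f"{name}: {problem}")
```

An empty result from `check_environment()` means all three packages satisfy the listed minimums. It does not verify the Python version or GPU drivers.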

### Hardware Requirements

\* *estimated*

| Method                   | Bits |  1.5B  |   3B   |   7B   |
| ------------------------ | ---- | ------ | ------ | ------ |
| GRPO Full Fine-Tuning    |  AMP | 2*24GB | 2*40GB | 4*40GB |

> [!NOTE]
> We are working hard to reduce VRAM usage in RL training. LoRA support will be integrated in an upcoming update.

## Tutorial: Run Qwen2.5-VL GRPO on [Geometry3K](https://huggingface.co/datasets/hiyouga/geometry3k) Dataset in Just 3 Steps

![image](assets/qwen2_5_vl_7b_geo.png)

### Installation

```bash
git clone https://github.com/hiyouga/EasyR1.git
cd EasyR1
pip install -e .
pip install git+https://github.com/hiyouga/MathRuler.git
```

### GRPO Training

```bash
bash examples/run_qwen2_5_vl_7b_geo.sh
```

### Merge Checkpoint in Hugging Face Format

```bash
python3 scripts/model_merger.py --local_dir path_to_your_last_actor_checkpoint
```

> [!NOTE]
> If you encounter issues connecting to Hugging Face, consider using `export HF_ENDPOINT=https://hf-mirror.com`.
>
> If you want to use the SwanLab logger, consider using `bash examples/run_qwen2_5_vl_7b_geo_swanlab.sh`.

## Custom Dataset

The dataset should strictly follow the example data format.

- Text dataset: https://huggingface.co/datasets/hiyouga/math12k

    - Required columns: problem, answer

- Vision-text dataset: https://huggingface.co/datasets/hiyouga/geometry3k

    - Required columns: images, problem, answer
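
Before launching training on your own data, it can help to confirm that every record carries the required columns. The sketch below checks a list of dictionaries against the column names listed above. The helper names and the `kind` labels (`"text"`, `"vision-text"`) are made up for this example, and it checks only column presence, since the expected value types are defined by the example datasets rather than stated here.

```python
# Required columns per dataset kind, copied from the list above.
REQUIRED_COLUMNS = {
    "text": ("problem", "answer"),
    "vision-text": ("images", "problem", "answer"),
}


def missing_columns(record, kind):
    """Return the required columns that `record` (a dict) lacks."""
    return [col for col in REQUIRED_COLUMNS[kind] if col not in record]


def validate_records(records, kind):
    """Return (row index, missing columns) for every row that fails the check."""
    errors = []
    for index, record in enumerate(records):
        missing = missing_columns(record, kind)
        if missing:
            errors.append((index, missing))
    return errors


if __name__ == "__main__":
    # Hypothetical rows; the file name and values are illustrative only.
    rows = [
        {"images": ["geo_001.png"], "problem": "Find x.", "answer": "3"},
        {"problem": "Find y.", "answer": "5"},
    ]
    print(validate_records(rows, "vision-text"))  # → [(1, ['images'])]
```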

## TODO

- Support PPO, ReMax, REINFORCE++ and RLOO for VLMs.

- Support padding-free training for VLMs.

- Support Ulysses parallelism for VLMs.

- Support more VLM architectures.

### Known bugs

These features are temporarily disabled. We plan to fix them one by one in future updates.

- Vision language models are not yet compatible with padding-free training or Ulysses parallelism.

- Vision language models are not compatible with `enable_chunked_prefill` until [vLLM v1](https://blog.vllm.ai/2025/01/27/v1-alpha-release.html) is supported.

## Discussion Group

👋 Join our [WeChat group](assets/wechat.jpg).

## Citation

Core contributors: [Yaowei Zheng](https://github.com/hiyouga), [Junting Lu](https://github.com/AL-377), [Shenzhi Wang](https://github.com/Shenzhi-Wang) and Yuwen Xiong

We also thank Guangming Sheng and Chi Zhang for helpful discussions.

```bibtex
@misc{zheng2025easyr1,
  title        = {EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework},
  author       = {Yaowei Zheng and Junting Lu and Shenzhi Wang and Yuwen Xiong},
  howpublished = {\url{https://github.com/hiyouga/EasyR1}},
  year         = {2025}
}
```

We also recommend citing the original work.

```bibtex
@article{sheng2024hybridflow,
  title   = {HybridFlow: A Flexible and Efficient RLHF Framework},
  author  = {Guangming Sheng and Chi Zhang and Zilingfeng Ye and Xibin Wu and Wang Zhang and Ru Zhang and Yanghua Peng and Haibin Lin and Chuan Wu},
  year    = {2024},
  journal = {arXiv preprint arXiv:2409.19256}
}
```
