https://github.com/thuml/rlvr-world

Official repository for "RLVR-World: Training World Models with Reinforcement Learning", https://arxiv.org/abs/2505.13934
Host: GitHub
URL: https://github.com/thuml/rlvr-world
Owner: thuml
License: mit
Created: 2025-05-17T03:39:30.000Z (5 months ago)
Default Branch: main
Last Pushed: 2025-06-09T12:11:33.000Z (4 months ago)
Last Synced: 2025-06-20T20:50:53.315Z (4 months ago)
Topics: grpo, real2sim, reinforcement-learning-with-verifiable-rewards, rlvr, robotic-manipulation, text-game, verl, video-generation, video-gpt, video-prediction, web-agent, world-model
Language: Python
Homepage: https://thuml.github.io/RLVR-World/
Size: 13.5 MB
Stars: 45
Watchers: 2
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# RLVR-World: Training World Models with Reinforcement Learning

[![Project Page](https://img.shields.io/badge/Project_Page-blue)](https://thuml.github.io/RLVR-World/)

[![Paper](https://img.shields.io/badge/arXiv-Paper-b31b1b.svg?logo=arxiv)](https://arxiv.org/abs/2505.13934)

[![Hugging Face](https://img.shields.io/badge/Hugging_Face-Models_&_Datasets-F8D44E.svg?logo=huggingface)](https://huggingface.co/collections/thuml/rlvr-world-682f331c75a904b8febc366a)

This is the official code base for the paper [RLVR-World: Training World Models with Reinforcement Learning](https://arxiv.org/abs/2505.13934).

Give it a star 🌟 if you find our work useful!

## 🔥 News

- 🚩 **2025.05.26**: We release all models and datasets.

- 🚩 **2025.05.21**: We open-source our training code.

- 🚩 **2025.05.21**: Our paper is released on [arXiv](https://arxiv.org/abs/2505.13934).

## 📋 TL;DR

We pioneer training world models through reinforcement learning with verifiable rewards (RLVR):

- World models across various modalities (particularly, language and videos) are unified under a sequence modeling formulation;

- Task-specific prediction metrics serve as verifiable rewards directly optimized by RL.

![concept](assets/concept.png)
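
As a rough illustration of the second point, the sketch below turns a prediction metric into a scalar reward and then standardizes rewards within a group of sampled predictions, in the style of GRPO. It is a minimal sketch under stated assumptions, not the training code in this repository: the function names, the exact-match reward, and the negative-MSE reward are all hypothetical stand-ins for whatever task-specific metric is used.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def exact_match_reward(predicted: str, target: str) -> float:
    """Hypothetical binary reward: 1.0 if the predicted next state matches the target."""
    return 1.0 if predicted.strip() == target.strip() else 0.0


def negative_mse_reward(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Hypothetical metric-based reward: negate the mean squared error so higher is better."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    return -mean((p - t) ** 2 for p, t in zip(predicted, target))


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize rewards within one group of samples."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy usage: three sampled next-state predictions for the same (state, action) prompt.
samples = ["door is open", "door is closed", "door is open "]
rewards = [exact_match_reward(s, "door is open") for s in samples]
advantages = group_relative_advantages(rewards)
```

Samples that match the ground-truth next state receive a positive advantage and the mismatching sample a negative one, which is the signal an RL update would then use.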

## 🤗 Models and Datasets

At the moment, we provide the following models and datasets:

| Modality | Type | Domain | Name |
| -------- | ---- | ------ | ---- |
| Language | Dataset | Text game | [bytesized32-world-model-cot](https://huggingface.co/datasets/thuml/bytesized32-world-model-cot) |
| Language | World model | Text game | [bytesized32-world-model-sft](https://huggingface.co/thuml/bytesized32-world-model-sft) |
| Language | World model | Text game | [bytesized32-world-model-rlvr-binary-reward](https://huggingface.co/thuml/bytesized32-world-model-rlvr-binary-reward) |
| Language | World model | Text game | [bytesized32-world-model-rlvr-task-specific-reward](https://huggingface.co/thuml/bytesized32-world-model-rlvr-task-specific-reward) |
| Language | Dataset | Web navigation | [webarena-world-model-cot](https://huggingface.co/datasets/thuml/webarena-world-model-cot) |
| Language | World model | Web navigation | [webarena-world-model-sft](https://huggingface.co/thuml/webarena-world-model-sft) |
| Language | World model | Web navigation | [webarena-world-model-rlvr](https://huggingface.co/thuml/webarena-world-model-rlvr) |
| Video | Tokenizer | Robot manipulation | [rt1-frame-tokenizer](https://huggingface.co/thuml/rt1-frame-tokenizer) |
| Video | World model | Robot manipulation | [rt1-world-model-single-step-base](https://huggingface.co/thuml/rt1-world-model-single-step-base) |
| Video | World model | Robot manipulation | [rt1-world-model-single-step-rlvr](https://huggingface.co/thuml/rt1-world-model-single-step-rlvr) |
| Video | Tokenizer | Robot manipulation | [rt1-compressive-tokenizer](https://huggingface.co/thuml/rt1-compressive-tokenizer) |
| Video | World model | Robot manipulation | [rt1-world-model-multi-step-base](https://huggingface.co/thuml/rt1-world-model-multi-step-base) |
| Video | World model | Robot manipulation | [rt1-world-model-multi-step-rlvr](https://huggingface.co/thuml/rt1-world-model-multi-step-rlvr) |
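
One possible way to fetch these artifacts locally is `snapshot_download` from the `huggingface_hub` package. The sketch below assumes that package is installed; the helper names `repo_id` and `download` are hypothetical and not part of this repository. How each checkpoint is then loaded depends on the code in `lang_wm` and `vid_wm`.

```python
from pathlib import Path

HF_ORG = "thuml"  # organization that appears in the links of the table above


def repo_id(name: str) -> str:
    """Build the Hugging Face repo id for an entry in the table."""
    return f"{HF_ORG}/{name}"


def download(name: str, is_dataset: bool = False) -> Path:
    """Download a full model or dataset snapshot and return its local directory."""
    # Imported lazily so repo_id() works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(
        repo_id=repo_id(name),
        repo_type="dataset" if is_dataset else "model",
    )
    return Path(local_dir)
```

For example, `download("webarena-world-model-rlvr")` would fetch a world model, and `download("webarena-world-model-cot", is_dataset=True)` the corresponding dataset.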

## 💬 Evaluating Language World Models

See [`lang_wm`](/lang_wm):

- Text game state prediction

- Web page state prediction

- Application: Model predictive control for web agents
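
To make the model predictive control idea concrete, here is a minimal one-step lookahead sketch: each candidate action is passed through a world model, the predicted next state is scored, and the best-scoring action is chosen. Everything here is an assumption for illustration; `select_action`, the toy transition table, and the scoring function are hypothetical and do not reflect the agent implementation in `lang_wm`.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

State = TypeVar("State")
Action = TypeVar("Action")


def select_action(
    state: State,
    candidate_actions: Iterable[Action],
    world_model: Callable[[State, Action], State],
    score: Callable[[State], float],
) -> Tuple[Action, float]:
    """Pick the action whose predicted next state has the highest score."""
    best: Optional[Tuple[Action, float]] = None
    for action in candidate_actions:
        predicted_state = world_model(state, action)  # simulate, do not execute
        value = score(predicted_state)
        if best is None or value > best[1]:
            best = (action, value)
    if best is None:
        raise ValueError("candidate_actions must not be empty")
    return best


# Toy stand-ins for a web page world model and a goal-progress scorer.
TOY_TRANSITIONS = {
    ("home", "click_search"): "search_page",
    ("home", "click_cart"): "cart_page",
}


def toy_world_model(state: str, action: str) -> str:
    return TOY_TRANSITIONS[(state, action)]


def toy_score(state: str) -> float:
    return 1.0 if state == "cart_page" else 0.0


action, value = select_action(
    "home", ["click_search", "click_cart"], toy_world_model, toy_score
)
```

In a real web agent the world model would be a language model predicting the next page state and the scorer would estimate progress toward the user's goal.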

## 🎇 Evaluating Video World Models

See [`vid_wm`](/vid_wm):

- Robot manipulation trajectory prediction

- Application: Real2sim policy evaluation
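
The general shape of real2sim policy evaluation can be sketched as rolling a policy out inside a learned world model instead of on a real robot, then measuring how often the final predicted observation counts as a success. The code below is only a sketch of that loop with hypothetical callables (`policy`, `world_model`, `success`); the actual pipeline in `vid_wm` may differ substantially.

```python
from typing import Callable, Sequence, TypeVar

Obs = TypeVar("Obs")
Act = TypeVar("Act")


def evaluate_policy(
    policy: Callable[[Obs], Act],
    world_model: Callable[[Obs, Act], Obs],
    success: Callable[[Obs], bool],
    initial_observations: Sequence[Obs],
    horizon: int,
) -> float:
    """Return the fraction of simulated episodes that end in success."""
    if not initial_observations:
        raise ValueError("initial_observations must not be empty")
    successes = 0
    for obs in initial_observations:
        for _ in range(horizon):
            action = policy(obs)            # policy acts on the predicted observation
            obs = world_model(obs, action)  # world model predicts the next observation
        successes += int(success(obs))
    return successes / len(initial_observations)


# Toy usage with integers standing in for video frames.
success_rate = evaluate_policy(
    policy=lambda obs: 1,
    world_model=lambda obs, act: obs + act,
    success=lambda obs: obs >= 3,
    initial_observations=[0, 1, 2],
    horizon=2,
)
```

In the toy run the final observations are 2, 3, and 4, so two of the three episodes are counted as successes.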

## 🎥 Showcases

![showcase](assets/showcase.png)

## 🚀 Release Progress

- [x] Video world model with RLVR

- [x] Pre-trained & post-trained video world model weights

- [x] Real2sim policy evaluation with video world models

- [x] Text game SFT data

- [x] Web page SFT data

- [x] Language world model on text games with RLVR

- [x] Language world model on web pages with RLVR

- [x] Post-trained language world model weights

- [x] Web agents with language world models

## 📜 Citation

If you find this project useful, please cite our paper as:

```
@article{wu2025rlvr,
    title={RLVR-World: Training World Models with Reinforcement Learning},
    author={Jialong Wu and Shaofeng Yin and Ningya Feng and Mingsheng Long},
    journal={arXiv preprint arXiv:2505.13934},
    year={2025},
}
```

## 🤝 Contact

If you have any questions, please contact wujialong0229@gmail.com.

## 💡 Acknowledgement

We sincerely thank the authors of the following GitHub repositories, whose codebases we build upon:

- https://github.com/volcengine/verl

- https://github.com/thuml/iVideoGPT

- https://github.com/kyle8581/WMA-Agents

- https://github.com/cognitiveailab/GPT-simulator

- https://github.com/web-arena-x/webarena

- https://github.com/simpler-env/SimplerEnv
