https://github.com/RLinf/RLinf
RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning.
- Host: GitHub
- URL: https://github.com/RLinf/RLinf
- Owner: RLinf
- License: apache-2.0
- Created: 2025-08-14T03:52:19.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-09-04T05:56:32.000Z (4 months ago)
- Last Synced: 2025-09-04T07:28:05.437Z (4 months ago)
- Topics: agentic-ai, ai-infra, embodied-ai, large-language-models, reinforcement-learning, rl-infra, rlinf, vla-rl
- Language: Python
- Homepage: https://rlinf.readthedocs.io/en/latest/
- Size: 776 KB
- Stars: 323
- Watchers: 3
- Forks: 35
- Open Issues: 12
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-production-machine-learning - RLinf - RLinf is a reinforcement learning library. (Industry Strength Reinforcement Learning)
README
# RLinf: Reinforcement Learning Infrastructure for Agentic AI
RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models via reinforcement learning. The 'inf' in RLinf stands for `Infrastructure`, highlighting its role as a robust backbone for next-generation training. It also stands for `Infinite`, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development.
## What's NEW!
- [2025/08] RLinf is open-sourced. A formal v0.1 release will follow soon, together with the paper *RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation*.
## Key Features
**RLinf is unique with:**
- Macro-to-Micro Flow: a new paradigm, M2Flow, that executes macro-level logical flows through micro-level execution flows, decoupling logical workflow construction (programmability) from physical communication and scheduling (efficiency).
- Flexible Execution Modes
- Collocated mode: all workers share the full set of GPUs.
- Disaggregated mode: workers run on disjoint GPU subsets, enabling fine-grained pipelining.
- Hybrid mode: a customizable mix of the two, collocating some workers while disaggregating others (see the illustrative placement sketch after this list).
- Auto-scheduling Strategy: automatically selects the most suitable execution mode based on the training workload, without the need for manual resource allocation.
- Embodied Agent Support
- Fast adaptation support for mainstream VLA models: [OpenVLA](https://github.com/openvla/openvla), [OpenVLA-OFT](https://github.com/moojink/openvla-oft), and [π₀](https://github.com/Physical-Intelligence/openpi).
- Support for mainstream CPU & GPU-based simulators via standardized RL interfaces: [ManiSkill3](https://github.com/haosulab/ManiSkill), [LIBERO](https://github.com/Lifelong-Robot-Learning/LIBERO).
- The first RL fine-tuning of the π₀ model family with a flow-matching action expert.
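To make the three execution modes concrete, here is an illustrative placement sketch. The `Placement` class and helper functions below are hypothetical, written only to convey the idea; RLinf's actual configuration interface is described in its documentation.

```python
# Hypothetical sketch of the three placement modes (not RLinf's real API).
from dataclasses import dataclass, field

@dataclass
class Placement:
    # Maps each worker group (e.g. "actor", "rollout") to the GPU ids it owns.
    gpus: dict[str, set[int]] = field(default_factory=dict)

def collocated(workers: list[str], all_gpus: set[int]) -> Placement:
    # Collocated: every worker group time-shares the full GPU pool.
    return Placement({w: set(all_gpus) for w in workers})

def disaggregated(assignment: dict[str, set[int]]) -> Placement:
    # Disaggregated: each worker group owns its own disjoint GPU subset,
    # which is what enables fine-grained pipelining between stages.
    ids = [g for s in assignment.values() for g in s]
    assert len(ids) == len(set(ids)), "GPU subsets must be disjoint"
    return Placement(dict(assignment))

# Hybrid: mix both styles, e.g. collocate actor and critic on GPUs 0-1
# while the rollout workers own GPUs 2-3.
hybrid = Placement({"actor": {0, 1}, "critic": {0, 1}, "rollout": {2, 3}})
```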
**RLinf is fast with:**
- Hybrid mode with fine-grained pipelining: achieves a **120%+** throughput improvement compared to other frameworks.
- Automatic Online Scaling Strategy: dynamically scales training resources, with GPU switching completed within seconds, further improving efficiency by 20–40% while preserving the on-policy nature of RL algorithms.
**RLinf is flexible and easy to use with:**
- Multiple Backend Integrations
- FSDP + Hugging Face: rapid adaptation to new models and algorithms, ideal for beginners and fast prototyping.
- Megatron + SGLang: optimized for large-scale training, delivering maximum efficiency for expert users with demanding workloads.
- Adaptive communication via an asynchronous communication channel
- Built-in support for popular RL methods, including [PPO](https://arxiv.org/abs/1707.06347), [GRPO](https://arxiv.org/abs/2402.03300), [DAPO](https://arxiv.org/abs/2503.14476), [REINFORCE++](https://arxiv.org/abs/2501.03262), and more.
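As a concrete example of one built-in method: GRPO (per the linked paper) replaces a learned critic with group-relative advantages, sampling a group of responses per prompt and normalizing each response's reward by the group's mean and standard deviation. A minimal sketch of that computation (not RLinf's internal code):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages, following the GRPO paper.

    rewards has shape (num_prompts, group_size): one scalar reward per
    sampled response. Each response is scored against its own group's
    mean and standard deviation, so no learned value network is needed.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled responses each.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.2, 0.4, 0.6, 0.8]])
print(grpo_advantages(rewards))
```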
## Roadmap
### 1. System-Level Enhancements
- [ ] Support for heterogeneous GPUs
- [ ] Support for asynchronous pipeline execution
- [ ] Support for Mixture of Experts (MoE)
- [ ] Support for vLLM inference backend
### 2. Application-Level Extensions
- [ ] Support for Vision-Language Models (VLMs) training
- [ ] Support for deep searcher agent training
- [ ] Support for multi-agent training
- [ ] Support for integration with more embodied simulators (e.g., [Meta-World](https://github.com/Farama-Foundation/Metaworld), [GENESIS](https://github.com/Genesis-Embodied-AI/Genesis))
- [ ] Support for more Vision Language Action models (VLAs), such as [GR00T](https://github.com/NVIDIA/Isaac-GR00T)
- [ ] Support for world models
- [ ] Support for real-world RL for embodied intelligence
## Getting Started
Complete documentation for RLinf can be found [**Here**](https://rlinf.readthedocs.io/en/latest/).
**Quickstart**
- [Installation](https://rlinf.readthedocs.io/en/latest/rst_source/start/installation.html)
- [Quickstart 1: PPO Training of VLAs on ManiSkill3](https://rlinf.readthedocs.io/en/latest/rst_source/start/vla.html)
- [Quickstart 2: GRPO Training of LLMs on MATH](https://rlinf.readthedocs.io/en/latest/rst_source/start/llm.html)
- [Multi-node Training](https://rlinf.readthedocs.io/en/latest/rst_source/start/distribute.html)
- [Model Evaluation](https://rlinf.readthedocs.io/en/latest/rst_source/start/eval.html)
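To give a flavor of the embodied quickstart, the simulators RLinf wraps expose the standard Gymnasium interface. The following random-rollout sketch drives ManiSkill3 directly; it assumes `mani_skill` is installed and is not RLinf-specific code.

```python
import gymnasium as gym
import mani_skill.envs  # noqa: F401  (importing registers ManiSkill3 envs)

# PickCube-v1 is one of ManiSkill3's standard tabletop tasks.
env = gym.make("PickCube-v1", obs_mode="state")
obs, info = env.reset(seed=0)
for _ in range(100):
    action = env.action_space.sample()  # a trained VLA policy would act here
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```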
**Key Design**
- [Unified User Interface Usage](https://rlinf.readthedocs.io/en/latest/rst_source/tutorials/user/index.html)
- [Flexible Execution Modes](https://rlinf.readthedocs.io/en/latest/rst_source/tutorials/mode/index.html)
- [Enable Automatic Scheduling](https://rlinf.readthedocs.io/en/latest/rst_source/tutorials/scheduler/index.html)
- [Elastic Communication](https://rlinf.readthedocs.io/en/latest/rst_source/tutorials/communication/index.html)
**Example Gallery**
- [Embodied Intelligence Vision-Language-Action Model training](https://rlinf.readthedocs.io/en/latest/rst_source/examples/embodied.html)
- [Math Reasoning Model Training](https://rlinf.readthedocs.io/en/latest/rst_source/examples/reasoning.html)
**Advanced Features**
- [5D Parallelism Configuration for Megatron-LM](https://rlinf.readthedocs.io/en/latest/rst_source/tutorials/advance/5D.html)
- [LoRA Integration for efficient fine-tuning](https://rlinf.readthedocs.io/en/latest/rst_source/tutorials/advance/lora.html)
- [Switch between different versions of SGLang](https://rlinf.readthedocs.io/en/latest/rst_source/tutorials/advance/version.html)
- [Checkpoint Resume and Recovery Support](https://rlinf.readthedocs.io/en/latest/rst_source/tutorials/advance/resume.html)
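RLinf's own LoRA wiring is covered in the linked page. For background on what LoRA integration typically involves, here is a minimal setup with Hugging Face PEFT; the base model and hyperparameters are placeholder choices, not RLinf defaults.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank updates
    lora_alpha=16,                         # scaling applied to the updates
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter weights train
```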
**Extending the Framework**
- [Adding new Environments](https://rlinf.readthedocs.io/en/latest/rst_source/tutorials/extend/new_env.html)
- [Adding new Models with FSDP+Huggingface backend](https://rlinf.readthedocs.io/en/latest/rst_source/tutorials/extend/new_model_fsdp.html)
- [Adding new Models with Megatron+SGLang backend](https://rlinf.readthedocs.io/en/latest/rst_source/tutorials/extend/new_model_megatron.html)
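As a rough picture of what adding a new environment entails, here is a minimal Gymnasium-style environment. RLinf's actual base class and registration hooks are described in the tutorial above, so treat this only as a generic sketch.

```python
import gymnasium as gym
import numpy as np

class ToyReachEnv(gym.Env):
    """Minimal example: drive a 1-D point toward the origin."""

    def __init__(self):
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
        self.action_space = gym.spaces.Box(-0.1, 0.1, shape=(1,), dtype=np.float32)
        self._pos = np.zeros(1, dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._pos = self.np_random.uniform(-1.0, 1.0, size=1).astype(np.float32)
        return self._pos.copy(), {}

    def step(self, action):
        self._pos = np.clip(self._pos + action, -1.0, 1.0).astype(np.float32)
        reward = -float(abs(self._pos[0]))       # closer to the origin is better
        terminated = bool(abs(self._pos[0]) < 0.05)
        return self._pos.copy(), reward, terminated, False, {}
```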
**Blogs**
- [Comparison with VeRL](https://rlinf.readthedocs.io/en/latest/rst_source/blog/compare_with_verl.html)
## Build Status
| Type | Status |
| :--------------: | :----: |
| Reasoning RL-MATH | [![math_e2e](https://github.com/RLinf/RLinf/actions/workflows/math_e2e.yml/badge.svg)](https://github.com/RLinf/RLinf/actions/workflows/math_e2e.yml) |
| Embodied RL-VLA | [![embodied_e2e](https://github.com/RLinf/RLinf/actions/workflows/embodied_e2e.yml/badge.svg)](https://github.com/RLinf/RLinf/actions/workflows/embodied_e2e.yml) |
## Contribution Guidelines
We welcome contributions to RLinf. Please read [contribution guide](https://rlinf.readthedocs.io/en/latest/index.html#contribution-guidelines) before taking action.
## Citation and Acknowledgement
If you find **RLinf** helpful, please cite the GitHub repository:
```bibtex
@misc{RLinf_repo,
  title        = {RLinf: Reinforcement Learning Infrastructure for Agentic AI},
  howpublished = {\url{https://github.com/RLinf/RLinf}},
  note         = {GitHub repository},
  year         = {2025}
}
```
**Paper**: A full paper describing RLinf will be released by **September 20, 2025**. We will update this section with the official citation and BibTeX when they become available.
**Acknowledgements**
RLinf has been inspired by, and benefits from, the ideas and tooling of the broader open-source community.
In particular, we would like to thank the teams and contributors behind VeRL, AReaL, Megatron-LM, SGLang, and PyTorch Fully Sharded Data Parallel (FSDP). If we have inadvertently missed your project or contribution, please open an issue or a pull request so we can credit you properly.
**Contact:**
We welcome applications from Postdocs, PhD/Master's students, and interns. Join us in shaping the future of RL infrastructure and embodied AI!
- Chao Yu: zoeyuchao@gmail.com
- Yu Wang: yu-wang@tsinghua.edu.cn