https://github.com/RLinf/RLinf
RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning.
- Host: GitHub
- URL: https://github.com/RLinf/RLinf
- Owner: RLinf
- License: apache-2.0
- Created: 2025-08-14T03:52:19.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-09-04T05:56:32.000Z (4 months ago)
- Last Synced: 2025-09-04T07:28:05.437Z (4 months ago)
- Topics: agentic-ai, ai-infra, embodied-ai, large-language-models, reinforcement-learning, rl-infra, rlinf, vla-rl
- Language: Python
- Homepage: https://rlinf.readthedocs.io/en/latest/
- Size: 776 KB
- Stars: 323
- Watchers: 3
- Forks: 35
- Open Issues: 12
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-production-machine-learning - RLinf - RLinf is a reinforcement learning library. (Industry Strength Reinforcement Learning)
README
# RLinf: Reinforcement Learning Infrastructure for Agentic AI
RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models via reinforcement learning. The 'inf' in RLinf stands for `Infrastructure`, highlighting its role as a robust backbone for next-generation training. It also stands for `Infinite`, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development.
## What's NEW!
- [2025/08] RLinf is open-sourced. A formal v0.1 release will follow soon, together with the paper *RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation*.
## Key Features
**RLinf is unique with:**
- Macro-to-Micro Flow: a new paradigm, M2Flow, that executes macro-level logical flows through micro-level execution flows, decoupling logical workflow construction (programmability) from physical communication and scheduling (efficiency).
- Flexible Execution Modes
- Collocated mode: all workers share the full set of GPUs.
- Disaggregated mode: workers run on disjoint GPU subsets, enabling fine-grained pipelining.
- Hybrid mode: a customizable mix of the two, collocating some workers while disaggregating others (see the illustrative placement sketch after this list).
- Auto-scheduling Strategy: automatically selects the most suitable execution mode based on the training workload, without the need for manual resource allocation.
- Embodied Agent Support
- Fast adaptation support for mainstream VLA models: [OpenVLA](https://github.com/openvla/openvla), [OpenVLA-OFT](https://github.com/moojink/openvla-oft), and [π₀](https://github.com/Physical-Intelligence/openpi).
- Support for mainstream CPU & GPU-based simulators via standardized RL interfaces: [ManiSkill3](https://github.com/haosulab/ManiSkill), [LIBERO](https://github.com/Lifelong-Robot-Learning/LIBERO).
- The first RL fine-tuning of the π₀ model family with a flow-matching action expert.
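To make the three execution modes concrete, here is an illustrative placement sketch. The `Placement` class and helper functions below are hypothetical, written only to convey the idea; RLinf's actual configuration interface is described in its documentation.

```python
# Hypothetical sketch of the three placement modes (not RLinf's real API).
from dataclasses import dataclass, field

@dataclass
class Placement:
    # Maps each worker group (e.g. "actor", "rollout") to the GPU ids it owns.
    gpus: dict[str, set[int]] = field(default_factory=dict)

def collocated(workers: list[str], all_gpus: set[int]) -> Placement:
    # Collocated: every worker group time-shares the full GPU pool.
    return Placement({w: set(all_gpus) for w in workers})

def disaggregated(assignment: dict[str, set[int]]) -> Placement:
    # Disaggregated: each worker group owns its own disjoint GPU subset,
    # which is what enables fine-grained pipelining between stages.
    ids = [g for s in assignment.values() for g in s]
    assert len(ids) == len(set(ids)), "GPU subsets must be disjoint"
    return Placement(dict(assignment))

# Hybrid: mix both styles, e.g. collocate actor and critic on GPUs 0-1
# while the rollout workers own GPUs 2-3.
hybrid = Placement({"actor": {0, 1}, "critic": {0, 1}, "rollout": {2, 3}})
```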
**RLinf is fast with:**
- Hybrid mode with fine-grained pipelining: achieves a **120%+** throughput improvement compared to other frameworks.
- Automatic Online Scaling Strategy: dynamically scales training resources, with GPU switching completed within seconds, further improving efficiency by 20–40% while preserving the on-policy nature of RL algorithms.
**RLinf is flexible and easy to use with:**
- Multiple Backend Integrations
- FSDP + Hugging Face: rapid adaptation to new models and algorithms, ideal for beginners and fast prototyping.
- Megatron + SGLang: optimized for large-scale training, delivering maximum efficiency for expert users with demanding workloads.
- Adaptive communication via an asynchronous communication channel
- Built-in support for popular RL methods, including [PPO](https://arxiv.org/abs/1707.06347), [GRPO](https://arxiv.org/abs/2402.03300), [DAPO](https://arxiv.org/abs/2503.14476), [REINFORCE++](https://arxiv.org/abs/2501.03262), and more.
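As a concrete example of one built-in method: GRPO (per the linked paper) replaces a learned critic with group-relative advantages, sampling a group of responses per prompt and normalizing each response's reward by the group's mean and standard deviation. A minimal sketch of that computation (not RLinf's internal code):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages, following the GRPO paper.

    rewards has shape (num_prompts, group_size): one scalar reward per
    sampled response. Each response is scored against its own group's
    mean and standard deviation, so no learned value network is needed.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled responses each.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.2, 0.4, 0.6, 0.8]])
print(grpo_advantages(rewards))
```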
## Roadmap
### 1. System-Level Enhancements
- [ ] Support for heterogeneous GPUs
- [ ] Support for asynchronous pipeline execution
- [ ] Support for Mixture of Experts (MoE)
- [ ] Support for vLLM inference backend
### 2. Application-Level Extensions
- [ ] Support for Vision-Language Models (VLMs) training
- [ ] Support for deep searcher agent training
- [ ] Support for multi-agent training
- [ ] Support for integration with more embodied simulators (e.g., [Meta-World](https://github.com/Farama-Foundation/Metaworld), [GENESIS](https://github.com/Genesis-Embodied-AI/Genesis))
- [ ] Support for more Vision Language Action models (VLAs), such as [GR00T](https://github.com/NVIDIA/Isaac-GR00T)
- [ ] Support for world models
- [ ] Support for real-world RL for embodied intelligence
## Getting Started
Complete documentation for RLinf can be found [**Here**](https://rlinf.readthedocs.io/en/latest/).
**Quickstart**
- [Installation](https://rlinf.readthedocs.io/en/latest/rst_source/start/installation.html)
- [Quickstart 1: PPO Training of VLAs on ManiSkill3](https://rlinf.readthedocs.io/en/latest/rst_source/start/vla.html)
- [Quickstart 2: GRPO Training of LLMs on MATH](https://rlinf.readthedocs.io/en/latest/rst_source/start/llm.html)
- [Multi-node Training](https://rlinf.readthedocs.io/en/latest/rst_source/start/distribute.html)
- [Model Evaluation](https://rlinf.readthedocs.io/en/latest/rst_source/start/eval.html)
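To give a flavor of the embodied quickstart, the simulators RLinf wraps expose the standard Gymnasium interface. The following random-rollout sketch drives ManiSkill3 directly; it assumes `mani_skill` is installed and is not RLinf-specific code.

```python
import gymnasium as gym
import mani_skill.envs  # noqa: F401  (importing registers ManiSkill3 envs)

# PickCube-v1 is one of ManiSkill3's standard tabletop tasks.
env = gym.make("PickCube-v1", obs_mode="state")
obs, info = env.reset(seed=0)
for _ in range(100):
    action = env.action_space.sample()  # a trained VLA policy would act here
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```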
**Key Design**
- [Unified User Interface Usage](https://rlinf.readthedocs.io/en/latest/rst_source/tutorials/user/index.html)
- [Flexible Execution Modes](https://rlinf.readthedocs.io/en/latest/rst_source/tutorials/mode/index.html)
- [Enable Automatic Scheduling](https://rlinf.readthedocs.io/en/latest/rst_source/tutorials/scheduler/index.html)
- [Elastic Communication](https://rlinf.readthedocs.io/en/latest/rst_source/tutorials/communication/index.html)
**Example Gallery**
- [Embodied Intelligence Vision-Language-Action Model training](https://rlinf.readthedocs.io/en/latest/rst_source/examples/embodied.html)
- [Math Reasoning Model Training](https://rlinf.readthedocs.io/en/latest/rst_source/examples/reasoning.html)
**Advanced Features**
- [5D Parallelism Configuration for Megatron-LM](https://rlinf.readthedocs.io/en/latest/rst_source/tutorials/advance/5D.html)
- [LoRA Integration for efficient fine-tuning](https://rlinf.readthedocs.io/en/latest/rst_source/tutorials/advance/lora.html)
- [Switch between different versions of SGLang](https://rlinf.readthedocs.io/en/latest/rst_source/tutorials/advance/version.html)
- [Checkpoint Resume and Recovery Support](https://rlinf.readthedocs.io/en/latest/rst_source/tutorials/advance/resume.html)
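RLinf's own LoRA wiring is covered in the linked page. For background on what LoRA integration typically involves, here is a minimal setup with Hugging Face PEFT; the base model and hyperparameters are placeholder choices, not RLinf defaults.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank updates
    lora_alpha=16,                         # scaling applied to the updates
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter weights train
```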
**Extending the Framework**
- [Adding new Environments](https://rlinf.readthedocs.io/en/latest/rst_source/tutorials/extend/new_env.html)
- [Adding new Models with FSDP+Huggingface backend](https://rlinf.readthedocs.io/en/latest/rst_source/tutorials/extend/new_model_fsdp.html)
- [Adding new Models with Megatron+SGLang backend](https://rlinf.readthedocs.io/en/latest/rst_source/tutorials/extend/new_model_megatron.html)
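As a rough picture of what adding a new environment entails, here is a minimal Gymnasium-style environment. RLinf's actual base class and registration hooks are described in the tutorial above, so treat this only as a generic sketch.

```python
import gymnasium as gym
import numpy as np

class ToyReachEnv(gym.Env):
    """Minimal example: drive a 1-D point toward the origin."""

    def __init__(self):
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
        self.action_space = gym.spaces.Box(-0.1, 0.1, shape=(1,), dtype=np.float32)
        self._pos = np.zeros(1, dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._pos = self.np_random.uniform(-1.0, 1.0, size=1).astype(np.float32)
        return self._pos.copy(), {}

    def step(self, action):
        self._pos = np.clip(self._pos + action, -1.0, 1.0).astype(np.float32)
        reward = -float(abs(self._pos[0]))       # closer to the origin is better
        terminated = bool(abs(self._pos[0]) < 0.05)
        return self._pos.copy(), reward, terminated, False, {}
```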
**Blogs**
- [Comparison with VeRL](https://rlinf.readthedocs.io/en/latest/rst_source/blog/compare_with_verl.html)
## Build Status
| Type | Status |
| :--------------: | :----: |
| Reasoning RL-MATH | [![math_e2e](https://github.com/RLinf/RLinf/actions/workflows/math_e2e.yml/badge.svg)](https://github.com/RLinf/RLinf/actions/workflows/math_e2e.yml) |
| Embodied RL-VLA | [![embodied_e2e](https://github.com/RLinf/RLinf/actions/workflows/embodied_e2e.yml/badge.svg)](https://github.com/RLinf/RLinf/actions/workflows/embodied_e2e.yml) |
## Contribution Guidelines
We welcome contributions to RLinf. Please read [contribution guide](https://rlinf.readthedocs.io/en/latest/index.html#contribution-guidelines) before taking action.
## Citation and Acknowledgement
If you find **RLinf** helpful, please cite the GitHub repository:
```bibtex
@misc{RLinf_repo,
  title        = {RLinf: Reinforcement Learning Infrastructure for Agentic AI},
  howpublished = {\url{https://github.com/RLinf/RLinf}},
  note         = {GitHub repository},
  year         = {2025}
}
```
**Paper**: A full paper describing RLinf will be released by **September 20, 2025**. We will update this section with the official citation and BibTeX when they become available.
**Acknowledgements**
RLinf has been inspired by, and benefits from, the ideas and tooling of the broader open-source community.
In particular, we would like to thank the teams and contributors behind VeRL, AReaL, Megatron-LM, SGLang, and PyTorch Fully Sharded Data Parallel (FSDP). If we have inadvertently missed your project or contribution, please open an issue or a pull request so we can credit you properly.
**Contact:**
We welcome applications from Postdocs, PhD/Master's students, and interns. Join us in shaping the future of RL infrastructure and embodied AI!
- Chao Yu: zoeyuchao@gmail.com
- Yu Wang: yu-wang@tsinghua.edu.cn