https://github.com/necolizer/awesome-rl-for-agents

# Awesome RL for Agents [![Awesome](https://awesome.re/badge.svg)](https://awesome.re)

A curated list of reinforcement learning (RL) for agents.

> This list collects papers, tools, and demos that show how reinforcement learning can be applied to train or tune agents. Its primary focus is computer-using agents (e.g. GUI, web, and MCP agents), with supplementary coverage of related topics within the broader scope of RL for agents.

---

## Table of Contents

- [📚 Papers & Research](#-papers--research)
- [🕹️ Benchmarks](#-benchmarks)
- [🧪 Demos & Projects](#-demos--projects)
- [🧰 Toolkits & Frameworks](#-toolkits--frameworks)
- [📄 Tutorials & Blog Posts](#-tutorials--blog-posts)
- [🔗 Related Awesome Lists](#-related-awesome-lists)
- [🤝 Contributing](#-contributing)

---

## 📚 Papers & Research
### RL for Computer-using Agents
- **UI-R1**: Enhancing Action Prediction of GUI Agents by Reinforcement Learning [[Preprint'25]](https://arxiv.org/abs/2503.21620) [[Code]](https://github.com/lll6gg/UI-R1)
- **Digi-Q**: Learning Q-Value Functions for Training Device-Control Agents [[Preprint'25]](https://arxiv.org/abs/2502.15760) [[Code]](https://github.com/DigiRL-agent/digiq)

### RL for Tool-using Problem Solvers
- **Agent models**: Internalizing Chain-of-Action Generation into Reasoning Models [[Preprint'25]](https://arxiv.org/abs/2503.06580) [[Code]](https://github.com/ADaM-BJTU/AutoCoA)
- **ToRL**: Scaling Tool-Integrated RL [[Preprint'25]](https://arxiv.org/pdf/2503.23383) [[Code]](https://github.com/GAIR-NLP/ToRL)

### RL for Agent Planning
- **MPO**: Boosting LLM Agents with Meta Plan Optimization [[Preprint'25]](https://arxiv.org/abs/2503.02682) [[Code]](https://github.com/WeiminXiong/MPO)

### Reinforcement Learning Scaling
- **VAPO**: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks [[Preprint'25]](https://arxiv.org/abs/2504.05118)
- **DAPO**: An Open-Source LLM Reinforcement Learning System at Scale [[Preprint'25]](https://arxiv.org/abs/2503.14476v1) [[Code]](https://github.com/BytedTsinghua-SIA/DAPO)
- **LIMR**: Less is More for RL Scaling [[Preprint'25]](https://arxiv.org/abs/2502.11886) [[Code]](https://github.com/GAIR-NLP/LIMR)
- **DeepSeek-R1**: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning [[Preprint'25]](https://arxiv.org/abs/2501.12948)
- **Kimi k1.5**: Scaling Reinforcement Learning with LLMs [[Preprint'25]](https://arxiv.org/abs/2501.12599)

## 🕹️ Benchmarks
- **ScreenSpot-Pro**: GUI Grounding for Professional High-Resolution Computer Use [[Paper]](https://likaixin2000.github.io/papers/ScreenSpot_Pro.pdf) [[Code]](https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding)
- **OSWorld**: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments [[NeurIPS'24]](https://proceedings.neurips.cc/paper_files/paper/2024/hash/5d413e48f84dc61244b6be550f1cd8f5-Abstract-Datasets_and_Benchmarks_Track.html) [[Code]](https://github.com/xlang-ai/OSWorld)
- **SeeClick**: Harnessing GUI Grounding for Advanced Visual GUI Agents [[ACL'24]](https://aclanthology.org/2024.acl-long.505.pdf) [[Code]](https://github.com/njucckevin/SeeClick)

## 🧪 Demos & Projects

### RL-based LLM agent tuning
- **OpenManus-RL** [[Code]](https://github.com/OpenManus/OpenManus-RL) & **OpenManus** [[Code]](https://github.com/mannaandpoem/OpenManus)
- **RAGEN**: Training Agents by Reinforcing Reasoning [[Code]](https://github.com/ZihanWang314/ragen)

### RL-based LLM tuning
- **Open-Reasoner-Zero**: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model [[Preprint'25]](https://arxiv.org/abs/2503.24290) [[Code]](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero)
- **simple_GRPO** [[Code]](https://github.com/lsdefine/simple_GRPO)

### MCP Agents
- **mcp-agent** [[Code]](https://github.com/lastmile-ai/mcp-agent)

## 🧰 Toolkits & Frameworks
- **verl**: Volcano Engine Reinforcement Learning for LLM [[Code]](https://github.com/volcengine/verl)

## 📄 Tutorials & Blog Posts
> (Coming soon...)

## 🔗 Related Awesome Lists
- **Awesome-Agent-RL** [[List]](https://github.com/0russwest0/Awesome-Agent-RL) - covering RL for research agents
- **awesome-ml-agents** [[List]](https://github.com/tokarev-i-v/awesome-llm-rl-agents) - covering RL and agents before 2023

## ๐Ÿค Contributing

Contributions are warmly welcome!

If you know a paper, tool, environment, or demo relevant to **RL for Agents**, feel free to open a pull request.

### Guidelines:
- Make sure the resource is publicly accessible and active.
- Use the same format as existing entries: `- **Name**: Title [Paper](link) [Code](link) – short description (optional).`
- Add entries under the most appropriate section.
- Avoid duplicates or resources that are already well-covered elsewhere.
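
If it helps when preparing a pull request, the guidelines above can be sketched as a small helper. This is only an illustrative sketch, not part of the repository: `format_entry` and `is_duplicate` are hypothetical names, the double-bracket link style (`[[Code]](link)`) is assumed from the existing entries, and the duplicate check is a rough substring match on the URL.

```python
def format_entry(name, title=None, links=None, description=None):
    """Build one list entry (hypothetical helper, not part of this repo).

    links is a list of (label, url) pairs, e.g. [("Code", "https://...")].
    """
    head = f"- **{name}**"
    if title:
        head += f": {title}"
    parts = [head]
    for label, url in (links or []):
        # Assumption: double brackets, as used by the existing entries.
        parts.append(f"[[{label}]]({url})")
    line = " ".join(parts)
    if description:
        line += f" - {description}"
    return line


def is_duplicate(url, readme_text):
    """Rough check: is this URL already mentioned in the README text?

    A plain case-insensitive substring match, so it can report a false
    positive when one URL is a prefix of another.
    """
    return url.rstrip("/").lower() in readme_text.lower()


if __name__ == "__main__":
    entry = format_entry(
        "verl",
        title="Volcano Engine Reinforcement Learning for LLM",
        links=[("Code", "https://github.com/volcengine/verl")],
    )
    print(entry)
    print(is_duplicate("https://github.com/volcengine/verl/", entry))
```

Running the duplicate check against the current README text before opening a pull request is one way to catch a resource that is already listed.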

We aim to keep this list high-quality, practical, and focused. Thank you for helping improve it! ✨