{"id":28737393,"url":"https://github.com/thinkwee/agentsmeetrl","last_synced_at":"2025-06-16T02:10:39.690Z","repository":{"id":298284264,"uuid":"998903307","full_name":"thinkwee/AgentsMeetRL","owner":"thinkwee","description":"An Awesome List of Reinforcement Learning-based Large Language Agent Works. Collect directly from official code base.","archived":false,"fork":false,"pushed_at":"2025-06-10T09:24:21.000Z","size":2461,"stargazers_count":12,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-10T10:35:41.705Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thinkwee.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-09T12:42:36.000Z","updated_at":"2025-06-10T09:24:25.000Z","dependencies_parsed_at":"2025-06-10T10:47:35.790Z","dependency_job_id":null,"html_url":"https://github.com/thinkwee/AgentsMeetRL","commit_stats":null,"previous_names":["thinkwee/agentsmeetrl"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/thinkwee/AgentsMeetRL","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thinkwee%2FAgentsMeetRL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thinkwee%2FAgentsMeetRL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thinkwee%2FAgentsMeetRL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thinkwee%2FAgentsMeetRL/manifests","own
er_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thinkwee","download_url":"https://codeload.github.com/thinkwee/AgentsMeetRL/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thinkwee%2FAgentsMeetRL/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260083859,"owners_count":22956409,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-16T02:10:39.195Z","updated_at":"2025-06-16T02:10:39.678Z","avatar_url":"https://github.com/thinkwee.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"logo.png\" alt=\"AgentsMeetRL Logo\" width=\"500\"\u003e\n\u003c/div\u003e\n\n# When LLM Agents Meet Reinforcement Learning\n\nThis is an awesome list that summarizes open-source repositories for training LLM Agents using reinforcement learning:\n - The criterion for identifying an agent project is that it must have at least one of the following: multi-turn interaction or tool use.\n - This list is based on an analysis of each open-source repository's code performed with GitHub Copilot Agent, so some entries may not faithfully reflect the code. Although all entries have been manually reviewed, there may still be omissions. 
If you find any errors, please let us know through an issue or PR. We warmly welcome corrections!\n - We focus in particular on the RL frameworks, RL algorithms, rewards, and environments that each project depends on, as a reference for how these excellent open-source projects make their technical choices.\n - Reward Type is enumerated as External Verifier/Simple Rule/Model Based/Custom (abbreviated in the tables as External/Rule/Model/Custom).\n - Feel free to submit your own projects anytime - we welcome contributions!\n\n---\n\n## Base Framework\n| Github Repo | Stars | Date | Org | Paper Link | RL Framework |  RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |\n| :----: | :----: | :----: |  :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| [verl](https://github.com/volcengine/verl) | ![](https://img.shields.io/github/stars/volcengine/verl.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2024-10 | ByteDance | [paper](https://arxiv.org/pdf/2409.19256) | veRL | PPO/GRPO | Single | Outcome | Both | Math/QA/Reasoning/Search | All | Yes |\n| [trl](https://github.com/huggingface/trl) | ![](https://img.shields.io/github/stars/huggingface/trl.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2019-11 | HuggingFace | -- | TRL | PPO/GRPO/DPO | Single | Both | Single | QA | Custom | No |\n| [verifiers](https://github.com/willccbb/verifiers) | ![](https://img.shields.io/github/stars/willccbb/verifiers.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-03 | Individual | -- | HuggingFace | GRPO | Multi | Outcome | Both | Reasoning/Math/Code | All | Code |\n| [oat](https://github.com/sail-sg/oat) | ![](https://img.shields.io/github/stars/sail-sg/oat.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2024-11 | NUS/Sea AI | [paper](https://arxiv.org/pdf/2411.01493) | Custom | PPO/GRPO | Single | Outcome | Multi | Math/Alignment | 
External | No |\n| [ROLL](https://github.com/alibaba/ROLL) | ![](https://img.shields.io/github/stars/alibaba/ROLL.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-06 | Alibaba | [paper](https://arxiv.org/pdf/2506.06122) | Custom | PPO/GRPO/REINFORCE++/TOPR/RAFT++ | Multi | Both | Multi | Math/QA/Code/Alignment | All | Yes |\n| [MARTI](https://github.com/TsinghuaC3I/MARTI) | ![](https://img.shields.io/github/stars/TsinghuaC3I/MARTI.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-05 | Tsinghua | -- | Custom | PPO/GRPO/REINFORCE++/TTRL | Multi | Both | Multi | Math | All | Yes |\n| [AReaL](https://github.com/inclusionAI/AReaL) | ![](https://img.shields.io/github/stars/inclusionAI/AReaL.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-06 | AntGroup/Tsinghua | [paper](https://arxiv.org/pdf/2505.24298) | Custom | PPO | Both | Outcome | Both | Math/Code | External | Yes |\n\n## Search/Research/Web\n| Github Repo | Stars | Date | Org | Paper Link | RL Framework |  RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |\n| :----: | :----: | :----: |  :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| [WebThinker](https://github.com/RUC-NLPIR/WebThinker) | ![](https://img.shields.io/github/stars/RUC-NLPIR/WebThinker.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-04 | RUC | [paper](https://arxiv.org/pdf/2504.21776) | Custom | DPO | Single | Outcome | Multi | Reasoning/QA/Research | Model/External | Web Browsing |\n| [DeepResearcher](https://github.com/GAIR-NLP/DeepResearcher) | ![](https://img.shields.io/github/stars/GAIR-NLP/DeepResearcher.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-04 | SJTU | [paper](https://arxiv.org/pdf/2504.03160) | veRL | PPO/GRPO | Multi | Outcome | Multi | Research | All | Yes |\n| [Search-R1](https://github.com/PeterGriffinJin/Search-R1) | 
![](https://img.shields.io/github/stars/PeterGriffinJin/Search-R1.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-03 | UIUC/Google | [paper1](https://arxiv.org/pdf/2503.09516), [paper2](https://arxiv.org/pdf/2505.15117) | veRL | PPO/GRPO | Single | Outcome | Multi | Search | All | Search |\n| [AutoRefine](https://github.com/syr-cn/AutoRefine) | ![](https://img.shields.io/github/stars/syr-cn/AutoRefine.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-05 | USTC | [paper](https://www.arxiv.org/pdf/2505.11277) | veRL | PPO/GRPO | Multi | Both | Multi | RAG QA | Rule | Search |\n| [StepSearch](https://github.com/Zillwang/StepSearch) | ![](https://img.shields.io/github/stars/Zillwang/StepSearch.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-05 | SenseTime | [paper](https://arxiv.org/pdf/2505.15107) | veRL | PPO | Single | Process | Multi | QA | Model | Search |\n| [R1-Searcher-plus](https://github.com/RUCAIBox/R1-Searcher-plus) | ![](https://img.shields.io/github/stars/RUCAIBox/R1-Searcher-plus.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-05 | RUC | [paper](https://arxiv.org/pdf/2505.17005) | Custom | Custom | Single | Outcome | Multi | Search | Model | Search |\n| [ZeroSearch](https://github.com/Alibaba-NLP/ZeroSearch) | ![](https://img.shields.io/github/stars/Alibaba-NLP/ZeroSearch.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-05 | Alibaba |[paper](https://arxiv.org/pdf/2505.04588) | veRL | PPO/GRPO/REINFORCE | Single | Outcome | Multi | QA/Search | Rule | Yes |\n| [R1-Searcher](https://github.com/RUCAIBox/R1-Searcher) | ![](https://img.shields.io/github/stars/RUCAIBox/R1-Searcher.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-03 | RUC | [paper](https://arxiv.org/pdf/2503.05592) | OpenRLHF | PPO/DPO | Single | Both | Multi | Search | All | Yes |\n| [WebAgent](https://github.com/Alibaba-NLP/WebAgent) | 
![](https://img.shields.io/github/stars/Alibaba-NLP/WebAgent.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-01 | Alibaba | [paper1](https://arxiv.org/pdf/2501.07572), [paper2](https://arxiv.org/pdf/2505.22648) | LLaMA-Factory | DAPO | Multi | Process | Multi | Web | Model | Yes |\n| [R-Search](https://github.com/QingFei1/R-Search) | ![](https://img.shields.io/github/stars/QingFei1/R-Search.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-06 | Individual | -- | veRL | PPO/GRPO | Single | Both | Multi | QA/Search | All | Yes |\n| [TTI](https://github.com/test-time-interaction/TTI) | ![](https://img.shields.io/github/stars/test-time-interaction/TTI.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-06 | CMU | [paper](https://arxiv.org/abs/2506.07976) | Custom | REINFORCE/BC | Single | Outcome | Multi | Web | External | Web Browsing |\n\n\n## GUI\n| Github Repo | Stars | Date | Org | Paper Link | RL Framework |  RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |\n| :----: | :----: | :----: |  :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| [GUI-R1](https://github.com/ritzz-ai/GUI-R1) | ![](https://img.shields.io/github/stars/ritzz-ai/GUI-R1.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-04 | CAS/NUS | [paper](https://arxiv.org/pdf/2504.10458) | veRL | GRPO | Single | Outcome | Multi | GUI | Rule | No |\n| [UI-R1](https://github.com/lll6gg/UI-R1) | ![](https://img.shields.io/github/stars/lll6gg/UI-R1.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-03 | vivo/CUHK | [paper](https://arxiv.org/pdf/2503.21620) | TRL | GRPO | Single | Process | Both | GUI | Rule | Computer/Phone Use |\n| [AgentCPM-GUI](https://github.com/OpenBMB/AgentCPM-GUI) | ![](https://img.shields.io/github/stars/OpenBMB/AgentCPM-GUI.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-06 | 
OpenBMB/Tsinghua/RUC | [paper](https://arxiv.org/pdf/2506.01391) | HuggingFace | GRPO | Single | Outcome | Multi | Mobile GUI | Model | Yes |\n| [GUI-G1](https://github.com/Yuqi-Zhou/GUI-G1) | ![](https://img.shields.io/github/stars/Yuqi-Zhou/GUI-G1.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-05 | RUC | [paper](https://arxiv.org/pdf/2505.15810) | TRL | GRPO | Single | Outcome | Single | GUI | Rule/External | No |\n| [ARPO](https://github.com/dvlab-research/ARPO) | ![](https://img.shields.io/github/stars/dvlab-research/ARPO.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-05 | CUHK/HKUST | [paper](https://arxiv.org/pdf/2505.16282) | veRL | GRPO | Single | Outcome | Multi | GUI | External | Computer Use |\n\n\n## Tool\n| Github Repo | Stars | Date | Org | Paper Link | RL Framework |  RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |\n| :----: | :----: | :----: |  :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| [ReTool](https://github.com/ReTool-RL/ReTool) | ![](https://img.shields.io/github/stars/ReTool-RL/ReTool.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-04 | ByteDance | [paper](https://arxiv.org/pdf/2504.11536) | veRL | PPO | Single | Outcome | Multi | Math | External | Code |\n| [Tool-N1](https://github.com/NVlabs/Tool-N1) | ![](https://img.shields.io/github/stars/NVlabs/Tool-N1.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-05 | NVIDIA | [paper](https://arxiv.org/pdf/2505.00024) | veRL | PPO | Single | Outcome | Multi | Math/Dialogue | All | Yes |\n| [Tool-Star](https://github.com/dongguanting/Tool-Star) | ![](https://img.shields.io/github/stars/dongguanting/Tool-Star.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-05 | RUC | [paper](https://arxiv.org/pdf/2505.16410) | LLaMA-Factory | PPO/DPO/ORPO/SimPO/KTO | Single | Outcome | Multi | Multi-modal/Tool 
Use/Dialogue | Model/External | Yes |\n| [verl-tool](https://github.com/TIGER-AI-Lab/verl-tool) | ![](https://img.shields.io/github/stars/TIGER-AI-Lab/verl-tool.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-06 | TIGER-Lab | [X](https://x.com/DongfuJiang/status/1929198238017720379) | veRL | PPO/GRPO | Single | Both | Both | Math/Code | Rule/External | Yes |\n| [RL-Factory](https://github.com/Simple-Efficient/RL-Factory) | ![](https://img.shields.io/github/stars/Simple-Efficient/RL-Factory.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-05 | Simple-Efficient | [model](https://huggingface.co/Simple-Efficient/RLFactory-Qwen3-8B-GRPO) | veRL | GRPO | Multi | Both | Multi | Tool-use/NL2SQL | All | MCP |\n| [Agent-R1](https://github.com/0russwest0/Agent-R1) | ![](https://img.shields.io/github/stars/0russwest0/Agent-R1.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-03 | USTC | -- | veRL | PPO/GRPO | Single | Both | Multi | Tool-use/QA | Model | Yes |\n| [Multi-Turn-RL-Agent](https://github.com/SiliangZeng/Multi-Turn-RL-Agent) | ![](https://img.shields.io/github/stars/SiliangZeng/Multi-Turn-RL-Agent.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-05 | University of Minnesota | [paper](https://arxiv.org/pdf/2505.11821) | Custom | GRPO | Single | Both | Multi | Tool-use/Math | Rule/External | Yes |\n| [ReCall](https://github.com/Agent-RL/ReCall) | ![](https://img.shields.io/github/stars/Agent-RL/ReCall.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-03 | Baichuan | [paper](https://arxiv.org/pdf/2503.19470) | veRL | PPO/GRPO/RLOO/REINFORCE++/ReMax | Single | Outcome | Multi | Tool-use/Math/QA | All | Yes |\n\n\n## TextGame\n| Github Repo | Stars | Date | Org | Paper Link | RL Framework |  RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |\n| :----: | :----: | :----: |  :----: | :----: | :----: | :----: | :----: | 
:----: | :----: | :----: | :----: | :----: |\n| [ART](https://github.com/OpenPipe/ART) | ![](https://img.shields.io/github/stars/OpenPipe/ART.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-03 | OpenPipe | [paper](https://github.com/OpenPipe/ART#-citation) | TRL | GRPO | Multi | Both | Multi | TextGame | All | Yes |\n| [verl-agent](https://github.com/langfengQ/verl-agent) | ![](https://img.shields.io/github/stars/langfengQ/verl-agent.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-05 | NTU/Skywork | [paper](https://arxiv.org/pdf/2505.10978) | veRL | PPO/GRPO/GiGPO/DAPO/RLOO/REINFORCE++ | Multi | Both | Multi | Phone Use/Math/Code/Web/TextGame | All | Yes |\n| [SPA-RL-Agent](https://github.com/WangHanLinHenry/SPA-RL-Agent) | ![](https://img.shields.io/github/stars/WangHanLinHenry/SPA-RL-Agent.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-05 | PolyU | [paper](https://arxiv.org/pdf/2505.20732) | TRL | PPO | Single | Process | Multi | Navigation/Web/TextGame | Model | No |\n| [Trinity-RFT](https://github.com/modelscope/Trinity-RFT) | ![](https://img.shields.io/github/stars/modelscope/Trinity-RFT.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-05 | Alibaba | [paper](https://arxiv.org/pdf/2505.17826) | veRL | PPO/GRPO | Single | Outcome | Both | Math/TextGame/Web | All | Yes |\n| [RAGEN](https://github.com/RAGEN-AI/RAGEN) | ![](https://img.shields.io/github/stars/RAGEN-AI/RAGEN.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-01 | RAGEN-AI | [paper](https://arxiv.org/pdf/2504.20073) | veRL |PPO/GRPO | Single | Both | Multi | TextGame | All | Yes |\n| [VAGEN](https://github.com/RAGEN-AI/VAGEN) | ![](https://img.shields.io/github/stars/RAGEN-AI/VAGEN.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-03 | RAGEN-AI | [paper](https://www.notion.so/VAGEN-Training-VLM-Agents-with-Multi-Turn-Reinforcement-Learning-1bfde13afb6e80b792f6d80c7c2fcad0) | veRL | PPO/GRPO | 
Single | Both | Multi | TextGame/Navigation | All | Yes |\n| [OpenManus-RL](https://github.com/OpenManus/OpenManus-RL) | ![](https://img.shields.io/github/stars/OpenManus/OpenManus-RL.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-03 | UIUC/MetaGPT | -- | Custom | PPO/DPO/GRPO | Multi | Outcome | Multi | TextGame | All | Yes |\n\n\n## QA(Reasoning/Math/Code)\n| Github Repo | Stars | Date | Org | Paper Link | RL Framework |  RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |\n| :----: | :----: | :----: |  :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| [sweet_rl](https://github.com/facebookresearch/sweet_rl) | ![](https://img.shields.io/github/stars/facebookresearch/sweet_rl.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-03 | Meta/UCB | [paper](https://arxiv.org/pdf/2503.15478) | OpenRLHF | DPO | Multi | Process | Multi | Design/Code | Model | Web Browsing |\n| [Agentic-Reasoning](https://github.com/theworldofagents/Agentic-Reasoning) | ![](https://img.shields.io/github/stars/theworldofagents/Agentic-Reasoning.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-02 | Oxford | [paper](https://arxiv.org/pdf/2502.04644) | Custom | Custom | Single | Process | Multi | QA/Math | External | Web Browsing |\n| [SkyRL](https://github.com/NovaSky-AI/SkyRL) | ![](https://img.shields.io/github/stars/NovaSky-AI/SkyRL.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-04 | NovaSky | -- | veRL | PPO/GRPO | Single | Outcome | Multi | Math/Code | All | Code |\n| [openrlhf_async_pipline](https://github.com/yyht/openrlhf_async_pipline) | ![](https://img.shields.io/github/stars/yyht/openrlhf_async_pipline.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2024-05 | OpenRLHF | [paper](https://arxiv.org/pdf/2405.11143) | OpenRLHF | PPO/REINFORCE++/DPO/RLOO | Single | Outcome | Multi | 
Dialogue/Reasoning/QA | All | No |\n| [Time-R1](https://github.com/ulab-uiuc/Time-R1) | ![](https://img.shields.io/github/stars/ulab-uiuc/Time-R1.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-05 | UIUC | [paper](https://arxiv.org/pdf/2505.13508) | veRL | PPO/GRPO/DPO | Multi | Outcome | Multi | Temporal | All | Code |\n| [agent-distillation](https://github.com/Nardien/agent-distillation) | ![](https://img.shields.io/github/stars/Nardien/agent-distillation.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-05 | KAIST | [paper](https://arxiv.org/pdf/2505.17612) | Custom | PPO | Single | Process | Multi | QA/Math | External | Yes |\n| [DeepEyes](https://github.com/Visual-Agent/DeepEyes) | ![](https://img.shields.io/github/stars/Visual-Agent/DeepEyes.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-05 | Xiaohongshu/XJTU | [paper](https://arxiv.org/pdf/2505.14362) | veRL | PPO/GRPO | Multi | Process | Multi | VQA | All | Yes |\n| [ML-Agent](https://github.com/MASWorks/ML-Agent) | ![](https://img.shields.io/github/stars/MASWorks/ML-Agent.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-05 | MASWorks | [paper](https://arxiv.org/pdf/2505.23723) | Custom | Custom | Single | Process | Multi | Code | All | Yes |\n| [CURE](https://github.com/Gen-Verse/CURE) | ![](https://img.shields.io/github/stars/Gen-Verse/CURE.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-06 | University of Chicago/Princeton/ByteDance | [paper](https://arxiv.org/pdf/2506.03136) | HuggingFace | PPO | Single | Outcome | Single | Code | External | No |\n| [MedAgentGym](https://github.com/wshi83/MedAgentGym) | ![](https://img.shields.io/github/stars/wshi83/MedAgentGym.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-06 | Emory/Georgia Tech | [paper](https://arxiv.org/pdf/2506.04405) | HuggingFace | SFT/DPO/PPO/GRPO | Single | Outcome | Multi | Medical/Code | External | Yes |\n| 
[open-r1](https://github.com/huggingface/open-r1) | ![](https://img.shields.io/github/stars/huggingface/open-r1.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-01 | HuggingFace | -- | TRL | GRPO | Single | Outcome | Single | Math/Code | All | Yes |\n| [EasyR1](https://github.com/hiyouga/EasyR1) | ![](https://img.shields.io/github/stars/hiyouga/EasyR1.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-04 | Individual | [repo](https://github.com/hiyouga/EasyR1), [paper](https://arxiv.org/pdf/2409.19256) | veRL | GRPO | Single | Process | Multi | Vision-Language | Model | Yes |\n| [MASLab](https://github.com/MASWorks/MASLab) | ![](https://img.shields.io/github/stars/MASWorks/MASLab.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-05 | MASWorks | [paper](https://arxiv.org/pdf/2505.16988) | Custom | No RL | Multi | Outcome | Multi | Code/Math/Reasoning | External | Yes |\n| [AutoCoA](https://github.com/ADaM-BJTU/AutoCoA) | ![](https://img.shields.io/github/stars/ADaM-BJTU/AutoCoA.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-03 | BJTU | [paper](https://arxiv.org/pdf/2503.06580) | veRL | GRPO | Multi | Outcome | Multi | Reasoning/Math/QA | All | Yes |\n| [ToRL](https://github.com/GAIR-NLP/ToRL) | ![](https://img.shields.io/github/stars/GAIR-NLP/ToRL.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-03 | SJTU | [paper](https://arxiv.org/pdf/2503.23383) | veRL | GRPO | Single | Outcome | Single | Math | Rule/External | Yes |\n\n\n\n## Environment\n| Github Repo | Stars | Date | Org | Paper Link | RL Framework |  RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |\n| :----: | :----: | :----: |  :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| [InternBootcamp](https://github.com/InternLM/InternBootcamp) | 
![](https://img.shields.io/github/stars/InternLM/InternBootcamp.svg?color=F4B0A5\u0026logo=Undertale\u0026logoColor=FB6571) | 2025-04 | InternBootcamp | -- | Environment | -- | -- | -- | -- | Reasoning/Code/Puzzle/Algorithm/TextGame | Rule/External | Yes |\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthinkwee%2Fagentsmeetrl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthinkwee%2Fagentsmeetrl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthinkwee%2Fagentsmeetrl/lists"}