{"id":29903781,"url":"https://github.com/rllm-org/rllm","last_synced_at":"2025-08-01T17:04:59.995Z","repository":{"id":276876014,"uuid":"922406522","full_name":"rllm-org/rllm","owner":"rllm-org","description":"Democratizing Reinforcement Learning for LLMs","archived":false,"fork":false,"pushed_at":"2025-07-28T21:03:35.000Z","size":92345,"stargazers_count":3913,"open_issues_count":103,"forks_count":364,"subscribers_count":30,"default_branch":"main","last_synced_at":"2025-07-28T23:08:51.945Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://www.agentica-project.com","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rllm-org.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"docs/contributing.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-01-26T05:31:40.000Z","updated_at":"2025-07-28T21:03:40.000Z","dependencies_parsed_at":null,"dependency_job_id":"46aadf00-153f-4be2-a7a3-30b3b198f552","html_url":"https://github.com/rllm-org/rllm","commit_stats":null,"previous_names":["agentica-project/deepscaler","agentica-project/rllm","rllm-org/rllm"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/rllm-org/rllm","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rllm-org%2Frllm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rllm-org%2Frllm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rllm-org%2Frllm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rllm-org%2Frllm/manifests"
,"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rllm-org","download_url":"https://codeload.github.com/rllm-org/rllm/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rllm-org%2Frllm/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268265789,"owners_count":24222524,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-01T02:00:08.611Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-01T17:02:07.859Z","updated_at":"2025-08-01T17:04:59.987Z","avatar_url":"https://github.com/rllm-org.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook","Python","🧰 Toolkits \u0026 Frameworks","7. 
Training \u0026 Fine-tuning Ecosystem","Training","🛠️ Frameworks \u0026 Toolkits"],"sub_categories":["MCP Agents","Agentic RL","🔁 Iterative Self-Bootstrapping"],"readme":"\u003cdiv align=\"center\"\u003e\n\n# rLLM\n\n\u003cdiv\u003e\n🚀 Reinforcement Learning for Language Agents🌟\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv\u003e\n\u003cbr\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n[![Documentation](https://img.shields.io/badge/Documentation-black?style=for-the-badge\u0026logo=googledocs\u0026logoColor=white)](https://rllm-project.readthedocs.io/en/latest)\n[![Discord](https://img.shields.io/badge/Discord-5865F2?style=for-the-badge\u0026logo=discord\u0026logoColor=white)](https://discord.gg/BDH46HT9en)\n[![Website](https://img.shields.io/badge/Site-%23000000.svg?style=for-the-badge\u0026logo=semanticweb\u0026logoColor=white)](https://www.agentica-project.com) \n[![Twitter/X](https://img.shields.io/badge/Agentica-white?style=for-the-badge\u0026logo=X\u0026logoColor=000\u0026color=000\u0026labelColor=white)](https://x.com/Agentica_)\n[![Github](https://img.shields.io/badge/RLLM-000000?style=for-the-badge\u0026logo=github\u0026logoColor=000\u0026logoColor=white)](https://github.com/rllm-org/rllm)\n[![Hugging Face Collection](https://img.shields.io/badge/Agentica-fcd022?style=for-the-badge\u0026logo=huggingface\u0026logoColor=000\u0026labelColor)](https://huggingface.co/agentica-org)\n\n\u003c/div\u003e\n\n\u003c/div\u003e\n\nrLLM is an open-source framework for post-training language agents via reinforcement learning. With rLLM, you can easily build your custom agents and environments, train them with reinforcement learning, and deploy them for real-world workloads. 
\n\n\n## Releases  📰\n\n\u003cstrong\u003e[2025/07/01]\u003c/strong\u003e We release [`DeepSWE-Preview`](https://pretty-radio-b75.notion.site/DeepSWE-Training-a-Fully-Open-sourced-State-of-the-Art[…]-by-Scaling-RL-22281902c1468193aabbe9a8c59bbe33?pvs=73), a 32B software engineering agent (SWE) trained purely with RL that achieves 59% on SWEBench-Verified with test-time scaling (42.2% Pass@1), topping the SWEBench leaderboard for open-weight models. \n- 🍽️ An In-Depth Blog Post on our [SWE Agents and RL Training Recipes](https://pretty-radio-b75.notion.site/DeepSWE-Training-a-Fully-Open-sourced-State-of-the-Art[…]-by-Scaling-RL-22281902c1468193aabbe9a8c59bbe33?pvs=73)\n- 🤗 HF Model [`DeepSWE-Preview`](https://huggingface.co/agentica-org/DeepSWE-Preview)\n- 🤗 HF Dataset [`R2E-Gym-Subset`](https://huggingface.co/datasets/R2E-Gym/R2E-Gym-Subset)\n- 📄 [Training Scripts](https://github.com/rllm-org/rllm/tree/main/examples/swe)\n- 📈 [Wandb Training Logs](https://wandb.ai/mluo/deepswe)—All training runs and ablations.\n- 🔎 [Evaluation Logs](https://drive.google.com/file/d/10LIwpJeaFuiX6Y-qEG2a4a335PEuQJeS/view?usp=sharing)—16 passes over SWE-Bench-Verified.\n\n\u003cstrong\u003e[2025/04/08]\u003c/strong\u003e We release [`DeepCoder-14B-Preview`](https://pretty-radio-b75.notion.site/DeepCoder-A-Fully-Open-Source-14B-Coder-at-O3-mini-Level-1cf81902c14680b3bee5eb349a512a51), a 14B coding model that achieves an impressive **60.6%** Pass@1 accuracy on LiveCodeBench (+8% improvement), matching the performance of `o3-mini-2025-01-31 (Low)` and `o1-2024-12-17`. 
\n- ⬆️ An In-Depth Blog Post on our [Training Recipe and Insights](https://pretty-radio-b75.notion.site/DeepCoder-A-Fully-Open-Source-14B-Coder-at-O3-mini-Level-1cf81902c14680b3bee5eb349a512a51)\n- 🤗 HF Model [`DeepCoder-14B-Preview`](https://huggingface.co/agentica-org/DeepCoder-14B-Preview), [`DeepCoder-1.5B-Preview`](https://huggingface.co/agentica-org/DeepCoder-1.5B-Preview)\n- 🤗 HF Dataset [`DeepCoder-Preview-Dataset`](https://huggingface.co/datasets/agentica-org/DeepCoder-Preview-Dataset)\n- 📄 [Training Scripts](https://github.com/rllm-org/rllm/tree/main/scripts/deepcoder/train)—Exact hyperparameters we used to achieve `o3-mini` performance.\n- 📈 [Wandb Training Logs](https://wandb.ai/mluo/deepcoder)—All training runs and ablations.\n- 🔎 [Evaluation Logs](https://drive.google.com/file/d/1tr_xXvCJnjU0tLO7DNtFL85GIr3aGYln/view?usp=sharing)—LiveCodeBench and Codeforces logs for DeepCoder.\n\n\u003cstrong\u003e[2025/02/10]\u003c/strong\u003e We release [`DeepScaleR-1.5B-Preview`](https://pretty-radio-b75.notion.site/DeepScaleR-Surpassing-O1-Preview-with-a-1-5B-Model-by-Scaling-RL-19681902c1468005bed8ca303013a4e2), a 1.5B model that surpasses O1-Preview and achieves \u003cstrong\u003e43.1% Pass@1\u003c/strong\u003e on AIME. 
We achieve this by iteratively scaling DeepSeek's GRPO algorithm from 8K→16K→24K context length for thinking.\n- 🍗 An In-Depth Blog Post on our [Training Recipe and Insights](https://pretty-radio-b75.notion.site/DeepScaleR-Surpassing-O1-Preview-with-a-1-5B-Model-by-Scaling-RL-19681902c1468005bed8ca303013a4e2)\n- 🤗 HF Model [`DeepScaleR-1.5B-Preview`](https://huggingface.co/agentica-org/DeepScaleR-1.5B-Preview)\n- 🤗 HF Dataset [`DeepScaleR-Preview-Dataset`](https://huggingface.co/datasets/agentica-org/DeepScaleR-Preview-Dataset) / 🗂️  [JSON Dataset](https://github.com/agentica-project/deepscaler/tree/main/deepscaler/data)\n- 📄 [Training Scripts](https://github.com/agentica-project/deepscaler/tree/main/scripts/train)—Exact hyperparameters we used to achieve 43.1% on AIME.\n- 📈 [Wandb Training Logs](https://wandb.ai/mluo/deepscaler-1.5b)—All training runs and ablations.\n  - Due to Wandb migration bugs, the 8K training run is compressed to 400-500 steps. The data is identical, but our original run was 1600 steps.\n- 🔎 [Evaluation Logs](https://drive.google.com/file/d/1V_rYKoL35WmubbmWN6PeFg4zo5QOug8X/view?pli=1)—DeepScaleR, DeepSeek Distill, and Still 1.5B generations over 1000+ math problems.\n\n\n## Getting Started 🎯\n### Installation\n\n```bash\n# Clone the repository\ngit clone --recurse-submodules https://github.com/rllm-org/rllm.git\ncd rllm\n\n# Create a conda environment\nconda create -n rllm python=3.10\nconda activate rllm\n\n# Install all dependencies\npip install -e ./verl\npip install -e .\n```\n\n\n## Acknowledgements\n\n- Our training experiments are powered by our heavily modified fork of [verl](https://github.com/volcengine/verl), an open-source RLHF library.\n- Our models are trained on top of [`DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B), [`DeepSeek-R1-Distill-Qwen-14B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B), and 
[`Qwen3-32B`](https://huggingface.co/Qwen/Qwen3-32b).\n- Our work is done as part of [Berkeley Sky Computing Lab](https://skycomputing.berkeley.edu/), [Berkeley AI Research](https://bair.berkeley.edu/), and a successful collaboration with Together AI.\n\n\n## Citation\nCiting rLLM:\n```bibtex\n@misc{rllm2025,\n  title={rLLM: A Framework for Post-Training Language Agents},\n  author={Sijun Tan and Michael Luo and Colin Cai and Tarun Venkat and Kyle Montgomery and Aaron Hao and Tianhao Wu and Arnav Balyan and Manan Roongta and Chenguang Wang and Li Erran Li and Raluca Ada Popa and Ion Stoica},\n  year={2025},\n  howpublished={\\url{https://pretty-radio-b75.notion.site/rLLM-A-Framework-for-Post-Training-Language-Agents-21b81902c146819db63cd98a54ba5f31}},\n  note={Notion Blog}\n}\n```\n\nCiting DeepSWE:\n```bibtex\n@misc{deepswe2025,\n  title={DeepSWE: Training a State-of-the-Art Coding Agent from Scratch by Scaling RL},\n  author={Michael Luo and Naman Jain and Jaskirat Singh and Sijun Tan and Ameen Patel and Qingyang Wu and Alpay Ariyak and Colin Cai and Tarun Venkat and Shang Zhu and Ben Athiwaratkun and Manan Roongta and Ce Zhang and Li Erran Li and Raluca Ada Popa and Koushik Sen and Ion Stoica},\n  howpublished={\\url{https://pretty-radio-b75.notion.site/DeepSWE-Training-a-Fully-Open-sourced-State-of-the-Art-Coding-Agent-by-Scaling-RL-22281902c1468193aabbe9a8c59bbe33}},\n  note={Notion Blog},\n  year={2025}\n}\n```\n\nCiting DeepCoder:\n```bibtex\n@misc{deepcoder2025,\n  title={DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level},\n  author={Michael Luo and Sijun Tan and Roy Huang and Ameen Patel and Alpay Ariyak and Qingyang Wu and Xiaoxiang Shi and Rachel Xin and Colin Cai and Maurice Weber and Ce Zhang and Li Erran Li and Raluca Ada Popa and Ion Stoica},\n  howpublished={\\url{https://pretty-radio-b75.notion.site/DeepCoder-A-Fully-Open-Source-14B-Coder-at-O3-mini-Level-1cf81902c14680b3bee5eb349a512a51}},\n  note={Notion Blog},\n  
year={2025}\n}\n```\n\nCiting DeepScaleR:\n```bibtex\n@misc{deepscaler2025,\n  title={DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL},\n  author={Michael Luo and Sijun Tan and Justin Wong and Xiaoxiang Shi and William Y. Tang and Manan Roongta and Colin Cai and Jeffrey Luo and Li Erran Li and Raluca Ada Popa and Ion Stoica},\n  year={2025},\n  howpublished={\\url{https://pretty-radio-b75.notion.site/DeepScaleR-Surpassing-O1-Preview-with-a-1-5B-Model-by-Scaling-RL-19681902c1468005bed8ca303013a4e2}},\n  note={Notion Blog}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frllm-org%2Frllm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frllm-org%2Frllm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frllm-org%2Frllm/lists"}