{"id":28718246,"url":"https://github.com/NovaSky-AI/SkyRL","last_synced_at":"2025-06-15T05:01:45.664Z","repository":{"id":292049930,"uuid":"970890722","full_name":"NovaSky-AI/SkyRL","owner":"NovaSky-AI","description":"SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning","archived":false,"fork":false,"pushed_at":"2025-06-09T04:25:18.000Z","size":2527,"stargazers_count":393,"open_issues_count":4,"forks_count":39,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-06-09T09:18:49.666Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://novasky-ai.notion.site/skyrl-v0","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NovaSky-AI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-22T17:33:31.000Z","updated_at":"2025-06-09T02:42:58.000Z","dependencies_parsed_at":"2025-05-29T09:15:50.665Z","dependency_job_id":null,"html_url":"https://github.com/NovaSky-AI/SkyRL","commit_stats":null,"previous_names":["novasky-ai/skyrl"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/NovaSky-AI/SkyRL","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NovaSky-AI%2FSkyRL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NovaSky-AI%2FSkyRL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NovaSky-AI%2FSkyRL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NovaSky-AI%2FSkyRL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NovaSky-AI","download_url":"https://codeload.github.com/NovaSky-AI/SkyRL/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NovaSky-AI%2FSkyRL/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259924666,"owners_count":22932780,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-15T05:00:56.873Z","updated_at":"2025-06-15T05:01:45.650Z","avatar_url":"https://github.com/NovaSky-AI.png","language":"Python","funding_links":[],"categories":["Industry Strength Reinforcement Learning","Models and Projects","Open-source","TL;DR — pick the right framework","Frameworks, Tools, and Implementations","Frameworks","Python","🛠️ Frameworks \u0026 Toolkits"],"sub_categories":["Ray + LLM","Codebase","Training Frameworks","Scaling and Open-Source","🔁 Iterative Self-Bootstrapping"],"readme":"\u003cdiv align=\"center\"\u003e\n\n# SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning\n\n[![🌐 NovaSky](https://img.shields.io/badge/-Visit%20Website-5865F2?style=for-the-badge)](https://novasky-ai.github.io/) [![Github](https://img.shields.io/badge/SkyRL-000000?style=for-the-badge\u0026logo=github\u0026logoColor=000\u0026logoColor=white)](https://github.com/NovaSky-AI/SkyRL) [![Twitter](https://img.shields.io/badge/NovaSky-white?style=for-the-badge\u0026logo=X\u0026logoColor=000\u0026color=000\u0026labelColor=white)](https://x.com/NovaSkyAI) [![Hugging Face Collection](https://img.shields.io/badge/NovaSky-fcd022?style=for-the-badge\u0026logo=huggingface\u0026logoColor=000\u0026labelColor)](https://huggingface.co/NovaSky-AI) [![Discord](https://img.shields.io/badge/NovaSky-5865F2?style=for-the-badge\u0026logo=discord\u0026logoColor=white)](https://discord.gg/RBAjeWSA)\n\n\n\u003cdiv align=\"center\" style=\"font-family: Arial, sans-serif;\"\u003e\n  \u003cp\u003e\n    \u003ca href=\"#news\" style=\"text-decoration: none; font-weight: bold;\"\u003eNews\u003c/a\u003e •\n    \u003ca href=\"#links\" style=\"text-decoration: none; font-weight: bold;\"\u003eLinks\u003c/a\u003e •\n    \u003ca href=\"#getting-started\" style=\"text-decoration: none; font-weight: bold;\"\u003eGetting Started\u003c/a\u003e •\n    \u003ca href=\"#evaluation\" style=\"text-decoration: none; font-weight: bold;\"\u003eEvaluation\u003c/a\u003e •\n    \u003ca href=\"#citation\" style=\"text-decoration: none; font-weight: bold;\"\u003eCitation\u003c/a\u003e •\n    \u003ca href=\"#acknowledgement\" style=\"text-decoration: none; font-weight: bold;\"\u003eAcknowledgement\u003c/a\u003e \n  \u003c/p\u003e\n\u003c/div\u003e\n\n\u003c/div\u003e\n\n\n# News\n- **[2025/05/20]** 🎉 We released SkyRL-SQL: a multi-turn RL training pipeline for Text-to-SQL, along with SkyRL-SQL-7B — a model trained on just 653 samples that outperforms both GPT-4o and o4-mini!\n- **[2025/05/06]** 🎉 We released SkyRL-v0: our open RL training pipeline for multi-turn tool use LLMs, optimized for long-horizon, real-environment tasks like SWE-Bench!\n\n# Links\n- 📜 [SkyRL-SQL Blog Post](https://novasky-ai.notion.site/skyrl-sql)\n- 📜 [SkyRL-v0 Blog Post](https://novasky-ai.notion.site/skyrl-v0)\n\n# Getting Started\nThis repository contains training code for the `SkyRL-v0` release. Our implementation is a fork of [VeRL](https://github.com/volcengine/verl).   \nThe repo is currently utilizing the SGLang async rollout feature introduced to VeRL in this draft [PR](https://github.com/volcengine/verl/pull/917), based on this [commit](https://github.com/volcengine/verl/commits/436530d6ec2b449e035fd2e823ad5b363cbf908e). We will refactor the code soon so that the codebase can easily keep up with VeRL main branch.\n\n## Installation\n\nThe first step is to clone our repository:\n\n```bash \ngit clone --recurse-submodules https://github.com/NovaSky-AI/SkyRL\n```\n\nFor detailed installation instructions, please refer to [INSTALL.md](./INSTALL.md)\n\n## Scripts for Reproduction\n\n### SkyRL-Agent \n\nFor reproducing our results for SkyRL-Agent-14B-v0, SkyRL-Agent-8B-v0, and SkyRL-Agent-7B-v0 you can refer to [examples/sky/swebench](./examples/sky/swebench/README.md).\n\n### SkyRL-SQL-7B\nFor reproducing our results for SkyRL-SQL-7B, you can refer to [examples/sky/sql](./examples/sky/sql/README.md).\n\n# Evaluation\nWe report evaluation results of different downstream tasks as below. \n\n## SWE-Bench\nWe report the evaluation result on SWE-Bench-Verified below.\n\n| Model              | Base                 | Base Performance | Performance | Training Time |\n|--------------------|----------------------|------------------|-------------|---------------|\n| SkyRL-Agent-7B-v0  | OpenHands-7B-Agent   | 11%              | 14.6%       | 16hrs 8xH100  |\n| SkyRL-Agent-8B-v0  | Qwen3-8B no thinking | 3.6%             | 9.4%        | 27hrs 8xH200  |\n| SkyRL-Agent-14B-v0 | Qwen3-14B thinking   | 18%              | 21.6%       | 20hrs 8xH200  |\n\n## Text-to-SQL \nWe report the evaluation result on a range of Spider benchmarks (evaluated in 5 turns) below. \n| Model                        | Spider-Dev | Spider-Test | Spider-Realistic | Spider-DK | Spider-Syn | Avg  |\n|-----------------------------|------------|-------------|------------------|-----------|------------|---------------|\n| Qwen-2.5-Coder-7B-Instruct  | 77.1       | 79.6        | 74.2             | 62.8      | 66.2       | 72.0          |\n| o4-mini                     | 80.6       | 81.8        | 81.2             | 70.8      | 72.1       | 77.3          |\n| GPT-4o                      | 81.3       | 82.4        | 80.1             | 72.1      | 71.9       | 77.6          |\n| SkyRL-SQL-7B         | 83.9 (+6.8%)| 85.2 (+5.6%)  | 81.1  (+6.9%)     | 72.0 (+9.2%)| 73.7 (+7.5%)| 79.2 (+7.2%)      |\n\n\n# Acknowledgement\nThis work is done at [Berkeley Sky Computing Lab](https://sky.cs.berkeley.edu/), with the amazing compute support from [Anyscale](https://www.anyscale.com/), [Databricks](https://www.databricks.com/), [NVIDIA](https://developer.nvidia.com/brev), [Lambda Labs](https://lambdalabs.com/service/gpu-cloud?srsltid=AfmBOop5FnmEFTkavVtdZDsLWvHWNg6peXtat-OXJ9MW5GMNsk756PE5), and [AMD](https://www.amd.com/).\n\nHuge thanks to the contributors of the SGLang async rollout feature in VeRL: Hancheng Zhang, Rui Lu, Haoran Wang from Tsinghua University, Xiang Long from OpenBMB/ModelBest.\n\nWe would also like to thank [Ying Sheng](https://sites.google.com/view/yingsheng/home), [Chenyang Zhao](https://zhaochenyang20.github.io/Chayenne/) from [SGLang](https://github.com/sgl-project/sglang) team for supporting SGLang async rollout integration, and [Kaichao You](https://youkaichao.github.io/research), [Simon Mo](https://github.com/simon-mo) from [vLLM](https://github.com/vllm-project/vllm) team for supporting vLLM performance optimization.\n\n# Citation\nThe code in this repository is mostly described in the post below. Please consider citing this work if you find the repository helpful. \n\n```bibtex\n@misc{cao2025skyrl,\n  title     = {SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning},\n  author    = {Shiyi Cao and Sumanth Hegde and Dacheng Li and Tyler Griggs and Shu Liu and Eric Tang and Jiayi Pan and Xingyao Wang and Akshay Malik and Graham Neubig and Kourosh Hakhamaneshi and Richard Liaw and Philipp Moritz and Matei Zaharia and Joseph E. Gonzalez and Ion Stoica},\n  year      = {2025},\n}\n```\n\n```bibtex\n@misc{liu2025skyrlsql,\n      title={SkyRL-SQL: Matching GPT-4o and o4-mini on Text2SQL with Multi-Turn RL},\n      author={Shu Liu and Sumanth Hegde and Shiyi Cao and Alan Zhu and Dacheng Li and Tyler Griggs and Eric Tang and Akshay Malik and Kourosh Hakhamaneshi and Richard Liaw and Philipp Moritz and Matei Zaharia and Joseph E. Gonzalez and Ion Stoica},\n      year={2025},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNovaSky-AI%2FSkyRL","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FNovaSky-AI%2FSkyRL","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNovaSky-AI%2FSkyRL/lists"}