<h1 align="center">
<em>AReaL</em>: A Large-Scale Asynchronous Reinforcement Learning System
</h1>

<p align="center">
| <a href="https://arxiv.org/pdf/2505.24298"><b>Paper</b></a> | <a href="https://inclusionai.github.io/AReaL/"><b>Documentation</b></a> | <a href="https://deepwiki.com/inclusionAI/AReaL"><b>Ask DeepWiki</b></a> | <a href="https://huggingface.co/collections/inclusionAI/"><b>🤗 Models & Data</b></a> |
<a href="./assets/wechat_qrcode.png" target="_blank"><img src="./assets/wechat_icon.png" width="20" style="vertical-align: middle;"> <b>WeChat (微信) Group</b></a> |
</p>

<img align="right" alt="ReaL" src="/assets/logo.png" width="20%">

AReaL is an open-source **fully asynchronous** reinforcement learning training system
for large **reasoning and agentic models**, developed by members of Tsinghua IIIS and
the AReaL Team at Ant Group. Built upon the open-source project
[ReaLHF](https://github.com/openpsi-project/ReaLHF), we are fully committed to
open-source principles: we provide the training details, data, and infrastructure
required to reproduce our results, along with the models themselves. AReaL aims to help
everyone build their own AI agents easily and affordably. Our team loves milk tea
because it's delicious, customizable, and affordable; we hope you enjoy our project just
as much as you'd enjoy real milk tea. Cheers!

**AReaL Highlights**

- ⚡ **Flexibility**: Seamless customization for
  [agentic RL](https://inclusionai.github.io/AReaL/tutorial/agentic_rl.html) and
  [online RL training](./examples/openclaw/) by simply replacing the `base_url`.
- 📈 **Scalability**: **Stable** fully asynchronous RL training with **industry-leading
  speed**.
- ✨ **Cutting-Edge Performance**: State-of-the-art [math](/blog/AReaL_v0_2.md),
  [coding](/blog/AReaL_v0_3.md), [search](https://github.com/inclusionAI/ASearcher), and
  [customer service](https://arxiv.org/abs/2601.22607) agents.

## 📰 News

**\[2026/03/02\]** We provide [a complete example](./examples/openclaw/) to train your
own 🦞 OpenClaw agent by simply pointing the `base_url` and `api_key` at AReaL's RL
service: no complicated dependencies, no code changes, and it works with any agentic
runtime!

**\[2026/02/06\]** We are delighted to introduce **AReaL-SEA**, a self-evolving data
synthesis engine. Combined with RL training on AReaL, the 235B MoE model surpasses GPT 5
and achieves performance comparable to Gemini 3.0 Pro on $\tau^2$-bench! Check out
the [paper](https://arxiv.org/pdf/2601.22607),
[model](https://huggingface.co/inclusionAI/AReaL-SEA-235B-A22B),
[data](https://huggingface.co/datasets/inclusionAI/AReaL-tau2-data), and
[code](https://github.com/inclusionAI/AReaL/tree/main/examples/tau2).

**\[2026/01/15\]** Congrats to our friends at [CAMEL-AI](https://www.camel-ai.org/) for
open-sourcing [SETA](https://github.com/camel-ai/seta), their terminal agent RL project
trained with AReaL! Check out
[their training workflow](https://github.com/camel-ai/seta/tree/main/training/tbench_areal_workflow)
and the [announcement on X](https://x.com/guohao_li/status/2009678513574408636).

<details>
<summary><b>📋 Previous Releases</b></summary>

**\[2026/01/01\]** Happy New Year! Thanks to the outstanding contributions from
@HwVanICI, we are excited to officially announce stable support for AReaL training on
**Ascend NPU devices**! The code is actively maintained and continuously updated in the
[`ascend` branch](https://github.com/inclusionAI/AReaL/tree/ascend). Check out
[our documentation](https://inclusionai.github.io/AReaL/tutorial/installation_npu.html)
to get started, and feel free to report any issues!

**\[2025/08/30\]** Introducing ASearcher, a state-of-the-art search agent built with
AReaL's end-to-end asynchronous RL training. Check out the [paper](assets/paper.pdf) and
the [open-source repository](https://github.com/inclusionAI/ASearcher)!

**\[2025/07/31\] (AReaL-lite)** We introduce AReaL-lite, a **lightweight** version of
AReaL designed specifically for AI researchers and rapid prototyping. AReaL-lite
features an **algorithm-first** API design that prioritizes ease of use and algorithm
development, while natively supporting **fully asynchronous agentic RL**. With 80% fewer
lines of code, AReaL-lite maintains 90% of AReaL's performance and core functionality.
Check out [our AReaL-lite design documentation](/areal/README.md) and
[the quickstart guide](https://inclusionai.github.io/AReaL/tutorial/quickstart.html) to
begin your journey with **AReaL-lite**!

**\[2025/06/03\] (v0.3, boba²)** We release **boba²** (double-boba) for fully
asynchronous RL training, which achieves a **2.77× speedup while delivering comparable
or superior training performance** compared to synchronous systems. Furthermore,
asynchronous RL significantly simplifies multi-turn agentic RL training setup! Check out
[our v0.3 overview blog](/blog/AReaL_v0_3.md) and the
[research paper](assets/paper.pdf).

**\[2025/03/31\] (v0.2, boba)** Introducing our milestone release, boba! Please call it
A-ReaL-boba! This release features significantly faster training with SGLang support and
state-of-the-art 7B and 32B models for mathematical reasoning. Check out our
[v0.2 technical blog](/blog/AReaL_v0_2.md).

**\[2025/02/24\] (v0.1)** Our initial release includes reproducible results for 1.5B and
7B Large Reasoning Models (LRMs). Check out our
[v0.1 technical blog](/blog/AReaL_v0_1.md).

</details>

## 🚀 Getting Started

First, install the package:

```bash
git clone https://github.com/inclusionAI/AReaL
cd AReaL
pip install uv
uv sync --extra cuda
```

Our training scripts automatically download the required dataset (openai/gsm8k) and
model (Qwen/Qwen2-1.5B-Instruct). To run on a single node:

```bash
python3 examples/math/gsm8k_rl.py --config examples/math/gsm8k_grpo.yaml scheduler.type=local
```

To run on a Ray cluster with 2 nodes and 8 GPUs per node (remember to update paths in
the YAML file to point to your shared storage):

```bash
python3 examples/math/gsm8k_rl.py --config examples/math/gsm8k_grpo.yaml \
  cluster.n_nodes=2 cluster.n_gpus_per_node=8 \
  scheduler.type=ray
```

For comprehensive setup instructions, see
[our quickstart guide](https://inclusionai.github.io/AReaL/tutorial/quickstart.html).

## 📚 Examples

### Math & Reasoning

| Task                                                | Description                                                                                   | Performance                                                       |
| --------------------------------------------------- | --------------------------------------------------------------------------------------------- | ----------------------------------------------------------------- |
| **[Math](examples/math/)**                          | GSM8K math reasoning with GRPO, PPO, DAPO, REINFORCE, RLOO, LitePPO, Dr.GRPO, GSPO, and more  | -                                                                 |
| **[Multi-Turn Math](examples/multi_turn_math/)**    | Multi-turn math agent with reward discounting across turns                                    | [Training Curve](examples/multi_turn_math/reward_curve.png)       |
| **[LoRA Math](examples/math/gsm8k_grpo_lora.yaml)** | Parameter-efficient math training with LoRA (SGLang/vLLM backends)                            | -                                                                 |
| **[Countdown](examples/countdown/)**                | Countdown numbers game with custom rewards                                                    | [Training Curve](examples/countdown/countdown_training_curve.png) |

### Agentic RL

| Task                                                     | Description                                                            | Performance                                                                  |
| -------------------------------------------------------- | ---------------------------------------------------------------------- | ---------------------------------------------------------------------------- |
| **[General Agent](examples/agent_workflow/)**            | General agentic training with any agentic framework                    | [Guide](docs/tutorial/agentic_rl.md)                                         |
| **[Tau2 Customer Service](examples/tau2/)**              | Customer service agent on Tau2-Bench (retail, airline, telecom)        | [Paper](https://arxiv.org/abs/2601.22607)                                    |
| **[Search Agent](examples/search_agent/)**               | End-to-end search agent with the Tongyi-DeepResearch workflow          | [Training Curve](examples/search_agent/tongyi_deepresearch/reward_curve.png) |
| **[Tool-Integrated Reasoning](examples/tir/)**           | Multi-turn tool calling during reasoning (Python executor, calculator) | [Training Curve](examples/tir/figures/task_reward.png)                       |
| **[OpenAI Agents Integration](examples/openai_agents/)** | Integration with the OpenAI Agents SDK for agentic workflows           | -                                                                            |
| **[CAMEL-AI Integration](examples/camel/)**              | Integration with the CAMEL-AI framework for agentic RL                 | -                                                                            |

### Vision-Language Models

| Task                                | Description                                               | Performance                                     |
| ----------------------------------- | --------------------------------------------------------- | ----------------------------------------------- |
| **[VLM](examples/vlm/)**            | Geometry3K and CLEVR Count 70K visual reasoning with GRPO | -                                               |
| **[VLM on NPU](examples/vlm_npu/)** | VLM training on Huawei NPU hardware                       | [Benchmark Results](examples/vlm_npu/README.md) |

### Alignment & Infrastructure

| Task                                            | Description                                           | Performance                                       |
| ----------------------------------------------- | ----------------------------------------------------- | ------------------------------------------------- |
| **[RLHF Reward Modeling](examples/alignment/)** | Bradley-Terry reward modeling on Anthropic HH-RLHF    | [Training Curve](examples/alignment/rw_curve.png) |
| **[SkyPilot Deployment](examples/skypilot/)**   | Cloud deployment with SkyPilot (GCP, AWS, Kubernetes) | [Screenshots](examples/skypilot/README.md)        |

## 🔧 Support Matrix

### 🧠 Algorithms

All RL algorithms run asynchronously by default; setting `max_head_offpolicyness=0`
recovers fully synchronous training. See the
[Asynchronous RL Guide](docs/algorithms/async.md).

| Algorithm                | Documentation                             | Paper                                          | Configuration                                                |
| ------------------------ | ----------------------------------------- | ---------------------------------------------- | ------------------------------------------------------------ |
| **GRPO**                 | [📖 Docs](docs/algorithms/grpo_series.md) | [📄 Paper](https://arxiv.org/pdf/2402.03300)   | [🔗 GSM8K Example](examples/math/gsm8k_grpo.yaml)            |
| **GSPO**                 | [📖 Docs](docs/algorithms/grpo_series.md) | [📄 Paper](https://arxiv.org/abs/2507.18071)   | [🔗 GSM8K Example](examples/math/gsm8k_gspo.yaml)            |
| **PPO**                  | [📖 Docs](docs/algorithms/grpo_series.md) | [📄 Paper](https://arxiv.org/pdf/2203.02155)   | [🔗 GSM8K Example](examples/math/gsm8k_ppo.yaml)             |
| **DAPO**                 | [📖 Docs](docs/algorithms/grpo_series.md) | [📄 Paper](https://arxiv.org/abs/2503.14476)   | [🔗 GSM8K Example](examples/math/gsm8k_dapo_dynamic_bs.yaml) |
| **LitePPO**              | [📖 Docs](docs/algorithms/grpo_series.md) | [📄 Paper](https://arxiv.org/abs/2508.08221)   | [🔗 GSM8K Example](examples/math/gsm8k_liteppo.yaml)         |
| **Dr.GRPO**              | [📖 Docs](docs/algorithms/grpo_series.md) | [📄 Paper](https://arxiv.org/abs/2503.20783)   | [🔗 GSM8K Example](examples/math/gsm8k_drgrpo.yaml)          |
| **REINFORCE++**          | -                                         | [📄 Paper](https://arxiv.org/pdf/2501.03262)   | [🔗 GSM8K Example](examples/math/gsm8k_reinforce.yaml)       |
| **RLOO**                 | [📖 Docs](docs/algorithms/grpo_series.md) | [📄 Paper](https://arxiv.org/pdf/2402.14740v1) | [🔗 GSM8K Example](examples/math/gsm8k_rloo.yaml)            |
| **SAPO**                 | [📖 Docs](docs/algorithms/grpo_series.md) | [📄 Paper](https://arxiv.org/abs/2511.20347)   | [🔗 GSM8K Example](examples/math/gsm8k_sapo.yaml)            |
| **M2PO**                 | [📖 Docs](docs/algorithms/m2po.md)        | [📄 Paper](https://arxiv.org/abs/2510.01161)   | [🔗 GSM8K Example](examples/math/gsm8k_m2po.yaml)            |
| **RLHF Reward Modeling** | -                                         | -                                              | [🔗 RLHF Example](examples/alignment/)                       |
| **SFT**                  | -                                         | -                                              | [🔗 GSM8K Example](examples/math/gsm8k_sft.py)               |

### Models

| Model Family                | Megatron | PyTorch FSDP | PyTorch Archon | Notes                                               |
| --------------------------- | -------- | ------------ | -------------- | --------------------------------------------------- |
| **Qwen2/3**                 | ✅       | ✅           | ✅             | -                                                   |
| **Qwen3-MoE**               | ✅       | ✅           | ✅             | -                                                   |
| **Qwen2.5-VL**              | ❌       | ✅           | ❌             | Vision-language model                               |
| **Qwen3-VL**                | ❌       | ✅           | ❌             | Vision-language model                               |
| **Gemma 3**                 | ❌       | ✅           | ❌             | Vision-language model                               |
| **Other Hugging Face LLMs** | ❌       | ✅           | ❌             | Compatibility depends on the `transformers` version |

Check the [AI Coding Assistant Guide](docs/reference/ai_assisted_dev.md) and the
[Archon Reference](docs/tutorial/archon.md) for how to integrate new models into AReaL.

### Training Backends

| Backend            | DP          | Tensor Parallel | Sequence Parallel within TP | Context Parallel | Pipeline Parallel | Expert Parallel | 1D Sequence Packing | LoRA |
| ------------------ | ----------- | --------------- | --------------------------- | ---------------- | ----------------- | --------------- | ------------------- | ---- |
| **Megatron**       | ✅ (ZeRO-1) | ✅              | ✅                          | ✅               | ✅                | ✅              | ✅                  | ❌   |
| **PyTorch FSDP**   | ✅ (FSDP2)  | ✅              | ✅                          | ✅               | ❌                | ❌              | ✅                  | ✅   |
| **PyTorch Archon** | ✅ (FSDP2)  | ✅              | ✅                          | ✅               | ✅                | ✅              | ✅                  | ❌   |

### Inference Backends

| Backend    | Tensor Parallel | Context Parallel | Pipeline Parallel | Data Parallel Attention | Expert Parallel |
| ---------- | --------------- | ---------------- | ----------------- | ----------------------- | --------------- |
| **vLLM**   | ✅              | ❓               | ✅                | ❓                      | ❓              |
| **SGLang** | ✅              | ❌               | ❌                | ✅                      | ✅              |

## 📖 Resources

### Tutorial

- [Installation](https://inclusionai.github.io/AReaL/tutorial/installation.html)
- [Quickstart](https://inclusionai.github.io/AReaL/tutorial/quickstart.html)
- [Agentic RL](https://inclusionai.github.io/AReaL/tutorial/agentic_rl.html)
- [Evaluation](https://inclusionai.github.io/AReaL/tutorial/eval.html)
- [Large MoE with Megatron](https://inclusionai.github.io/AReaL/tutorial/megatron.html)
- [Large MoE with PyTorch Archon](https://inclusionai.github.io/AReaL/tutorial/archon.html)

### Code Walkthrough

- [Running GRPO on the GSM8K dataset](https://inclusionai.github.io/AReaL/tutorial/gsm8k_grpo.html)

### Best Practices

- [Improving Algorithm Performance](https://inclusionai.github.io/AReaL/best_practices/algo_perf.html)
- [Agent Workflow Best Practices](https://inclusionai.github.io/AReaL/best_practices/workflow.html)
- [Debugging](https://inclusionai.github.io/AReaL/best_practices/debugging.html)
- [Handling OOM Issues](https://inclusionai.github.io/AReaL/best_practices/handling_oom.html)
- [Performance Profiling](https://inclusionai.github.io/AReaL/best_practices/perf_profiling.html)

### Customization

- [Customize Dataset](https://inclusionai.github.io/AReaL/customization/dataset.html)
- [Customize Agentic/RLVR Rollout Workflows](https://inclusionai.github.io/AReaL/customization/agent.html)

### Algorithms

- [Asynchronous RL Explained](https://inclusionai.github.io/AReaL/algorithms/async.html)
- [PPO, GRPO, and Related Algorithms](https://inclusionai.github.io/AReaL/algorithms/grpo_series.html)
- [M2PO](https://inclusionai.github.io/AReaL/algorithms/m2po.html)

### Reference

- [CLI Configurations](https://inclusionai.github.io/AReaL/cli_reference.html)
- [Checkpointing](https://inclusionai.github.io/AReaL/reference/checkpointing.html)
- [Metrics Tracking](https://inclusionai.github.io/AReaL/reference/metrics_tracking.html)
- [Allocation Mode](https://inclusionai.github.io/AReaL/reference/alloc_mode.html)
- [Rollout Workflow](https://inclusionai.github.io/AReaL/reference/rollout_workflow.html)
- [Agent Workflow](https://inclusionai.github.io/AReaL/reference/agent_workflow.html)
- [AI-Assisted Development](https://inclusionai.github.io/AReaL/reference/ai_assisted_dev.html)

## 🤝 Contributing

We warmly welcome contributions from the community! Whether you're fixing bugs, adding
features, improving documentation, or helping others, your contribution is valued.
Please check our **[Contributing Guide](CONTRIBUTING.md)** for detailed information.

```bash
# Fork and clone the repository
git clone https://github.com/YOUR-USERNAME/AReaL
cd AReaL

# Install uv and sync dependencies
pip install uv
# Use `--extra cuda` on Linux with CUDA for full functionality
uv sync --extra cuda --group dev
# Or without CUDA support:
# uv sync --group dev

# Set up pre-commit hooks for automatic formatting
pre-commit install

# Make changes
git checkout -b feat/gpt-o5
git add .
# `git commit` will automatically format your files
git commit -m "Implement gpt-o5 training loop"
git push
```

## 🗺️ Future Roadmap

- **[Full Roadmap](ROADMAP.md)**
- **[2025 Q4 Roadmap](https://github.com/inclusionAI/AReaL/issues/542)**

AReaL is under active development, with minor releases planned weekly and major releases
monthly. We warmly welcome community engagement and contributions. We are also
**actively hiring interns and full-time employees**, with open positions in both the US
and China.

## 🙏 Acknowledgments

We gratefully acknowledge that the major contributors are from the AReaL Team at the
Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University, and
Ant Group.

We have also received invaluable assistance from the following groups (listed
alphabetically):

- The Data Intelligence Lab at Ant Research for their data support

- @HwVanICI for support on vLLM, LoRA, NPU integration, and more

- The [Relaxed System Lab](https://github.com/Relaxed-System-Lab) at HKUST for seamless
  collaboration on numerous system-related aspects

- The [SGLang team](https://github.com/sgl-project/sglang) for supporting custom weight
  update features and their contributions during AReaL-lite development

- The Super Computing Technology (SCT) team at Ant Group for their expertise in
  large-scale cluster operations and maintenance

- Special thanks to @Lyken17 for valuable suggestions throughout the API design process

We also deeply appreciate all pioneering work from the community, particularly the
[ReaLHF](https://github.com/openpsi-project/ReaLHF) project from OpenPsi Inc. and other
outstanding projects, including but not limited to
[DeepScaleR](https://github.com/agentica-project/deepscaler),
[Open-Reasoner-Zero](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/tree/main),
[OpenRLHF](https://github.com/OpenRLHF/OpenRLHF),
[VeRL](https://github.com/volcengine/verl),
[SGLang](https://github.com/sgl-project/sglang), [QwQ](https://github.com/QwenLM/QwQ),
[Light-R1](https://github.com/Qihoo360/Light-R1), and
[DAPO](https://github.com/BytedTsinghua-SIA/DAPO).

## 📄 Citation

```bibtex
@inproceedings{mei2025real,
  author       = {Mei, Zhiyu and Fu, Wei and Li, Kaiwei and Wang, Guangju and Zhang, Huanchen and Wu, Yi},
  title        = {ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation},
  booktitle    = {Proceedings of the Eighth Conference on Machine Learning and Systems,
                  MLSys 2025, Santa Clara, CA, USA, May 12-15, 2025},
  publisher    = {mlsys.org},
  year         = {2025},
}
```

```bibtex
@misc{fu2025areal,
      title={AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning},
      author={Wei Fu and Jiaxuan Gao and Xujie Shen and Chen Zhu and Zhiyu Mei and Chuyi He and Shusheng Xu and Guo Wei and Jun Mei and Jiashu Wang and Tongkai Yang and Binhang Yuan and Yi Wu},
      year={2025},
      eprint={2505.24298},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.24298},
}
```
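The `max_head_offpolicyness` setting controls how many policy versions a generated
rollout may lag behind the trainer before it is considered too stale, with `0`
recovering synchronous training. A toy sketch of that admission rule (our own
simplification for illustration; the class and method names are not AReaL's API):

```python
class StalenessGate:
    """Toy staleness control: accept a rollout only if it was generated no
    more than `max_head_offpolicyness` policy versions behind the trainer.
    A bound of 0 means only on-policy rollouts pass, i.e. synchronous RL."""

    def __init__(self, max_head_offpolicyness: int):
        self.max_off = max_head_offpolicyness
        self.trainer_version = 0  # advanced by the trainer after each update

    def accepts(self, rollout_version: int) -> bool:
        # rollout_version is the policy version the sampler used.
        return self.trainer_version - rollout_version <= self.max_off


gate = StalenessGate(max_head_offpolicyness=2)
gate.trainer_version = 5
# Rollouts from versions 5, 4, 3 pass; version 2 is 3 updates stale and is dropped.
print([gate.accepts(v) for v in (5, 4, 3, 2)])
```

This is the sense in which asynchronous training here is "bounded off-policy": generation
and training overlap, but the staleness gap stays within a configured limit.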
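Several of the supported algorithms (GRPO, Dr.GRPO, RLOO) share a group-relative
baseline: sample a group of responses per prompt and center each response's reward on
the group mean. A minimal sketch of that computation, following the standard GRPO
formulation rather than AReaL's actual code (the function name and arguments are our
own):

```python
def grpo_advantages(rewards, normalize_std=True, eps=1e-6):
    """Group-relative advantages for one prompt's sampled responses.

    GRPO centers each reward on the group mean and divides by the group
    standard deviation; Dr.GRPO's variant drops the std normalization
    (normalize_std=False).
    """
    mean = sum(rewards) / len(rewards)
    centered = [r - mean for r in rewards]
    if not normalize_std:
        return centered
    std = (sum(c * c for c in centered) / len(rewards)) ** 0.5
    return [c / (std + eps) for c in centered]


# Four sampled answers to one prompt with binary correctness rewards:
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))
```

Because the baseline comes from the group itself, no learned critic is needed, which is
part of what makes these methods cheap to run at scale.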