{"id":25135777,"url":"https://github.com/Unakar/Logic-RL","last_synced_at":"2025-10-23T14:30:27.993Z","repository":{"id":275413812,"uuid":"926010165","full_name":"Unakar/Logic-RL","owner":"Unakar","description":null,"archived":false,"fork":false,"pushed_at":"2025-02-02T10:24:13.000Z","size":1733,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-02T11:24:20.536Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Unakar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-02T10:22:34.000Z","updated_at":"2025-02-02T10:27:22.000Z","dependencies_parsed_at":"2025-02-02T11:34:38.181Z","dependency_job_id":null,"html_url":"https://github.com/Unakar/Logic-RL","commit_stats":null,"previous_names":["unakar/logic-rl"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Unakar%2FLogic-RL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Unakar%2FLogic-RL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Unakar%2FLogic-RL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Unakar%2FLogic-RL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Unakar","download_url":"https://codeload.github.com/Unakar/Logic-RL/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","k
ind":"github","repositories_count":237839055,"owners_count":19374308,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-02-08T17:01:08.549Z","updated_at":"2025-10-23T14:30:27.988Z","avatar_url":"https://github.com/Unakar.png","language":"Python","funding_links":[],"categories":["Projects","Summary","Open-source","A01_文本生成_文本对话","RelatedRepos"],"sub_categories":["Large Language Models","Codebase","大语言对话模型及数据","Replicates of DeepSeek-R1 and DeepSeek-R1-Zero"],"readme":"\n# Logic-RL\n\n\u003ca href='https://arxiv.org/abs/2502.14768'\u003e\u003cimg src='https://img.shields.io/badge/arXiv-2502.14768-b31b1b.svg'\u003e\u003c/a\u003e \u0026nbsp;\n\nLogic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning \n---\n\n## News\n[2025/03/20] We release the [ADORA: A Scalable Paradigm for Steering Learning Trajectories ](https://github.com/ShadeCloak/ADORA?tab=readme-ov-file).\n\n[2025/03/19] For stable length control, refer to https://github.com/lblankl/Short-RL\n\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003ctd align=\"center\"\u003e\n      \u003cimg src=\"./pics/teaser.png\" width=\"800\" alt=\"Teaser Image\"\u003e\n    \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd align=\"center\"\u003eMain results\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n---\n\n## Benchmark\n\n| Model                                                             | 2ppl | 3ppl | 4ppl | 5ppl | 6ppl | 7ppl | 8ppl |\n|------------------------------------------------------------------------|------|------|------|------|------|------|------|\n| 
o3-mini-high                | 0.99 | 0.98 | 0.97 | 0.95 | 0.94 | 0.89 | 0.83 |\n| o1-2024-12-17               | 0.83 | 0.51 | 0.38 | 0.38 | 0.35 | 0.30 | 0.20 |\n| GPT-4o                      | 0.68 | 0.57 | 0.49 | 0.32 | 0.23 | 0.21 | 0.11 |\n| Deepseek-Math-7b            | 0.35 | 0.21 | 0.08 | 0.06 | 0.02 | 0.00 | 0.00 |\n| Qwen2.5-7B-Instruct-1M      | 0.49 | 0.40 | 0.25 | 0.11 | 0.02 | 0.06 | 0.01 |\n| Qwen2.5-7B-Logic-RL (ours)  | 0.99 | 0.99 | 0.94 | 0.92 | 0.91 | 0.80 | 0.67 |\n\n\n---\n\n## Installation\n\n```bash\nconda create -n logic python=3.9\nconda activate logic  # install the packages below into the new environment\npip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121\npip3 install vllm==0.6.3 ray\npip3 install flash-attn --no-build-isolation\npip install -e .  # For verl integration\npip install wandb IPython matplotlib\n```\n\n---\n\n## Data Preparation\n\nYou can use the data in /data directly.\n\nTo generate your own data, run the preprocessing script as shown here:\n\n### Base Model\n```bash\npython ./examples/data_preprocess/kk.py \\\n    --local_dir {processed_data_path} \\\n    --data_path {raw_data_path}\n```\n\n### Instruct Model\n```bash\npython ./examples/data_preprocess/kk.py \\\n    --template_type=qwen-instruct \\\n    --local_dir {processed_data_path} \\\n    --data_path {raw_data_path}\n```\n\n---\n\n## Training Execution\n```bash\nconda activate logic\nbash main_grpo.sh  # 4×A100 80G\n```\n\n---\n\n## ⚙️ Implementation Details\n\n| Component              | Location                          |\n|------------------------|-----------------------------------|\n| Reward Modeling     | `verl/utils/reward_score/kk.py`   |\n| Data Preprocessing   | `examples/data_preprocess/kk.py`  |\n\n---\n\n\n## Citation\n```\n@misc{xie2025logicrlunleashingllmreasoning,\n      title={Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning}, \n      author={Tian Xie and Zitian Gao and Qingnan Ren and Haoming Luo and Yuqian Hong and Bryan Dai and Joey Zhou and Kai Qiu and Zhirong Wu and Chong Luo},\n      year={2025},\n      
eprint={2502.14768},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https://arxiv.org/abs/2502.14768}, \n}\n```\n\n---\n\n## Acknowledgements\n- [Verl](https://github.com/volcengine/verl) 🔗\n- [TinyZero](https://github.com/Jiayi-Pan/TinyZero) 🔗\n- [Knights and Knaves (K\u0026K) puzzles dataset](https://github.com/AlphaPav/mem-kk-logic) 🔗\n\n---\n\n## Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=Unakar/Logic-RL\u0026type=Date)](https://star-history.com/#Unakar/Logic-RL\u0026Date)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FUnakar%2FLogic-RL","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FUnakar%2FLogic-RL","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FUnakar%2FLogic-RL/lists"}