https://github.com/Unakar/Logic-RL
- Host: GitHub
- URL: https://github.com/Unakar/Logic-RL
- Owner: Unakar
- License: apache-2.0
- Created: 2025-02-02T10:22:34.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-02-02T10:24:13.000Z (9 months ago)
- Last Synced: 2025-02-02T11:24:20.536Z (9 months ago)
- Language: Python
- Size: 1.65 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
 
Awesome Lists containing this project
- Awesome-RL-for-LRMs - Unakar/Logic-RL
- awesome-llm-and-aigc - Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning. (**[arXiv 2025](https://arxiv.org/abs/2502.14768)**)
- awesome-llm-strawberry - Ubiquant
- StarryDivineSky - Unakar/Logic-RL - The Logic-RL project aims to reproduce the R1 Zero algorithm, which performs well on logic puzzles. It focuses on solving logic problems with reinforcement learning, especially problems expressible in first-order logic: puzzles are cast as a Markov decision process (MDP), and an agent is trained with RL algorithms to find solutions. The project's distinguishing features are its combination of logical reasoning with reinforcement learning and its implementation of R1 Zero. It provides a framework for exploring the potential of RL on complex logic problems, likely including code for defining puzzles, building the MDP environment, training agents, and evaluating performance. The goal is an agent that solves logic puzzles automatically, together with a deeper understanding of how RL applies to logical reasoning. By reproducing R1 Zero, the project aims to contribute to research in this area and lay a foundation for stronger logical-reasoning agents.
README
          
# Logic-RL
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning 
---
## News
[2025/03/20] We release [ADORA: A Scalable Paradigm for Steering Learning Trajectories](https://github.com/ShadeCloak/ADORA?tab=readme-ov-file).
[2025/03/19] For stable length control, see [Short-RL](https://github.com/lblankl/Short-RL).
*Main results.*
---
## Benchmark
Accuracy on Knights-and-Knaves (K&K) puzzles with 2–8 people (ppl) per puzzle:
| Model                                                             | 2ppl | 3ppl | 4ppl | 5ppl | 6ppl | 7ppl | 8ppl |
|------------------------------------------------------------------------|------|------|------|------|------|------|------|
| o3-mini-high                | 0.99 | 0.98 | 0.97 | 0.95 | 0.94 | 0.89 | 0.83 |
| o1-2024-12-17               | 0.83 | 0.51 | 0.38 | 0.38 | 0.35 | 0.30 | 0.20 |
| GPT-4o                      | 0.68 | 0.57 | 0.49 | 0.32 | 0.23 | 0.21 | 0.11 |
| DeepSeek-Math-7B            | 0.35 | 0.21 | 0.08 | 0.06 | 0.02 | 0.00 | 0.00 |
| Qwen2.5-7B-Instruct-1M      | 0.49 | 0.40 | 0.25 | 0.11 | 0.02 | 0.06 | 0.01 |
| Qwen2.5-7B-Logic-RL (ours)  | 0.99 | 0.99 | 0.94 | 0.92 | 0.91 | 0.80 | 0.67 |
---
## Installation
```bash
conda create -n logic python=3.9
conda activate logic
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip install vllm==0.6.3 ray
pip install flash-attn --no-build-isolation
pip install -e .  # For verl integration
pip install wandb IPython matplotlib
```
---
## Data Preparation
You can use the prepared datasets in the `data/` directory directly.
To generate your own data, here is a demo:
### Base Model
```bash
python ./examples/data_preprocess/kk.py \
    --local_dir {processed_data_path} \
    --data_path {raw_data_path}
```
### Instruct Model
```bash
python ./examples/data_preprocess/kk.py \
    --template_type=qwen-instruct \
    --local_dir {processed_data_path} \
    --data_path {raw_data_path}
```
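The `--template_type` flag controls how each raw quiz is wrapped into a prompt before training. The exact template strings live in `examples/data_preprocess/kk.py`; the sketch below is illustrative only, and the system prompt and tag wording are assumptions rather than the repo's actual text:

```python
def build_prompt(quiz: str, template_type: str = "base") -> str:
    """Illustrative sketch of prompt construction (wording assumed,
    not copied from examples/data_preprocess/kk.py)."""
    system = ("You are a helpful assistant. Reason inside <think> </think> "
              "tags, then give the final answer inside <answer> </answer> tags.")
    if template_type == "qwen-instruct":
        # Qwen chat markup: wrap system/user turns and open the assistant turn
        return (f"<|im_start|>system\n{system}<|im_end|>\n"
                f"<|im_start|>user\n{quiz}<|im_end|>\n"
                f"<|im_start|>assistant\n<think>")
    # Base model: plain completion-style prompt, no chat markup
    return f"{system}\nUser: {quiz}\nAssistant: <think>"
```

The key difference is that the instruct template emits Qwen's chat markup, while the base template keeps a plain completion-style prompt.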
---
## Training Execution
```bash
conda activate logic
bash main_grpo.sh  # 4×A100 80G
```
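`main_grpo.sh` drives GRPO training through verl. The defining step of GRPO is replacing a learned value baseline with a group-relative one: each prompt is sampled several times, and every completion's reward is normalized against the mean and standard deviation of its own group. A minimal sketch of that step (function name ours, not from the repo):

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: normalize each completion's reward
    by the mean and std of its sampling group (same prompt)."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Four completions sampled for one prompt, scored by a rule-based reward
adv = grpo_advantages([1.0, -0.5, 1.0, -1.0])
```

Completions scoring above their group's mean get positive advantages and are reinforced; those below get negative advantages, with no critic network required.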
---
## ⚙️ Implementation Details
| Component              | Location                          |
|------------------------|-----------------------------------|
| Reward Modeling     | `verl/utils/reward_score/kk.py`   |
| Data Preprocessing   | `examples/data_preprocess/kk.py`  |
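The reward in `verl/utils/reward_score/kk.py` is rule-based: it checks that the completion follows the required output format and compares the extracted answer against the ground truth. The sketch below conveys the idea only; the tags, regex, and score values are assumptions, not the repo's actual constants:

```python
import re

def compute_score(solution_str: str, ground_truth: str) -> float:
    """Hedged sketch of a rule-based reward: a format check plus an
    exact answer match. Tag names and scores are illustrative."""
    # Format check: reasoning in <think>...</think>, answer in <answer>...</answer>
    m = re.search(r"<think>.*</think>.*<answer>(.*)</answer>",
                  solution_str, re.DOTALL)
    if not m:
        return -1.0  # malformed output gets the lowest reward
    answer = m.group(1).strip()
    # Answer check: exact match against the ground-truth solution
    return 1.0 if answer == ground_truth.strip() else -0.5
```

Because the score depends only on string rules, there is no reward model to train or hack, which is the point of the "rule-based" design.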
---
## Citation
```bibtex
@misc{xie2025logicrlunleashingllmreasoning,
      title={Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning}, 
      author={Tian Xie and Zitian Gao and Qingnan Ren and Haoming Luo and Yuqian Hong and Bryan Dai and Joey Zhou and Kai Qiu and Zhirong Wu and Chong Luo},
      year={2025},
      eprint={2502.14768},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.14768}, 
}
```
---
## Acknowledgements
- [Verl](https://github.com/volcengine/verl) 🔗
- [TinyZero](https://github.com/Jiayi-Pan/TinyZero) 🔗
- [Knights and Knaves (K&K) puzzles dataset](https://github.com/AlphaPav/mem-kk-logic) 🔗
---
## Star History
[![Star History Chart](https://api.star-history.com/svg?repos=Unakar/Logic-RL&type=Date)](https://star-history.com/#Unakar/Logic-RL&Date)