https://github.com/cmu-l3/l1
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
- Host: GitHub
- URL: https://github.com/cmu-l3/l1
- Owner: cmu-l3
- Created: 2025-03-06T17:03:25.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2025-03-18T17:54:34.000Z (about 1 month ago)
- Last Synced: 2025-03-30T03:11:04.030Z (21 days ago)
- Language: Python
- Homepage: https://cmu-l3.github.io/l1/
- Size: 20.5 MB
- Stars: 162
- Watchers: 3
- Forks: 16
- Open Issues: 5
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-rl-reasoning-recipes - cmu-l3/l1
README
## How to Use?
### Installation
```bash
git clone https://github.com/cmu-l3/l1.git
cd l1
pip install -e verl
pip install packaging
pip install ninja
pip install flash-attn --no-build-isolation
pip install -e .
```
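After installing, a quick import check can confirm that the editable installs and the flash-attn build succeeded. This is a minimal sanity check, not part of the repository:

```python
# Minimal post-install sanity check (not part of the repository):
# verify that torch, flash-attn, and verl are importable.
import torch
import flash_attn
import verl

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("flash-attn:", flash_attn.__version__)
print("verl imported from:", verl.__file__)
```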
### Prepare Dataset
You can use scripts in `scripts/data` to prepare your own dataset.
For example, to generate data for training L1-Exact:
```
python scripts/data/deepscaler_dataset.py
```
For L1-Max:
```
python scripts/data/deepscaler_dataset.py --use_both_both
```
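Conceptually, these scripts attach a target-length instruction to each problem so the model can be trained with Length Controlled Policy Optimization (LCPO). The sketch below illustrates the idea only; the exact instruction wording, record fields, and target-length sampling here are assumptions, and the real templates live in `scripts/data/deepscaler_dataset.py`:

```python
# Illustrative only: how a length-controlled training record could be built.
# The instruction wording and record fields are assumptions, not the exact
# format produced by scripts/data/deepscaler_dataset.py.
import json
import random

def make_length_controlled_record(question: str, answer: str) -> dict:
    n_target = random.randint(100, 4000)  # hypothetical target-length range
    prompt = f"{question}\n\nThink for {n_target} tokens."  # assumed wording
    return {"prompt": prompt, "answer": answer, "num_tokens": n_target}

record = make_length_controlled_record(
    "What is the sum of the first 100 positive integers?", "5050"
)
print(json.dumps(record, indent=2))
```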
For evaluation on AIME2025, GPQA, LSAT, and MMLU, you can use the scripts in `scripts/data`:
```
python scripts/data/generate_aime.py
python scripts/data/generate_gpqa.py
python scripts/data/generate_lsat.py
python scripts/data/generate_mmlu.py
```

### Train Models
You can skip this step if you want to use our pre-trained models.
You can run scripts in `scripts/train` to train your own models. Make sure to specify the correct data path.
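The training scripts optimize an LCPO-style reward that combines answer correctness with a term based on how far the generated length is from the requested budget. The sketch below is only a schematic reading of the two reward shapes described in the paper; the coefficients `alpha` and `delta` are placeholders, not the values used in training:

```python
# Schematic LCPO-style rewards (placeholder coefficients, not the paper's values).

def l1_exact_reward(correct: bool, n_target: int, n_generated: int,
                    alpha: float = 3e-4) -> float:
    """Reward correctness, minus a penalty for missing the exact token target."""
    return float(correct) - alpha * abs(n_target - n_generated)

def l1_max_reward(correct: bool, n_budget: int, n_generated: int,
                  alpha: float = 3e-4, delta: float = 0.5) -> float:
    """Reward correctness, scaled down smoothly as the output exceeds the budget."""
    scale = min(max(alpha * (n_budget - n_generated) + delta, 0.0), 1.0)
    return float(correct) * scale

print(l1_exact_reward(True, n_target=1024, n_generated=1200))
print(l1_max_reward(True, n_budget=1024, n_generated=1200))
```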
### Evaluate Models
Use one of the scripts in `scripts/eval` to evaluate your models. Make sure to specify the correct model path.
For example, evaluate L1-Exact on AIME2025:
```
./scripts/eval/eval_model_token.sh --model path/to/your/model --num-tokens --datasets aime2025
```
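When inspecting evaluation outputs, it can be useful to check how closely a completion matches the requested token budget. A minimal sketch using the Hugging Face tokenizer for the released L1-Qwen-1.5B-Exact checkpoint (assuming `transformers` is installed, the checkpoint is available on the Hub under that name, and the completion text is available as a string):

```python
# Minimal length-adherence check for a generated completion (illustrative).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("l3lab/L1-Qwen-1.5B-Exact")

def budget_deviation(completion: str, n_target: int) -> int:
    """Return (generated tokens - requested budget) for one completion."""
    n_generated = len(tokenizer.encode(completion, add_special_tokens=False))
    return n_generated - n_target

example_completion = "First, note that ... therefore the answer is 42."
print(budget_deviation(example_completion, n_target=512))
```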
### Replicate Results
To replicate results for L1-Exact and L1-Max from the [paper](https://arxiv.org/abs/2503.04697), you can use the scripts in `scripts/replicate`.
1. Prepare data:
```
./scripts/replicate/prepare_data.sh
```
2. Evaluate models:
```
./scripts/replicate/eval_inference_exact.sh l3lab/L1-Qwen-1.5B-Exact
./scripts/replicate/eval_inference_max.sh l3lab/L1-Qwen-1.5B-Max
```

## Acknowledgments
- We would like to thank DeepSeek for releasing DeepSeek-R1 and the distilled models,
- Qwen for releasing the super-awesome Qwen-2.5 Math models, and
- [Agentica](https://github.com/agentica-project/deepscaler) for their codebase and for open-sourcing their models and datasets! This codebase is built on top of their work.

## Citation
If you use L1/LCPO in your research, please cite:
```bibtex
@misc{aggarwal2025l1controllinglongreasoning,
title={L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning},
author={Pranjal Aggarwal and Sean Welleck},
year={2025},
eprint={2503.04697},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.04697},
}
```