# L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning

https://github.com/cmu-l3/l1

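L1 trains reasoning models with reinforcement learning (Length Controlled Policy Optimization, LCPO) so that the length of the model's chain-of-thought can be controlled through a target token budget given in the prompt.

As a quick sanity check before running the full pipeline below, you can query a released checkpoint directly. The snippet is a minimal sketch assuming the Hugging Face `transformers` generation API; the `Think for N tokens.` prompt suffix follows the length-control format described in the paper, but verify the exact wording against the paper and model card.

```python
# Minimal sketch (not an official repo script): length-controlled generation
# with a released L1 checkpoint via Hugging Face transformers.
# The "Think for N tokens." suffix follows the prompt format described in the
# paper; check the paper / model card for the exact wording.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "l3lab/L1-Qwen-1.5B-Exact"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

problem = "What is the sum of the first 100 positive integers?"
budget = 512  # target reasoning length in tokens
prompt = f"{problem} Think for {budget} tokens."

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Allow some headroom beyond the requested budget so the final answer fits.
outputs = model.generate(inputs, max_new_tokens=budget + 256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

With L1-Exact the model is trained to match the requested budget; with L1-Max it treats the budget as an upper bound.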
## How to Use?

### Installation

```bash
git clone https://github.com/cmu-l3/l1.git
cd l1
pip install -e verl
pip install packaging
pip install ninja
pip install flash-attn --no-build-isolation
pip install -e .
```
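Optionally, you can sanity-check the installation by confirming that the editable packages and the compiled flash-attn extension import cleanly. This is just a convenience check, not part of the repo's scripts:

```python
# Optional post-install check (not part of the repo's scripts):
# confirm the editable installs and flash-attn's compiled kernels import.
import flash_attn
import verl

print("flash-attn:", flash_attn.__version__)
print("verl imported successfully")
```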

### Prepare Dataset

You can use scripts in `scripts/data` to prepare your own dataset.

For example, to generate training data for L1-Exact:
```bash
python scripts/data/deepscaler_dataset.py
```

For L1-Max:
```bash
python scripts/data/deepscaler_dataset.py --use_both_both
```

For evaluation on AIME2025, GPQA, LSAT, and MMLU, generate the datasets with the corresponding scripts in `scripts/data`:
```bash
python scripts/data/generate_aime.py
python scripts/data/generate_gpqa.py
python scripts/data/generate_lsat.py
python scripts/data/generate_mmlu.py
```
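If you want to eyeball what was generated, verl-style pipelines typically store datasets as parquet, so a quick pandas check works. The path below is a hypothetical placeholder; substitute whatever path the generation script reports:

```python
# Hypothetical inspection snippet: "data/deepscaler_train.parquet" is a
# placeholder -- substitute the output path your generation run produced.
import pandas as pd

df = pd.read_parquet("data/deepscaler_train.parquet")
print(df.columns.tolist())
print(df.head(2))
```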

### Train Models

You can skip this step if you want to use our pre-trained models.

You can run the scripts in `scripts/train` to train your own models. Make sure to specify the correct data path.

### Evaluate Models

Use one of the scripts in `scripts/eval` to evaluate your models. Make sure to specify the correct model path.

For example, to evaluate L1-Exact on AIME2025, replacing `<num_tokens>` with the target token budget:
```bash
./scripts/eval/eval_model_token.sh --model path/to/your/model --num-tokens <num_tokens> --datasets aime2025
```

### Replicate Results

To replicate results for L1-Exact and L1-Max from the [paper](https://arxiv.org/abs/2503.04697), you can use scripts in `scripts/replicate`.

1. Prepare data:
```bash
./scripts/replicate/prepare_data.sh
```

2. Evaluate models:
```bash
./scripts/replicate/eval_inference_exact.sh l3lab/L1-Qwen-1.5B-Exact
./scripts/replicate/eval_inference_max.sh l3lab/L1-Qwen-1.5B-Max
```

## Acknowledgments

- We would like to thank DeepSeek for releasing DeepSeek-R1 and its distilled models,
- Qwen for releasing the super-awesome Qwen2.5-Math models, and
- [Agentica](https://github.com/agentica-project/deepscaler) for open-sourcing their codebase, models, and datasets! This codebase is built on top of their work.

## Citation

If you use L1/LCPO in your research, please cite:

```bibtex
@misc{aggarwal2025l1controllinglongreasoning,
  title={L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning},
  author={Pranjal Aggarwal and Sean Welleck},
  year={2025},
  eprint={2503.04697},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2503.04697},
}
```