{"id":26483075,"url":"https://github.com/cmu-l3/l1","last_synced_at":"2025-04-06T05:16:46.020Z","repository":{"id":282450815,"uuid":"944029251","full_name":"cmu-l3/l1","owner":"cmu-l3","description":"L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning","archived":false,"fork":false,"pushed_at":"2025-03-18T17:54:34.000Z","size":21531,"stargazers_count":162,"open_issues_count":5,"forks_count":16,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-30T03:11:04.030Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://cmu-l3.github.io/l1/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cmu-l3.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-06T17:03:25.000Z","updated_at":"2025-03-29T08:56:03.000Z","dependencies_parsed_at":null,"dependency_job_id":"d918b4a3-1359-43b6-8b91-dcd6680fc1ac","html_url":"https://github.com/cmu-l3/l1","commit_stats":null,"previous_names":["cmu-l3/l1"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmu-l3%2Fl1","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmu-l3%2Fl1/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmu-l3%2Fl1/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmu-l3%2Fl1/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cmu-l3","download_url":"https://codeload.github.com/cmu-l3/l1/tar.gz/refs/heads/main","ho
st":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247436286,"owners_count":20938533,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-20T04:52:04.494Z","updated_at":"2025-04-06T05:16:46.013Z","avatar_url":"https://github.com/cmu-l3.png","language":"Python","funding_links":[],"categories":["Projects"],"sub_categories":["Large Language Models"],"readme":"\u003cdiv align=\"center\"\u003e\n    \u003ch1\u003e L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning\u003c/h1\u003e\n    \u003ca href=\"https://cmu-l3.github.io/l1\"\u003e\u003cimg src=\"https://img.shields.io/website?down_message=down\u0026style=for-the-badge\u0026up_message=up\u0026url=https%3A%2F%2Fcmu-l3.github.io/l1\"\u003e\u003c/a\u003e\n\u003ca href=\"https://arxiv.org/abs/2503.04697\"\u003e\u003cimg src=\"https://img.shields.io/badge/arXiv-2503.04697-red.svg?style=for-the-badge\"\u003e\u003c/a\u003e\n\u003ca href=\"https://huggingface.co/collections/l3lab/l1-67cacf4e39c176ca4e9890f4\"\u003e\u003cimg src=\"https://img.shields.io/badge/Hugging%20Face-Model-blue?style=for-the-badge\u0026logo=huggingface\"\u003e\u003c/a\u003e\n\u003ca href=\"https://colab.research.google.com/drive/1E7A327gO5ph06-kZ6E71AWmqQxLE0kqX?usp=sharing\"\u003e\u003cimg src=\"https://img.shields.io/badge/Colab-Notebook-orange?style=for-the-badge\u0026logo=googlecolab\"\u003e\u003c/a\u003e\n    \u003cbr\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n## How to Use?\n\n### Installation\n\n```bash\ngit clone https://github.com/cmu-l3/l1.git\ncd l1\npip 
install -e verl\npip install packaging\npip install ninja\npip install flash-attn --no-build-isolation\npip install -e .\n```\n\n\n### Prepare Dataset\n\nYou can use scripts in `scripts/data` to prepare your own dataset.\n\nFor example, to generate data for training L1-Exact:\n```\npython scripts/data/deepscaler_dataset.py \n```\n\nFor L1-Max:\n```\npython scripts/data/deepscaler_dataset.py --use_both_both\n```\n\nTo prepare evaluation data for AIME2025, GPQA, LSAT, and MMLU, use the generation scripts in `scripts/data`:\n```\npython scripts/data/generate_aime.py\npython scripts/data/generate_gpqa.py\npython scripts/data/generate_lsat.py\npython scripts/data/generate_mmlu.py\n```\n\n### Train Models\n\nYou can skip this step if you want to use our pre-trained models.\n\nYou can run scripts in `scripts/train` to train your own models. Make sure to specify the correct data path.\n\n### Evaluate Models\n\nUse one of the scripts in `scripts/eval` to evaluate your models. Make sure to specify the correct model path.\n\nFor example, to evaluate L1-Exact on AIME2025:\n```\n./scripts/eval/eval_model_token.sh --model path/to/your/model --num-tokens \u003cnum_tokens\u003e --datasets aime2025\n```\n\n### Replicate Results\n\nTo replicate results for L1-Exact and L1-Max from the [paper](https://arxiv.org/abs/2503.04697), you can use scripts in `scripts/replicate`.\n\n1. Prepare data:\n```\n./scripts/replicate/prepare_data.sh\n```\n\n2. Evaluate models:\n```\n./scripts/replicate/eval_inference_exact.sh l3lab/L1-Qwen-1.5B-Exact\n./scripts/replicate/eval_inference_max.sh l3lab/L1-Qwen-1.5B-Max\n```\n\n## Acknowledgments\n\n- We would like to thank DeepSeek for releasing DeepSeek-R1 and its distilled models, \n- Qwen for releasing the super-awesome Qwen-2.5 Math models, and \n- [Agentica](https://github.com/agentica-project/deepscaler) for their codebase and for open-sourcing their models and datasets! 
This codebase is built on top of their work.\n\n\n## Citation\n\nIf you use L1/LCPO in your research, please cite:\n\n```bibtex\n@misc{aggarwal2025l1controllinglongreasoning,\n  title={L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning}, \n  author={Pranjal Aggarwal and Sean Welleck},\n  year={2025},\n  eprint={2503.04697},\n  archivePrefix={arXiv},\n  primaryClass={cs.CL},\n  url={https://arxiv.org/abs/2503.04697}, \n}\n```\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcmu-l3%2Fl1","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcmu-l3%2Fl1","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcmu-l3%2Fl1/lists"}