{"id":27233042,"url":"https://github.com/GAIR-NLP/LIMR","last_synced_at":"2025-04-10T14:11:22.886Z","repository":{"id":278111634,"uuid":"934155930","full_name":"GAIR-NLP/LIMR","owner":"GAIR-NLP","description":null,"archived":false,"fork":false,"pushed_at":"2025-02-20T15:28:15.000Z","size":4182,"stargazers_count":179,"open_issues_count":6,"forks_count":6,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-05T02:12:41.647Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/GAIR-NLP.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-17T11:15:40.000Z","updated_at":"2025-04-04T09:00:04.000Z","dependencies_parsed_at":"2025-02-18T04:32:04.260Z","dependency_job_id":null,"html_url":"https://github.com/GAIR-NLP/LIMR","commit_stats":null,"previous_names":["gair-nlp/limr"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GAIR-NLP%2FLIMR","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GAIR-NLP%2FLIMR/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GAIR-NLP%2FLIMR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GAIR-NLP%2FLIMR/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/GAIR-NLP","download_url":"https://codeload.github.com/GAIR-NLP/LIMR/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248232418,"owners_count":21069487,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-04-10T14:11:17.973Z","updated_at":"2025-04-10T14:11:22.874Z","avatar_url":"https://github.com/GAIR-NLP.png","language":"Python","funding_links":[],"categories":["Projects","A01_文本生成_文本对话","Python"],"sub_categories":["Large Language Models","大语言对话模型及数据"],"readme":"\u003cdiv align=\"center\"\u003e\n\n# LIMR: Less is More for RL Scaling\n\n\u003c/div\u003e\n\n\n\u003cp align=\"center\"\u003e\n  📄 \u003ca href=\"https://arxiv.org/pdf/2502.11886\" target=\"_blank\"\u003ePaper\u003c/a\u003e \u0026nbsp; | \u0026nbsp;\n  🌐 \u003ca href=\"https://huggingface.co/datasets/GAIR/LIMR\" target=\"_blank\"\u003eDataset\u003c/a\u003e \u0026nbsp; | \u0026nbsp;\n  📘 \u003ca href=\"https://huggingface.co/GAIR/LIMR\" target=\"_blank\"\u003eModel\u003c/a\u003e\n\u003c/p\u003e\n\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"assets/main.png\" width=\"700\" alt=\"limr-main\"\u003e\n\u003c/div\u003e\n\n\u003e (a) Accuracy on AIME24 when training with different datasets in RL **without any data distillation or SFT as a cold start**. Our curated LIMR dataset, a strategically selected subset of the full MATH (levels 3-5) dataset, achieved comparable accuracy while using less than one-sixth of the data. Notably, LIMR significantly outperformed a randomly selected dataset of the same size, demonstrating the effectiveness of our selective dataset construction methodology. 
(b) A comparison of different data-efficient models.\n\n\n## Releases\n\n[2025/02/17] We're releasing the following components:\n\n- 🛠️ **LIM Tools**: Implementation of our **Learning Impact Measurement** methodology\n- 🚀 **Training \u0026 Evaluation**: Complete implementation of our training pipeline and evaluation scripts\n- 🔥 **[LIMR Dataset](https://huggingface.co/datasets/GAIR/LIMR)**: Our curated dataset of 1,389 mathematical questions\n- 🤖 **[LIMR Model](https://huggingface.co/GAIR/LIMR)**: The model trained on the LIMR dataset\n\n## Overview\n\nThis repository presents **LIMR**, an approach that challenges common assumptions about data scaling in reinforcement learning (RL) for LLMs. We demonstrate that the quality and relevance of training samples matter far more than their quantity. Our **Learning Impact Measurement (LIM)** methodology automatically evaluates how effective each training sample is, eliminating the need for manual curation while achieving **comparable or superior** results with **6x less** data (1,389 vs. 8,523 questions). 
Notably, all our experiments start directly from base models without distillation, providing clear insight into the core dynamics of RL training.\n\n\nOur key findings on RL training dynamics:\n\n- A strategically selected subset of training samples (1,389) can match or even exceed the performance of training on the full dataset (8,523), challenging the assumption that larger datasets necessarily lead to better performance.\n- We introduce Learning Impact Measurement (LIM), an automated quantitative method for estimating the potential value of RL training samples, enabling systematic analysis of how different samples contribute to model improvement.\n- While distilled long-form reasoning data has proven efficient for larger models, at the scale of ~1K samples with small models (7B), our data-efficient RL approach significantly outperforms SFT on distilled data.\n- The path to better reasoning capabilities may lie not in simply scaling up RL training data, but in being more selective about which samples to use.\n\n\nPerformance on challenging mathematical benchmarks (accuracy, %):\n\n| Method | #Questions | AIME2024 | MATH500 | AMC2023 | AVG. |\n|--------|------------|-----------|----------|-----------|-------|\n| Qwen-Math-7B | - | 16.7 | 52.4 | 52.5 | 40.5 |\n| Qwen-Math-7B-FULL | 8,523 | 32.5 | 76.6 | 61.9 | 57.0 |\n| Qwen-Math-7B-RAND | 1,389 | 25.8 | 66.0 | 56.3 | 49.4 |\n| Qwen-Math-7B-LINEAR | 1,138 | 28.3 | 74.6 | 61.9 | 54.9 |\n| LIMR | 1,389 | **32.5** | **78.0** | **63.8** | **58.1** |\n\nComparison with other popular RL recipes. We apply RL directly to the base model, without distilled long chain-of-thought data from larger or stronger models, and use only 1,389 questions.\n\n| Methods   | Init Model | Long CoT Distillation 
| #Questions |\n|-----------|------------|---------------|------------|\n| STILL-3   | Instruct   | Yes           | 29,925        |\n| DeepScaleR| Instruct   | Yes           | 40,314        |\n| Sky-T1    | Instruct   | Yes           | 45,000        |\n| THUDM-T1  | Instruct   | No            | 30,000        |\n| PRIME     | Instruct   | No            | 150,000       |\n| SimpleRL  | Base       | No            | 8,523         |\n| LIMR      | Base       | No            | 1,389         |\n\n\n## Quick Start\n\n### Data Selection with LIM (optional)\n\n```bash\n# Run RL training on the full MATH dataset (MATH-FULL, 8,523 questions)\nbash scripts/train_math.8k.sh\n```\n\n```bash\n#!/bin/bash\n# Select data using the LIM method\npython lim_selection.py \\\n    --train_samples_path ./data/output/math.8k \\\n    --original_prompts_path ./data/prompts/math8k.json \\\n    --output_path ./data/prompts/limr.json \\\n    --steps_per_epoch 8 \\\n    --max_epochs 21 \\\n    --similarity_threshold 0.6\n```\n\n### RL Training with LIMR dataset\n\n```bash\n# Run RL training with the LIMR dataset\nbash scripts/train_limr.sh\n```\n\n### Evaluation\n\n```bash\n# Evaluate checkpoints on the benchmarks\ncd eval\nbash eval.sh --run_path run/ckpts/path --step step_of_ckpts\n```\n\n## Acknowledgements\n\nOur work builds upon the insightful technical reports from the [DeepSeek R1](https://github.com/deepseek-ai/DeepSeek-R1) and [Kimi-k1.5](https://github.com/MoonshotAI/Kimi-k1.5) teams. 
We extend our appreciation to the [Qwen-Math](https://github.com/QwenLM/Qwen2.5-Math) team for their open-source model, and to the creators of [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF) and [vLLM](https://github.com/vllm-project/vllm) for providing the essential reinforcement learning framework and inference infrastructure, respectively, that enabled this research.\n\n## Citation\n\nIf you find this work useful, please cite our paper:\n\n```bibtex\n@misc{li2025limrrlscaling,\n      title={LIMR: Less is More for RL Scaling},\n      author={Xuefeng Li and Haoyang Zou and Pengfei Liu},\n      year={2025},\n      eprint={2502.11886},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG},\n      url={https://arxiv.org/abs/2502.11886}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FGAIR-NLP%2FLIMR","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FGAIR-NLP%2FLIMR","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FGAIR-NLP%2FLIMR/lists"}