{"id":13408879,"url":"https://github.com/huggingface/trl","last_synced_at":"2026-01-24T05:20:38.670Z","repository":{"id":37749948,"uuid":"250510075","full_name":"huggingface/trl","owner":"huggingface","description":"Train transformer language models with reinforcement learning.","archived":false,"fork":false,"pushed_at":"2026-01-19T22:26:18.000Z","size":17890,"stargazers_count":17046,"open_issues_count":633,"forks_count":2430,"subscribers_count":97,"default_branch":"main","last_synced_at":"2026-01-20T01:33:55.329Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://hf.co/docs/trl","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/huggingface.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2020-03-27T10:54:55.000Z","updated_at":"2026-01-19T22:31:45.000Z","dependencies_parsed_at":"2023-12-15T11:31:41.865Z","dependency_job_id":"44da0cc6-a466-4230-9ebf-f2b1b0929b4d","html_url":"https://github.com/huggingface/trl","commit_stats":null,"previous_names":["huggingface/trl","lvwerra/trl"],"tags_count":73,"template":false,"template_full_name":"fastai/nbdev_template","purl":"pkg:github/huggingface/trl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Ftrl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Ftrl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hu
ggingface%2Ftrl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Ftrl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/huggingface","download_url":"https://codeload.github.com/huggingface/trl/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Ftrl/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28712936,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-24T05:01:10.984Z","status":"ssl_error","status_checked_at":"2026-01-24T04:59:18.328Z","response_time":89,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T20:00:56.075Z","updated_at":"2026-01-24T05:20:38.665Z","avatar_url":"https://github.com/huggingface.png","language":"Python","readme":"# TRL - Transformer Reinforcement Learning\n\n\u003cdiv style=\"text-align: center\"\u003e\n    \u003cimg src=\"https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/trl_banner_dark.png\" alt=\"TRL Banner\"\u003e\n\u003c/div\u003e\n\n\u003chr\u003e \u003cbr\u003e\n\n\u003ch3 align=\"center\"\u003e\n    \u003cp\u003eA comprehensive library to post-train foundation models\u003c/p\u003e\n\u003c/h3\u003e\n\n\u003cp align=\"center\"\u003e\n    \u003ca 
href=\"https://github.com/huggingface/trl/blob/main/LICENSE\"\u003e\u003cimg alt=\"License\" src=\"https://img.shields.io/github/license/huggingface/trl.svg?color=blue\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://huggingface.co/docs/trl/index\"\u003e\u003cimg alt=\"Documentation\" src=\"https://img.shields.io/website?label=documentation\u0026url=https%3A%2F%2Fhuggingface.co%2Fdocs%2Ftrl%2Findex\u0026down_color=red\u0026down_message=offline\u0026up_color=blue\u0026up_message=online\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/huggingface/trl/releases\"\u003e\u003cimg alt=\"GitHub release\" src=\"https://img.shields.io/github/release/huggingface/trl.svg\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://huggingface.co/trl-lib\"\u003e\u003cimg alt=\"Hugging Face Hub\" src=\"https://img.shields.io/badge/🤗%20Hub-trl--lib-yellow\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n## 🎉 What's New\n\n**OpenEnv Integration:** TRL now supports **[OpenEnv](https://huggingface.co/blog/openenv)**, the open-source framework from Meta for defining, deploying, and interacting with environments in reinforcement learning and agentic workflows.\n\nExplore how to seamlessly integrate TRL with OpenEnv in our [dedicated documentation](https://huggingface.co/docs/trl/openenv).\n\n## Overview\n\nTRL is a cutting-edge library designed for post-training foundation models using advanced techniques like Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), and Direct Preference Optimization (DPO). 
Built on top of the [🤗 Transformers](https://github.com/huggingface/transformers) ecosystem, TRL supports a variety of model architectures and modalities, and can be scaled up across various hardware setups.\n\n## Highlights\n\n- **Trainers**: Various fine-tuning methods are easily accessible via trainers like [`SFTTrainer`](https://huggingface.co/docs/trl/sft_trainer), [`GRPOTrainer`](https://huggingface.co/docs/trl/grpo_trainer), [`DPOTrainer`](https://huggingface.co/docs/trl/dpo_trainer), [`RewardTrainer`](https://huggingface.co/docs/trl/reward_trainer) and more.\n\n- **Efficient and scalable**:\n  - Leverages [🤗 Accelerate](https://github.com/huggingface/accelerate) to scale from a single GPU to multi-node clusters using methods like [DDP](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html) and [DeepSpeed](https://github.com/deepspeedai/DeepSpeed).\n  - Full integration with [🤗 PEFT](https://github.com/huggingface/peft) enables training on large models with modest hardware via quantization and LoRA/QLoRA.\n  - Integrates [🦥 Unsloth](https://github.com/unslothai/unsloth) for accelerating training using optimized kernels.\n\n- **Command Line Interface (CLI)**: A simple interface lets you fine-tune models without needing to write code.\n\n## Installation\n\n### Python Package\n\nInstall the library using `pip`:\n\n```bash\npip install trl\n```\n\n### From source\n\nIf you want to use the latest features before an official release, you can install TRL from source:\n\n```bash\npip install git+https://github.com/huggingface/trl.git\n```\n\n### Repository\n\nIf you want to use the examples, you can clone the repository with the following command:\n\n```bash\ngit clone https://github.com/huggingface/trl.git\n```\n\n## Quick Start\n\nFor more flexibility and control over training, TRL provides dedicated trainer classes to post-train language models or PEFT adapters on a custom dataset. 
Each trainer in TRL is a light wrapper around the 🤗 Transformers trainer and natively supports distributed training methods like DDP, DeepSpeed ZeRO, and FSDP.\n\n### `SFTTrainer`\n\nHere is a basic example of how to use the [`SFTTrainer`](https://huggingface.co/docs/trl/sft_trainer):\n\n```python\nfrom trl import SFTTrainer\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"trl-lib/Capybara\", split=\"train\")\n\ntrainer = SFTTrainer(\n    model=\"Qwen/Qwen2.5-0.5B\",\n    train_dataset=dataset,\n)\ntrainer.train()\n```\n\n### `GRPOTrainer`\n\n[`GRPOTrainer`](https://huggingface.co/docs/trl/grpo_trainer) implements the [Group Relative Policy Optimization (GRPO) algorithm](https://huggingface.co/papers/2402.03300) that is more memory-efficient than PPO and was used to train [Deepseek AI's R1](https://huggingface.co/deepseek-ai/DeepSeek-R1).\n\n```python\nfrom datasets import load_dataset\nfrom trl import GRPOTrainer\nfrom trl.rewards import accuracy_reward\n\ndataset = load_dataset(\"trl-lib/DeepMath-103K\", split=\"train\")\n\ntrainer = GRPOTrainer(\n    model=\"Qwen/Qwen2.5-0.5B-Instruct\",\n    reward_funcs=accuracy_reward,\n    train_dataset=dataset,\n)\ntrainer.train()\n```\n\n\u003e [!NOTE]\n\u003e For reasoning models, use the `reasoning_accuracy_reward()` function for better results.\n\n### `DPOTrainer`\n\n[`DPOTrainer`](https://huggingface.co/docs/trl/dpo_trainer) implements the popular [Direct Preference Optimization (DPO) algorithm](https://huggingface.co/papers/2305.18290) that was used to post-train [Llama 3](https://huggingface.co/papers/2407.21783) and many other models. 
Here is a basic example of how to use the `DPOTrainer`:\n\n```python\nfrom datasets import load_dataset\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\nfrom trl import DPOConfig, DPOTrainer\n\nmodel = AutoModelForCausalLM.from_pretrained(\"Qwen/Qwen2.5-0.5B-Instruct\")\ntokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen2.5-0.5B-Instruct\")\ndataset = load_dataset(\"trl-lib/ultrafeedback_binarized\", split=\"train\")\ntraining_args = DPOConfig(output_dir=\"Qwen2.5-0.5B-DPO\")\ntrainer = DPOTrainer(\n    model=model,\n    args=training_args,\n    train_dataset=dataset,\n    processing_class=tokenizer\n)\ntrainer.train()\n```\n\n### `RewardTrainer`\n\nHere is a basic example of how to use the [`RewardTrainer`](https://huggingface.co/docs/trl/reward_trainer):\n\n```python\nfrom trl import RewardTrainer\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"trl-lib/ultrafeedback_binarized\", split=\"train\")\n\ntrainer = RewardTrainer(\n    model=\"Qwen/Qwen2.5-0.5B-Instruct\",\n    train_dataset=dataset,\n)\ntrainer.train()\n```\n\n## Command Line Interface (CLI)\n\nYou can use the TRL Command Line Interface (CLI) to quickly get started with post-training methods like Supervised Fine-Tuning (SFT) or Direct Preference Optimization (DPO):\n\n**SFT:**\n\n```bash\ntrl sft --model_name_or_path Qwen/Qwen2.5-0.5B \\\n    --dataset_name trl-lib/Capybara \\\n    --output_dir Qwen2.5-0.5B-SFT\n```\n\n**DPO:**\n\n```bash\ntrl dpo --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \\\n    --dataset_name argilla/Capybara-Preferences \\\n    --output_dir Qwen2.5-0.5B-DPO\n```\n\nRead more about the CLI in the [relevant documentation section](https://huggingface.co/docs/trl/clis) or use `--help` for more details.\n\n## Development\n\nIf you want to contribute to `trl` or customize it to your needs, read the [contribution guide](https://github.com/huggingface/trl/blob/main/CONTRIBUTING.md) and make a dev install:\n\n```bash\ngit clone 
https://github.com/huggingface/trl.git\ncd trl/\npip install -e .[dev]\n```\n\n## Experimental\n\nA minimal incubation area is available under `trl.experimental` for unstable / fast-evolving features. Anything there may change or be removed in any release without notice.\n\nExample:\n\n```python\nfrom trl.experimental.new_trainer import NewTrainer\n```\n\nRead more in the [Experimental docs](https://huggingface.co/docs/trl/experimental_overview).\n\n## Citation\n\n```bibtex\n@software{vonwerra2020trl,\n  title   = {{TRL: Transformers Reinforcement Learning}},\n  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},\n  license = {Apache-2.0},\n  url     = {https://github.com/huggingface/trl},\n  year    = {2020}\n}\n```\n\n## License\n\nThis repository's source code is available under the [Apache-2.0 License](LICENSE).\n","funding_links":[],"categories":["Models and Tools","Python","Tools","Training","Fine Tuning ([home](#awesome-llm))","By Language","A01_文本生成_文本对话","NLP","Fine-Tuning \u0026 Training","🚀 RL \u0026 LLM Fine-Tuning Repositories","4. 
算法","Industry Strength Reinforcement Learning","Sec5.2 Frameworks","Repos","Model Training \u0026 Fine-tuning","🛠️ AI 工具与框架","Fine-tuning \u0026 Quantization (18)","Graph Machine Learning","📋 List of Open-Source Projects","LLM Training / Finetuning","Training \u0026 Fine-Tuning","Language Models","LLM Alignment (RLHF / DPO)"],"sub_categories":["LLM Finetuning","Foundation Model Fine Tuning","Python","大语言对话模型及数据","Fine-Tuning Frameworks","4.2 Reinforcement Learning","LangManus","模型微调","Others","LLM Infra and Optimization","Fine-Tuning Tools","Fine-tuning"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhuggingface%2Ftrl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhuggingface%2Ftrl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhuggingface%2Ftrl/lists"}