{"id":28702274,"url":"https://github.com/modelscope/trinity-rft","last_synced_at":"2025-06-14T12:32:18.944Z","repository":{"id":289012122,"uuid":"963030058","full_name":"modelscope/Trinity-RFT","owner":"modelscope","description":"Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (LLM).","archived":false,"fork":false,"pushed_at":"2025-06-11T04:10:28.000Z","size":16998,"stargazers_count":118,"open_issues_count":6,"forks_count":15,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-06-11T05:21:31.219Z","etag":null,"topics":["agent","llm","rlhf"],"latest_commit_sha":null,"homepage":"https://modelscope.github.io/Trinity-RFT/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/modelscope.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-09T03:46:14.000Z","updated_at":"2025-06-11T04:06:38.000Z","dependencies_parsed_at":"2025-06-03T08:09:52.850Z","dependency_job_id":"9a8f9cc4-621c-42ef-9905-5268d397c3be","html_url":"https://github.com/modelscope/Trinity-RFT","commit_stats":null,"previous_names":["modelscope/trinity-rft"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/modelscope/Trinity-RFT","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2FTrinity-RFT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2FTrinity-RFT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/
modelscope%2FTrinity-RFT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2FTrinity-RFT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/modelscope","download_url":"https://codeload.github.com/modelscope/Trinity-RFT/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2FTrinity-RFT/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259816207,"owners_count":22915834,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","llm","rlhf"],"created_at":"2025-06-14T12:31:01.514Z","updated_at":"2025-06-14T12:32:18.931Z","avatar_url":"https://github.com/modelscope.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\n\u003c!-- ![trinity-rft](./docs/sphinx_doc/assets/trinity-title.png) --\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://img.alicdn.com/imgextra/i1/O1CN01lvLpfw25Pl4ohGZnU_!!6000000007519-2-tps-1628-490.png\" alt=\"Trinity-RFT\" style=\"height: 120px;\"\u003e\n\u003c/div\u003e\n\n\u0026nbsp;\n\n\u003cdiv 
align=\"center\"\u003e\n\n[![paper](http://img.shields.io/badge/cs.LG-2505.17826-B31B1B?logo=arxiv\u0026logoColor=red)](https://arxiv.org/abs/2505.17826)\n[![doc](https://img.shields.io/badge/Docs-blue?logo=markdown)](https://modelscope.github.io/Trinity-RFT/)\n[![pypi](https://img.shields.io/pypi/v/trinity-rft?logo=pypi\u0026color=026cad)](https://pypi.org/project/trinity-rft/0.1.0/)\n![license](https://img.shields.io/badge/license-Apache--2.0-000000.svg)\n\n\u003c/div\u003e\n\n\n**Trinity-RFT is a general-purpose, flexible, scalable and user-friendly framework designed for reinforcement fine-tuning (RFT) of large language models (LLM).**\n\n\nBuilt with a decoupled design, seamless integration for agent-environment interaction, and systematic data processing pipelines, Trinity-RFT can be easily adapted for diverse application scenarios, and serve as a unified platform for exploring advanced reinforcement learning (RL) paradigms.\n\n\n\n\n\n## Vision of this project\n\n\nCurrent RFT approaches, such as RLHF (Reinforcement Learning from Human Feedback) with proxy reward models or training long-CoT reasoning models with rule-based rewards, are limited in their ability to handle dynamic, real-world, and continuous learning.\n\nTrinity-RFT envisions a future where AI agents learn by interacting directly with environments, collecting delayed or complex reward signals, and continuously refining their behavior through RL.\n\n\nFor example, imagine an AI scientist that designs an experiment, executes it, waits for feedback (while working on other tasks concurrently), and iteratively updates itself based on true environmental rewards when the experiment is finally finished.\n\n\nTrinity-RFT offers a path into this future by providing various useful features.\n\n\n\n\n\n## Key features\n\n\n\n+ **Unified RFT modes \u0026 algorithm support.**\nTrinity-RFT unifies and generalizes existing RFT methodologies into a flexible and configurable framework, supporting 
synchronous/asynchronous, on-policy/off-policy, and online/offline training, as well as hybrid modes that combine them seamlessly into a single learning process.\n\n\n+ **Agent-environment interaction as a first-class citizen.**\nTrinity-RFT allows delayed rewards in multi-step/time-lagged feedback loops, handles long-tailed latencies and environment/agent failures gracefully, and supports distributed deployment where explorers and trainers can operate across separate devices and scale up independently.\n\n\n\n+ **Data processing pipelines optimized for RFT with diverse/messy data.**\nThese include converting raw datasets to task sets for RL, cleaning/filtering/prioritizing experiences stored in the replay buffer, synthesizing data for tasks and experiences, offering user interfaces for human in the loop, etc.\n\n\n\n## The design of Trinity-RFT\n\n\n\u003c!-- ![design](./docs/sphinx_doc/assets/trinity-design.png) --\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://img.alicdn.com/imgextra/i2/O1CN01X5jFm81peNsADtRt2_!!6000000005385-2-tps-3298-1498.png\" alt=\"Trinity-RFT\"\u003e\n\u003c/div\u003e\n\n\n\n\n\nThe overall design of Trinity-RFT exhibits a trinity:\n+ RFT-core;\n+ agent-environment interaction;\n+ data processing pipelines;\n\nand the design of RFT-core also exhibits a trinity:\n+ explorer;\n+ trainer;\n+ buffer.\n\n\n\nThe *explorer*, powered by the rollout model, interacts with the environment and generates rollout trajectories to be stored in the experience buffer.\n\nThe *trainer*, powered by the policy model, samples batches of experiences from the buffer and updates the policy model via RL algorithms.\n\nThese two can be completely decoupled and act asynchronously on separate machines, except that they share the same experience buffer, and their model weights are synchronized once in a while.\nSuch a decoupled design is crucial for making the aforementioned features of Trinity-RFT possible.\n\n\u003c!-- e.g., flexible and 
configurable RFT modes (on-policy/off-policy, synchronous/asynchronous, immediate/lagged rewards),\nfault tolerance for failures of explorer (agent/environment) or trainer,\nhigh efficiency in the presence of long-tailed rollout latencies,\ndata processing pipelines and human in the loop of RFT (e.g., via acting on the experience buffer, which is implemented as a persistent database),\namong others. --\u003e\n\n\n\nMeanwhile, Trinity-RFT has done a lot of work to ensure high efficiency and robustness in every component of the framework,\ne.g., utilizing NCCL (when feasible) for model weight synchronization, sequence concatenation with proper masking for multi-turn conversations and ReAct-style workflows, pipeline parallelism for the synchronous RFT mode,\nasynchronous and concurrent LLM inference for rollout,\nfault tolerance for agent/environment failures,\namong many others.\n\n\n\n## Getting started\n\n\n\u003e [!NOTE]\n\u003e This project is currently under active development. Comments and suggestions are welcome!\n\n\n\n\n### Step 1: preparations\n\n\n\n\nInstallation from source (recommended):\n\n```shell\n# Pull the source code from GitHub\ngit clone https://github.com/modelscope/Trinity-RFT\ncd Trinity-RFT\n\n# Create a new environment using Conda or venv\n# Option 1: Conda\nconda create -n trinity python=3.10\nconda activate trinity\n\n# Option 2: venv\npython3.10 -m venv .venv\nsource .venv/bin/activate\n\n# Install the package in editable mode\n# for bash\npip install -e .[dev]\n# for zsh\npip install -e .\\[dev\\]\n\n# Install flash-attn after all dependencies are installed\n# Note: flash-attn will take a long time to compile, please be patient.\npip install flash-attn -v\n# Try the following command if you encounter errors during installation\n# pip install flash-attn -v --no-build-isolation\n```\n\nInstallation using pip:\n\n```shell\npip install trinity-rft==0.1.0\n```\n\nInstallation from docker:\nwe have provided a dockerfile for Trinity-RFT 
(trinity)\n\n```shell\ngit clone https://github.com/modelscope/Trinity-RFT\ncd Trinity-RFT\n\n# build the docker image\n# Note: you can edit the dockerfile to customize the environment\n# e.g., use pip mirrors or set api key\ndocker build -f scripts/docker/Dockerfile -t trinity-rft:latest .\n\n# run the docker image\ndocker run -it --gpus all --shm-size=\"64g\" --rm -v $PWD:/workspace -v \u003croot_path_of_data_and_checkpoints\u003e:/data trinity-rft:latest\n```\n\n\nTrinity-RFT requires\nPython version \u003e= 3.10,\nCUDA version \u003e= 12.4,\nand at least 2 GPUs.\n\n\n### Step 2: prepare dataset and model\n\n\nTrinity-RFT supports most datasets and models from Huggingface and ModelScope.\n\n\n**Prepare the model** in the local directory `$MODEL_PATH/{model_name}`:\n\n```bash\n# Using Huggingface\nhuggingface-cli download {model_name} --local-dir $MODEL_PATH/{model_name}\n\n# Using Modelscope\nmodelscope download {model_name} --local_dir $MODEL_PATH/{model_name}\n```\n\nFor more details about model downloading, please refer to [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli) or  [ModelScope](https://modelscope.cn/docs/models/download).\n\n\n\n**Prepare the dataset** in the local directory `$DATASET_PATH/{dataset_name}`:\n\n```bash\n# Using Huggingface\nhuggingface-cli download {dataset_name} --repo-type dataset --local-dir $DATASET_PATH/{dataset_name}\n\n# Using Modelscope\nmodelscope download --dataset {dataset_name} --local_dir $DATASET_PATH/{dataset_name}\n```\n\nFor more details about dataset downloading, please refer to [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli#download-a-dataset-or-a-space) or [ModelScope](https://modelscope.cn/docs/datasets/download).\n\n\n\n### Step 3: configurations\n\n\nFor convenience, Trinity-RFT provides a web interface for configuring your RFT process.\n\n\u003e [!NOTE]\n\u003e This is an experimental feature, and we will continue to improve it.\n\n\nTo enable *minimal* 
features (mainly for trainer), you can run\n```bash\ntrinity studio --port 8080\n```\nThen you can configure your RFT process on the web page and generate a config file. You can save the config for later use or run it directly as described in the following section.\n\nAdvanced users can also configure the RFT process by editing the config file directly.\nWe provide a set of example config files in [`examples`](examples/).\n\nTo enable *complete* visualization features, please refer to the monorepo for [Trinity-Studio](https://github.com/modelscope/Trinity-Studio).\n\n\n### Step 4: run the RFT process\n\n\nFirst, start a Ray cluster with the following command:\n\n```shell\n# On master node\nray start --head\n\n# On worker nodes\nray start --address=\u003cmaster_address\u003e\n```\n\nOptionally, you can log in to [wandb](https://docs.wandb.ai/quickstart/) to better monitor the RFT process:\n\n```shell\nexport WANDB_API_KEY=\u003cyour_api_key\u003e\nwandb login\n```\n\nThen, for command-line users, run the RFT process with the following command:\n\n```shell\ntrinity run --config \u003cconfig_path\u003e\n```\n\n\u003e For example, below is the command for fine-tuning Qwen-2.5-1.5B-Instruct on the GSM8k dataset using the GRPO algorithm:\n\u003e ```shell\n\u003e trinity run --config examples/grpo_gsm8k/gsm8k.yaml\n\u003e ```\n\nFor studio users, just click the \"Run\" button on the web page.\n\n\nFor more detailed examples of how to use Trinity-RFT, please refer to the following tutorials:\n+ [A quick example with GSM8k](./docs/sphinx_doc/source/tutorial/example_reasoning_basic.md)\n+ [Off-policy mode of RFT](./docs/sphinx_doc/source/tutorial/example_reasoning_advanced.md)\n+ [Asynchronous mode of RFT](./docs/sphinx_doc/source/tutorial/example_async_mode.md)\n+ [Multi-turn tasks](./docs/sphinx_doc/source/tutorial/example_multi_turn.md)\n+ [Offline learning by DPO](./docs/sphinx_doc/source/tutorial/example_dpo.md)\n+ [Advanced data processing / 
human-in-the-loop](./docs/sphinx_doc/source/tutorial/example_data_functionalities.md)\n\n\n\n\n\n## Advanced usage and full configurations\n\n\nPlease refer to [this document](./docs/sphinx_doc/source/tutorial/trinity_configs.md).\n\n\n\n\n\n## Programming guide for developers\n\n\nPlease refer to [this document](./docs/sphinx_doc/source/tutorial/trinity_programming_guide.md).\n\n\n## Upcoming features\n\nA tentative roadmap: https://github.com/modelscope/Trinity-RFT/issues/51\n\n\n\n## Contribution guide\n\n\nThis project is currently under active development, and we welcome contributions from the community!\n\n\nCode style check:\n\n```shell\npre-commit run --all-files\n```\n\n\n\nUnit tests:\n\n```shell\npython -m pytest tests\n```\n\n\n\n## Acknowledgements\n\n\nThis project is built upon many excellent open-source projects, including:\n\n+ [verl](https://github.com/volcengine/verl) and [PyTorch's FSDP](https://pytorch.org/docs/stable/fsdp.html) for LLM training;\n+ [vLLM](https://github.com/vllm-project/vllm) for LLM inference;\n+ [Data-Juicer](https://github.com/modelscope/data-juicer?tab=readme-ov-file) for data processing pipelines;\n+ [AgentScope](https://github.com/modelscope/agentscope) for agentic workflows;\n+ [Ray](https://github.com/ray-project/ray) for distributed systems;\n+ we have also drawn inspiration from RL frameworks like [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl) and [ChatLearn](https://github.com/alibaba/ChatLearn);\n+ ......\n\n\n\n\n\n## Citation\n```plain\n@misc{trinity-rft,\n      title={Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models},\n      author={Xuchen Pan and Yanxi Chen and Yushuo Chen and Yuchang Sun and Daoyuan Chen and Wenhao Zhang and Yuexiang Xie and Yilun Huang and Yilei Zhang and Dawei Gao and Yaliang Li and Bolin Ding and Jingren Zhou},\n      year={2025},\n      eprint={2505.17826},\n      
archivePrefix={arXiv},\n      primaryClass={cs.LG},\n      url={https://arxiv.org/abs/2505.17826},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmodelscope%2Ftrinity-rft","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmodelscope%2Ftrinity-rft","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmodelscope%2Ftrinity-rft/lists"}