{"id":28676589,"url":"https://github.com/zjunlp/trice","last_synced_at":"2025-06-13T23:05:18.451Z","repository":{"id":182851615,"uuid":"633896783","full_name":"zjunlp/TRICE","owner":"zjunlp","description":"[NAACL 2024] Making Language Models Better Tool Learners with Execution Feedback","archived":false,"fork":false,"pushed_at":"2024-03-14T06:49:38.000Z","size":16159,"stargazers_count":26,"open_issues_count":0,"forks_count":2,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-03-14T17:52:12.601Z","etag":null,"topics":["agent","execution","feedback","knowlm","large-language-models","natur","natural-language-processing","reasoning","reinforcement-learning","tool-learning","tools","trice"],"latest_commit_sha":null,"homepage":"https://zjunlp.github.io/project/TRICE/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zjunlp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-04-28T14:30:13.000Z","updated_at":"2024-03-14T05:29:14.000Z","dependencies_parsed_at":"2023-07-21T21:26:43.644Z","dependency_job_id":"9ec56d9b-9991-45d2-bc5f-b3dc8aa6b4b7","html_url":"https://github.com/zjunlp/TRICE","commit_stats":null,"previous_names":["zjunlp/trice"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zjunlp/TRICE","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FTRICE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FTRICE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FTRICE/releases","manifests_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FTRICE/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zjunlp","download_url":"https://codeload.github.com/zjunlp/TRICE/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FTRICE/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259732770,"owners_count":22903087,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","execution","feedback","knowlm","large-language-models","natur","natural-language-processing","reasoning","reinforcement-learning","tool-learning","tools","trice"],"created_at":"2025-06-13T23:05:14.743Z","updated_at":"2025-06-13T23:05:18.444Z","avatar_url":"https://github.com/zjunlp.png","language":"Python","readme":"# TRICE\n\nCode and datasets for the NAACL 2024 paper \"[Making Language Models Better Tool Learners with Execution Feedback](figs/paper.pdf)\".\n\n## 🔔 News\n\n- We have uploaded a [tutorial](./tutorial/tutorial.pdf) entitled \"From Chain-of-Thought to LLM Powered Autonomous Agent\".\n\n## Quick Links\n\n* [TRICE](#TRICE)\n  * [Overview](#overview)\n  * [Installation](#installation)\n  * [Tasks and Datasets](#tasks-and-datasets)\n    * [Data Generation](#data-generation)\n  * [Train](#train)\n    * [Training Stage1](#training-stage1)\n    * [Training Stage2](#training-stage2)\n  * [Evaluate](#evaluate)\n  * [Acknowledgement](#Acknowledgement)\n  * [Citation](#citation)\n\n## Overview\n\n\u003cimg src=\"figs/figure1.gif\" alt=\"figure1\" style=\"zoom: 20%;\" /\u003e\n\nIn 
this paper, we focus on addressing the challenge of selective utilization of tools by LLMs and propose a two-stage end-to-end training framework dubbed **TRICE** (**T**ool Lea**R**ning w**I**th Exe**C**ution F**E**edback) to make LLMs better tool learners with execution feedback. An overview of our proposed training method is shown below:\n\n\u003cimg src=\"figs/method.png\" alt=\"method\" style=\"zoom: 13%;\" /\u003e\n\nIn stage-1 (**Behavior Cloning**), we conduct instruct-tuning on the dataset to let the model imitate the tool-using behavior. In stage-2 (**RLEF**), we further reinforce the model with tool execution feedback by aligning it with desirable candidate responses.\n\n## Installation\n\n```bash\ngit clone https://github.com/zjunlp/TRICE.git\ncd TRICE\npip install -r requirements.txt\n```\n\n## Tasks and Datasets\n\nWe mainly evaluate our method on four tasks, each paired with a specific external tool.\n\n\u003cimg src=\"figs/task.png\" alt=\"task\" style=\"zoom:30%;\" /\u003e\n\nDue to limited computational resources, we randomly sample train and test sets from each dataset to reduce the data scale. **We release the mixed training data for Vicuna-7B and the test set for each task in [Google Drive](https://drive.google.com/drive/folders/1rqBrVcOl1ykFDd7g71xNwt9Q194L67DJ?usp=sharing).** We display the detailed data distribution for each task as follows:\n\n\u003cimg src=\"figs/task_dis.png\" alt=\"task_dis\" style=\"zoom: 15%;\" /\u003e\n\n### Data Generation\n\nGiven the lack of gold tool API labels, we utilize ChatGPT to automatically generate tool APIs for training stage-1. For training stage-2, we collect five responses for each question: one from each of four different models (ChatGPT, InstructGPT, Vicuna-7B, and Alpaca-7B), plus the output of the Behavior Cloning training data as the pseudo-human-expert response. 
For the detailed data generation process, please refer to [here](https://github.com/zjunlp/TRICE/tree/main/generate_data).\n\nThe generated data should be placed under the `data` folder with the following structure:\n\n```\ndata\n |-- raw  # original dataset\n |    |-- math\n |    |    |-- math.json  # all the data for this task\n |    |    |-- GSM8K_right.json  # questions with right answers\n |    |    |-- GSM8K_wrong.json  # questions with wrong answers\n |    |    |-- ...  # other datasets\n |    |-- ...  # other tasks\n |-- stage1  # training data for stage1\n |    |-- math\n |    |    |-- math.json  # all the training data for this task (can be directly used to train)\n |    |-- ...\n |    |-- mix  # mixed training data for all tasks\n |-- stage2  # training data for stage2\n |    |-- math\n |    |    |-- math.json  # all the training data for this task (can be directly used to train)\n |    |    |-- math_gold_response.json\n |    |    |-- math_chatgpt_response.json\n |    |    |-- math_davinci_response.json\n |    |    |-- math_vicuna_response.json\n |    |    |-- math_alpaca_response.json\n |    |-- ...\n |-- dev  # test dataset for each task\n |    |-- math\n |    |    |-- math.json\n |    |-- ...\n```\n\n## Train\n\n**We train all the models with [LoRA](https://arxiv.org/pdf/2106.09685.pdf).** Here we release the code to train [Vicuna](https://github.com/lm-sys/FastChat) and [Alpaca](https://github.com/tatsu-lab/stanford_alpaca) (thanks to the [alpaca-lora](https://github.com/tloen/alpaca-lora) and [RRHF](https://github.com/GanjinZero/RRHF) projects). 
For code to train [ChatGLM](https://github.com/THUDM/ChatGLM-6B), you can refer to [ChatGLM-Tuning](https://github.com/mymusise/ChatGLM-Tuning).\n\n### Training Stage1\n\nIn stage 1, we train the model in an instruct-tuning manner.\n\n```bash\ncd train\npython train_stage1.py \\\n    --base_model ../PLMs/vicuna-7b \\\n    --data_path ../data/stage1/math/math.json \\\n    --output_dir vicuna-lora/stage1/math \\\n    --batch_size 1024 \\\n    --micro_batch_size 128 \\\n    --num_epochs 5 \\\n    --learning_rate 1e-4 \\\n    --cutoff_len 512 \\\n    --prompt_template_name vicuna  # set to 'alpaca' for Alpaca-7B\n```\n\n### Training Stage2\n\nIn stage 2, we train the model with reinforcement learning based on execution feedback.\n\n```bash\ncd train\npython train_stage2.py \\\n    --model_name_or_path ../PLMs/vicuna-7b \\\n    --resume_from_checkpoint vicuna-lora/stage1/math \\\n    --data_path ../data/stage2/math/math.json \\\n    --fp16 True \\\n    --output_dir vicuna-lora/stage2/math \\\n    --num_train_epochs 2 \\\n    --per_device_train_batch_size 8 \\\n    --per_device_eval_batch_size 8 \\\n    --gradient_accumulation_steps 32 \\\n    --evaluation_strategy \"no\" \\\n    --save_strategy \"steps\" \\\n    --save_steps 10 \\\n    --save_total_limit 100 \\\n    --learning_rate 2e-5 \\\n    --lr_scheduler_type \"cosine\" \\\n    --logging_steps 1 \\\n    --model_max_length 512 \\\n    --rrhf_weight 1 \\\n    --remove_unused_columns False\n```\n\n🍓We provide the best LoRA checkpoint for Vicuna-7B at [Google Drive](https://drive.google.com/drive/folders/14-pl8Vkx2_ohn53fgnLFHCE7OpxK9YgE?usp=sharing).\n\n## Evaluate\n\nBefore evaluating the model on the test set, you should first generate the responses.\n\n```bash\ncd evaluate\npython generate.py \\\n    --base_model ../PLMs/vicuna-7b \\\n    --task math \\\n    --data_path ../data/dev/math/math.json \\\n    --lora_weights ../train/vicuna-lora/stage2/math \\\n    --output_path ../data/dev/math/math_response.json \\\n    
--prompt_template vicuna\n```\n\nThen evaluate the model performance based on the generated responses.\n\n```bash\npython evaluate.py \\\n    --task math \\\n    --model vicuna \\\n    --lora_weights ../train/vicuna-lora/stage2/math \\\n    --data_path ../data/dev/math/math_response.json \\\n    --target_path ../data/dev/math/math_result.json\n```\n\n## Acknowledgement\n\nOur stage-I training code for Vicuna and Alpaca references [alpaca-lora](https://github.com/tloen/alpaca-lora), and the ChatGLM code references [ChatGLM-Tuning](https://github.com/mymusise/ChatGLM-Tuning). The stage-II training code for all the models references [RRHF](https://github.com/GanjinZero/RRHF). Thanks for their great work!\n\n## Citation\n\nIf you use or extend our work, please cite the paper as follows:\n\n```bibtex\n@article{qiao2023trice,\n  author       = {Shuofei Qiao and Honghao Gui and Qianghuai Jia and Huajun Chen and Ningyu Zhang},\n  title        = {Making Language Models Better Tool Learners with Execution Feedback},\n  journal      = {CoRR},\n  year         = {2023},\n  eprinttype   = {arXiv},\n  eprint       = {2305.13068},\n}\n```\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjunlp%2Ftrice","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzjunlp%2Ftrice","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjunlp%2Ftrice/lists"}