{"id":15028334,"url":"https://github.com/thudm/agenttuning","last_synced_at":"2025-05-16T01:05:12.718Z","repository":{"id":202564053,"uuid":"706842035","full_name":"THUDM/AgentTuning","owner":"THUDM","description":"AgentTuning: Enabling Generalized Agent Abilities for LLMs","archived":false,"fork":false,"pushed_at":"2023-10-31T15:34:43.000Z","size":29350,"stargazers_count":1435,"open_issues_count":16,"forks_count":101,"subscribers_count":16,"default_branch":"main","last_synced_at":"2025-05-16T01:05:07.398Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://thudm.github.io/AgentTuning/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/THUDM.png","metadata":{"files":{"readme":"README-zh.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-10-18T18:10:52.000Z","updated_at":"2025-05-15T04:03:23.000Z","dependencies_parsed_at":"2023-10-23T06:25:20.787Z","dependency_job_id":"e4041be4-e26f-4714-bed3-18b5e3a844fe","html_url":"https://github.com/THUDM/AgentTuning","commit_stats":null,"previous_names":["thudm/agenttuning"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FAgentTuning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FAgentTuning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FAgentTuning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FAgentTuning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/THUDM","download_url":"https://codeload.github.com/THU
DM/AgentTuning/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254448579,"owners_count":22072764,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-09-24T20:08:04.442Z","updated_at":"2025-05-16T01:05:07.608Z","avatar_url":"https://github.com/THUDM.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AgentTuning: Enabling Generalized Agent Abilities For LLMs\n\n\u003cp align=\"center\"\u003e\n🤗 \u003ca href=\"https://huggingface.co/THUDM/agentlm-70b\" target=\"_blank\"\u003eModel (AgentLM-70B)\u003c/a\u003e • 🤗 \u003ca href=\"https://huggingface.co/datasets/THUDM/AgentInstruct\" target=\"_blank\"\u003eDataset (AgentInstruct)\u003c/a\u003e • 📃 \u003ca href=\"https://arxiv.org/abs/2310.12823\" target=\"_blank\"\u003ePaper\u003c/a\u003e • 🌐 \u003ca href=\"https://thudm.github.io/AgentTuning/\" target=\"_blank\"\u003eProject Page\u003c/a\u003e \u003cbr\u003e\n\u003c/p\u003e\n\u003ccenter\u003e\u003cimg src=\"assets/main-figure.svg\" alt=\"main-figure\" style=\"zoom:50%;\" /\u003e\u003c/center\u003e\n\n**AgentTuning** is the first attempt to instruction-tune LLMs using interaction trajectories from multiple agent tasks. Evaluation results show that **AgentTuning** gives LLMs strong generalization on **unseen** agent tasks while keeping their **general language abilities** largely intact.\n\nBoth the **AgentInstruct** dataset and the **AgentLM** models are open-sourced.\n\n## Main Results\n\n\u003ccenter\u003e\u003cimg src=\"assets/head-figure.svg\" alt=\"head-figure\" width=\"1500\" /\u003e\u003c/center\u003e\n\n\u003ccenter\u003e\u003cb\u003eFigure 1\u003c/b\u003e\u0026nbsp;\u0026nbsp; Overall scores on held-in and held-out tasks\u003c/center\u003e\n\n## AgentInstruct\n\n**AgentInstruct** 
is a curated agent dataset containing **1866** high-quality interactions across **6** diverse real-world tasks, designed to enhance the agent abilities of language models. It has the following features:\n\n- 🔍 **Chain of Thought** - Uses the [ReAct](http://arxiv.org/abs/2210.03629) prompting strategy to provide a detailed chain of thought for each action, giving insight into the model's decision process.\n\n- 🌍 **Diversity** - Covers 6 real-world scenarios, from daily household chores to database operations, with average turn counts ranging from 5 to 35.\n\n- 🎯 **Precision** - Even GPT-4 cannot solve every agent task correctly, so the data is strictly filtered with a trajectory reward mechanism to ensure the quality of every sample.\n\n- ✅ **Generalization** - Rigorously checked to avoid data leakage and ensure that the data generalizes.\n\nThe **AgentInstruct** dataset is open-sourced at the [🤗Huggingface Repo](https://huggingface.co/datasets/THUDM/AgentInstruct).\n\n## AgentLM\n\nThe **AgentLM** models are obtained by fine-tuning the open-source Llama2-chat model series on a mixture of the **AgentInstruct** and **ShareGPT** datasets. The models follow the conversation format of [Llama-2-chat](https://huggingface.co/blog/llama2#how-to-prompt-llama-2), with the system prompt fixed to `You are a helpful, respectful and honest assistant.`\n\nThe 7B, 13B, and 70B models are available at the following links:\n\n|    Model    |                        Huggingface Repo                        |\n| :---------: | :------------------------------------------------------------: |\n| AgentLM-7B  | [🤗Huggingface Repo](https://huggingface.co/THUDM/agentlm-7b)  |\n| AgentLM-13B | [🤗Huggingface Repo](https://huggingface.co/THUDM/agentlm-13b) |\n| AgentLM-70B | [🤗Huggingface Repo](https://huggingface.co/THUDM/agentlm-70b) |\n\n## Running AgentLM\n\nWe use [Text-Generation-Inference](https://github.com/huggingface/text-generation-inference) to speed up evaluation. To start an AgentLM-70b instance:\n\n```bash\ncd docker\ndocker compose -f agentlm-70b.yml up\n```\n\nAfter a successful deployment, the service listens on port `30070`, and you can send requests to it:\n\n```bash\ncurl 127.0.0.1:30070/generate \\\n    -X POST \\\n    -H 'Content-Type: application/json' \\\n    -d '{\"inputs\": \"[INST] \u003c\u003cSYS\u003e\u003e\\nYou are a helpful, respectful and honest assistant.\\n\u003c\u003c/SYS\u003e\u003e\\n\\nHello! [/INST]\", \"parameters\":{\"temperature\": 1.0}}'\n\n# {\"generated_text\":\"Hello! How can I help you today? 
\"}\n```\n\nYou can append more ports to the docker compose file to launch multiple inference instances.\n\n## Evaluation\n\nModel evaluation covers 6 held-in tasks, 6 held-out tasks, and general tasks.\n\n### Held-in Tasks\n\nThe 6 held-in tasks are taken from [**AgentBench**](https://github.com/THUDM/AgentBench). However, since AgentBench is still under active development, its latest version may not fully reproduce the results reported in the paper.\n\nThe evaluation code used in this project is located in the `./AgentBench.old` folder.\n\n### Held-out Tasks\n\nThe held-out tasks are taken from the following open-source frameworks:\n\n| Task              | AgentTuning Evaluation Script                               | Original Repository                                          |\n| ----------------- | ----------------------------------------------------------- | ------------------------------------------------------------ |\n| SciWorld          | [📂 eval_heldout/science-world](eval_heldout/science-world/) | [💻 allenai/ScienceWorld](https://github.com/allenai/ScienceWorld) |\n| MiniWoB++         | [📂 eval_heldout/miniwob++](eval_heldout/miniwob++)          | [💻 Farama-Foundation/miniwob-plusplus](https://github.com/Farama-Foundation/miniwob-plusplus) |\n| HotpotQA          | [📂 eval_heldout/hotpotQA](eval_heldout/hotpotQA/)           | [💻 salesforce/BOLAA](https://github.com/salesforce/BOLAA)    |\n| ReWOO             | [📂 eval_heldout/rewoo](eval_heldout/rewoo/)                 | [💻 billxbf/ReWOO](https://github.com/billxbf/ReWOO)          |\n| WebArena          | [📂 eval_heldout/webarena](eval_heldout/webarena/)           | [💻 web-arena-x/webarena](https://github.com/web-arena-x/webarena) |\n| Digital Card Game | [💻 AgentBench.old](./AgentBench.old) ( _Extend_ Split )     | [💻 THUDM/AgentBench](https://github.com/THUDM/AgentBench)    |\n\n### General Tasks\n\n**MMLU Setup**\n\n- Download the 14k multiple-choice questions to the `./data` folder:\n  ```bash\n  cd data\n  wget https://people.eecs.berkeley.edu/~hendrycks/data.tar\n  tar xf data.tar\n  cd ..\n  ```\n- Run the following command to evaluate the MMLU score of a Hugging Face model:\n  ```bash\n  python eval_general/evaluate_mmlu_hf.py -c THUDM/AgentLM-70b\n  ```\n\n**GSM8k Setup**\n\n- Deploy TGI\n- Run the following command to evaluate GSM8k\n\n  ```bash\n  python eval_general/evaluate_gsm8k_tgi.py --port 30070\n  ```\n\n  Use `--sample-input-file` to load local data; otherwise the script downloads 
[GSM8K](https://huggingface.co/datasets/gsm8k) to the local machine.\n\n**MT-Bench Setup**\n\n- Install [FastChat](https://github.com/lm-sys/FastChat) locally\n\n  ```bash\n  git clone https://github.com/lm-sys/FastChat.git\n  pip install -e FastChat\n  ```\n\n- Deploy TGI\n\n- Run the evaluation script\n\n  ```bash\n  python eval_general/eval_mt_bench_tgi.py --host 127.0.0.1 --port 30070 --model-id agentlm-70b\n  ```\n\n- Judge the answers with GPT-4\n  ```bash\n  cd FastChat/fastchat/llm_judge\n  OPENAI_API_KEY=\u003cyour-api-key\u003e python gen_judgment.py --model-list agentlm-70b --parallel \u003cnumber-of-concurrent-requests\u003e\n  ```\n\n## Citation\n\nIf you find our work useful, please consider citing the following paper:\n\n```\n@misc{zeng2023agenttuning,\n      title={AgentTuning: Enabling Generalized Agent Abilities for LLMs},\n      author={Aohan Zeng and Mingdao Liu and Rui Lu and Bowen Wang and Xiao Liu and Yuxiao Dong and Jie Tang},\n      year={2023},\n      eprint={2310.12823},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthudm%2Fagenttuning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthudm%2Fagenttuning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthudm%2Fagenttuning/lists"}