{"id":25177200,"url":"https://github.com/Reason-Wang/ToolGen","last_synced_at":"2025-10-24T11:31:16.045Z","repository":{"id":258688224,"uuid":"867649542","full_name":"Reason-Wang/ToolGen","owner":"Reason-Wang","description":"The official implementation of paper \"ToolGen: Unified Tool Retrieval and Calling via Generation\"","archived":false,"fork":false,"pushed_at":"2024-12-14T17:33:10.000Z","size":273,"stargazers_count":108,"open_issues_count":2,"forks_count":10,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-12-14T18:32:41.180Z","etag":null,"topics":["agent","llm","nlp","retrieval","tool","tool-learning","tool-retrieval","toolgen"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2410.03439","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Reason-Wang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-04T13:10:23.000Z","updated_at":"2024-12-14T17:33:14.000Z","dependencies_parsed_at":"2024-10-20T18:16:15.192Z","dependency_job_id":null,"html_url":"https://github.com/Reason-Wang/ToolGen","commit_stats":null,"previous_names":["reason-wang/toolgen"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Reason-Wang%2FToolGen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Reason-Wang%2FToolGen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Reason-Wang%2FToolGen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Reason-Wang%2FToolGen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Reason-Wang","download_url":"https://codeload.github.com/Reason-Wang/ToolGen/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":237957705,"owners_count":19393726,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","llm","nlp","retrieval","tool","tool-learning","tool-retrieval","toolgen"],"created_at":"2025-02-09T14:00:55.661Z","updated_at":"2025-10-24T11:31:16.040Z","avatar_url":"https://github.com/Reason-Wang.png","language":"Python","funding_links":[],"categories":["Papers"],"sub_categories":["Tools"],"readme":"![banner.png](assets/banner.png)\n# ToolGen: Unified Tool Retrieval and Calling via Generation\n\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://huggingface.co/collections/reasonwang/toolgen-668a46a4959745ec8e9891f6\"\u003e🤗ToolGen Model \u003c/a\u003e\n    • \n\t\u003ca href=\"https://arxiv.org/pdf/2410.03439\"\u003e📄Paper (arxiv)\u003c/a\u003e\n\t\u003c!-- •  --\u003e\n    \u003c!-- \u003ca 
href=\"https://huggingface.co/datasets/reasonwang/ToolGen-Datasets\"\u003e🤗ToolGen Datasets\u003c/a\u003e --\u003e\n\u003c/p\u003e\n\n`25/02/2025`: Updated the training scripts. Refer to [training/README.md](training/README.md) for more details.\n\n`14/12/2024`: Updated [Qwen2.5-based ToolGen](https://huggingface.co/collections/reasonwang/toolgen-668a46a4959745ec8e9891f6)\n\nToolGen is a framework that integrates tool knowledge directly into LLMs by representing tools as unique tokens, enabling seamless tool invocation and language generation.🔧🦙 With 47,000 tool tokens, ToolGen shows superior performance in both tool retrieval and task completion.\n\n\n## Run ToolGen\n\nThe following code snippet shows how to run ToolGen locally. First, get your ToolBench key from [ToolBench](https://github.com/OpenBMB/ToolBench) repo. Then deploy [StableToolBench](https://github.com/THUNLP-MT/StableToolBench) following the instructions in their repo.\n\n```python\nimport json\nfrom OpenAgent import ToolGen\nfrom OpenAgent import RapidAPIWrapper\n\n# Initialize rapid api tools\nwith open(\"keys.json\", 'r') as f:\n    keys = json.load(f)\ntoolbench_key = keys['TOOLBENCH_KEY']\nrapidapi_wrapper = RapidAPIWrapper(\n    toolbench_key=toolbench_key,\n    rapidapi_key=\"\",\n)\n\ntoolgen = ToolGen(\n    \"reasonwang/ToolGen-Llama-3-8B\", # reasonwang/ToolGen-Qwen2.5-3B\n    template=\"llama-3\", # qwen-7b-chat\n    indexing=\"Atomic\",\n    tools=rapidapi_wrapper,\n)\n\nmessages = [\n    {\"role\": \"system\", \"content\": \"\"},\n    {\"role\": \"user\", \"content\": \"I'm a football fan and I'm curious about the different team names used in different leagues and countries. Can you provide me with an extensive list of football team names and their short names? It would be great if I could access more than 7000 team names. Additionally, I would like to see the first 25 team names and their short names using the basic plan.\"}\n]\n\ntoolgen.restart()\ntoolgen.start(\n    single_chain_max_step=16,\n    start_messages=messages\n)\n\n```\n## ToolGen\nDownload and decompress [data.tar.gz](https://huggingface.co/datasets/reasonwang/ToolGen-Datasets/blob/main/data.tar.gz). Other datasets are at [🤗ToolGen-Datasets](https://huggingface.co/datasets/reasonwang/ToolGen-Datasets).\n\n### Tool Virtualization\nThe first step is to map tools into tokens. We have extracted all the tools in ToolBench and converted them into tokens, as shown in [virtual_tokens.txt](data/virtual_tokens.txt). 
### Tool Virtualization
The first step is to map tools to tokens. We have extracted all the tools in ToolBench and converted them into tokens, as listed in [virtual_tokens.txt](data/virtual_tokens.txt). The following code adds these tokens to the vocabulary and expands the model embeddings, initializing each new tool token as the mean of the embeddings of its constituent tokens.

```python
import torch
import transformers
from unidecode import unidecode

# Read the virtual tool tokens, e.g. "<<ToolName&&ApiName>>"
with open('data/virtual_tokens.txt', 'r') as f:
    virtual_tokens = f.readlines()
    virtual_tokens = [unidecode(vt.strip()) for vt in virtual_tokens]

model_name_or_path = "meta-llama/Meta-Llama-3-8B"
# Load tokenizer and add the tool tokens to the vocabulary
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name_or_path)
tokenizer.add_tokens(new_tokens=virtual_tokens, special_tokens=False)
# Load model and expand its embedding matrix to cover the new tokens
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    torch_dtype=torch.bfloat16
)
model.resize_token_embeddings(len(tokenizer))

# Strip the "<<" and ">>" markers and split each token into its tool and API names
combined_tokens = []
for vt in virtual_tokens:
    combined_token = vt[2:-2].split("&&")
    combined_tokens.append(combined_token)

# Initialize each tool-token embedding as the mean of its constituent token embeddings
for combined_token, virtual_token in zip(combined_tokens, virtual_tokens):
    combined_token_ids = tokenizer(" ".join(combined_token), add_special_tokens=False).input_ids
    virtual_token_id = tokenizer(virtual_token, add_special_tokens=False).input_ids
    assert len(virtual_token_id) == 1
    combined_token_embeddings = model.model.embed_tokens(torch.tensor(combined_token_ids).to(model.device))
    embedding = torch.mean(combined_token_embeddings, dim=0)
    model.model.embed_tokens.weight.data[virtual_token_id[0]] = embedding
```

### Tool Memorization
After tool virtualization, ToolGen is fine-tuned in three stages. The first stage is tool memorization, which trains the model to map each tool's documentation to its tool token. The data for this stage is at [🤗ToolGen-Memorization](https://huggingface.co/datasets/reasonwang/ToolGen-Datasets/blob/main/toolgen_atomic_memorization.json). We converted the data into a ShareGPT-like format for easy integration with existing training frameworks such as [FastChat](https://github.com/lm-sys/FastChat) and [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory). Note that we train this stage for 8 epochs. A sample is shown below:
```
{
    "conversations": [
        {
            "role": "user",
            "content": "Tool Name: QRCheck. Tool Description: Check the quality of any QRCode Api Name: quality_v1_quality_post Api Description: None.",
            "loss": false
        },
        {
            "role": "assistant",
            "content": "<<QRCheck&&quality_v1_quality_post>>",
            "loss": true
        }
    ]
}
```
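The per-turn `loss` flags mark which messages are supervised: the tool documentation prompt is masked out, and only the assistant turn that emits the tool token contributes to the loss. As a rough, hypothetical sketch of how such flags are typically consumed during tokenization (this is not the repository's training code, and chat-template formatting is omitted):

```python
# Hypothetical sketch: turning ShareGPT-style "loss" flags into label masks
# for causal-LM training.
def build_labels(sample, tokenizer):
    input_ids, labels = [], []
    for turn in sample["conversations"]:
        ids = tokenizer(turn["content"], add_special_tokens=False).input_ids
        input_ids.extend(ids)
        # Turns with "loss": false (e.g. the tool documentation) get -100 labels,
        # so only the tool-token completion is supervised.
        labels.extend(ids if turn["loss"] else [-100] * len(ids))
    return input_ids, labels
```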
### Retrieval Training
The second stage trains ToolGen's tool retrieval capability: given a user query, the model generates the tokens of the relevant tools. The data is at [🤗ToolGen-Retrieval](https://huggingface.co/datasets/reasonwang/ToolGen-Datasets/blob/main/toolgen_atomic_retrieval_G123.json). We train this stage for 1 epoch; the resulting model is ToolGen-Retriever. A sample is shown below:
```
{
    "conversations": [
        {
            "role": "user",
            "content": "My friends and I are organizing a hackathon on 'web development' and 'mobile app development'. We need some inspiration and guidance. Can you fetch the top stories on these topics from Medium.com?",
            "loss": false
        },
        {
            "role": "assistant",
            "content": "<<Medium&&/search/topics>>",
            "loss": true
        }
    ]
}
```

### End-to-End Agent-Tuning
Finally, we train ToolGen on agent trajectories to give it task-completion capability. The data is at [🤗ToolGen-Agent](https://huggingface.co/datasets/reasonwang/ToolGen-Datasets/blob/main/toolgen_atomic_G123_dfs.json).

## Evaluation
### Retrieval
The following command shows how to evaluate retrieval performance. Other tool retrieval evaluation scripts can be found in `scripts/retrieval`.

```bash
python -m evaluation.retrieval.eval_toolgen \
    --model_name_or_path "reasonwang/ToolGen-Llama-3-8B-Tool-Retriever" \
    --indexing "Atomic" \
    --stage "G1" \
    --split "instruction" \
    --result_path data/results/retrieval/ \
    --constrain True
```

### Inference
For end-to-end evaluation, first get a [ToolBench](https://github.com/OpenBMB/ToolBench) key and run [StableToolBench](https://github.com/THUNLP-MT/StableToolBench). Then perform inference on the queries to generate trajectories. Scripts can be found in `scripts/inference`.

### Solvable Pass Rate
First, run `scripts/convert_answer/run_convert_answer.sh` to convert the trajectory format. Then run `scripts/pass_rate/run_pass_rate.sh` to evaluate the pass rate.

### Solvable Win Rate
Run `scripts/preference/run_preference.sh` for win rate evaluation.

## Citation
If our work is helpful, please cite it as:
```
@misc{wang2024toolgenunifiedtoolretrieval,
      title={ToolGen: Unified Tool Retrieval and Calling via Generation},
      author={Renxi Wang and Xudong Han and Lei Ji and Shu Wang and Timothy Baldwin and Haonan Li},
      year={2024},
      eprint={2410.03439},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.03439},
}
```