{"id":13754100,"url":"https://github.com/sambanova/toolbench","last_synced_at":"2026-02-02T19:59:07.930Z","repository":{"id":169556249,"uuid":"642912896","full_name":"sambanova/toolbench","owner":"sambanova","description":"ToolBench, an evaluation suite for LLM tool manipulation capabilities. ","archived":false,"fork":false,"pushed_at":"2024-02-28T20:07:35.000Z","size":765,"stargazers_count":150,"open_issues_count":1,"forks_count":11,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-02T10:08:05.993Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2305.16504 ","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sambanova.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-05-19T16:23:35.000Z","updated_at":"2025-03-11T23:37:27.000Z","dependencies_parsed_at":"2024-02-28T21:27:29.662Z","dependency_job_id":null,"html_url":"https://github.com/sambanova/toolbench","commit_stats":null,"previous_names":["sambanova/toolbench"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sambanova%2Ftoolbench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sambanova%2Ftoolbench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sambanova%2Ftoolbench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sambanova%2Ftoolbench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sambanova","download_url":
"https://codeload.github.com/sambanova/toolbench/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253335346,"owners_count":21892655,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T09:01:40.209Z","updated_at":"2026-02-02T19:59:02.843Z","avatar_url":"https://github.com/sambanova.png","language":"Python","funding_links":[],"categories":["A01_文本生成_文本对话","Evaluation Benchmarks"],"sub_categories":["大语言对话模型及数据"],"readme":"\u003ca href=\"https://sambanova.ai/\"\u003e\n\u003cpicture\u003e\n  \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"/images/SambaNova-light-logo-1.png\" height=\"60\"\u003e\n  \u003cimg alt=\"Text changing depending on mode. Light: 'So light!' Dark: 'So dark!'\" src=\"/images/SambaNova-dark-logo-1.png\" height=\"60\"\u003e\n\u003c/picture\u003e\n\u003c/a\u003e\n\n# ToolBench\n\n\u003cp\u003e\n    \u003ca href=\"https://huggingface.co/spaces/qiantong-xu/toolbench-leaderboard\"\u003e\n        \u003cimg src=\"https://img.shields.io/badge/leaderboard-0.0.0-yellow\"\n            alt=\"leaderboard\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://discord.gg/JehFG5HXKb\"\u003e\n        \u003cimg src=\"https://img.shields.io/discord/1105549926475247626?logo=discord\"\n            alt=\"chat on Discord\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cimg src=\"./images/toolbench.jpg\" title=\"SambaNova\" height=\"180\" /\u003e\nRecent studies on software tool manipulation with large language models (LLMs) mostly rely on closed model APIs (e.g. 
OpenAI), as there is a significant accuracy gap between those closed models and open-source LLMs.\nTo study the root cause of this gap and to further the development of open-source LLMs, especially their tool manipulation capabilities, we created ToolBench. \nToolBench is a benchmark consisting of diverse software tools for real-world tasks. \nThis repository also provides easy-to-use infrastructure to directly evaluate the execution success rate of each model. \nContributions to this repo are very welcome! We are excited to see new action generation algorithms and new testing tasks.\n\n\n## Table of contents\n- [Prerequisites](#prerequisites)\n- [Installation](#installation)\n- [Usage](#usage)\n- [Tasks](#tasks)\n- [Available Checkpoints](#checkpoints)\n\n## Prerequisites \n\n### Credentials\n- Create an [OpenAI account](https://platform.openai.com/account/api-keys) and register an API key.\n- Follow [this guide](https://developers.google.com/workspace/guides/create-credentials#service-account) to create a Google Cloud service account and credentials for it. Enable the Google Sheets API and the Google Drive API for those credentials.\n- Create an [OpenWeather](https://home.openweathermap.org/users/sign_up) account and register an API key.\n- Register an API key for [The Cat API](https://thecatapi.com/signup).\n\nAfter registration, fill in your credentials in `credential.sh` and load them:\n```\nsource credential.sh\n```\n\n### Software\n- [Conda](https://www.anaconda.com/) (Anaconda)\n- [Java](https://www.oracle.com/java/technologies/downloads/) \u003e= 11.0.13\n\nCheck that both are available:\n```\njava -version\nconda --version\n```\n\n## Installation\n\n- Create and activate a virtual environment\n```\nconda create --prefix ./venv python=3.8.13\nconda activate ./venv\n```\n\n- Download resources\n```\nsh download_resources.sh\n```\nPress Enter at every prompt. 
This process may take about 15 minutes.\n\n- Install the package\n```\npip install -e .\n```\n\n- Run the tests to make sure everything works\n```\npytest tests\n```\n\n\u003cdetails\u003e\n\n  \u003csummary\u003eInstallation FAQ\u003c/summary\u003e\n\n- `Permission denied: '/tmp/tika.log'`\n```\n# If you share the machine with other users, write the Tika log to a per-user directory\nmkdir -p /tmp/$USER \u0026\u0026 export TIKA_LOG_PATH=/tmp/$USER\n```\n\n- Unable to find `libjvm.so`\n```\n# Set JAVA_HOME to the directory of your Java installation\nexport JAVA_HOME=\n```\n\u003c/details\u003e\n\n## Usage\nThis repository evaluates the API function call success rate on the following ToolBench tools:\n1. [OpenWeather](https://openweathermap.org/api)\n2. [The Cat API](https://thecatapi.com)\n3. Home Search (similar to [Redfin](https://www.redfin.com/))\n4. Trip Booking (similar to [booking.com](https://www.booking.com/))\n5. [Google Sheets](https://docs.gspread.org/)\n6. [VirtualHome](http://virtual-home.org/)\n7. [Webshop](https://webshop-pnlp.github.io/)\n8. [Tabletop](https://code-as-policies.github.io/)\n\nYou can launch an evaluation job with `test.py` on any combination of tools, models, number of APIs \nto retrieve, and number of demonstration examples to place in the prompt. Here is an example of evaluating \n`text-davinci-003` on the `open_weather` task.\n```\npython test.py \\\n--task 'open_weather' --version 'v0' \\\n--top_k_api 10 --top_k_example 3 --num_test_samples -1 \\\n--client_name \"openai\" --model_name 'text-davinci-003' --max_output_token 128 \n```\nAll results are written to `--out_dir`, which defaults to `out/`.\nA cache is also created as a SQLite database for each `client_name` and `model_name` combination. 
When the same model is given a prompt it has already answered, the answer is read directly from the cache instead of running inference again.\n\nMore examples are shown below:\n\u003cdetails\u003e\n\n  \u003csummary\u003eEvaluation of OpenAI Models\u003c/summary\u003e\n    \n```\npython test.py --task 'open_weather' --version 'v0' --client_name \"openai\" --model_name 'text-davinci-003' --max_output_token 128 --top_k_api 10 --top_k_example 3 --num_test_samples -1\npython test.py --task 'the_cat_api' --version 'v0' --client_name \"openai\" --model_name 'text-davinci-003' --max_output_token 128 --top_k_api 3 --top_k_example 3 --num_test_samples -1\npython test.py --task 'virtual_home' --version 'v0' --client_name \"openai\" --model_name 'text-davinci-003' --max_output_token 128 --top_k_api 10 --top_k_example 3 --num_test_samples -1\npython test.py --task 'home_search' --version 'v0' --client_name \"openai\" --model_name 'text-davinci-003' --max_output_token 128 --top_k_api 15 --top_k_example 3 --num_test_samples -1\npython test.py --task 'booking' --version 'v0' --client_name \"openai\" --model_name 'text-davinci-003' --max_output_token 300 --top_k_api 15 --top_k_example 3 --num_test_samples -1\npython test.py --task 'google_sheets' --version 'v0' --client_name \"openai\" --model_name 'text-davinci-003' --max_output_token 256 --top_k_api 0 --top_k_example 3 --num_test_samples -1\npython test.py --task 'web_shop' --version 'v0' --client_name \"openai\" --model_name 'text-davinci-003' --max_output_token 128 --top_k_api 0 --top_k_example 3 --num_test_samples -1\npython test.py --task 'web_shop' --version 'v1' --client_name \"openai\" --model_name 'text-davinci-003' --max_output_token 128 --top_k_api 0 --top_k_example 3 --num_test_samples -1\npython test.py --task 'code_as_policies_tabletop' --version 'v0' --client_name \"openai\" --model_name 'text-davinci-003' --max_output_token 256 --top_k_api 0 --top_k_example 0 --num_test_samples 
-1\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\n  \u003csummary\u003eEvaluation of HuggingFace Models Locally\u003c/summary\u003e\n    \n- To host a model on a server independently of this repo, follow [manifest](https://github.com/HazyResearch/manifest).\n- Find the server's IP address and port in the manifest output, and plug them into `--client_connection` in the following commands.\n```\npython test.py --task 'open_weather' --version 'v0' --client_name \"huggingface\" --model 'facebook/opt-iml-30b' --client_connection 'http://10.10.1.98:5000' --max_output_token 128 --top_k_api 10 --top_k_example 3 --num_test_samples -1\npython test.py --task 'the_cat_api' --version 'v0' --client_name \"huggingface\" --model 'facebook/opt-iml-30b' --client_connection 'http://10.10.1.98:5000' --max_output_token 128 --top_k_api 3 --top_k_example 3 --num_test_samples -1\npython test.py --task 'virtual_home' --version 'v0' --client_name \"huggingface\" --model 'facebook/opt-iml-30b' --client_connection 'http://10.10.1.98:5000' --max_output_token 128 --top_k_api 10 --top_k_example 3 --num_test_samples -1\npython test.py --task 'home_search' --version 'v0' --client_name \"huggingface\" --model 'facebook/opt-iml-30b' --client_connection 'http://10.10.1.98:5000' --max_output_token 128 --top_k_api 15 --top_k_example 3 --num_test_samples -1\npython test.py --task 'booking' --version 'v0' --client_name \"huggingface\" --model 'facebook/opt-iml-30b' --client_connection 'http://10.10.1.98:5000' --max_output_token 300 --top_k_api 15 --top_k_example 3 --num_test_samples -1\npython test.py --task 'google_sheets' --version 'v0' --client_name \"huggingface\" --model 'facebook/opt-iml-30b' --client_connection 'http://10.10.1.98:5000' --max_output_token 256 --top_k_api 0 --top_k_example 3 --num_test_samples -1\npython test.py --task 'web_shop' --version 'v0' --client_name \"huggingface\" --model 'facebook/opt-iml-30b' --client_connection 'http://10.10.1.98:5000' --max_output_token 128 --top_k_api 0 
--top_k_example 3 --num_test_samples -1\npython test.py --task 'web_shop' --version 'v1' --client_name \"huggingface\" --model 'facebook/opt-iml-30b' --client_connection 'http://10.10.1.98:5000' --max_output_token 128 --top_k_api 0 --top_k_example 3 --num_test_samples -1\npython test.py --task 'code_as_policies_tabletop' --version 'v0' --client_name \"huggingface\" --model 'facebook/opt-iml-30b' --client_connection 'http://10.10.1.98:5000' --max_output_token 256 --top_k_api 0 --top_k_example 0 --num_test_samples -1\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\n  \u003csummary\u003eRunning FAQ\u003c/summary\u003e\n    \n- If your job terminates with a label assertion error, rerun it.\n\u003c/details\u003e\n\n\n## Tasks\n\n### Test set structure\nEach task in `data/` is organized as follows.\n```\n$ tree \u003ctask\u003e/\u003cversion\u003e\n\u003ctask\u003e/\u003cversion\u003e/\n├── examples\n│   ├── 0_0.txt\n│   ├── 1_0.txt\n│   ├── ...\n│   ├── 8_0.txt\n│   └── 9_0.txt\n├── functions\n│   ├── search\n│   ├── select_home_type\n│   ├── ...\n│   ├── set_num_garages\n│   └── set_num_swimming_pools\n└── test.jsonl\n```\n- `\u003cversion\u003e` is the version of the task.\n- The `examples/` folder contains example uses of the APIs. Each file contains a single Q\u0026A pair, where the answer is executable code. You can place as many examples as you want in this folder.\n- The `functions/` folder defines the set of API functions for this task. Each file contains a function signature and a description.\n- `test.jsonl` contains the test cases, one JSON object per line. \n- Additional files needed for task context or evaluation can also be placed in the folder. 
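\n\nThe folder layout above can be read back with a short Python sketch. It assumes a hypothetical helper named `load_task`, which is not part of this repository. The sketch relies only on what the tree shows: `.txt` demonstrations under `examples/`, one file per API under `functions/`, and a JSON Lines `test.jsonl`. The fields inside each test case are not described here, so each record is kept as a plain dict.

```python
import json
from pathlib import Path


def load_task(task_dir):
    # Hypothetical helper: read one data/TASK/VERSION folder laid out as in the tree above.
    root = Path(task_dir)
    # examples/: one demonstration (task + executable answer) per .txt file, keyed by stem, e.g. '0_0'
    examples = {p.stem: p.read_text() for p in sorted((root / 'examples').glob('*.txt'))}
    # functions/: one API signature plus description per file, keyed by file name, e.g. 'search'
    functions = {
        p.name: p.read_text()
        for p in sorted((root / 'functions').iterdir())
        if p.is_file()
    }
    # test.jsonl: JSON Lines, so each non-empty line is parsed as one test case
    with open(root / 'test.jsonl') as f:
        tests = [json.loads(line) for line in f if line.strip()]
    return examples, functions, tests
```

For example, `examples, functions, tests = load_task('data/home_search/v0')` should return the demonstrations, API docs, and test cases of the home search task, assuming that task is stored under this path.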
\n\n\n### Examples\n\nBelow we list a subset of the API docs, together with examples, for each ToolBench task.\n\n\u003cdetails\u003e\n\n  \u003csummary\u003eOpen Weather\u003c/summary\u003e\n\n  - [API source](https://openweathermap.org/api)\n  - API docs\n  ```\n  # Get the current air pollution data in location with latitude={lat}, longitude={lon}\n  curl -X GET 'https://api.openweathermap.org/data/2.5/air_pollution?lat={lat}\u0026lon={lon}\u0026appid={API_KEY}'\n\n  Parameters:\n  lat, lon: (required) Geographical coordinates (latitude, longitude).\n  appid: (required) Your unique API key.\n\n\n  # Get the weather forecast data in location with latitude={lat}, longitude={lon}\n  curl -X GET 'https://api.openweathermap.org/data/2.5/forecast?lat={lat}\u0026lon={lon}\u0026appid={API_KEY}{optional_params}'\n\n  Parameters:\n  lat, lon: (required) Geographical coordinates (latitude, longitude).\n  appid: (required) Your unique API key.\n  units: (optional) Units of measurement. 'standard' (default), 'metric' and 'imperial' units are available.\n  mode: (optional) Response format. 'JSON' format is used by default. To get data in 'XML' format use mode=xml.\n  lang: (optional) You can use the lang parameter to get the output in your language. 'en' for English (default); 'fr' for French; 'zh_cn' for simplified Chinese; 'it' for Italian; 'de' for German; 'ru' for Russian; 'ja' for Japanese; 'nl' for Dutch.\n  ```\n  - Examples\n  ```\n  Task: Do you know what's the weather like in the following days in 94957? 
Please give me a json-mode response in Dutch.\n  Action:\n  curl -X GET 'https://api.openweathermap.org/data/2.5/forecast?zip=94957\u0026appid={API_KEY}\u0026mode=json\u0026lang=nl'\n\n\n  Task: What's the current air pollution level at the location with longitude -107.6 and latitude -75.4?\n  Action:\n  curl -X GET 'https://api.openweathermap.org/data/2.5/air_pollution?lat=-75.4\u0026lon=-107.6\u0026appid={API_KEY}'\n  ```\n\u003c/details\u003e\n\n\n\n\u003cdetails\u003e\n\n  \u003csummary\u003eThe Cat API\u003c/summary\u003e\n\n  - [API source](https://thecatapi.com/)\n  - API docs\n  ```\n  # Delete the image with id {image_id} from the list of favorites\n  curl -X DELETE 'https://api.thecatapi.com/v1/favourites/{image_id}'\n\n  # Add the image with id {image_id} to the list of favorites\n  curl -X POST 'https://api.thecatapi.com/v1/favourites' --data '{\"image_id\":\"{image_id}\"}'\n\n  # Get the list of favorite cat images.\n  curl -X GET 'https://api.thecatapi.com/v1/favourites'\n  ```\n  - Examples\n  ```\n  Task: Remove the image tu from my favorites.\n  Action:\n  curl -X DELETE 'https://api.thecatapi.com/v1/favourites/tu'\n\n  Task: List all my favorite cat images.\n  Action:\n  curl -X GET 'https://api.thecatapi.com/v1/favourites'\n\n  Task: Vote up the image with id alp.\n  Action:\n  curl -X POST 'https://api.thecatapi.com/v1/votes' --data '{\"image_id\":\"alp\", \"value\":1}'\n  ```\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\n\n  \u003csummary\u003eHome Search\u003c/summary\u003e\n\n  - API docs\n  ```\n  # To set home types for search. For home buying, home_types choices are: \"House\", \"Townhouse\", \"Condo\", \"Land\", \"Multi-family\", \"Mobile\", \"Co-op\"; for home renting, home_types choices are: \"House\", \"Townhouse\", \"Condo\", \"Apartment\".\n  API.select_home_type(home_types: List[str])\n\n  # To specify whether to search homes for buying or renting. 'value' can be chosen from ['buy', 'rent']. 
This function must be called after setting the location and before setting any other criteria.\n  API.set_buy_or_rent(value: str)\n\n  # To set the maximum commute time in minutes\n  API.set_max_commute_time(value: int)\n\n  # To set the minimum home price in dollars\n  API.set_min_price(value: int)\n\n  # To set the maximum home price in dollars\n  API.set_max_price(value: int)\n\n  # To set the minimum home size in square feet\n  API.set_min_square_feet(value: int)\n\n  # To set the maximum home size in square feet\n  API.set_max_square_feet(value: int)\n\n  # To set the number of balconies\n  API.set_num_balconies(value: int)\n\n  # To set the floor number\n  API.set_floor_number(value: int)\n\n  # To set the number of swimming pool(s)\n  API.set_num_swimming_pools(value: int)\n\n  # To set the number of garage(s)\n  API.set_num_garages(value: int)\n\n  # To set the number of bedroom(s)\n  API.set_num_beds(value: int)\n\n  # To set the number of bathroom(s)\n  API.set_num_baths(value: float)\n\n  # To set the location for the search area. This function must be called before setting any criteria.\n  API.set_location(value: str)\n\n  # Submit the criteria to get search results. This function should be called after setting all the criteria.\n  API.search()\n  ```\n  - Examples\n  ```\n  Task: I want to buy a townhouse, mobile or co-op in Pittsburgh with 4 rooms. 
My budget is $1385000.\n  Actions:\n  API.set_location(\"Pittsburgh\")\n  API.set_buy_or_rent(\"buy\")\n  API.select_home_type([\"Townhouse\", \"Mobile\", \"Co-op\"])\n  API.set_num_beds(4)\n  API.set_max_price(1385000)\n  API.search()\n\n  Task: Can you assist me in finding a co-op, townhouse or land in Aurora with 2 bedrooms, 1 bathrooms, 4 swimming pools, a price range of 118000 to 1464000, and a minimum square footage of 1300?\n  Actions:\n  API.set_location(\"Aurora\")\n  API.set_buy_or_rent(\"buy\")\n  API.select_home_type([\"Co-op\", \"Townhouse\", \"Land\"])\n  API.set_num_beds(2)\n  API.set_num_baths(1)\n  API.set_num_swimming_pools(4)\n  API.set_min_price(118000)\n  API.set_max_price(1464000)\n  API.set_min_square_feet(1300)\n  API.search()\n\n  Task: I'd like to rent a house, townhouse or apartment in Lincoln between 1150 and 3400 square feet, with 1 bedrooms, and a budget of $716000 to $1415000.\n  Actions:\n  API.set_location(\"Lincoln\")\n  API.set_buy_or_rent(\"rent\")\n  API.select_home_type([\"House\", \"Townhouse\", \"Apartment\"])\n  API.set_min_square_feet(1150)\n  API.set_max_square_feet(3400)\n  API.set_num_beds(1)\n  API.set_min_price(716000)\n  API.set_max_price(1415000)\n  API.search()\n  ```\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\n\n  \u003csummary\u003eTrip Booking\u003c/summary\u003e\n\n  - API docs\n  ```\n  # To select the transportation type from ['flight', 'train', 'bus', 'cruise'].\n  API.select_transportation(transportation_type)\n\n  # To select the booking type from ['hotels', 'trip tickets', 'both'].\n  API.select_booking_type(booking_type)\n\n  # To set the number of child tickets to purchase.\n  API.set_num_children(value)\n\n  # To set the number of adult tickets to purchase.\n  API.set_num_adults(value)\n\n  # To set the location for arrival, given a Loc object.\n  API.set_destination(Loc)\n\n  # To set the location for departure, given a Loc object.\n  API.set_origin(Loc)\n\n  # To set the location for hotel search, 
given a Loc object.\n  API.set_hotel_location(Loc)\n\n  # To select the hotel room type from ['King Bed', 'Queen Bed', 'Double', 'Luxury'].\n  API.select_room_type(room_type)\n\n  # To set the number of hotel rooms to book.\n  API.set_num_rooms(value)\n\n  # Submit the criteria to get search results.\n  API.search()\n  ```\n  - Examples\n  ```\n  I live in Laredo, and am planning a trip to Glendale. I need to book 5 adult round trip tickets for train and 1 hotel rooms for 5 nights. I'll leave and check in on 2023/08/25, and check out on 2023/08/30.\n  Actions:\n  API.select_booking_type(\"both\")\n  location_from = Loc(\"Laredo\")\n  API.set_origin(location_from)\n  location_to = Loc(\"Glendale\")\n  API.set_destination(location_to)\n  API.set_num_adults(5)\n  API.select_transportation(\"train\")\n  API.set_num_rooms(1)\n  checkin_date = Date(8, 25, 2023)\n  API.set_checkin_date(checkin_date)\n  checkout_date = Date(8, 30, 2023)\n  API.set_checkout_date(checkout_date)\n  hotel_location = Loc(\"Glendale\")\n  API.set_hotel_location(hotel_location)\n  API.search()\n\n  Could you help me find train tickets for 3 children and 5 adults from Des Moines to Cape Coral on July 07, 2022? 
My budget is up to 280 per ticket.\n  Actions:\n  API.select_booking_type(\"trip tickets\")\n  API.select_transportation(\"train\")\n  API.set_num_children(3)\n  API.set_num_adults(5)\n  location_from = Loc(\"Des Moines\")\n  API.set_origin(location_from)\n  location_to = Loc(\"Cape Coral\")\n  API.set_destination(location_to)\n  departure_date = Date(7, 7, 2022)\n  API.set_departure_date(departure_date)\n  API.set_max_ticket_price(280)\n  API.search()\n  ```\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\n\n  \u003csummary\u003eGoogle Sheets\u003c/summary\u003e\n\n  - [API source](https://docs.gspread.org/)\n  - We also encourage the model to use [pandas DataFrames](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) for advanced manipulations.\n  - The test data can be [found here](https://docs.google.com/spreadsheets/d/1dgsg17hqRHkrJnKvWQyFwinMJNrsi1z2uhWNiJCUVIQ/edit?usp=sharing).\n\n  - API docs\n  ```\n  # Sets values in a cell range of the sheet. \n  worksheet.update(range_name, values=None, **kwargs)\n\n  # Updates the value of a cell. \n  worksheet.update_cell(row, col, value)\n\n  # Deletes multiple columns from the worksheet at the specified index. \n  worksheet.delete_columns(start_index, end_index=None)\n\n  # Deletes multiple rows from the worksheet at the specified index. 
\n  worksheet.delete_rows(start_index, end_index=None)\n  ```\n  - Examples\n  ```\n  | Product | Cost | Price |\n  | beef | 1 | 3 |\n  | pork | 5 | 4 |\n  | chicken | 10 | 11 |\n  | lamb | 3 | 15 |\n  | duck | 12 | 2 |\n  | fish | 2 | 100 |\n\n  Task: Sets 'Hello world' in 'A2' cell\n  Actions:\n  worksheet.update('A2', 'Hello world')\n\n  Task: Sets 'Hello world' in 'A2' cell\n  Actions:\n  worksheet.update_cell(2, 1, 'Hello world')\n\n  Task: Updates A2 and A3 with values 42 and 43\n  Actions:\n  worksheet.update('A2:A3', [[42], [43]])\n\n  Task: Updates D2 with values 3\n  Actions:\n  worksheet.update('D2', 3)\n\n  Task: Sum A1:A4 and write the result below A4\n  Actions:\n  worksheet.update('A5', '=SUM(A1:A4)', raw=False)\n\n  Task: Update chicken's price by 2\n  Actions:\n  df = get_as_dataframe(worksheet)\n  df.loc[df['Product'] == 'chicken', 'Price'] += 2\n  worksheet.clear()\n  set_with_dataframe(worksheet, df, include_index=False, include_column_header=True)\n  ```\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\n\n  \u003csummary\u003eVirtual Home\u003c/summary\u003e\n\n  - API is inherited from [VirtualHome](http://virtual-home.org/)\n  - Train/Test sets are from [this work](https://wenlong.page/language-planner/)\n\n  - API docs\n  ```\n  # Take a piece of clothes off. 'object' can only be: ['clothes_jacket', 'clothes_dress', 'clothes_hat', 'shoes', 'clothes_shirt', 'clothes_pants'].\n  Agent.TakeOff(object)\n\n  # Scrub an object. 'object' can only be: ['mop', 'cup', 'toilet', 'plate', 'soap', 'sink', 'spoon', 'cat', 'shower', 'dishwasher', 'hands_both', 'drinking_glass', 'bowl', 'towel'].\n  Agent.Scrub(object)\n\n  # Rinse an object. 
'object' can only be: ['cup', 'pot', 'water', 'water_glass', 'sponge', 'soap', 'towel', 'dish_soap', 'oven', 'cleaning_solution', 'knife', 'spoon', 'sink', 'faucet', 'clothes_underwear', 'detergent', 'drinking_glass', 'hands_both', 'toilet', 'shower', 'rag', 'plate', 'bowl', 'fork'].\n  Agent.Rinse(object)\n\n  # Wash an object. 'object' can only be: ['face', 'cup', 'food_vegetable', 'dresser', 'fork', 'shoes', 'child', 'coffee_cup', 'bed', 'water', 'soap', 'duster', 'brush', 'bathtub', 'toy', 'cleaning_bottle', 'hair', 'sink', 'razor', 'hands_both', 'drinking_glass', 'table', 'toilet', 'basket_for_clothes', 'shower', 'dishwasher', 'plate', 'bowl', 'spoon'].\n  Agent.Wash(object)\n\n  # Turn to an object. 'object' can only be: ['cup', 'coffee_pot', 'button', 'dresser', 'milk', 'phone', 'water', 'bed', 'soap', 'mail', 'picture', 'filing_cabinet', 'oven', 'faucet', 'newspaper', 'food_cheese', 'towel', 'spoon', 'laptop', 'man', 'food_bread', 'dishrack', 'keyboard', 'freezer', 'bookshelf', 'chair', 'water_glass', 'brush', 'centerpiece', 'desk', 'kitchen_cabinet', 'fly', 'toy', 'mouse', 'homework', 'computer', 'television', 'shower', 'plate', 'clothes_pants', 'child', 'food_dessert', 'folder', 'window', 'hair', 'coffee_maker', 'hanger', 'cat', 'food_food', 'floor_lamp', 'creditcard', 'razor', 'electrical_outlet', 'washing_machine', 'toilet', 'coffee_table', 'paper_towel', 'clothes_dress', 'couch', 'drawing', 'remote_control', 'face', 'food_vegetable', 'vacuum_cleaner', 'light', 'shoes', 'pot', 'mirror', 'duster', 'cleaning_bottle', 'sink', 'toaster', 'novel', 'clothes_shirt', 'table', 'kitchen_counter', 'wall_clock', 'woman', 'dishwasher', 'dog', 'check'].\n  Agent.TurnTo(object)\n\n  # Open an object. 
'object' can only be: ['coffee_pot', 'dresser', 'freezer', 'bookshelf', 'cupboard', 'folder', 'mail', 'filing_cabinet', 'window', 'oven', 'desk', 'kitchen_cabinet', 'trashcan', 'coffee_maker', 'curtain', 'bathroom_cabinet', 'nightstand', 'bag', 'washing_machine', 'toilet', 'basket_for_clothes', 'microwave', 'dishwasher', 'purse'].\n  Agent.Open(object)\n\n  # Pull an object. 'object' can only be: ['table', 'mop', 'mouse', 'chair', 'clothes_pants', 'light_bulb', 'curtain', 'vacuum_cleaner', 'mat', 'cat', 'food_food', 'drawing', 'shoes', 'centerpiece', 'sheets', 'pot', 'laptop'].\n  Agent.Pull(object)\n  ```\n  - Examples\n  ```\n  Task: Put down bags\n  Actions:\n  Agent.WalkTo(dining_room)\n  Agent.WalkTo(food_food)\n  Agent.Find(food_food)\n  Agent.Grab(food_food)\n  Agent.Find(table)\n  Agent.Put(food_food, table)\n\n  Task: Getting dresses\n  Actions:\n  Agent.WalkTo(bedroom)\n  Agent.WalkTo(dresser)\n  Agent.Find(dresser)\n  Agent.WalkTo(dresser)\n  Agent.Open(dresser)\n  Agent.Find(clothes_pants)\n  Agent.Grab(clothes_pants)\n  Agent.PutOn(clothes_pants)\n  Agent.Find(clothes_shirt)\n  Agent.Grab(clothes_shirt)\n  Agent.PutOn(clothes_shirt)\n  Agent.Close(dresser)\n  ```\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\n\n  \u003csummary\u003eWebshop\u003c/summary\u003e\n\n  - API docs and train/test sets are from [this work](https://arxiv.org/pdf/2207.01206.pdf)\n\n  - API docs\n  ```\n  # search a text query, and go from 'search' page to 'results' page that lists a set of products returned by a search engine.\n  search[{query}]\n\n  # On the current page, click a button with text 'button_text', which is the lowercased text between any '[button]' and '[button_]'. 
For a single action, click on only one button.\n  click[button_text]\n  ```\n  - Examples\n  ```\n  Task:\n  Amazon Shopping Game\n  Instruction: \n  i need a high speed usb flash drive that is 32 gb, and price lower than 50.00 dollars\n  [button] Search [button_]\n\n  Action:\n  search[32 gb high speed usb flash drive ]\n\n  Task:\n  Instruction:\n  i need a high speed usb flash drive that is 32 gb, and price lower than 50.00 dollars\n  [button] Back to Search [button_]\n  Page 1 (Total results: 50)\n  [button] Next \u003e [button_]\n\n  [button] B0845XPB49 [button_]\n  SanDisk 32GB Ultra Luxe USB 3.1 Flash Drive (Bulk 2 Pack) 150MB/s Speed 32 GB Pendrive Works with Computer, Laptop (SDCZ74-032G-G46) Bundle with (1) Everything But Stromboli Lanyard\n  $20.58\n\n  [button] B085X5548B [button_]\n  V7 32GB USB 3.1 Flash Drive - 32 GB - USB 3.1-120 MB/s Read Speed - Black\n  $11.59\n\n  [button] B08DQ22WFM [button_]\n  SanDisk Ultra Luxe 32GB USB 3.1 Flash Drive (Bulk 5 Pack) Works with Computer, Laptop, 150MB/s 32 GB PenDrive High Speed All Metal (SDCZ74-032G-G46) Bundle with (2) Everything But Stromboli Lanyards\n  $46.36\n\n  Action:\n  click[b085x5548b]\n  ```\n\u003c/details\u003e\n\n  \n\u003cdetails\u003e\n\n  \u003csummary\u003eTabletop\u003c/summary\u003e\n\n  - API docs and train/test sets are from [this work](https://code-as-policies.github.io/)\n\n  - API docs: there is no explicit API definition. 
All functions and their usage are covered by the examples.\n  - Examples\n  ```\n  ##### High-level UI #####\n  objects = ['yellow block', 'green block', 'yellow bowl', 'blue block', 'blue bowl', 'green bowl']\n  # the yellow block on the yellow bowl.\n  say('Ok - putting the yellow block on the yellow bowl')\n  put_first_on_second('yellow block', 'yellow bowl')\n\n  objects = ['yellow block', 'green block', 'yellow bowl', 'blue block', 'blue bowl', 'green bowl']\n  # which block did you move.\n  say('I moved the yellow block')\n\n  objects = ['yellow block', 'green block', 'yellow bowl', 'blue block', 'blue bowl', 'green bowl']\n  # move the green block to the top right corner.\n  say('Got it - putting the green block on the top right corner')\n  corner_pos = parse_position('top right corner')\n  put_first_on_second('green block', corner_pos)\n\n  ##### parse_obj_name #####\n  objects = ['blue block', 'cyan block', 'purple bowl', 'gray bowl', 'brown bowl', 'pink block', 'purple block']\n  # the block closest to the purple bowl.\n  block_names = ['blue block', 'cyan block', 'purple block']\n  block_positions = get_obj_positions_np(block_names)\n  closest_block_idx = get_closest_idx(points=block_positions, point=get_obj_pos('purple bowl'))\n  closest_block_name = block_names[closest_block_idx]\n  ret_val = closest_block_name\n\n  objects = ['brown bowl', 'banana', 'brown block', 'apple', 'blue bowl', 'blue block']\n  # the blocks.\n  ret_val = ['brown block', 'blue block']\n\n  objects = ['brown bowl', 'banana', 'brown block', 'apple', 'blue bowl', 'blue block']\n  # the brown objects.\n  ret_val = ['brown bowl', 'brown block']\n\n  ##### parse_position #####\n  # a 30cm horizontal line in the middle with 3 points.\n  middle_pos = denormalize_xy([0.5, 0.5]) \n  start_pos = middle_pos + [-0.3/2, 0]\n  end_pos = middle_pos + [0.3/2, 0]\n  line = make_line(start=start_pos, end=end_pos)\n  points = interpolate_pts_on_line(line=line, n=3)\n  ret_val = points\n\n  # a 20cm 
vertical line near the right with 4 points.\n  middle_pos = denormalize_xy([1, 0.5]) \n  start_pos = middle_pos + [0, -0.2/2]\n  end_pos = middle_pos + [0, 0.2/2]\n  line = make_line(start=start_pos, end=end_pos)\n  points = interpolate_pts_on_line(line=line, n=4)\n  ret_val = points\n\n  ##### parse_question #####\n  objects = ['yellow bowl', 'blue block', 'yellow block', 'blue bowl', 'fruit', 'green block', 'black bowl']\n  # is the blue block to the right of the yellow bowl?\n  ret_val = get_obj_pos('blue block')[0] \u003e get_obj_pos('yellow bowl')[0]\n\n  objects = ['yellow bowl', 'blue block', 'yellow block', 'blue bowl', 'fruit', 'green block', 'black bowl']\n  # how many yellow objects are there?\n  yellow_object_names = parse_obj_name('the yellow objects', f'objects = {get_obj_names()}')\n  ret_val = len(yellow_object_names)\n\n  ##### helper functions #####\n  # define function: total = get_total(xs=numbers).\n  def get_total(xs):\n      return np.sum(xs)\n\n  # define function: y = eval_line(x, slope, y_intercept=0).\n  def eval_line(x, slope, y_intercept=0):\n      return x * slope + y_intercept\n  ```\n\u003c/details\u003e\n\nHere is an example prompt sent to the model for the Open Weather task, containing the API definitions, the demonstration examples, and the query.\n\n\u003cdetails\u003e\n\n  \u003csummary\u003ePrompt Example\u003c/summary\u003e\n\n```\nI have the following set of API:\n\n# Get the current air pollution data in location with latitude={lat}, longitude={lon}\ncurl -X GET 'https://api.openweathermap.org/data/2.5/air_pollution?lat={lat}\u0026lon={lon}\u0026appid={API_KEY}'\n\nParameters:\nlat, lon: (required) Geographical coordinates (latitude, longitude).\nappid: (required) Your unique API key.\n\n# Get the forecast air pollution data in location with latitude={lat}, longitude={lon}\ncurl -X GET 'https://api.openweathermap.org/data/2.5/air_pollution/forecast?lat={lat}\u0026lon={lon}\u0026appid={API_KEY}'\n\nParameters:\nlat, lon: (required) 
Geographical coordinates (latitude, longitude).\nappid: (required) Your unique API key.\n\n# Get the current weather data in location with latitude={lat}, longitude={lon}\ncurl -X GET 'https://api.openweathermap.org/data/2.5/weather?lat={lat}\u0026lon={lon}\u0026appid={API_KEY}{optional_params}'\n\nParameters:\nlat, lon: (required) Geographical coordinates (latitude, longitude).\nappid: (required) Your unique API key.\nunits: (optional) Units of measurement. 'standard' (default), 'metric' and 'imperial' units are available.\nmode: (optional) Response format. 'JSON' format is used by default. To get data in 'XML' format use mode=xml.\nlang: (optional) You can use the lang parameter to get the output in your language. 'en' for English (default); 'fr' for French; 'zh_cn' for simplified Chinese; 'it' for Italian; 'de' for German; 'ru' for Russian; 'ja' for Japanese; 'nl' for Dutch.\n\n# Get the weather forecast data in location with latitude={lat}, longitude={lon}\ncurl -X GET 'https://api.openweathermap.org/data/2.5/forecast?lat={lat}\u0026lon={lon}\u0026appid={API_KEY}{optional_params}'\n\nParameters:\nlat, lon: (required) Geographical coordinates (latitude, longitude).\nappid: (required) Your unique API key.\nunits: (optional) Units of measurement. 'standard' (default), 'metric' and 'imperial' units are available.\nmode: (optional) Response format. 'JSON' format is used by default. To get data in 'XML' format use mode=xml.\nlang: (optional) You can use the lang parameter to get the output in your language. 
'en' for English (default); 'fr' for French; 'zh_cn' for simplified Chinese; 'it' for Italian; 'de' for German; 'ru' for Russian; 'ja' for Japanese; 'nl' for Dutch.\n\n# Get up to 'k' geographical coordinates of locations named 'city_name'\ncurl -X GET 'https://api.openweathermap.org/geo/1.0/direct?q={city_name}\u0026limit={k}\u0026appid={API_KEY}'\n\nParameters:\nq: (required) City name.\nappid: (required) Your unique API key.\nlimit: (optional) Number of the locations in the API response.\n\n# Get the current weather data in {zipcode} area\ncurl -X GET 'https://api.openweathermap.org/data/2.5/weather?zip={zipcode}\u0026appid={API_KEY}{optional_params}'\n\nParameters:\nzip: (required) Zip code.\nappid: (required) Your unique API key.\nunits: (optional) Units of measurement. 'standard' (default), 'metric' and 'imperial' units are available.\nmode: (optional) Response format. 'JSON' format is used by default. To get data in 'XML' format use mode=xml.\nlang: (optional) You can use the lang parameter to get the output in your language. 'en' for English (default); 'fr' for French; 'zh_cn' for simplified Chinese; 'it' for Italian; 'de' for German; 'ru' for Russian; 'ja' for Japanese; 'nl' for Dutch.\n\n# Get the weather forecast data in {zipcode} area\ncurl -X GET 'https://api.openweathermap.org/data/2.5/forecast?zip={zipcode}\u0026appid={API_KEY}{optional_params}'\n\nParameters:\nzip: (required) Zip code.\nappid: (required) Your unique API key.\nunits: (optional) Units of measurement. 'standard' (default), 'metric' and 'imperial' units are available.\nmode: (optional) Response format. 'JSON' format is used by default. To get data in 'XML' format use mode=xml.\nlang: (optional) You can use the lang parameter to get the output in your language. 
'en' for English (default); 'fr' for French; 'zh_cn' for simplified Chinese; 'it' for Italian; 'de' for German; 'ru' for Russian; 'ja' for Japanese; 'nl' for Dutch.\n\n# Get the weather forecast data in {city}\ncurl -X GET 'https://api.openweathermap.org/data/2.5/forecast?q={city_formatted}\u0026appid={API_KEY}{optional_params}'\n\nParameters:\nq: (required) City name.\nappid: (required) Your unique API key.\nunits: (optional) Units of measurement. 'standard' (default), 'metric' and 'imperial' units are available.\nmode: (optional) Response format. 'JSON' format is used by default. To get data in 'XML' format use mode=xml.\nlang: (optional) You can use the lang parameter to get the output in your language. 'en' for English (default); 'fr' for French; 'zh_cn' for simplified Chinese; 'it' for Italian; 'de' for German; 'ru' for Russian; 'ja' for Japanese; 'nl' for Dutch.\n\n# Get the current weather data in {city}\ncurl -X GET 'https://api.openweathermap.org/data/2.5/weather?q={city_formatted}\u0026appid={API_KEY}{optional_params}'\n\nParameters:\nq: (required) City name.\nappid: (required) Your unique API key.\nunits: (optional) Units of measurement. 'standard' (default), 'metric' and 'imperial' units are available.\nmode: (optional) Response format. 'JSON' format is used by default. To get data in 'XML' format use mode=xml.\nlang: (optional) You can use the lang parameter to get the output in your language. 'en' for English (default); 'fr' for French; 'zh_cn' for simplified Chinese; 'it' for Italian; 'de' for German; 'ru' for Russian; 'ja' for Japanese; 'nl' for Dutch.\n\n-------------\nI have the following set of examples:\n\nTask: Do you know what's the weather like in the following days in 94957? 
Please give me a json-mode response in Dutch.\nAction:\ncurl -X GET 'https://api.openweathermap.org/data/2.5/forecast?zip=94957\u0026appid={API_KEY}\u0026mode=json\u0026lang=nl'\n\nTask: How's the air quality for the next few days at the place where longitute = -32.1 and latitude = -43.1?\nAction:\ncurl -X GET 'https://api.openweathermap.org/data/2.5/air_pollution/forecast?lat=-43.1\u0026lon=-32.1\u0026appid={API_KEY}'\n\nTask: Please give me the air quality data at longitute 163.3 and latitude -80.0 at this moment.\nAction:\ncurl -X GET 'https://api.openweathermap.org/data/2.5/air_pollution?lat=-80.0\u0026lon=163.3\u0026appid={API_KEY}'\n\n-------------\nTask: Can you tell me what to expect on the air quality in the coming days at the location with (longitute, latitude) = (133.3, -56.7)?\nActions:\n```\n\u003c/details\u003e\n\n## Checkpoints\nWe release the three fine-tuned checkpoints we created for the paper:\n- [LLaMA-30b](https://huggingface.co/sambanovasystems/LLaMA-30b-toolbench)\n- [StarCoder](https://huggingface.co/sambanovasystems/starcoder-toolbench)\n- [CodeGen-16b-mono](https://huggingface.co/sambanovasystems/codegen-16B-mono-toolbench)\n\n\\[Note\\] The checkpoints provided in this section are intended only for initial study; we do not expect them to perform well on other tool tasks. 
\nWe highly encourage the community to contribute!\n\n## Citation\n```\n@misc{xu2023tool,\n      title={On the Tool Manipulation Capability of Open-source Large Language Models}, \n      author={Qiantong Xu and Fenglu Hong and Bo Li and Changran Hu and Zhengyu Chen and Jian Zhang},\n      year={2023},\n      eprint={2305.16504},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL}\n}\n```\n\n## Acknowledgements\nWe sincerely appreciate the helpful discussions with Urmish Thakker, Tian Zhao, Raghu Prabhakar, Kaizhao Liang, Petro Junior Milan, Bowen Yang, Qinghua Li, and Yaqi Zhang.\n\nWe are also grateful for the great work behind open-source models, especially those we rely on heavily in the paper:\n- [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)\n- [CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis](https://arxiv.org/abs/2203.13474)\n- [StarCoder: may the source be with you!](https://arxiv.org/abs/2305.06161)\n\nWe also thank the authors of several other inspiring works on tool manipulation and benchmarking:\n- [Tool Learning with Foundation Models](https://arxiv.org/abs/2304.08354)\n- [API-Bank: A Benchmark for Tool-Augmented LLMs](https://arxiv.org/abs/2304.08244)\n- [Gorilla: Large Language Model Connected with Massive APIs](https://arxiv.org/abs/2305.15334)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsambanova%2Ftoolbench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsambanova%2Ftoolbench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsambanova%2Ftoolbench/lists"}