{"id":13441778,"url":"https://github.com/conceptofmind/toolformer","last_synced_at":"2025-03-20T12:32:56.176Z","repository":{"id":84212794,"uuid":"602858141","full_name":"conceptofmind/toolformer","owner":"conceptofmind","description":null,"archived":true,"fork":false,"pushed_at":"2023-03-10T16:20:52.000Z","size":642,"stargazers_count":337,"open_issues_count":19,"forks_count":38,"subscribers_count":11,"default_branch":"main","last_synced_at":"2024-08-01T03:37:37.354Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/conceptofmind.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-02-17T04:46:36.000Z","updated_at":"2024-07-12T15:38:26.000Z","dependencies_parsed_at":"2023-03-12T21:59:27.883Z","dependency_job_id":null,"html_url":"https://github.com/conceptofmind/toolformer","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/conceptofmind%2Ftoolformer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/conceptofmind%2Ftoolformer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/conceptofmind%2Ftoolformer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/conceptofmind%2Ftoolformer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/conceptofmind","download_url":"https://codeload.github.com/conceptofmind/toolformer/tar.gz/refs/heads/main","host":{"name":"GitHub","ur
l":"https://github.com","kind":"github","repositories_count":221760173,"owners_count":16876367,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T03:01:37.942Z","updated_at":"2025-03-20T12:32:56.168Z","avatar_url":"https://github.com/conceptofmind.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Toolformer\n\nOpen-source implementation of [Toolformer: Language Models Can Teach Themselves to Use Tools](https://arxiv.org/abs/2302.04761) by Meta AI.\n\n## Abstract\n\nLanguage models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds. We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. We incorporate a range of tools, including a calculator, a Q\\\u0026A system, two different search engines, a translation system, and a calendar. 
Toolformer achieves substantially improved zero-shot performance across a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.\n\n## How to run\n\n### Inference\nModels are available on Hugging Face! [toolformer_v0](https://huggingface.co/dmayhem93/toolformer_v0_epoch2)\n\nQuick example on how to launch it below:\n```python\nimport torch\nfrom transformers import AutoTokenizer, AutoModelForCausalLM, pipeline\n\ntokenizer = AutoTokenizer.from_pretrained(r\"dmayhem93/toolformer_v0_epoch2\")\nmodel = AutoModelForCausalLM.from_pretrained(\n    r\"dmayhem93/toolformer_v0_epoch2\",\n    torch_dtype=torch.float16,\n    low_cpu_mem_usage=True,\n).cuda()\ngenerator = pipeline(\n    \"text-generation\", model=model, tokenizer=tokenizer, device=0\n)\n```\n\n#### Model Performance\n##### v0\nThe model is currently able to do retrieval. In a one-shot setting it will pick it up without too much hand-holding.\nFor zero-shot, adding a token bias to the \u003cTOOLFORMER_API_START\u003e token (token index 50257) will get it started.\n\nThe right token bias seems to depend on the length of the context: in brief testing, 2.5 with minimal context and 7.5 with a lot of context seemed to be good values.\n\nCalculation and Calendar are a WIP; you can give them a shot, but don't expect good results.\n\n#### Tool Integration\nWIP\n\nTool integration into sampling is a work in progress, so you will need to manually perform the tool integration.\n\ne.g. 
when it outputs \u003cTOOLFORMER_API_START\u003eCalculator(1 + 2)\u003cTOOLFORMER_API_RESPONSE\u003e you will need to input 3\u003cTOOLFORMER_API_END\u003e right after.\n\nFor retrieval, copy/pasting search results seems to work, but pasting results from actual retrieval is better if you have it.\n\nTo try retrieval, here is a brief script that loads some data and retrieves from it.\n```python\nfrom tools import Retriever\nimport json\n\n\nif __name__ == '__main__':\n    retriever = Retriever()\n    ret_val = \"location of New Orleans\"\n    with open('retrieval_test_data.json', encoding='utf-8') as f:\n        ret_strings = json.load(f)\n    print(', '.join(retriever.retrieval(\n        ret_strings, ret_val, 3\n    )))\n```\n\n### Data generation\nLooking to make your own data?\n\n```bash\npython data_generator.py --num_devices=x --device_id=y\n```\n\nThis lets you run it without collisions across x devices. If you only have one device, use:\n\n```bash\npython data_generator.py --num_devices=1 --device_id=0\n```\n\nEach process uses an entire GPU, so to run on a node with multiple GPUs, set CUDA_VISIBLE_DEVICES for each process, e.g.\n```bash\nexport CUDA_VISIBLE_DEVICES=5\npython data_generator.py --num_devices=8 --device_id=5\n```\n\nThe easiest way to gather data for multiple tools is to make a data_generator script for each tool you want to use.\n\nFinally, after you have your results, some minimal postprocessing scripts are in [this folder](data_handling).\n\nYou'll probably want to look at your data and figure out if there's any filtering needed.\n\nFor an example of the output, our first generated dataset is [here](https://huggingface.co/datasets/dmayhem93/toolformer_raw_v0), and the postprocessed outputs, ready for the HF trainer, are [here](https://huggingface.co/datasets/dmayhem93/toolformer-v0-postprocessed).\n\n## How to train\n\nWe used Hugging Face's run_clm.py, which we put in this repository as train_gptj_toolformer.py.\n\nWe 
used a batch size of 32 (4/device); the command used is below:\n```bash\ndeepspeed train_gptj_toolformer.py --model_name_or_path=EleutherAI/gpt-j-6B --per_device_train_batch_size=4 \\\n  --num_train_epochs 10 --save_strategy=epoch --output_dir=finetune_toolformer_v0 --report_to \"wandb\" \\\n  --dataset_name dmayhem93/toolformer-v0-postprocessed --tokenizer_name customToolformer \\\n  --block_size 2048 --gradient_accumulation_steps 1 --do_train --do_eval --evaluation_strategy=epoch \\\n  --logging_strategy=epoch --fp16 --overwrite_output_dir --adam_beta1=0.9 --adam_beta2=0.999 \\\n  --weight_decay=2e-02 --learning_rate=1e-05 --warmup_steps=100 --per_device_eval_batch_size=1 \\\n  --cache_dir=\"hf_cache\" --gradient_checkpointing=True --deepspeed ds_config_gpt_j.json\n```\n\n## Citations\n```bibtex\n@misc{https://doi.org/10.48550/arxiv.2302.04761,\n  doi = {10.48550/ARXIV.2302.04761},\n  \n  url = {https://arxiv.org/abs/2302.04761},\n  \n  author = {Schick, Timo and Dwivedi-Yu, Jane and Dessì, Roberto and Raileanu, Roberta and Lomeli, Maria and Zettlemoyer, Luke and Cancedda, Nicola and Scialom, Thomas},\n  \n  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},\n  \n  title = {Toolformer: Language Models Can Teach Themselves to Use Tools},\n  \n  publisher = {arXiv},\n  \n  year = {2023},\n  \n  copyright = {arXiv.org perpetual, non-exclusive license}\n}\n\n@Article{dao2022flashattention,\n    title={Flashattention: Fast and memory-efficient exact attention with io-awareness},\n    author={Dao, Tri and Fu, Daniel Y and Ermon, Stefano and Rudra, Atri and R{\\'e}, Christopher},\n    journal={arXiv preprint arXiv:2205.14135},\n    year={2022}\n}\n\n@software{Liang_Long_Context_Transformer_2023,\n    author = {Liang, Kaizhao},\n    doi = {10.5281/zenodo.7651809},\n    month = {2},\n    title = {{Long Context Transformer v0.0.1}},\n    url = {https://github.com/github/linguist},\n    version = 
{0.0.1},\n    year = {2023}\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconceptofmind%2Ftoolformer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fconceptofmind%2Ftoolformer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconceptofmind%2Ftoolformer/lists"}