{"id":27569342,"url":"https://github.com/StevenGrove/GPT4Tools","last_synced_at":"2025-04-21T00:01:30.350Z","repository":{"id":154096467,"uuid":"631607568","full_name":"AILab-CVC/GPT4Tools","owner":"AILab-CVC","description":"GPT4Tools is an intelligent system that can automatically decide, control, and utilize different visual foundation models, allowing the user to interact with images during a conversation.","archived":false,"fork":false,"pushed_at":"2023-12-19T01:55:15.000Z","size":6649,"stargazers_count":771,"open_issues_count":14,"forks_count":57,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-04-10T16:16:14.779Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://gpt4tools.github.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AILab-CVC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-04-23T15:00:53.000Z","updated_at":"2025-03-31T20:30:43.000Z","dependencies_parsed_at":"2024-01-14T09:32:40.368Z","dependency_job_id":"5d693f58-3ff9-455f-b31f-476ed9ee781a","html_url":"https://github.com/AILab-CVC/GPT4Tools","commit_stats":null,"previous_names":["ailab-cvc/gpt4tools","stevengrove/gpt4tools"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FGPT4Tools","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FGPT4Tools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FGPT4Tools/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/AILab-CVC%2FGPT4Tools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AILab-CVC","download_url":"https://codeload.github.com/AILab-CVC/GPT4Tools/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249977522,"owners_count":21354863,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-04-21T00:00:45.491Z","updated_at":"2025-04-21T00:01:30.239Z","avatar_url":"https://github.com/AILab-CVC.png","language":"Python","funding_links":[],"categories":["Statistics"],"sub_categories":[],"readme":"# GPT4Tools: Teaching LLM to Use Tools via Self-instruction\n\n[Lin Song](http://linsong.info/), [Yanwei Li](https://yanwei-li.com/), [Rui Yang](https://github.com/Yangr116), Sijie Zhao, [Yixiao Ge](https://geyixiao.com/), [Xiu Li](https://www.sigs.tsinghua.edu.cn/lx/), [Ying Shan](https://scholar.google.com/citations?user=4oXBp9UAAAAJ\u0026hl=en)\n\n\nGPT4Tools is a centralized system that can control multiple visual foundation models. 
\nIt is based on Vicuna (LLaMA) and 71K self-built instruction data.\nBy analyzing the language content, GPT4Tools is capable of automatically deciding, controlling, and utilizing different visual foundation models, allowing the user to interact with images during a conversation.\nWith this approach, GPT4Tools provides a seamless and efficient solution to fulfill various image-related requirements in a conversation.\nDifferent from previous work, we support users in teaching their own LLM to use tools with simple refinement via self-instruction and LoRA.\n\n\u003ca href='https://gpt4tools.github.io'\u003e\u003cimg src='https://img.shields.io/badge/Project-Page-Green'\u003e\u003c/a\u003e  \u003ca href='https://huggingface.co/stevengrove/gpt4tools-vicuna-13b-lora'\u003e\u003cimg src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue'\u003e\u003c/a\u003e  [![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://youtu.be/Qrj94ibQIT8) [![arXiv](https://img.shields.io/badge/arXiv-Paper-\u003cCOLOR\u003e.svg)](https://arxiv.org/pdf/2305.18752.pdf)\n\n## Updates\n\n* 🔥 We have updated the code and models for vicuna-v1.5!\n* 🔥 Our paper has been accepted by [NeurIPS 2023](https://openreview.net/pdf?id=cwjh8lqmOL)!\n* 🔥 We have released the \u003ca href='https://arxiv.org/pdf/2305.18752.pdf'\u003e\u003cstrong\u003e\u003cfont color='#008AD7'\u003epaper\u003c/font\u003e\u003c/strong\u003e\u003c/a\u003e and a new \u003ca href='https://huggingface.co/spaces/stevengrove/GPT4Tools'\u003e\u003cstrong\u003e\u003cfont color='#008AD7'\u003edemo\u003c/font\u003e\u003c/strong\u003e\u003c/a\u003e with LLaVA, OPT, LLaMA and Vicuna.\n* 🔥 We released pretrained GPT4Tools models with \u003cstrong\u003e\u003cfont color=\"#008AD7\"\u003eVicuna-13B\u003c/font\u003e\u003c/strong\u003e and released the dataset for \u003cstrong\u003e\u003cfont color=\"#008AD7\"\u003eself-instruction\u003c/font\u003e\u003c/strong\u003e. 
Check out the blog and demo.\n\n## Demo\nWe provide some selected examples using GPT4Tools in this section. More examples can be found on our [project page](https://gpt4tools.github.io). Feel free to try our online [demo](https://c60eb7e9400930f31b.gradio.live)!\n\n\n\u003cdiv align=center\u003e\n\u003cimg width=\"80%\" src=\"asserts/images/demo.gif\"/\u003e\n\u003c/div\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003eMore demos\u003c/summary\u003e\n\n|   |   |\n:-------------------------:|:-------------------------:\n![segment](asserts/images/demo_seg.png) |  ![detect kps](asserts/images/demo_kps.png)\n![solve problem](asserts/images/demo_explain.png)  |  ![style transfer](asserts/images/demo_style.png)\n\n\u003c/details\u003e\n\n\n---\n## Dataset\n| Data file name | **Size** | OneDrive| Google Drive|\n|:------------------|:--------:| :--------: | :---------:|\n| gpt4tools_71k.json    | 229 MB   | [link](https://1drv.ms/u/s!AqPQkBZ4aeVnhRdryHC9b1NtWJpZ?e=ZHBCqd) | [link](https://drive.google.com/file/d/1JKIT-Or1of7TJuWvmrJpPoOx0cLdcWry/view?usp=share_link)|\n| gpt4tools_val_seen.json    | --   | [link](https://1drv.ms/u/s!AqPQkBZ4aeVnhT1DPh5qZtSoZjtC?e=bDALfB) | [link](https://drive.google.com/file/d/1nDl7zhtQSx-L12K7151DfQD-XTqh_uzc/view?usp=sharing)|\n| gpt4tools_test_unseen.json    | --   | [link](https://1drv.ms/u/s!AqPQkBZ4aeVnhTz3dCV77Ps6abzQ?e=ex4ojQ) | [link](https://drive.google.com/file/d/1BHm0HEwYaVdMRYZiDdECy8ozyix607PH/view?usp=sharing)|\n\n* ```gpt4tools_71k.json``` contains the 71K instruction-following samples we used for fine-tuning the GPT4Tools model. 
\n\n* ```gpt4tools_val_seen.json``` is the manually cleaned instruction data used for validation, which includes instructions related to the tools in ```gpt4tools_71k.json```.\n\n* ```gpt4tools_test_unseen.json``` is the cleaned instruction data used for testing, which includes instructions related to some tools that are absent from ```gpt4tools_71k.json```.\n\n[data.md](./asserts/docs/data.md) shows how to generate, format and clean the data.\n\n\n## Models\nGPT4Tools mainly contains three parts: LLM for instruction, LoRA for adaptation, and Visual Agent for provided functions.\nIt is a flexible and extensible system that can be easily extended to support more tools and functions.\nFor example, users can replace the existing LLM or tools with their own models, or add new tools to the system.\nThe only thing needed is to finetune the LoRA with the provided instruction data, which teaches the LLM to use the provided tools.\n\n![image](asserts/images/overview.png)\n\nGPT4Tools is based on [Vicuna](https://github.com/lm-sys/FastChat). We release the LoRA weights of GPT4Tools to comply with the LLaMA model license. You can merge our LoRA weights with the Vicuna weights to obtain the GPT4Tools weights.\n\n\n## Getting Started\n### Env\n```\ngit clone https://github.com/AILab-CVC/GPT4Tools\ncd GPT4Tools\npip install -r requirements.txt\n```\n\n### Weights \n1. Download [vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) and [vicuna-13b-v1.5](https://huggingface.co/lmsys/vicuna-13b-v1.5).\n```\n# download to your cache dir\npython3 scripts/download.py \\\n\t--model-names \"lmsys/vicuna-13b-v1.5\" \"lmsys/vicuna-7b-v1.5\" \\\n\t--cache-dir $your_cache_dir\n```\n\n2. 
Download the GPT4Tools LoRA weights from the links below:\n\n| Models | OneDrive| Google Drive| Hugging Face|\n|:------------------|:--------: | :---------:| :---------:|\n| vicuna-7b-v1.5-gpt4tools    | [link]() | [link](https://drive.google.com/file/d/1UdA6_iOxXZs2V13adLa_V605Ty19KR4s/view?usp=sharing) | | \n| vicuna-13b-v1.5-gpt4tools    | [link]() | [link](https://drive.google.com/file/d/1V6r2aoo1ovxMi63yPkUC0fwdz-M-xXwC/view?usp=sharing)| |\n\nOld weights can be found [here](./asserts/docs/weights.md).\n\n### Tools\nGPT4Tools supports 22 tools. Please check [tools.md](docs/tools.md) for more details.\nWhen tools are used for the first time, their weights need to be downloaded to the cache. If you don't want to store them in the default cache, please set the shell environment variables: \n```\nexport TRANSFORMERS_CACHE=${your_transformers_cache}\nexport HUGGINGFACE_HUB_CACHE=${your_diffusers_cache}\n```\nYou can also download the weights to a custom cache.\n```\n# download huggingface model\npython3 scripts/download.py \\\n\t--model-names \"Salesforce/blip-image-captioning-base\" \"Salesforce/blip-vqa-base\" \"timbrooks/instruct-pix2pix\" \"runwayml/stable-diffusion-v1-5\" \"runwayml/stable-diffusion-inpainting\" \"lllyasviel/ControlNet\" \"fusing/stable-diffusion-v1-5-controlnet-canny\" \"fusing/stable-diffusion-v1-5-controlnet-mlsd\" \"fusing/stable-diffusion-v1-5-controlnet-hed\" \"fusing/stable-diffusion-v1-5-controlnet-scribble\" \"fusing/stable-diffusion-v1-5-controlnet-openpose\" \"fusing/stable-diffusion-v1-5-controlnet-seg\" \"fusing/stable-diffusion-v1-5-controlnet-depth\" \"fusing/stable-diffusion-v1-5-controlnet-normal\" \"sam\" \"groundingdino\" \\\n\t--cache-dir $your_cache_dir\n```\n\n### Serving with Web GUI \nFollow [scripts/demo.sh](./scripts/demo.sh) or run the code below to launch a Gradio interface on your own device:\n```\n# Advice for 1 GPU\npython gpt4tools_demo.py \\\n\t--base_model $path_to_vicuna_with_tokenizer 
\\\n\t--lora_model $path_to_lora_weights \\\n\t--llm_device \"cpu\" \\\n\t--load \"Text2Box_cuda:0,Segmenting_cuda:0,Inpainting_cuda:0,ImageCaptioning_cuda:0\" \\\n\t--cache-dir $your_cache_dir \\\n\t--server-port 29509 \\\n\t--share\n```\n\n```\n# Advice for 4 GPUs\npython gpt4tools_demo.py \\\n\t--base_model $path_to_vicuna_with_tokenizer \\\n\t--lora_model $path_to_lora_weights \\\n\t--llm_device \"cuda:3\" \\\n\t--load \"Text2Box_cuda:0,Segmenting_cuda:0,Inpainting_cuda:0,ImageCaptioning_cuda:0,Text2Image_cuda:1,VisualQuestionAnswering_cuda:1,InstructPix2Pix_cuda:2,SegText2Image_cuda:2,Image2Pose_cpu,PoseText2Image_cuda:2\" \\\n\t--cache-dir $your_cache_dir \\\n\t--server-port 29509 \\\n\t--share\n```\n\nYou can customize which tools are loaded by specifying ```{tools_name}_{devices}``` after the ```--load``` argument of ```gpt4tools_demo.py```. The available ```tools_name``` values are listed in [tools.md](./docs/tools.md).\n\n### Finetuning\nAfter downloading ```gpt4tools_71k.json``` to ```./datasets```, you can follow [scripts/finetune_lora.sh](scripts/finetune_lora.sh) or run the code below to finetune your model:\n```\ndeepspeed train.py \\\n\t--base_model $path_to_vicuna_with_tokenizer \\\n\t--data_path $path_to_gpt4tools_71k.json \\\n\t--deepspeed \"scripts/zero2.json\" \\\n\t--output_dir output/gpt4tools \\\n\t--num_epochs 6 \\\n\t--per_device_train_batch_size 1 \\\n\t--per_device_eval_batch_size 4 \\\n\t--gradient_accumulation_steps 16 \\\n\t--model_max_length 2048 \\\n\t--lora_target_modules '[q_proj,k_proj,v_proj,o_proj]' \\\n\t--lora_r 16 \\\n\t--learning_rate 3e-4 \\\n\t--lazy_preprocess True \\\n\t--cache_dir $your_cache_dir \\\n\t--report_to 'tensorboard' \\\n\t--gradient_checkpointing True\n```\n\n| Hyperparameter | Global Batch Size | Learning rate | Max length | Weight decay | LoRA attention dimension (lora_r) | LoRA scaling alpha (lora_alpha) | LoRA dropout (lora_dropout) | Modules to apply LoRA (lora_target_modules)      
|\n|:--------------:|:-----------------:|:-------------:|:----------:|:------------:|:---------------------------------:|:----------:|:------------:|:-----------------------------:|\n|    GPT4Tools \u0026 Vicuna-13B   |        512        |      3e-4     |    2048    |      0.0     |                 16                |     16     |     0.05     | [q_proj,k_proj,v_proj,o_proj] |\n\nIf you want to evaluate the model's success rate in using tools, please see [inference.md](./asserts/docs/inference.md).\n\n## Acknowledgement\n* [VisualChatGPT](https://github.com/microsoft/TaskMatrix): It connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting.\n* [Vicuna](https://github.com/lm-sys/FastChat): The language ability of Vicuna is fantastic and amazing. And it is open-source!\n* [Alpaca-LoRA](https://github.com/tloen/alpaca-lora): Instruct-tune LLaMA on consumer hardware.\n\nIf you use GPT4Tools in your research or applications, please cite:\n```\n@misc{gpt4tools,\n  title = {GPT4Tools: Teaching LLM to Use Tools via Self-instruction},\n  author={Rui Yang and Lin Song and Yanwei Li and Sijie Zhao and Yixiao Ge and Xiu Li and Ying Shan},\n  journal={arXiv preprint arXiv:2305.18752},\n  year={2023}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FStevenGrove%2FGPT4Tools","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FStevenGrove%2FGPT4Tools","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FStevenGrove%2FGPT4Tools/lists"}