{"id":13379259,"url":"https://github.com/yxuansu/PandaGPT","last_synced_at":"2025-03-13T05:30:49.649Z","repository":{"id":168727309,"uuid":"642558250","full_name":"yxuansu/PandaGPT","owner":"yxuansu","description":"[TLLM'23] PandaGPT: One Model To Instruction-Follow Them All","archived":false,"fork":false,"pushed_at":"2023-06-01T19:39:49.000Z","size":22375,"stargazers_count":763,"open_issues_count":22,"forks_count":60,"subscribers_count":11,"default_branch":"main","last_synced_at":"2024-11-09T19:41:18.533Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://panda-gpt.github.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yxuansu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-05-18T21:01:24.000Z","updated_at":"2024-11-04T02:35:02.000Z","dependencies_parsed_at":"2024-01-14T10:05:55.645Z","dependency_job_id":"fe2117b3-f770-4329-90cd-09177eb01893","html_url":"https://github.com/yxuansu/PandaGPT","commit_stats":null,"previous_names":["yxuansu/pandagpt"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yxuansu%2FPandaGPT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yxuansu%2FPandaGPT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yxuansu%2FPandaGPT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yxuansu%2FPandaGPT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yxuansu","download_url":"https://codeload.github.com/yxu
ansu/PandaGPT/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243350999,"owners_count":20276893,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T07:02:03.846Z","updated_at":"2025-03-13T05:30:46.565Z","avatar_url":"https://github.com/yxuansu.png","language":"Python","funding_links":[],"categories":["Python","多模态大模型"],"sub_categories":["网络服务_其他"],"readme":"\u003cp align=\"center\" width=\"100%\"\u003e\n\u003cimg src=\"./pandagpt.png\" alt=\"PandaGPT-4\" style=\"width: 40%; min-width: 300px; display: block; margin: auto;\"\u003e\n\u003c/p\u003e\n\n# PandaGPT: One Model To Instruction-Follow Them All\n\n![Data License](https://img.shields.io/badge/Data%20License-CC%20By%20NC%204.0-red.svg)\n![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg)\n![Model Weight License](https://img.shields.io/badge/Model_Weight%20License-CC%20By%20NC%204.0-red.svg)\n![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)\n\n\n\u003cp align=\"left\"\u003e\n   🌐 \u003ca href=\"https://panda-gpt.github.io/\" target=\"_blank\"\u003eProject Page\u003c/a\u003e • 🤗 \u003ca href=\"https://huggingface.co/spaces/GMFTBY/PandaGPT\" target=\"_blank\"\u003eOnline Demo\u003c/a\u003e • 🤗 \u003ca href=\"https://ailabnlp.tencent.com/research_demos/panda_gpt/\" target=\"_blank\"\u003eOnline Demo-2 (Runs fast for users from mainland China)\u003c/a\u003e • 📃 \u003ca href=\"http://arxiv.org/abs/2305.16355\" target=\"_blank\"\u003ePaper\u003c/a\u003e •  ⏬ \u003ca 
href=\"https://github.com/yxuansu/PandaGPT/blob/main/README.md#31-data-preparation\" target=\"_blank\"\u003eData\u003c/a\u003e • 🤖 \u003ca href=\"https://github.com/yxuansu/PandaGPT/blob/main/README.md#24-prepare-delta-weights-of-pandagpt\" target=\"_blank\"\u003eModel\u003c/a\u003e • 📹 \u003ca href=\"https://www.youtube.com/watch?v=96XgdQle7EY\" target=\"_blank\"\u003eVideo\u003c/a\u003e\n\u003c/p\u003e\n\n\n**Team:** [Yixuan Su](https://yxuansu.github.io/)\u003csup\u003e\\*\u003c/sup\u003e, [Tian Lan](https://github.com/gmftbyGMFTBY)\u003csup\u003e\\*\u003c/sup\u003e, [Huayang Li](https://sites.google.com/view/huayangli)\u003csup\u003e\\*\u003c/sup\u003e, Jialu Xu, Yan Wang, and [Deng Cai](https://jcyk.github.io/)\u003csup\u003e\\*\u003c/sup\u003e (Major contributors\u003csup\u003e\\*\u003c/sup\u003e)\n\n****\n\n## Online Demo Demonstration:\n\nBelow, we show some examples from our online [demo](https://huggingface.co/spaces/GMFTBY/PandaGPT). For more generated examples of PandaGPT, please refer to our [webpage](https://panda-gpt.github.io/) or our [paper](https://github.com/yxuansu/PandaGPT/blob/main/PandaGPT.pdf).\n\n\u003cp align=\"center\" width=\"100%\"\u003e\n\u003cimg src=\"./demonstration.jpg\" alt=\"PandaGPT-4\" style=\"width: 100%; min-width: 300px; display: block; margin: auto;\"\u003e\n\u003c/p\u003e\n\n(1) In this example, PandaGPT takes an input image and reasons over the user's input.\n\n\u003cp align=\"center\" width=\"100%\"\u003e\n\u003cimg src=\"./online_demo.jpg\" alt=\"PandaGPT-4\" style=\"width: 100%; min-width: 300px; display: block; margin: auto;\"\u003e\n\u003c/p\u003e\n\n(2) In this example, PandaGPT takes joint input from two modalities, i.e. (1) an \u003cb\u003eimage\u003c/b\u003e 👀 of a car and (2) an \u003cb\u003eaudio\u003c/b\u003e 👂 clip of a thunderstorm. \n\n\n****\n\n\u003cspan id='all_catelogue'/\u003e\n\n## Catalogue:\n* \u003ca href='#introduction'\u003e1. Introduction\u003c/a\u003e\n* \u003ca href='#environment'\u003e2. 
Running PandaGPT Demo\u003c/a\u003e\n    * \u003ca href='#install_environment'\u003e2.1. Environment Installation\u003c/a\u003e\n    * \u003ca href='#download_imagebind_model'\u003e2.2. Prepare ImageBind Checkpoint\u003c/a\u003e\n    * \u003ca href='#download_vicuna_model'\u003e2.3. Prepare Vicuna Checkpoint\u003c/a\u003e\n    * \u003ca href='#download_pandagpt'\u003e2.4. Prepare Delta Weights of PandaGPT\u003c/a\u003e\n    * \u003ca href='#running_demo'\u003e2.5. Deploying Demo\u003c/a\u003e\n* \u003ca href='#train_pandagpt'\u003e3. Train Your Own PandaGPT\u003c/a\u003e\n    * \u003ca href='#data_preparation'\u003e3.1. Data Preparation\u003c/a\u003e\n    * \u003ca href='#training_configurations'\u003e3.2. Training Configurations\u003c/a\u003e\n    * \u003ca href='#model_training'\u003e3.3. Training PandaGPT\u003c/a\u003e\n* \u003ca href='#license'\u003eUsage and License Notices\u003c/a\u003e\n* \u003ca href='#citation'\u003eCitation\u003c/a\u003e\n* \u003ca href='#acknowledgments'\u003eAcknowledgments\u003c/a\u003e\n\n****\n\n\u003cspan id='introduction'/\u003e\n\n### 1. Introduction: \u003ca href='#all_catelogue'\u003e[Back to Top]\u003c/a\u003e\n\n\u003cp align=\"center\" width=\"100%\"\u003e\n\u003cimg src=\"./PandaGPT.png\" alt=\"PandaGPT-4\" style=\"width: 80%; min-width: 300px; display: block; margin: auto;\"\u003e\n\u003c/p\u003e\n\n**License** The icons in the image are taken from [this website](https://www.flaticon.com).\n\n\nPandaGPT is the first foundation model capable of following instructions over data from six modalities, without the need for explicit supervision. It demonstrates a diverse set of multimodal capabilities such as complex understanding/reasoning, knowledge-grounded description, and multi-turn conversation.\n\nPandaGPT is a general-purpose instruction-following model that can both \u003cb\u003esee 👀\u003c/b\u003e and \u003cb\u003ehear👂\u003c/b\u003e. 
Our pilot experiments show that PandaGPT can perform complex tasks such as generating detailed image descriptions, writing stories inspired by videos, and answering questions about audio. More interestingly, PandaGPT can take multimodal inputs simultaneously and compose their semantics naturally. For example, PandaGPT can connect how objects look in a photo with how they sound in an audio clip. \n\n\n****\n\n\u003cspan id='environment'/\u003e\n\n### 2. Running PandaGPT Demo: \u003ca href='#all_catelogue'\u003e[Back to Top]\u003c/a\u003e\n\n\u003cspan id='install_environment'/\u003e\n\n#### 2.1. Environment Installation:\nTo install the required environment, please run\n```\npip install -r requirements.txt\n```\n\nThen install the PyTorch package that matches your CUDA version, for example\n```\npip install torch==1.13.1+cu117 -f https://download.pytorch.org/whl/torch/\n```\n\n\u003cspan id='download_imagebind_model'/\u003e\n\n#### 2.2. Prepare ImageBind Checkpoint:\nYou can download the pre-trained ImageBind model using [this link](https://dl.fbaipublicfiles.com/imagebind/imagebind_huge.pth). After downloading, put the downloaded file (imagebind_huge.pth) in the [[./pretrained_ckpt/imagebind_ckpt/]](./pretrained_ckpt/imagebind_ckpt/) directory. \n\n\u003cspan id='download_vicuna_model'/\u003e\n\n#### 2.3. Prepare Vicuna Checkpoint:\nTo prepare the pre-trained Vicuna model, please follow the instructions provided [[here]](./pretrained_ckpt#1-prepare-vicuna-checkpoint).\n\n\n\u003cspan id='download_pandagpt'/\u003e\n\n#### 2.4. 
Prepare Delta Weights of PandaGPT:\n\n|**Base Language Model**|**Maximum Sequence Length**|**Hugging Face Delta Weights Address**|\n|:-------------:|:-------------:|:-------------:|\n|Vicuna-7B (version 0)|512|[openllmplayground/pandagpt_7b_max_len_512](https://huggingface.co/openllmplayground/pandagpt_7b_max_len_512)|\n|Vicuna-7B (version 0)|1024|[openllmplayground/pandagpt_7b_max_len_1024](https://huggingface.co/openllmplayground/pandagpt_7b_max_len_1024)|\n|Vicuna-13B (version 0)|256|[openllmplayground/pandagpt_13b_max_len_256](https://huggingface.co/openllmplayground/pandagpt_13b_max_len_256)|\n|Vicuna-13B (version 0)|400|[openllmplayground/pandagpt_13b_max_len_400](https://huggingface.co/openllmplayground/pandagpt_13b_max_len_400)|\n\nWe release the delta weights of PandaGPT trained with different strategies in the table above. After downloading, put the 7B/13B delta weights file (pytorch_model.pt) in the [./pretrained_ckpt/pandagpt_ckpt/7b/](./pretrained_ckpt/pandagpt_ckpt/7b/) or [./pretrained_ckpt/pandagpt_ckpt/13b/](./pretrained_ckpt/pandagpt_ckpt/13b/) directory. In our [online demo](https://huggingface.co/spaces/GMFTBY/PandaGPT), we use `openllmplayground/pandagpt_7b_max_len_1024` as the default model because of limited computational resources. Better results are expected when switching to `openllmplayground/pandagpt_13b_max_len_400`.\n\n\u003cspan id='running_demo'/\u003e\n\n#### 2.5. Deploying Demo:\nAfter completing the previous steps, you can run the demo locally with\n```bash\ncd ./code/\nCUDA_VISIBLE_DEVICES=0 python web_demo.py\n```\n\nIf you run into a `sample_rate` problem, please install `pytorchvideo` from source with\n```bash\ngit clone https://github.com/facebookresearch/pytorchvideo\ncd pytorchvideo\npip install --editable ./\n```\n\n****\n\n\u003cspan id='train_pandagpt'/\u003e\n\n### 3. 
Train Your Own PandaGPT: \u003ca href='#all_catelogue'\u003e[Back to Top]\u003c/a\u003e\n\n**Prerequisites:** Before training the model, make sure the environment is properly installed and the ImageBind and Vicuna checkpoints are downloaded. You can refer to [here](https://github.com/yxuansu/PandaGPT#2-running-pandagpt-demo-back-to-top) for more information.  \n\n\u003cspan id='data_preparation'/\u003e\n\n#### 3.1. Data Preparation:\n\n**Disclaimer:** To ensure the reproducibility of our results, we have released our training dataset. The dataset must be used for research purposes only. Use of the dataset must comply with the licenses of the original sources, i.e. LLaVA and MiniGPT-4. These datasets may be taken down at the request of the original authors.\n\n|**Training Task**|**Dataset Address**|\n|:-------------:|:-------------:|\n|Visual Instruction-Following|[openllmplayground/pandagpt_visual_instruction_dataset](https://huggingface.co/datasets/openllmplayground/pandagpt_visual_instruction_dataset)|\n\nAfter downloading, unzip the downloaded files under the [./data/](./data/) directory.\n\n\u003e **** The directory should look like:\n\n    .\n    └── ./data/ \n         ├── pandagpt4_visual_instruction_data.json\n         └── /images/\n             ├── 000000426538.jpg\n             ├── 000000306060.jpg\n             └── ...\n              \n\n\u003cspan id='training_configurations'/\u003e\n\n#### 3.2. Training Configurations:\n\nThe table below shows the training hyperparameters used in our experiments. The hyperparameters were selected based on the constraints of our computational resources, i.e. 
8 x A100 (40G) GPUs.\n\n|**Base Language Model**|**Training Task**|**Epoch Number**|**Batch Size**|**Learning Rate**|**Maximum Length**|\n|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|\n|7B|Visual Instruction|2|64|5e-4|1024|\n|13B|Visual Instruction|2|64|5e-4|400|\n\n\n\n\u003cspan id='model_training'/\u003e\n\n\n#### 3.3. Training PandaGPT:\n \nTo train PandaGPT, please run the following commands:\n```bash\ncd ./code/scripts/\nchmod +x train.sh\ncd ..\n./scripts/train.sh\n```\n\nThe key arguments of the training script are as follows:\n* `--data_path`: The path to the JSON file `pandagpt4_visual_instruction_data.json`.\n* `--image_root_path`: The root path for the downloaded images.\n* `--imagebind_ckpt_path`: The path where the ImageBind checkpoint `imagebind_huge.pth` is saved.\n* `--vicuna_ckpt_path`: The directory containing the pre-trained Vicuna checkpoints.\n* `--max_tgt_len`: The maximum sequence length of training instances.\n* `--save_path`: The directory in which the trained delta weights are saved. This directory will be automatically created.\n\nNote that the epoch number can be set via the `epochs` argument in the [./code/config/openllama_peft.yaml](./code/config/openllama_peft.yaml) file. The `train_micro_batch_size_per_gpu` and `gradient_accumulation_steps` arguments in [./code/dsconfig/openllama_peft_stage_1.json](./code/dsconfig/openllama_peft_stage_1.json) should be set to `2` and `4` for the 7B model, and to `1` and `8` for the 13B model. With 8 GPUs, both settings give the batch size of 64 listed in the table above (8 x 2 x 4 and 8 x 1 x 8).\n\n****\n\n\u003cspan id='license'/\u003e\n\n### Usage and License Notices:\n\nPandaGPT is intended and licensed for research use only. The dataset is CC BY NC 4.0 (allowing only non-commercial use) and models trained using the dataset should not be used outside of research purposes. 
The delta weights are also CC BY NC 4.0 (allowing only non-commercial use).\n\n\n****\n\n\u003cspan id='citation'/\u003e\n\n### Citation:\n\nIf you find PandaGPT useful in your research or applications, please cite it using the following BibTeX:\n```\n@article{su2023pandagpt,\n  title={PandaGPT: One Model To Instruction-Follow Them All},\n  author={Su, Yixuan and Lan, Tian and Li, Huayang and Xu, Jialu and Wang, Yan and Cai, Deng},\n  journal={arXiv preprint arXiv:2305.16355},\n  year={2023}\n}\n```\n\n\n****\n\n\u003cspan id='acknowledgments'/\u003e\n\n### Acknowledgments:\n\n\nThis repo benefits from [OpenAlpaca](https://github.com/yxuansu/OpenAlpaca), [ImageBind](https://github.com/facebookresearch/ImageBind), [LLaVA](https://github.com/haotian-liu/LLaVA), and [MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4). Thanks for their wonderful work!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyxuansu%2FPandaGPT","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyxuansu%2FPandaGPT","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyxuansu%2FPandaGPT/lists"}