{"id":20648164,"url":"https://github.com/optimalscale/detgpt","last_synced_at":"2025-04-12T20:45:15.200Z","repository":{"id":163043241,"uuid":"637920281","full_name":"OptimalScale/DetGPT","owner":"OptimalScale","description":null,"archived":false,"fork":false,"pushed_at":"2024-08-07T14:49:11.000Z","size":36665,"stargazers_count":770,"open_issues_count":27,"forks_count":71,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-04-03T23:12:02.052Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OptimalScale.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-08T17:36:10.000Z","updated_at":"2025-03-31T13:09:22.000Z","dependencies_parsed_at":"2024-11-23T18:11:27.551Z","dependency_job_id":null,"html_url":"https://github.com/OptimalScale/DetGPT","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OptimalScale%2FDetGPT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OptimalScale%2FDetGPT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OptimalScale%2FDetGPT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OptimalScale%2FDetGPT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OptimalScale","download_url":"https://codeload.github.com/OptimalScale/DetGPT/tar.gz/refs/heads/main",
"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248631688,"owners_count":21136556,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-16T17:06:42.252Z","updated_at":"2025-04-12T20:45:15.178Z","avatar_url":"https://github.com/OptimalScale.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DetGPT: Detect What You Need via Reasoning\n\n[![Demo](https://img.shields.io/badge/Website-Demo-ff69b4.svg)](https://a03e18d54fcb7ceb54.gradio.live)\n[![Project](https://img.shields.io/badge/Project-Page-20B2AA.svg)](https://detgpt.github.io/)\n[![Code License](https://img.shields.io/badge/License-BSD--3--Clause-green)](https://github.com/OptimalScale/DetGPT/blob/master/LICENSE.md)\n[![Python 3.9+](https://img.shields.io/badge/Python-3.9+-blue.svg)](https://www.python.org/downloads/release/python-390/)\n[![Embark](https://img.shields.io/badge/Discord-DetGPT-%237289da.svg?logo=discord)](https://discord.gg/u9VJNpzhvA)\n[![slack badge](https://img.shields.io/badge/Slack-Join-blueviolet?logo=slack\u0026amp)](https://join.slack.com/t/lmflow/shared_invite/zt-1s6egx12s-THlwHuCjF6~JGKmx7JoJPA)\n[![WeChat badge](https://img.shields.io/badge/WeChat-Join-brightgreen?logo=wechat\u0026amp)](https://i.328888.xyz/2023/05/08/i19P4Q.jpeg)\n\n\u003ca href=\"https://detgpt.github.io/\"\u003e\u003cimg src=\"assets/demo_refrige.gif\" width=\"100%\"\u003e\u003c/a\u003e\n\n## News\n* [2023-06-13] Added tuned linear weights for Vicuna-7b.\n* [2023-05-25] Our paper is available at [this 
link](https://arxiv.org/abs/2305.14167).\n* [2023-05-09] We have launched our [project website](https://detgpt.github.io).\n* [2023-05-08] The first version of DetGPT is available now! Try our [demo](https://883e396b2a812343ca.gradio.live/).\n\n\n## Online Demo\nDue to high website traffic, we have created multiple online services. If one link is not working, please try another one. Thank you for your support!\n\n\n[Demo](https://8d23682acd7bb9cb19.gradio.live)\n\n[Demo (Simplified Chinese)](https://5eb087810868adf099.gradio.live)\n\n[Demo (backup)](https://8d23682acd7bb9cb19.gradio.live)\n\n[comment]: \u003c\u003e ([Demo4]\u0026#40;https://b66150ee453d74dfeb.gradio.live/\u0026#41;)\n\n\n## Examples\n\n  |   |\n:-------------------------:\n![ex1](assets/ex1.jpeg) | \n![ex5](assets/ex6.png)  |\n![ex3](assets/ex4.png)  |  \n\n\n## Features\n\u003cp align=\"center\" width=\"100%\"\u003e\n\u003cimg src=\"assets/detgpt.png\" alt=\"DetGPT\" style=\"width: 100%; min-width: 300px; display: block; margin: auto; background-color: transparent;\"\u003e\n\u003c/p\u003e\n\n- DetGPT locates target objects rather than just describing images.\n- DetGPT understands complex instructions, like \"Find blood pressure-reducing foods in the image.\"\n- DetGPT accurately localizes target objects via LLM reasoning. For example, it can identify bananas as a potassium-rich food that helps alleviate high blood pressure.\n- DetGPT provides answers beyond human common sense, like identifying unfamiliar fruits rich in potassium.\n\n\n## Setup\n\n**1. Installation**\n```bash\ngit clone https://github.com/OptimalScale/DetGPT.git\ncd DetGPT\nconda create -n detgpt python=3.9 -y\nconda activate detgpt\npip install -e .\n```\n\n**2. Install GroundingDino**\n```bash\npython -m pip install -e GroundingDINO\n```\n\n**3. 
Download the pretrained checkpoint and task tuning dataset**\n\nOur model is based on pretrained language model checkpoints.\nIn our experiments, we use [Robin](https://github.com/OptimalScale/LMFlow#model-zoo) from the [LMFlow team](https://github.com/OptimalScale/LMFlow) and [Vicuna](https://lmsys.org/blog/2023-03-30-vicuna/), and find that they perform competitively.\nYou can run the following script to download the Robin checkpoint:\n```\ncd output_models\nbash download.sh all\ncd -\n```\nMerge the Robin LoRA model with the original LLaMA model and save the merged\nmodel to `output_models/robin-7b`. The corresponding model path is\nspecified in this config file\n[here](detgpt/configs/models/detgpt_robin_7b.yaml#L16).\n\nTo obtain the original LLaMA model, refer to this\n[doc](https://optimalscale.github.io/LMFlow/examples/checkpoints.html). To\nmerge a LoRA model with a base model, refer to\n[PEFT](https://github.com/huggingface/peft) or use the\n[merge script](https://github.com/OptimalScale/LMFlow#53-reproduce-the-result)\nprovided by LMFlow.\n\nThe dataset for task tuning is named \"coco_task_annotation.json\". Please modify detgpt/configs/datasts/coco/align.yaml so that \"storage\" points to the COCO dataset and \"file_name\" points to the path of the instruction tuning dataset.\n## Data Preparation\n```\ncd dataset\nmkdir coco\n```\nDownload the COCO dataset from the [COCO home page](https://cocodataset.org/#home).\n\nHere is the data structure:\n\n```\ndataset/coco/\n├── train2017/\n├── val2017/\n├── annotations.json\n├── coco_task_annotation.json\n```\nNote: Please move ```coco_task_annotation.json``` from ```output_models/``` to ```coco/```.\n## Training\nPlease execute the following command to run task tuning:\n```\nCUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc-per-node 8 train.py --cfg-path path-to-config\n```\nNote that we provide two example config files for task tuning under the configs/ directory. 
You need to replace model/ckpt with the path to the pretrained linear weights from the first stage.\n## Deploy Demo Locally\nRun the demo by executing the following command. Replace 'path/to/pretrained_linear_weights' in the config file with the real path. We currently release linear weights based on [Vicuna-13B-v1.1](https://github.com/lm-sys/FastChat#vicuna-weights) and will release other weights later. The demo runs on 2 GPUs by default, one for the language model and another for GroundingDino.\n\n```\nCUDA_VISIBLE_DEVICES=0,1 python demo_detgpt.py --cfg-path configs/detgpt_eval_13b.yaml\n```\n\n\n## Acknowledgement\nThe project is built on top of the amazing open-vocabulary detector [GroundingDino](https://github.com/IDEA-Research/GroundingDINO) and the multimodal conversation model [MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4), which is based on [BLIP2](https://huggingface.co/docs/transformers/main/model_doc/blip-2) and [Lavis](https://github.com/salesforce/LAVIS). \nThanks for this great work!\n\n\nIf you're using DetGPT in your research or applications, please cite using this BibTeX:\n```bibtex\n@misc{pi2023detgpt,\n      title={DetGPT: Detect What You Need via Reasoning}, \n      author={Renjie Pi and Jiahui Gao and Shizhe Diao and Rui Pan and Hanze Dong and Jipeng Zhang and Lewei Yao and Jianhua Han and Hang Xu and Lingpeng Kong and Tong Zhang},\n      year={2023},\n      eprint={2305.14167},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV}\n}\n```\n\n\n## License\nThis repository is released under [BSD 3-Clause License](LICENSE.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foptimalscale%2Fdetgpt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foptimalscale%2Fdetgpt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foptimalscale%2Fdetgpt/lists"}