{"id":13429507,"url":"https://github.com/IDEA-Research/GroundingDINO","last_synced_at":"2025-03-16T03:31:48.470Z","repository":{"id":143013637,"uuid":"611591640","full_name":"IDEA-Research/GroundingDINO","owner":"IDEA-Research","description":"[ECCV 2024] Official implementation of the paper \"Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection\"","archived":false,"fork":false,"pushed_at":"2024-08-12T08:52:02.000Z","size":13120,"stargazers_count":7634,"open_issues_count":287,"forks_count":773,"subscribers_count":46,"default_branch":"main","last_synced_at":"2025-03-15T21:51:35.106Z","etag":null,"topics":["object-detection","open-world","open-world-detection","vision-language","vision-language-transformer"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2303.05499","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IDEA-Research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-09T06:14:41.000Z","updated_at":"2025-03-15T19:32:19.000Z","dependencies_parsed_at":null,"dependency_job_id":"c0c6df12-74cb-4f4b-8ab5-4a9d0f810d2e","html_url":"https://github.com/IDEA-Research/GroundingDINO","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDEA-Research%2FGroundingDINO","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDEA-Research%2FGroundingDINO/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositori
es/IDEA-Research%2FGroundingDINO/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDEA-Research%2FGroundingDINO/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/IDEA-Research","download_url":"https://codeload.github.com/IDEA-Research/GroundingDINO/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243796675,"owners_count":20349255,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["object-detection","open-world","open-world-detection","vision-language","vision-language-transformer"],"created_at":"2024-07-31T02:00:40.894Z","updated_at":"2025-03-16T03:31:48.464Z","avatar_url":"https://github.com/IDEA-Research.png","language":"Python","funding_links":[],"categories":["Projects 💻","2 Foundation Models","Python","Segmentation + Vision-Language","4. 
机器学习项目 | ML","Paper List","Applications","Repos","🔧 SAM Extensions \u0026 Variants","Object Detection Applications","Vision Model Backbone","Stable Diffusion 扩展插件","UI Understanding and Computer Use","Semantic \u0026 Open-Vocabulary Perception"],"sub_categories":["2.2 Vision Foundation Models","Seminal Papers","提示语（魔法）","🎯 Grounding \u0026 Multi-Modal","整合服务","Projects and references","Open-Vocabulary Detection"],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./.asset/grounding_dino_logo.png\" width=\"30%\"\u003e\n\u003c/div\u003e\n\n# :sauropod: Grounding DINO \n\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/grounding-dino-marrying-dino-with-grounded/zero-shot-object-detection-on-mscoco)](https://paperswithcode.com/sota/zero-shot-object-detection-on-mscoco?p=grounding-dino-marrying-dino-with-grounded) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/grounding-dino-marrying-dino-with-grounded/zero-shot-object-detection-on-odinw)](https://paperswithcode.com/sota/zero-shot-object-detection-on-odinw?p=grounding-dino-marrying-dino-with-grounded) \\\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/grounding-dino-marrying-dino-with-grounded/object-detection-on-coco-minival)](https://paperswithcode.com/sota/object-detection-on-coco-minival?p=grounding-dino-marrying-dino-with-grounded) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/grounding-dino-marrying-dino-with-grounded/object-detection-on-coco)](https://paperswithcode.com/sota/object-detection-on-coco?p=grounding-dino-marrying-dino-with-grounded)\n\n\n**[IDEA-CVR, IDEA-Research](https://github.com/IDEA-Research)** \n\n[Shilong Liu](http://www.lsl.zone/), [Zhaoyang Zeng](https://scholar.google.com/citations?user=U_cvvUwAAAAJ\u0026hl=zh-CN\u0026oi=ao), [Tianhe Ren](https://rentainhe.github.io/), [Feng 
Li](https://scholar.google.com/citations?user=ybRe9GcAAAAJ\u0026hl=zh-CN), [Hao Zhang](https://scholar.google.com/citations?user=B8hPxMQAAAAJ\u0026hl=zh-CN), [Jie Yang](https://github.com/yangjie-cv), [Chunyuan Li](https://scholar.google.com/citations?user=Zd7WmXUAAAAJ\u0026hl=zh-CN\u0026oi=ao), [Jianwei Yang](https://jwyang.github.io/), [Hang Su](https://scholar.google.com/citations?hl=en\u0026user=dxN1_X0AAAAJ\u0026view_op=list_works\u0026sortby=pubdate), [Jun Zhu](https://scholar.google.com/citations?hl=en\u0026user=axsP38wAAAAJ), [Lei Zhang](https://www.leizhang.org/)\u003csup\u003e:email:\u003c/sup\u003e.\n\n\n[[`Paper`](https://arxiv.org/abs/2303.05499)] [[`Demo`](https://huggingface.co/spaces/ShilongLiu/Grounding_DINO_demo)] [[`BibTex`](#black_nib-citation)]\n\n\nPyTorch implementation and pretrained models for Grounding DINO. For details, see the paper **[Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection](https://arxiv.org/abs/2303.05499)**.\n\n- 🔥 **[Grounded SAM 2](https://github.com/IDEA-Research/Grounded-SAM-2)** is released now, which combines Grounding DINO with [SAM 2](https://github.com/facebookresearch/segment-anything-2) for any object tracking in open-world scenarios.\n- 🔥 **[Grounding DINO 1.5](https://github.com/IDEA-Research/Grounding-DINO-1.5-API)** is released now, which is IDEA Research's **Most Capable** Open-World Object Detection Model!\n- 🔥 **[Grounding DINO](https://arxiv.org/abs/2303.05499)** and **[Grounded SAM](https://arxiv.org/abs/2401.14159)** are now supported in Huggingface. 
For more convenient use, you can refer to [this documentation](https://huggingface.co/docs/transformers/model_doc/grounding-dino)\n\n## :sun_with_face: Helpful Tutorial\n\n- :grapes: [[Read our arXiv Paper](https://arxiv.org/abs/2303.05499)]\n- :apple:  [[Watch our simple introduction video on YouTube](https://youtu.be/wxWDt5UiwY8)]\n- :blossom:   \u0026nbsp;[[Try the Colab Demo](https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/zero-shot-object-detection-with-grounding-dino.ipynb)]\n- :sunflower: [[Try our Official Huggingface Demo](https://huggingface.co/spaces/ShilongLiu/Grounding_DINO_demo)]\n- :maple_leaf: [[Watch the Step by Step Tutorial about GroundingDINO by Roboflow AI](https://youtu.be/cMa77r3YrDk)]\n- :mushroom: [[GroundingDINO: Automated Dataset Annotation and Evaluation by Roboflow AI](https://youtu.be/C4NqaRBz_Kw)]\n- :hibiscus: [[Accelerate Image Annotation with SAM and GroundingDINO by Roboflow AI](https://youtu.be/oEQYStnF2l8)]\n- :white_flower: [[Autodistill: Train YOLOv8 with ZERO Annotations based on Grounding-DINO and Grounded-SAM by Roboflow AI](https://github.com/autodistill/autodistill)]\n\n\u003c!-- Grounding DINO Methods | \n[![arXiv](https://img.shields.io/badge/arXiv-2303.05499-b31b1b.svg)](https://arxiv.org/abs/2303.05499) \n[![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://youtu.be/wxWDt5UiwY8) --\u003e\n\n\u003c!-- Grounding DINO Demos |\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/zero-shot-object-detection-with-grounding-dino.ipynb) --\u003e\n\u003c!-- [![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://youtu.be/cMa77r3YrDk)\n[![HuggingFace 
space](https://img.shields.io/badge/🤗-HuggingFace%20Space-cyan.svg)](https://huggingface.co/spaces/ShilongLiu/Grounding_DINO_demo)\n[![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://youtu.be/oEQYStnF2l8)\n[![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://youtu.be/C4NqaRBz_Kw) --\u003e\n\n## :sparkles: Highlight Projects\n\n- [Semantic-SAM: a universal image segmentation model to enable segment and recognize anything at any desired granularity.](https://github.com/UX-Decoder/Semantic-SAM), \n- [DetGPT: Detect What You Need via Reasoning](https://github.com/OptimalScale/DetGPT)\n- [Grounded-SAM: Marrying Grounding DINO with Segment Anything](https://github.com/IDEA-Research/Grounded-Segment-Anything)\n- [Grounding DINO with Stable Diffusion](demo/image_editing_with_groundingdino_stablediffusion.ipynb)\n- [Grounding DINO with GLIGEN for Controllable Image Editing](demo/image_editing_with_groundingdino_gligen.ipynb)\n- [OpenSeeD: A Simple and Strong Openset Segmentation Model](https://github.com/IDEA-Research/OpenSeeD)\n- [SEEM: Segment Everything Everywhere All at Once](https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once)\n- [X-GPT: Conversational Visual Agent supported by X-Decoder](https://github.com/microsoft/X-Decoder/tree/xgpt)\n- [GLIGEN: Open-Set Grounded Text-to-Image Generation](https://github.com/gligen/GLIGEN)\n- [LLaVA: Large Language and Vision Assistant](https://github.com/haotian-liu/LLaVA)\n\n\u003c!-- Extensions | [Grounding DINO with Segment Anything](https://github.com/IDEA-Research/Grounded-Segment-Anything); [Grounding DINO with Stable Diffusion](demo/image_editing_with_groundingdino_stablediffusion.ipynb); [Grounding DINO with GLIGEN](demo/image_editing_with_groundingdino_gligen.ipynb)  --\u003e\n\n\n\n\u003c!-- Official PyTorch implementation of [Grounding DINO](https://arxiv.org/abs/2303.05499), a stronger open-set object detector. Code is available now! 
--\u003e\n\n\n## :bulb: Highlight\n\n- **Open-Set Detection.** Detect **everything** with language!\n- **High Performance.** COCO zero-shot **52.5 AP** (training without COCO data!). COCO fine-tune **63.0 AP**.\n- **Flexible.** Collaboration with Stable Diffusion for image editing.\n\n\n\n\n## :fire: News\n- **`2023/07/18`**: We release [Semantic-SAM](https://github.com/UX-Decoder/Semantic-SAM), a universal image segmentation model that can segment and recognize anything at any desired granularity. **Code** and **checkpoint** are available!\n- **`2023/06/17`**: We provide an example to evaluate Grounding DINO's zero-shot performance on COCO.\n- **`2023/04/15`**: If you are interested in open-set recognition, refer to [CV in the Wild Readings](https://github.com/Computer-Vision-in-the-Wild/CVinW_Readings)!\n- **`2023/04/08`**: We release [demos](demo/image_editing_with_groundingdino_gligen.ipynb) to combine [Grounding DINO](https://arxiv.org/abs/2303.05499) with [GLIGEN](https://github.com/gligen/GLIGEN) for more controllable image editing.\n- **`2023/04/08`**: We release [demos](demo/image_editing_with_groundingdino_stablediffusion.ipynb) to combine [Grounding DINO](https://arxiv.org/abs/2303.05499) with [Stable Diffusion](https://github.com/Stability-AI/StableDiffusion) for image editing.\n- **`2023/04/06`**: We built a new demo, **[Grounded-Segment-Anything](https://github.com/IDEA-Research/Grounded-Segment-Anything)**, by marrying GroundingDINO with [Segment-Anything](https://github.com/facebookresearch/segment-anything); it aims to support segmentation in GroundingDINO.\n- **`2023/03/28`**: A YouTube [video](https://youtu.be/cMa77r3YrDk) about Grounding DINO and basic object detection prompt engineering. [[SkalskiP](https://github.com/SkalskiP)]\n- **`2023/03/28`**: Added a [demo](https://huggingface.co/spaces/ShilongLiu/Grounding_DINO_demo) on Hugging Face Space!\n- **`2023/03/27`**: Support CPU-only mode. 
Now the model can run on machines without GPUs.\n- **`2023/03/25`**: A [demo](https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/zero-shot-object-detection-with-grounding-dino.ipynb) for Grounding DINO is available on Colab. [[SkalskiP](https://github.com/SkalskiP)]\n- **`2023/03/22`**: Code is available now!\n\n\u003cdetails open\u003e\n\u003csummary\u003e\u003cfont size=\"4\"\u003e\nDescription\n\u003c/font\u003e\u003c/summary\u003e\n \u003ca href=\"https://arxiv.org/abs/2303.05499\"\u003ePaper\u003c/a\u003e introduction.\n\u003cimg src=\".asset/hero_figure.png\" alt=\"ODinW\" width=\"100%\"\u003e\nMarrying \u003ca href=\"https://github.com/IDEA-Research/GroundingDINO\"\u003eGrounding DINO\u003c/a\u003e and \u003ca href=\"https://github.com/gligen/GLIGEN\"\u003eGLIGEN\u003c/a\u003e\n\u003cimg src=\"https://huggingface.co/ShilongLiu/GroundingDINO/resolve/main/GD_GLIGEN.png\" alt=\"gd_gligen\" width=\"100%\"\u003e\n\u003c/details\u003e\n\n## :star: Explanations/Tips for Grounding DINO Inputs and Outputs\n- Grounding DINO accepts an `(image, text)` pair as input.\n- It outputs `900` (by default) object boxes. Each box has similarity scores across all input words (as shown in the figures below).\n- By default, we keep the boxes whose highest similarity score exceeds `box_threshold`.\n- We extract the words whose similarity scores exceed `text_threshold` as predicted labels.\n- If you want to obtain the objects for a specific phrase, like `dogs` in the sentence `two dogs with a stick.`, you can select the boxes with the highest text similarity to `dogs` as final outputs. \n- Note that each word can be split into **more than one** token, depending on the tokenizer. 
The number of words in a sentence may not equal the number of text tokens.\n- We suggest separating different category names with `.` for Grounding DINO.\n![model_explain1](.asset/model_explan1.PNG)\n![model_explain2](.asset/model_explan2.PNG)\n\n## :label: TODO \n\n- [x] Release inference code and demo.\n- [x] Release checkpoints.\n- [x] Grounding DINO with Stable Diffusion and GLIGEN demos.\n- [ ] Release training code.\n\n## :hammer_and_wrench: Install \n\n**Note:**\n\n0. If you have a CUDA environment, please make sure the environment variable `CUDA_HOME` is set. The package will be compiled in CPU-only mode if no CUDA is available.\n\nPlease follow the installation steps strictly; otherwise the program may produce: \n```bash\nNameError: name '_C' is not defined\n```\n\nIf this happens, please reinstall GroundingDINO by re-cloning the repository and repeating all the installation steps.\n \n#### How to check CUDA:\n```bash\necho $CUDA_HOME\n```\nIf it prints nothing, the path has not been set.\n\nRun this to set the environment variable in the current shell. \n```bash\nexport CUDA_HOME=/path/to/cuda-11.3\n```\n\nNote that the CUDA version should match your CUDA runtime, since multiple CUDA versions may be installed at the same time. \n\nIf you want to set `CUDA_HOME` permanently, store it using:\n\n```bash\necho 'export CUDA_HOME=/path/to/cuda' \u003e\u003e ~/.bashrc\n```\nAfter that, source the bashrc file and check `CUDA_HOME`:\n```bash\nsource ~/.bashrc\necho $CUDA_HOME\n```\n\nIn this example, `/path/to/cuda-11.3` should be replaced with the path where your CUDA toolkit is installed. You can find this by typing **which nvcc** in your terminal.\n\nFor instance, \nif the output is `/usr/local/cuda/bin/nvcc`, then:\n```bash\nexport CUDA_HOME=/usr/local/cuda\n```\n**Installation:**\n\n1. Clone the GroundingDINO repository from GitHub.\n\n```bash\ngit clone https://github.com/IDEA-Research/GroundingDINO.git\n```\n\n2. 
Change the current directory to the GroundingDINO folder.\n\n```bash\ncd GroundingDINO/\n```\n\n3. Install the required dependencies in the current directory.\n\n```bash\npip install -e .\n```\n\n4. Download pre-trained model weights.\n\n```bash\nmkdir weights\ncd weights\nwget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth\ncd ..\n```\n\n## :arrow_forward: Demo\nCheck your GPU ID (only if you're using a GPU):\n\n```bash\nnvidia-smi\n```\nReplace `{GPU ID}`, `image_you_want_to_detect.jpg`, and `\"dir you want to save the output\"` with appropriate values in the following command:\n```bash\nCUDA_VISIBLE_DEVICES={GPU ID} python demo/inference_on_a_image.py \\\n-c groundingdino/config/GroundingDINO_SwinT_OGC.py \\\n-p weights/groundingdino_swint_ogc.pth \\\n-i image_you_want_to_detect.jpg \\\n-o \"dir you want to save the output\" \\\n-t \"chair\"\n [--cpu-only] # optional flag for CPU-only mode\n```\n\nIf you would like to specify the phrases to detect, here is a demo:\n```bash\nCUDA_VISIBLE_DEVICES={GPU ID} python demo/inference_on_a_image.py \\\n-c groundingdino/config/GroundingDINO_SwinT_OGC.py \\\n-p ./groundingdino_swint_ogc.pth \\\n-i .asset/cat_dog.jpeg \\\n-o logs/1111 \\\n-t \"There is a cat and a dog in the image .\" \\\n--token_spans \"[[[9, 10], [11, 14]], [[19, 20], [21, 24]]]\"\n [--cpu-only] # optional flag for CPU-only mode\n```\nThe `token_spans` argument specifies the start and end character positions of each phrase in the text prompt. For example, the first phrase is `[[9, 10], [11, 14]]`. `\"There is a cat and a dog in the image .\"[9:10] = 'a'`, `\"There is a cat and a dog in the image .\"[11:14] = 'cat'`. Hence it refers to the phrase `a cat`. 
Similarly, `[[19, 20], [21, 24]]` refers to the phrase `a dog`.\n\nSee `demo/inference_on_a_image.py` for more details.\n\n**Running with Python:**\n\n```python\nfrom groundingdino.util.inference import load_model, load_image, predict, annotate\nimport cv2\n\n# Load the model from its config file and checkpoint.\nmodel = load_model(\"groundingdino/config/GroundingDINO_SwinT_OGC.py\", \"weights/groundingdino_swint_ogc.pth\")\nIMAGE_PATH = \"weights/dog-3.jpeg\"\n# Separate category names with \" . \".\nTEXT_PROMPT = \"chair . person . dog .\"\nBOX_THRESHOLD = 0.35\nTEXT_THRESHOLD = 0.25\n\nimage_source, image = load_image(IMAGE_PATH)\n\nboxes, logits, phrases = predict(\n    model=model,\n    image=image,\n    caption=TEXT_PROMPT,\n    box_threshold=BOX_THRESHOLD,\n    text_threshold=TEXT_THRESHOLD\n)\n\n# Draw the predicted boxes and labels on the source image and save the result.\nannotated_frame = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)\ncv2.imwrite(\"annotated_image.jpg\", annotated_frame)\n```\n**Web UI**\n\nWe also provide demo code that integrates Grounding DINO with a Gradio Web UI. See the file `demo/gradio_app.py` for more details.\n\n**Notebooks**\n\n- We release [demos](demo/image_editing_with_groundingdino_gligen.ipynb) to combine [Grounding DINO](https://arxiv.org/abs/2303.05499) with [GLIGEN](https://github.com/gligen/GLIGEN) for more controllable image editing.\n- We release [demos](demo/image_editing_with_groundingdino_stablediffusion.ipynb) to combine [Grounding DINO](https://arxiv.org/abs/2303.05499) with [Stable Diffusion](https://github.com/Stability-AI/StableDiffusion) for image editing.\n\n## COCO Zero-shot Evaluations\n\nWe provide an example to evaluate Grounding DINO zero-shot performance on COCO. 
The result should be **48.5** AP.\n\n```bash\nCUDA_VISIBLE_DEVICES=0 \\\npython demo/test_ap_on_coco.py \\\n -c groundingdino/config/GroundingDINO_SwinT_OGC.py \\\n -p weights/groundingdino_swint_ogc.pth \\\n --anno_path /path/to/annotations/ie/instances_val2017.json \\\n --image_dir /path/to/imagedir/ie/val2017\n```\n\n\n## :luggage: Checkpoints\n\n\u003c!-- insert a table --\u003e\n\u003ctable\u003e\n  \u003cthead\u003e\n    \u003ctr style=\"text-align: right;\"\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003eName\u003c/th\u003e\n      \u003cth\u003eBackbone\u003c/th\u003e\n      \u003cth\u003eData\u003c/th\u003e\n      \u003cth\u003eBox AP on COCO\u003c/th\u003e\n      \u003cth\u003eCheckpoint\u003c/th\u003e\n      \u003cth\u003eConfig\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003cth\u003e1\u003c/th\u003e\n      \u003ctd\u003eGroundingDINO-T\u003c/td\u003e\n      \u003ctd\u003eSwin-T\u003c/td\u003e\n      \u003ctd\u003eO365,GoldG,Cap4M\u003c/td\u003e\n      \u003ctd\u003e48.4 (zero-shot) / 57.2 (fine-tune)\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth\"\u003eGitHub link\u003c/a\u003e | \u003ca href=\"https://huggingface.co/ShilongLiu/GroundingDINO/resolve/main/groundingdino_swint_ogc.pth\"\u003eHF link\u003c/a\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://github.com/IDEA-Research/GroundingDINO/blob/main/groundingdino/config/GroundingDINO_SwinT_OGC.py\"\u003elink\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e2\u003c/th\u003e\n      \u003ctd\u003eGroundingDINO-B\u003c/td\u003e\n      \u003ctd\u003eSwin-B\u003c/td\u003e\n      \u003ctd\u003eCOCO,O365,GoldG,Cap4M,OpenImage,ODinW-35,RefCOCO\u003c/td\u003e\n      \u003ctd\u003e56.7 \u003c/td\u003e\n      \u003ctd\u003e\u003ca 
href=\"https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth\"\u003eGitHub link\u003c/a\u003e  | \u003ca href=\"https://huggingface.co/ShilongLiu/GroundingDINO/resolve/main/groundingdino_swinb_cogcoor.pth\"\u003eHF link\u003c/a\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://github.com/IDEA-Research/GroundingDINO/blob/main/groundingdino/config/GroundingDINO_SwinB_cfg.py\"\u003elink\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\n## :medal_military: Results\n\n\u003cdetails open\u003e\n\u003csummary\u003e\u003cfont size=\"4\"\u003e\nCOCO Object Detection Results\n\u003c/font\u003e\u003c/summary\u003e\n\u003cimg src=\".asset/COCO.png\" alt=\"COCO\" width=\"100%\"\u003e\n\u003c/details\u003e\n\n\u003cdetails open\u003e\n\u003csummary\u003e\u003cfont size=\"4\"\u003e\nODinW Object Detection Results\n\u003c/font\u003e\u003c/summary\u003e\n\u003cimg src=\".asset/ODinW.png\" alt=\"ODinW\" width=\"100%\"\u003e\n\u003c/details\u003e\n\n\u003cdetails open\u003e\n\u003csummary\u003e\u003cfont size=\"4\"\u003e\nMarrying Grounding DINO with \u003ca href=\"https://github.com/Stability-AI/StableDiffusion\"\u003eStable Diffusion\u003c/a\u003e for Image Editing\n\u003c/font\u003e\u003c/summary\u003e\nSee our example \u003ca href=\"https://github.com/IDEA-Research/GroundingDINO/blob/main/demo/image_editing_with_groundingdino_stablediffusion.ipynb\"\u003enotebook\u003c/a\u003e for more details.\n\u003cimg src=\".asset/GD_SD.png\" alt=\"GD_SD\" width=\"100%\"\u003e\n\u003c/details\u003e\n\n\n\u003cdetails open\u003e\n\u003csummary\u003e\u003cfont size=\"4\"\u003e\nMarrying Grounding DINO with \u003ca href=\"https://github.com/gligen/GLIGEN\"\u003eGLIGEN\u003c/a\u003e for more Detailed Image Editing.\n\u003c/font\u003e\u003c/summary\u003e\nSee our example \u003ca 
href=\"https://github.com/IDEA-Research/GroundingDINO/blob/main/demo/image_editing_with_groundingdino_gligen.ipynb\"\u003enotebook\u003c/a\u003e for more details.\n\u003cimg src=\".asset/GD_GLIGEN.png\" alt=\"GD_GLIGEN\" width=\"100%\"\u003e\n\u003c/details\u003e\n\n## :sauropod: Model: Grounding DINO\n\nIncludes: a text backbone, an image backbone, a feature enhancer, a language-guided query selection, and a cross-modality decoder.\n\n![arch](.asset/arch.png)\n\n\n## :hearts: Acknowledgement\n\nOur model is related to [DINO](https://github.com/IDEA-Research/DINO) and [GLIP](https://github.com/microsoft/GLIP). Thanks for their great work!\n\nWe also thank great previous work including DETR, Deformable DETR, SMCA, Conditional DETR, Anchor DETR, Dynamic DETR, DAB-DETR, DN-DETR, etc. More related work is available at [Awesome Detection Transformer](https://github.com/IDEACVR/awesome-detection-transformer). A new toolbox [detrex](https://github.com/IDEA-Research/detrex) is available as well.\n\nThanks to [Stable Diffusion](https://github.com/Stability-AI/StableDiffusion) and [GLIGEN](https://github.com/gligen/GLIGEN) for their awesome models.\n\n\n## :black_nib: Citation\n\nIf you find our work helpful for your research, please consider citing the following BibTeX entry.   \n\n```bibtex\n@article{liu2023grounding,\n  title={Grounding dino: Marrying dino with grounded pre-training for open-set object detection},\n  author={Liu, Shilong and Zeng, Zhaoyang and Ren, Tianhe and Li, Feng and Zhang, Hao and Yang, Jie and Li, Chunyuan and Yang, Jianwei and Su, Hang and Zhu, Jun and others},\n  journal={arXiv preprint arXiv:2303.05499},\n  year={2023}\n}\n```\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FIDEA-Research%2FGroundingDINO","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FIDEA-Research%2FGroundingDINO","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FIDEA-Research%2FGroundingDINO/lists"}