{"id":15033333,"url":"https://github.com/ailab-cvc/yolo-world","last_synced_at":"2025-05-12T11:17:34.374Z","repository":{"id":220051146,"uuid":"749596944","full_name":"AILab-CVC/YOLO-World","owner":"AILab-CVC","description":"[CVPR 2024] Real-Time Open-Vocabulary Object Detection","archived":false,"fork":false,"pushed_at":"2025-02-26T18:29:03.000Z","size":3345,"stargazers_count":5390,"open_issues_count":389,"forks_count":517,"subscribers_count":46,"default_branch":"master","last_synced_at":"2025-05-12T11:17:29.740Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://www.yoloworld.cc","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AILab-CVC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-29T02:04:07.000Z","updated_at":"2025-05-12T07:50:59.000Z","dependencies_parsed_at":"2024-02-03T13:23:06.025Z","dependency_job_id":"b4f23f86-58d0-4ec0-ac37-d0f783bce9cb","html_url":"https://github.com/AILab-CVC/YOLO-World","commit_stats":{"total_commits":121,"total_committers":22,"mean_commits":5.5,"dds":0.4132231404958677,"last_synced_commit":"b4fd87838d7f53adc0dbf5844313b92d9e3124c7"},"previous_names":["ailab-cvc/yolo-world"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FYOLO-World","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FYOLO-World/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FYOLO-World/releases","manifests_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FYOLO-World/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AILab-CVC","download_url":"https://codeload.github.com/AILab-CVC/YOLO-World/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253726906,"owners_count":21954096,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-09-24T20:20:51.020Z","updated_at":"2025-05-12T11:17:34.351Z","avatar_url":"https://github.com/AILab-CVC.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"./assets/yolo_logo.png\" width=60%\u003e\n\u003cbr\u003e\n\u003ca href=\"https://scholar.google.com/citations?hl=zh-CN\u0026user=PH8rJHYAAAAJ\"\u003eTianheng Cheng\u003c/a\u003e\u003csup\u003e\u003cspan\u003e2,3,*\u003c/span\u003e\u003c/sup\u003e, \n\u003ca href=\"https://linsong.info/\"\u003eLin Song\u003c/a\u003e\u003csup\u003e\u003cspan\u003e1,📧,*\u003c/span\u003e\u003c/sup\u003e,\n\u003ca href=\"https://yxgeee.github.io/\"\u003eYixiao Ge\u003c/a\u003e\u003csup\u003e\u003cspan\u003e1,🌟,2\u003c/span\u003e\u003c/sup\u003e,\n\u003ca href=\"http://eic.hust.edu.cn/professor/liuwenyu/\"\u003e Wenyu Liu\u003c/a\u003e\u003csup\u003e\u003cspan\u003e3\u003c/span\u003e\u003c/sup\u003e,\n\u003ca href=\"https://xwcv.github.io/\"\u003eXinggang Wang\u003c/a\u003e\u003csup\u003e\u003cspan\u003e3,📧\u003c/span\u003e\u003c/sup\u003e,\n\u003ca 
href=\"https://scholar.google.com/citations?user=4oXBp9UAAAAJ\u0026hl=en\"\u003eYing Shan\u003c/a\u003e\u003csup\u003e\u003cspan\u003e1,2\u003c/span\u003e\u003c/sup\u003e\n\u003c/br\u003e\n\n\\* Equal contribution 🌟 Project lead 📧 Corresponding author\n\n\u003csup\u003e1\u003c/sup\u003e Tencent AI Lab,  \u003csup\u003e2\u003c/sup\u003e ARC Lab, Tencent PCG\n\u003csup\u003e3\u003c/sup\u003e Huazhong University of Science and Technology\n\u003cbr\u003e\n\u003cdiv\u003e\n\n[![arxiv paper](https://img.shields.io/badge/Project-Page-green)](https://wondervictor.github.io/)\n[![arxiv paper](https://img.shields.io/badge/arXiv-Paper-red)](https://arxiv.org/abs/2401.17270)\n\u003ca href=\"https://colab.research.google.com/github/AILab-CVC/YOLO-World/blob/master/inference.ipynb\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"\u003e\u003c/a\u003e\n[![demo](https://img.shields.io/badge/🤗HugginngFace-Spaces-orange)](https://huggingface.co/spaces/stevengrove/YOLO-World)\n[![Replicate](https://replicate.com/zsxkib/yolo-world/badge)](https://replicate.com/zsxkib/yolo-world)\n[![hfpaper](https://img.shields.io/badge/🤗HugginngFace-Paper-yellow)](https://huggingface.co/papers/2401.17270)\n[![license](https://img.shields.io/badge/License-GPLv3.0-blue)](LICENSE)\n[![yoloworldseg](https://img.shields.io/badge/YOLOWorldxEfficientSAM-🤗Spaces-orange)](https://huggingface.co/spaces/SkalskiP/YOLO-World)\n[![yologuide](https://img.shields.io/badge/📖Notebook-roboflow-purple)](https://supervision.roboflow.com/develop/notebooks/zero-shot-object-detection-with-yolo-world)\n[![deploy](https://media.roboflow.com/deploy.svg)](https://inference.roboflow.com/foundation/yolo_world/)\n\n\u003c/div\u003e\n\u003c/div\u003e\n\n## Notice\n\n**YOLO-World is still under active development!**\n\nWe recommend that everyone **use English to communicate on issues**, as this helps developers from around the world discuss, share experiences, and answer questions 
together.\n\nFor business licensing and other related inquiries, don't hesitate to contact `yixiaoge@tencent.com`.\n\n## 🔥 Updates \n`[2025-2-8]:` We release YOLO-World-V2.1, which includes new pre-trained weights and training code for image prompts. Please see the update [YOLO-World-V2.1-Blog](./docs/update_20250123.md) for details.\\\n`[2024-11-5]:` We update `YOLO-World-Image`, and you can try it at HuggingFace [YOLO-World-Image (Preview Version)](https://huggingface.co/spaces/wondervictor/YOLO-World-Image). It's a *preview* version and we are still improving it! Detailed documents about training and few-shot inference are coming soon.\\\n`[2024-7-8]:` YOLO-World has been integrated into [ComfyUI](https://github.com/StevenGrove/ComfyUI-YOLOWorld)! Come and try adding YOLO-World to your workflow! You can access it at [StevenGrove/ComfyUI-YOLOWorld](https://github.com/StevenGrove/ComfyUI-YOLOWorld)!  \n`[2024-5-18]:` YOLO-World models have been [integrated with the FiftyOne computer vision toolkit](https://docs.voxel51.com/integrations/ultralytics.html#open-vocabulary-detection) for streamlined open-vocabulary inference across image and video datasets.  \n`[2024-5-16]:` Hey guys! Long time no see! This update contains (1) [fine-tuning guide](https://github.com/AILab-CVC/YOLO-World?#highlights--introduction) and (2) [TFLite Export](./docs/tflite_deploy.md) with INT8 Quantization.  \n`[2024-5-9]:` This update contains the real [`reparameterization`](./docs/reparameterize.md) 🪄, which is better for fine-tuning on custom datasets and improves training/inference efficiency 🚀!  \n`[2024-4-28]:` Long time no see! This update contains bug fixes and improvements: (1) ONNX demo; (2) image demo (supports tensor input); (3) new pre-trained models; (4) image prompts; (5) simple version for fine-tuning / deployment; (6) installation guide (including a `requirements.txt`).  
\n`[2024-3-28]:` We provide: (1) more high-resolution pre-trained models (e.g., S, M, X) ([#142](https://github.com/AILab-CVC/YOLO-World/issues/142)); (2) pre-trained models with CLIP-Large text encoders. Most importantly, we provide a preliminary fix for **fine-tuning without `mask-refine`** and explore a new fine-tuning setting ([#160](https://github.com/AILab-CVC/YOLO-World/issues/160), [#76](https://github.com/AILab-CVC/YOLO-World/issues/76)). In addition, fine-tuning YOLO-World with `mask-refine` also obtains significant improvements; see [configs/finetune_coco](./configs/finetune_coco/) for more details.  \n`[2024-3-16]:` We fix bugs in the demo ([#110](https://github.com/AILab-CVC/YOLO-World/issues/110), [#94](https://github.com/AILab-CVC/YOLO-World/issues/94), [#129](https://github.com/AILab-CVC/YOLO-World/issues/129), [#125](https://github.com/AILab-CVC/YOLO-World/issues/125)) with visualizations of segmentation masks, and release [**YOLO-World with Embeddings**](./docs/prompt_yolo_world.md), which supports prompt tuning, text prompts, and image prompts.  \n`[2024-3-3]:` We add the **high-resolution YOLO-World**, which supports `1280x1280` resolution with higher accuracy and better performance for small objects!  \n`[2024-2-29]:` We release the newest version of [**YOLO-World-v2**](./docs/updates.md) with higher accuracy and faster speed! We hope the community can join us to improve YOLO-World!  \n`[2024-2-28]:` Excited to announce that YOLO-World has been accepted by **CVPR 2024**! We're continuing to make YOLO-World faster, stronger, and easier to use for everyone.  \n`[2024-2-22]:` We sincerely thank [RoboFlow](https://roboflow.com/) and [@Skalskip92](https://twitter.com/skalskip92) for the [**Video Guide**](https://www.youtube.com/watch?v=X7gKBGVz4vs) about YOLO-World. Nice work!  \n`[2024-2-18]:` We thank [@Skalskip92](https://twitter.com/skalskip92) for developing the wonderful segmentation demo by connecting YOLO-World and EfficientSAM. 
You can try it now at the [🤗 HuggingFace Spaces](https://huggingface.co/spaces/SkalskiP/YOLO-World).   \n`[2024-2-17]:` The largest model **X** of YOLO-World is released, which achieves better zero-shot performance!   \n`[2024-2-17]:` We release the code \u0026 models for **YOLO-World-Seg** now! YOLO-World now supports open-vocabulary / zero-shot object segmentation!  \n`[2024-2-15]:` The pre-trained YOLO-World-L with CC3M-Lite is released!     \n`[2024-2-14]:` We provide the [`image_demo`](demo.py) for inference on images or directories.   \n`[2024-2-10]:` We provide the [fine-tuning](./docs/finetuning.md) and [data](./docs/data.md) details for fine-tuning YOLO-World on the COCO dataset or custom datasets!  \n`[2024-2-3]:` We support the `Gradio` demo now in the repo and you can build the YOLO-World demo on your own device!  \n`[2024-2-1]:` We've released the code and weights of YOLO-World now!  \n`[2024-2-1]:` We deploy the YOLO-World demo on [HuggingFace 🤗](https://huggingface.co/spaces/stevengrove/YOLO-World); you can try it now!  \n`[2024-1-31]:` We are excited to launch **YOLO-World**, a cutting-edge real-time open-vocabulary object detector.  \n\n\n## TODO\n\nYOLO-World is under active development, so please stay tuned ☕️! \nIf you have suggestions📃 or ideas💡, **we would love for you to bring them up in the [Roadmap](https://github.com/AILab-CVC/YOLO-World/issues/109)** ❤️!\n\n## [FAQ (Frequently Asked Questions)](https://github.com/AILab-CVC/YOLO-World/discussions/149)\n\nWe have set up an FAQ about YOLO-World in the discussion on GitHub. 
It collects common questions, and we hope everyone will use it to raise issues or share solutions they find during use, and to quickly find answers there.\n\n\n## Highlights \u0026 Introduction\n\nThis repo contains the PyTorch implementation, pre-trained weights, and pre-training/fine-tuning code for YOLO-World.\n\n* YOLO-World is pre-trained on large-scale datasets, including detection, grounding, and image-text datasets.\n\n* YOLO-World is the next-generation YOLO detector, with a strong open-vocabulary detection capability and grounding ability.\n\n* YOLO-World presents a *prompt-then-detect* paradigm for efficient user-vocabulary inference, which re-parameterizes vocabulary embeddings as parameters into the model and achieves superior inference speed. You can try to export your own detection model without extra training or fine-tuning in our [online demo](https://huggingface.co/spaces/stevengrove/YOLO-World)!\n\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg width=800px src=\"./assets/yolo_arch.png\"\u003e\n\u003c/div\u003e\n\n### Zero-shot Evaluation Results for Pre-trained Models\nWe evaluate all YOLO-World-V2.1 models on LVIS, LVIS-mini, and COCO in a zero-shot manner, and compare them with the previous version (the improvements are annotated in the superscripts).\n\n\u003ctable\u003e\n    \u003ctr\u003e\n        \u003cth rowspan=\"2\"\u003eModel\u003c/th\u003e\u003cth rowspan=\"2\"\u003eResolution\u003c/th\u003e\u003cth colspan=\"4\" style=\"border-right: 1px solid\"\u003eLVIS AP\u003c/th\u003e\u003cth colspan=\"4\"\u003eLVIS-mini\u003c/th\u003e\u003cth colspan=\"3\" style=\"border-left: 1px solid\"\u003eCOCO\u003c/th\u003e\n    \u003c/tr\u003e\n        \u003ctd\u003eAP\u003c/td\u003e\u003ctd\u003eAP\u003csub\u003er\u003c/sub\u003e\u003c/td\u003e\u003ctd\u003eAP\u003csub\u003ec\u003c/sub\u003e\u003c/td\u003e\u003ctd style=\"border-right: 1px 
solid\"\u003eAP\u003csub\u003ef\u003c/sub\u003e\u003c/td\u003e\u003ctd\u003eAP\u003c/td\u003e\u003ctd\u003eAP\u003csub\u003er\u003c/sub\u003e\u003c/td\u003e\u003ctd\u003eAP\u003csub\u003ec\u003c/sub\u003e\u003c/td\u003e\u003ctd\u003eAP\u003csub\u003ef\u003c/sub\u003e\u003c/td\u003e\u003ctd style=\"border-left: 1px solid\"\u003eAP\u003c/td\u003e\u003ctd\u003eAP\u003csub\u003e50\u003c/sub\u003e\u003c/td\u003e\u003ctd\u003eAP\u003csub\u003e75\u003c/sub\u003e\u003c/td\u003e\n    \u003ctr style=\"border-top: 2px solid\"\u003e\n        \u003ctd\u003eYOLO-World-S\u003c/td\u003e\u003ctd\u003e640\u003c/td\u003e\u003ctd\u003e18.5\u003csup\u003e+1.2\u003c/sup\u003e\u003c/td\u003e\u003ctd\u003e12.6\u003c/td\u003e\u003ctd\u003e15.8\u003c/td\u003e\u003ctd style=\"border-right: 1px solid\"\u003e24.1\u003c/td\u003e\u003ctd\u003e23.6\u003csup\u003e+0.9\u003c/sup\u003e\u003c/td\u003e\u003ctd\u003e16.4\u003c/td\u003e\u003ctd\u003e21.5\u003c/td\u003e\u003ctd\u003e26.6\u003c/td\u003e\u003ctd style=\"border-left: 1px solid\"\u003e36.6\u003c/td\u003e\u003ctd\u003e51.0\u003c/td\u003e\u003ctd\u003e39.7\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eYOLO-World-S\u003c/td\u003e\u003ctd\u003e1280\u003c/td\u003e\u003ctd\u003e19.7\u003csup\u003e+0.9\u003c/sup\u003e\u003c/td\u003e\u003ctd\u003e13.5\u003c/td\u003e\u003ctd\u003e16.3\u003c/td\u003e\u003ctd style=\"border-right: 1px solid\"\u003e26.3\u003c/td\u003e\u003ctd\u003e25.5\u003csup\u003e+1.4\u003c/sup\u003e\u003c/td\u003e\u003ctd\u003e19.1\u003c/td\u003e\u003ctd\u003e22.6\u003c/td\u003e\u003ctd\u003e29.3\u003c/td\u003e\u003ctd style=\"border-left: 1px solid\"\u003e38.2\u003c/td\u003e\u003ctd\u003e54.2\u003c/td\u003e\u003ctd\u003e41.6\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr style=\"border-top: 2px solid\"\u003e\n        
\u003ctd\u003eYOLO-World-M\u003c/td\u003e\u003ctd\u003e640\u003c/td\u003e\u003ctd\u003e24.1\u003csup\u003e+0.6\u003c/sup\u003e\u003c/td\u003e\u003ctd\u003e16.9\u003c/td\u003e\u003ctd\u003e21.1\u003c/td\u003e\u003ctd style=\"border-right: 1px solid\"\u003e30.6\u003c/td\u003e\u003ctd\u003e30.6\u003csup\u003e+0.6\u003c/sup\u003e\u003c/td\u003e\u003ctd\u003e19.7\u003c/td\u003e\u003ctd\u003e29.0\u003c/td\u003e\u003ctd\u003e34.1\u003c/td\u003e\u003ctd style=\"border-left: 1px solid\"\u003e43.0\u003c/td\u003e\u003ctd\u003e58.6\u003c/td\u003e\u003ctd\u003e46.7\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eYOLO-World-M\u003c/td\u003e\u003ctd\u003e1280\u003c/td\u003e\u003ctd\u003e26.0\u003csup\u003e+0.7\u003c/sup\u003e\u003c/td\u003e\u003ctd\u003e19.9\u003c/td\u003e\u003ctd\u003e22.5\u003c/td\u003e\u003ctd style=\"border-right: 1px solid\"\u003e32.7\u003c/td\u003e\u003ctd\u003e32.7\u003csup\u003e+1.1\u003c/sup\u003e\u003c/td\u003e\u003ctd\u003e24.4\u003c/td\u003e\u003ctd\u003e30.2\u003c/td\u003e\u003ctd\u003e36.4\u003c/td\u003e\u003ctd style=\"border-left: 1px solid\"\u003e43.8\u003c/td\u003e\u003ctd\u003e60.3\u003c/td\u003e\u003ctd\u003e47.7\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr style=\"border-top: 2px solid\"\u003e\n        \u003ctd\u003eYOLO-World-L\u003c/td\u003e\u003ctd\u003e640\u003c/td\u003e\u003ctd\u003e26.8\u003csup\u003e+0.7\u003c/sup\u003e\u003c/td\u003e\u003ctd\u003e19.8\u003c/td\u003e\u003ctd\u003e23.6\u003c/td\u003e\u003ctd style=\"border-right: 1px solid\"\u003e33.4\u003c/td\u003e\u003ctd\u003e33.8\u003csup\u003e+0.9\u003c/sup\u003e\u003c/td\u003e\u003ctd\u003e24.5\u003c/td\u003e\u003ctd\u003e32.3\u003c/td\u003e\u003ctd\u003e36.8\u003c/td\u003e\u003ctd style=\"border-left: 1px solid\"\u003e44.9\u003c/td\u003e\u003ctd\u003e60.4\u003c/td\u003e\u003ctd\u003e48.9\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        
\u003ctd\u003eYOLO-World-L\u003c/td\u003e\u003ctd\u003e800\u003c/td\u003e\u003ctd\u003e28.3\u003c/td\u003e\u003ctd\u003e22.5\u003c/td\u003e\u003ctd\u003e24.4\u003c/td\u003e\u003ctd style=\"border-right: 1px solid\"\u003e35.1\u003c/td\u003e\u003ctd\u003e35.2\u003c/td\u003e\u003ctd\u003e27.8\u003c/td\u003e\u003ctd\u003e32.6\u003c/td\u003e\u003ctd\u003e38.8\u003c/td\u003e\u003ctd style=\"border-left: 1px solid\"\u003e47.4\u003c/td\u003e\u003ctd\u003e63.3\u003c/td\u003e\u003ctd\u003e51.8\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eYOLO-World-L\u003c/td\u003e\u003ctd\u003e1280\u003c/td\u003e\u003ctd\u003e28.7\u003csup\u003e+1.1\u003c/sup\u003e\u003c/td\u003e\u003ctd\u003e22.9\u003c/td\u003e\u003ctd\u003e24.9\u003c/td\u003e\u003ctd style=\"border-right: 1px solid\"\u003e35.4\u003c/td\u003e\u003ctd\u003e35.5\u003csup\u003e+1.2\u003c/sup\u003e\u003c/td\u003e\u003ctd\u003e24.4\u003c/td\u003e\u003ctd\u003e34.0\u003c/td\u003e\u003ctd\u003e38.8\u003c/td\u003e\u003ctd style=\"border-left: 1px solid\"\u003e46.0\u003c/td\u003e\u003ctd\u003e62.5\u003c/td\u003e\u003ctd\u003e50.0\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr style=\"border-top: 2px solid\"\u003e\n        \u003ctd\u003eYOLO-World-X\u003c/td\u003e\u003ctd\u003e640\u003c/td\u003e\u003ctd\u003e28.6\u003csup\u003e+0.2\u003c/sup\u003e\u003c/td\u003e\u003ctd\u003e22.0\u003c/td\u003e\u003ctd\u003e25.6\u003c/td\u003e\u003ctd style=\"border-right: 1px solid\"\u003e34.9\u003c/td\u003e\u003ctd\u003e35.8\u003csup\u003e+0.4\u003c/sup\u003e\u003c/td\u003e\u003ctd\u003e31.0\u003c/td\u003e\u003ctd\u003e33.7\u003c/td\u003e\u003ctd\u003e38.5\u003c/td\u003e\u003ctd style=\"border-left: 1px solid\"\u003e46.7\u003c/td\u003e\u003ctd\u003e62.5\u003c/td\u003e\u003ctd\u003e51.0\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd colspan=\"13\"\u003eYOLO-World-X-1280 is coming soon.\u003c/td\u003e\n    \u003c/tr\u003e\n\u003c/table\u003e\n\n### Model Card\n\n\u003ctable\u003e\n    
\u003ctr\u003e\n        \u003cth\u003eModel\u003c/th\u003e\u003cth\u003eResolution\u003c/th\u003e\u003cth\u003eTraining\u003c/th\u003e\u003cth\u003eData\u003c/th\u003e\u003cth\u003eModel Weights\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr style=\"border-top: 2px solid\"\u003e\n        \u003ctd\u003eYOLO-World-S\u003c/td\u003e\u003ctd\u003e640\u003c/td\u003e\u003ctd\u003ePT (100e)\u003c/td\u003e\u003ctd\u003eO365v1+GoldG+CC-LiteV2\u003c/td\u003e\u003ctd\u003e\u003ca href=\"https://huggingface.co/wondervictor/YOLO-World-V2.1/resolve/main/x_stage1-62b674ad.pth\"\u003e 🤗 HuggingFace\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eYOLO-World-S\u003c/td\u003e\u003ctd\u003e1280\u003c/td\u003e\u003ctd\u003eCPT (40e)\u003c/td\u003e\u003ctd\u003eO365v1+GoldG+CC-LiteV2\u003c/td\u003e\u003ctd\u003e\u003ca href=\"https://huggingface.co/wondervictor/YOLO-World-V2.1/resolve/main/s_stage2-4466ab94.pth\"\u003e 🤗 HuggingFace\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr style=\"border-top: 2px solid\"\u003e\n        \u003ctd\u003eYOLO-World-M\u003c/td\u003e\u003ctd\u003e640\u003c/td\u003e\u003ctd\u003ePT (100e)\u003c/td\u003e\u003ctd\u003eO365v1+GoldG+CC-LiteV2\u003c/td\u003e\u003ctd\u003e\u003ca href=\"https://huggingface.co/wondervictor/YOLO-World-V2.1/resolve/main/m_stage1-7e1e5299.pth\"\u003e 🤗 HuggingFace\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eYOLO-World-M\u003c/td\u003e\u003ctd\u003e1280\u003c/td\u003e\u003ctd\u003eCPT (40e)\u003c/td\u003e\u003ctd\u003eO365v1+GoldG+CC-LiteV2\u003c/td\u003e\u003ctd\u003e\u003ca href=\"https://huggingface.co/wondervictor/YOLO-World-V2.1/resolve/main/m_stage2-9987dcb1.pth\"\u003e 🤗 HuggingFace\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr style=\"border-top: 2px solid\"\u003e\n        \u003ctd\u003eYOLO-World-L\u003c/td\u003e\u003ctd\u003e640\u003c/td\u003e\u003ctd\u003ePT 
(100e)\u003c/td\u003e\u003ctd\u003eO365v1+GoldG+CC-LiteV2\u003c/td\u003e\u003ctd\u003e\u003ca href=\"https://huggingface.co/wondervictor/YOLO-World-V2.1/resolve/main/l_stage1-7d280586.pth\"\u003e 🤗 HuggingFace\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eYOLO-World-L\u003c/td\u003e\u003ctd\u003e800 / 1280\u003c/td\u003e\u003ctd\u003eCPT (40e)\u003c/td\u003e\u003ctd\u003eO365v1+GoldG+CC-LiteV2\u003c/td\u003e\u003ctd\u003e\u003ca href=\"https://huggingface.co/wondervictor/YOLO-World-V2.1/resolve/main/l_stage2-b3e3dc3f.pth\"\u003e 🤗 HuggingFace\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr style=\"border-top: 2px solid\"\u003e\n        \u003ctd\u003eYOLO-World-X\u003c/td\u003e\u003ctd\u003e640\u003c/td\u003e\u003ctd\u003ePT (100e)\u003c/td\u003e\u003ctd\u003eO365v1+GoldG+CC-LiteV2\u003c/td\u003e\u003ctd\u003e\u003ca href=\"https://huggingface.co/wondervictor/YOLO-World-V2.1/resolve/main/x_stage1-62b674ad.pth\"\u003e 🤗 HuggingFace\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n\u003c/table\u003e\n\n**Notes:**\n* PT: pre-training; CPT: continuing pre-training.\n* CC-LiteV2: the newly annotated CC3M subset, including 250k images.\n\n\n## Getting started\n\n### 1. Installation\n\nYOLO-World is developed based on `torch==1.11.0`, `mmyolo==0.6.0`, and `mmdetection==3.0.0`. See [docs/installation](./docs/installation.md) for more details about `requirements` and `mmcv`.\n\n#### Clone Project \n\n```bash\ngit clone --recursive https://github.com/AILab-CVC/YOLO-World.git\n```\n#### Install\n\n```bash\npip install torch wheel -q\npip install -e .\n```\n\n### 2. 
Preparing Data\n\nWe provide the details about the pre-training data in [docs/data](./docs/data.md).\n\n\n## Training \u0026 Evaluation\n\nWe adopt the default [training](./tools/train.py) and [evaluation](./tools/test.py) scripts of [mmyolo](https://github.com/open-mmlab/mmyolo).\nWe provide the configs for pre-training and fine-tuning in `configs/pretrain` and `configs/finetune_coco`.\nTraining YOLO-World is easy:\n\n```bash\nchmod +x tools/dist_train.sh\n# sample command for pre-training, use AMP for mixed-precision training\n./tools/dist_train.sh configs/pretrain/yolo_world_l_t2i_bn_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py 8 --amp\n```\n**NOTE:** YOLO-World is pre-trained on 4 nodes with 8 GPUs per node (32 GPUs in total). For pre-training, the `node_rank` and `nnodes` for multi-node training should be specified. \n\nEvaluating YOLO-World is also easy:\n\n```bash\nchmod +x tools/dist_test.sh\n./tools/dist_test.sh path/to/config path/to/weights 8\n```\n\n**NOTE:** We mainly evaluate the performance on LVIS-minival for pre-training.\n\n## Fine-tuning YOLO-World\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"./assets/finetune_yoloworld.png\" width=800px\u003e\n\u003c/div\u003e\n\n\n\u003cdiv align=\"center\"\u003e\n\u003cb\u003e\u003cp\u003eChoose your pre-trained YOLO-World and fine-tune it!\u003c/p\u003e\u003c/b\u003e \n\u003c/div\u003e\n\n\nYOLO-World supports **zero-shot inference** and three types of **fine-tuning recipes**: **(1) normal fine-tuning**, **(2) prompt tuning**, and **(3) reparameterized fine-tuning**.\n\n* Normal Fine-tuning: we provide the details about fine-tuning YOLO-World in [docs/fine-tuning](./docs/finetuning.md).\n\n* Prompt Tuning: we provide more details about prompt tuning in [docs/prompt_yolo_world](./docs/prompt_yolo_world.md).\n\n* Reparameterized Fine-tuning: the reparameterized YOLO-World is more suitable for specific domains far from generic scenes. 
You can find more details in [docs/reparameterize](./docs/reparameterize.md).\n\n## Deployment\n\nWe provide the details about deployment for downstream applications in [docs/deployment](./docs/deploy.md).\nYou can directly download the ONNX model through the online [demo](https://huggingface.co/spaces/stevengrove/YOLO-World) in HuggingFace Spaces 🤗.\n\n- [x] ONNX export and demo: [docs/deploy](https://github.com/AILab-CVC/YOLO-World/blob/master/docs/deploy.md)\n- [x] TFLite and INT8 Quantization: [docs/tflite_deploy](https://github.com/AILab-CVC/YOLO-World/blob/master/docs/tflite_deploy.md)\n- [ ] TensorRT: coming soon.\n- [ ] C++: coming soon.\n\n## Demo\n\nSee [`demo`](./demo) for more details.\n\n- [x] `gradio_demo.py`: Gradio demo, ONNX export\n- [x] `image_demo.py`: inference with images or a directory of images\n- [x] `simple_demo.py`: a simple demo of YOLO-World, using an `array` (instead of a path) as input.\n- [x] `video_demo.py`: YOLO-World inference on videos.\n- [x] `inference.ipynb`: Jupyter notebook for YOLO-World.\n- [x] [Google Colab Notebook](https://colab.research.google.com/drive/1F_7S5lSaFM06irBCZqjhbN7MpUXo6WwO?usp=sharing): We sincerely thank [Onuralp](https://github.com/onuralpszr) for sharing the [Colab Demo](https://colab.research.google.com/drive/1F_7S5lSaFM06irBCZqjhbN7MpUXo6WwO?usp=sharing); give it a try 😊!\n\n## Acknowledgement\n\nWe sincerely thank [mmyolo](https://github.com/open-mmlab/mmyolo), [mmdetection](https://github.com/open-mmlab/mmdetection), [GLIP](https://github.com/microsoft/GLIP), and [transformers](https://github.com/huggingface/transformers) for providing their wonderful code to the community!\n\n## Citations\nIf you find YOLO-World useful in your research or applications, please consider giving us a star 🌟 and citing it.\n\n```bibtex\n@inproceedings{Cheng2024YOLOWorld,\n  title={YOLO-World: Real-Time Open-Vocabulary Object Detection},\n  author={Cheng, Tianheng and Song, Lin and Ge, Yixiao and Liu, Wenyu and Wang, 
Xinggang and Shan, Ying},\n  booktitle={Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR)},\n  year={2024}\n}\n```\n\n## Licence\nYOLO-World is under the GPL-v3 Licence and is supported for commercial usage. If you need a commercial license for YOLO-World, please feel free to contact us.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Failab-cvc%2Fyolo-world","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Failab-cvc%2Fyolo-world","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Failab-cvc%2Fyolo-world/lists"}