{"id":15030643,"url":"https://github.com/om-ai-lab/omdet","last_synced_at":"2025-04-13T12:40:09.238Z","repository":{"id":229583982,"uuid":"770489628","full_name":"om-ai-lab/OmDet","owner":"om-ai-lab","description":"Real-time and accurate open-vocabulary end-to-end object detection","archived":false,"fork":false,"pushed_at":"2024-12-18T08:04:07.000Z","size":10220,"stargazers_count":1310,"open_issues_count":5,"forks_count":111,"subscribers_count":70,"default_branch":"main","last_synced_at":"2025-04-06T09:01:31.975Z","etag":null,"topics":["coco","computer-vision","lvis","object-detection","open-vocabulary","real-time","vision-and-language","zero-shot","zero-shot-object-detection"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/om-ai-lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-11T16:34:47.000Z","updated_at":"2025-04-02T13:03:26.000Z","dependencies_parsed_at":"2025-02-28T03:12:21.001Z","dependency_job_id":"495f5202-c5f0-43a0-a393-a87feeb16f87","html_url":"https://github.com/om-ai-lab/OmDet","commit_stats":{"total_commits":28,"total_committers":9,"mean_commits":3.111111111111111,"dds":0.6428571428571428,"last_synced_commit":"e15dbf9f77e7ec8dfb0d4128276af89cae5710ad"},"previous_names":["om-ai-lab/omdet"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/om-ai-lab%2FOmDet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/om-ai-lab%2FOmDet/tags","release
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/om-ai-lab%2FOmDet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/om-ai-lab%2FOmDet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/om-ai-lab","download_url":"https://codeload.github.com/om-ai-lab/OmDet/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248716748,"owners_count":21150383,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["coco","computer-vision","lvis","object-detection","open-vocabulary","real-time","vision-and-language","zero-shot","zero-shot-object-detection"],"created_at":"2024-09-24T20:13:56.568Z","updated_at":"2025-04-13T12:40:09.212Z","avatar_url":"https://github.com/om-ai-lab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OmDet-Turbo\n\n\u003cp align=\"center\"\u003e\n \u003ca href=\"https://arxiv.org/abs/2403.06892\"\u003e\u003cstrong\u003e [Paper 📄] \u003c/strong\u003e\u003c/a\u003e \u003ca href=\"https://huggingface.co/omlab/OmDet-Turbo_tiny_SWIN_T\"\u003e\u003cstrong\u003e [Model 🗂️] \u003c/strong\u003e\u003c/a\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\nFast and accurate open-vocabulary end-to-end object detection\n\u003c/p\u003e\n\n***\n## 🗓️ Updates\n* 09/26/2024: OmDet-Turbo has been integrated into Transformers version 4.45.0. 
The code is available [here](https://github.com/huggingface/transformers/tree/main/src/transformers/models/omdet_turbo), and the Hugging Face model is available [here](https://huggingface.co/omlab/omdet-turbo-swin-tiny-hf).\n* 07/05/2024: Our new open-source project, [OmAgent: A multimodal agent framework for solving complex tasks](https://github.com/om-ai-lab/OmAgent), is now available! OmDet has also been integrated into it as an OVD tool. Feel free to explore our multimodal agent framework.\n* 06/24/2024: Guidance for [converting OmDet-Turbo to ONNX](https://github.com/om-ai-lab/OmDet#:~:text=How%20To%20Export%20ONNX%20Model) added.\n* 03/25/2024: Inference code and a pretrained OmDet-Turbo-Tiny model released.\n* 03/12/2024: GitHub open-source project created.\n\n***\n## 🔗 Related Works\nIf you are interested in our research, we welcome you to explore our other projects.\n\n🔆 [How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection](https://arxiv.org/abs/2308.13177) (AAAI24) \u0026nbsp;🏠[GitHub Repository](https://github.com/om-ai-lab/OVDEval/tree/main)\n\n🔆 [OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network](https://ietresearch.onlinelibrary.wiley.com/doi/full/10.1049/cvi2.12268) (IET Computer Vision)\n\n***\n## 📖 Introduction\nThis repository is the official PyTorch implementation of **OmDet-Turbo**, a fast transformer-based open-vocabulary object detection model.\n\n**⭐️Highlights**\n1. **OmDet-Turbo** is a transformer-based real-time open-vocabulary\ndetector that combines strong OVD capabilities with fast inference speed.\nThis model addresses the challenges of efficient detection in open-vocabulary\nscenarios while maintaining high detection performance.\n2. 
We introduce the **Efficient Fusion Head**, a swift multimodal fusion module\ndesigned to alleviate the computational burden on the encoder and reduce\nthe time consumption of the head with ROI.\n3. The OmDet-Turbo-Base model achieves state-of-the-art zero-shot performance on the ODinW and OVDEval datasets, with AP scores\nof **30.1** and **26.86**, respectively.\n4. The inference speed of OmDet-Turbo-Base on the COCO val2017 dataset reaches **100.2** FPS on an A100 GPU.\n\nFor more details, check out our paper **[Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head](https://arxiv.org/abs/2403.06892)**.\n\u003cimg src=\"docs/turbo_model.jpeg\" alt=\"model_structure\" width=\"100%\"\u003e\n\n\n***\n## ⚡️ Inference Speed\nComparison of inference speeds for each component in the tiny-size model.\n\u003cimg src=\"docs/speed_compare.jpeg\" alt=\"speed\" width=\"100%\"\u003e\n\n***\n## 🛠️ How To Install\nFollow the [Installation Instructions](install.md) to set up the environment for OmDet-Turbo.\n\n***\n## 🚀 How To Run\n### Local Inference\n1. Download our pretrained model and the [CLIP](https://huggingface.co/omlab/OmDet-Turbo_tiny_SWIN_T/resolve/main/ViT-B-16.pt?download=true) checkpoints.\n2. Create a folder named **resources** and put the downloaded models into it.\n3. Run **run_demo.py**; the images with predicted results will be saved in the **./outputs** folder.\n### Run as an API Server\n1. Download our pretrained model and the [CLIP](https://huggingface.co/omlab/OmDet-Turbo_tiny_SWIN_T/resolve/main/ViT-B-16.pt?download=true) checkpoints.\n2. Create a folder named **resources** and put the downloaded models into it.\n3. Run **run_wsgi.py**; the API server will start at **http://host_ip:8000/inf_predict**. Open **http://host_ip:8000/docs** to try it out.\n\nA language cache is already enabled when running inference with **run_demo.py**. For more details, see the **run_demo.py** script.\n\n\n***\n## ⚙️ How To Export ONNX Model\n1. 
Replace **OmDetV2Turbo** in **OmDet-Turbo_tiny_SWIN_T.yaml** with **OmDetV2TurboInfer**.\n2. Run **export.py**; the model will be exported as **omdet.onnx**.\n\nIn the above example, post-processing is not included in the ONNX model, and all input sizes are fixed. You can add post-processing and change the input sizes according to your needs.\n\n\n***\n## 📦 Model Zoo\nPerformance on COCO and LVIS is evaluated under the zero-shot setting.\n\nModel | Backbone | Pre-Train Data | COCO | LVIS | FPS (PyTorch/TensorRT) | Weight\n-- | -- | -- | -- | -- | -- | --\nOmDet-Turbo-Tiny | Swin-T | O365, GoldG | 42.5 | 30.3 | 21.5/140.0 | [weight](https://huggingface.co/omlab/OmDet-Turbo_tiny_SWIN_T/tree/main)\n\n***\n## 📝 Main Results\n\u003cimg src=\"docs/main_results.png\" alt=\"main_result\" width=\"100%\"\u003e\n\n***\n## Citation\nPlease consider citing our papers if you use our projects:\n\n```\n@article{zhao2024real,\n  title={Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head},\n  author={Zhao, Tiancheng and Liu, Peng and He, Xuan and Zhang, Lu and Lee, Kyusong},\n  journal={arXiv preprint arXiv:2403.06892},\n  year={2024}\n}\n```\n\n```\n@article{zhao2024omdet,\n  title={OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network},\n  author={Zhao, Tiancheng and Liu, Peng and Lee, Kyusong},\n  journal={IET Computer Vision},\n  year={2024},\n  publisher={Wiley Online Library}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fom-ai-lab%2Fomdet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fom-ai-lab%2Fomdet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fom-ai-lab%2Fomdet/lists"}