{"id":13443322,"url":"https://github.com/YuHengsss/YOLOV","last_synced_at":"2025-03-20T16:31:06.084Z","repository":{"id":57839882,"uuid":"526897031","full_name":"YuHengsss/YOLOV","owner":"YuHengsss","description":"This repo is an implementation of PyTorch version YOLOV Series","archived":false,"fork":false,"pushed_at":"2024-12-19T02:51:46.000Z","size":9528,"stargazers_count":336,"open_issues_count":50,"forks_count":48,"subscribers_count":7,"default_branch":"master","last_synced_at":"2024-12-19T03:30:14.421Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/YuHengsss.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-08-20T10:41:40.000Z","updated_at":"2024-12-19T02:51:50.000Z","dependencies_parsed_at":"2024-10-28T05:10:45.336Z","dependency_job_id":null,"html_url":"https://github.com/YuHengsss/YOLOV","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YuHengsss%2FYOLOV","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YuHengsss%2FYOLOV/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YuHengsss%2FYOLOV/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YuHengsss%2FYOLOV/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/YuHengsss","download_url":"https://codeload.github.com/YuHengsss/YOLOV/tar.gz/refs/heads/mast
er","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244649809,"owners_count":20487496,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T03:01:59.172Z","updated_at":"2025-03-20T16:31:06.078Z","avatar_url":"https://github.com/YuHengsss.png","language":"Python","funding_links":[],"categories":["Python","Frameworks","Object Detection Applications"],"sub_categories":[],"readme":"\n\n# YOLOV and YOLOV++ for video object detection.\n## Update\n* **` July. 30th, 2024`**:  The pre-print version of the YOLOV++ paper is now available on [Arxiv](https://arxiv.org/abs/2407.19650).\n* **` May. 8th, 2024`**:  We release code, log and weights for YOLOV++.\n* **` April. 21th, 2024`**:  Our enhanced model now achieves a 92.9 AP50(w.o post-processing) on the ImageNet VID dataset, thanks to a more robust backbone and algorithm improvements. It maintains a processing time of 26.5ms per image during batch inference on a 3090 GPU. 
Code release is forthcoming.\n\n\n## Introduction\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/practical-video-object-detection-via-feature/video-object-detection-on-imagenet-vid)](https://paperswithcode.com/sota/video-object-detection-on-imagenet-vid?p=practical-video-object-detection-via-feature)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/yolov-making-still-image-object-detectors/video-object-detection-on-imagenet-vid)](https://paperswithcode.com/sota/video-object-detection-on-imagenet-vid?p=yolov-making-still-image-object-detectors)\n\nThe YOLOV series are high-performance video object detectors. Please refer to [YOLOV](https://arxiv.org/abs/2208.09686) and [YOLOV++](https://arxiv.org/abs/2407.19650) on arXiv for more details.\n\nThis repo is a PyTorch implementation of YOLOV and YOLOV++, based on [YOLOX](https://github.com/Megvii-BaseDetection/YOLOX).\n\n## YOLOX Pretrained Models on ImageNet VID\n\n| Model            | size | mAP@50\u003csup\u003eval\u003cbr\u003e | Speed 2080Ti(batch size=1)\u003cbr\u003e(ms) | Speed 3090(batch size=32)\u003cbr\u003e(ms) |                                             weights                                              |\n|------------------|:----:|:------------------:|:----------------------------------:|:---------------------------------:|:------------------------------------------------------------------------------------------------:|\n| YOLOX-s          | 576  |        69.5        |                9.4                 |                1.4                |   [google](https://drive.google.com/file/d/1n8wkByqpHdrGy6z9fsoZpBtTa0I3JOcG/view?usp=sharing)   |\n| YOLOX-l          | 576  |        76.1        |                14.8                |                4.2                |   [google](https://drive.google.com/file/d/1rikaPCAHBBIugYUZYV1buyOIRG8xvGKB/view?usp=sharing)   |\n| YOLOX-x          | 576  |        77.8        |                20.4          
      |                 -                 |   [google](https://drive.google.com/file/d/1OH3hGj7RMfcinMKPESbfI7C5y_RrA3aF/view?usp=sharing)   |\n| YOLOX-SwinTiny   | 576  |        79.2        |                19.0                |                5.5                |[google](https://drive.google.com/file/d/1s1gKLXMX5Hwxkx7e9nZyzJ1oF9iPvEe1/view?usp=drive_link)   |\n| YOLOX-SwinBase   | 576  |        86.5        |                24.9                |               11.8                |[google](https://drive.google.com/drive/folders/1K5897iM2zzN4kcj8qdK3z_FtvW9f3kHN?usp=drive_link) |\n| YOLOX-FocalLarge | 576  |        89.7        |                42.2                |               25.7                |                                                -                                                 |\n\n\n\n## Main result in YOLOV++\n\n\u003cimg src=\"assets/v++_comparision.png\" width=\"500\" \u003e\n\n| Model                     | size | mAP@50\u003csup\u003eval\u003cbr\u003e | Speed 3090(batch size=32)\u003cbr\u003e(ms) |                                                                                                                                weights                                                                                                                                 | logs                                                                                          |\n|---------------------------|:----:|:------------------:|:---------------------------------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|-----------------------------------------------------------------------------------------------|\n| YOLOV++ s                 | 576  |        78.7        |                5.3                |                                                   
                                 [google](https://drive.google.com/file/d/1vlFlwyoRoo_qS2CkfTZE5iQ32MDoA1n4/view?usp=drive_link)                                                                                     | [link](https://drive.google.com/file/d/1wIA71zsNxAtDflPGxLTzrRDdKy0Zl1HZ/view?usp=drive_link) |\n| YOLOV++ l                 | 576  |        84.2        |                7.6                |                                                                                    [google](https://drive.google.com/file/d/1qb_abseRfOmRr8IiOuUSAlCUrBvUhdim/view?usp=drive_link)                                                                                     | -                                                                                             |\n| YOLOV++ SwinTiny          | 576  |        85.6        |                8.4                |                                                                                    [google](https://drive.google.com/file/d/1pCIWAK6cy-BHhDVywmPb1LuuQHzNXdT2/view?usp=drive_link)                                                                                     | [link](https://drive.google.com/file/d/1RmY0LW1sUil6WilvNq2hW1a4obw27531/view?usp=drive_link)                                                                                      |\n| YOLOV++ SwinBase          | 576  |        90.7        |               15.9                |                                                                                    [google](https://drive.google.com/file/d/1RGb499EBcSQjWDxu6KkvN4Tr1wSc6SHb/view?usp=drive_link)                                                                                     | [link](https://drive.google.com/file/d/10qGMScfy0BvmqSMLuTGRPRlZxqkNZ9GX/view?usp=drive_link)                                                                                      |\n| YOLOV++ FocalLarge        | 576  |        92.9        |               27.6                |                                                         
                           [google](https://drive.google.com/file/d/11WT_GcZU7HHjWV4i9KoXHhh70zneraEE/view?usp=drive_link)                                                                                     | [link](https://huggingface.co/YuhengSSS/YOLOV/blob/main/V%2B%2B_FocalL.pth)                                                                                      |\n| YOLOV++ FocalLarge + Post | 576  |        93.2        |                 -                 |                                                                                                                                   -                                                                                                                                    |                                                                                      |\n\n\n## Main result in YOLOV\n\n\u003cimg src=\"assets/comparsion.jpg\" width=\"500\" \u003e\n\n| Model                                                                                                               | size | mAP@50\u003csup\u003eval\u003cbr\u003e | Speed 2080Ti(batch size=1)\u003cbr\u003e(ms) |                                           weights                                            |\n|---------------------------------------------------------------------------------------------------------------------|:----:|:------------------:|:----------------------------------:|:--------------------------------------------------------------------------------------------:|\n| YOLOV-s                                                                                                             | 576  |        77.3        |                11.3                | [google](https://drive.google.com/file/d/12X4dQw45aXVYgJjKAAAPk409FO3xValW/view?usp=sharing) |\n| YOLOV-l                                                                                                             | 576  |        83.6        |                16.4                | 
[google](https://drive.google.com/file/d/1qZ-3iPDlYx1OKe6zz_-n42ceijo_Ntx6/view?usp=sharing) |\n| YOLOV-x                                                                                                             | 576  |        85.5        |                22.7                | [google](https://drive.google.com/file/d/1OIozS-D9wbWA9pDFl5xoFw6XqEcYtzsJ/view?usp=sharing) |\n| YOLOV-x + [post](https://github.com/AlbertoSabater/Robust-and-efficient-post-processing-for-video-object-detection) | 576  |        87.5        |                 -                  |                                              -                                               |\n\n\n## TODO\n- [x] Finish Swin-Transformer based experiments.\n- [ ] Release updated code, model and log.\n\n## Quick Start\n\n\u003cdetails\u003e\n\u003csummary\u003eInstallation\u003c/summary\u003e\n\nInstall YOLOV from source.\n```shell\ngit clone git@github.com:YuHengsss/YOLOV.git\ncd YOLOV\n```\n\nCreate a conda environment and install the dependencies.\n```shell\nconda create -n yolov python=3.7\n\nconda activate yolov\n\npip install -r requirements.txt\n\npip3 install -v -e .\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eDemo\u003c/summary\u003e\n\nStep1. Download the pretrained weights.\n\nStep2. Run the YOLOV demos. For example:\n\n```shell\npython tools/vid_demo.py -f [path to your yolov exp files] -c [path to your yolov weights] --path /path/to/your/video --conf 0.25 --nms 0.5 --tsize 576 --save_result \n```\nFor online mode, taking yolov_l as an example, run:\n\n```shell\npython tools/yolov_demo_online.py -f ./exp/yolov/yolov_l_online.py -c [path to your weights] --path /path/to/your/video --conf 0.25 --nms 0.5 --tsize 576 --save_result \n```\nFor YOLOX models, use `python tools/demo.py` for inference.\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eReproduce our results on VID\u003c/summary\u003e\n\nStep1. 
Download datasets and weights:\n\nDownload the ILSVRC2015 DET and ILSVRC2015 VID datasets from [IMAGENET](https://image-net.org/challenges/LSVRC/2015/2015-downloads) and organise them as follows:\n\n```shell\npath to your datasets/ILSVRC2015/\npath to your datasets/ILSVRC/\n```\n\nDownload our COCO-style annotations for [training](https://drive.google.com/file/d/1HhE4OAcc--CpjUj69JCRXzMvIRsR4ymM/view?usp=sharing), the FGFA version of the training [annotation](https://drive.google.com/file/d/12ceMTsmwkCMCdjYSM268qYfQTQcCDYFU/view?usp=drive_link) and the [video sequences](https://drive.google.com/file/d/1vJs8rLl_2oZOWCMJtk3a9ZJmdNn8cu-G/view?usp=sharing). Then, put them at the following locations:\n```shell\nYOLOV/annotations/vid_train_coco.json\nYOLOV/annotations/ILSVRC_FGFA_COCO.json\nYOLOV/yolox/data/datasets/train_seq.npy\n```\n\nChange the data_dir in the exp files to [path to your datasets] and download our weights.\n\nStep2. Generate predictions and convert them to IMDB style for evaluation.\n\n```shell\npython tools/val_to_imdb.py -f exps/yolov/yolov_x.py -c path to your weights/yolov_x.pth --fp16 --output_dir ./yolov_x.pkl\n```\nEvaluation process:\n```shell\npython tools/REPPM.py --repp_cfg ./tools/yolo_repp_cfg.json --predictions_file ./yolov_x.pkl --evaluate --annotations_filename ./annotations/annotations_val_ILSVRC.txt --path_dataset [path to your dataset] --store_imdb --store_coco  (--post)\n```\nThe optional (--post) flag enables the post-processing method. 
Then you will get:\n```shell\n{'mAP_total': 0.8758871720817065, 'mAP_slow': 0.9059275666099181, 'mAP_medium': 0.8691557352372217, 'mAP_fast': 0.7459511040452989}\n```\n\n  \n**Training example**\n```shell\npython tools/vid_train.py -f exps/yolov/yolov_s.py -c weights/yoloxs_vid.pth --fp16\n```\n**Rough testing**\n```shell\npython tools/vid_eval.py -f exps/yolov/yolov_s.py -c weights/yolov_s.pth --tnum 500 --fp16\n```\n`--tnum` indicates the number of testing sequences.\n\u003c/details\u003e\n\n\n## Annotation format\n\n\u003cdetails\u003e\n  \n\u003csummary\u003e \u003cb\u003eDetails\u003c/b\u003e \u003c/summary\u003e\n\n**Training base detector**\n\n\nThe train_coco.json is a COCO format annotation file. When training the base detector on your own dataset, convert the annotations to COCO format.\n\n**Training YOLOV Series**\n\n\nThe train_seq.npy and val_seq.npy files are numpy arrays of lists. They can be loaded using the following command:\n```python\nnumpy.load('./yolox/data/datasets/train_seq.npy',allow_pickle=True)\n```\nEach list contains the paths to all images in a video. The specific annotations (xml annotations in the VID dataset) are loaded via these image paths; refer to https://github.com/YuHengsss/YOLOV/blob/f5a57ddea2f3660875d6d75fc5fa2ddbb95028a7/yolox/data/datasets/vid.py#L125 for more details.\n\n\u003c/details\u003e\n\n\n## Training on Custom Datasets\n\n\u003cdetails\u003e\n\u003csummary\u003e \u003cb\u003eDetails\u003c/b\u003e \u003c/summary\u003e\n  \n1. Finetune the base detector (YOLOX) on your custom dataset with COCO format annotations. You need to modify the YOLOX experiment file. For instance, the experiment file for the ImageNet VID dataset is modified as in [this example](https://github.com/YuHengsss/YOLOV/blob/master/exps/swin_base/swin_tiny_vid.py). 
Initializing from COCO-pretrained weights is essential for performance. You can find these COCO-pretrained weights in the official YOLOX repo (YOLOX-S to YOLOX-X) and in this [huggingface repo](https://huggingface.co/YuhengSSS/YOLOV/tree/main) (YOLOX-SwinTiny and SwinBase). Taking Swin-Tiny on the ImageNet VID dataset as an example, you may run the finetuning script as:\n\n   ```shell\n     python tools/train.py -f exps/swin_base/swin_tiny_vid.py -c [yolox_swintiny pretrained weights on COCO] -b [batch size] -d [your devices] --fp16\n   ```\n\n\n2. Construct your dataset in the COCO format. Here is a template for the dataset structure (sourced from [OVIS](https://songbai.site/ovis/)):\n    ```shell\n    {\n    \"info\" : info,\n    \"videos\" : [video],\n    \"annotations\" : [annotation] or None,\n    \"categories\" : [category],\n    }\n    video{\n        \"id\" : int,\n        \"width\" : int,\n        \"height\" : int,\n        \"length\" : int,\n        \"file_names\" : [file_name],\n    }\n    annotation{\n        \"id\" : int, \n        \"video_id\" : int, \n        \"category_id\" : int, \n        \"areas\" : [float or None], \n        \"bboxes\" : [[x,y,width,height] or None], \n        \"iscrowd\" : 0 or 1,\n    }\n    category{\n        \"id\" : int, \n        \"name\" : str, \n        \"supercategory\" : str,\n    }\n    ```\n\n    Once the COCO format dataset is prepared, use the [code](https://github.com/YuHengsss/YOLOV/blob/8873e06cac9912c60c31ca2ef3061d0bfe5b2f36/yolox/data/datasets/ovis.py#L238) we provide to convert the COCO format annotations for video object detection.\n    You can construct your experiment file for YOLOV following [YOLOVs_OVIS](https://github.com/YuHengsss/YOLOV/blob/master/exps/yolov_ovis/yolovs_ovis_75_75_750.py). \n    For YOLOV++, please refer to the example in *exps/customed_example/v++_SwinTiny_example.py* and configure the OVIS dataset in *get_data_loader* and *get_eval_loader* according to your own dataset. 
\n    Remember to change the category information in the [evaluator](https://github.com/YuHengsss/YOLOV/blob/98ade28ce975291023be947b7d5d57b05f9600ba/yolox/evaluators/vid_evaluator_v2.py#L41).\n\n3. Initialize YOLOV or YOLOV++ with the finetuned weights obtained in Step 1. You may adjust hyperparameters such as [proposal numbers](https://github.com/YuHengsss/YOLOV/blob/8873e06cac9912c60c31ca2ef3061d0bfe5b2f36/exps/yolov_ovis/yolovs_ovis_75_75_750.py#L56) according to your dataset to get better performance:\n    \n     ```shell\n     python tools/vid_train.py -f exps/customed_example/v++_SwinTiny_example.py -c [path to your weights] --fp16\n     ```\n     Note that the batch size when training the video detector is determined by lframe and gframe; refer to this [line](https://github.com/YuHengsss/YOLOV/blob/98ade28ce975291023be947b7d5d57b05f9600ba/exps/yolov/yolov_base.py#L320). You can adjust the batch size according to your GPU memory. However, a very small batch size (\u003c4) may lead to poor performance.\n\u003c/details\u003e\n\n## Acknowledgements\n\n\u003cdetails\u003e\u003csummary\u003e \u003cb\u003eExpand\u003c/b\u003e \u003c/summary\u003e\n\n* [https://github.com/Megvii-BaseDetection/YOLOX](https://github.com/Megvii-BaseDetection/YOLOX)\n* [https://github.com/AlbertoSabater/Robust-and-efficient-post-processing-for-video-object-detection](https://github.com/AlbertoSabater/Robust-and-efficient-post-processing-for-video-object-detection)\n\u003c/details\u003e\n\n## Cite YOLOV and YOLOV++\nIf the YOLOV series is helpful for your research, please cite the following papers:\n\n\n```latex\n\n@article{shi2024yolovpp,\n      title={Practical Video Object Detection via Feature Selection and Aggregation}, \n      author={Yuheng Shi and Tong Zhang and Xiaojie Guo},\n      journal={arXiv preprint arXiv:2407.19650},\n      year={2024},\n}\n\n@article{shi2022yolov,\n  title={YOLOV: Making Still Image Object Detectors Great at Video Object Detection},\n  author={Shi, 
Yuheng and Wang, Naiyan and Guo, Xiaojie},\n  journal={arXiv preprint arXiv:2208.09686},\n  year={2022}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FYuHengsss%2FYOLOV","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FYuHengsss%2FYOLOV","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FYuHengsss%2FYOLOV/lists"}