# YOLOV and YOLOV++ for video object detection.
## Update
* **`April 21st, 2024`**: Our enhanced model now achieves 92.9 AP50 (without post-processing) on the ImageNet VID dataset, thanks to a more robust backbone and algorithm improvements. It maintains a processing time of 26.5 ms per image (roughly 38 FPS) during batch inference on a 3090 GPU. Code release is forthcoming.

* **`May 8th, 2024`**: We release the code, logs and weights for YOLOV++.

## Introduction
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/yolov-making-still-image-object-detectors/video-object-detection-on-imagenet-vid)](https://paperswithcode.com/sota/video-object-detection-on-imagenet-vid?p=yolov-making-still-image-object-detectors)

The YOLOV series are high-performance video object detectors. Please refer to [YOLOV](https://arxiv.org/abs/2208.09686) on arXiv for more details. The paper for YOLOV++ will be released soon.

This repo is an implementation of PyTorch version YOLOV and YOLOV++ based on [YOLOX](https://github.com/Megvii-BaseDetection/YOLOX).

## YOLOX Pretrained Models on ImageNet VID

| Model | size | mAP@50 (val) | Speed 2080Ti, batch size=1 (ms) | Speed 3090, batch size=32 (ms) | weights |
|------------------|:----:|:------------------:|:----------------------------------:|:---------------------------------:|:------------------------------------------------------------------------------------------------:|
| YOLOX-s | 576 | 69.5 | 9.4 | 1.4 | [google](https://drive.google.com/file/d/1n8wkByqpHdrGy6z9fsoZpBtTa0I3JOcG/view?usp=sharing) |
| YOLOX-l | 576 | 76.1 | 14.8 | 4.2 | [google](https://drive.google.com/file/d/1rikaPCAHBBIugYUZYV1buyOIRG8xvGKB/view?usp=sharing) |
| YOLOX-x | 576 | 77.8 | 20.4 | - | [google](https://drive.google.com/file/d/1OH3hGj7RMfcinMKPESbfI7C5y_RrA3aF/view?usp=sharing) |
| YOLOX-SwinTiny | 576 | 79.2 | 19.0 | 5.5 |[google](https://drive.google.com/file/d/1s1gKLXMX5Hwxkx7e9nZyzJ1oF9iPvEe1/view?usp=drive_link) |
| YOLOX-SwinBase | 576 | 86.5 | 24.9 | 11.8 |[google](https://drive.google.com/drive/folders/1K5897iM2zzN4kcj8qdK3z_FtvW9f3kHN?usp=drive_link) |
| YOLOX-FocalLarge | 576 | 89.7 | 42.2 | 25.7 | - |

## Main results of YOLOV++

| Model | size | mAP@50 (val) | Speed 3090, batch size=32 (ms) | weights | logs |
|---------------------------|:----:|:------------------:|:---------------------------------:|:----------------------------------------------------------------------------------------------:|-----------------------------------------------------------------------------------------------|
| YOLOV++ s | 576 | 78.7 | 5.3 | [google](https://drive.google.com/file/d/1vlFlwyoRoo_qS2CkfTZE5iQ32MDoA1n4/view?usp=drive_link) | [link](https://drive.google.com/file/d/1wIA71zsNxAtDflPGxLTzrRDdKy0Zl1HZ/view?usp=drive_link) |
| YOLOV++ l | 576 | 84.2 | 7.6 | [google](https://drive.google.com/file/d/1qb_abseRfOmRr8IiOuUSAlCUrBvUhdim/view?usp=drive_link) | - |
| YOLOV++ SwinTiny | 576 | 85.6 | 8.4 | [google](https://drive.google.com/file/d/1pCIWAK6cy-BHhDVywmPb1LuuQHzNXdT2/view?usp=drive_link) | [link](https://drive.google.com/file/d/1RmY0LW1sUil6WilvNq2hW1a4obw27531/view?usp=drive_link) |
| YOLOV++ SwinBase | 576 | 90.7 | 15.9 | [google](https://drive.google.com/file/d/1RGb499EBcSQjWDxu6KkvN4Tr1wSc6SHb/view?usp=drive_link) | [link](https://drive.google.com/file/d/10qGMScfy0BvmqSMLuTGRPRlZxqkNZ9GX/view?usp=drive_link) |
| YOLOV++ FocalLarge | 576 | 92.9 | 27.6 | [google](https://drive.google.com/file/d/11WT_GcZU7HHjWV4i9KoXHhh70zneraEE/view?usp=drive_link) | [link](https://drive.google.com/file/d/1QvU9pvNLZF2_jrshEqx-Pryb14ayfGw7/view?usp=drive_link) |
| YOLOV++ FocalLarge + Post | 576 | 93.2 | - | - | - |

## Main results of YOLOV

| Model | size | mAP@50 (val) | Speed 2080Ti, batch size=1 (ms) | weights |
|---------------------------------------------------------------------------------------------------------------------|:----:|:------------------:|:----------------------------------:|:--------------------------------------------------------------------------------------------:|
| YOLOV-s | 576 | 77.3 | 11.3 | [google](https://drive.google.com/file/d/12X4dQw45aXVYgJjKAAAPk409FO3xValW/view?usp=sharing) |
| YOLOV-l | 576 | 83.6 | 16.4 | [google](https://drive.google.com/file/d/1qZ-3iPDlYx1OKe6zz_-n42ceijo_Ntx6/view?usp=sharing) |
| YOLOV-x | 576 | 85.5 | 22.7 | [google](https://drive.google.com/file/d/1OIozS-D9wbWA9pDFl5xoFw6XqEcYtzsJ/view?usp=sharing) |
| YOLOV-x + [post](https://github.com/AlbertoSabater/Robust-and-efficient-post-processing-for-video-object-detection) | 576 | 87.5 | - | - |

## TODO
- [x] Finish Swin-Transformer based experiments.
- [ ] Release updated code, models and logs.

## Quick Start

**Installation**

Install YOLOV from source.
```shell
git clone git@github.com:YuHengsss/YOLOV.git
cd YOLOV
```

Create conda env.
```shell
conda create -n yolov python=3.7

conda activate yolov

pip install -r requirements.txt

pip3 install -v -e .
```
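
After installation, a quick sanity check can confirm the environment is usable (a minimal sketch; it assumes the editable install exposes the `yolox` package, as in upstream YOLOX, and that a CUDA GPU is present for the speeds reported above):

```python
# Sanity check after `pip3 install -v -e .`: the yolox package should
# import, and CUDA should be visible for GPU inference.
import torch
import yolox

print("yolox version:", yolox.__version__)
print("CUDA available:", torch.cuda.is_available())
```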

**Demo**

Step 1. Download pretrained weights.

Step 2. Run the YOLOV demo. For example:

```shell
python tools/vid_demo.py -f [path to your yolov exp files] -c [path to your yolov weights] --path /path/to/your/video --conf 0.25 --nms 0.5 --tsize 576 --save_result
```
For online mode, taking yolov_l as an example, you can run:

```shell
python tools/yolov_demo_online.py -f ./exp/yolov/yolov_l_online.py -c [path to your weights] --path /path/to/your/video --conf 0.25 --nms 0.5 --tsize 576 --save_result
```
For YOLOX models, please use `python tools/demo.py` for inference.

**Reproduce our results on VID**

Step 1. Download datasets and weights:

Download the ILSVRC2015 DET and ILSVRC2015 VID datasets from [IMAGENET](https://image-net.org/challenges/LSVRC/2015/2015-downloads) and organise them as follows:

```shell
path to your datasets/ILSVRC2015/
path to your datasets/ILSVRC/
```

Download our COCO-style annotations for [training](https://drive.google.com/file/d/1HhE4OAcc--CpjUj69JCRXzMvIRsR4ymM/view?usp=sharing), the FGFA-version training [annotation](https://drive.google.com/file/d/12ceMTsmwkCMCdjYSM268qYfQTQcCDYFU/view?usp=drive_link), and the [video sequences](https://drive.google.com/file/d/1vJs8rLl_2oZOWCMJtk3a9ZJmdNn8cu-G/view?usp=sharing). Then put them in the following locations:
```shell
YOLOV/annotations/vid_train_coco.json
YOLOV/annotations/ILSVRC_FGFA_COCO.json
YOLOV/yolox/data/dataset/train_seq.npy
```
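
Before training, it can be worth verifying that the annotation files are readable from these locations (a small sketch using the paths above; `images` and `annotations` are standard COCO keys):

```python
# Verify the downloaded annotation files are in place and parseable.
import json
import numpy as np

with open("annotations/vid_train_coco.json") as f:
    coco = json.load(f)
print(len(coco["images"]), "images,", len(coco["annotations"]), "annotations")

seqs = np.load("yolox/data/dataset/train_seq.npy", allow_pickle=True)
print(len(seqs), "video sequences; first has", len(seqs[0]), "frames")
```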

Change `data_dir` in the exp files to [path to your datasets] and download our weights.
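
For reference, the `data_dir` change is typically a one-line edit in the exp file (a hypothetical fragment, assuming the YOLOX-style `Exp` class this repo builds on; check your exp files for the exact field names):

```python
# Hypothetical excerpt of an exp file: point data_dir at your dataset root.
from yolox.exp import Exp as MyExp

class Exp(MyExp):
    def __init__(self):
        super().__init__()
        self.data_dir = "/path/to/your/datasets"  # <-- set this
        self.num_classes = 30  # ImageNet VID has 30 object categories
```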

Step 2. Generate predictions and convert them to IMDB style for evaluation.

```shell
python tools/val_to_imdb.py -f exps/yolov/yolov_x.py -c [path to your weights]/yolov_x.pth --fp16 --output_dir ./yolov_x.pkl
```
Evaluation process:
```shell
python tools/REPPM.py --repp_cfg ./tools/yolo_repp_cfg.json --predictions_file ./yolov_x.pkl --evaluate --annotations_filename ./annotations/annotations_val_ILSVRC.txt --path_dataset [path to your dataset] --store_imdb --store_coco (--post)
```
The optional `--post` flag enables the post-processing method. Then you will get output like the following, where `mAP_slow`, `mAP_medium`, and `mAP_fast` break accuracy down by object motion speed, as is standard for ImageNet VID evaluation:
```shell
{'mAP_total': 0.8758871720817065, 'mAP_slow': 0.9059275666099181, 'mAP_medium': 0.8691557352372217, 'mAP_fast': 0.7459511040452989}
```


**Training example**
```shell
python tools/vid_train.py -f exps/yolov/yolov_s.py -c weights/yoloxs_vid.pth --fp16
```
**Rough testing**
```shell
python tools/vid_eval.py -f exps/yolov/yolov_s.py -c weights/yolov_s.pth --tnum 500 --fp16
```
`--tnum` specifies the number of video sequences used for testing.

## Annotation format



**Training the base detector**

`train_coco.json` is a COCO-format annotation file. When training the base detector on your own dataset, convert your annotations to COCO format.
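
For reference, a minimal COCO-format file looks like the sketch below (standard COCO fields only; the file name, box, and category are placeholders):

```python
# Skeleton of a COCO-format annotation file with placeholder values.
import json

coco = {
    "images": [
        {"id": 1, "file_name": "video_0001/000000.JPEG",
         "width": 1280, "height": 720},
    ],
    "annotations": [
        # bbox is [x, y, width, height] in pixels
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [100, 150, 200, 120], "area": 24000, "iscrowd": 0},
    ],
    "categories": [{"id": 1, "name": "airplane"}],
}

with open("train_coco.json", "w") as f:
    json.dump(coco, f)
```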

**Training YOLOV Series**

The `train_seq.npy` and `val_seq.npy` files are numpy arrays of lists. They can be loaded as follows:
```python
import numpy

numpy.load('./yolox/data/datasets/train_seq.npy', allow_pickle=True)
```
Each list contains the paths to all images in a video. The per-frame annotations (XML annotations in the VID dataset) are loaded via these image paths; refer to https://github.com/YuHengsss/YOLOV/blob/f5a57ddea2f3660875d6d75fc5fa2ddbb95028a7/yolox/data/datasets/vid.py#L125 for more details.
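
To prepare your own videos, an analogous sequence file can be built as follows (a sketch assuming one directory of decoded frames per video; adjust the glob patterns to your layout):

```python
# Build a train_seq.npy-style file: an object array where each element
# is the temporally ordered list of frame paths for one video.
import glob
import numpy as np

video_dirs = sorted(glob.glob("/path/to/your/datasets/train/*"))
sequences = [sorted(glob.glob(f"{d}/*.JPEG")) for d in video_dirs]

# Use an explicit object array so videos of different lengths stay ragged.
arr = np.empty(len(sequences), dtype=object)
arr[:] = sequences
np.save("yolox/data/datasets/train_seq.npy", arr)
```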

## Acknowledgements


* [https://github.com/Megvii-BaseDetection/YOLOX](https://github.com/Megvii-BaseDetection/YOLOX)
* [https://github.com/AlbertoSabater/Robust-and-efficient-post-processing-for-video-object-detection](https://github.com/AlbertoSabater/Robust-and-efficient-post-processing-for-video-object-detection)

## Cite YOLOV and YOLOV++
If the YOLOV series is helpful for your research, please cite the following paper:

```latex
@article{shi2022yolov,
  title={YOLOV: Making Still Image Object Detectors Great at Video Object Detection},
  author={Shi, Yuheng and Wang, Naiyan and Guo, Xiaojie},
  journal={arXiv preprint arXiv:2208.09686},
  year={2022}
}
```