Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/dog-qiuqiu/Yolo-Fastest

:zap: Based on yolo's ultra-lightweight universal target detection algorithm, the calculation amount is only 250mflops, the ncnn model size is only 666kb, the Raspberry Pi 3b can run up to 15fps+, and the mobile terminal can run up to 178fps+
https://github.com/dog-qiuqiu/Yolo-Fastest

Last synced: 12 days ago
JSON representation

Host: GitHub
URL: https://github.com/dog-qiuqiu/Yolo-Fastest
Owner: dog-qiuqiu
License: other
Created: 2020-08-21T03:37:01.000Z (almost 4 years ago)
Default Branch: master
Last Pushed: 2021-08-11T16:55:55.000Z (almost 3 years ago)
Last Synced: 2024-02-29T08:35:37.205Z (4 months ago)
Language: C
Homepage:
Size: 37.7 MB
Stars: 1,929
Watchers: 50
Forks: 425
Open Issues: 55
Metadata Files:
- Readme: README.md
- License: LICENSE

Lists

awesome-stars - dog-qiuqiu/Yolo-Fastest - :zap: Based on yolo's ultra-lightweight universal target detection algorithm, the calculation amount is only 250mflops, the ncnn model size is only 666kb, the Raspberry Pi 3b can run up to 15fps+, and (C)
awesome-yolo-object-detection - dog-qiuqiu/Yolo-Fastest - qiuqiu/Yolo-Fastest?style=social"/> : Yolo-Fastest：超超超快的开源ARM实时目标检测算法。 [Zenodo 2021](http://doi.org/10.5281/zenodo.5131532). "知乎「马雪浩」《[Yolo-Fastest：超超超快的开源ARM实时目标检测算法](https://zhuanlan.zhihu.com/p/234506503)》"。 (Lighter and Deployment Frameworks)

README

        * 2021.8.12: yolo-fastestV2版本已发布:https://github.com/dog-qiuqiu/Yolo-FastestV2

* 2021.3.21: 对模型结构进行细微调整优化，更新Yolo-Fastest-1.1模型

* 2021.3.19: NCNN Camera Demo https://github.com/dog-qiuqiu/Yolo-Fastest/tree/master/sample/ncnn

* 2021.3.16: 修复分组卷积在某些旧架构GPU推理耗时异常的问题

# :zap:Yolo-Fastest:zap:[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5131532.svg)](https://doi.org/10.5281/zenodo.5131532)

* Simple, fast, compact, easy to transplant

* A real-time target detection algorithm for all platforms

* The fastest and smallest known universal target detection algorithm based on yolo

* ***Optimized design for ARM mobile terminal, optimized to support [NCNN](https://github.com/Tencent/ncnn) reasoning framework***

* Based on NCNN deployed on RK3399 ,Raspberry Pi 4b... and other embedded devices to achieve full real-time 30fps+

![image](https://github.com/dog-qiuqiu/Yolo-Fastest/blob/master/data/fast.jpg)

* ***中文介绍https://zhuanlan.zhihu.com/p/234506503*** 

* ***相比AlexeyAB/darknet，此版本的darknet修复分组卷积在某些旧架构GPU推理耗时异常的问题(例如1050ti:40ms->4ms速度提升10倍)，强烈建议用此仓库框架训练模型***

* ***Compared with AlexeyAB/darknet, this version of darknet fixes the problem of abnormal time-consuming inference of grouped convolution in some old architecture GPUs (for example, 1050ti:40ms->4ms speed up 10 times), it is strongly recommended to use this warehouse framework for training model***

* ***Darknet CPU推理效率优化不好，不建议使用Darknet作为CPU端的推理框架，建议使用NCNN***

* ***Darknet CPU reasoning efficiency optimization is not good, it is not recommended to use Darknet as the CPU side reasoning framework, it is recommended to use ncnn***

* ***Based on pytorch training framework: https://github.com/dog-qiuqiu/yolov3***

# Evaluating indicator/Benchmark

Network|COCO mAP(0.5)|Resolution|Run Time(Ncnn 4xCore)|Run Time(Ncnn 1xCore)|FLOPS|Params|Weight size

:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:

[Yolo-Fastest-1.1](https://github.com/dog-qiuqiu/Yolo-Fastest/tree/master/ModelZoo/yolo-fastest-1.1_coco)|24.40 %|320X320|5.59 ms|7.52 ms|0.252BFlops|0.35M|1.4M

[Yolo-Fastest-1.1-xl](https://github.com/dog-qiuqiu/Yolo-Fastest/tree/master/ModelZoo/yolo-fastest-1.1_coco)|34.33 %|320X320|9.27ms|15.72ms|0.725BFlops|0.925M|3.7M

[Yolov3-Tiny-Prn](https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov3-tiny-prn.cfg)|33.1%|416X416|%ms|%ms|3.5BFlops|4.7M|18.8M

[Yolov4-Tiny](https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4-tiny.cfg)|40.2%|416X416|23.67ms|40.14ms|6.9 BFlops|5.77M|23.1M

* ***Test platform Mi 11 Snapdragon 888 CPU，Based on [NCNN](https://github.com/Tencent/ncnn)***

* COCO 2017 Val mAP（no group label）

* Suitable for hardware with extremely tight computing resources

* This model is recommended to do some simple single object detection suitable for simple application scenarios

# Yolo-Fastest-1.1 Multi-platform benchmark

Equipment|Computing backend|System|Framework|Run time

:---:|:---:|:---:|:---:|:---:

Mi 11|Snapdragon 888|Android(arm64)|ncnn|5.59ms

Mate 30|Kirin 990|Android(arm64)|ncnn|6.12ms

Meizu 16|Snapdragon 845|Android(arm64)|ncnn|7.72ms

Development board|Snapdragon 835(Monkey version)|Android(arm64)|ncnn|20.52ms

Development board|RK3399|Linux(arm64)|ncnn|35.04ms

Raspberrypi 3B|4xCortex-A53|Linux(arm64)|ncnn|62.31ms

Orangepi Zero Lts|H2+ 4xCortex-A7|Linux(armv7)|ncnn|550ms

Nvidia|Gtx 1050ti|Ubuntu(x64)|darknet|4.73ms

Intel|i7-8700|Ubuntu(x64)|ncnn|5.78ms

* The above is a multi-core test benchmark

* The above speed benchmark is tested by ***big core*** in big.little CPU

* Raspberrypi 3B enable bf16s optimization，[Raspberrypi 64 Bit OS](http://downloads.raspberrypi.org/raspios_arm64/images/raspios_arm64-2020-08-24/)

* [Rk3399 needs to lock the cpu to the highest frequency](http://blog.sina.com.cn/s/blog_15d5280590102yarw.html), ncnn and enable bf16s optimization

# Pascal VOC performance index comparison

Network|Model Size|mAP(VOC 2007)|FLOPS

:---:|:---:|:---:|:---:

Tiny YOLOv2|60.5MB|57.1%|6.97BFlops

Tiny YOLOv3|33.4MB|58.4%|5.52BFlops

YOLO Nano|4.0MB|69.1%|4.51Bflops

MobileNetv2-SSD-Lite|13.8MB|68.6%|&Bflops

MobileNetV2-YOLOv3|11.52MB|70.20%|2.02Bflos

Pelee-SSD|21.68MB|70.09%|2.40Bflos

***Yolo Fastest***|1.3MB|61.02%|0.23Bflops

***Yolo Fastest-XL***|3.5MB|69.43%|0.70Bflops

***MobileNetv2-Yolo-Lite***|8.0MB|73.26%|1.80Bflops

* Performance indicators reference from the papers and public indicators in the github project

* MobileNetv2-Yolo-Lite: https://github.com/dog-qiuqiu/MobileNet-Yolo#mobilenetv2-yolov3-litenano-darknet

# Yolo-Fastest-1.1 Pedestrian detection

Equipment|System|Framework|Run time

:---:|:---:|:---:|:---:

Raspberrypi 3B|Linux(arm64)|ncnn|62ms

* Simple real-time pedestrian detection model based on yolo-fastest-1.1

* Enable bf16s optimization，[Raspberrypi 64 Bit OS](http://downloads.raspberrypi.org/raspios_arm64/images/raspios_arm64-2020-08-24/)

## Demo

![image](https://github.com/dog-qiuqiu/Yolo-Fastest/blob/master/data/p1.jpg)

![image](https://github.com/dog-qiuqiu/Yolo-Fastest/blob/master/data/p2.jpg)

# Compile 

## How to compile on Linux

* This repo is based on Darknet project so the instructions for compiling the project are same

(https://github.com/MuhammadAsadJaved/darknet#how-to-compile-on-windows-legacy-way)

Just do `make` in the Yolo-Fastest-master directory. Before make, you can set such options in the `Makefile`: [link](https://github.com/dog-qiuqiu/Yolo-Fastest/blob/master/Makefile#L1)

* `GPU=1` to build with CUDA to accelerate by using GPU (CUDA should be in `/usr/local/cuda`)

* `CUDNN=1` to build with cuDNN v5-v7 to accelerate training by using GPU (cuDNN should be in `/usr/local/cudnn`)

* `CUDNN_HALF=1` to build for Tensor Cores (on Titan V / Tesla V100 / DGX-2 and later) speedup Detection 3x, Training 2x

* `OPENCV=1` to build with OpenCV 4.x/3.x/2.4.x - allows to detect on video files and video streams from network cameras or web-cams

* Set the other options in the `Makefile` according to your need.

# Test/Demo

*Run Yolo-Fastest , Yolo-Fastest-xl  , Yolov3 or Yolov4 on image or video inputs

## Demo on image input

*Note: change  .data , .cfg , .weights and input image file in `image_yolov3.sh` for Yolo-Fastest-x1, Yolov3 and Yolov4

```

  sh image_yolov3.sh

```

## Demo on video input

*Note: Use any input video and place in the `data` folder or use `0` in the `video_yolov3.sh` for webcam

*Note: change  .data , .cfg , .weights and input video file in `video_yolov3.sh` for Yolo-Fastest-x1, Yolov3 and Yolov4

```

  sh video_yolov3.sh

```

## Yolo-Fastest Test

![image](https://github.com/dog-qiuqiu/Yolo-Fastest/blob/master/data/predictions_2.png)

## Yolo-Fastest-xl Test

![image](https://github.com/dog-qiuqiu/Yolo-Fastest/blob/master/data/projections.jpg)

# How to Train

## Generate a pre-trained model for the initialization of the model backbone

```

  ./darknet partial yolo-fastest.cfg yolo-fastest.weights yolo-fastest.conv.109 109

```

## Train

* 交流qq群:1062122604

* https://github.com/AlexeyAB/darknet

```

  ./darknet detector train voc.data yolo-fastest.cfg yolo-fastest.conv.109 

```

# Deploy

## NCNN

### NCNN Conversion Tutorial

* Benchmark:https://github.com/Tencent/ncnn/tree/master/benchmark

* NCNN supports direct conversion of darknet models

* darknet2ncnn: https://github.com/Tencent/ncnn/tree/master/tools/darknet

### NCNN Sample

* CamSample:https://github.com/dog-qiuqiu/Yolo-Fastest/tree/master/sample/ncnn

* AndroidSample: https://github.com/WZTENG/YOLOv5_NCNN

## MNN&TNN&MNN

* https://github.com/dog-qiuqiu/MobileNet-Yolo#darknet2caffe-tutorial

* ***Based on MNN: https://github.com/geekzhu001/Yolo-Fastest-MNN Run on : raspberry pi 4B 2G Input size : 320*320 Average inference time : 0.035s*** 

## ONNX&TensorRT

* https://github.com/CaoWGG/TensorRT-YOLOv4

* It is not efficient to run on Psacal and earlier GPU architectures. It is not recommended to deploy on such devices such as jeston nano(17ms/img), Tx1, Tx2, but there is no such problem in Turing GPU, such as jetson-Xavier-NX Can run efficiently

## OpenCV DNN

* https://blog.csdn.net/nihate/article/details/108670542

# Thanks

* https://github.com/AlexeyAB/darknet

* https://github.com/Tencent/ncnn

## Cite as

dog-qiuqiu. (2021, July 24). dog-qiuqiu/Yolo-Fastest: 

yolo-fastest-v1.1.0 (Version v.1.1.0). Zenodo. 

http://doi.org/10.5281/zenodo.5131532