https://github.com/fsx950223/mobilenetv2-yolov3

yolov3 with mobilenetv2 and efficientnet
https://github.com/fsx950223/mobilenetv2-yolov3

darknet53 efficientnet keras mobilenetv2 tensorflow tf2 yolov3

Last synced: 2 months ago
JSON representation

yolov3 with mobilenetv2 and efficientnet

Host: GitHub
URL: https://github.com/fsx950223/mobilenetv2-yolov3
Owner: fsx950223
License: mit
Created: 2019-03-09T03:54:37.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2022-11-21T21:06:44.000Z (over 2 years ago)
Last Synced: 2025-03-30T03:08:12.132Z (3 months ago)
Topics: darknet53, efficientnet, keras, mobilenetv2, tensorflow, tf2, yolov3
Language: Python
Homepage: https://fsx950223.github.io/mobilenetv2-yolov3/tfjs
Size: 35.5 MB
Stars: 291
Watchers: 15
Forks: 101
Open Issues: 21
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-yolo-object-detection - fsx950223/mobilenetv2-yolov3 - yolov3?style=social"/> : yolov3 with mobilenetv2 and efficientnet. (Lighter and Deployment Frameworks)
awesome-yolo-object-detection - fsx950223/mobilenetv2-yolov3 - yolov3?style=social"/> : yolov3 with mobilenetv2 and efficientnet. (Lighter and Deployment Frameworks)

README

# Mobilenetv2-Yolov3
Tensorflow implementation mobilenetv2-yolov3 and efficientnet-yolov3 inspired by [keras-yolo3](https://github.com/qqwweee/keras-yolo3.git)

---
# Update
Backend:
- [x] MobilenetV2
- [x] Efficientnet
- [x] Darknet53

Callback:
- [x] mAP
- [ ] Tensorboard extern callback

Loss:
- [x] MSE
- [x] GIOU
- [x] Adversarial loss

Train:
- [x] Cosine learning rate
- [x] Auto augment

Tensorflow:
- [x] Tensorflow2 Ready
- [x] tf.data pipeline
- [ ] Convert model to tensorflow lite model
- [x] Multi GPU training
- [ ] TPU support
- [x] TensorRT support

Serving:
- [x] Tensorflow Serving warm up request
- [x] Tensorflow Serving JAVA Client
- [x] Tensorflow Serving Python Client
- [x] Tensorflow Serving Service Control Client
- [x] Tensorflow Serving Server Build and Plugins develop

---

# Usage
### Install:
``` bash
pip install -r requirements.txt
```
### Get help info:
``` bash
python main.py --help
```
### Train:
1. Format file name like [name]_[number].[extension]

Example:

```
voc_train_3998.txt
```

2. If you are using txt dataset, please format records like [image_path] [,[xmin ymin xmax ymax class]]
(for convenience, you can modify voc_text.py to parse your data to specific data format), else you should modify voc_annotation.py, then run

``` bash
python voc_annotation.py
```
to parse your data to tfrecords.

Example:

```
/image/path 179 66 272 290 14 172 38 317 349 14 276 2 426 252 14 1 32 498 365 13
```

3. Run:

``` bash
python main.py --mode=TRAIN --train_dataset_glob= --epochs=50 --epochs=50 --mode=TRAIN
```

### Predict:
``` bash
python main.py --mode=IMAGE --model=
```

### MAP:
``` bash
python main.py --mode=MAP --model= --test_dataset_glob=
```

### Export serving model:
``` bash
python main.py --mode=SERVING --model=
```

### Use custom config file:
``` bash
python main.py --config=mobilenetv2.yaml
```

---
# Set up tensorflow.js model (Live Demo: https://fsx950223.github.io/mobilenetv2-yolov3/tfjs/)
1. Create a web server on project folder

2. Open browser and enter [your_url:your_port]/tfjs

---
# Resources
* Download pascal tfrecords from [here](https://drive.google.com/drive/folders/172sH75LPeUd2yyzAnrce0LLe2UR_kFqF).
* Download pre-trained mobilenetv2-yolov3 model(VOC2007) [here](https://drive.google.com/open?id=1B0vVQsuWY-zfuyol38-R5XJs1mntIwqZ)
* Download pre-trained efficientnet-yolov3 model(VOC2007) [here](https://drive.google.com/open?id=10A2BqNrQp5_hIcBzGXu6Xiv4mCQzga2q)
* Download pre-trained efficientnet-yolov3 model(VOC2007+2012) [here](https://drive.google.com/open?id=1dYfi1z5EeNsXMLACwoeR4jGj7RWyCcZp)

---

# Performance
Network: Mobilenetv2+Yolov3

Input size: 416*416

Train Dataset: VOC2007

Test Dataset: VOC2007

mAP:

```
aeroplane ap: 0.6721874861775297
bicycle ap: 0.7844226664948993
bird ap: 0.6863393529648882
boat ap: 0.5102715372530052
bottle ap: 0.4098093697072679
bus ap: 0.7646277543282962
car ap: 0.8000339732789448
cat ap: 0.8681120849855787
chair ap: 0.4021823009684314
cow ap: 0.6768311030872428
diningtable ap: 0.626045232887253
dog ap: 0.8293983813984888
horse ap: 0.8315961581768014
motorbike ap: 0.771283337747543
person ap: 0.7298645793931624
pottedplant ap: 0.3081565644702266
sheep ap: 0.6510012751038824
sofa ap: 0.6442699680945367
train ap: 0.8025086962000969
tvmonitor ap: 0.6239227675451299
mAP: 0.6696432295131602
```
GPU inference time (GTX1080Ti): 19ms

CPU inference time (i7-8550U): 112ms

Model size: 37M

Network: Efficientnet+Yolov3

Input size: 380*380

Train Dataset: VOC2007

Test Dataset: VOC2007

mAP:

```
aeroplane ap: 0.7770436248733187
bicycle ap: 0.822183784348553
bird ap: 0.7346967323068865
boat ap: 0.6142903989882571
bottle ap: 0.4518063126765959
bus ap: 0.782237197681936
car ap: 0.8138978890046222
cat ap: 0.8800232369515162
chair ap: 0.4531520519719176
cow ap: 0.6992367978932157
diningtable ap: 0.6765065569475968
dog ap: 0.8612118810883834
horse ap: 0.8559580684256001
motorbike ap: 0.8027311717682002
person ap: 0.7280218883512792
pottedplant ap: 0.35520418960051925
sheep ap: 0.6833401035128458
sofa ap: 0.6753841073186044
train ap: 0.8107647793504738
tvmonitor ap: 0.6726791558585905
mAP: 0.7075184964459456
```
GPU inference time (GTX1080Ti): 23ms

CPU inference time (i7-8550U): 168ms

Model size: 77M

Network: Efficientnet+Yolov3

Input size: 380*380

Train Dataset: VOC2007+VOC2012

Test Dataset: VOC2007

mAP:

```
aeroplane ap: 0.8572154850266848
bicycle ap: 0.8129962658687486
bird ap: 0.8325678324285539
boat ap: 0.7061501348114156
bottle ap: 0.5603823420846883
bus ap: 0.8536452418769342
car ap: 0.8395446870008888
cat ap: 0.9200504816535645
chair ap: 0.514644868267842
cow ap: 0.8202171886452714
diningtable ap: 0.7370149790284737
dog ap: 0.900374518831019
horse ap: 0.8632567146990895
motorbike ap: 0.8147344820261591
person ap: 0.7690434789031615
pottedplant ap: 0.4576271726152926
sheep ap: 0.8006580581981677
sofa ap: 0.7478146395952494
train ap: 0.8783508559769437
tvmonitor ap: 0.6923886096918628
mAP: 0.7689339018615006
```
GPU inference time (GTX1080Ti): 23ms

CPU inference time (i7-8550U): 168ms

Model size: 77M

---

# Reference
paper:

- [YOLOv3: An Incremental Improvement](https://arxiv.org/abs/1804.02767)

- [An Analysis of Scale Invariance in Object Detection - SNIP](https://arxiv.org/abs/1711.08189)

- [Scale-Aware Trident Networks for Object Detection](https://arxiv.org/abs/1901.01892)

- [Understanding the Effective Receptive Field in Deep Convolutional Neural Networks](https://arxiv.org/abs/1701.04128)

- [Bag of Freebies for Training Object Detection Neural Networks](https://arxiv.org/pdf/1902.04103.pdf)

- [Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression](https://arxiv.org/abs/1902.09630)

- [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/fsx950223/mobilenetv2-yolov3

Awesome Lists containing this project

README