https://github.com/jwyang/faster-rcnn.pytorch

A faster pytorch implementation of faster r-cnn

Host: GitHub
URL: https://github.com/jwyang/faster-rcnn.pytorch
Owner: jwyang
License: mit
Created: 2017-08-03T19:46:54.000Z (almost 9 years ago)
Default Branch: master
Last Pushed: 2022-05-20T09:01:28.000Z (about 4 years ago)
Last Synced: 2025-04-03T09:05:28.189Z (about 1 year ago)
Topics: faster-rcnn, pytorch
Language: Python
Homepage:
Size: 6.09 MB
Stars: 7,766
Watchers: 90
Forks: 2,329
Open Issues: 423
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

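awesome-pytorch-list - faster-rcnn.pytorch - A PyTorch implementation of CNN. (Common paper implementations)
Awesome-pytorch-list-CNVersion - faster-rcnn.pytorch - CNN implementation, aimed at accelerating the training of faster R-CNN object detection models. (Paper implementations / Other libraries:)
CV-pretrained-model - faster-rcnn.pytorch - CNN implementation, aimed at accelerating the training of faster R-CNN object detection models. | `PyTorch`| [MIT License](https://raw.githubusercontent.com/jwyang/faster-rcnn.pytorch/master/LICENSE) (Model Deployment library / PyTorch)
StarryDivineSky - jwyang/faster-rcnn.pytorch - An efficient PyTorch implementation of the Faster R-CNN object detection algorithm, whose main feature is optimized computation speed with preserved accuracy. The project is based on the classic Faster R-CNN architecture and, by improving the design of the region proposal network (RPN) and the detection head, achieves more efficient feature extraction and object localization. The code was developed with PyTorch 0.4.0 and supports mainstream backbones such as ResNet-101 and ResNet-50; the pretrained models reach high detection accuracy on the COCO dataset (mAP 36.8%). The project provides complete training and inference pipelines, lets users switch network structures quickly by editing configuration files, and supports multi-scale training and data augmentation to improve robustness. The code is clearly structured, covering data preprocessing, model definitions, training scripts, and evaluation tools, with GPU memory usage during training specifically optimized. The project also supports visualization tools and a result analysis module to help users debug and validate models. Compared with other Faster R-CNN implementations, the project improves training speed by about 20% through more efficient convolution operations and optimizer configuration, while keeping detection performance comparable to the original paper. The developers provide detailed usage instructions and solutions to common problems; the project suits research and industrial detection scenarios and can serve as a reference template for PyTorch object detection projects. (Object detection and segmentation / Resource transfer and download)
Awesome-pytorch-list - faster-rcnn.pytorch - CNN implementation, aimed at accelerating the training of faster R-CNN object detection models. (Paper implementations / Other libraries:)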

README

# A *Faster* Pytorch Implementation of Faster R-CNN

## A note up front

[05/29/2020] This repo was initiated about two years ago and developed as the first open-source object detection code that supports multi-GPU training. It has integrated tremendous efforts from many people. However, many high-quality repos have emerged in recent years, such as:

* [maskrcnn-benchmark](https://github.com/facebookresearch/maskrcnn-benchmark)

* [detectron2](https://github.com/facebookresearch/detectron2)

* [mmdetection](https://github.com/open-mmlab/mmdetection)

**At this point, I think this repo is out-of-date in terms of pipeline and coding style, and I will not actively maintain it. Though you can still use this repo as a playground, I highly recommend you move to the above repos to delve into the west world of object detection!**

## Introduction

### :boom: Good news! This repo supports pytorch-1.0 now!!! We borrowed some code and techniques from [maskrcnn-benchmark](https://github.com/facebookresearch/maskrcnn-benchmark). Just go to pytorch-1.0 branch!

This project is a *faster* pytorch implementation of faster R-CNN, aimed at accelerating the training of faster R-CNN object detection models. There are already a number of good implementations:

* [rbgirshick/py-faster-rcnn](https://github.com/rbgirshick/py-faster-rcnn), developed based on Pycaffe + Numpy

* [longcw/faster_rcnn_pytorch](https://github.com/longcw/faster_rcnn_pytorch), developed based on Pytorch + Numpy

* [endernewton/tf-faster-rcnn](https://github.com/endernewton/tf-faster-rcnn), developed based on TensorFlow + Numpy

* [ruotianluo/pytorch-faster-rcnn](https://github.com/ruotianluo/pytorch-faster-rcnn), developed based on Pytorch + TensorFlow + Numpy

In our implementation, we referred to the above projects, especially [longcw/faster_rcnn_pytorch](https://github.com/longcw/faster_rcnn_pytorch). However, our implementation has several unique features compared with them:

* **It is pure Pytorch code**. We convert all the numpy implementations to pytorch!

* **It supports multi-image batch training**. We revise all the layers, including dataloader, rpn, roi-pooling, etc., to support multiple images in each minibatch.

* **It supports multi-GPU training**. We use a multiple GPU wrapper (nn.DataParallel here) to make it flexible to use one or more GPUs, which the above two features make possible.

* **It supports three pooling methods**. We integrate three pooling methods: roi pooling, roi align and roi crop. More importantly, we modify all of them to support multi-image batch training.

* **It is memory efficient**. We limit the image aspect ratio, and group images with similar aspect ratios into a minibatch. As such, we can train resnet101 and VGG16 with batchsize = 4 (4 images) on a single Titan X (12 GB). When training with 8 GPUs, the maximum batchsize for each GPU is 3 (Res101), totaling 24.

* **It is faster**. Based on the above modifications, the training is much faster. We report the training speed on NVIDIA TITAN Xp in the tables below.
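The aspect-ratio grouping mentioned in the memory-efficiency point can be sketched roughly as follows. This is a minimal illustration and not the repository's actual data loader: the function name `group_by_aspect_ratio` and the clamp bounds `min_ratio` and `max_ratio` are assumptions chosen for the example.

```python
def group_by_aspect_ratio(sizes, batch_size, min_ratio=0.5, max_ratio=2.0):
    """Group image indices so each minibatch holds similar aspect ratios.

    sizes: list of (width, height) tuples.
    min_ratio / max_ratio: hypothetical clamp bounds, assumed for this sketch.
    """
    # Clamp width/height ratios so extreme images do not dominate padding.
    ratios = [min(max(w / h, min_ratio), max_ratio) for (w, h) in sizes]
    # Sort indices by ratio, then cut the sorted order into consecutive batches.
    order = sorted(range(len(sizes)), key=lambda i: ratios[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


sizes = [(500, 375), (375, 500), (600, 400), (333, 500)]
batches = group_by_aspect_ratio(sizes, batch_size=2)
print(batches)
```

Because neighbouring images in the sorted order have similar shapes, each minibatch needs less padding or cropping to reach a common size, which is where the memory saving would come from.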

### What we are doing and going to do

- [x] Support both python2 and python3 (great thanks to [cclauss](https://github.com/cclauss)).

- [x] Add deformable pooling layer (mainly supported by [Xander](https://github.com/xanderchf)).

- [x] Support pytorch-0.4.0 (this branch).

- [x] Support tensorboardX.

- [x] Support pytorch-1.0 (go to pytorch-1.0 branch).

## Other Implementations

* [Feature Pyramid Network (FPN)](https://github.com/jwyang/fpn.pytorch)

* [Mask R-CNN](https://github.com/roytseng-tw/mask-rcnn.pytorch) (~~ongoing~~ already implemented by [roytseng-tw](https://github.com/roytseng-tw))

* [Graph R-CNN](https://github.com/jwyang/graph-rcnn.pytorch) (extension to scene graph generation)

## Tutorial

* [Blog](http://www.telesens.co/2018/03/11/object-detection-and-classification-using-r-cnns/) by [ankur6ue](https://github.com/ankur6ue)

## Benchmarking

We benchmark our code thoroughly on three datasets: pascal voc, coco and visual genome, using two different network architectures: vgg16 and resnet101. Below are the results:

1). PASCAL VOC 2007 (Train/Test: 07trainval/07test, scale=600, ROI Align)

model    | #GPUs | batch size | lr        | lr_decay | max_epoch     |  time/epoch | mem/GPU | mAP
---------|--------|-----|--------|-----|-----|-------|--------|-----
[VGG-16](https://www.dropbox.com/s/6ief4w7qzka6083/faster_rcnn_1_6_10021.pth?dl=0)     | 1 | 1 | 1e-3 | 5   | 6   |  0.76 hr | 3265 MB   | 70.1
[VGG-16](https://www.dropbox.com/s/cpj2nu35am0f9hp/faster_rcnn_1_9_2504.pth?dl=0)     | 1 | 4 | 4e-3 | 8   | 9  |  0.50 hr | 9083 MB   | 69.6
[VGG-16](https://www.dropbox.com/s/1a31y7vicby0kvy/faster_rcnn_1_10_625.pth?dl=0)     | 8 | 16| 1e-2 | 8   | 10  |  0.19 hr | 5291 MB   | 69.4
[VGG-16](https://www.dropbox.com/s/hkj7i6mbhw9tq4k/faster_rcnn_1_11_416.pth?dl=0)     | 8 | 24| 1e-2 | 10  | 11  |  0.16 hr | 11303 MB  | 69.2
[Res-101](https://www.dropbox.com/s/4v3or0054kzl19q/faster_rcnn_1_7_10021.pth?dl=0)   | 1 | 1 | 1e-3 | 5   | 7   |  0.88 hr | 3200 MB  | 75.2
[Res-101](https://www.dropbox.com/s/8bhldrds3mf0yuj/faster_rcnn_1_10_2504.pth?dl=0)    | 1 | 4 | 4e-3 | 8   | 10  |  0.60 hr | 9700 MB  | 74.9
[Res-101](https://www.dropbox.com/s/5is50y01m1l9hbu/faster_rcnn_1_10_625.pth?dl=0)    | 8 | 16| 1e-2 | 8   | 10  |  0.23 hr | 8400 MB  | 75.2
[Res-101](https://www.dropbox.com/s/cn8gneumg4gjo9i/faster_rcnn_1_12_416.pth?dl=0)    | 8 | 24| 1e-2 | 10  | 12  |  0.17 hr | 10327 MB  | 75.1

2). COCO (Train/Test: coco_train+coco_val-minival/minival, scale=800, max_size=1200, ROI Align)

model     | #GPUs | batch size |lr        | lr_decay | max_epoch     |  time/epoch | mem/GPU | mAP
---------|--------|-----|--------|-----|-----|-------|--------|-----
VGG-16     | 8 | 16    |1e-2| 4   | 6  |  4.9 hr | 7192 MB  | 29.2
[Res-101](https://www.dropbox.com/s/5if6l7mqsi4rfk9/faster_rcnn_1_6_14657.pth?dl=0)    | 8 | 16    |1e-2| 4   | 6  |  6.0 hr    |10956 MB  | 36.2
[Res-101](https://www.dropbox.com/s/be0isevd22eikqb/faster_rcnn_1_10_14657.pth?dl=0)    | 8 | 16    |1e-2| 4   | 10  |  6.0 hr    |10956 MB  | 37.0

**NOTE**. Since the above models use scale=800, you need to add `--ls` at the end of the test command.

3). COCO (Train/Test: coco_train+coco_val-minival/minival, scale=600, max_size=1000, ROI Align)

model     | #GPUs | batch size |lr        | lr_decay | max_epoch     |  time/epoch | mem/GPU | mAP
---------|--------|-----|--------|-----|-----|-------|--------|-----
[Res-101](https://www.dropbox.com/s/y171ze1sdw1o2ph/faster_rcnn_1_6_9771.pth?dl=0)    | 8 | 24    |1e-2| 4   | 6  |  5.4 hr    |10659 MB  | 33.9
[Res-101](https://www.dropbox.com/s/dpq6qv0efspelr3/faster_rcnn_1_10_9771.pth?dl=0)    | 8 | 24    |1e-2| 4   | 10  |  5.4 hr    |10659 MB  | 34.5

4). Visual Genome (Train/Test: vg_train/vg_test, scale=600, max_size=1000, ROI Align, category=2500)

model     | #GPUs | batch size |lr        | lr_decay | max_epoch     |  time/epoch | mem/GPU | mAP
---------|--------|-----|--------|-----|-----|-------|--------|-----
[VGG-16](http://data.lip6.fr/cadene/faster-rcnn.pytorch/faster_rcnn_1_19_48611.pth)    | 1 P100 | 4    |1e-3| 5   | 20  |  3.7 hr    |12707 MB  | 4.4

Thanks to [Remi](https://github.com/Cadene) for providing the pretrained detection model on visual genome!

* Click the links in the above tables to download our pre-trained faster r-cnn models.

* If not mentioned, the GPU we used is NVIDIA Titan X Pascal (12GB).

## Preparation

First of all, clone the code:

```
git clone https://github.com/jwyang/faster-rcnn.pytorch.git
```

Then, create a folder:

```
cd faster-rcnn.pytorch && mkdir data
```

### Prerequisites

* Python 2.7 or 3.6

* Pytorch 0.4.0 (**this branch does not support 0.4.1 or higher**)

* CUDA 8.0 or higher

### Data Preparation

* **PASCAL_VOC 07+12**: Please follow the instructions in [py-faster-rcnn](https://github.com/rbgirshick/py-faster-rcnn#beyond-the-demo-installation-for-training-and-testing-models) to prepare VOC datasets. Any other preparation guide works as well. After downloading the data, create symlinks to it in the `data/` folder.

* **COCO**: Please also follow the instructions in [py-faster-rcnn](https://github.com/rbgirshick/py-faster-rcnn#beyond-the-demo-installation-for-training-and-testing-models) to prepare the data.

* **Visual Genome**: Please follow the instructions in [bottom-up-attention](https://github.com/peteanderson80/bottom-up-attention) to prepare the Visual Genome dataset. You need to download the images and object annotation files first, and then perform preprocessing to obtain the vocabulary and cleansed annotations based on the scripts provided in that repository.

### Pretrained Model

We used two pretrained models in our experiments, VGG and ResNet101. You can download these two models from:

* VGG16: [Dropbox](https://www.dropbox.com/s/s3brpk0bdq60nyb/vgg16_caffe.pth?dl=0), [VT Server](https://filebox.ece.vt.edu/~jw2yang/faster-rcnn/pretrained-base-models/vgg16_caffe.pth)

* ResNet101: [Dropbox](https://www.dropbox.com/s/iev3tkbz5wyyuz9/resnet101_caffe.pth?dl=0), [VT Server](https://filebox.ece.vt.edu/~jw2yang/faster-rcnn/pretrained-base-models/resnet101_caffe.pth)

Download them and put them into `data/pretrained_model/`.

**NOTE**. We compared the pretrained models from Pytorch and Caffe, and surprisingly found that the Caffe pretrained models perform slightly better than the Pytorch pretrained ones. We suggest using the Caffe pretrained models from the above links to reproduce our results.

**If you want to use pytorch pre-trained models, please remember to convert the image channel order from BGR to RGB, and also use the same data transformer (subtract the mean and normalize) as used for the pretrained model.**
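The BGR-to-RGB conversion and normalization described above can be sketched in plain Python as below. This is only an illustration under assumptions: the helper names are hypothetical, and the mean and standard deviation are the ImageNet statistics commonly documented for torchvision pretrained models, so verify them against whichever pretrained model you actually load.

```python
# ImageNet statistics commonly documented for torchvision models (RGB order).
# Assumed here for illustration; confirm them for your pretrained model.
RGB_MEAN = (0.485, 0.456, 0.406)
RGB_STD = (0.229, 0.224, 0.225)


def bgr_pixel_to_normalized_rgb(pixel_bgr):
    """Convert one (B, G, R) pixel in [0, 255] to normalized (R, G, B)."""
    b, g, r = pixel_bgr
    # Reorder the channels and rescale to [0, 1].
    rgb = (r / 255.0, g / 255.0, b / 255.0)
    # Subtract the per-channel mean and divide by the per-channel std.
    return tuple((v - m) / s for v, m, s in zip(rgb, RGB_MEAN, RGB_STD))


def bgr_image_to_normalized_rgb(image_bgr):
    """Apply the conversion to an image stored as rows of (B, G, R) pixels."""
    return [[bgr_pixel_to_normalized_rgb(p) for p in row] for row in image_bgr]


pure_red_in_bgr = (0, 0, 255)
print(bgr_pixel_to_normalized_rgb(pure_red_in_bgr))
```

With NumPy arrays of shape height x width x channels, the channel flip is usually written as `image[:, :, ::-1]`, followed by the same mean and std arithmetic.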

### Compilation

As pointed out by [ruotianluo/pytorch-faster-rcnn](https://github.com/ruotianluo/pytorch-faster-rcnn), choose the right `-arch` in `make.sh` file, to compile the cuda code:

  | GPU model  | Architecture |
  | ------------- | ------------- |
  | TitanX (Maxwell/Pascal) | sm_52 |
  | GTX 960M | sm_50 |
  | GTX 1080 (Ti) | sm_61 |
  | Grid K520 (AWS g2.2xlarge) | sm_30 |
  | Tesla K80 (AWS p2.xlarge) | sm_37 |

More details about setting the architecture can be found [here](https://developer.nvidia.com/cuda-gpus) or [here](http://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/)

Install all the python dependencies using pip:

```
pip install -r requirements.txt
```

Compile the cuda dependencies using the following commands:

```
cd lib
sh make.sh
```

It will compile all the modules you need, including NMS, ROI_Pooling, ROI_Align and ROI_Crop. The default version is compiled with Python 2.7; please compile it yourself if you are using a different Python version.

**As pointed out in this [issue](https://github.com/jwyang/faster-rcnn.pytorch/issues/16), if you encounter errors during compilation, you may have forgotten to export the CUDA paths to your environment.**
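Before running `make.sh`, a quick check of the environment may save a failed build. The sketch below is an assumption-laden illustration, not something taken from the linked issue: `cuda_env_problems` is a hypothetical helper, and `CUDA_HOME` and `LD_LIBRARY_PATH` are the conventional variable names on Linux, which may differ on your system.

```python
import os
import shutil


def cuda_env_problems(env=None, nvcc_path=None):
    """Return a list of human-readable problems with the CUDA environment.

    env: mapping of environment variables (defaults to os.environ).
    nvcc_path: optional explicit nvcc location; looked up on PATH if omitted.
    """
    env = os.environ if env is None else env
    if nvcc_path is None:
        # shutil.which searches the given PATH string for the nvcc compiler.
        nvcc_path = shutil.which("nvcc", path=env.get("PATH"))
    problems = []
    if nvcc_path is None:
        problems.append("nvcc not found on PATH")
    if not env.get("CUDA_HOME"):
        problems.append("CUDA_HOME is not set")
    if "cuda" not in env.get("LD_LIBRARY_PATH", "").lower():
        problems.append("LD_LIBRARY_PATH has no CUDA library directory")
    return problems


for problem in cuda_env_problems():
    print("warning:", problem)
```

An empty list does not guarantee a successful build; it only rules out the most common missing exports.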

## Train

Before training, set the right directory to save and load the trained models. Change the arguments "save_dir" and "load_dir" in trainval_net.py and test_net.py to adapt to your environment.

To train a faster R-CNN model with vgg16 on pascal_voc, simply run:

```
CUDA_VISIBLE_DEVICES=$GPU_ID python trainval_net.py \
                   --dataset pascal_voc --net vgg16 \
                   --bs $BATCH_SIZE --nw $WORKER_NUMBER \
                   --lr $LEARNING_RATE --lr_decay_step $DECAY_STEP \
                   --cuda
```

where 'bs' is the batch size with default 1. Alternatively, to train with resnet101 on pascal_voc, simply run:

```
CUDA_VISIBLE_DEVICES=$GPU_ID python trainval_net.py \
                   --dataset pascal_voc --net res101 \
                   --bs $BATCH_SIZE --nw $WORKER_NUMBER \
                   --lr $LEARNING_RATE --lr_decay_step $DECAY_STEP \
                   --cuda
```

Above, BATCH_SIZE and WORKER_NUMBER can be set according to your GPU memory size. **On Titan Xp with 12G memory, the batch size can be up to 4**.

If you have multiple (say 8) Titan Xp GPUs, then just use them all! Try:

```
python trainval_net.py --dataset pascal_voc --net vgg16 \
                       --bs 24 --nw 8 \
                       --lr $LEARNING_RATE --lr_decay_step $DECAY_STEP \
                       --cuda --mGPUs
```

Change dataset to "coco" or 'vg' if you want to train on COCO or Visual Genome.
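If you launch several runs, it can help to build the command line programmatically. The following is a small sketch, assuming only the flags shown in the commands above; `build_train_command` is a hypothetical helper and not part of the repository. The example values (batch size 24, 8 workers, lr 1e-2, decay step 10) mirror the 8-GPU VGG-16 row of the PASCAL VOC table.

```python
def build_train_command(dataset, net, batch_size, workers, lr, lr_decay_step,
                        multi_gpu=False):
    """Return the argv list for trainval_net.py using the flags shown above."""
    cmd = [
        "python", "trainval_net.py",
        "--dataset", dataset, "--net", net,
        "--bs", str(batch_size), "--nw", str(workers),
        "--lr", str(lr), "--lr_decay_step", str(lr_decay_step),
        "--cuda",
    ]
    if multi_gpu:
        # --mGPUs is the multi-GPU switch used in the 8-GPU example above.
        cmd.append("--mGPUs")
    return cmd


cmd = build_train_command("pascal_voc", "vgg16", 24, 8, 1e-2, 10, multi_gpu=True)
print(" ".join(cmd))
# To launch it from the repository root:
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues and makes it easy to loop over datasets such as "coco" or "vg".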

## Test

If you want to evaluate the detection performance of a pre-trained vgg16 model on pascal_voc test set, simply run

```
python test_net.py --dataset pascal_voc --net vgg16 \
                   --checksession $SESSION --checkepoch $EPOCH --checkpoint $CHECKPOINT \
                   --cuda
```

Specify the specific model session, checkepoch and checkpoint, e.g., SESSION=1, EPOCH=6, CHECKPOINT=416.
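The download links in the benchmark tables use file names such as `faster_rcnn_1_6_10021.pth`, which suggests a `faster_rcnn_<session>_<epoch>_<checkpoint>.pth` pattern. Assuming that naming holds for your files (it is inferred from the links, not stated explicitly in this document), the three values can be read off a file name like this; `parse_checkpoint_name` is a hypothetical helper.

```python
import re

# Pattern inferred from the download links above; treat it as an assumption.
CHECKPOINT_RE = re.compile(r"^faster_rcnn_(\d+)_(\d+)_(\d+)\.pth$")


def parse_checkpoint_name(filename):
    """Split a checkpoint file name into session, epoch and checkpoint numbers."""
    match = CHECKPOINT_RE.match(filename)
    if match is None:
        raise ValueError("unexpected checkpoint name: " + filename)
    session, epoch, checkpoint = (int(g) for g in match.groups())
    return {"session": session, "epoch": epoch, "checkpoint": checkpoint}


info = parse_checkpoint_name("faster_rcnn_1_6_10021.pth")
print(info)
```

The resulting numbers are what you would pass as `--checksession`, `--checkepoch` and `--checkpoint`.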

## Demo

If you want to run detection on your own images with a pre-trained model, first download one of the pretrained models listed in the tables above or train your own model, then add images to the folder $ROOT/images, and run:

```
python demo.py --net vgg16 \
               --checksession $SESSION --checkepoch $EPOCH --checkpoint $CHECKPOINT \
               --cuda --load_dir path/to/model/directory
```

Then you will find the detection results in folder $ROOT/images.

**Note that the default demo.py only supports pascal_voc categories. You need to change this [line](https://github.com/jwyang/faster-rcnn.pytorch/blob/530f3fdccaa60d05fa068bc2148695211586bd88/demo.py#L156) to adapt it to your own model.**

Below are some detection results:



 



## Webcam Demo

You can use a webcam in a real-time demo by running:

```
python demo.py --net vgg16 \
               --checksession $SESSION --checkepoch $EPOCH --checkpoint $CHECKPOINT \
               --cuda --load_dir path/to/model/directory \
               --webcam $WEBCAM_ID
```

The demo is stopped by clicking the image window and then pressing the 'q' key.

## Authorship

[Jianwei Yang](https://github.com/jwyang) and [Jiasen Lu](https://github.com/jiasenlu) contributed equally to this project, along with many others (thanks to them!).

## Citation

    @article{jjfaster2rcnn,
        Author = {Jianwei Yang and Jiasen Lu and Dhruv Batra and Devi Parikh},
        Title = {A Faster Pytorch Implementation of Faster R-CNN},
        Journal = {https://github.com/jwyang/faster-rcnn.pytorch},
        Year = {2017}
    }

    @inproceedings{renNIPS15fasterrcnn,
        Author = {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun},
        Title = {Faster {R-CNN}: Towards Real-Time Object Detection
                 with Region Proposal Networks},
        Booktitle = {Advances in Neural Information Processing Systems ({NIPS})},
        Year = {2015}
    }
