# caffemodel2pytorch

Convert Caffe models to PyTorch

https://github.com/vadimkantorov/caffemodel2pytorch
This converter can be useful for porting Caffe code and layers to PyTorch. Features:
* dump caffemodel weights to hdf5, npy, pt and json formats
* load Caffe models and use them from PyTorch
* mock PyCaffe API to allow for smooth porting of Caffe-using code (a drop-in script for [OICR](https://github.com/ppengtang/oicr) that switches the train/eval backend to PyTorch is below):
  * `Net`, `Blob`, `SGDSolver`
  * wrapping Caffe's Python layers (see the OICR example)
  * example of ROI pooling in PyTorch without manual CUDA code compilation (see the OICR example)

The layer support isn't as complete as in https://github.com/marvis/pytorch-caffe. Currently it supports the following Caffe layers:
* convolution (num_output, kernel_size, stride, pad, dilation; constant and gaussian weight/bias fillers)
* inner_product (num_output; constant and gaussian weight/bias fillers)
* max / avg pooling (kernel_size, stride, pad)
* relu
* dropout (dropout_ratio)
* eltwise (prod, sum, max)
* softmax (axis)
* local response norm (local_size, alpha, beta)

Dependencies: protobuf with Python bindings and the `protoc` binary available on `PATH`.

PRs to enable other layers or layer params are very welcome (see the definition of the `modules` dictionary in the code)!
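For illustration, a new layer type can be enabled by adding a factory to `modules` that maps the layer's parameters to a callable (a minimal sketch; the `Power` layer here is hypothetical, while the `ReLU` registration mirrors the OICR script further below):

```python
import torch
import caffemodel2pytorch

# hypothetical elementwise layer; param is the layer's parameter dict parsed from the prototxt
caffemodel2pytorch.modules['Power'] = lambda param: lambda x: x.pow(param.get('power', 1.0))

# a stateful layer can be registered as a torch.nn.Module factory instead
caffemodel2pytorch.modules['ReLU'] = lambda param: torch.nn.ReLU(inplace = True)
```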

License is MIT.

## Dump weights to PT or HDF5
```shell
# prototxt and caffemodel from https://gist.github.com/ksimonyan/211839e770f7b538e2d8#file-readme-md

# by default dumps to PT format, producing VGG_ILSVRC_16_layers.caffemodel.pt
python -m caffemodel2pytorch VGG_ILSVRC_16_layers.caffemodel

# dumps to HDF5, producing converted.h5
python -m caffemodel2pytorch VGG_ILSVRC_16_layers.caffemodel -o converted.h5
```
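To inspect what was dumped, the HDF5 file can be opened with h5py; each dataset holds one weight or bias array (a quick sketch, assuming the `converted.h5` produced above):

```python
import h5py

# print every dumped array name and its shape
with h5py.File('converted.h5', 'r') as f:
    for name, dataset in f.items():
        print(name, dataset.shape)
```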

```python
# load dumped VGG16 in PyTorch
import collections, torch, torchvision, numpy, h5py
model = torchvision.models.vgg16()
model.features = torch.nn.Sequential(collections.OrderedDict(zip(['conv1_1', 'relu1_1', 'conv1_2', 'relu1_2', 'pool1', 'conv2_1', 'relu2_1', 'conv2_2', 'relu2_2', 'pool2', 'conv3_1', 'relu3_1', 'conv3_2', 'relu3_2', 'conv3_3', 'relu3_3', 'pool3', 'conv4_1', 'relu4_1', 'conv4_2', 'relu4_2', 'conv4_3', 'relu4_3', 'pool4', 'conv5_1', 'relu5_1', 'conv5_2', 'relu5_2', 'conv5_3', 'relu5_3', 'pool5'], model.features)))
model.classifier = torch.nn.Sequential(collections.OrderedDict(zip(['fc6', 'relu6', 'drop6', 'fc7', 'relu7', 'drop7', 'fc8'], model.classifier)))

state_dict = h5py.File('converted.h5', 'r') # torch.load('VGG_ILSVRC_16_layers.caffemodel.pt')
# match each dumped blob name k to the PyTorch parameter whose name contains it
model.load_state_dict({l : torch.from_numpy(numpy.array(v)).view_as(p) for k, v in state_dict.items() for l, p in model.named_parameters() if k in l})
```
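As a quick sanity check of the loaded weights, the model can be run on a dummy batch (a sketch; the 1000-way output assumes the standard torchvision VGG16 classifier head):

```python
model.eval()
with torch.no_grad():
    logits = model(torch.zeros(1, 3, 224, 224))
print(logits.shape) # expected: torch.Size([1, 1000])
```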

## Run Caffe models using PyTorch as backend
```python
import torch
import caffemodel2pytorch

model = caffemodel2pytorch.Net(
prototxt = 'VGG_ILSVRC_16_layers_deploy.prototxt',
weights = 'VGG_ILSVRC_16_layers.caffemodel',
caffe_proto = 'https://raw.githubusercontent.com/BVLC/caffe/master/src/caffe/proto/caffe.proto'
)
model.cuda()
model.eval()
torch.set_grad_enabled(False)

# make sure to use the right image normalization and channel reordering for your model
image = torch.Tensor(8, 3, 224, 224).cuda()

# calling the model outputs a dict of PyTorch tensors
# in this example the dict would contain only the key "prob"
#output_dict = model(data = image)

# you can remove unneeded layers:
del model.prob
del model.fc8

# a single input tensor is interpreted as the input blob named "data"
# in this example the dict contains only the key "fc7"
output_dict = model(image)
```
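The values of the returned dict are ordinary tensors, so they interoperate with the rest of PyTorch; for instance, pulling the fc7 features out as a NumPy array (a sketch, assuming the model and image from above):

```python
# fc7 activations for the whole batch as a NumPy array
fc7 = output_dict['fc7'].cpu().numpy()
print(fc7.shape) # e.g. (8, 4096) for VGG16 fc7
```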

## Imitate pycaffe interface to help in porting

```python
import numpy as np
import caffemodel2pytorch as caffe

caffe.set_mode_gpu()
caffe.set_device(0)

# === LOADING AND USING THE NET IN EVAL MODE ===

net = caffe.Net('VGG_ILSVRC_16_layers_deploy.prototxt', caffe.TEST, weights = 'VGG_ILSVRC_16_layers.caffemodel')

# outputs a dict of NumPy arrays; the data layer is sidestepped
blobs_out = net.forward(data = np.zeros((8, 3, 224, 224), dtype = np.float32))

# access the last layer
layer = net.layers[-1]

# converts and provides the output as a NumPy array
numpy_array = net.blobs['conv1_1'].data

# access the loss weights
loss_weights = net.blob_loss_weights

# === BASIC OPTIMIZER ===

# this example uses paths from https://github.com/ppengtang/oicr

# creates an SGD solver and loads the net in train mode
# it understands base_lr, weight_decay, momentum, lr_mult, decay_mult, iter_size, the step lr policy, step_size, and gamma
# it finds train.prototxt via the train_net or net parameters of solver.prototxt
solver = caffe.SGDSolver('oicr/models/VGG16/solver.prototxt')

# load pretrained weights
solver.net.copy_from('oicr/data/imagenet_models/VGG16.v2.caffemodel')

# runs one iteration of forward, backward, optimization and returns a float loss value
# the data layer must be registered (via the `modules` dictionary) or inputs must be provided as keyword arguments
loss = solver.step(1)
```
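If `train.prototxt` declares a Python data layer, one way to satisfy the data-layer requirement is to register a factory for it in the `modules` dictionary before constructing the solver, exactly as the OICR drop-in script below does (a sketch; the `roi_data_layer` import path is OICR-specific):

```python
import caffemodel2pytorch as caffe
from roi_data_layer.layer import RoIDataLayer

# register the data layer type so that SGDSolver can instantiate it when loading train.prototxt
caffe.modules['data'] = lambda param: RoIDataLayer()

solver = caffe.SGDSolver('oicr/models/VGG16/solver.prototxt')
solver.net.copy_from('oicr/data/imagenet_models/VGG16.v2.caffemodel')
loss = solver.step(1)
```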

## Drop-in script for OICR enabling PyTorch as backend for eval and training
Place `caffe_pytorch_oicr.py` and `caffemodel2pytorch.py` in the root `oicr` directory. To use the PyTorch backend in testing and training, add the line `import caffe_pytorch_oicr` at the very top (before `import _init_paths`) of `tools/test_net.py` and `tools/train_net.py` respectively. The script requires PyTorch and CuPy (for on-the-fly CUDA kernel compilation).

```python
# caffe_pytorch_oicr.py

import collections
import torch
import torch.nn.functional as F
import cupy
import caffemodel2pytorch

caffemodel2pytorch.initialize('./caffe-oicr/src/caffe/proto/caffe.proto') # must be called explicitly in these porting scenarios so that the caffe.proto.caffe_pb2 module is available
caffemodel2pytorch.set_mode_gpu()
caffemodel2pytorch.modules['GlobalSumPooling'] = lambda param: lambda pred: pred.sum(dim = 0, keepdim = True)
caffemodel2pytorch.modules['MulticlassCrossEntropyLoss'] = lambda param: lambda pred, labels, eps = 1e-6: F.binary_cross_entropy(pred.clamp(eps, 1 - eps), labels)
caffemodel2pytorch.modules['data'] = lambda param: __import__('roi_data_layer.layer').layer.RoIDataLayer() # wrapping a PyCaffe layer
caffemodel2pytorch.modules['OICRLayer'] = lambda param: OICRLayer # wrapping a PyTorch function
caffemodel2pytorch.modules['WeightedSoftmaxWithLoss'] = lambda param: WeightedSoftmaxWithLoss
caffemodel2pytorch.modules['ReLU'] = lambda param: torch.nn.ReLU(inplace = True) # wrapping a PyTorch module
caffemodel2pytorch.modules['ROIPooling'] = lambda param: lambda input, rois: RoiPooling(param['pooled_h'], param['pooled_w'], param['spatial_scale'])(input, rois) # wrapping a PyTorch autograd function

def WeightedSoftmaxWithLoss(prob, labels_ic, cls_loss_weights, eps = 1e-12):
    loss = -cls_loss_weights * F.log_softmax(prob, dim = -1).gather(-1, labels_ic.long().unsqueeze(-1)).squeeze(-1)
    valid_sum = cls_loss_weights.gt(eps).float().sum()
    return loss.sum() / (loss.numel() if valid_sum == 0 else valid_sum)

def OICRLayer(boxes, cls_prob, im_labels, cfg_TRAIN_FG_THRESH = 0.5):
    cls_prob = (cls_prob if cls_prob.size(-1) == im_labels.size(-1) else cls_prob[..., 1:]).clone()
    boxes = boxes[..., 1:]
    gt_boxes, gt_classes, gt_scores = [], [], []
    for i in im_labels.eq(1).nonzero()[:, 1]:
        max_index = int(cls_prob[:, i].max(dim = 0)[1])
        gt_boxes.append(boxes[max_index])
        gt_classes.append(int(i) + 1)
        gt_scores.append(float(cls_prob[max_index, i]))
        cls_prob[max_index] = 0
    max_overlaps, gt_assignment = overlap(boxes, torch.stack(gt_boxes)).max(dim = 1)
    return gt_assignment.new(gt_classes)[gt_assignment] * (max_overlaps > cfg_TRAIN_FG_THRESH).type_as(gt_assignment), max_overlaps.new(gt_scores)[gt_assignment]

class RoiPooling(torch.autograd.Function):
    CUDA_NUM_THREADS = 1024
    GET_BLOCKS = staticmethod(lambda N: (N + RoiPooling.CUDA_NUM_THREADS - 1) // RoiPooling.CUDA_NUM_THREADS)
    Stream = collections.namedtuple('Stream', ['ptr'])

    kernel_forward = b'''
    #define FLT_MAX 340282346638528859811704183484516925440.0f
    typedef float Dtype;
    #define CUDA_KERNEL_LOOP(i, n) for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < (n); i += blockDim.x * gridDim.x)
    extern "C"
    __global__ void ROIPoolForward(const int nthreads, const Dtype* bottom_data,
        const Dtype spatial_scale, const int channels, const int height,
        const int width, const int pooled_height, const int pooled_width,
        const Dtype* bottom_rois, Dtype* top_data, int* argmax_data) {
      CUDA_KERNEL_LOOP(index, nthreads) {
        // (n, c, ph, pw) is an element in the pooled output
        int pw = index % pooled_width;
        int ph = (index / pooled_width) % pooled_height;
        int c = (index / pooled_width / pooled_height) % channels;
        int n = index / pooled_width / pooled_height / channels;

        bottom_rois += n * 5;
        int roi_batch_ind = bottom_rois[0];
        int roi_start_w = round(bottom_rois[1] * spatial_scale);
        int roi_start_h = round(bottom_rois[2] * spatial_scale);
        int roi_end_w = round(bottom_rois[3] * spatial_scale);
        int roi_end_h = round(bottom_rois[4] * spatial_scale);

        // Force malformed ROIs to be 1x1
        int roi_width = max(roi_end_w - roi_start_w + 1, 1);
        int roi_height = max(roi_end_h - roi_start_h + 1, 1);
        Dtype bin_size_h = static_cast<Dtype>(roi_height) / static_cast<Dtype>(pooled_height);
        Dtype bin_size_w = static_cast<Dtype>(roi_width) / static_cast<Dtype>(pooled_width);

        int hstart = static_cast<int>(floor(static_cast<Dtype>(ph) * bin_size_h));
        int wstart = static_cast<int>(floor(static_cast<Dtype>(pw) * bin_size_w));
        int hend = static_cast<int>(ceil(static_cast<Dtype>(ph + 1) * bin_size_h));
        int wend = static_cast<int>(ceil(static_cast<Dtype>(pw + 1) * bin_size_w));

        // Add roi offsets and clip to input boundaries
        hstart = min(max(hstart + roi_start_h, 0), height);
        hend = min(max(hend + roi_start_h, 0), height);
        wstart = min(max(wstart + roi_start_w, 0), width);
        wend = min(max(wend + roi_start_w, 0), width);
        bool is_empty = (hend <= hstart) || (wend <= wstart);

        // Define an empty pooling region to be zero
        Dtype maxval = is_empty ? 0 : -FLT_MAX;
        // If nothing is pooled, argmax = -1 causes nothing to be backprop'd
        int maxidx = -1;
        bottom_data += (roi_batch_ind * channels + c) * height * width;
        for (int h = hstart; h < hend; ++h) {
          for (int w = wstart; w < wend; ++w) {
            int bottom_index = h * width + w;
            if (bottom_data[bottom_index] > maxval) {
              maxval = bottom_data[bottom_index];
              maxidx = bottom_index;
            }
          }
        }
        top_data[index] = maxval;
        argmax_data[index] = maxidx;
      }
    }
    '''

    kernel_backward = b'''
    typedef float Dtype;
    #define CUDA_KERNEL_LOOP(i, n) for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < (n); i += blockDim.x * gridDim.x)
    extern "C"
    __global__ void ROIPoolBackward(const int nthreads, const Dtype* top_diff,
        const int* argmax_data, const int num_rois, const Dtype spatial_scale,
        const int channels, const int height, const int width,
        const int pooled_height, const int pooled_width, Dtype* bottom_diff,
        const Dtype* bottom_rois) {
      CUDA_KERNEL_LOOP(index, nthreads) {
        // (n, c, h, w) coords in bottom data
        int w = index % width;
        int h = (index / width) % height;
        int c = (index / width / height) % channels;
        int n = index / width / height / channels;

        Dtype gradient = 0;
        // Accumulate gradient over all ROIs that pooled this element
        for (int roi_n = 0; roi_n < num_rois; ++roi_n) {
          const Dtype* offset_bottom_rois = bottom_rois + roi_n * 5;
          int roi_batch_ind = offset_bottom_rois[0];
          // Skip if ROI's batch index doesn't match n
          if (n != roi_batch_ind) {
            continue;
          }

          int roi_start_w = round(offset_bottom_rois[1] * spatial_scale);
          int roi_start_h = round(offset_bottom_rois[2] * spatial_scale);
          int roi_end_w = round(offset_bottom_rois[3] * spatial_scale);
          int roi_end_h = round(offset_bottom_rois[4] * spatial_scale);

          // Skip if ROI doesn't include (h, w)
          const bool in_roi = (w >= roi_start_w && w <= roi_end_w &&
                               h >= roi_start_h && h <= roi_end_h);
          if (!in_roi) {
            continue;
          }

          int offset = (roi_n * channels + c) * pooled_height * pooled_width;
          const Dtype* offset_top_diff = top_diff + offset;
          const int* offset_argmax_data = argmax_data + offset;

          // Compute feasible set of pooled units that could have pooled
          // this bottom unit

          // Force malformed ROIs to be 1x1
          int roi_width = max(roi_end_w - roi_start_w + 1, 1);
          int roi_height = max(roi_end_h - roi_start_h + 1, 1);

          Dtype bin_size_h = static_cast<Dtype>(roi_height) / static_cast<Dtype>(pooled_height);
          Dtype bin_size_w = static_cast<Dtype>(roi_width) / static_cast<Dtype>(pooled_width);

          int phstart = floor(static_cast<Dtype>(h - roi_start_h) / bin_size_h);
          int phend = ceil(static_cast<Dtype>(h - roi_start_h + 1) / bin_size_h);
          int pwstart = floor(static_cast<Dtype>(w - roi_start_w) / bin_size_w);
          int pwend = ceil(static_cast<Dtype>(w - roi_start_w + 1) / bin_size_w);

          phstart = min(max(phstart, 0), pooled_height);
          phend = min(max(phend, 0), pooled_height);
          pwstart = min(max(pwstart, 0), pooled_width);
          pwend = min(max(pwend, 0), pooled_width);

          for (int ph = phstart; ph < phend; ++ph) {
            for (int pw = pwstart; pw < pwend; ++pw) {
              if (offset_argmax_data[ph * pooled_width + pw] == (h * width + w)) {
                gradient += offset_top_diff[ph * pooled_width + pw];
              }
            }
          }
        }
        bottom_diff[index] = gradient;
      }
    }
    '''

    cupy_init = cupy.array([])
    compiled_forward = cupy.cuda.compiler.compile_with_cache(kernel_forward).get_function('ROIPoolForward')
    compiled_backward = cupy.cuda.compiler.compile_with_cache(kernel_backward).get_function('ROIPoolBackward')

    def __init__(self, pooled_height, pooled_width, spatial_scale):
        self.pooled_height = pooled_height
        self.pooled_width = pooled_width
        self.spatial_scale = spatial_scale

    def forward(self, images, rois):
        output = torch.cuda.FloatTensor(len(rois), images.size(1) * self.pooled_height * self.pooled_width)
        self.argmax = torch.cuda.IntTensor(output.size()).fill_(-1)
        self.input_size = images.size()
        self.save_for_backward(rois)
        RoiPooling.compiled_forward(grid = (RoiPooling.GET_BLOCKS(output.numel()), 1, 1), block = (RoiPooling.CUDA_NUM_THREADS, 1, 1), args = [
            output.numel(), images.data_ptr(), cupy.float32(self.spatial_scale), self.input_size[-3], self.input_size[-2], self.input_size[-1],
            self.pooled_height, self.pooled_width, rois.data_ptr(), output.data_ptr(), self.argmax.data_ptr()
        ], stream = RoiPooling.Stream(ptr = torch.cuda.current_stream().cuda_stream))
        return output

    def backward(self, grad_output):
        rois, = self.saved_tensors
        grad_input = torch.cuda.FloatTensor(*self.input_size).zero_()
        RoiPooling.compiled_backward(grid = (RoiPooling.GET_BLOCKS(grad_input.numel()), 1, 1), block = (RoiPooling.CUDA_NUM_THREADS, 1, 1), args = [
            grad_input.numel(), grad_output.data_ptr(), self.argmax.data_ptr(), len(rois), cupy.float32(self.spatial_scale), self.input_size[-3],
            self.input_size[-2], self.input_size[-1], self.pooled_height, self.pooled_width, grad_input.data_ptr(), rois.data_ptr()
        ], stream = RoiPooling.Stream(ptr = torch.cuda.current_stream().cuda_stream))
        return grad_input, None

def overlap(box1, box2):
    b1, b2 = box1.t().contiguous(), box2.t().contiguous()
    xx1 = torch.max(b1[0].unsqueeze(1), b2[0].unsqueeze(0))
    yy1 = torch.max(b1[1].unsqueeze(1), b2[1].unsqueeze(0))
    xx2 = torch.min(b1[2].unsqueeze(1), b2[2].unsqueeze(0))
    yy2 = torch.min(b1[3].unsqueeze(1), b2[3].unsqueeze(0))
    inter = area(x1 = xx1, y1 = yy1, x2 = xx2, y2 = yy2)
    return inter / (area(b1.t()).unsqueeze(1) + area(b2.t()).unsqueeze(0) - inter)

def area(boxes = None, x1 = None, y1 = None, x2 = None, y2 = None):
    return (boxes[..., 3] - boxes[..., 1] + 1) * (boxes[..., 2] - boxes[..., 0] + 1) if boxes is not None else (x2 - x1 + 1).clamp(min = 0) * (y2 - y1 + 1).clamp(min = 0)
```
**Note:** I also had to replace `utils/bbox.pyx` with `utils/cython_bbox.pyx` and `utils/nms.pyx` with `utils/cython_nms.pyx` in `lib/setup.py` to work around some `setup.py` issues.