https://github.com/guerrantif/efficientconvolution

Implementation of an efficient convolution between 3D tensors and 4D tensors.
https://github.com/guerrantif/efficientconvolution

convolution cpp high-performance image-convolution kernel modern-cpp multithreading parallel-computing parallel-programming thread

Last synced: 5 months ago
JSON representation

Implementation of an efficient convolution between 3D tensors and 4D tensors.

Host: GitHub
URL: https://github.com/guerrantif/efficientconvolution
Owner: guerrantif
Created: 2021-04-09T14:05:54.000Z (almost 5 years ago)
Default Branch: main
Last Pushed: 2021-11-11T21:08:01.000Z (over 4 years ago)
Last Synced: 2023-09-23T11:22:35.308Z (over 2 years ago)
Topics: convolution, cpp, high-performance, image-convolution, kernel, modern-cpp, multithreading, parallel-computing, parallel-programming, thread
Language: C++
Homepage:
Size: 8.53 MB
Stars: 2
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

          # Efficient Convolution

Implementation of an efficient convolution algorithm between 3D and/or 4D tensors.

> For further details we suggest to refer to the [slides][slides].

## Abstract

The aim of this project is to implement the efficient Direct Convolution algorithm based on the paper [High Performance Zero-Memory Overhead Direct Convolutions][main-paper] by Zhang et al.

The main problem when performing convolutions in deep neural network is that, usually, those higly specialized algorithms trade space for time, incurring in an important memory overhead. The direct convolution could allow us to reduce the memory overhead while keeping performances high.

---

**Table of contents**

* [Tensor class](#tensor-class)

  * [Attributes](#class-attributes)

  * [Constructors](#class-constructors)

  * [Operators-at](#operators-at)

  * [Convolve threads](#convolve-threads)

  * [Convolution](#convolution)

* [Tests](#tests)

* [Directory structure](#directory-structure)

* [Documentation and references](#documentation-and-references)

* [Info](#info)

---

## Tensor class

The project is entirely base on the `Tensor` class, which allows us to handle 3D and 4D tensor. Those tensors will be used as input images and kernels for the convolution operation.

### Class attributes

```c++

private:

   // Main class members

   T* data;

   uint32_t nElements;

   uint32_t nChannels;

   uint32_t height;

   uint32_t width;

   // Secondary class members

   uint32_t size;

   std::vector shape;

   bool valid;

```

![](/img/tensor_to_data.png)

### Class constructors

```c++

public:

   // Default constructor

   Tensor();

   // 3D constructor

   Tensor(const uint32_t& nChannels_, const uint32_t& height_, const uint32_t& width_, const tensor::init& init);

   // 4D constructor

   Tensor(const uint32_t& nElements_, const uint32_t& nChannels_, const uint32_t& height_, const uint32_t& width_, const tensor::init& init);

   // Copy constructor

   Tensor(const Tensor& other);

   // Move constructor

   Tensor(Tensor&& other);

```

### Operators-at

The convolution operation is mainly based on the `at()` (`_at()`) operator, that is used in the inner loop of the `convolveThread` method itself and provides high flexibility due to its overloading.

The operator comes in a `public` interface and in a `private` one. The former is more reliable and error-free, while the latter is used for performance issues.

```c++

public:

   // 3D operator at() const

   const T& at(const int32_t& C_idx, const int32_t& H_idx, const int32_t& W_idx) const;

   // 3D operator at() non-const

   T& at(const int32_t& C_idx, const int32_t& H_idx, const int32_t& W_idx);

   // 4D operator at() const

   const T& at(const int32_t& E_idx, const int32_t& C_idx, const int32_t& H_idx, const int32_t& W_idx) const;

   // 4D operator at() non-const

   T& at(const int32_t& E_idx, const int32_t& C_idx, const int32_t& H_idx, const int32_t& W_idx);

```

```c++

private:

   // 3D operator _at() const

   const T& _at(const int32_t& C_idx, const int32_t& H_idx, const int32_t& W_idx) const;

   // 3D operator _at() non-const

   T& _at(const int32_t& C_idx, const int32_t& H_idx, const int32_t& W_idx);

   // 4D operator _at() const

   const T& _at(const int32_t& E_idx, const int32_t& C_idx, const int32_t& H_idx, const int32_t& W_idx) const;

   // 4D operator _at() non-const

   T& _at(const int32_t& E_idx, const int32_t& C_idx, const int32_t& H_idx, const int32_t& W_idx);

```

### Convolve threads

The convolution operation is parallelized using several threads, each of them performing the convolution on different section of the original `data` pointers (input and kernel tensors). The involved method is `convolveThread` and is implemented in the fashion exposed in the [original paper](main-paper).

### Convolution

Once the `convolveThread` operation is implemented, one can decide the dimension in which to parallelize, whether the number of elments, the number of channels or the height.

![](/img/convolveThread.png)

```c++

public:

   // Convolution operator (parallel) - dimension: output height

   Tensor& convolveParallelHo(const Tensor& kernel, const int32_t stride, const int32_t padding, const uint32_t nThreads) const;

   // Convolution operator (parallel) - dimension: output nChannels

   Tensor& convolveParallelCo(const Tensor& kernel, const int32_t stride, const int32_t padding, const uint32_t nThreads) const;

   // Convolution operator (parallel) - dimension: output nElements

   Tensor& convolveParallelEo(const Tensor& kernel, const int32_t stride, const int32_t padding, const uint32_t nThreads) const;

   

   // Convolution Naive (sequential)

   Tensor& convolveNaive(const Tensor& kernel, const int32_t stride, const int32_t padding) const;

   // Convolution operator that select automatically dimension for parallelization

   Tensor& convolve(const Tensor& kernel, const int32_t stride, const int32_t padding, const uint32_t nThreads) const;

   // Convolution operator that select automatically dimension for parallelization and number of threads

   Tensor& convolve(const Tensor& kernel, const int32_t stride, const int32_t padding) const;

```

## Tests

### Speed-up: w.r.t. Naive impl. for different thread number

![](/img/results1.png)

### Speed-up: w.r.t. Naive impl. for 8 threads and different inputs

![](/img/results2.png)

## Directory structure

```

.

├── bin

│   ├── benchmark_nopt

│   └── benchmark_opt

├── build

│   ├── benchmark.o

│   ├── Chronometer.o

│   ├── Statistics.o

│   └── Tensor.o

├── doc

│   └── todo.txt

├── include

│   ├── Chronometer.hh

│   ├── Statistics.hh

│   └── Tensor.hh

├── src

│   ├── Chronometer.cpp

│   ├── Statistics.cpp

│   └── Tensor.cpp

├── test

│   ├── benchmark.cpp

│   └── testTensor.cpp

├── Makefile

└── README.md

```

## Documentation and references

[\[1\]][main-paper] Zhang, J., Franchetti, F. & Low, T.M.. (2018). High Performance Zero-Memory Overhead Direct Convolutions. *Proceedings of the 35th International Conference on Machine Learning*, in *Proceedings of Machine Learning Research* 80:5776-5785

[\[2\]][concurrency-book] Williams, A. (2019). C++ concurrency in action (Second edition). *Manning Publications Co.*

## Info

Authors: 

- Filippo Guerranti* \

- Mirco Mannino* \

> \* Equal contribution.

We are M.Sc. students in Computer and Automation Engineering at [University of Siena][unisi], [Department of Information Engineering and Mathematical Sciences][diism]. This project is inherent the Design of Applications, Systems and Servises held by prof. [Sandro Bartolini][bartolini].

For any suggestion or doubts please contact one us by email.

Link to this project: [https://github.com/filippoguerranti/efficientconvolution][project]

[main-paper]: http://proceedings.mlr.press/v80/zhang18d/zhang18d.pdf

[concurrency-book]: https://www.manning.com/books/c-plus-plus-concurrency-in-action-second-edition

[slides]: https://github.com/filippoguerranti/EfficientConvolution/blob/main/doc/EfficientConvolution.pdf

[project]: https://github.com/filippoguerranti/efficientconvolution

[unisi]: https://www.unisi.it/

[diism]: https://www.diism.unisi.it/it

[bartolini]: http://frankie.dii.unisi.it/sandroHome/

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/guerrantif/efficientconvolution

Awesome Lists containing this project

README