{"id":30725336,"url":"https://github.com/guerrantif/efficientconvolution","last_synced_at":"2025-09-03T12:13:09.995Z","repository":{"id":46126513,"uuid":"356292078","full_name":"guerrantif/EfficientConvolution","owner":"guerrantif","description":"Implementation of an efficient convolution between 3D tensors and 4D tensors.","archived":false,"fork":false,"pushed_at":"2021-11-11T21:08:01.000Z","size":8942,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2023-09-23T11:22:35.308Z","etag":null,"topics":["convolution","cpp","high-performance","image-convolution","kernel","modern-cpp","multithreading","parallel-computing","parallel-programming","thread"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/guerrantif.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-04-09T14:05:54.000Z","updated_at":"2022-09-16T17:46:35.000Z","dependencies_parsed_at":"2022-09-23T06:40:15.382Z","dependency_job_id":null,"html_url":"https://github.com/guerrantif/EfficientConvolution","commit_stats":null,"previous_names":["guerrantif/efficientconvolution"],"tags_count":0,"template":null,"template_full_name":null,"purl":"pkg:github/guerrantif/EfficientConvolution","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guerrantif%2FEfficientConvolution","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guerrantif%2FEfficientConvolution/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guerrantif%2FEfficientConvolution/releases","manifests_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/repositories/guerrantif%2FEfficientConvolution/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/guerrantif","download_url":"https://codeload.github.com/guerrantif/EfficientConvolution/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guerrantif%2FEfficientConvolution/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273439984,"owners_count":25106080,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-03T02:00:09.631Z","response_time":76,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["convolution","cpp","high-performance","image-convolution","kernel","modern-cpp","multithreading","parallel-computing","parallel-programming","thread"],"created_at":"2025-09-03T12:13:03.179Z","updated_at":"2025-09-03T12:13:09.986Z","avatar_url":"https://github.com/guerrantif.png","language":"C++","readme":"# Efficient Convolution\nImplementation of an efficient convolution algorithm between 3D and/or 4D tensors.\n\n\u003e For further details, we suggest referring to the [slides][slides].\n\n## Abstract\nThe aim of this project is to implement the efficient Direct Convolution algorithm based on the paper [High Performance Zero-Memory Overhead Direct Convolutions][main-paper] by Zhang et al.\nThe main problem when performing convolutions in deep neural networks 
is that, usually, these highly specialized algorithms trade space for time, incurring a significant memory overhead. Direct convolution can reduce this memory overhead while keeping performance high.\n\n\n---\n**Table of contents**\n\n* [Tensor class](#tensor-class)\n  * [Attributes](#class-attributes)\n  * [Constructors](#class-constructors)\n  * [Operators-at](#operators-at)\n  * [Convolve threads](#convolve-threads)\n  * [Convolution](#convolution)\n* [Tests](#tests)\n* [Directory structure](#directory-structure)\n* [Documentation and references](#documentation-and-references)\n* [Info](#info)\n\n---\n\n## Tensor class\n\nThe project is entirely based on the `Tensor` class, which allows us to handle 3D and 4D tensors. These tensors are used as input images and kernels for the convolution operation.\n\n### Class attributes\n```c++\nprivate:\n   // Main class members\n   T* data;\n   uint32_t nElements;\n   uint32_t nChannels;\n   uint32_t height;\n   uint32_t width;\n   // Secondary class members\n   uint32_t size;\n   std::vector\u003cuint32_t\u003e shape;\n   bool valid;\n```\n![](/img/tensor_to_data.png)\n\n### Class constructors\n\n```c++\npublic:\n   // Default constructor\n   Tensor();\n   // 3D constructor\n   Tensor(const uint32_t\u0026 nChannels_, const uint32_t\u0026 height_, const uint32_t\u0026 width_, const tensor::init\u0026 init);\n   // 4D constructor\n   Tensor(const uint32_t\u0026 nElements_, const uint32_t\u0026 nChannels_, const uint32_t\u0026 height_, const uint32_t\u0026 width_, const tensor::init\u0026 init);\n   // Copy constructor\n   Tensor(const Tensor\u003cT\u003e\u0026 other);\n   // Move constructor\n   Tensor(Tensor\u003cT\u003e\u0026\u0026 other);\n```\n\n### Operators-at\nThe convolution operation is mainly based on the `at()` (`_at()`) operator, which is used in the inner loop of the `convolveThread` method and provides high flexibility thanks to its overloading.\n\nThe operator comes in a `public` 
interface and in a `private` one. The former is safer and less error-prone, while the latter is used for performance reasons.\n```c++\npublic:\n   // 3D operator at() const\n   const T\u0026 at(const int32_t\u0026 C_idx, const int32_t\u0026 H_idx, const int32_t\u0026 W_idx) const;\n   // 3D operator at() non-const\n   T\u0026 at(const int32_t\u0026 C_idx, const int32_t\u0026 H_idx, const int32_t\u0026 W_idx);\n\n   // 4D operator at() const\n   const T\u0026 at(const int32_t\u0026 E_idx, const int32_t\u0026 C_idx, const int32_t\u0026 H_idx, const int32_t\u0026 W_idx) const;\n   // 4D operator at() non-const\n   T\u0026 at(const int32_t\u0026 E_idx, const int32_t\u0026 C_idx, const int32_t\u0026 H_idx, const int32_t\u0026 W_idx);\n```\n```c++\nprivate:\n   // 3D operator _at() const\n   const T\u0026 _at(const int32_t\u0026 C_idx, const int32_t\u0026 H_idx, const int32_t\u0026 W_idx) const;\n   // 3D operator _at() non-const\n   T\u0026 _at(const int32_t\u0026 C_idx, const int32_t\u0026 H_idx, const int32_t\u0026 W_idx);\n\n   // 4D operator _at() const\n   const T\u0026 _at(const int32_t\u0026 E_idx, const int32_t\u0026 C_idx, const int32_t\u0026 H_idx, const int32_t\u0026 W_idx) const;\n   // 4D operator _at() non-const\n   T\u0026 _at(const int32_t\u0026 E_idx, const int32_t\u0026 C_idx, const int32_t\u0026 H_idx, const int32_t\u0026 W_idx);\n```\n\n### Convolve threads\nThe convolution operation is parallelized using several threads, each performing the convolution on a different section of the original `data` pointers (input and kernel tensors). 
The method involved is `convolveThread`, which is implemented following the approach described in the [original paper][main-paper].\n\n### Convolution\nOnce `convolveThread` is implemented, one can choose the dimension along which to parallelize: the number of elements, the number of channels, or the height.\n![](/img/convolveThread.png)\n```c++\npublic:\n   // Convolution operator (parallel) - dimension: output height\n   Tensor\u003cT\u003e\u0026 convolveParallelHo(const Tensor\u003cT\u003e\u0026 kernel, const int32_t stride, const int32_t padding, const uint32_t nThreads) const;\n   // Convolution operator (parallel) - dimension: output nChannels\n   Tensor\u003cT\u003e\u0026 convolveParallelCo(const Tensor\u003cT\u003e\u0026 kernel, const int32_t stride, const int32_t padding, const uint32_t nThreads) const;\n   // Convolution operator (parallel) - dimension: output nElements\n   Tensor\u003cT\u003e\u0026 convolveParallelEo(const Tensor\u003cT\u003e\u0026 kernel, const int32_t stride, const int32_t padding, const uint32_t nThreads) const;\n\n   // Convolution Naive (sequential)\n   Tensor\u003cT\u003e\u0026 convolveNaive(const Tensor\u003cT\u003e\u0026 kernel, const int32_t stride, const int32_t padding) const;\n\n   // Convolution operator that automatically selects the dimension for parallelization\n   Tensor\u003cT\u003e\u0026 convolve(const Tensor\u003cT\u003e\u0026 kernel, const int32_t stride, const int32_t padding, const uint32_t nThreads) const;\n   // Convolution operator that automatically selects the dimension for parallelization and the number of threads\n   Tensor\u003cT\u003e\u0026 convolve(const Tensor\u003cT\u003e\u0026 kernel, const int32_t stride, const int32_t padding) const;\n```\n\n\n## Tests\n\n### Speed-up: w.r.t. Naive impl. for different numbers of threads\n![](/img/results1.png)\n\n### Speed-up: w.r.t. Naive impl. 
for 8 threads and different inputs\n![](/img/results2.png)\n\n\n\n## Directory structure\n\n```\n.\n├── bin\n│   ├── benchmark_nopt\n│   └── benchmark_opt\n├── build\n│   ├── benchmark.o\n│   ├── Chronometer.o\n│   ├── Statistics.o\n│   └── Tensor.o\n├── doc\n│   └── todo.txt\n├── include\n│   ├── Chronometer.hh\n│   ├── Statistics.hh\n│   └── Tensor.hh\n├── src\n│   ├── Chronometer.cpp\n│   ├── Statistics.cpp\n│   └── Tensor.cpp\n├── test\n│   ├── benchmark.cpp\n│   └── testTensor.cpp\n├── Makefile\n└── README.md\n```\n\n## Documentation and references\n\n[\\[1\\]][main-paper] Zhang, J., Franchetti, F. \u0026 Low, T.M. (2018). High Performance Zero-Memory Overhead Direct Convolutions. *Proceedings of the 35th International Conference on Machine Learning*, in *Proceedings of Machine Learning Research* 80:5776-5785.\n\n[\\[2\\]][concurrency-book] Williams, A. (2019). C++ Concurrency in Action (Second edition). *Manning Publications Co.*\n\n\n## Info\n\nAuthors:\n\n- Filippo Guerranti* \\\u003cfilippo.guerranti@student.unisi.it\\\u003e\n- Mirco Mannino* \\\u003cmirco.mannino@student.unisi.it\\\u003e\n\n\u003e \\* Equal contribution.\n\nWe are M.Sc. students in Computer and Automation Engineering at the [University of Siena][unisi], [Department of Information Engineering and Mathematical Sciences][diism]. This project was developed as part of the Design of Applications, Systems and Services course held by prof. 
[Sandro Bartolini][bartolini].\n\nFor any suggestions or questions, please contact one of us by email.\n\nLink to this project: [https://github.com/filippoguerranti/efficientconvolution][project]\n\n\n\n[main-paper]: http://proceedings.mlr.press/v80/zhang18d/zhang18d.pdf\n[concurrency-book]: https://www.manning.com/books/c-plus-plus-concurrency-in-action-second-edition\n\n[slides]: https://github.com/filippoguerranti/EfficientConvolution/blob/main/doc/EfficientConvolution.pdf\n[project]: https://github.com/filippoguerranti/efficientconvolution\n[unisi]: https://www.unisi.it/\n[diism]: https://www.diism.unisi.it/it\n[bartolini]: http://frankie.dii.unisi.it/sandroHome/\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fguerrantif%2Fefficientconvolution","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fguerrantif%2Fefficientconvolution","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fguerrantif%2Fefficientconvolution/lists"}