https://github.com/galaxies99/inception-cuda

CUDA Implementation of Inception

cuda inception-v3

Host: GitHub
URL: https://github.com/galaxies99/inception-cuda
Owner: Galaxies99
License: mit
Created: 2021-12-15T02:08:46.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2022-01-01T08:37:27.000Z (over 2 years ago)
Last Synced: 2024-02-26T05:35:36.039Z (4 months ago)
Topics: cuda, inception-v3
Language: Cuda
Homepage:
Size: 3.61 MB
Stars: 1
Watchers: 2
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Lists

awesome-cs - @Galaxies99 @KoalaYan @zhao-hr, 2021 Fall

README

# Inception-v3 Inference Booster

[[Report](assets/inceptionv3-inference-booster.pdf)]

**Authors**: [Hongjie Fang](https://github.com/galaxies99/), [Peishen Yan](https://github.com/koalayan/), [Haoran Zhao](https://github.com/zhao-hr/).

This is an inference booster for the Inception-v3 [1] model. Features include:

- Implementation of convolution on CPU, in CUDA, and with cuDNN.

- Optimization of convolution (implicit im2col and tiling).

- Implementation of pooling and fully connected (FC) layers on CPU, in CUDA, and with cuDNN.

- Optimization of the FC layer using tiling.

- Implementation of the full Inception-v3 network on CPU, in CUDA, and with cuDNN.

- PyTorch inference implementation [2] of the Inception-v3 network (for debugging only).

- ONNX-to-JSON formatter for the Inception-v3 ONNX model.
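To illustrate the two optimization ideas named above, here is a minimal CPU-side sketch in C++. It is not taken from this repository: the `ConvShape` struct, the function names, the memory layouts (row-major, stride 1, no padding) and the tile width are all assumptions made for illustration. The first function performs convolution as a matrix product whose im2col matrix is never materialized (each entry is read from the input on the fly), and the second processes the FC reduction in tiles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layouts (assumptions, not taken from the repository):
// input is C x H x W, weight is K x C x R x S, output is K x OH x OW,
// all row-major, stride 1, no padding.
struct ConvShape {
    int C, H, W;  // input channels, height, width
    int K, R, S;  // output channels, kernel height, kernel width
    int OH() const { return H - R + 1; }
    int OW() const { return W - S + 1; }
};

// Implicit im2col: the convolution is a product of
// weight [K x (C*R*S)] and patches [(C*R*S) x (OH*OW)], but every entry of
// the patch matrix is fetched from the input on the fly instead of stored.
std::vector<float> conv2d_implicit_im2col(const std::vector<float>& in,
                                          const std::vector<float>& w,
                                          const ConvShape& s) {
    const int rows = s.C * s.R * s.S;  // reduction dimension of the product
    const int cols = s.OH() * s.OW();  // one virtual column per output pixel
    assert(in.size() == static_cast<std::size_t>(s.C) * s.H * s.W);
    assert(w.size() == static_cast<std::size_t>(s.K) * rows);
    std::vector<float> out(static_cast<std::size_t>(s.K) * cols, 0.0f);
    for (int k = 0; k < s.K; ++k) {
        for (int col = 0; col < cols; ++col) {
            const int oy = col / s.OW(), ox = col % s.OW();
            float acc = 0.0f;
            for (int row = 0; row < rows; ++row) {
                // Decode the virtual im2col row index into (c, r, q).
                const int c = row / (s.R * s.S);
                const int r = (row / s.S) % s.R;
                const int q = row % s.S;
                const float patch = in[(c * s.H + oy + r) * s.W + ox + q];
                acc += w[k * rows + row] * patch;
            }
            out[k * cols + col] = acc;
        }
    }
    return out;
}

// Tiled FC layer: y = W x with W of size M x N (row-major).
// The reduction is split into tiles of width TILE so that one slice of x is
// reused across all output rows; on a GPU such a slice would typically be
// staged in shared memory.
std::vector<float> fc_tiled(const std::vector<float>& W,
                            const std::vector<float>& x,
                            int M, int N, int TILE = 4) {
    assert(TILE > 0);
    assert(W.size() == static_cast<std::size_t>(M) * N);
    assert(x.size() == static_cast<std::size_t>(N));
    std::vector<float> y(M, 0.0f);
    for (int n0 = 0; n0 < N; n0 += TILE) {
        const int n1 = std::min(n0 + TILE, N);  // last tile may be partial
        for (int m = 0; m < M; ++m) {
            float acc = 0.0f;
            for (int n = n0; n < n1; ++n) acc += W[m * N + n] * x[n];
            y[m] += acc;
        }
    }
    return y;
}
```

In a CUDA version of this idea, the loops over `k` and `col` (or `m`) would typically map to thread and block indices, while the index decoding inside the reduction loop stays the same.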

This is also the final project for the course "CS433: Parallel and Distributed Computing" at Shanghai Jiao Tong University, taught by Prof. Xiaoyao Liang.

## Usage

Compile the source code.

```bash
cd src
make
cd ..
```

You may need to change the `nvcc` path in `src/makefile`. Different architectures require different compile options. We only provide compile options for the architecture used in our experiments (Tesla V100, CUDA 10.2).

Download the data from [Baidu Netdisk](https://pan.baidu.com/s/1u5jJfNBL9m8prtRMRHuj7Q) (extraction code: csov) and put it in the `data` folder under the root directory of the repository. Then you can test the Inception code using the given model, input and output.

```bash
cd test
./inception_main
cd ..
```

The experiment performs 5,000 inference runs and takes approximately 10 minutes. Here are some experiment statistics.

| Implementation method | Average inference time | Max GPU memory usage |
| :-: | :-: | :-: |
| CPU | ~180,000 ms | - |
| Our basic CUDA implementation | ~36,000 ms | **530 MB** |
| cuDNN | 102.594 ms | 750 MB |
| Our CUDA implementation | **61.096 ms** | **530 MB** |

The results show that our implementation is faster than the default cuDNN implementation (61.096 ms versus 102.594 ms on average, roughly 1.7x).
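For readers who want to reproduce an average inference time of this kind, here is a minimal timing sketch in C++. It is not the measurement code of this repository: `average_ms` and the callable passed to it are hypothetical names, and the sketch assumes the callable returns only after the work is finished (GPU code would need to synchronize with the device before the clock is read).

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `fn` `runs` times and returns the mean wall-clock time per run in
// milliseconds. `fn` is a hypothetical stand-in for one forward pass and is
// assumed to block until its work has completed.
double average_ms(const std::function<void()>& fn, int runs) {
    assert(runs > 0);
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    const auto start = clock::now();
    for (int i = 0; i < runs; ++i) fn();
    const auto stop = clock::now();
    const std::chrono::duration<double, std::milli> total = stop - start;
    return total.count() / runs;
}
```

Timing the whole loop once, rather than each call separately, keeps the clock overhead out of the per-run figure; a few untimed warm-up calls before the loop are a common addition.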

  

Figures: test result of our implementation; test result of our cuDNN implementation.

## Reference

[1] Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

[2] https://github.com/zt1112/pytorch_inceptionv3/blob/master/inception3.py.