Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/galaxies99/inception-cuda
CUDA Implementation of Inception
- Host: GitHub
- URL: https://github.com/galaxies99/inception-cuda
- Owner: Galaxies99
- License: MIT
- Created: 2021-12-15T02:08:46.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2022-01-01T08:37:27.000Z (almost 3 years ago)
- Last Synced: 2024-06-12T08:57:13.991Z (6 months ago)
- Topics: cuda, inception-v3
- Language: Cuda
- Size: 3.61 MB
- Stars: 1
- Watchers: 2
- Forks: 2
- Open Issues: 0
- Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-cs - @Galaxies99 @KoalaYan @zhao-hr, 2021 Fall
README
# Inception-v3 Inference Booster
[[Report](assets/inceptionv3-inference-booster.pdf)]
**Authors**: [Hongjie Fang](https://github.com/galaxies99/), [Peishen Yan](https://github.com/koalayan/), [Haoran Zhao](https://github.com/zhao-hr/).
This is an inference booster for the Inception-v3 [1] model. Features include:
- Implementation of convolution on CPU, CUDA, and CUDNN.
- Optimization of convolution (implicit im2col and the tiling method).
- Implementation of the pooling and FC layers on CPU, CUDA, and CUDNN.
- Optimization of the FC layer using the tiling method.
- Implementation of the full Inception-v3 network on CPU, CUDA, and CUDNN.
- PyTorch inference implementation [2] of the Inception-v3 network (for debugging only).
- ONNX-to-JSON formatter for the Inception-v3 ONNX model.

This is also the final project of the course "CS433: Parallel and Distributed Computing" at Shanghai Jiao Tong University, taught by Prof. Xiaoyao Liang.
## Usage
Compile the source codes.
```bash
cd src
make
cd ..
```

You may need to change the `nvcc` path in `src/makefile`. Different architectures require different compile options; we only provide the options for our experiment architecture (Tesla V100, CUDA 10.2).
Download data from [Baidu Netdisk](https://pan.baidu.com/s/1u5jJfNBL9m8prtRMRHuj7Q) (Verify code: csov), and put it in the `data` folder under the root directory of the repository. Then, you can test the inception code using the given model, input and output.
```bash
cd test
./inception_main
cd ..
```

The experiment runs for approximately 10 minutes and performs 5,000 inference runs. Here are some experiment statistics.
| Implementation method | Average Inference Time | Max GPU Memory Usage |
| :-: | :-: | :-: |
| CPU | ~180,000 ms | - |
| Our basic CUDA implementation | ~36,000 ms | **530 MB** |
| CUDNN | 102.594 ms | 750 MB |
| Our CUDA implementation | **61.096 ms** | **530 MB** |

The results show that our implementation is faster than the default CUDNN implementation.
(Figures: test result of our implementation; test result of our CUDNN implementation.)

## Reference
[1] Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[2] https://github.com/zt1112/pytorch_inceptionv3/blob/master/inception3.py.