{"id":13608733,"url":"https://github.com/galaxies99/inception-cuda","last_synced_at":"2025-04-12T17:32:40.593Z","repository":{"id":87785829,"uuid":"438464515","full_name":"Galaxies99/inception-cuda","owner":"Galaxies99","description":"CUDA Implementation of Inception","archived":false,"fork":false,"pushed_at":"2022-01-01T08:37:27.000Z","size":3787,"stargazers_count":2,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-11-07T14:42:00.494Z","etag":null,"topics":["cuda","inception-v3"],"latest_commit_sha":null,"homepage":"","language":"Cuda","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Galaxies99.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2021-12-15T02:08:46.000Z","updated_at":"2024-10-16T19:41:17.000Z","dependencies_parsed_at":"2023-05-13T18:30:54.971Z","dependency_job_id":null,"html_url":"https://github.com/Galaxies99/inception-cuda","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Galaxies99%2Finception-cuda","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Galaxies99%2Finception-cuda/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Galaxies99%2Finception-cuda/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Galaxies99%2Finception-cuda/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Galaxies99","download_url":"https://codeload.github.com/Galaxies99/inception-cuda/tar.gz/r
efs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248605306,"owners_count":21132147,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","inception-v3"],"created_at":"2024-08-01T19:01:29.551Z","updated_at":"2025-04-12T17:32:35.585Z","avatar_url":"https://github.com/Galaxies99.png","language":"Cuda","funding_links":[],"categories":["资源清单"],"sub_categories":["CS4302 (原 CS433) - 并行与分布式计算"],"readme":"# Inception-v3 Inference Booster\n\n[[Report](assets/inceptionv3-inference-booster.pdf)]\n\n**Authors**: [Hongjie Fang](https://github.com/galaxies99/), [Peishen Yan](https://github.com/koalayan/), [Haoran Zhao](https://github.com/zhao-hr/).\n\nThis is the inference booster of the InceptionV3[1] model. Features include:\n\n- Implementation of convolution in CPU, CUDA, CUDNN.\n- Optimization of convolution (implicit im2col and tiling method).\n- Implementation of pooling and FC layer in CPU, CUDA, CUDNN.\n- Optimization of the FC layer using the tiling method.\n- Implementation of the full Inception-v3 network in CPU, CUDA and CUDNN.\n- PyTorch inference implementation[2] of the Inception-v3 network (for debugging only).\n- ONNX-to-JSON formatter for the Inception-v3 ONNX model.\n\nThis is also the final project for the course \"CS433: Parallel and Distributed Computing\" at Shanghai Jiao Tong University, taught by Prof. Xiaoyao Liang.\n\n## Usage\n\nCompile the source code.\n\n```bash\ncd src\nmake\ncd ..\n```\n\nYou may need to change the `nvcc` path in `src/makefile`. Different compile options are required for different architectures. 
We only provide compile options for our experiment architecture (Tesla V100, CUDA 10.2).\n\nDownload the data from [Baidu Netdisk](https://pan.baidu.com/s/1u5jJfNBL9m8prtRMRHuj7Q) (verification code: csov), and put it in the `data` folder under the root directory of the repository. Then, you can test the inception code using the given model, input and output.\n\n```bash\ncd test\n./inception_main\ncd ..\n```\n\nThe experiment runs for approximately 10 minutes and performs 5,000 inference runs. Here are some experiment statistics.\n\n| Implementation method | Average Inference Time | Max GPU memory usage |\n| :-: | :-: | :-: |\n| CPU | ~180,000 ms | - |\n| Our basic CUDA Implementation | ~36,000 ms | **530 MB** |\n| CUDNN | 102.594 ms | 750 MB |\n| Our CUDA Implementation | **61.096 ms** | **530 MB** |\n\nThe results show that our implementation is faster than the default implementation of CUDNN.\n\n\u003ctable\u003e\n  \u003ctr\u003e\u003ctd\u003e\u003cimg src='assets/images/test_our.png' width=480px\u003e\u003c/td\u003e\u003ctd\u003e\u003cimg src='assets/images/test_cudnn.png' width = 480px\u003e\u003c/td\u003e\u003c/tr\u003e\n  \u003ctr\u003e\u003ctd align=\"center\"\u003e Test results of our implementation\u003c/td\u003e\u003ctd align=\"center\"\u003e Test results of our CUDNN implementation \u003c/td\u003e\u003c/tr\u003e\n\u003c/table\u003e\n\n## Reference\n\n[1] Szegedy, Christian, et al. \"Rethinking the inception architecture for computer vision.\" Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.\n\n[2] https://github.com/zt1112/pytorch_inceptionv3/blob/master/inception3.py","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgalaxies99%2Finception-cuda","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgalaxies99%2Finception-cuda","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgalaxies99%2Finception-cuda/lists"}