{"id":13443293,"url":"https://github.com/godweiyang/NN-CUDA-Example","last_synced_at":"2025-03-20T16:30:55.154Z","repository":{"id":39092901,"uuid":"349084040","full_name":"godweiyang/NN-CUDA-Example","owner":"godweiyang","description":"Several simple examples for popular neural network toolkits calling custom CUDA operators.","archived":false,"fork":false,"pushed_at":"2021-04-29T10:10:08.000Z","size":123,"stargazers_count":1415,"open_issues_count":9,"forks_count":196,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-03-19T02:47:49.751Z","etag":null,"topics":["cpp","cuda","neural-network","python","pytorch","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/godweiyang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-03-18T13:27:51.000Z","updated_at":"2025-03-18T23:12:23.000Z","dependencies_parsed_at":"2022-07-11T09:52:01.270Z","dependency_job_id":null,"html_url":"https://github.com/godweiyang/NN-CUDA-Example","commit_stats":null,"previous_names":[],"tags_count":0,"template":true,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/godweiyang%2FNN-CUDA-Example","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/godweiyang%2FNN-CUDA-Example/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/godweiyang%2FNN-CUDA-Example/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/godweiyang%2FNN-CUDA-Example/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/godweiyang","downlo
ad_url":"https://codeload.github.com/godweiyang/NN-CUDA-Example/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244649771,"owners_count":20487488,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp","cuda","neural-network","python","pytorch","tensorflow"],"created_at":"2024-07-31T03:01:58.742Z","updated_at":"2025-03-20T16:30:55.134Z","avatar_url":"https://github.com/godweiyang.png","language":"Python","readme":"# Neural Network CUDA Example\n![logo](./image/logo.png)\n\nSeveral simple examples for neural network toolkits (PyTorch, TensorFlow, etc.) 
calling custom CUDA operators.\n\nWe provide several ways to compile the CUDA kernels and their C++ wrappers: JIT, setuptools, and CMake.\n\nWe also provide several Python scripts that call the CUDA kernels, covering kernel timing and model training.\n\n*For more accurate timing, run the code under **nvprof** or **nsys**.*\n\n## Environments\n* NVIDIA Driver: 418.116.00\n* CUDA: 11.0\n* Python: 3.7.3\n* PyTorch: 1.7.0+cu110\n* TensorFlow: 2.4.1\n* CMake: 3.16.3\n* Ninja: 1.10.0\n* GCC: 8.3.0\n\n*The examples are not guaranteed to run in other environments.*\n\n## Code structure\n```shell\n├── include\n│   └── add2.h # header file of add2 cuda kernel\n├── kernel\n│   └── add2_kernel.cu # add2 cuda kernel\n├── pytorch\n│   ├── add2_ops.cpp # torch wrapper of add2 cuda kernel\n│   ├── time.py # time comparison of cuda kernel and torch\n│   ├── train.py # training using custom cuda kernel\n│   ├── setup.py\n│   └── CMakeLists.txt\n├── tensorflow\n│   ├── add2_ops.cpp # tensorflow wrapper of add2 cuda kernel\n│   ├── time.py # time comparison of cuda kernel and tensorflow\n│   ├── train.py # training using custom cuda kernel\n│   └── CMakeLists.txt\n├── LICENSE\n└── README.md\n```\n\n## PyTorch\n### Compile cpp and cuda\n**JIT**  \nRun the Python code directly.\n\n**Setuptools**  \n```shell\npython3 pytorch/setup.py install\n```\n\n**CMake**  \n```shell\nmkdir build\ncd build\ncmake ../pytorch\nmake\n```\n\n### Run python\n**Compare kernel running time**  \n```shell\npython3 pytorch/time.py --compiler jit\npython3 pytorch/time.py --compiler setup\npython3 pytorch/time.py --compiler cmake\n```\n\n**Train model**  \n```shell\npython3 pytorch/train.py --compiler jit\npython3 pytorch/train.py --compiler setup\npython3 pytorch/train.py --compiler cmake\n```\n\n## TensorFlow\n### Compile cpp and cuda\n**CMake**  \n```shell\nmkdir build\ncd build\ncmake ../tensorflow\nmake\n```\n\n### Run python\n**Compare kernel running time**  
\n```shell\npython3 tensorflow/time.py --compiler cmake\n```\n\n**Train model**  \n```shell\npython3 tensorflow/train.py --compiler cmake\n```\n\n## Implementation details (in Chinese)\n[Tutorial on PyTorch custom CUDA operators and running time analysis](https://godweiyang.com/2021/03/18/torch-cpp-cuda)  \n[Three ways to compile and call custom CUDA operators in PyTorch, explained in detail](https://godweiyang.com/2021/03/21/torch-cpp-cuda-2)  \n[Learn how to customize backpropagation in PyTorch in three minutes](https://godweiyang.com/2021/03/24/torch-cpp-cuda-3)\n\n## F.A.Q\n\u003e **Q.** ImportError: libc10.so: cannot open shared object file: No such file or directory  \n**A.** Run `import torch` before `import add2`.\n\n\u003e **Q.** tensorflow.python.framework.errors_impl.NotFoundError: build/libadd2.so: undefined symbol: _ZTIN10tensorflow8OpKernelE  \n**A.** Check that `${TF_LFLAGS}` in `CMakeLists.txt` is correct.","funding_links":[],"categories":["Python","Learning Resources"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgodweiyang%2FNN-CUDA-Example","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgodweiyang%2FNN-CUDA-Example","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgodweiyang%2FNN-CUDA-Example/lists"}