https://github.com/openmlsys/openmlsys-cuda
Tutorials for writing high-performance GPU operators in AI frameworks.
- Host: GitHub
- URL: https://github.com/openmlsys/openmlsys-cuda
- Owner: openmlsys
- Created: 2022-04-22T10:50:11.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-08-12T14:47:36.000Z (over 1 year ago)
- Last Synced: 2023-08-12T15:45:41.402Z (over 1 year ago)
- Topics: cuda, gpu, machine-learning
- Language: Cuda
- Homepage:
- Size: 81.1 KB
- Stars: 66
- Watchers: 2
- Forks: 8
- Open Issues: 2
# openmlsys-cuda
Examples that show beginners how to write high-performance AI operators. We introduce optimization tricks such as using shared memory and pipeline rearrangement to maximize throughput. We also provide an example that uses CUTLASS to implement a fused FC + ReLU operator.
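The tiling idea behind the shared-memory optimization can be sketched on the CPU. The snippet below is only an illustration under stated assumptions, not code from this repository: `tiled_gemm` and its `tile` parameter are hypothetical names, matrices are assumed to be row-major, and the cache-blocked loops stand in for what a GPU thread block does when it stages a tile of each input in shared memory and reuses it many times.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration; not part of the repository.
// Computes C = A * B for row-major A (m x k), B (k x n), C (m x n).
// The three outer loops walk over tiles; the three inner loops work inside
// one tile, so each loaded tile of A and B is reused before moving on.
std::vector<float> tiled_gemm(const std::vector<float>& a,
                              const std::vector<float>& b,
                              std::size_t m, std::size_t n, std::size_t k,
                              std::size_t tile = 32) {
  assert(tile > 0);
  assert(a.size() == m * k);
  assert(b.size() == k * n);
  std::vector<float> c(m * n, 0.0f);
  for (std::size_t i0 = 0; i0 < m; i0 += tile) {
    for (std::size_t j0 = 0; j0 < n; j0 += tile) {
      for (std::size_t p0 = 0; p0 < k; p0 += tile) {
        // Clamp the tile bounds so sizes need not be multiples of `tile`.
        const std::size_t i1 = std::min(i0 + tile, m);
        const std::size_t j1 = std::min(j0 + tile, n);
        const std::size_t p1 = std::min(p0 + tile, k);
        for (std::size_t i = i0; i < i1; ++i) {
          for (std::size_t p = p0; p < p1; ++p) {
            const float a_ip = a[i * k + p];
            for (std::size_t j = j0; j < j1; ++j) {
              c[i * n + j] += a_ip * b[p * n + j];
            }
          }
        }
      }
    }
  }
  return c;
}
```

Calling `tiled_gemm(a, b, m, n, k)` returns the same product as a naive triple loop; only the order in which memory is visited changes. On the GPU the analogous step is to load each tile into shared memory once per thread block.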
## Dependencies
- Eigen: CPU linear algebra template library
- OpenMP: Enables multi-threaded acceleration on the CPU
- CUDA toolkit: Compiles GPU kernels and analyzes GPU execution
- Gflags: Command-line flags library released by Google
- CUTLASS: GPU GEMM template library

### Installation Hints
- Eigen: Use package manager, e.g. `apt install libeigen3-dev`, or download from
the [official website](https://eigen.tuxfamily.org/) and build from source.
- OpenMP: Most compilers already ship with OpenMP support. If yours does not,
try `apt install libgomp-dev` for GCC or `apt install libomp-dev` for Clang.
- CUDA toolkit: We recommend installing it by following
the [official instructions](https://developer.nvidia.com/cuda-toolkit).
- Gflags: Use package manager, e.g. `apt install libgflags-dev`, or download from
the [official website](https://gflags.github.io/gflags/) and build from source.
- CUTLASS: It is registered as a git submodule of this repository, so you do not need to install it yourself.

## Compilation
Once the dependencies are installed, compile the project with the following commands:
```bash
git clone git@github.com:openmlsys/openmlsys-cuda.git
cd openmlsys-cuda
git submodule update --init --recursive
mkdir build && cd build
cmake ..
make -j4
```

## Examples
- `first_attempt`: The naive implementation
- `gemm`: Collection of implementations using different optimization tricks
- `fc_relu`: Example for fusing FC and ReLU by using CUTLASS
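To make concrete what fusing FC and ReLU means, here is a minimal CPU reference sketch. It is not the repository's CUTLASS implementation: `fc_relu_reference` is a hypothetical name, the row-major layout is an assumption, and the presence of a bias term is also an assumption made for illustration. The point it shows is that the activation is applied while the accumulator is still local, so the intermediate FC output is never written out and read back.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for illustration; not the CUTLASS-based code.
// Computes y = ReLU(x * w + bias) with row-major
// x (batch x in_features), w (in_features x out_features),
// bias (out_features), y (batch x out_features).
std::vector<float> fc_relu_reference(const std::vector<float>& x,
                                     const std::vector<float>& w,
                                     const std::vector<float>& bias,
                                     std::size_t batch,
                                     std::size_t in_features,
                                     std::size_t out_features) {
  assert(x.size() == batch * in_features);
  assert(w.size() == in_features * out_features);
  assert(bias.size() == out_features);
  std::vector<float> y(batch * out_features);
  for (std::size_t i = 0; i < batch; ++i) {
    for (std::size_t j = 0; j < out_features; ++j) {
      float acc = bias[j];  // assumed bias term
      for (std::size_t p = 0; p < in_features; ++p) {
        acc += x[i * in_features + p] * w[p * out_features + j];
      }
      // Fused epilogue: clamp negatives to zero before the single store,
      // instead of storing the FC result and running a second ReLU pass.
      y[i * out_features + j] = std::max(acc, 0.0f);
    }
  }
  return y;
}
```

A reference like this is mainly useful for checking a GPU kernel's output element by element. An unfused version would run the matrix multiply and the ReLU as two separate passes over the output buffer.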