Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Cjkkkk/CUDA_gemm
A simple high performance CUDA GEMM implementation.
- Host: GitHub
- URL: https://github.com/Cjkkkk/CUDA_gemm
- Owner: Cjkkkk
- Created: 2019-12-26T15:02:14.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2024-01-04T16:33:32.000Z (about 1 year ago)
- Last Synced: 2024-08-04T02:06:36.640Z (6 months ago)
- Language: Cuda
- Homepage:
- Size: 677 KB
- Stars: 304
- Watchers: 5
- Forks: 35
- Open Issues: 2
- Metadata Files:
  - Readme: README.md
Awesome Lists containing this project
- awesome-cuda-triton-hpc - Cjkkkk/CUDA_gemm
README
## introduction
A simple, high-performance CUDA GEMM, block-sparse GEMM, and non-uniform quantized GEMM implementation.
```
C = alpha * A * B + beta * C
```
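For reference, a minimal unoptimized kernel for this operation (one thread per element of C, as in the baseline MatrixMulCUDA below) might look like the following sketch; the names and the row-major layout are assumptions, not the repository's exact code.
```
// Sketch only: naive SGEMM, one thread per element of C, row-major layouts assumed.
__global__ void sgemm_naive(int M, int N, int K, float alpha,
                            const float* A,          // M x K
                            const float* B,          // K x N
                            float beta, float* C) {  // M x N
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k)
            acc += A[row * K + k] * B[k * N + col];
        C[row * N + col] = alpha * acc + beta * C[row * N + col];
    }
}
```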
## algorithm
**located in src/cuda/**
* MatrixMulCUDA
    * one element of C is assigned to one thread
    * coalesced global memory loads of B
* MatrixMulCUDA1
    * texture load
* MatrixMulCUDA2
    * one 4 * 4 tile of C is assigned to one thread
* MatrixMulCUDA3
    * vectorized loads of A and B
* MatrixMulCUDA4
    * vectorized stores of C
* MatrixMulCUDA5
    * block sparse version
* MatrixMulCUDA6
    * vectorized, coalesced loads of A and B
* MatrixMulCUDA7
    * warp shuffle to enable coalesced stores of C
* MatrixMulCUDAQuantize8bit
    * 8 bit non-uniform quantized matmul
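To illustrate the vectorized-load idea used by MatrixMulCUDA3 and MatrixMulCUDA6, a row tile of A can be staged into shared memory with `float4` accesses. This is only a sketch with assumed tile sizes (8 x 128, K divisible by 4, blockDim = (32, 8)), not the repository's kernels.
```
// Sketch only: staging an 8 x 128 row tile of A into shared memory with
// vectorized float4 loads (32 threads cover 128 floats per row; assumes the
// tile start is 16-byte aligned and K is divisible by 4).
__global__ void load_tile_vectorized(const float* __restrict__ A, int K) {
    __shared__ float As[8][128];
    const float4* A4 = reinterpret_cast<const float4*>(A);
    int row = threadIdx.y;   // 0..7
    int vec = threadIdx.x;   // 0..31, each thread moves one float4 (4 floats)
    float4 v = A4[(blockIdx.y * 8 + row) * (K / 4) + blockIdx.x * 32 + vec];
    As[row][vec * 4 + 0] = v.x;
    As[row][vec * 4 + 1] = v.y;
    As[row][vec * 4 + 2] = v.z;
    As[row][vec * 4 + 3] = v.w;
    __syncthreads();
    // ... compute on As ...
}
```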
## experiments
**located in benchmark/**
* benchmark_dense
    * compare my GEMM with cuBLAS
* benchmark_sparse
    * compare my block sparse GEMM with cuSPARSE
* benchmark_quantization_8bit
    * compare my GEMM with cuBLAS
* benchmark_quantization
    * compare my GEMM with my non-uniform 8 bit quantized GEMM
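For context, the cuBLAS reference side of a dense benchmark is typically a single `cublasSgemm` call such as the host-side sketch below (column-major, no transpose); this is an assumed minimal form, not the repository's benchmark code.
```
// Sketch only: cuBLAS SGEMM reference, C = alpha * A * B + beta * C.
// cuBLAS expects column-major matrices; A is m x k, B is k x n, C is m x n.
#include <cublas_v2.h>

void cublas_reference(cublasHandle_t handle, int m, int n, int k,
                      const float* dA, const float* dB, float* dC,
                      float alpha, float beta) {
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                m, n, k,
                &alpha, dA, m,   // lda = m
                        dB, k,   // ldb = k
                &beta,  dC, m);  // ldc = m
}
```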
## TODO
* (MatrixMulCUDA7) write back to the C matrix using warp shuffle so that global memory stores coalesce
* (MatrixMulCUDA8) double buffering
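Double buffering here refers to the usual ping-pong scheme: prefetch the next shared-memory tile into one buffer while computing on the other. The kernel below is a minimal sketch of that pattern with a simple TILE x TILE tiling and dimensions assumed divisible by TILE; it is not the planned MatrixMulCUDA8.
```
// Sketch only: double-buffered shared-memory tiles in a tiled SGEMM.
// The next tile is loaded into the other buffer while the current tile is
// consumed, so global loads can be in flight during computation.
#define TILE 16

__global__ void sgemm_double_buffered(int M, int N, int K, float alpha,
                                      const float* A, const float* B,
                                      float beta, float* C) {
    __shared__ float As[2][TILE][TILE];
    __shared__ float Bs[2][TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    // Preload the first tile into buffer 0.
    As[0][threadIdx.y][threadIdx.x] = A[row * K + threadIdx.x];
    Bs[0][threadIdx.y][threadIdx.x] = B[threadIdx.y * N + col];
    __syncthreads();

    int numTiles = K / TILE;
    for (int t = 0; t < numTiles; ++t) {
        int cur = t % 2, nxt = (t + 1) % 2;
        // Prefetch the next tile into the other buffer (if there is one).
        if (t + 1 < numTiles) {
            As[nxt][threadIdx.y][threadIdx.x] = A[row * K + (t + 1) * TILE + threadIdx.x];
            Bs[nxt][threadIdx.y][threadIdx.x] = B[((t + 1) * TILE + threadIdx.y) * N + col];
        }
        // Compute on the current buffer.
        for (int k = 0; k < TILE; ++k)
            acc += As[cur][threadIdx.y][k] * Bs[cur][k][threadIdx.x];
        __syncthreads();   // prefetched tile is now safe to read, old tile safe to overwrite
    }
    C[row * N + col] = alpha * acc + beta * C[row * N + col];
}
```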
## run
```
mkdir builds
make benchmark_[experiment name]
bash scripts/benchmark_[experiment name].sh
```
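For example, assuming the experiment name for the dense benchmark is `dense`, the steps above would become:
```
mkdir builds
make benchmark_dense
bash scripts/benchmark_dense.sh
```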
## Note
* when sparsity is around 1%, cuSPARSE can outperform cuBLAS
* allocate registers sensibly; fix as many parameters as possible at compile time to save computation and registers
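One common way to fix parameters at compile time is to pass tile sizes as template arguments so the compiler can fully unroll loops and fold indexing into constants; a small assumed sketch:
```
// Sketch only: a tile size as a compile-time template parameter lets the
// compiler unroll the loop fully and fold indexing into constants, which
// saves instructions and registers compared with a runtime parameter.
template <int TILE_K>
__global__ void dot_tiles(const float* a, const float* b, float* out) {
    float acc = 0.0f;
    #pragma unroll
    for (int k = 0; k < TILE_K; ++k)   // trip count known at compile time
        acc += a[threadIdx.x * TILE_K + k] * b[threadIdx.x * TILE_K + k];
    out[threadIdx.x] = acc;
}

// launched with the constant fixed at compile time:
// dot_tiles<8><<<1, 32>>>(d_a, d_b, d_out);
```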