https://github.com/tgautam03/xgemm

Accelerated General (FP32) Matrix Multiplication from scratch in CUDA

Host: GitHub
URL: https://github.com/tgautam03/xgemm
Owner: tgautam03
License: mit
Created: 2024-08-11T21:36:15.000Z (about 1 year ago)
Default Branch: master
Last Pushed: 2025-01-09T21:13:38.000Z (9 months ago)
Last Synced: 2025-03-30T19:05:53.006Z (7 months ago)
Topics: cuda-programming, gpu-programming, matrix-multiplication, sgemm
Language: Cuda
Homepage:
Size: 5.8 MB
Stars: 111
Watchers: 2
Forks: 6
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# xGeMM
Accelerated general (FP32) matrix multiplication (SGEMM), written from scratch in CUDA. Tested on an NVIDIA RTX 3090 running Ubuntu 24.04.1 LTS with nvidia-driver-550 and CUDA 12.4.

**Watch the YouTube video (click the image below)**

[![VideoThumbnail](https://raw.githubusercontent.com/tgautam03/xGeMM/refs/heads/master/Thumbnail.png)](https://youtu.be/GetaI7KhbzM?si=i9sMAfGqO4zyJZhq)

## Dependencies
- [Eigen 3.4.0](https://gitlab.com/libeigen/eigen/-/releases/3.4.0) (place it in the `lib` directory)

## Running Benchmarks
### 1. Eigen (CPU) matrix multiplication:

**Compile**: `make 00a_benchmark_cpu.out`

**Execute**: `./00a_benchmark_cpu.out`

### 2. cuBLAS (GPU) matrix multiplication:

**Compile**: `make 00b_benchmark_cuBLAS.out`

**Execute**: `./00b_benchmark_cuBLAS.out`

### 3. Naive (GPU) matrix multiplication:

**Compile**: `make 01_benchmark_naive.out`

**Execute**: `./01_benchmark_naive.out`

### 4. Coalesced (GPU) matrix multiplication:

**Compile**: `make 02_benchmark_coalesced.out`

**Execute**: `./02_benchmark_coalesced.out`

### 5. Tiled (GPU) matrix multiplication:

**Compile**: `make 03_benchmark_tiled.out`

**Execute**: `./03_benchmark_tiled.out`

### 6. 1D thread coarsening (GPU) matrix multiplication:

**Compile**: `make 04_benchmark_coarse_1d.out`

**Execute**: `./04_benchmark_coarse_1d.out`

### 7. 2D thread coarsening (GPU) matrix multiplication:

**Compile**: `make 05_benchmark_coarse_2d.out`

**Execute**: `./05_benchmark_coarse_2d.out`

### 8. Vectorized memory accesses (GPU) matrix multiplication:

**Compile**: `make 06_benchmark_coarse_2d_vec.out`

**Execute**: `./06_benchmark_coarse_2d_vec.out`
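
The eight compile-and-execute steps above can also be run in sequence. The following is a minimal sketch, not part of the repository: the `targets` array and the `run_all` function are hypothetical names introduced here for illustration, and the script assumes it is used from the repository root, where the Makefile providing these targets lives. Only the target names come from this README.

```shell
#!/usr/bin/env bash
# Benchmark targets, in the order listed in this README.
# The array name `targets` is an illustrative choice, not from the repository.
targets=(
  00a_benchmark_cpu
  00b_benchmark_cuBLAS
  01_benchmark_naive
  02_benchmark_coalesced
  03_benchmark_tiled
  04_benchmark_coarse_1d
  05_benchmark_coarse_2d
  06_benchmark_coarse_2d_vec
)

# Hypothetical helper: build each benchmark, then run it.
# Stops at the first target that fails to compile or execute.
run_all() {
  local t
  for t in "${targets[@]}"; do
    make "${t}.out" || return 1
    "./${t}.out" || return 1
  done
}
```

To use it, save the snippet to a file, `source` it from the repository root, and call `run_all`. Stopping at the first failure keeps a broken build from being buried under the output of later benchmarks.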
