
https://github.com/nikhilrout/thegemmcoreproject

SystemVerilog Implementation of Nvidia's CUDA/Tensor Core GEMM Operations

cuda floating-point gemm gpgpu hybrid-precision-training sparse-matrix systolic-array tensorcore tpu


# TheGEMMCoreProject
SystemVerilog implementation of the GEMM operations performed by Nvidia's SIMT CUDA cores, Nvidia's hybrid-precision Tensor Cores, and the systolic-array MXU of Google's TPU.
These modules do not emulate the actual microarchitecture that executes CUDA/Tensor Core instructions; they simply perform the same operations so they can be used directly in FPGA designs.
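
As a reference for what every module here computes, the following Python sketch models D = A×B + C twice: once as a naive triple loop and once as a cycle-by-cycle output-stationary systolic array. The function names (`gemm_reference`, `systolic_gemm`) and the output-stationary dataflow are assumptions for illustration only. The TPU MXU is usually described as weight-stationary, and this repository's RTL may be organized differently.

```python
from typing import List

Matrix = List[List[float]]


def gemm_reference(a: Matrix, b: Matrix, c: Matrix) -> Matrix:
    """Naive D = A x B + C, the result every GEMM core variant should reproduce."""
    m, k, n = len(a), len(b), len(b[0])
    return [[c[i][j] + sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def systolic_gemm(a: Matrix, b: Matrix, c: Matrix) -> Matrix:
    """Cycle-by-cycle model of an M x N output-stationary systolic array.

    PE(i, j) holds the running sum for D[i][j]. Row i of A enters at the left
    edge delayed by i cycles, column j of B enters at the top edge delayed by
    j cycles, and every PE forwards its operands one hop right/down per cycle.
    """
    m, k, n = len(a), len(b), len(b[0])
    acc = [list(row) for row in c]          # accumulators preloaded with C
    a_reg = [[0.0] * n for _ in range(m)]   # A operands latched in each PE
    b_reg = [[0.0] * n for _ in range(m)]   # B operands latched in each PE
    for t in range(k + m + n - 2):          # last operands meet at t = k+m+n-3
        new_a = [[0.0] * n for _ in range(m)]
        new_b = [[0.0] * n for _ in range(m)]
        for i in range(m):
            for j in range(n):
                # Left edge takes the skewed A feed; other PEs take the left neighbor's latch.
                if j == 0:
                    new_a[i][j] = a[i][t - i] if 0 <= t - i < k else 0.0
                else:
                    new_a[i][j] = a_reg[i][j - 1]
                # Top edge takes the skewed B feed; other PEs take the upper neighbor's latch.
                if i == 0:
                    new_b[i][j] = b[t - j][j] if 0 <= t - j < k else 0.0
                else:
                    new_b[i][j] = b_reg[i - 1][j]
                acc[i][j] += new_a[i][j] * new_b[i][j]  # one MAC per PE per cycle
        a_reg, b_reg = new_a, new_b         # all registers update on the same clock edge
    return acc


if __name__ == "__main__":
    A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    B = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
    C = [[0.0, 0.0], [0.0, 0.0]]
    print(systolic_gemm(A, B, C))  # [[58.0, 64.0], [139.0, 154.0]]
```

Skewing row i of A by i cycles and column j of B by j cycles makes A[i][k] and B[k][j] meet in PE(i, j) on cycle i + j + k, so an M×N array finishes a K-deep product after K + M + N - 2 cycles.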

Go check out my work on the Vortex GPGPU's [Tensor Core Unit (TCU) extension's DRL Floating Point RTL backend](https://github.com/vortexgpgpu/vortex/tree/bug_fixes/hw/rtl/tcu) for a more optimized, realistic microarchitecture implementation.

## Tensor Core Versions
### TensorCore v0: Volta Architecture [FP16MUL FP32ADD]


Volta Tensor Core Architecture Diagram
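
A bit-accurate software model helps when checking an FP16MUL/FP32ADD datapath in simulation. The Python sketch below is a hypothetical golden model, not code from this repository. It rounds operands to FP16 and keeps the running sum in FP32, using the standard-library `struct` formats `e` (binary16) and `f` (binary32) to do the rounding. Rounding after every single addition is an assumption; real Tensor Cores are reported to add several products in one step with their own alignment and rounding behavior, so low-order bits can differ.

```python
import struct


def to_fp16(x: float) -> float:
    """Round to the nearest IEEE 754 binary16 value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round to the nearest IEEE 754 binary32 value (ties to even)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def volta_dot(a_row, b_col, c):
    """d = c + sum(a[k] * b[k]) with FP16 operands and an FP32 accumulator.

    The product of two FP16 values (11-bit significands) fits exactly in FP32,
    so only the accumulation rounds. Rounding after every add is an assumption.
    """
    acc = to_fp32(c)
    for a, b in zip(a_row, b_col):
        prod = to_fp16(a) * to_fp16(b)  # exact: at most 22 significant bits
        acc = to_fp32(acc + prod)       # FP32 accumulate
    return acc


if __name__ == "__main__":
    # 2048 + 1 survives in FP32; an FP16 accumulator would round the tie back to 2048.
    print(volta_dot([2048.0, 1.0], [1.0, 1.0], 0.0))  # 2049.0
    print(to_fp16(2049.0))                            # 2048.0
```

The 2048 + 1 case shows why the wider accumulator matters: FP16 values near 2048 are spaced 2 apart, so 2049 is a tie that rounds to even (2048), while FP32 represents 2049 exactly.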

### TensorCore v1: Ampere Architecture [TF32MUL FP32ADD / BF16MUL FP32ADD] + Fine-Grained Structured Sparsity
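
TF32 and BF16 both keep FP32's 8-bit exponent and shorten the fraction, to 10 and 7 bits respectively, so either can be produced by rounding an FP32 bit pattern. The Python sketch below assumes round-to-nearest-even; conversion paths used in practice may instead truncate or round ties away from zero, so compare against the repository's RTL before relying on exact bits. The helper names are hypothetical, and NaN/Inf inputs are not handled.

```python
import struct


def f32_bits(x: float) -> int:
    """FP32 bit pattern of x (x is first rounded to FP32)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(u: int) -> float:
    """Float value of a 32-bit FP32 bit pattern."""
    return struct.unpack("<f", struct.pack("<I", u & 0xFFFFFFFF))[0]


def round_fraction(x: float, keep: int) -> float:
    """Round x (as FP32) to `keep` fraction bits, ties to even, keeping the 8-bit exponent.

    keep=10 gives TF32, keep=7 gives BF16. NaN/Inf inputs are not handled.
    """
    drop = 23 - keep
    u = f32_bits(x)
    lsb = (u >> drop) & 1                        # last kept bit, for ties-to-even
    u = (u + (1 << (drop - 1)) - 1 + lsb) >> drop << drop
    return bits_f32(u)                           # a carry may bump the exponent, as it should


def to_tf32(x: float) -> float:
    return round_fraction(x, 10)


def to_bf16(x: float) -> float:
    return round_fraction(x, 7)


def to_fp32(x: float) -> float:
    return struct.unpack("<f", struct.pack("<f", x))[0]


def ampere_dot(a_row, b_col, c, fmt=to_tf32):
    """d = c + sum(a[k] * b[k]) with TF32 or BF16 operands and an FP32 accumulator."""
    acc = to_fp32(c)
    for a, b in zip(a_row, b_col):
        acc = to_fp32(acc + fmt(a) * fmt(b))     # product exact in Python, rounded once per add
    return acc
```

For example, 1/3 becomes 0.333251953125 in TF32 (the same precision as FP16's significand) but 0.333984375 in BF16, which is the trade-off between the two formats: BF16 gives up fraction bits, TF32 keeps more of them.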


Ampere Tensor Core Architecture Diagram
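
Fine-grained structured sparsity on Ampere uses a 2:4 pattern: at most two non-zeros in every group of four consecutive values, stored as the kept values plus a 2-bit in-group index for each. The Python sketch below shows pruning, compression, and a dot product that reads only the B elements named by the metadata. Magnitude-based pruning is a common choice but an assumption here, the function names are hypothetical, and the exact metadata layout used by the hardware and by this repository is not modeled.

```python
from typing import List, Tuple


def prune_2_4(row: List[float]) -> List[float]:
    """Zero the two smallest-magnitude entries in each group of four (magnitude pruning)."""
    if len(row) % 4:
        raise ValueError("row length must be a multiple of 4")
    out = list(row)
    for g in range(0, len(row), 4):
        smallest = sorted(range(g, g + 4), key=lambda i: abs(row[i]))[:2]
        for i in smallest:
            out[i] = 0.0
    return out


def compress_2_4(row: List[float]) -> Tuple[List[float], List[int]]:
    """Pack a 2:4-sparse row into two values per group plus their 2-bit in-group indices."""
    if len(row) % 4:
        raise ValueError("row length must be a multiple of 4")
    values: List[float] = []
    meta: List[int] = []
    for g in range(0, len(row), 4):
        nz = [i for i in range(4) if row[g + i] != 0.0]
        if len(nz) > 2:
            raise ValueError(f"group at {g} has {len(nz)} non-zeros, expected at most 2")
        pad = [i for i in range(4) if i not in nz]   # groups with < 2 non-zeros store explicit zeros
        keep = sorted((nz + pad)[:2])
        values.extend(row[g + i] for i in keep)
        meta.extend(keep)
    return values, meta


def sparse_dot(values: List[float], meta: List[int], b_col: List[float]) -> float:
    """Dot product that reads only the B elements selected by the metadata."""
    acc = 0.0
    for n, (v, i) in enumerate(zip(values, meta)):
        acc += v * b_col[4 * (n // 2) + i]   # n // 2 is the group number
    return acc
```

Because each group of four contributes only two multiplies, the compressed dot product does half the MACs of the dense one while producing the same result for the pruned row.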

### TensorCore v2: Hopper Architecture [FP8(E5M2/E4M3)MUL FP16ADD]


Hopper Tensor Core Architecture Diagram
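
The two FP8 encodings differ in more than field widths. The Python sketch below decodes both and accumulates in FP16, following the FP16ADD in the heading above. It assumes the E4M3 variant used by NVIDIA and the OCP FP8 specification (bias 7, no infinities, only S.1111.111 is NaN, max 448) and an IEEE-style E5M2 (bias 15, Inf/NaN at exponent 31, max 57344). Rounding to FP16 after every add is an assumption, overflow raises `OverflowError` from `struct` instead of saturating, and the function names are hypothetical.

```python
import math
import struct


def decode_fp8(byte: int, fmt: str) -> float:
    """Decode one FP8 byte.

    'e4m3': bias 7, no infinities, only S.1111.111 is NaN, max 448.
    'e5m2': bias 15, IEEE-style Inf/NaN at exponent 31, max 57344.
    """
    if fmt not in ("e4m3", "e5m2"):
        raise ValueError(fmt)
    exp_bits, man_bits, bias = (4, 3, 7) if fmt == "e4m3" else (5, 2, 15)
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    if fmt == "e4m3" and exp == 0xF and man == 0x7:
        return math.nan
    if fmt == "e5m2" and exp == 0x1F:
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:                                   # subnormal: no implicit leading 1
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


def to_fp16(x: float) -> float:
    """Round to the nearest binary16 value; struct raises OverflowError if x is too large."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def hopper_dot(a_bytes, b_bytes, c, fmt_a="e4m3", fmt_b="e4m3"):
    """d = c + sum(a[k] * b[k]) with FP8 operands and an FP16 accumulator."""
    acc = to_fp16(c)
    for a, b in zip(a_bytes, b_bytes):
        prod = decode_fp8(a, fmt_a) * decode_fp8(b, fmt_b)  # exact in Python's double
        acc = to_fp16(acc + prod)                           # FP16 accumulate
    return acc
```

With this FP16 accumulator, `hopper_dot([0x38], [0x38], 2048.0)` (adding 1.0 × 1.0 to 2048) returns 2048.0, which shows how quickly a narrow accumulator can drop small contributions.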