awesome-gemm
A curated list of awesome matrix-matrix multiplication (A * B = C) frameworks, libraries and software.
https://github.com/jssonx/awesome-gemm
General Optimization Techniques
- How To Optimize GEMM
- GEMM: From Pure C to SSE Optimized Micro Kernels - An in-depth look at optimizing GEMM, from basic C to SSE-optimized micro-kernels (a minimal sketch of this progression follows this list).
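Both tutorials above start from the same baseline, so a minimal sketch may help frame them: a naive triple-loop GEMM in plain C and a loop-interchanged variant that is usually the first optimization step. This is an illustrative sketch only, not code taken from either guide; row-major storage and the C += A * B convention are assumptions made here.

```c
/* Naive ijk ordering: the innermost loop walks down a column of B
 * (stride N in row-major storage), which is cache-unfriendly. */
void gemm_naive(int M, int N, int K,
                const float *A, const float *B, float *C) {
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            float acc = C[i * N + j];
            for (int k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
}

/* ikj ordering: B and C are now accessed with unit stride in the inner
 * loop, which is typically the first big win before blocking and SIMD. */
void gemm_ikj(int M, int N, int K,
              const float *A, const float *B, float *C) {
    for (int i = 0; i < M; ++i)
        for (int k = 0; k < K; ++k) {
            float a = A[i * K + k];
            for (int j = 0; j < N; ++j)
                C[i * N + j] += a * B[k * N + j];
        }
}
```

From here the linked guides proceed to blocking, packing, and vectorized micro-kernels, which is broadly the progression both of them follow.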
Frameworks
- BLIS - A portable software framework for instantiating high-performance BLAS-like dense linear algebra libraries.
- BLISlab - A sandbox for learning how to optimize BLIS-like GEMM algorithms.
- SHPC at UT Austin (formerly FLAME)
Libraries
- NVIDIA CUTLASS 3.3
- Google gemmlowp: a small self-contained low-precision GEMM library - Low-precision GEMM optimization by Google.
- OpenBLAS - An optimized BLAS library based on GotoBLAS2 (see the CBLAS call sketch after this list).
- cutlass_fpA_intB_gemm
- CUSP
- CUV
- Eigen
- MAGMA (Matrix Algebra on GPU and Multicore Architectures) - Next-generation linear algebra libraries for heterogeneous computing.
- LAPACK
- Xianyi Zhang - Founder of the OpenBLAS project.
- NumPy
- TensorFlow - Open-source software library for machine learning.
- PyTorch - Open-source software library for machine learning.
- NVIDIA cuBLAS
- NVIDIA cuSPARSE
- libFLAME
- ViennaCL - A free open-source linear algebra library for computations on many-core architectures (GPUs, MIC) and multi-core CPUs. The library is written in C++ and supports CUDA, OpenCL, and OpenMP (including switches at runtime).
- Boost uBlas
- Armadillo
- Blaze
- ARM Compute Library - Low-level machine learning functions optimized for Arm® Cortex®-A, Arm® Neoverse® and Arm® Mali™ GPU architectures.
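Most of the CPU libraries above (OpenBLAS, and BLIS through its BLAS/CBLAS compatibility layer) are called through the standard CBLAS interface, so a minimal call sketch is included here for orientation. The `cblas_sgemm` routine itself is standard; the `-lopenblas` link line is an assumption about an OpenBLAS-style installation.

```c
/* Minimal SGEMM call through the CBLAS interface.
 * Build e.g.: cc sgemm_demo.c -O2 -lopenblas
 * Computes C = alpha * A * B + beta * C with row-major matrices. */
#include <stdio.h>
#include <cblas.h>

int main(void) {
    enum { M = 2, N = 2, K = 3 };
    float A[M * K] = {1, 2, 3,
                      4, 5, 6};
    float B[K * N] = { 7,  8,
                       9, 10,
                      11, 12};
    float C[M * N] = {0};

    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                M, N, K,
                1.0f,      /* alpha */
                A, K,      /* A and its leading dimension */
                B, N,      /* B and its leading dimension */
                0.0f,      /* beta */
                C, N);     /* C and its leading dimension */

    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j)
            printf("%6.1f ", C[i * N + j]);
        printf("\n");
    }
    return 0;
}
```

The GPU libraries (cuBLAS, CUTLASS) follow the same GEMM parameter conventions but default to column-major layouts, so leading dimensions are chosen differently there.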
Development Software: Debugging and Profiling
- MegPeak - A tool for measuring the peak compute throughput of processors.
- Memcheck (Valgrind)
- Intel VTune Profiler
- gprof
- FPChecker - A tool for detecting floating-point accuracy problems.
- HPCToolkit
Selected Papers
- High-performance implementation of the level-3 BLAS
- Anatomy of High-Performance Many-Threaded Matrix Multiplication
- Model-driven Level 3 BLAS Performance Optimization on Loongson 3A Processor
- BLIS: A Framework for Rapidly Instantiating BLAS Functionality
- Anatomy of high-performance matrix multiplication
Blogs
- Step by step optimization of cuda sgemm
- Optimizing Matrix Multiplication
- GEMM caching
- Matrix Multiplication on CPU
- Optimizing matrix multiplication: cache + OpenMP (a minimal sketch of this approach follows this list)
- Tuning matrix multiplication (GEMM) for Intel GPUs
- How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog
- Building a FAST matrix multiplication algorithm
- Matrix-Matrix Product Experiments with BLAZE
- The OpenBLAS Project and Matrix Multiplication Optimization
- OpenBLAS gemm from scratch
- The Proper Approach to CUDA for Beginners: How to Optimize GEMM
- ARMv7 4x4 Kernel Optimization Practice
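Several of the posts above, the cache + OpenMP one in particular, revolve around the same two ideas: tile the loops so each working set fits in cache, and parallelize over independent tiles of C. A minimal sketch of that combination follows; the block size of 64 and the `-O3 -fopenmp` build flags are illustrative assumptions, not values taken from any of the posts.

```c
/* Cache-blocked, OpenMP-parallel GEMM sketch: C += A * B, row-major,
 * C is M x N, A is M x K, B is K x N. Compile with e.g. -O3 -fopenmp.
 * BLK = 64 is an illustrative guess, not a tuned value. */
#define BLK 64

static inline int min_int(int a, int b) { return a < b ? a : b; }

void gemm_blocked_omp(int M, int N, int K,
                      const float *A, const float *B, float *C) {
    /* Each (ii, jj) pair owns a distinct tile of C, so threads never
     * write the same elements and no synchronization is needed. */
    #pragma omp parallel for collapse(2) schedule(static)
    for (int ii = 0; ii < M; ii += BLK)
        for (int jj = 0; jj < N; jj += BLK)
            for (int kk = 0; kk < K; kk += BLK)
                /* Multiply one BLK x BLK tile of A by one tile of B,
                 * accumulating into the (ii, jj) tile of C. */
                for (int i = ii; i < min_int(ii + BLK, M); ++i)
                    for (int k = kk; k < min_int(kk + BLK, K); ++k) {
                        float a = A[i * K + k];
                        for (int j = jj; j < min_int(jj + BLK, N); ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
}
```

The CUDA-focused posts apply the same tiling idea, with shared memory playing the role of the cache and thread blocks playing the role of the OpenMP threads.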
Other Learning Resources
Tiny Examples
- SGEMM_CUDA - Step-by-step optimization of matrix multiplication, implemented in CUDA (a sketch of the usual verify-and-time harness around such kernels follows this list).
- simple-gemm
- YHs_Sample
- how-to-optimize-gemm - Row-major matmul optimization tutorial.
- GEMM
- BLIS.jl - Low-level Julia wrapper for the BLIS typed interface.
- blis_apple
- DGEMM on Int8 Tensor Core
- chgemm
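The tiny examples above generally wrap their kernels in the same kind of harness: compute a reference result with an established BLAS, check the maximum difference, and report GFLOP/s. A minimal sketch of such a harness follows; it assumes the `gemm_ikj` kernel from the earlier sketch in this list and uses `cblas_sgemm` (e.g. from OpenBLAS) as the reference, so both the kernel name and the build line are assumptions.

```c
/* Verify-and-time harness sketch for a hand-written SGEMM kernel.
 * Build e.g.: cc harness.c kernels.c -O2 -lopenblas -lm */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <cblas.h>

/* Kernel under test, from the earlier sketch in this list (assumed). */
void gemm_ikj(int M, int N, int K,
              const float *A, const float *B, float *C);

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    const int M = 512, N = 512, K = 512;
    float *A     = malloc(sizeof(float) * M * K);
    float *B     = malloc(sizeof(float) * K * N);
    float *C     = calloc(M * N, sizeof(float));  /* kernel under test */
    float *C_ref = calloc(M * N, sizeof(float));  /* reference result  */

    for (int i = 0; i < M * K; ++i) A[i] = (float)rand() / RAND_MAX;
    for (int i = 0; i < K * N; ++i) B[i] = (float)rand() / RAND_MAX;

    /* Reference: C_ref = A * B via CBLAS. */
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                M, N, K, 1.0f, A, K, B, N, 0.0f, C_ref, N);

    /* Kernel under test, timed. */
    double t0 = now_sec();
    gemm_ikj(M, N, K, A, B, C);
    double t1 = now_sec();

    /* Maximum absolute difference against the reference. */
    float max_diff = 0.0f;
    for (int i = 0; i < M * N; ++i) {
        float d = fabsf(C[i] - C_ref[i]);
        if (d > max_diff) max_diff = d;
    }

    double gflops = 2.0 * M * N * K / (t1 - t0) / 1e9;
    printf("max |diff| = %g, %.2f GFLOP/s\n", max_diff, gflops);

    free(A); free(B); free(C); free(C_ref);
    return 0;
}
```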
Fundamental Theories and Concepts
University Courses & Tutorials
Lecture Notes
Keywords
blis (4), gemm (4), matrix-multiplication (4), cuda (4), code-optimization (2), blas (2), armv7 (2), gemm-optimization (2), cpp (2), lapacke (1), lapack (1), nvidia (1), gpu (1), deep-learning-library (1), deep-learning (1), optimization (1), matrix-library (1), matrix-functions (1), matrix-calculations (1), matrix (1), linear-algebra-library (1), linear-algebra (1), hpc (1), high-performance-computing (1), high-performance (1), blas-libraries (1), gotoblas (1), sve (1), simd (1), opencl (1), neural-network (1), neon (1), machine-learning (1), linux (1), computer-vision (1), armv8 (1), arm (1), android (1), aarch64 (1), tensorcores (1), tensorcore (1), mixed-precision (1), wrapper (1), matrix-multiplications (1), julia (1), c (1), vulkan (1), ptx (1), int4 (1), cuda-kernel (1)