Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

awesome-gemm

📚 A curated list of awesome matrix-matrix multiplication (A * B = C) frameworks, libraries and software
https://github.com/jssonx/awesome-gemm

Last synced: about 2 hours ago
JSON representation

Frameworks and Development Tools 🛠️
- BLIS - A modular framework for building high-performance BLAS-like libraries.
- BLISlab - Educational framework for experimenting with BLIS-like GEMM algorithms.
- Tensile - AMD ROCm JIT compiler for GPU kernels, specializing in GEMM and tensor contractions.
Libraries 🗂️
- GPU Libraries ⚡
- Cross-Platform Libraries 🌍
- Language-Specific Libraries 🔤
  - BLIS.jl - 3-Clause)
  - Armadillo - 2.0/MIT)
  - Boost uBlas
  - NumPy - 3-Clause)
  - SciPy - 3-Clause)
  - TensorFlow - 2.0) & [XLA](https://www.tensorflow.org/xla)
  - JAX - 2.0)
  - PyTorch - 3-Clause)
  - GemmKernels.jl - 3-Clause)
  - Eigen
- CPU Libraries 💻
  - blis_apple: BLIS optimized for Apple M1 - 3-Clause)
  - libFLAME: High-performance dense linear algebra library - 3-Clause)
  - OpenBLAS: Optimized BLAS implementation based on GotoBLAS2 - 3-Clause)
  - Intel MKL: Highly optimized math routines for Intel CPUs
  - FBGEMM: Meta's CPU GEMM for optimized server inference - 3-Clause)
  - gemmlowp: Google's low-precision GEMM library - 2.0)
  - BLASFEO: Optimized for small- to medium-sized dense matrices - 2-Clause)
  - LIBXSMM: Specializing in small/micro GEMM kernels - 3-Clause)
Libraries
- - OpenBLAS
  - Eigen
  - MAGMA (Matrix Algebra on GPU and Multicore Architectures) - generation linear algebra libraries for heterogeneous computing.
  - NumPy
  - TensorFlow - source software library for machine learning.
  - PyTorch - source software library for machine learning.
  - ViennaCL - source linear algebra library for computations on many-core architectures (GPUs, MIC) and multi-core CPUs. The library is written in C++ and supports CUDA, OpenCL, and OpenMP (including switches at runtime).
  - Blaze
- GPU Libraries
  - cutlass_fpA_intB_gemm - 2.0`](https://github.com/tlc-pack/cutlass_fpA_intB_gemm/blob/main/LICENSE)
  - DGEMM on Int8 Tensor Core
  - hipBLAS-common - common/blob/develop/LICENSE.md)
  - OpenAI GEMM - gemm/blob/master/LICENSE)
  - ArrayFire - purpose GPU library that simplifies GPU computing with high-level functions, including matrix operations. [`BSD-3-Clause`](https://github.com/arrayfire/arrayfire/blob/master/LICENSE)
- CPU Libraries
  - Xianyi Zhang
  - libFLAME - performance dense linear algebra library. [`BSD-3-Clause`](https://github.com/flame/libflame/blob/master/LICENSE.txt)
  - OpenBLAS - 3-Clause`](https://github.com/xianyi/OpenBLAS/blob/develop/LICENSE)
- Language-Specific Libraries
  - Boost uBlas
Debugging and Profiling Tools 🔍
- Language-Specific Libraries 🔤
Learning Resources 📚
Example Implementations 💡
- Blogs 🖋️
Example Implementations
- Other Resources
  - YHs_Sample - Li/YHs_Sample/blob/master/LICENSE)
  - GEMM
  - GEMM Optimization with LIBXSMM - 3-Clause`](https://github.com/libxsmm/libxsmm/blob/main/LICENSE.md)
Fundamental Theories and Concepts 🧠
- Spatial-lang GEMM - High-level overview.
- General Matrix Multiply (Intel) - Intro from Intel.
- Strassen's Algorithm - Faster asymptotic complexity for large matrices.
- Winograd's Algorithm - Reduced multiplication count for improved performance.
General Optimization Techniques 🚀
- GEMM: From Pure C to SSE Optimized Micro Kernels - Detailed tutorial on going from naive to vectorized implementations.
- How To Optimize GEMM - Hands-on optimization guide.
Frameworks and Development Tools
- SHPC at UT Austin (formerly FLAME)
Development Software: Debugging and Profiling
- - gprof
  - FPChecker - point accuracy problems.
  - HPCToolkit
- Language-Specific Libraries
  - Perf - level metrics. [`GPLv2`](https://github.com/torvalds/linux/blob/master/COPYING)
  - gprofng-gui - 3.0.html)
  - nvprof - line profiler for CUDA applications. [`NVIDIA End User License Agreement`](https://docs.nvidia.com/cuda/eula/index.html)
Learning Resources
- University Courses & Tutorials
  - UT Austin: EE382 System-on-Chip (SoC) Design
  - GPU MODE
- Blogs
- Other Resources
  - Triton
University Courses & Tutorials
- UT Austin (Flame): LAFF-On Programming for High Performance
Other Learning Resources
- NVIDIA Developer Blog

Programming Languages

C++ 17 C 9 Cuda 8 Python 5 Julia 3 JavaScript 1 Fortran 1 Assembly 1 TeX 1

Categories

Learning Resources 📚 46 Libraries 🗂️ 38 Debugging and Profiling Tools 🔍 18 Example Implementations 💡 17 Libraries 17 Learning Resources 9 Development Software: Debugging and Profiling 6 Fundamental Theories and Concepts 🧠 4 Example Implementations 3 Frameworks and Development Tools 🛠️ 3 General Optimization Techniques 🚀 2 Frameworks and Development Tools 1 University Courses & Tutorials 1 Other Learning Resources 1

Sub Categories

Blogs 🖋️ 46 Language-Specific Libraries 🔤 28 University Courses & Tutorials 🎓 12 GPU Libraries ⚡ 12 CPU Libraries 💻 8 Cross-Platform Libraries 🌍 8 Blogs 6 Selected Papers 📝 5 GPU Libraries 5 Other Resources 4 Language-Specific Libraries 4 CPU Libraries 3 University Courses & Tutorials 2

Keywords

cuda 11 gemm 7 blas 6 gpu 6 matrix-multiplication 6 python 5 deep-learning 5 blis 4 opencl 4 machine-learning 4 cpp 4 hip 3 neural-network 3 c 3 lapack 3 linear-algebra 3 hpc 3 aarch64 2 armv7 2 armv8 2 linux 2 numpy 2 high-performance-computing 2 code-optimization 2 openmp 2 oneapi 2 performance 2 deep-neural-networks 2 scientific-computing 2 assembly 2 high-performance 2 gpu-computing 2 blas-libraries 2 matrix-library 2 cuda-programming 2 linear-algebra-library 2 matrix-functions 2 lapacke 2 gemm-optimization 2 julia 2 clblas 1 arrayfire 1 c-plus-plus 1 matrix 1 gpgpu 1 matrix-calculations 1 nvidia 1 amd 1 deep-learning-library 1 auto-tuning 1