Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/flame/how-to-optimize-gemm
blis code-optimization gemm gotoblas matrix-multiplication
Last synced: about 13 hours ago
- Host: GitHub
- URL: https://github.com/flame/how-to-optimize-gemm
- Owner: flame
- Created: 2016-08-09T20:59:23.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2023-07-29T07:16:04.000Z (over 1 year ago)
- Last Synced: 2025-01-11T15:04:15.254Z (8 days ago)
- Topics: blis, code-optimization, gemm, gotoblas, matrix-multiplication
- Language: C
- Size: 2.18 MB
- Stars: 1,787
- Watchers: 44
- Forks: 355
- Open Issues: 9
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-cuda-triton-hpc - flame/how-to-optimize-gemm: How To Optimize Gemm wiki pages. [https://github.com/flame/how-to-optimize-gemm/wiki](https://github.com/flame/how-to-optimize-gemm/wiki) (Learning Resources)
- awesome-gemm - How To Optimize GEMM - Hands-on optimization guide. (General Optimization Techniques 🚀)
README
# How To Optimize Gemm wiki pages
https://github.com/flame/how-to-optimize-gemm/wiki
Copyright by Prof. Robert van de Geijn ([email protected]).
Adapted to Github Markdown Wiki by Jianyu Huang ([email protected]).
# Table of contents
* [The GotoBLAS/BLIS Approach to Optimizing Matrix-Matrix Multiplication - Step-by-Step](../../wiki#the-gotoblasblis-approach-to-optimizing-matrix-matrix-multiplication---step-by-step)
* [NOTICE ON ACADEMIC HONESTY](../../wiki#notice-on-academic-honesty)
* [References](../../wiki#references)
* [Set Up](../../wiki#set-up)
* [Step-by-step optimizations](../../wiki#step-by-step-optimizations)
* [Computing four elements of C at a time](../../wiki#computing-four-elements-of-c-at-a-time)
* [Hiding computation in a subroutine](../../wiki#hiding-computation-in-a-subroutine)
* [Computing four elements at a time](../../wiki#computing-four-elements-at-a-time)
* [Further optimizing](../../wiki#further-optimizing)
* [Computing a 4 x 4 block of C at a time](../../wiki#computing-a-4-x-4-block-of-c-at-a-time)
* [Repeating the same optimizations](../../wiki#repeating-the-same-optimizations)
* [Further optimizing](../../wiki#further-optimizing-1)
* [Blocking to maintain performance](../../wiki#blocking-to-maintain-performance)
* [Packing into contiguous memory](../../wiki#packing-into-contiguous-memory)
* [Acknowledgement](../../wiki#acknowledgement)
# Related Links
* [BLISlab: A Sandbox for Optimizing GEMM](https://github.com/flame/blislab)
* [GEMM: From Pure C to SSE Optimized Micro Kernels](http://apfel.mathematik.uni-ulm.de/~lehn/sghpc/gemm/)
# Acknowledgement
This material was partially sponsored by grants from the National Science Foundation (Awards ACI-1148125/1340293 and ACI-1550493).
_Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF)._