
https://github.com/flame/how-to-optimize-gemm



blis code-optimization gemm gotoblas matrix-multiplication


# How To Optimize Gemm wiki pages
https://github.com/flame/how-to-optimize-gemm/wiki

Copyright by Prof. Robert van de Geijn.

Adapted to GitHub Markdown wiki by Jianyu Huang.

# Table of contents

* [The GotoBLAS/BLIS Approach to Optimizing Matrix-Matrix Multiplication - Step-by-Step](../../wiki#the-gotoblasblis-approach-to-optimizing-matrix-matrix-multiplication---step-by-step)
* [NOTICE ON ACADEMIC HONESTY](../../wiki#notice-on-academic-honesty)
* [References](../../wiki#references)
* [Set Up](../../wiki#set-up)
* [Step-by-step optimizations](../../wiki#step-by-step-optimizations)
* [Computing four elements of C at a time](../../wiki#computing-four-elements-of-c-at-a-time)
  * [Hiding computation in a subroutine](../../wiki#hiding-computation-in-a-subroutine)
  * [Computing four elements at a time](../../wiki#computing-four-elements-at-a-time)
  * [Further optimizing](../../wiki#further-optimizing)
* [Computing a 4 x 4 block of C at a time](../../wiki#computing-a-4-x-4-block-of-c-at-a-time)
  * [Repeating the same optimizations](../../wiki#repeating-the-same-optimizations)
  * [Further optimizing](../../wiki#further-optimizing-1)
  * [Blocking to maintain performance](../../wiki#blocking-to-maintain-performance)
  * [Packing into contiguous memory](../../wiki#packing-into-contiguous-memory)
* [Acknowledgement](../../wiki#acknowledgement)
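
The step-by-step optimizations start from a straightforward triple loop and restructure it until a 4 x 4 block of C is computed at a time. As a rough illustration only (this is not the wiki's code), here is a minimal sketch of those two endpoints. It assumes double precision, column-major storage with leading dimensions `lda`, `ldb`, `ldc` as in the BLAS, and `m` and `n` divisible by 4. The function names `gemm_naive`, `add_dot_4x4` and `gemm_4x4` are hypothetical. Blocking, packing and any register-level or vector tuning from the later steps are not shown.

```c
#include <assert.h>

/* Column-major element (i, j); the leading dimension is lda, ldb or ldc. */
#define A(i, j) a[(j) * lda + (i)]
#define B(i, j) b[(j) * ldb + (i)]
#define C(i, j) c[(j) * ldc + (i)]

/* Reference triple loop: C := A * B + C, with A m x k, B k x n, C m x n. */
void gemm_naive(int m, int n, int k, const double *a, int lda,
                const double *b, int ldb, double *c, int ldc) {
    for (int j = 0; j < n; j++)
        for (int p = 0; p < k; p++)
            for (int i = 0; i < m; i++)
                C(i, j) += A(i, p) * B(p, j);
}

/* Updates one 4 x 4 block of C (top-left element at c). a points at the
   first of the four rows of A, b at the first of the four columns of B.
   The 16 partial sums live in a small local array that an optimizing
   compiler can often keep in registers, so C is written once per block. */
void add_dot_4x4(int k, const double *a, int lda, const double *b, int ldb,
                 double *c, int ldc) {
    double acc[4][4] = {{0.0}};
    for (int p = 0; p < k; p++)
        for (int j = 0; j < 4; j++) {
            const double b_pj = B(p, j); /* reused for four rows of C */
            for (int i = 0; i < 4; i++)
                acc[i][j] += A(i, p) * b_pj;
        }
    for (int j = 0; j < 4; j++)
        for (int i = 0; i < 4; i++)
            C(i, j) += acc[i][j];
}

/* Same product as gemm_naive, computed one 4 x 4 block of C at a time. */
void gemm_4x4(int m, int n, int k, const double *a, int lda,
              const double *b, int ldb, double *c, int ldc) {
    assert(m % 4 == 0 && n % 4 == 0); /* assumption of this sketch */
    for (int j = 0; j < n; j += 4)
        for (int i = 0; i < m; i += 4)
            add_dot_4x4(k, &A(i, 0), lda, &B(0, j), ldb, &C(i, j), ldc);
}
```

Called on the same inputs, `gemm_naive` and `gemm_4x4` should leave the same values in C up to floating-point rounding. The gain comes from each loaded element of A and B being reused across a 4 x 4 tile instead of once per element of C.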

# Related Links
* [BLISlab: A Sandbox for Optimizing GEMM](https://github.com/flame/blislab)
* [GEMM: From Pure C to SSE Optimized Micro Kernels](http://apfel.mathematik.uni-ulm.de/~lehn/sghpc/gemm/)

# Acknowledgement
This material was partially sponsored by grants from the National Science Foundation (Awards ACI-1148125/1340293 and ACI-1550493).

_Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF)._