https://github.com/LynnHo/Matrix-Calculus-Tutorial

Matrix Calculus via Differentials, Matrix Derivative, Matrix Derivative Tutorial

# Matrix Calculus

In this page, we introduce a differential-based method for vector and matrix derivatives (matrix calculus), which ***only needs a few simple rules to derive most matrix derivatives***. This method is useful and well established in mathematics; however, few documents describe it clearly and in detail. Therefore, this page aims to give a comprehensive introduction to ***matrix calculus via differentials***.

\* *If you only want the results, there is an awesome online tool, [Matrix Calculus](http://www.matrixcalculus.org/). If you want to know how to derive them, let's get started.*

- [0. Notation](#0-notation)
- [1. Matrix Calculus via Differentials](#1-matrix-calculus-via-differentials)
  * [1.1 Differential Identities](#11-differential-identities)
  * [1.2 Deriving Matrix Derivatives](#12-deriving-matrix-derivatives)
    + [1.2.1 Proof of chain rules \(identities 3\)](#121-proof-of-chain-rules-identities-3)
    + [1.2.2 Practical examples](#122-practical-examples)
- [2. Conclusion](#2-conclusion)

## 0. Notation

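- $x$, $\mathbf{x}$, and $\mathbf{X}$ denote a scalar, a vector, and a matrix, respectively.
- Letters from the first half of the alphabet denote constants, and letters from the second half denote variables.
- $\mathbf{X}^T$ denotes the matrix transpose, $\operatorname{tr}(\mathbf{X})$ is the trace, $|\mathbf{X}|$ is the determinant, and $\operatorname{adj}(\mathbf{X})$ is the adjugate matrix.
- $\otimes$ is the Kronecker product and $\odot$ is the Hadamard product.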
- Here we use the ***numerator layout***, while the online tool [Matrix Calculus](http://www.matrixcalculus.org/) seems to use a ***mixed layout***. Please refer to [Wiki - Matrix Calculus - Layout Conventions](https://en.wikipedia.org/wiki/Matrix_calculus#Layout_conventions) for the detailed layout definitions, and keep in mind that ***different layouts lead to different results***. The numerator layout is as follows:
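For reference, here is a sketch of the standard numerator-layout shapes, following the Wikipedia convention linked above, for $x, y \in \mathbb{R}$, $\mathbf{x} \in \mathbb{R}^n$, $\mathbf{y} \in \mathbb{R}^m$, and $\mathbf{X}, \mathbf{Y} \in \mathbb{R}^{p \times q}$. These are the conventional definitions, written out here as an assumption rather than reproduced from the original figures:

$$
\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x} \\ \vdots \\ \frac{\partial y_m}{\partial x} \end{bmatrix},\qquad
\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y}{\partial x_1} & \cdots & \frac{\partial y}{\partial x_n} \end{bmatrix},\qquad
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}
$$

$$
\frac{\partial \mathbf{Y}}{\partial x} = \begin{bmatrix} \frac{\partial Y_{11}}{\partial x} & \cdots & \frac{\partial Y_{1q}}{\partial x} \\ \vdots & \ddots & \vdots \\ \frac{\partial Y_{p1}}{\partial x} & \cdots & \frac{\partial Y_{pq}}{\partial x} \end{bmatrix},\qquad
\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix} \frac{\partial y}{\partial X_{11}} & \cdots & \frac{\partial y}{\partial X_{p1}} \\ \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial X_{1q}} & \cdots & \frac{\partial y}{\partial X_{pq}} \end{bmatrix}
$$

In particular, a scalar-by-matrix derivative in this layout has the shape of $\mathbf{X}^T$ ($q \times p$), with $\partial y / \partial X_{ij}$ stored at position $(j, i)$.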

## 1. Matrix Calculus via Differentials

### 1.1 Differential Identities

- **Identities 1**

- **Identities 2**

- **Identities 3 - chain rules**

- **Identities 4 - total differential**. In fact, all of identities 1 are matrix forms of the total differential in eq. (24).
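To see how a total differential turns into a matrix form, here is a sketch for the scalar-by-vector and scalar-by-matrix cases, assuming the usual numerator layout ($\mathbf{x} \in \mathbb{R}^n$, $\mathbf{X} \in \mathbb{R}^{p \times q}$). Summing the partial derivatives against the coordinate differentials is exactly a row-times-column product in the vector case and a trace in the matrix case:

$$
dy = \sum_{i=1}^{n} \frac{\partial y}{\partial x_i}\, dx_i
= \begin{bmatrix} \frac{\partial y}{\partial x_1} & \cdots & \frac{\partial y}{\partial x_n} \end{bmatrix} d\mathbf{x}
= \frac{\partial y}{\partial \mathbf{x}}\, d\mathbf{x}
$$

$$
dy = \sum_{i=1}^{p}\sum_{j=1}^{q} \frac{\partial y}{\partial X_{ij}}\, dX_{ij}
= \sum_{j=1}^{q} \sum_{i=1}^{p} \left(\frac{\partial y}{\partial \mathbf{X}}\right)_{ji} dX_{ij}
= \operatorname{tr}\!\left(\frac{\partial y}{\partial \mathbf{X}}\, d\mathbf{X}\right)
$$

So whenever a differential can be rearranged into $dy = \mathbf{c}^T d\mathbf{x}$ or $dy = \operatorname{tr}(\mathbf{M}\, d\mathbf{X})$, the derivative can be read off as $\mathbf{c}^T$ or $\mathbf{M}$.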

### 1.2 Deriving Matrix Derivatives

To derive a matrix derivative, we ***repeatedly apply identities 1 (the process is essentially a chain rule)***, assisted by identities 2.
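A minimal sketch of this workflow on a function chosen here purely for illustration, $y = \mathbf{x}^T\mathbf{A}\mathbf{x}$ with a constant square $\mathbf{A}$: expand the differential with the product rule, rearrange until it reads $dy = (\text{row vector})\, d\mathbf{x}$, then read off the numerator-layout derivative.

$$
\begin{aligned}
dy &= (d\mathbf{x})^T\mathbf{A}\mathbf{x} + \mathbf{x}^T\mathbf{A}\, d\mathbf{x} && \text{(product rule)}\\
   &= \mathbf{x}^T\mathbf{A}^T d\mathbf{x} + \mathbf{x}^T\mathbf{A}\, d\mathbf{x} && \text{(a scalar equals its own transpose)}\\
   &= \mathbf{x}^T(\mathbf{A} + \mathbf{A}^T)\, d\mathbf{x}
\end{aligned}
\quad\Longrightarrow\quad
\frac{\partial y}{\partial \mathbf{x}} = \mathbf{x}^T(\mathbf{A} + \mathbf{A}^T)
$$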

#### 1.2.1 Proof of chain rules (identities 3)

-

finally from eq. (2), we get .

-

finally from eq. (3), we get .

-

finally from eq. (1), we get .

-

finally from eq. (5), we get .
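As one illustrative case (not necessarily matching any of the four cases above), the scalar-by-vector chain rule for $y = f(\mathbf{u})$, $\mathbf{u} = \mathbf{g}(\mathbf{x})$ with $\mathbf{u} \in \mathbb{R}^k$ and $\mathbf{x} \in \mathbb{R}^n$ follows by substituting one differential into the other. In numerator layout the shapes are $1 \times k$ times $k \times n$:

$$
\begin{aligned}
dy &= \frac{\partial y}{\partial \mathbf{u}}\, d\mathbf{u}, \qquad d\mathbf{u} = \frac{\partial \mathbf{u}}{\partial \mathbf{x}}\, d\mathbf{x}\\
\Rightarrow\quad dy &= \frac{\partial y}{\partial \mathbf{u}}\,\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\, d\mathbf{x}
\quad\Longrightarrow\quad
\frac{\partial y}{\partial \mathbf{x}} = \frac{\partial y}{\partial \mathbf{u}}\,\frac{\partial \mathbf{u}}{\partial \mathbf{x}}
\end{aligned}
$$

The last step reads the derivative off as the row vector in front of $d\mathbf{x}$.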

#### 1.2.2 Practical examples

**E.g. 1**,

finally from eq. (2), we get .
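A derived scalar-by-vector result can be sanity-checked numerically with central differences. The sketch below uses a function chosen for illustration (not necessarily E.g. 1), $y = \|\mathbf{A}\mathbf{x} - \mathbf{b}\|^2$, whose differential $dy = 2(\mathbf{A}\mathbf{x} - \mathbf{b})^T\mathbf{A}\, d\mathbf{x}$ gives the row vector $2(\mathbf{A}\mathbf{x} - \mathbf{b})^T\mathbf{A}$. The helper names and the sample values are hypothetical, and plain Python lists are used to keep it dependency-free.

```python
# Central-difference check of a scalar-by-vector derivative (numerator layout).
# Illustrative function, not taken from the examples: y = ||A x - b||^2,
# whose differential is dy = 2 (A x - b)^T A dx.


def residual(A, b, x):
    """r = A x - b, with A given as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i for row, b_i in zip(A, b)]


def f(A, b, x):
    """y = ||A x - b||^2."""
    return sum(r_i * r_i for r_i in residual(A, b, x))


def analytic_grad(A, b, x):
    """Row vector 2 (A x - b)^T A, returned as a flat list of length n."""
    r = residual(A, b, x)
    return [2 * sum(r_i * row[j] for r_i, row in zip(r, A)) for j in range(len(x))]


def numeric_grad(func, x, eps=1e-6):
    """Entry j approximates dy/dx_j by central differences."""
    grad = []
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        grad.append((func(x_plus) - func(x_minus)) / (2 * eps))
    return grad


A = [[1.0, 2.0], [3.0, -1.0], [0.5, 4.0]]
b = [1.0, 0.0, -2.0]
x = [0.3, -0.7]

g_analytic = analytic_grad(A, b, x)
g_numeric = numeric_grad(lambda v: f(A, b, v), x)
print(g_analytic)  # approximately [4.75, -16.8]
print(g_numeric)   # should agree to about 1e-6
```

Because the function is quadratic, central differences are exact up to floating-point rounding, so any visible mismatch points to an error in the derivation (or a layout mix-up) rather than in the check.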

**E.g. 2**,

finally from eq. (3), we get . From line 3 to 4, we use the conclusion of , that is to say, we can derive more complicated matrix derivatives by properly reusing existing ones. From line 6 to 7, we use the fact that a scalar equals its own trace to introduce $\operatorname{tr}(\cdot)$ in order to apply eq. (3) later, which is a common step in scalar-by-matrix derivatives.
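A common step in scalar-by-matrix derivations is to wrap a scalar differential in a trace (a scalar equals its own trace) and then use the cyclic property $\operatorname{tr}(\mathbf{A}\mathbf{B}) = \operatorname{tr}(\mathbf{B}\mathbf{A})$ to move $d\mathbf{X}$ to the end. A sketch on an illustrative case, $y = \mathbf{a}^T\mathbf{X}\mathbf{b}$ with $\mathbf{X} \in \mathbb{R}^{p \times q}$:

$$
\begin{aligned}
dy &= \mathbf{a}^T\, d\mathbf{X}\, \mathbf{b} \\
   &= \operatorname{tr}(\mathbf{a}^T\, d\mathbf{X}\, \mathbf{b}) && \text{(a scalar equals its own trace)}\\
   &= \operatorname{tr}(\mathbf{b}\mathbf{a}^T\, d\mathbf{X}) && \text{(cyclic property of the trace)}
\end{aligned}
\quad\Longrightarrow\quad
\frac{\partial y}{\partial \mathbf{X}} = \mathbf{b}\mathbf{a}^T
$$

The result $\mathbf{b}\mathbf{a}^T$ is $q \times p$, the shape of $\mathbf{X}^T$, as expected in numerator layout.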

**E.g. 3**,

finally from eq. (3), we get .

**E.g. 4**,

finally from eq. (3), we get .
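Scalar-by-matrix results can be checked the same way, as long as the finite-difference estimate of $\partial y/\partial X_{ij}$ is stored at position $(j, i)$ to match the numerator layout. The sketch uses an illustrative function (not one of the examples above), $y = \operatorname{tr}(\mathbf{X}^T\mathbf{A}\mathbf{X})$, for which the differential method gives $dy = \operatorname{tr}(\mathbf{X}^T(\mathbf{A} + \mathbf{A}^T)\, d\mathbf{X})$ and hence $\partial y/\partial \mathbf{X} = \mathbf{X}^T(\mathbf{A} + \mathbf{A}^T)$. Helper names and values are hypothetical.

```python
# Finite-difference check of a scalar-by-matrix derivative in numerator layout.
# Illustrative function: y = tr(X^T A X), A (m x m), X (m x n).
# Expected: dy/dX = X^T (A + A^T), an n x m matrix whose (j, i) entry is dy/dX_ij.


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P):
    return [list(row) for row in zip(*P)]


def trace(P):
    return sum(P[i][i] for i in range(len(P)))


def y(A, X):
    return trace(matmul(matmul(transpose(X), A), X))


def analytic(A, X):
    """X^T (A + A^T), shape n x m."""
    A_sym = [[A[i][j] + A[j][i] for j in range(len(A))] for i in range(len(A))]
    return matmul(transpose(X), A_sym)


def numeric(A, X, eps=1e-6):
    """Central differences, stored transposed (numerator layout)."""
    m, n = len(X), len(X[0])
    D = [[0.0] * m for _ in range(n)]
    for i in range(m):
        for j in range(n):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += eps
            X_minus[i][j] -= eps
            D[j][i] = (y(A, X_plus) - y(A, X_minus)) / (2 * eps)
    return D


A = [[2.0, -1.0], [0.5, 3.0]]
X = [[1.0, 0.0, -2.0], [0.5, 1.5, 1.0]]  # 2 x 3, so the derivative is 3 x 2
D_analytic = analytic(A, X)
D_numeric = numeric(A, X)
```

Choosing a non-square $\mathbf{X}$ makes layout mistakes visible: a result with the shape of $\mathbf{X}$ instead of $\mathbf{X}^T$ would not even line up with `D_numeric`.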

**E.g. 5 - two-layer neural network**, , is a loss function such as Softmax Cross Entropy or MSE, and is an element-wise activation function such as Sigmoid or ReLU.

For ,

finally from eq. (3), we get .

For ,

finally from eq. (3), we get .
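A numerical sketch of the same kind of derivation, under stated assumptions: the network is taken to be $l = \mathcal{L}(\mathbf{W}_2\,\sigma(\mathbf{W}_1\mathbf{x}))$ without biases, with MSE $\mathcal{L}(\mathbf{z}) = \frac{1}{2}\|\mathbf{z} - \mathbf{t}\|^2$ for a hypothetical target $\mathbf{t}$ and Sigmoid as $\sigma$. This form and the names `W1`, `W2`, `t` are assumptions and may differ from the setup above. With $\mathbf{e} = \mathbf{z} - \mathbf{t}$, the differentials give $\partial l/\partial \mathbf{W}_2 = \mathbf{h}\mathbf{e}^T$ and $\partial l/\partial \mathbf{W}_1 = \mathbf{x}\,(\mathbf{g} \odot \sigma'(\mathbf{u}))^T$ with $\mathbf{g} = \mathbf{W}_2^T\mathbf{e}$, which the code compares against central differences.

```python
import math

# Gradient check for a hypothetical two-layer network (no biases):
#   u = W1 x,  h = sigmoid(u),  z = W2 h,  l = 0.5 * ||z - t||^2
# Numerator-layout results with e = z - t:
#   dl/dW2 = h e^T,  dl/dW1 = x (g ⊙ sigmoid'(u))^T,  g = W2^T e.


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def loss(W1, W2, x, t):
    z = matvec(W2, sigmoid(matvec(W1, x)))
    return 0.5 * sum((z_i - t_i) ** 2 for z_i, t_i in zip(z, t))


def analytic_grads(W1, W2, x, t):
    h = sigmoid(matvec(W1, x))
    e = [z_i - t_i for z_i, t_i in zip(matvec(W2, h), t)]  # (dl/dz)^T for MSE
    G2 = [[h_k * e_p for e_p in e] for h_k in h]  # h e^T, shape k x p
    g = [sum(W2[p][k] * e[p] for p in range(len(e))) for k in range(len(h))]  # W2^T e
    delta = [g_k * h_k * (1.0 - h_k) for g_k, h_k in zip(g, h)]  # sigmoid' = h (1 - h)
    G1 = [[x_d * d_k for d_k in delta] for x_d in x]  # x delta^T, shape d x k
    return G1, G2


def numeric_grad(fn, W, eps=1e-6):
    """Numerator layout: result[j][i] approximates dl/dW[i][j]."""
    D = [[0.0] * len(W) for _ in range(len(W[0]))]
    for i in range(len(W)):
        for j in range(len(W[0])):
            orig = W[i][j]
            W[i][j] = orig + eps
            f_plus = fn()
            W[i][j] = orig - eps
            f_minus = fn()
            W[i][j] = orig  # restore the entry exactly
            D[j][i] = (f_plus - f_minus) / (2 * eps)
    return D


W1 = [[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]]  # k x d = 2 x 3
W2 = [[1.0, -2.0], [0.5, 0.25]]            # p x k = 2 x 2
x = [0.5, -1.0, 2.0]
t = [1.0, 0.0]

G1, G2 = analytic_grads(W1, W2, x, t)
N1 = numeric_grad(lambda: loss(W1, W2, x, t), W1)
N2 = numeric_grad(lambda: loss(W1, W2, x, t), W2)
```

Comparing `G1` with `N1` and `G2` with `N2` entrywise should show agreement to roughly 1e-6. Swapping in another loss or activation only changes the lines that compute `e` and `delta`, which mirrors how the differential derivation isolates those two factors.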

**E.g. 6**, prove

Since

then

therefore

\* *See [examples.md](./examples.md) for more examples.*

## 2. Conclusion
Now, if we fully understand the core idea of the above examples, I believe we can derive most of the matrix derivatives in [Wiki - Matrix Calculus](https://en.wikipedia.org/wiki/Matrix_calculus) by ourselves. Please let me know if there is any mistake, and feel free to raise an issue to request the detailed steps for any matrix derivative you are interested in.