https://github.com/LynnHo/Matrix-Calculus-Tutorial
Matrix Calculus via Differentials, Matrix Derivative, 矩阵求导教程
matrix matrix-calculations matrix-calculus matrix-derivatives
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/LynnHo/Matrix-Calculus-Tutorial
- Owner: LynnHo
- Created: 2019-02-17T08:29:14.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2023-03-12T12:58:32.000Z (almost 2 years ago)
- Last Synced: 2024-08-03T15:11:21.343Z (6 months ago)
- Topics: matrix, matrix-calculations, matrix-calculus, matrix-derivatives
- Homepage:
- Size: 925 KB
- Stars: 251
- Watchers: 15
- Forks: 45
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
# Matrix Calculus
On this page, we introduce a differential-based method for vector and matrix derivatives (matrix calculus), which ***needs only a few simple rules to derive most matrix derivatives***. This method is useful and well established in mathematics; however, few documents describe it clearly or in detail. We therefore wrote this page as a comprehensive introduction to ***matrix calculus via differentials***.
\* *If you want results only, there is an awesome online tool [Matrix Calculus](http://www.matrixcalculus.org/). If you want "how to," let's get started.*
- [0. Notation](#0-notation)
- [1. Matrix Calculus via Differentials](#1-matrix-calculus-via-differentials)
  * [1.1 Differential Identities](#11-differential-identities)
  * [1.2 Deriving Matrix Derivatives](#12-deriving-matrix-derivatives)
    + [1.2.1 Proof of chain rules \(identities 3\)](#121-proof-of-chain-rules-identities-3)
    + [1.2.2 Practical examples](#122-practical-examples)
- [2. Conclusion](#2-conclusion)

## 0. Notation
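- $x$, $\mathbf{x}$, and $\mathbf{X}$ denote a scalar, a vector, and a matrix respectively.
- The first half of the alphabet denotes constants, and the second half denotes variables.
- $\mathbf{X}^T$ denotes the matrix transpose, $\text{tr}(\mathbf{X})$ is the trace, $|\mathbf{X}|$ is the determinant, and $\text{adj}(\mathbf{X})$ is the adjugate matrix.
- $\otimes$ is the Kronecker product and $\circ$ is the Hadamard product.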
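- Here we use ***numerator layout***, while the online tool [Matrix Calculus](http://www.matrixcalculus.org/) seems to use ***mixed layout***. Please refer to [Wiki - Matrix Calculus - Layout Conventions](https://en.wikipedia.org/wiki/Matrix_calculus#Layout_conventions) for the detailed layout definitions, and keep in mind that ***different layouts lead to different results***. Below is the numerator layout, for $\mathbf{x} \in \mathbb{R}^{n}$, $\mathbf{y} \in \mathbb{R}^{m}$, and $\mathbf{X} \in \mathbb{R}^{p \times q}$:

$$
\frac{\partial y}{\partial \mathbf{x}} =
\begin{bmatrix} \frac{\partial y}{\partial x_1} & \cdots & \frac{\partial y}{\partial x_n} \end{bmatrix},
\quad
\frac{\partial \mathbf{y}}{\partial x} =
\begin{bmatrix} \frac{\partial y_1}{\partial x} \\ \vdots \\ \frac{\partial y_m}{\partial x} \end{bmatrix},
\quad
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} =
\left[ \frac{\partial y_i}{\partial x_j} \right]_{m \times n},
\quad
\frac{\partial y}{\partial \mathbf{X}} =
\left[ \frac{\partial y}{\partial x_{ji}} \right]_{q \times p}
$$

## 1. Matrix Calculus via Differentials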
### 1.1 Differential Identities
- **Identities 1**
- **Identities 2**
- **Identities 3 - chain rules**
- **Identities 4 - total differential**. In fact, every identity in identities 1 is a matrix form of the total differential in eq. (24).
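The formulas for these identities are not reproduced in the list above. As a hedged sketch, the block below gives the standard differential-to-derivative correspondences in numerator layout, followed by a few common differential rules and the total differential. The numbers (1), (2), (3), (5) and (24) are assigned on the assumption that they match the references made elsewhere on this page; the unnumbered rules and the symbols $a$, $\mathbf{a}$, $\mathbf{A}$, $\mathbf{B}$ (quantities that do not depend on the differential to their right) are illustrative choices, not necessarily those of the original list.

```latex
% Identities 1 (assumed numbering): read the derivative off the differential.
\begin{align}
dy &= a\,dx & &\Leftrightarrow & \frac{dy}{dx} &= a \tag{1}\\
dy &= \mathbf{a}\,d\mathbf{x} & &\Leftrightarrow & \frac{\partial y}{\partial \mathbf{x}} &= \mathbf{a} \tag{2}\\
dy &= \operatorname{tr}(\mathbf{A}\,d\mathbf{X}) & &\Leftrightarrow & \frac{\partial y}{\partial \mathbf{X}} &= \mathbf{A} \tag{3}\\
d\mathbf{y} &= \mathbf{A}\,d\mathbf{x} & &\Leftrightarrow & \frac{\partial \mathbf{y}}{\partial \mathbf{x}} &= \mathbf{A} \tag{5}
\end{align}

% Common differential rules (unnumbered here).
\begin{gather*}
d(\mathbf{A}\mathbf{X}\mathbf{B}) = \mathbf{A}\,d\mathbf{X}\,\mathbf{B}, \qquad
d(\mathbf{X}\mathbf{Y}) = d\mathbf{X}\,\mathbf{Y} + \mathbf{X}\,d\mathbf{Y}, \qquad
d(\mathbf{X}^T) = (d\mathbf{X})^T, \\
d\operatorname{tr}(\mathbf{X}) = \operatorname{tr}(d\mathbf{X}), \qquad
d(\mathbf{X}^{-1}) = -\mathbf{X}^{-1}\,d\mathbf{X}\,\mathbf{X}^{-1}, \qquad
d|\mathbf{X}| = \operatorname{tr}\bigl(\operatorname{adj}(\mathbf{X})\,d\mathbf{X}\bigr)
\end{gather*}

% Total differential of a scalar y that depends on x_1, ..., x_n.
\begin{equation}
dy = \sum_{i=1}^{n} \frac{\partial y}{\partial x_i}\,dx_i \tag{24}
\end{equation}
```

In (2), $\mathbf{a}$ is a $1 \times n$ row vector, and in (3), $\mathbf{A}$ has the shape of $\mathbf{X}^T$, as numerator layout requires.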
### 1.2 Deriving Matrix Derivatives
To derive a matrix derivative, we ***repeatedly apply identities 1 (the process is in effect a chain rule)***, assisted by identities 2.
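As a small illustration of this procedure, the sketch below takes $y = \mathbf{x}^T \mathbf{A} \mathbf{x}$ (an example chosen here, not necessarily one from this page), derives its differential by hand in the comments, reads off the derivative with the scalar-by-vector identity $dy = \mathbf{a}\,d\mathbf{x} \Leftrightarrow \partial y / \partial \mathbf{x} = \mathbf{a}$, and compares the result with central finite differences. The helper `numerical_gradient` and the random test data are hypothetical names and values used only for this check.

```python
import numpy as np


def numerical_gradient(f, x, eps=1e-6):
    """Central-difference estimate of [df/dx_1, ..., df/dx_n]."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))  # constant matrix (arbitrary test data)
x = rng.standard_normal(4)       # variable vector (arbitrary test data)

# Differential route:
#   d(x^T A x) = (dx)^T A x + x^T A dx = x^T (A + A^T) dx
# so the scalar-by-vector identity gives dy/dx = x^T (A + A^T).
analytic = x @ (A + A.T)
numeric = numerical_gradient(lambda v: v @ A @ v, x)

print(np.allclose(analytic, numeric, atol=1e-5))
```

A finite-difference comparison like this is a convenient way to catch sign or transpose mistakes in a hand derivation.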
#### 1.2.1 Proof of chain rules (identities 3)
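- Finally, from eq. (2), we get the corresponding chain rule.
- Finally, from eq. (3), we get the corresponding chain rule.
- Finally, from eq. (1), we get the corresponding chain rule.
- Finally, from eq. (5), we get the corresponding chain rule.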
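The intermediate steps of these proofs are not shown above. As a sketch of the pattern, assume a scalar $y$ that depends on a vector $\mathbf{u}$, which in turn depends on a vector $\mathbf{x}$ (illustrative symbols). Using the scalar-by-vector identity $dy = \mathbf{a}\,d\mathbf{x} \Leftrightarrow \partial y / \partial \mathbf{x} = \mathbf{a}$ and the vector-by-vector identity $d\mathbf{y} = \mathbf{A}\,d\mathbf{x} \Leftrightarrow \partial \mathbf{y} / \partial \mathbf{x} = \mathbf{A}$, one possible derivation runs as follows:

```latex
\begin{aligned}
dy &= \frac{\partial y}{\partial \mathbf{u}}\,d\mathbf{u},
\qquad
d\mathbf{u} = \frac{\partial \mathbf{u}}{\partial \mathbf{x}}\,d\mathbf{x} \\
\Rightarrow\quad dy &= \frac{\partial y}{\partial \mathbf{u}}\,
\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\,d\mathbf{x} \\
\Rightarrow\quad \frac{\partial y}{\partial \mathbf{x}} &=
\frac{\partial y}{\partial \mathbf{u}}\,
\frac{\partial \mathbf{u}}{\partial \mathbf{x}}
\end{aligned}
```

The other cases follow the same two moves: substitute one differential into the other, then read the derivative off the matching identity.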
#### 1.2.2 Practical examples
**E.g. 1**,
Finally, from eq. (2), we get the derivative.
**E.g. 2**,
Finally, from eq. (3), we get the derivative. From line 3 to 4, we reuse an already derived result; that is to say, we can derive more complicated matrix derivatives by properly utilizing the existing ones. From line 6 to 7, we use $a = \text{tr}(a)$ to introduce the trace in order to use eq. (3) later, which is common in scalar-by-matrix derivatives.
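The trace step used in scalar-by-matrix derivatives can be illustrated on a small case chosen here for clarity (it is not necessarily one of the examples on this page): $y = \mathbf{a}^T \mathbf{X} \mathbf{b}$. Since $dy$ is a scalar, $dy = \text{tr}(\mathbf{a}^T\,d\mathbf{X}\,\mathbf{b}) = \text{tr}(\mathbf{b}\mathbf{a}^T\,d\mathbf{X})$, and the identity $dy = \text{tr}(\mathbf{A}\,d\mathbf{X}) \Leftrightarrow \partial y / \partial \mathbf{X} = \mathbf{A}$ gives $\partial y / \partial \mathbf{X} = \mathbf{b}\mathbf{a}^T$ in numerator layout. The sketch below checks this numerically; the helper name and the test data are hypothetical.

```python
import numpy as np


def numerical_derivative_numerator_layout(f, X, eps=1e-6):
    """Central differences; entry (j, i) holds df/dX[i, j] (numerator layout)."""
    p, q = X.shape
    D = np.zeros((q, p))
    for i in range(p):
        for j in range(q):
            step = np.zeros_like(X)
            step[i, j] = eps
            D[j, i] = (f(X + step) - f(X - step)) / (2 * eps)
    return D


rng = np.random.default_rng(1)
a = rng.standard_normal(3)       # constant vector (arbitrary test data)
b = rng.standard_normal(5)       # constant vector (arbitrary test data)
X = rng.standard_normal((3, 5))  # variable matrix, p = 3, q = 5

# dy = a^T dX b = tr(a^T dX b) = tr(b a^T dX)  =>  dy/dX = b a^T
analytic = np.outer(b, a)
numeric = numerical_derivative_numerator_layout(lambda M: a @ M @ b, X)

print(analytic.shape, np.allclose(analytic, numeric, atol=1e-5))
```

Note that the result has the shape of $\mathbf{X}^T$; a denominator-layout tool would report the transpose, $\mathbf{a}\mathbf{b}^T$.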
**E.g. 3**,
Finally, from eq. (3), we get the derivative.
**E.g. 4**,
Finally, from eq. (3), we get the derivative.
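**E.g. 5 - two-layer neural network**, where the loss is a function such as Softmax Cross Entropy or MSE, and the activation is an element-wise function such as Sigmoid or ReLU.
For each of the two weight matrices in turn, we write the differential of the loss in trace form; finally, from eq. (3), we get the derivative with respect to that weight matrix.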
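The exact network used in this example is not recoverable from the text above, so the sketch below assumes one concrete form: $l = \tfrac{1}{2}\lVert \mathbf{W}_2\,\sigma(\mathbf{W}_1 \mathbf{x}) - \mathbf{t} \rVert^2$ with a Sigmoid activation. The names `W1`, `W2`, `x`, `t`, and all helper functions are hypothetical. The comments carry the differential derivation, and the result is compared with central finite differences in numerator layout.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss(W1, W2, x, t):
    """Assumed network: l = 0.5 * ||W2 sigmoid(W1 x) - t||^2."""
    return 0.5 * np.sum((W2 @ sigmoid(W1 @ x) - t) ** 2)


def gradients_numerator_layout(W1, W2, x, t):
    s = sigmoid(W1 @ x)
    r = W2 @ s - t  # residual: dl = r^T d(W2 s)
    # W2 varies: dl = r^T dW2 s = tr(s r^T dW2)  =>  dl/dW2 = s r^T
    dW2 = np.outer(s, r)
    # W1 varies: dl = r^T W2 (sigmoid'(h) o dh), with dh = dW1 x
    #               = delta^T dW1 x = tr(x delta^T dW1)  =>  dl/dW1 = x delta^T
    delta = (W2.T @ r) * s * (1.0 - s)
    dW1 = np.outer(x, delta)
    return dW1, dW2


def numerical(f, W, eps=1e-6):
    """Central differences; entry (j, i) holds df/dW[i, j] (numerator layout)."""
    D = np.zeros(W.shape[::-1])
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            step = np.zeros_like(W)
            step[i, j] = eps
            D[j, i] = (f(W + step) - f(W - step)) / (2 * eps)
    return D


rng = np.random.default_rng(2)
x = rng.standard_normal(4)        # input (arbitrary test data)
t = rng.standard_normal(2)        # target (arbitrary test data)
W1 = rng.standard_normal((3, 4))  # hidden x input
W2 = rng.standard_normal((2, 3))  # output x hidden

dW1, dW2 = gradients_numerator_layout(W1, W2, x, t)
num_dW1 = numerical(lambda W: loss(W, W2, x, t), W1)
num_dW2 = numerical(lambda W: loss(W1, W, x, t), W2)

print(np.allclose(dW1, num_dW1, atol=1e-5), np.allclose(dW2, num_dW2, atol=1e-5))
```

Transposing `dW1` and `dW2` gives the gradients in the shape of the weights, which is the form most deep learning code expects.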
**E.g. 6**, prove
Since
then
therefore
\* *See [examples.md](./examples.md) for more examples.*
## 2. Conclusion
Now, if we fully understand the core idea of the above examples, I believe we can derive most matrix derivatives in [Wiki - Matrix Calculus](https://en.wikipedia.org/wiki/Matrix_calculus) by ourselves. Please correct me if there is any mistake, and raise an issue to request the detailed steps for any matrix derivative you are interested in.