{"id":13688575,"url":"https://github.com/LynnHo/Matrix-Calculus-Tutorial","last_synced_at":"2025-05-01T19:31:08.087Z","repository":{"id":45813910,"uuid":"171098850","full_name":"LynnHo/Matrix-Calculus-Tutorial","owner":"LynnHo","description":"Matrix Calculus via Differentials, Matrix Derivative, 矩阵求导教程","archived":false,"fork":false,"pushed_at":"2023-03-12T12:58:32.000Z","size":947,"stargazers_count":260,"open_issues_count":0,"forks_count":44,"subscribers_count":15,"default_branch":"master","last_synced_at":"2024-11-12T12:47:44.434Z","etag":null,"topics":["matrix","matrix-calculations","matrix-calculus","matrix-derivatives"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LynnHo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2019-02-17T08:29:14.000Z","updated_at":"2024-11-12T11:02:47.000Z","dependencies_parsed_at":"2024-01-14T17:58:21.723Z","dependency_job_id":"6abdac7f-e837-4e1c-82cb-031dcc64dbe2","html_url":"https://github.com/LynnHo/Matrix-Calculus-Tutorial","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LynnHo%2FMatrix-Calculus-Tutorial","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LynnHo%2FMatrix-Calculus-Tutorial/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LynnHo%2FMatrix-Calculus-Tutorial/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LynnHo%2FMatrix-Calculus-Tutorial/manifests","owner_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LynnHo","download_url":"https://codeload.github.com/LynnHo/Matrix-Calculus-Tutorial/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251932655,"owners_count":21667189,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["matrix","matrix-calculations","matrix-calculus","matrix-derivatives"],"created_at":"2024-08-02T15:01:16.902Z","updated_at":"2025-05-01T19:31:07.140Z","avatar_url":"https://github.com/LynnHo.png","language":null,"funding_links":[],"categories":["Others"],"sub_categories":[],"readme":"\u003ch1\u003e \u003cp align=\"center\"\u003e Matrix Calculus \u003c/p\u003e \u003c/h1\u003e\n\nOn this page, we introduce a differential-based method for vector and matrix derivatives (matrix calculus), which ***needs only a few simple rules to derive most matrix derivatives***. This method is useful and well established in mathematics; however, few documents describe it clearly and in detail. This page therefore aims to give a comprehensive introduction to ***matrix calculus via differentials***.\n\n\\* *If you want results only, there is an awesome online tool [Matrix Calculus](http://www.matrixcalculus.org/). If you want \"how to,\" let's get started.*\n\n\u003c!-- MarkdownTOC --\u003e\n\n- [0. Notation](#0-notation)\n- [1. 
Matrix Calculus via Differentials](#1-matrix-calculus-via-differentials)\n    * [1.1 Differential Identities](#11-differential-identities)\n    * [1.2 Deriving Matrix Derivatives](#12-deriving-matrix-derivatives)\n        + [1.2.1 Proof of chain rules \\(identities 3\\)](#121-proof-of-chain-rules-identities-3)\n        + [1.2.2 Practical examples](#122-practical-examples)\n- [2. Conclusion](#2-conclusion)\n\n\u003c!-- /MarkdownTOC --\u003e\n\n\n## 0. Notation\n\n- \u003cimg src=\"svg/332cc365a4987aacce0ead01b8bdcc0b.svg?invert_in_darkmode\" align=middle width=9.39498779999999pt height=14.15524440000002pt/\u003e, \u003cimg src=\"svg/b0ea07dc5c00127344a1cad40467b8de.svg?invert_in_darkmode\" align=middle width=9.97711604999999pt height=14.611878600000017pt/\u003e, and \u003cimg src=\"svg/d05b996d2c08252f77613c25205a0f04.svg?invert_in_darkmode\" align=middle width=14.29216634999999pt height=22.55708729999998pt/\u003e denote \u003cimg src=\"svg/06316c77132ca6158472939a99814319.svg?invert_in_darkmode\" align=middle width=45.29889209999998pt height=22.831056599999986pt/\u003e, \u003cimg src=\"svg/fcc2a3c40b141613d2781dfcfd2690be.svg?invert_in_darkmode\" align=middle width=51.10707194999999pt height=20.87411699999998pt/\u003e, and \u003cimg src=\"svg/7cfa827ff847f043676716fc1761de31.svg?invert_in_darkmode\" align=middle width=79.451658pt height=22.55708729999998pt/\u003e respectively.\n- The first half of the alphabet \u003cimg src=\"svg/5a24655a97800650f252524ec4d77b21.svg?invert_in_darkmode\" align=middle width=79.47848699999999pt height=24.65753399999998pt/\u003e denote constants, and the second half \u003cimg src=\"svg/e056c39c77a72a0b8ad811db1bfaf257.svg?invert_in_darkmode\" align=middle width=80.60102324999998pt height=24.65753399999998pt/\u003e denote variables.\n- \u003cimg src=\"svg/22acdfc775a9c0010ae19703fdeedeb7.svg?invert_in_darkmode\" align=middle width=24.56618669999999pt height=27.91243950000002pt/\u003e denotes matrix transpose, \u003cimg 
src=\"svg/ea1ba186dcad14b51b2f2ed2ff31806d.svg?invert_in_darkmode\" align=middle width=39.90867704999999pt height=24.65753399999998pt/\u003e is the trace, \u003cimg src=\"svg/9ca9af65e7723587892c4820de37815f.svg?invert_in_darkmode\" align=middle width=23.42461439999999pt height=24.65753399999998pt/\u003e is the determinant, and \u003cimg src=\"svg/3c7afd910b2b14682613426db256d1ca.svg?invert_in_darkmode\" align=middle width=50.20661909999999pt height=24.65753399999998pt/\u003e is the adjugate matrix.\n- \u003cimg src=\"svg/2cd721f73978bd9ec4aabc24e65b08fd.svg?invert_in_darkmode\" align=middle width=12.785434199999989pt height=19.1781018pt/\u003e is the Kronecker product and \u003cimg src=\"svg/c0463eeb4772bfde779c20d52901d01b.svg?invert_in_darkmode\" align=middle width=8.219209349999991pt height=14.611911599999981pt/\u003e is the Hadamard product.\n- Here we use ***numerator layout***, while the online tool [Matrix Calculus](http://www.matrixcalculus.org/) seems to use ***mixed layout***. Please refer to [Wiki - Matrix Calculus - Layout Conventions](https://en.wikipedia.org/wiki/Matrix_calculus#Layout_conventions) for the detailed layout definitions, and keep in mind that ***different layouts lead to different results***. 
Below is the numerator layout,\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"svg/1b248f70d5300d3e718064ac71bd7c2e.svg?invert_in_darkmode\" align=middle width=184.37670405pt height=39.452455349999994pt/\u003e\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"svg/5f20c4aa8b6cce14bd2fccf73f369229.svg?invert_in_darkmode\" align=middle width=101.1645492pt height=94.143423pt/\u003e\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"svg/47bd9b08a8f8ab9758b687d2f3b063cd.svg?invert_in_darkmode\" align=middle width=230.90713965pt height=98.63111444999998pt/\u003e\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"svg/43a557c3b375b8ca5dfc60c68f3266f8.svg?invert_in_darkmode\" align=middle width=241.44147225pt height=103.37093415pt/\u003e\u003c/p\u003e\n\n\n## 1. Matrix Calculus via Differentials\n\n### 1.1 Differential Identities\n\n- **Identities 1**\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"svg/1367d0cdd0fbd79abf57d0a6d4e5424b.svg?invert_in_darkmode\" align=middle width=410.3851191pt height=235.74932535pt/\u003e\u003c/p\u003e\n\n- **Identities 2**\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"svg/752ed38773e8e927062d8f93bf5a1f05.svg?invert_in_darkmode\" align=middle width=629.82298335pt height=319.0194909pt/\u003e\u003c/p\u003e\n\n- **Identities 3 - chain rules**\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"svg/51273d37cf9bcc67a8d66d5fbc70272b.svg?invert_in_darkmode\" align=middle width=449.86519875pt height=164.5634694pt/\u003e\u003c/p\u003e\n\n- **Identities 4 - total differential**. Actually, all identities 1 are the matrix form of the total differential in eq. 
(24).\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"svg/55bcfb651b61684f38a48d7e12826599.svg?invert_in_darkmode\" align=middle width=495.735801pt height=167.75981309999997pt/\u003e\u003c/p\u003e\n\n### 1.2 Deriving Matrix Derivatives\n\nTo derive a matrix derivative, we ***repeatedly apply the identities 1 (the process is in fact a chain rule)***, assisted by identities 2.\n\n#### 1.2.1 Proof of chain rules (identities 3)\n\n- \u003cimg src=\"svg/d6d4dc6b6bbabac486dad7ac3011ac17.svg?invert_in_darkmode\" align=middle width=60.334188749999996pt height=33.20539859999999pt/\u003e\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"svg/1785e0685f7ee389a54613ef7f82cffd.svg?invert_in_darkmode\" align=middle width=363.51749114999996pt height=80.59222875pt/\u003e\u003c/p\u003e\n\nfinally from eq. (2), we get \u003cimg src=\"svg/aca022a8d9084c704593d90b61128967.svg?invert_in_darkmode\" align=middle width=74.31442095pt height=30.648287999999997pt/\u003e.\n\n- \u003cimg src=\"svg/3afabb957686c850a9375e82d19dcd8f.svg?invert_in_darkmode\" align=middle width=62.63411055pt height=33.20539859999999pt/\u003e\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"svg/3316e279dabb3f4cbaee0f558700433e.svg?invert_in_darkmode\" align=middle width=385.11320099999995pt height=124.17602174999999pt/\u003e\u003c/p\u003e\n\nfinally from eq. (3), we get \u003cimg src=\"svg/330946ab78a470ca64f4e5107987af4b.svg?invert_in_darkmode\" align=middle width=79.69079414999999pt height=30.648287999999997pt/\u003e.\n\n- \u003cimg src=\"svg/fbbdc722022c465f2ae6b48c36598d95.svg?invert_in_darkmode\" align=middle width=63.37669304999999pt height=33.20539859999999pt/\u003e\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"svg/f840b4416a3f64d656eb04af8c3bbbff.svg?invert_in_darkmode\" align=middle width=386.2404777pt height=114.5869824pt/\u003e\u003c/p\u003e\n\nfinally from eq. 
(1), we get \u003cimg src=\"svg/367ed500e49383632c93a4446f39ec80.svg?invert_in_darkmode\" align=middle width=108.3902985pt height=28.92634470000001pt/\u003e.\n\n- \u003cimg src=\"svg/b4d21fc6fdd5d14a267710723a939ba5.svg?invert_in_darkmode\" align=middle width=60.21195674999999pt height=33.20539859999999pt/\u003e\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"svg/7ef1edf6b0f8989ae7b66a74185a21ef.svg?invert_in_darkmode\" align=middle width=363.5402661pt height=80.59222875pt/\u003e\u003c/p\u003e\n\nfinally from eq. (5), we get \u003cimg src=\"svg/f8f3b0df92ab2d964857892f8ffec8b0.svg?invert_in_darkmode\" align=middle width=74.31442095pt height=30.648287999999997pt/\u003e.\n\n#### 1.2.2 Practical examples\n\n**E.g. 1**, \u003cimg src=\"svg/4e82ca6abaed2097dcbead8adf01afec.svg?invert_in_darkmode\" align=middle width=42.555557549999996pt height=37.282175700000025pt/\u003e\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"svg/9c310a80928421f0554386569eb27c3b.svg?invert_in_darkmode\" align=middle width=439.95767474999997pt height=99.78847065pt/\u003e\u003c/p\u003e\n\nfinally from eq. (2), we get \u003cimg src=\"svg/c6c1fe4311d8e38885392c2de18807a6.svg?invert_in_darkmode\" align=middle width=94.91611635pt height=37.282175700000025pt/\u003e.\n\n\u003ca name=\"y=Wx\"\u003e\u003c/a\u003e**E.g. 2**, \u003cimg src=\"svg/2f241d50ab9d1c15bdbfc64ffdcefbf4.svg?invert_in_darkmode\" align=middle width=70.31119589999999pt height=37.28212289999999pt/\u003e\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"svg/39d4c566edf7d6adc782e6c16982b883.svg?invert_in_darkmode\" align=middle width=627.40374675pt height=258.7181157pt/\u003e\u003c/p\u003e\n\nfinally from eq. (3), we get \u003cimg src=\"svg/e5707fda1f7e245c298e383e9f86c6bb.svg?invert_in_darkmode\" align=middle width=195.57101684999998pt height=37.28212289999999pt/\u003e. 
From line 3 to 4, we use the conclusion of \u003cimg src=\"svg/d1e757ff485f61fbd67a2c6e31bcf033.svg?invert_in_darkmode\" align=middle width=84.64209435pt height=34.28165400000002pt/\u003e, that is to say, we can derive more complicated  matrix derivatives by properly utilizing the existing ones. From line 6 to 7, we use \u003cimg src=\"svg/d84c3c399a65b6a8a262aa3315ab4b75.svg?invert_in_darkmode\" align=middle width=66.32411774999998pt height=24.65753399999998pt/\u003e to introduce the \u003cimg src=\"svg/a4035517c00ae250425f359c6b7eccfb.svg?invert_in_darkmode\" align=middle width=30.182756999999988pt height=24.65753399999998pt/\u003e in order to use eq. (3) later, which is common in scalar-by-matrix derivatives.\n\n**E.g. 3**, \u003cimg src=\"svg/966fb4dd3b3a91718c1964cff5c791d0.svg?invert_in_darkmode\" align=middle width=49.28919104999999pt height=33.20539859999999pt/\u003e\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"svg/3f4278ebe0e8315ac8ce2a95ebd102c5.svg?invert_in_darkmode\" align=middle width=449.3987783999999pt height=71.70438164999999pt/\u003e\u003c/p\u003e\n\nfinally from eq. (3), we get \u003cimg src=\"svg/840804c350d128bab42a8842217b8afb.svg?invert_in_darkmode\" align=middle width=104.29809389999998pt height=33.20539859999999pt/\u003e.\n\n\u003ca name=\"Y=AX\"\u003e\u003c/a\u003e**E.g. 4**, \u003cimg src=\"svg/4a3ad3efe4ae7a1b4d75f4b88a821967.svg?invert_in_darkmode\" align=middle width=70.45737435pt height=33.20539859999999pt/\u003e\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"svg/a7a66e09e9d6d5bdcc6a3e7b9270709f.svg?invert_in_darkmode\" align=middle width=555.8363975999999pt height=65.753424pt/\u003e\u003c/p\u003e\n\nfinally from eq. (3), we get \u003cimg src=\"svg/a3786564a3359ce0856a2cde2ae2a55f.svg?invert_in_darkmode\" align=middle width=122.08714155000001pt height=33.20539859999999pt/\u003e.\n\n**E.g. 
5 - two layer neural network**, \u003cimg src=\"svg/72e84c4903b48d5de424c69d2dc906b3.svg?invert_in_darkmode\" align=middle width=104.59468799999998pt height=24.65753399999998pt/\u003e, \u003cimg src=\"svg/2f2322dff5bde89c37bcae4116fe20a8.svg?invert_in_darkmode\" align=middle width=5.2283516999999895pt height=22.831056599999986pt/\u003e is a loss function such as Softmax Cross Entropy and MSE, \u003cimg src=\"svg/8cda31ed38c6d59d14ebefa440099572.svg?invert_in_darkmode\" align=middle width=9.98290094999999pt height=14.15524440000002pt/\u003e is an element-wise activation function such as Sigmoid and ReLU\n\nFor \u003cimg src=\"svg/1b029168f72ed78ed025d43ee12a30d5.svg?invert_in_darkmode\" align=middle width=91.66476825pt height=33.20539859999999pt/\u003e,\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"svg/b9d3e768d4e92d59a84014706d278bf4.svg?invert_in_darkmode\" align=middle width=646.51203045pt height=171.41278605pt/\u003e\u003c/p\u003e\n\nfinally from eq. (3), we get \u003cimg src=\"svg/78a22c94609d25f957a924c8f0c5b7f7.svg?invert_in_darkmode\" align=middle width=311.90916404999996pt height=37.8085191pt/\u003e.\n\nFor \u003cimg src=\"svg/ad8e23aabf92232befd5b06cd73ac69e.svg?invert_in_darkmode\" align=middle width=91.66476825pt height=33.20539859999999pt/\u003e,\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"svg/4e6aa16db8b9845756c6d23ff43c3a1b.svg?invert_in_darkmode\" align=middle width=725.1369675pt height=349.40093429999996pt/\u003e\u003c/p\u003e\n\nfinally from eq. (3), we get \u003cimg src=\"svg/e573d73540484e851f8d3ee70e3ace68.svg?invert_in_darkmode\" align=middle width=401.9546586pt height=37.8085191pt/\u003e.\n\n**E.g. 
6**, prove \u003cimg src=\"svg/42f3c1118cca6aa234b8b5726d36587f.svg?invert_in_darkmode\" align=middle width=186.67800359999998pt height=26.76175259999998pt/\u003e\n\nSince\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"svg/1d94a3056ff8e51a9d5184cc583b69aa.svg?invert_in_darkmode\" align=middle width=79.88556675pt height=17.399144399999997pt/\u003e\u003c/p\u003e\n\nthen\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"svg/a9a3ca06d0cccb54956ca71835c9af3d.svg?invert_in_darkmode\" align=middle width=267.88180485pt height=18.312383099999998pt/\u003e\u003c/p\u003e\n\ntherefore\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"svg/248123e6750f3a3484940afdbf02ca5c.svg?invert_in_darkmode\" align=middle width=192.06611819999998pt height=18.312383099999998pt/\u003e\u003c/p\u003e\n\n\\* *See [examples.md](./examples.md) for more examples.*\n\n\n## 2. Conclusion\nNow, if we fully understand the core ideas of the above examples, I believe we can derive most matrix derivatives in [Wiki - Matrix Calculus](https://en.wikipedia.org/wiki/Matrix_calculus) by ourselves. Please correct me if there are any mistakes, and raise issues to request the detailed steps of computing the matrix derivatives that you are interested in.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLynnHo%2FMatrix-Calculus-Tutorial","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FLynnHo%2FMatrix-Calculus-Tutorial","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLynnHo%2FMatrix-Calculus-Tutorial/lists"}
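The determinant derivative proved in E.g. 6 of the embedded README rests on the differential identity d|X| = |X| tr(X⁻¹ dX) (Jacobi's formula, stated via the adjugate in the proof). As a sanity check, here is a minimal NumPy sketch, not part of the tutorial itself (the variable names, seed, and tolerance are our own), comparing that prediction against a finite-difference estimate of the directional derivative of the determinant:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))   # random (almost surely invertible) matrix
dX = rng.standard_normal((4, 4))  # arbitrary perturbation direction
eps = 1e-6

# Central-difference estimate of the directional derivative of det at X along dX.
numeric = (np.linalg.det(X + eps * dX) - np.linalg.det(X - eps * dX)) / (2 * eps)

# Prediction from the differential identity: d|X| = |X| tr(X^{-1} dX).
predicted = np.linalg.det(X) * np.trace(np.linalg.solve(X, dX))

assert abs(numeric - predicted) < 1e-4 * max(1.0, abs(predicted))
```

The same finite-difference pattern can be used to spot-check any of the derivatives derived in the examples, since a differential d f[dX] is exactly a directional derivative.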