{"id":13544863,"url":"https://github.com/breandan/kotlingrad","last_synced_at":"2025-05-15T13:06:04.986Z","repository":{"id":44164647,"uuid":"153732715","full_name":"breandan/kotlingrad","owner":"breandan","description":"🧩 Shape-Safe Symbolic Differentiation with Algebraic Data Types","archived":false,"fork":false,"pushed_at":"2024-12-17T02:09:42.000Z","size":301920,"stargazers_count":534,"open_issues_count":13,"forks_count":21,"subscribers_count":19,"default_branch":"master","last_synced_at":"2025-04-14T21:00:00.422Z","etag":null,"topics":["algebraic-data-types","array-programming","automatic-differentiation","chinese","computer-algebra","differentiable-programming","gradient-descent","kotlin","linear-algebra","message-passing","multi-stage-programming","optimization","shape-safety","symbolic-differentiation","types"],"latest_commit_sha":null,"homepage":"https://breandan.net/public/masters_thesis.pdf#page=49","language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/breandan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":"breandan"}},"created_at":"2018-10-19T05:50:11.000Z","updated_at":"2025-03-20T21:43:27.000Z","dependencies_parsed_at":"2024-12-21T10:02:04.822Z","dependency_job_id":"72fc047b-2d9f-48cb-864d-882dd0a72d05","html_url":"https://github.com/breandan/kotlingrad","commit_stats":{"total_commits":1644,"total_committers":8,"mean_commits":205.5,"dds":"0.0042579075425790425","last_synced_commit":"41671d358b7b020ed0189143289718d9199b78ed"},"previous
_names":[],"tags_count":26,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/breandan%2Fkotlingrad","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/breandan%2Fkotlingrad/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/breandan%2Fkotlingrad/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/breandan%2Fkotlingrad/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/breandan","download_url":"https://codeload.github.com/breandan/kotlingrad/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254346624,"owners_count":22055808,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algebraic-data-types","array-programming","automatic-differentiation","chinese","computer-algebra","differentiable-programming","gradient-descent","kotlin","linear-algebra","message-passing","multi-stage-programming","optimization","shape-safety","symbolic-differentiation","types"],"created_at":"2024-08-01T11:00:54.410Z","updated_at":"2025-05-15T13:05:59.973Z","avatar_url":"https://github.com/breandan.png","language":"Kotlin","funding_links":["https://github.com/sponsors/breandan"],"categories":["Kotlin"],"sub_categories":[],"readme":"\u003c!--- @file:Suppress(\"ClassName\") ---\u003e\n\u003c!--- @file:Suppress(\"PropertyName\") ---\u003e\n\n# Kotlin∇: Type-safe Symbolic Differentiation for the JVM\n\n[![Kotlin 
1.6.20](https://img.shields.io/badge/Kotlin-1.6.20-blue.svg?style=flat\u0026logo=kotlin)](http://kotlinlang.org)\n[![Maven Central](https://img.shields.io/maven-central/v/ai.hypergraph/kotlingrad.svg?label=Maven%20Central)](https://search.maven.org/search?q=g:%22ai.hypergraph%22)\n[![CI](https://github.com/breandan/kotlingrad/workflows/CI/badge.svg)](https://github.com/breandan/kotlingrad/actions)\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3549076.svg)](https://doi.org/10.5281/zenodo.3549076)\n\nKotlin∇ is a type-safe [automatic differentiation](http://breandan.net/public/masters_thesis.pdf#1a) framework written in [Kotlin](https://kotl.in). It allows users to express [differentiable programs](http://breandan.net/public/masters_thesis.pdf#1b) with higher-dimensional data structures and operators. We attempt to restrict syntactically valid constructions to those which are algebraically valid and can be checked at compile time. By enforcing these constraints in the type system, it eliminates certain classes of runtime errors that may occur during the execution of a differentiable program. Thanks to type inference, most type declarations may be safely omitted by the end user. 
Kotlin∇ strives to be expressive, safe, and notationally similar to mathematics.\n\n## Table of contents\n\n* [Introduction](#introduction)\n* [Supported features](#features)\n* [Usage](#usage)\n  * [Installation](#installation)\n  * [Notation](#notation)\n  * [Shape safety](#shape-safety)\n  * [Higher-rank](#higher-rank-derivatives)\n  * [Higher-order](#higher-order-derivatives)\n  * [Example](#example)\n  * [Variable capture](#variable-capture)\n* [Visualization](#visualization-tools)\n  * [Dataflow graphs](#dataflow-graphs)\n  * [Plotting functions](#plotting)\n  * [Loss curves](#loss-curves)\n* [Testing and gradient checking](#testing)\n* [How does it work?](#how)\n  * [Operator overloading](#operator-overloading)\n  * [First-class functions](#first-class-functions)\n  * [Multi-stage programming](#multi-stage-programming)\n  * [Extension functions](#extension-functions)\n  * [Algebraic data types](#algebraic-data-types)\n  * [Multiple dispatch](#multiple-dispatch)\n  * [Shape-safe tensor operations](#shape-safe-tensor-operations)\n  * [Intermediate representation](#intermediate-representation)\n  * [Property delegation](#property-delegation)\n* [Experimental ideas](#experimental-ideas)\n  * [Church encoding](#church-encoding)\n  * [Type classes](#type-classes)\n  * [Type arithmetic](#type-arithmetic)\n* [Formal grammar](#grammar)\n* [UML diagram](#uml-diagram)\n* [Comparison to other frameworks](#comparison)\n* [References](#references)\n* [Acknowledgements](#special-thanks)\n\n## Introduction\n\nInspired by [Stalin∇](https://github.com/Functional-AutoDiff/STALINGRAD), [Autograd](https://github.com/hips/autograd), [DiffSharp](https://github.com/DiffSharp/DiffSharp), [Myia](https://github.com/mila-udem/myia), [Nexus](https://github.com/ctongfei/nexus), [Tangent](https://github.com/google/tangent), [Lantern](https://github.com/feiwang3311/Lantern) et al., Kotlin∇ attempts to port recent advancements in automatic differentiation (AD) to the Kotlin language. 
AD is useful for [gradient descent](https://en.wikipedia.org/wiki/Gradient_descent) and has a variety of applications in [numerical optimization](https://uhra.herts.ac.uk/bitstream/handle/2299/4342/903843.pdf) and [machine learning](http://www.jmlr.org/papers/volume18/17-468/17-468.pdf). Our implementation adds a number of experimental ideas, including compile-time [shape-safety](#shape-safety), [algebraic simplification](#multiple-dispatch) and numerical [stability checking](#testing) with property-based testing. We aim to provide an [algebraically-grounded](#operator-overloading) implementation of AD for shape-safe tensor operations. Tensors in Kotlin∇ are represented as [multidimensional arrays](https://en.wikipedia.org/wiki/Tensor#As_multidimensional_arrays).\n\n## Features\n\nKotlin∇ currently supports the following features:\n\n* Arithmetical operations on scalars, vectors and matrices\n* Shape-safe vector and matrix algebra\n* Partial and higher-order differentiation on scalars\n* Property-based testing for numerical gradient checking\n* Recovery of symbolic derivatives from AD\n\nAdditionally, it aims to support:\n\n* PyTorch-style [define-by-run](https://openreview.net/pdf?id=BJJsrmfCZ#section.1) semantics\n* N-dimensional tensors and [higher-order tensor operators](https://en.wikipedia.org/wiki/Tensor_contraction)\n* Fully-general AD over control flow, variable reassignment\n(via [delegation](https://kotlinlang.org/docs/reference/delegated-properties.html)), and array programming, possibly using a typed IR such as [Myia](https://github.com/mila-udem/myia)\n\nAll of these features are implemented without access to bytecode or special compiler tricks - just using [higher-order functions and lambdas](https://kotlinlang.org/docs/reference/lambdas.html) as shown in [Lambda the Ultimate Backpropagator](http://www-bcl.cs.may.ie/~barak/papers/toplas-reverse.pdf), embedded DSLs a la [Lightweight Modular 
Staging](https://infoscience.epfl.ch/record/150347/files/gpce63-rompf.pdf), and [ordinary generics](https://kotlinlang.org/docs/reference/generics.html). Please see below for a more detailed [feature comparison](#comparison).\n\n## Usage\n\n### Installation\n\nKotlin∇ is hosted on [Maven Central](https://s01.oss.sonatype.org/index.html#nexus-search;quick~kotlingrad). An example project is provided [here](https://github.com/breandan/kotlingrad-consumer).\n\n#### Gradle\n\n```kotlin\ndependencies {\n  implementation(\"ai.hypergraph:kotlingrad:0.4.7\")\n}\n```\n\n#### Maven\n\n```xml\n\u003cdependency\u003e\n  \u003cgroupId\u003eai.hypergraph\u003c/groupId\u003e\n  \u003cartifactId\u003ekotlingrad\u003c/artifactId\u003e\n  \u003cversion\u003e0.4.7\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n#### Jupyter Notebook\n\nTo access Kotlin∇'s notebook support, use the following line magic:\n\n```kotlin\n@file:DependsOn(\"ai.hypergraph:kotlingrad:0.4.7\")\n```\n\nFor more information, explore the [tutorial](samples/notebooks/hello_kotlingrad.ipynb).\n\n### Notation\n\nKotlin∇ operators are [higher-order functions](https://en.wikipedia.org/wiki/Higher-order_function), which take at most two inputs and return a single output, all of which are functions with the same numerical type, and whose shape is denoted using superscript in the rightmost column below. 
\n\n|                                Math                                |      Infix \u003csup\u003e\u0026dagger;\u003c/sup\u003e  |              Prefix              |     Postfix\u003csup\u003e\u0026Dagger;\u003c/sup\u003e      |                             Operator Type Signature                             |\n|:------------------------------------------------------------------:|:-------------------------------:|:--------------------------------:|:-----------------------------------:|:-------------------------------------------------------------------------------:|\n|    $$\\mathbf{A}(\\mathbf{B})$$\u003cbr\u003e$$\\mathbf{A}\\circ\\mathbf{B}$$     |       `a(b)`\u003cbr\u003e`a of b`        |                                  |                                     |    $$(\\texttt{a}: ℝ^{τ}→ℝ^{π}, \\texttt{b}: ℝ^{λ} → ℝ^{τ}) → (ℝ^{λ}→ℝ^{π})$$     |\n|                    $$\\mathbf{A}\\pm\\mathbf{B}$$                     |       `a + b`\u003cbr\u003e`a - b`        | `plus(a, b)`\u003cbr\u003e`minus(a, b)`    |                                     |    $$(\\texttt{a}: ℝ^{τ}→ℝ^{π}, \\texttt{b}: ℝ^{λ} → ℝ^{π}) → (ℝ^{?}→ℝ^{π})$$     |\n|                      $$\\mathbf{A}\\mathbf{B}$$                      |     `a * b`\u003cbr\u003e`a.times(b)`     |          `times(a, b)`           |                                     | $$(\\texttt{a}: ℝ^{τ}→ℝ^{m×n}, \\texttt{b}: ℝ^{λ}→ℝ^{n×p}) → (ℝ^{?}→ℝ^{m×p})$$    |\n| $$\\frac{\\mathbf{A}}{\\mathbf{B}}$$\u003cbr\u003e$$\\mathbf{A}\\mathbf{B}^{-1}$$ |      `a / b`\u003cbr\u003e`a.div(b)`      |           `div(a, b)`            |                                     |  $$(\\texttt{a}: ℝ^{τ}→ℝ^{m×n}, \\texttt{b}: ℝ^{λ}→ℝ^{p×n}) → (ℝ^{?}→ℝ^{m×p})$$   |\n|                         $$\\pm\\mathbf{A}$$                          |                                 |           `-a`\u003cbr\u003e`+a`           |       `a.neg()`\u003cbr\u003e`a.pos()`        |                  $$(\\texttt{a}: ℝ^{τ}→ℝ^{π}) → (ℝ^{τ}→ℝ^{π})$$                  |\n|  
           $$\\sin{a}$$\u003cbr\u003e$$\\cos{a}$$\u003cbr\u003e$$\\tan{a}$$              |                                 | `sin(a)`\u003cbr\u003e`cos(a)`\u003cbr\u003e`tan(a)` | `a.sin()`\u003cbr\u003e`a.cos()`\u003cbr\u003e`a.tan()` |                          $$(\\texttt{a}: ℝ→ℝ) → (ℝ→ℝ)$$                          |\n|                             $$\\ln{a}$$                             |                                 |       `ln(a)`\u003cbr\u003e`log(a)`        |        `a.ln()`\u003cbr\u003e`a.log()`        |                $$(\\texttt{a}: ℝ^{τ}→ℝ^{m×m}) → (ℝ^{τ}→ℝ^{m×m})$$                |\n|                           $$\\log_{b}a$$                            |           `a.log(b)`            |           `log(a, b)`            |                                     |     $$(\\texttt{a}: ℝ^{τ}→ℝ^{m×m}, \\texttt{b}: ℝ^{λ}→ℝ^{m×m}) → (ℝ^{?}→ℝ)$$      |\n|                          $$\\mathbf{A}^b$$                          |           `a.pow(b)`            |           `pow(a, b)`            |                                     |     $$(\\texttt{a}: ℝ^{τ}→ℝ^{m×m}, \\texttt{b}: ℝ^{λ}→ℝ) → (ℝ^{?}→ℝ^{m×m})$$      |\n|                  $$\\sqrt{A}$$\u003cbr\u003e$$\\sqrt[3]{A}$$                   |  `a.pow(1.0/2)`\u003cbr\u003e`a.root(3)`  |      `sqrt(a)`\u003cbr\u003e`cbrt(a)`      |      `a.sqrt()`\u003cbr\u003e`a.cbrt()`       |                $$(\\texttt{a}: ℝ^{τ}→ℝ^{m×m}) → (ℝ^{τ}→ℝ^{m×m})$$                |\n| $$\\frac{da}{db},\\frac{\\partial{a}}{\\partial{b}}$$ \u003cbr\u003e $$D_b{a}$$  |  `a.d(b)`\u003cbr\u003e`d(a) / d(b)`      |            `grad(a)[b]`          |                                     |      $$(\\texttt{a}: C(ℝ^{τ}→ℝ)^{*}, \\texttt{b}: C(ℝ^{λ}→ℝ)) → (ℝ^{?}→ℝ)$$       |\n|                           $$\\nabla{a}$$                            |                                 |            `grad(a)`             |             `a.grad()`              |                  $$(\\texttt{a}: C(ℝ^{τ}→ℝ)) → (ℝ^{τ}→ℝ^{τ})$$                   |\n|            
          $$\\nabla_{\\mathbf{B}}a$$                      |     `a.d(b)`\u003cbr\u003e`a.grad(b)`     |   `grad(a, b)`\u003cbr\u003e`grad(a)[b]`   |                                     | $$(\\texttt{a}: C(ℝ^{τ}→ℝ^{π}), \\texttt{b}: C(ℝ^{λ}→ℝ^{ω})) → (ℝ^{?}→ℝ^{π×ω})$$  |\n|                    $$\\nabla\\cdot{\\mathbf{A}}$$                     |                                 |            `divg(a)`             |             `a.divg()`              |                  $$(\\texttt{a}: C(ℝ^{τ}→ℝ^{m})) → (ℝ^{τ}→ℝ)$$                   |\n|                    $$\\nabla\\times{\\mathbf{A}}$$                    |                                 |            `curl(a)`             |             `a.curl()`              |                $$(\\texttt{a}: C(ℝ^{3}→ℝ^{3})) → (ℝ^{3}→ℝ^{3})$$                 |\n|                    $$\\mathcal{J}(\\mathbf{A})$$                     |                                 |            `grad(a)`             |             `a.grad()`              |               $$(\\texttt{a}: C(ℝ^{τ}→ℝ^{m})) → (ℝ^{τ}→ℝ^{m×τ})$$                |\n|                         $$\\mathbf{H}(a)$$                          |                                 |            `hess(a)`             |             `a.hess()`              |                 $$(\\texttt{a}: C(ℝ^{τ}→ℝ)) → (ℝ^{τ}→ℝ^{τ×τ})$$                  |\n|                     $$\\Delta{a},\\nabla^{2}a$$                      |                                 |            `lapl(a)`             |             `a.lapl()`              |                  $$(\\texttt{a}: C(ℝ^{τ}→ℝ)) → (ℝ^{τ}→ℝ^{τ})$$                   |\n\nℝ can be a `Double`, `Float` or `BigDecimal`. Specialized operators are defined for subsets of ℝ, e.g., `Int`, `Short` or `BigInteger` for subsets of ℤ, however differentiation is [only defined](https://en.wikipedia.org/wiki/Differentiable_function) for continuously differentiable functions on ℝ.\n\n\u003csup\u003e\u0026dagger;\u003c/sup\u003e `a` and `b` are higher-order functions. 
These may be constants (e.g., `0`, `1.0`), variables (e.g., `Var()`) or expressions (e.g., `x + 1`, `2 * x + y`).\n\n\u003csup\u003e\u0026Dagger;\u003c/sup\u003e For infix notation, `.` is optional. Parentheses are also optional depending on [precedence](https://kotlinlang.org/docs/reference/functions.html#infix-notation).\n\n\u003csup\u003e\u0026sect;\u003c/sup\u003e Matrix division is defined iff **B** is invertible, although it may be possible to redefine this operator using the [Moore-Penrose inverse](https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse).\n\n\u003csup\u003e\u0026lowast;\u003c/sup\u003e Where C(ℝ\u003csup\u003em\u003c/sup\u003e) is the space of all continuous functions over ℝ\u003csup\u003em\u003c/sup\u003e. If the function is not over ℝ, it will fail at compile time. If the function is over ℝ but not continuously differentiable at the point under consideration, it will fail at runtime.\n\n\u003csup\u003e?\u003c/sup\u003e The input shape is tracked at runtime, but not at the type level. While it would be nice to infer a union type bound over the inputs of binary functions, it is likely impossible using the Kotlin type system [without great effort](core/src/commonMain/gen/ai/hypergraph/kotlingrad/typelevel/arity/Variables.kt). If the user desires type checking when invoking higher-order functions with literal values, they will need to specify the combined input type explicitly or do so at runtime.\n\n\u003csup\u003eτ, λ, π, ω\u003c/sup\u003e Arbitrary products.\n\n### Higher-rank derivatives\n\nKotlin∇ supports derivatives between tensors of up to rank 2. 
The shape of a tensor derivative depends on (1) the shape of the function under differentiation and (2) the shape of the variable with respect to which we are differentiating.\n\n|     I/O Shape     |    $$ℝ^{?}→ℝ$$    |  $$ℝ^{?}→ℝ^{m}$$   | $$ℝ^{?}→ℝ^{j×k}$$ |\n|:-----------------:|:-----------------:|:------------------:|:-----------------:|\n|   $$ℝ^{?}→ℝ$$     |    $$ℝ^{?}→ℝ$$    |  $$ℝ^{?}→ℝ^{m}$$   | $$ℝ^{?}→ℝ^{j×k}$$ |\n|  $$ℝ^{?}→ℝ^{n}$$  |  $$ℝ^{?}→ℝ^{n}$$  | $$ℝ^{?}→ℝ^{m×n}$$  |        :x:        |\n| $$ℝ^{?}→ℝ^{h×i}$$ | $$ℝ^{?}→ℝ^{h×i}$$ |        :x:         |        :x:        |\n\nMatrix-by-vector, vector-by-matrix, and matrix-by-matrix derivatives require rank 3+ tensors and are currently unsupported.\n\n### Higher-order derivatives\n\nKotlin∇ supports arbitrary order derivatives on scalar functions, and up to 2nd order derivatives on vector functions. Higher-order derivatives on matrix functions are unsupported.\n\n### Shape safety\n\nShape safety is an important concept in Kotlin∇. There are three broad strategies for handling shape errors:\n\n* Hide the error somehow by implicitly reshaping or [broadcasting](https://docs.scipy.org/doc/numpy-1.10.4/user/basics.broadcasting.html) arrays\n* Announce the error at runtime, with a relevant message, e.g., [`InvalidArgumentError`](https://www.tensorflow.org/api_docs/python/tf/errors/InvalidArgumentError)\n* Do not allow programs which can result in a shape error to compile\n\nIn Kotlin∇, we use the last strategy to check the shape of tensor operations. 
Consider the following program:\n\n```kotlin\n// Inferred type: Vec\u003cDouble, D2\u003e\nval a = Vec(1.0, 2.0)\n// Inferred type: Vec\u003cDouble, D3\u003e\nval b = Vec(1.0, 2.0, 3.0)\n\nval c = b + b\n\n// Does not compile, shape mismatch\n// a + b\n```\n\nAttempting to sum two vectors whose shapes do not match will fail to compile, and they must be explicitly resized.\n\n```kotlin\n// Inferred type: Mat\u003cDouble, D1, D4\u003e\nval a = Mat1x4(1.0, 2.0, 3.0, 4.0)\n// Inferred type: Mat\u003cDouble, D4, D1\u003e\nval b = Mat4x1(1.0, 2.0, 3.0, 4.0)\n\nval c = a * b\n\n// Does not compile, inner dimension mismatch\n// a * a\n// b * b\n```\n\nSimilarly, attempting to multiply two matrices whose inner dimensions do not match will fail to compile.\n\n```kotlin\nval a = Mat2x4( \n  1.0, 2.0, 3.0, 4.0,\n  5.0, 6.0, 7.0, 8.0\n)\n\nval b = Mat4x2( \n  1.0, 2.0,\n  3.0, 4.0,\n  5.0, 6.0,\n  7.0, 8.0\n)\n\n// Types are optional, but encouraged\nval c: Mat\u003cDouble, D2, D2\u003e = a * b \n\nval d = Mat2x1(1.0, 2.0)\n\nval e = c * d\n\nval f = Mat3x1(1.0, 2.0, 3.0)\n\n// Does not compile, inner dimension mismatch\n// e * f\n```\n\nExplicit types are optional but encouraged. [Type inference](https://www.youtube.com/watch?v=MyljSWm0Y_k) helps preserve shape information over long programs.\n\n```kotlin\nfun someMatFun(m: Mat\u003cDouble, D3, D1\u003e): Mat\u003cDouble, D3, D3\u003e = ...\nfun someMatFun(m: Mat\u003cDouble, D2, D2\u003e) = ...\n```\n\nWhen writing a function, it is mandatory to declare the input type(s), but the return type [may be omitted](https://kotlinlang.org/docs/reference/functions.html#explicit-return-types). Shape-safety is currently supported up to rank-2 tensors, i.e. 
matrices.\n\n### Example\n\nThe following example shows how to derive higher-order partials of a function `z` of type ℝ²→ℝ:\n\n```kotlin\nval z = x * (-sin(x * y) + y) * 4  // Infix notation\nval `∂z∕∂x` = d(z) / d(x)          // Leibniz notation [Christianson, 2012]\nval `∂z∕∂y` = d(z) / d(y)          // Partial derivatives\nval `∂²z∕∂x²` = d(`∂z∕∂x`) / d(x)  // Higher-order derivatives\nval `∂²z∕∂x∂y` = d(`∂z∕∂x`) / d(y) // Higher-order partials\nval `∇z` = z.grad()                // Gradient operator\n\nval values = arrayOf(x to 0, y to 1)\n\nprintln(\"z(x, y) \\t= $z\\n\" +\n  \"z(${values.map { it.second }.joinToString()}) \\t\\t= ${z(*values)}\\n\" +\n  \"∂z/∂x \\t\\t= $`∂z∕∂x` \\n\\t\\t= \" + `∂z∕∂x`(*values) + \"\\n\" +\n  \"∂z/∂y \\t\\t= $`∂z∕∂y` \\n\\t\\t= \" + `∂z∕∂y`(*values) + \"\\n\" +\n  \"∂²z/∂x² \\t= $`∂z∕∂y` \\n\\t\\t= \" + `∂²z∕∂x²`(*values) + \"\\n\" +\n  \"∂²z/∂x∂y \\t= $`∂²z∕∂x∂y` \\n\\t\\t= \" + `∂²z∕∂x∂y`(*values) + \"\\n\" +\n  \"∇z \\t\\t= $`∇z` \\n\\t\\t= [${`∇z`[x]!!(*values)}, ${`∇z`[y]!!(*values)}]ᵀ\")\n```\n\nAny backticks and unicode characters above are simply for readability and have no effect on the behavior. 
Running [this program](samples/src/main/kotlin/ai/hypergraph/kotlingrad/samples/HelloKotlingrad.kt) via `./gradlew HelloKotlingrad` should produce the following output:\n\n```\nz(x, y)         = ((x) * ((- (sin((x) * (y)))) + (y))) * (4.0)\nz(0, 1)         = 0.0\n∂z/∂x           = d(((x) * ((- (sin((x) * (y)))) + (y))) * (4.0)) / d(x) \n                = 4.0\n∂z/∂y           = d(((x) * ((- (sin((x) * (y)))) + (y))) * (4.0)) / d(y) \n                = 0.0\n∂²z/∂x²         = d(((x) * ((- (sin((x) * (y)))) + (y))) * (4.0)) / d(y) \n                = 4.0\n∂²z/∂x∂y        = d(d(((x) * ((- (sin((x) * (y)))) + (y))) * (4.0)) / d(x)) / d(y) \n                = 4.0\n∇z              = {y=d(((x) * ((- (sin((x) * (y)))) + (y))) * (4.0)) / d(y), x=d(((x) * ((- (sin((x) * (y)))) + (y))) * (4.0)) / d(x)} \n                = [4.0, 0.0]ᵀ\n```\n\n### Variable capture\n\nNot only does Kotlin∇'s type system encode [output shape](#shape-safety), it is also capable of tracking free and bound variables, for order-independent name binding and partial application. Expressions inhabited by free variables are typed as functions until fully bound, at which time they return a concrete value. Consider the following example:\n\n```kotlin\nval q = X + Y * Z + Y + 0.0\nval p0 = q(X to 1.0, Y to 2.0, Z to 3.0) // Name binding\nval p1 = q(X to 1.0, Y to 1.0)(Z to 1.0) // Variadic currying\nval p3 = q(Z to 1.0)(X to 1.0, Y to 1.0) // Any order is possible\nval p4 = q(Z to 1.0)(X to 1.0)(Y to 1.0) // Proper currying\nval p5 = q(Z to 1.0)(X to 1.0) // Returns a partially applied function\nval p6 = (X + Z + 0)(Y to 1.0) // Does not compile\n```\n\nThis feature is made possible by encoding a type-level [Hasse diagram](https://en.wikipedia.org/wiki/Hasse_diagram) over a small set of predefined variable names, with skip-connections for variadic combination and partial application. 
Curious readers may glean further details by referring to [the implementation](core/src/commonMain/gen/ai/hypergraph/kotlingrad/typelevel/arity/Variables.kt) and [usage example](samples/src/main/kotlin/ai/hypergraph/kotlingrad/samples/VariableCapture.kt).\n\n## Visualization tools\n\nKotlin∇ provides various graphical tools that can be used for visual debugging.\n\n### Dataflow graphs\n\nKotlin∇ functions are a type of [directed acyclic graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph), called dataflow graphs (DFGs). For example, running the expression `((1 + x * 2 - 3 + y + z / y).d(y).d(x) + z / y * 3 - 2).render()` will display the following DFG:\n\n![](samples/src/main/resources/dataflow.svg)\n\nRed and blue edges indicate the right and left inputs to a binary operator, respectively. Consider the DFG for a batch of stochastic gradients on [linear regression](samples/src/main/kotlin/ai/hypergraph/kotlingrad/samples/LinearRegression.kt), which can be written in matrix form as \u003cimg src=\"https://render.githubusercontent.com/render/math?math=\\nabla_{\\Theta}||\\mathbf{Y} - \\mathbf{X}\\Theta||^2\"\u003e:\n\n![](samples/src/main/resources/lr_batch_loss_graph.svg)\n\nThetas represent the hidden parameters under differentiation and the constants are the batch inputs (**X**) and targets (**Y**). 
When all the free variables are bound to numerical values, the graph collapses into a single node, which can be unwrapped into a Kotlin `Number`.\n\n### Plotting\n\nTo generate the [sample 2D plots](samples/src/main/kotlin/ai/hypergraph/kotlingrad/samples/Plot2D.kt) below, run `./gradlew Plot2D`.\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"samples/src/main/resources/plot.svg\"\u003e\u003c/p\u003e\n\u003cp align=\"center\"\u003e\u003cimg src=\"samples/src/main/resources/hermite.svg\"\u003e\u003c/p\u003e\n\nPlotting is also possible in higher dimensions, [for example](samples/src/main/kotlin/ai/hypergraph/kotlingrad/samples/Plot3D.kt) in 3D via `./gradlew Plot3D`:\n\n![](samples/src/main/resources/ripple.png)\n![](samples/src/main/resources/pulsar.png)\n![](samples/src/main/resources/starquake.png)\n![](samples/src/main/resources/novaflux.png)\n\n### Loss curves\n\nGradient descent is one application for Kotlin∇. Below is a typical loss curve of SGD on [a multilayer perceptron](samples/src/main/kotlin/ai/hypergraph/kotlingrad/samples/MLP.kt):\n\n![](samples/src/main/resources/mlp_loss.svg)\n\nTo train the model, execute `./gradlew MLP` from within the parent directory.\n\n## Testing\n\nTo run [the tests](core/src/jvmTest/kotlin/ai/hypergraph/kotlingrad), execute `../gradlew allTests` from the `core` directory.\n\nKotlin∇ claims to eliminate certain runtime errors, but how do we know the proposed implementation is correct? One method, borrowed from the Haskell community, is called [property-based testing](http://breandan.net/public/masters_thesis.pdf#33) (PBT), closely related to [metamorphic testing](http://breandan.net/public/masters_thesis.pdf#34). Notable implementations include [QuickCheck](https://github.com/nick8325/quickcheck), [Hypothesis](https://github.com/HypothesisWorks/hypothesis) and [ScalaTest](http://www.scalatest.org/user_guide/property_based_testing) (ported to Kotlin in [Kotest](https://github.com/kotest/kotest)). 
PBT uses algebraic properties to verify the result of an operation by constructing semantically equivalent but syntactically distinct expressions, which should produce the same answer. Kotlin∇ uses two such equivalences to validate its AD implementation:\n\n* [Analytic differentiation](https://en.wikipedia.org/wiki/Differentiation_rules): manually differentiate and compare the values returned on a subset of the domain with AD.\n* [Finite difference approximation](http://breandan.net/public/masters_thesis.pdf#5a): sample the space of symbolic (differentiable) functions, comparing results of AD to FD.\n\nFor example, consider the following test, which checks whether the analytical derivative and the automatic derivative, when evaluated at a given point, are equal to each other within the limits of numerical precision:\n\n```kotlin\nval x by Var()\nval y by Var()\n\nval z = y * (sin(x * y) - x)            // Function under test\nval `∂z∕∂x` = d(z) / d(x)               // Automatic derivative\nval manualDx = y * (cos(x * y) * y - 1) // Analytical derivative \n\n\"∂z/∂x should be y * (cos(x * y) * y - 1)\" {\n  NumericalGenerator.assertAll { ẋ, ẏ -\u003e\n    // Evaluate the results at a given seed\n    val autoEval = `∂z∕∂x`(x to ẋ, y to ẏ) \n    val manualEval = manualDx(x to ẋ, y to ẏ)\n    // Should pass iff Δ(autoEval, manualEval) \u003c Ɛ\n    autoEval shouldBeApproximately manualEval\n  }\n}\n```\n\nPBT will search the input space for two numerical values `ẋ` and `ẏ` that violate the specification, then [\"shrink\"](https://hackage.haskell.org/package/QuickCheck-2.12.6.1/docs/Test-QuickCheck-Arbitrary.html#v:shrink) them to discover pass-fail boundary values. 
We can construct a similar test using finite differences:\n\n```kotlin\n\"d(sin x)/dx should be equal to (sin(x + dx) - sin(x)) / dx\" {\n  NumericalGenerator.assertAll { ẋ -\u003e\n    val f = sin(x)\n    \n    val `df∕dx` = d(f) / d(x)\n    val adEval = `df∕dx`(ẋ) \n    \n    val dx = 1E-8\n    // Since ẋ is a raw numeric type, sin =\u003e kotlin.math.sin\n    val fdEval = (sin(ẋ + dx) - sin(ẋ)) / dx\n    adEval shouldBeApproximately fdEval\n  }\n}\n```\n\n![](samples/src/main/resources/comparison.svg)\n\nAbove, we [compare numerical errors](samples/src/main/kotlin/ai/hypergraph/kotlingrad/samples/ADSDComparison.kt) for three types of computational differentiation against infinite precision symbolic differentiation (IP):\n\n1. Finite precision automatic differentiation (AD)\n2. Finite precision symbolic differentiation (SD)\n3. Finite precision finite differences (FD)\n\nAD and SD both exhibit relative errors (i.e. with respect to each other) several orders of magnitude lower than their absolute errors (i.e. with respect to IP), which roughly agree to within numerical precision. As expected, FD exhibits numerical error significantly higher than AD and SD due to the inaccuracy of floating-point division.\n\nThere are many other ways to independently verify the numerical gradient, such as [dual numbers](https://en.wikipedia.org/wiki/Dual_number#Differentiation) or the [complex step derivative](https://timvieira.github.io/blog/post/2014/08/07/complex-step-derivative/). Another method is to compare the numerical output against a well-known implementation, such as [TensorFlow](https://github.com/JetBrains/kotlin-native/tree/master/samples/tensorflow). 
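\n\nAs a rough sketch of the dual-number check mentioned above, the following self-contained snippet compares a derivative obtained from dual numbers with a central finite difference. The `Dual` class and the helpers `sinD`, `dualDerivative` and `centralDifference` are hypothetical names introduced only for this illustration; they are not part of Kotlin∇'s API.\n\n```kotlin\nimport kotlin.math.abs\nimport kotlin.math.cos\nimport kotlin.math.sin\n\n// Hypothetical helper, not part of Kotlin∇: a dual number re + eps·ε, where ε² = 0\ndata class Dual(val re: Double, val eps: Double) {\n  operator fun plus(o: Dual) = Dual(re + o.re, eps + o.eps)\n  // The product rule falls out of (a + bε)(c + dε) = ac + (ad + bc)ε\n  operator fun times(o: Dual) = Dual(re * o.re, re * o.eps + eps * o.re)\n}\n\n// Sine lifted to dual numbers: sin(a + bε) = sin(a) + b·cos(a)ε\nfun sinD(d: Dual) = Dual(sin(d.re), d.eps * cos(d.re))\n\n// The same function, f(x) = x·sin(x), written for dual numbers and for doubles\nfun f(x: Dual) = x * sinD(x)\nfun f(x: Double) = x * sin(x)\n\n// Seed the ε-component with 1.0 and read the derivative back from it\nfun dualDerivative(x: Double) = f(Dual(x, 1.0)).eps\n\n// Central finite difference with step size h\nfun centralDifference(x: Double, h: Double = 1e-6) = (f(x + h) - f(x - h)) / (2 * h)\n\nfun main() {\n  val x = 1.5\n  // Absolute disagreement between the two estimates at x\n  println(abs(dualDerivative(x) - centralDifference(x)))\n}\n```\n\nThe two estimates should differ only by the truncation and round-off error of the finite difference, so a large gap would point to a mistake in one of the derivative rules.\n\n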
We plan to conduct a more thorough comparison of numerical accuracy and performance.\n\n## How?\n\nTo understand the core of Kotlin∇'s AD implementation, please refer to the [scalar example](core/src/commonMain/kotlin/ai/hypergraph/kotlingrad/api/Scalar.kt).\n\nThis project relies on a few Kotlin-specific language features, which together enable a concise, flexible and type-safe user interface. The following features have proven beneficial to the development of Kotlin∇:\n\n#### Operator overloading\n \n[Operator overloading](https://kotlinlang.org/docs/reference/operator-overloading.html) enables concise notation for arithmetic on abstract types, where the types encode [algebraic structures](http://breandan.net/public/masters_thesis.pdf#page=58), e.g., `Group`, `Ring`, and `Field`. These abstractions are extensible to other kinds of mathematical structures, such as complex numbers and quaternions.\n\nFor example, suppose we have an interface `Group`, which overloads the operators `+` and `*`, and is defined like so:\n\n```kotlin\ninterface Group\u003cT: Group\u003cT\u003e\u003e {\n  operator fun plus(addend: T): T\n\n  operator fun times(multiplicand: T): T\n}\n```\n\nHere, we specify a recursive type bound using a method known as [F-bounded quantification](http://staff.ustc.edu.cn/~xyfeng/teaching/FOPL/lectureNotes/CookFBound89.pdf) to ensure that operations return the concrete type variable `T`, rather than something more abstract like `Group`. Imagine a class `Fun` that has implemented `Group`. 
It can be used as follows:\n\n```kotlin\nfun \u003cT: Group\u003cT\u003e\u003e cubed(t: T): T = t * t * t\n\nfun \u003cT: Group\u003cT\u003e\u003e twiceCubed(t: T): T = cubed(t) + cubed(t)\n```\n\nLike [Python](https://docs.python.org/3.4/library/operator.html), Kotlin supports overloading a [limited set of operators](https://kotlinlang.org/docs/reference/operator-overloading.html), which are evaluated using a [fixed precedence](https://kotlinlang.org/docs/reference/grammar.html#precedence). In the current version of Kotlin∇, operators do not perform any computation; they simply construct a directed acyclic graph representing the symbolic expression. An expression is evaluated only when it is invoked as a function.\n\n#### First-class functions\n\nWith [higher-order functions and lambdas](https://kotlinlang.org/docs/reference/lambdas.html), Kotlin treats [functions as first-class citizens](https://en.wikipedia.org/wiki/First-class_function). This allows us to represent mathematical functions and programming functions with the same underlying abstractions (typed FP). Several [recent](http://www-bcl.cs.may.ie/~barak/papers/toplas-reverse.pdf) [papers](https://papers.nips.cc/paper/8221-backpropagation-with-callbacks-foundations-for-efficient-and-expressive-differentiable-programming.pdf) have demonstrated the expressiveness of this paradigm for automatic differentiation.\n\nIn Kotlin∇, all expressions can be treated as functions. 
For example:\n\n```kotlin\nfun \u003cT: Group\u003cT\u003e\u003e makePoly(x: Var\u003cT\u003e, y: Var\u003cT\u003e) = x * y + y * y + x * x\nval x by Var()\nval y by Var()\nval f = makePoly(x, y)\nval z = f(1.0, 2.0) // Returns a value\nprintln(z) // Prints: 7\n```\n\nAdditionally, it is possible to build functions consisting of varying dimensional inputs:\n\n```kotlin\nfun \u003cT: Fun\u003cT\u003e\u003e mlp(p1: VFun\u003cT, D3\u003e, p2: MFun\u003cT, D3, D3\u003e, p3: T) =\n  ((p1 * p2 + p1 * p2 * p2 dot p1 + p1) - p3) pow p3\n```\n\n#### Multi-stage programming\n\nKotlin∇ uses [operator overloading](#operator-overloading) in the host language to first construct a [dataflow graph](#dataflow-graphs), but evaluates the graph lazily. Called \"multi-stage programming\", or *staging*, this is a metaprogramming technique from the [ML community](http://ocamllabs.io/iocamljs/staging.html) which enables type-safe runtime code translation and compilation. More recently, staging has been put to effective use for [compiling embedded DSLs](https://static.csg.ci.i.u-tokyo.ac.jp/papers/14/scherr-ecoop2014.pdf) similar to Kotlin∇.\n\nIn its current form, Kotlin∇ takes a \"shallow embedding\" approach. Similar to an [interpreter](https://en.wikipedia.org/wiki/Interpreter_pattern), it adheres closely to the user-defined program and does not perform much code specialization or rewriting for optimization purposes. Unlike an interpreter, it postpones evaluation until all free variables in an expression have been bound. 
Consider the following snippet, which decides when to evaluate an expression:\n\n```kotlin\nvar EAGER = false\noperator fun invoke(newBindings: Bindings\u003cX\u003e): Fun\u003cX\u003e =\n    Composition(this, newBindings).run { if (bindings.complete || EAGER) evaluate() else this }\n```\n\nIf `bindings` are `complete`, this means there are no unbound variables remaining (implementation omitted for brevity), and we can evaluate the expression to obtain a numerical result. Suppose we have the following user code:\n\n```kotlin\nval x = Var()\nval y = Var()\nval z = Var()\nval f0 = x + y * z\nvar f1 = f0(x to 1).also { println(it) } // Prints: (x + y * z)(x=1)\nvar f2 = f1(y to 2).also { println(it) } // Prints: (x + y * z)(x=1)(y=2)\nvar f3 = f2(z to 3).also { println(it) } // Prints: 7\n```\n\nOnce the last line is reached, all variables are bound, and instead of returning a `Composition`, Kotlin∇ evaluates the function, returning a constant. Alternatively, if `EAGER` mode is enabled, each invocation is applied as early as possible:\n\n```kotlin\nEAGER = true\nf1 = f0(x to 1).also { println(it) } // Prints: 1 + y * z\nf2 = f1(y to 2).also { println(it) } // Prints: 1 + 2 * z\nf3 = f2(z to 3).also { println(it) } // Prints: 7\n```\n\nIn the following section, we describe how evaluation works.\n\n#### Algebraic data types\n\n[Algebraic data types](https://en.wikipedia.org/wiki/Algebraic_data_type) (ADTs) in the form of [sealed classes](https://kotlinlang.org/docs/reference/sealed-classes.html) (a.k.a. sum types) facilitate a limited form of pattern matching over a closed set of subclasses. By using these, the compiler forces us to provide an exhaustive control flow when type checking a sealed class. 
Consider the following classes:\n\n```kotlin\nclass Const\u003cT: Fun\u003cT\u003e\u003e(val number: Number) : Fun\u003cT\u003e()\nclass Sum\u003cT: Fun\u003cT\u003e\u003e(val left: Fun\u003cT\u003e, val right: Fun\u003cT\u003e) : Fun\u003cT\u003e()\nclass Prod\u003cT: Fun\u003cT\u003e\u003e(val left: Fun\u003cT\u003e, val right: Fun\u003cT\u003e) : Fun\u003cT\u003e()\nclass Var\u003cT: Fun\u003cT\u003e\u003e: Fun\u003cT\u003e() { override val variables: Set\u003cVar\u003cX\u003e\u003e = setOf(this) }\nclass Zero\u003cT: Fun\u003cT\u003e\u003e: Const\u003cT\u003e(0.0)\nclass One\u003cT: Fun\u003cT\u003e\u003e: Const\u003cT\u003e(1.0)\n```\n\nWhen checking the type of a sealed class, consumers must explicitly handle every case, as incomplete control flow will produce a compiler error rather than fail at runtime. Consider a simplified definition of the superclass `Fun`, which defines invocation and differentiation using a restricted form of pattern matching:\n\n```kotlin\nsealed class Fun\u003cX: Fun\u003cX\u003e\u003e(open val variables: Set\u003cVar\u003cX\u003e\u003e = emptySet()): Group\u003cFun\u003cX\u003e\u003e {\n    constructor(vararg fns: Fun\u003cX\u003e): this(fns.flatMap { it.variables }.toSet())\n\n    // Since the subclasses of Fun are a closed set, no `else  ...` is required.\n    operator fun invoke(map: Bindings\u003cX\u003e): Fun\u003cX\u003e = when (this) {\n        is Const -\u003e this\n        is Var -\u003e map.getOrElse(this) { this } // Partial application is permitted\n        is Prod -\u003e left(map) * right(map) // Smart casting implicitly casts after checking\n        is Sum -\u003e left(map) + right(map)\n    }\n\n    fun d(variable: Var\u003cX\u003e): Fun\u003cX\u003e = when(this) {\n       is Const -\u003e Zero\n       is Var -\u003e if (variable == this) One else Zero\n       // Product rule: d(u*v)/dx = du/dx * v + u * dv/dx\n       is Prod -\u003e left.d(variable) * right + left * right.d(variable)\n       is Sum -\u003e 
left.d(variable) + right.d(variable)\n    }\n\n    override operator fun plus(addend: Fun\u003cX\u003e) = Sum(this, addend)\n\n    override operator fun times(multiplicand: Fun\u003cX\u003e) = Prod(this, multiplicand)\n}\n```\n\nSymbolic differentiation as implemented by Kotlin∇ has two distinct passes, one for differentiation and one for evaluation. Differentiation constitutes a top-down substitution process on the computation graph, and evaluation propagates the values from the bottom up. The reduction semantics for this procedure are described more precisely in [the specification](https://github.com/breandan/kotlingrad/blob/master/specification.md#reduction-semantics).\n\n[![](latex/figures/kotlingrad_diagram.png)](http://breandan.net/public/masters_thesis.pdf#page=58)\n\nKotlin∇ functions are not only data structures, but also Kotlin functions that can be invoked by passing a [`Bindings`](/core/src/commonMain/kotlin/ai/hypergraph/kotlingrad/api/Bindings.kt) instance (effectively, a `Map\u003cFun\u003cX\u003e, Fun\u003cX\u003e\u003e`). To enable this functionality, we overload the [`invoke` operator](https://kotlinlang.org/docs/reference/operator-overloading.html#invoke), then recurse over the graph, using `Bindings` as a lookup table. If a matching subexpression is found, we propagate the bound value instead of the matching function. This is known as the [interpreter pattern](https://en.wikipedia.org/wiki/Interpreter_pattern).\n\nKotlin's [smart casting](https://kotlinlang.org/docs/reference/typecasts.html#smart-casts) is an example of [flow-sensitive type analysis](https://en.wikipedia.org/wiki/Flow-sensitive_typing) where the abstract type `Fun` can be treated as `Sum` after performing an `is Sum` check. 
Without smart casting, we would need to write `(this as Sum).left` to access the member `left`, risking a `ClassCastException` if the cast were mistaken.\n\n#### Extension functions\n\nBy using [extension functions](https://kotlinlang.org/docs/reference/extensions.html), users can convert between numerical types in the host language and our eDSL by augmenting classes with additional operators. [Context-oriented programming](https://proandroiddev.com/an-introduction-context-oriented-programming-in-kotlin-2e79d316b0a2) allows users to define custom extensions without requiring subclasses or inheritance.\n\n```kotlin\ndata class Const\u003cT: Group\u003cT\u003e\u003e(val number: Double) : Fun()\ndata class Sum\u003cT: Group\u003cT\u003e\u003e(val e1: Fun, val e2: Fun) : Fun()\ndata class Prod\u003cT: Group\u003cT\u003e\u003e(val e1: Fun, val e2: Fun) : Fun()\n\nclass Fun\u003cT: Group\u003cT\u003e\u003e: Group\u003cFun\u003cT\u003e\u003e {\n  operator fun plus(addend: Fun\u003cT\u003e) = Sum(this, addend)\n  \n  operator fun times(multiplicand: Fun\u003cT\u003e) = Prod(this, multiplicand)\n}\n\nobject DoubleContext {\n  operator fun Number.times(expr: Fun\u003cDouble\u003e) = Const(toDouble()) * expr\n}\n```\n\nNow, we can use the context to define another extension, `Fun.multiplyByTwo`, which computes the product inside a `DoubleContext`, using the operator overload we defined above:\n\n```kotlin\n// Inside `with`, a bare `this` would refer to DoubleContext, so the receiver is qualified explicitly\nfun Fun\u003cDouble\u003e.multiplyByTwo() = with(DoubleContext) { 2 * this@multiplyByTwo } // Uses `*` operator in DoubleContext\n```\n\nExtensions can also be defined in another file or context and imported on demand. 
For example, Kotlin∇ also uses extensions to define [shape-safe](#shape-safe-tensor-operations) constructors and operators for vector and matrix arithmetic.\n\n#### Multiple dispatch\n\nIn conjunction with ADTs, Kotlin∇ also uses [multiple dispatch](https://en.wikipedia.org/wiki/Multiple_dispatch) to instantiate the most specific result type of [applying an operator](https://github.com/breandan/kotlingrad/blob/09f4aaf789238820fb5285706e0f1e22ade59b7c/src/main/kotlin/ai/hypergraph/kotlingrad/functions/Function.kt#L24-L38) based on the type of its operands. While multiple dispatch is not an explicit language feature, it can be emulated using inheritance.\n\nBuilding on the previous example, a common task in AD is to [simplify a graph](http://deeplearning.net/software/theano/extending/optimization.html). This is useful in order to minimize the total number of calculations required, improving numerical stability. We can eagerly simplify expressions based on algebraic [rules of replacement](https://en.wikipedia.org/wiki/Rule_of_replacement). Smart casting allows us to access members of a class after checking its type, without explicitly casting it:\n\n[//]: # (Note: numerical stability is sensitive to the order of rewriting, cf. 
https://en.wikipedia.org/wiki/Kahan_summation_algorithm)\n\n```kotlin\noverride fun times(multiplicand: Function\u003cX\u003e): Function\u003cX\u003e = when {\n  this == zero -\u003e this\n  this == one -\u003e multiplicand\n  multiplicand == one -\u003e this\n  multiplicand == zero -\u003e multiplicand\n  this == multiplicand -\u003e pow(two)\n  this is Const \u0026\u0026 multiplicand is Const -\u003e const(value * multiplicand.value)\n  // Further simplification is possible using rules of replacement\n  else -\u003e Prod(this, multiplicand)\n}\n\nval result = Const(2.0) * Sum(Var(2.0), Const(3.0)) // Sum(Prod(Const(2.0), Var(2.0)), Const(6.0))\n```\n\nThis allows us to put all related control flow on a single abstract class which is inherited by subclasses, simplifying readability, debugging and refactoring.\n\n\n#### Shape-safe tensor operations\n\nWhile first-class [dependent types](https://wiki.haskell.org/Dependent_type) are useful for ensuring arbitrary shape safety (e.g., when concatenating and reshaping matrices), they are unnecessary for simple equality checking (such as when multiplying two matrices). When the shape of a tensor is known at compile-time, it is possible to encode this information using a less powerful type system*, as long as it supports subtyping and parametric polymorphism (a.k.a. generics). In practice, we can implement a shape-checked tensor arithmetic in languages like Java, Kotlin, C++, C# or Typescript, which accept generic type parameters. In Kotlin, whose type system is [less expressive](https://kotlinlang.org/docs/reference/generics.html#variance) than Java, we use the following strategy.\n\nShape safety is currently supported up to rank-2 tensors, i.e. matrices. To perform dimension checking in our type system, we first enumerate a list of integer type literals as a chain of subtypes, `C \u003c: C - 1 \u003c: C - 2 \u003c: ... 
\u003c: 1 \u003c: 0`, where `C` is the largest fixed-length dimension we wish to represent, which can be specified by the user prior to compilation. This guarantees linear space and time complexity for subtype checking, with a constant upper bound.\n\n```kotlin\n@file:Suppress(\"ClassName\")\ninterface Nat\u003cT: D0\u003e { val i: Int } // Used for certain type bounds\nsealed class D0(open val i: Int = 0) { companion object: D0(), Nat\u003cD0\u003e }\nsealed class D1(override val i: Int = 1): D0(i) { companion object: D1(), Nat\u003cD1\u003e }\nsealed class D2(override val i: Int = 2): D1(i) { companion object: D2(), Nat\u003cD2\u003e }\nsealed class D3(override val i: Int = 3): D2(i) { companion object: D3(), Nat\u003cD3\u003e }\n//... † Automatically generated\n```\n\nNext, we overload the call operator to emulate instantiating a collection literal, using arity to infer its dimensionality. Consider the rank-1 case for length inference on vector literals:\n\n```kotlin\nopen class Vec\u003cE, Len: D1\u003e(val contents: List\u003cE\u003e)\nfun \u003cT\u003e Vec(t1: T): Vec\u003cT, D1\u003e = Vec(listOf(t1))\nfun \u003cT\u003e Vec(t1: T, t2: T): Vec\u003cT, D2\u003e = Vec(listOf(t1, t2))\nfun \u003cT\u003e Vec(t1: T, t2: T, t3: T): Vec\u003cT, D3\u003e = Vec(listOf(t1, t2, t3))\n//... † Automatically generated\n```\n\nFinally, we encode length as a parameter of the operand type. Since integer literals are a chain of subtypes, we need only define one operator using the highest literal, and can rely on [Liskov substitution](https://en.wikipedia.org/wiki/Liskov_substitution_principle) to preserve shape safety for all subtypes.\n\n```kotlin\ninfix operator fun \u003cC: D1, V: Vec\u003cInt, C\u003e\u003e V.plus(v: V): Vec\u003cInt, C\u003e =\n  Vec(contents.zip(v.contents).map { it.first + it.second })\n```\n\nThe operator `+` can now be used like so. 
Incompatible operands will cause a type error:\n\n```kotlin\nval one = Vec(1, 2, 3) + Vec(1, 2, 3)          // Always runs safely\nval add = Vec(1, 2, 3) + Vec(listOf(/*...*/))  // May fail at runtime\nval sum = Vec(1, 2) + add                      // Does not compile\n```\n\nA similar syntax is available for [matrices](core/src/commonMain/kotlin/ai/hypergraph/kotlingrad/api/Matrix.kt) and higher-rank [tensors](core/src/commonMain/kotlin/ai/hypergraph/kotlingrad/api/Tensor.kt). For example, Kotlin∇ can infer the shape of multiplying two matrices, and will not compile if their inner dimensions do not match:\n\n```kotlin\nopen class Mat\u003cX, R: D1, C: D1\u003e(vararg val rows: Vec\u003cX, C\u003e)\nfun \u003cX\u003e Mat1x2(d0: X, d1: X): Mat\u003cX, D1, D2\u003e = Mat(Vec(d0, d1))\nfun \u003cX\u003e Mat2x1(d0: X, d1: X): Mat\u003cX, D2, D1\u003e = Mat(Vec(d0), Vec(d1))\n//... † Automatically generated\noperator fun \u003cQ: D1, R: D1, S: D1\u003e Mat\u003cInt, Q, R\u003e.times(m: Mat\u003cInt, R, S\u003e): Mat\u003cInt, Q, S\u003e = TODO()\n\n// Inferred type: Mat\u003cInt, D4, D4\u003e\nval l = Mat4x4(\n  1, 2, 3, 4,\n  5, 6, 7, 8,\n  9, 0, 0, 0,\n  9, 0, 0, 0\n)\n\n// Inferred type: Mat\u003cInt, D4, D3\u003e\nval m = Mat4x3(\n  1, 1, 1,\n  2, 2, 2,\n  3, 3, 3,\n  4, 4, 4\n)\n\n// Inferred type: Mat\u003cInt, D4, D3\u003e\nval lm = l * m\n// m * m // Compile error: Expected Mat\u003c3, *\u003e, found Mat\u003c4, 3\u003e\n```\n\n[Further examples](samples/src/main/kotlin/ai/hypergraph/kotlingrad/samples/MatrixDemo.kt) are provided for shape-safe matrix operations such as addition, subtraction and transposition.\n\nA similar technique is possible in Haskell, which is capable of a more powerful form of type-level computation, [type arithmetic](https://wiki.haskell.org/Type_arithmetic). 
Type arithmetic makes it easy to express [convolutional arithmetic](https://arxiv.org/pdf/1603.07285.pdf) and other arithmetic operations on shape variables (say, splitting a vector in half), which is currently not possible, or would require enumerating every possible combination of type literals.\n\n\u003csup\u003e\u0026lowast;\u003c/sup\u003e Many type systems are still capable of performing arbitrary computation in the type checker. As specified, Java's type system is [known to be Turing Complete](https://arxiv.org/pdf/1605.05274.pdf). It may be possible to emulate a limited form of dependent types in Java by exploiting this property, although this may not be computationally tractable due to the practical limitations noted by Grigore.\n\n\u003csup\u003e\u0026dagger;\u003c/sup\u003e Statically generated code, shipped within the library. To regenerate these methods (e.g., using larger dimensions), a code generator is [provided](shipshape/src/main/kotlin/ai/hypergraph/shipshape).\n\n#### Intermediate representation\n\nKotlin∇ programs are [staged](#multi-stage-programming) into [Kaliningraph](https://github.com/breandan/kaliningraph), an experimental IR for graph computation. As written by the user, many graphs are computationally suboptimal due to expression swell and parameter sharing. To accelerate forward- and backpropagation, it is often advantageous to simplify the graph by applying the [reduction semantics](https://github.com/breandan/kotlingrad/blob/master/specification.md#operational-semantics) in a process known as [graph canonicalization](https://en.wikipedia.org/wiki/Graph_canonization). 
Kaliningraph enables compiler-like optimizations over the graph such as expression simplification and analytic root-finding, and supports features for visualization and debugging, e.g., in [computational notebooks](https://github.com/breandan/kotlingrad/blob/master/samples/notebooks/hello_kotlingrad.ipynb).\n\n#### Property delegation\n\n[Property delegation](https://kotlinlang.org/docs/reference/delegated-properties.html) is a reflection feature in the Kotlin language which lets us access properties to which an instance is bound. For example, we can read the property name like so:\n\n```kotlin\nimport kotlin.reflect.KProperty\n\n// `name` defaults to null so that `Var()` can take its name from the delegated property\nclass Var(val name: String? = null) {\n  operator fun getValue(thisRef: Any?, property: KProperty\u003c*\u003e) = Var(name ?: property.name)\n}\n```\n\nThis feature allows consumers to instantiate variables, e.g., in an embedded DSL, without redeclaring their names:\n\n```kotlin\nval x by Var()   // With property delegation\nval x = Var(\"x\") // Without property delegation\n```\n\nWithout property delegation, users would need to repeat the property name in the constructor.\n\n## Experimental ideas\n\nThe current API is stable but can be [improved](https://github.com/breandan/kotlingrad/issues) in many ways. Currently, Kotlin∇ does not infer a function's input dimensionality (i.e. free variables and their corresponding shape). While it is possible to perform [variable capture](#variable-capture) over a small alphabet using [type safe currying](samples/src/main/kotlin/ai/hypergraph/kotlingrad/samples/VariableCapture.kt), this technique incurs a large source code [overhead](core/src/commonMain/kotlin/ai/hypergraph/kotlingrad/typelevel/VariableCapture.kt). It may be possible to reduce the footprint using [phantom types](https://gist.github.com/breandan/d0d7c21bb7f78ef54c21ce6a6ac49b68) or some form of union type bound (cf. 
[Kotlin](https://kotlinlang.org/docs/reference/generics.html#upper-bounds), [Java](https://docs.oracle.com/javase/tutorial/java/generics/bounded.html)).\n\nWhen the shape of an N-dimensional array is known at compile-time, we can use [type-level integers](shipshape/src/main/kotlin/ai/hypergraph/shipshape/DimGen.kt) to ensure shape conforming tensor operations (inspired by [Nexus](https://github.com/ctongfei/nexus) and others).\n\nAllowing users to specify a matrix's structure in its type signature, (e.g., `Singular`, `Symmetric`, `Orthogonal`, `Unitary`, `Hermitian`, `Toeplitz`) would allow us to specialize derivation over such matrices (cf. [section 2.8](https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf#page=14) of The Matrix Cookbook).\n\n### Church encoding\n\nComputers appear to be very complicated machines. Beneath this complexity lies a remarkably simple idea: many apparently complex routines can be rewritten in terms of function composition. Consider the binary operator `^`, which can be lowered as follows:\n\n```\na ^ b :=  a * ... * a \n          \\_________/\n            b times\na * b :=  a + ... + a \n          \\_________/\n            b times\na + b :=  a + 1 + ... + 1\n              \\_________/\n                b times\na := next*(next(...next(1)...))\n     \\________________/\n          a times\n```\n\u0026lowast; `next` is also called `S` in [Peano arithmetic](https://en.wikipedia.org/wiki/Successor_function).\n\nBy using the λ-calculus, Church [tells us](https://compcalc.github.io/public/church/church_calculi_1941.pdf#page=9), we can lower a large portion of mathematics onto a single operator: function application. Curry, by way of [Schönfinkel](https://writings.stephenwolfram.com/data/uploads/2020/12/Schonfinkel-OnTheBuildingBlocksOfMathematicalLogic.pdf), gives us combinatory logic, a kind of Rosetta stone for deciphering and translating between a host of cryptic languages. 
These two ideas, λ-calculus and combinators, are keys to unlocking many puzzles in computer science and mathematics.\n\nThough mathematically elegant, Church numerals are not particularly efficient or pleasant to read. One discovers that trying to encode Church arithmetic in a language without dependent types quickly grows impractical. By selecting a higher radix, however, it is possible to reduce spatial complexity and improve readability, albeit at the cost of increased temporal complexity on certain operations (e.g., `+` and `-`). Kotlin∇ uses a [binary encoding](#type-arithmetic) by default; however, generators for other bases are also provided for convenience.\n\n### Type classes\n\nThe trouble with numerical towers is that they assume all inheritors are aware of the tower. In practice, many types we would like to reuse are entirely oblivious to our DSL. How do we allow users to bring in existing types without needing to modify their source code? This kind of [ad hoc polymorphism](https://en.wikipedia.org/wiki/Ad_hoc_polymorphism) can be achieved using a pattern called the [type class](https://en.wikipedia.org/wiki/Type_class). While the JVM does not allow multiple inheritance on classes, it does support multiple inheritance and [default methods](https://docs.oracle.com/javase/tutorial/java/IandI/defaultmethods.html) on interfaces, allowing users to implement an interface via delegation rather than inheritance.\n\nSuppose we have a base type, `Nat`, defined as an interface with a unitary member, `nil`, and its successor function, `next`, representing the [Church encoding](https://en.wikipedia.org/wiki/Church_axioms) for natural numbers. 
To emulate instantiation, we can provide a [nested class](https://kotlinlang.org/docs/nested-classes.html) equipped with a constructor overriding `nil` and `next` as follows:\n\n```kotlin\ninterface Nat\u003cT\u003e {\n  val nil: T\n  val one: T get() = nil.next()\n  fun T.next(): T\n\n  class of\u003cT\u003e(\n    override val nil: T,\n    val vnext: T.() -\u003e T\n  ): Nat\u003cT\u003e {\n    override fun T.next(): T = vnext()\n  }\n}\n```\n\nNow, if we wanted to wrap an external type, such as `Double`, inside our tower, we could do so as follows:\n\n```kotlin\nval doubleNat = Nat.of(nil = 0.0) { this + 1.0 }\n```\n\nAlthough the `Nat` interface is very expressive, evaluating arithmetic expressions on `Nat`s can be computationally expensive. For instance, we could define the first three [hyperoperations](https://en.wikipedia.org/wiki/Hyperoperation) naïvely as follows:\n\n```kotlin\ntailrec fun \u003cT\u003e Nat\u003cT\u003e.plus(l: T, r: T, acc: T = l, i: T = nil): T =\n  if (i == r) acc else plus(l, r, acc.next(), i.next())\n\ntailrec fun \u003cT\u003e Nat\u003cT\u003e.times(l: T, r: T, acc: T = nil, i: T = nil): T =\n  if (i == r) acc else times(l, r, acc + l, i.next())\n\ntailrec fun \u003cT\u003e Nat\u003cT\u003e.pow(base: T, exp: T, acc: T = one, i: T = one): T =\n  if (i == exp) acc else pow(base, exp, acc * base, i.next())\n```\n\nHowever, we note that computing `pow(a, b)` using this representation requires 𝓞(a↑b) operations using [Knuth notation](https://en.wikipedia.org/wiki/Knuth%27s_up-arrow_notation). Clearly, we must do better if this encoding is to be usable. 
We can make `Nat` more efficient by introducing a subtype, `Group`, which forces implementors to define a native addition operator:\n\n```kotlin\ninterface Group\u003cT\u003e: Nat\u003cT\u003e {\n  override fun T.next(): T = this + one\n  override fun T.plus(t: T): T\n\n  class of\u003cT\u003e(\n    override val nil: T, override val one: T,\n    val plus: (T, T) -\u003e T\n  ): Group\u003cT\u003e {\n    override fun T.plus(t: T) = plus(this, t)\n  }\n}\n```\n\nGiven a `Group`, we can now define a more efficient implementation of Fibonacci. This will use the group-specific addition operator:\n\n```kotlin\ntailrec fun \u003cT\u003e Nat\u003cT\u003e.fibonacci(\n  n: T,\n  seed: Pair\u003cT, T\u003e = nil to one,\n  fib: (Pair\u003cT, T\u003e) -\u003e Pair\u003cT, T\u003e = { (a, b) -\u003e b to a + b },\n  i: T = nil,\n): T =\n  if (i == n) fib(seed).first\n  else fibonacci(n = n, seed = fib(seed), i = i.next())\n\nval doubleGroup = Group.of(one = 1.0, plus = { a, b -\u003e a + b })\nprintln(doubleGroup.fibonacci(10.0)) // Prints: 233.0\n```\n\nWe could further extend this chain by introducing a subtype called `Ring`, which overrides `+` and requires implementors to define a native `*` operator. 
`Ring`s and their relatives are known to have many useful applications in [graph theory](https://github.com/breandan/kaliningraph#algebra) and [statistics](https://github.com/breandan/markovian#algebraic-methods):\n\n```kotlin\ninterface Ring\u003cT\u003e: Group\u003cT\u003e {\n  override fun T.plus(t: T): T\n  override fun T.times(t: T): T\n\n  class of\u003cT\u003e(\n    override val nil: T, override val one: T,\n    val plus: (T, T) -\u003e T,\n    val times: (T, T) -\u003e T\n  ): Ring\u003cT\u003e {\n    override fun T.plus(t: T) = plus(this, t)\n    override fun T.times(t: T) = times(this, t)\n  }\n}\n\nval doubleRing = Ring.of(one = 1.0, plus = { a, b -\u003e a + b }, times = { a, b -\u003e a * b })\n```\n\nSince differentiation is a [linear map](https://en.wikipedia.org/wiki/Linear_map) between function spaces, we now have the primitives necessary to build a fully-generic AD system, and could easily implement the [sum and product rules](https://compcalc.github.io/public/pytorch/ad_pytorch.pdf#page=6). To view the above example in full, see [`Types.kt`](https://github.com/breandan/kaliningraph/blob/master/src/commonMain/kotlin/ai/hypergraph/kaliningraph/types/Types.kt).\n\nWhat benefit does this abstraction provide to the end user? By parameterizing over primitive operators, Kotlin∇ consumers can easily swap out a tensor backend without needing to alter or recompile any upstream dependencies. This feature makes multiplatform development a breeze: wherever a type class operator (e.g., `+` or `*`) with matching signature is encountered across a project, it will be dispatched to the user-supplied lambda delegate for specialized execution on custom hardware. 
Runtime indirection can be elided with proper compiler inlining for zero-cost abstraction.\n\n### Type arithmetic\n\nBy default, Kotlin∇ supports compile time type arithmetic in the following domain:\n\n* Fully symmetric arithmetic: `{ a ⍟ b ϵ [0..16){+,-,*}[0..16) | 0 ≤ a ⍟ b }`\n* Asymmetric arithmetic: `{ a ⍟ b ϵ [0..512){+,-}[0..16) | 0 ≤ a ⍟ b \u003c 512 }`\n* Semi-symmetric arithmetic: `{ a / b = c, a = b * c | a, b, c ϵ [0..128) \u0026 a % b = 0 }`\n\nArithmetic outside this domain is checked at runtime, prior to evaluation.\n\nCompile time type arithmetic is achieved by generating a type-level representation of the [Church encoding](#church-encoding). A usage example is shown in [`ChurchArithmeticTest.kt`](/core/src/commonTest/kotlin/ai/hypergraph/kotlingrad/typelevel/church/ChurchArithmeticTest.kt), which may be run with the following command:\n\n```sh\n./gradlew :kotlingrad:cleanJvmTest :kotlingrad:jvmTest --tests \"ai.hypergraph.kotlingrad.typelevel.church.ChurchArithmeticTest\"\n```\n\nExtensions to other bases, including [binary](/core/src/commonTest/kotlin/ai/hypergraph/kotlingrad/typelevel/binary/BinaryArithmeticTest.kt) and [decimal](/core/src/commonTest/kotlin/ai/hypergraph/kotlingrad/typelevel/chinese/AbacusTest.kt) are also provided, which may be used as follows:\n\n```kotlin\n// Boolean arithmetic\nval b32 = T.F\n  .let { it + T.F }   // B_4\u003cØ\u003e\n  .let { it + T.F.F } // B_8\u003cØ\u003e\n  .let { it + T.T }   // T\u003cT\u003cF\u003cT\u003cØ\u003e\u003e\u003e\u003e\n  .let { it + T.F }   // T\u003cF\u003cT\u003cT\u003cØ\u003e\u003e\u003e\u003e\n  .let { it - T.F }   // T\u003cT\u003cF\u003cT\u003cØ\u003e\u003e\u003e\u003e\n  .let { it + T.F }   // T\u003cF\u003cT\u003cT\u003cØ\u003e\u003e\u003e\u003e\n  .let { it + T.F }   // T\u003cT\u003cT\u003cT\u003cØ\u003e\u003e\u003e\u003e\n  .let { it + T }     // T\u003cF\u003cF\u003cF\u003cØ\u003e\u003e\u003e\u003e\n\nassertEquals(T.F.F.F.F, b32)\n\n// Chinese arithmetic\nval 四十二 = (十七 减 九)\n  
.let { it 加 it }        // 六\u003c一\u003c无\u003e\u003e\n  .let { (it 加 八) 加 六 } // 零\u003c三\u003c无\u003e\u003e\n  .let { (it 减 三) 加 九 } // 六\u003c三\u003c无\u003e\u003e\n  .let { (it 加 六) 除 六 } // 七\u003c无\u003e\n  .let { (it 乘 六) 加 五 } // 七\u003c四\u003c无\u003e\u003e\n  .let { (it 减 三) 减 九 } // 五\u003c三\u003c无\u003e\u003e\n  .let { (it 加 五) 加 二 } // 二\u003c四\u003c无\u003e\u003e\n  .also { assertEquals(六 乘 七, it) }\n\nassertEquals(42, 四十二.toInt())\n```\n\nTo alter the arithmetic domain, edit the file [`BinGen.kt`](/shipshape/src/main/kotlin/ai/hypergraph/shipshape/BinGen.kt)/[`算盘厂.kt`](/shipshape/src/main/kotlin/ai/hypergraph/shipshape/算盘厂.kt), then use the following command to regenerate [`Arithmetic.kt`](/core/src/commonMain/gen/ai/hypergraph/kotlingrad/typelevel/binary/Arithmetic.kt)/[`算盘.kt`](/core/src/commonMain/gen/ai/hypergraph/kotlingrad/typelevel/chinese/算盘.kt):\n\n```sh\n./gradlew genShapes\n```\n\nIn practice, compile time type arithmetic may struggle to compute numbers in excess of `4095`. The Kotlin team has been informed of these issues:\n\n* [KT-30040](https://youtrack.jetbrains.com/issue/KT-30040)\n* [~~KT-50466~~](https://youtrack.jetbrains.com/issue/KT-50466)\n* [KT-50533](https://youtrack.jetbrains.com/issue/KT-50533)\n* [KT-50553](https://youtrack.jetbrains.com/issue/KT-50553)\n* [~~KT-50617~~](https://youtrack.jetbrains.com/issue/KT-50617)\n\nThis API is experimental and subject to change without notice. 
In the future, it will be used to statically type check tensor functions whose output shape is an arithmetic function of the input shapes, e.g., concatenation, splitting and [convolution](https://arxiv.org/pdf/1603.07285.pdf).\n\n## Grammar\n\nFor a detailed grammar and semantics, please refer to [the Kotlin∇ specification](specification.md).\n\n## UML Diagram\n\nThe following graph depicts the subtyping relation between classes and interfaces in Kotlin∇.\n\n[![](samples/src/main/resources/uml_diagram.svg)](https://raw.githubusercontent.com/breandan/kotlingrad/master/samples/src/main/resources/uml_diagram.svg)\n\n## Comparison\n\nUnlike certain frameworks which simply wrap an existing AD library in a type-safe DSL, Kotlin∇ contains a fully shape-safe implementation of algorithmic differentiation, written in pure Kotlin. By doing so, it can leverage Kotlin language features such as typed functional programming, as well as interoperability with other languages on the JVM platform. Furthermore, it implements [symbolic differentiation](http://breandan.net/public/masters_thesis.pdf#2a), which unlike Wengert tape or dual-number based ADs, allows it to calculate derivatives of arbitrarily high order with zero extra engineering required. 
Further details can be found below.\n\n|                                    Framework                                     | Language |        SD¹         |        AD²         |        HD³         |        DP⁴         |        FP⁵         |        TS⁶         |        SS⁷         |        DT⁸         |        MP⁹         |\n|:--------------------------------------------------------------------------------:|:--------:|:------------------:|:------------------:|:------------------:|:------------------:|:------------------:|:------------------:|:------------------:|:------------------:|:------------------:|\n|                                     Kotlin∇                                      |  Kotlin  | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |   :construction:   | :heavy_check_mark: |\n|               [DiffSharp](https://diffsharp.github.io/DiffSharp/)                |    F#    |        :x:         | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |        :x:         |        :x:         |        :x:         |\n|       [TensorFlow.FSharp](https://github.com/fsprojects/TensorFlow.FSharp)       |    F#    |        :x:         |        :x:         |        :x:         | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |        :x:         |        :x:         |\n|               [shapesafe](https://github.com/tribbloid/shapesafe)                |  Scala   |   :construction:   |   :construction:   |   :construction:   |   :construction:   | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |   :construction:   |        :x:         |\n|                        [Nexus](https://tongfei.me/nexus/)                        |  Scala   |        :x:         | :heavy_check_mark: |        :x:         | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |   
     :x:         |        :x:         |\n|                [Lantern](https://feiwang3311.github.io/Lantern/)                 |  Scala   |        :x:         | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |        :x:         |        :x:         |        :x:         |\n|           [Hipparchus](https://github.com/Hipparchus-Math/hipparchus)            |   Java   |        :x:         | :heavy_check_mark: |        :x:         |        :x:         |        :x:         | :heavy_check_mark: |        :x:         |        :x:         |        :x:         |\n|                [JAutoDiff](https://github.com/uniker9/JAutoDiff/)                |   Java   | :heavy_check_mark: | :heavy_check_mark: |        :x:         |        :x:         |        :x:         | :heavy_check_mark: |        :x:         |        :x:         |        :x:         |\n|                   [Eclipse DL4J](https://deeplearning4j.org/)                    |   Java   |        :x:         |   :construction:   |        :x:         |        :x:         |        :x:         | :heavy_check_mark: |        :x:         |        :x:         |        :x:         |\n|               [SICMUtils](https://github.com/sicmutils/sicmutils)                | Clojure  | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |        :x:         |        :x:         |        :x:         |        :x:         |\n|                        [Halide](https://halide-lang.org/)                        |   C++    |        :x:         | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |        :x:         | :heavy_check_mark: |        :x:         |        :x:         |        :x:         |\n|              [Tensor Safe](https://github.com/leopiney/tensor-safe)              | Haskell  |        :x:         |        :x:         |        :x:         |        :x:         | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | 
:heavy_check_mark: |        :x:         |\n|               [HaskTorch](https://github.com/hasktorch/hasktorch)                | Haskell  |        :x:         |        :x:         |        :x:         |        :x:         | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |        :x:         |        :x:         |\n|                [Dex](https://github.com/google-research/dex-lang)                | Haskell  |        :x:         | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |   :construction:   |        :x:         |\n|                [Grenade](https://github.com/HuwCampbell/grenade)                 | Haskell  |        :x:         |        :x:         |        :x:         |        :x:         | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |        :x:         |        :x:         |\n|           [Stalin∇](https://github.com/Functional-AutoDiff/STALINGRAD)           |  Scheme  |        :x:         | :heavy_check_mark: |        :x:         |        :x:         | :heavy_check_mark: |        :x:         |        :x:         |        :x:         |        :x:         |\n|                    [Myia](https://github.com/mila-udem/myia)                     |  Python  | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |        :x:         |        :x:         |        :x:         |   :construction:   |\n|                  [Autograd](https://github.com/HIPS/autograd/)                   |  Python  |        :x:         | :heavy_check_mark: |        :x:         |        :x:         |        :x:         |        :x:         |        :x:         |        :x:         |        :x:         |\n|                       [JAX](https://github.com/google/jax)                       |  Python  |        :x:         | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |        :x:         |        :x:         
|        :x:         |   :construction:   |\n|                   [Tangent](https://github.com/google/tangent)                   |  Python  |        :x:         | :heavy_check_mark: |        :x:         |        :x:         |        :x:         |        :x:         |        :x:         |        :x:         |        :x:         |\n| [Analitik](https://link.springer.com/content/pdf/10.1007/BF01070461.pdf#page=39) | Analitik | :heavy_check_mark: |        :x:         |        :x:         |        :x:         | :heavy_check_mark: |        :x:         |        :x:         |        :x:         |        :x:         |\n\n¹ Symbolic differentiation*, ² Automatic differentiation*, ³ Higher-order/rank differentiation, ⁴ Differentiable programming*, ⁵ Functional programming, ⁶ Compile-time type safety, ⁷ Compile-time shape safety, ⁸ Dependently Typed, ⁹ Multiplatform\n\n\u003csup\u003e\u0026lowast;\u003c/sup\u003e Although we do not distinguish between AD and SD, here we adopt the authors' preferred nomenclature. We do make a distinction between differentiable programming libraries and those which simply construct neural networks. The :construction: symbol indicates work in progress.\n\n## References\n\nTo the author's knowledge, Kotlin∇ is the first AD implementation in native Kotlin. While the particular synthesis of these ideas (i.e. shape-safe, functional AD, using generic types) is unique, it has been influenced by a long list of prior work in AD. 
Below is a list of projects and publications that helped inspire this work.\n\n### Automatic differentiation\n\n* [The Simple Essence of Automatic Differentiation](http://conal.net/papers/essence-of-ad/essence-of-ad-icfp.pdf)\n* [Reverse-Mode AD in a Functional Framework: Lambda the Ultimate Backpropagator](http://www-bcl.cs.may.ie/~barak/papers/toplas-reverse.pdf)\n* [Automatic differentiation in ML: Where we are and where we should be going](https://papers.nips.cc/paper/8092-automatic-differentiation-in-ml-where-we-are-and-where-we-should-be-going.pdf)\n* [A Leibniz Notation for Automatic Differentiation](https://uhra.herts.ac.uk/bitstream/handle/2299/8933/904722.pdf)\n* [First-Class Automatic Differentiation in Swift: A Manifesto](https://gist.github.com/rxwei/30ba75ce092ab3b0dce4bde1fc2c9f1d)\n* [The (JAX) Autodiff Cookbook](https://colab.research.google.com/github/google/jax/blob/master/notebooks/autodiff_cookbook.ipynb)\n* [Automatic Differentiation in PyTorch](https://openreview.net/pdf?id=BJJsrmfCZ)\n* [Automatic Differentiation in Machine Learning: a Survey](http://jmlr.org/papers/volume18/17-468/17-468.pdf)\n* [Complexity of Derivatives Generated by Symbolic Differentiation](https://doi.org/10.1007/978-3-642-57201-2_12)\n* [Eigen-AD: Algorithmic Differentiation of the Eigen Library](https://arxiv.org/pdf/1911.12604.pdf)\n\n### Complexity\n\n* [Fast parallel computation of polynomials using few processors](http://www.cs.tau.ac.il/~amnon/Classes/2015-PRG/Papers/VSBR83.pdf), Valiant and Skyum (1983)\n* [The complexity of partial derivatives](https://core.ac.uk/download/pdf/82480031.pdf), Baur and Strassen (1983)\n* [Lower Bounds on Arithmetic Circuits via Partial Derivatives](https://www.math.ias.edu/~avi/PUBLICATIONS/MYPAPERS/NW96/final.pdf)\n* [Learning Restricted Models of Arithmetic Circuits](https://www.cs.tau.ac.il/~shpilka/publications/KlivansShpilka_Learning_via_partial_derivatives.pdf)\n\n### Differentiable programming\n\n* [Neural Networks, Types, 
and Functional Programming](https://colah.github.io/posts/2015-09-NN-Types-FP/)\n* [Backpropagation with Continuation Callbacks: Foundations for Efficient and Expressive Differentiable Programming](https://papers.nips.cc/paper/8221-backpropagation-with-callbacks-foundations-for-efficient-and-expressive-differentiable-programming.pdf)\n* [Backprop as Functor: A compositional perspective on supervised learning](https://arxiv.org/pdf/1711.10455.pdf)\n* [Demystifying Differentiable Programming: Shift/Reset the Penultimate Backpropagator](https://www.cs.purdue.edu/homes/rompf/papers/wang-preprint201811.pdf)\n* [Efficient Differentiable Programming in a Functional Array-Processing Language](https://arxiv.org/pdf/1806.02136.pdf)\n* [Operational Calculus for Differentiable Programming](https://arxiv.org/pdf/1610.07690.pdf)\n* [Differentiable Functional Programming](https://www.robots.ox.ac.uk/~gunes/assets/pdf/baydin-2016-slides-functionallondoners.pdf)\n* [Differentiable Programming for Image Processing and Deep Learning in Halide](https://people.csail.mit.edu/tzumao/gradient_halide/gradient_halide.pdf)\n* [Software 2.0](https://medium.com/@karpathy/software-2-0-a64152b37c35)\n\n### Calculus\n\n* [The Matrix Calculus You Need For Deep Learning](https://explained.ai/matrix-calculus/index.html), Parr and Howard (2018)\n* [Backpropagation in matrix notation](https://arxiv.org/pdf/1707.02746.pdf), Mishachev (2017)\n* [Matrix derivatives](https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf#derivatives), from the Matrix Cookbook, Petersen and Pedersen (2012)\n* [Div, Grad, Curl and All That](https://archive.org/details/H.M.ScheyDivGradCurlAndAllThat), Schey\n* [Matrix Differentiation (and some other stuff)](https://atmos.washington.edu/~dennis/MatrixCalculus.pdf), Barnes (2006)\n* [Symbolic Matrix Derivatives](https://www.jstor.org/stable/2236019), Dwyer and Macphail (1948)\n\n### Computer algebra\n\n* [Towards an API for the real numbers](https://doi.org/10.1145/3395658), 
Boehm (2020)\n* [miniKanren as a Tool for Symbolic Computation in Python](https://arxiv.org/pdf/2005.11644.pdf), Willard (2020)\n* [A Design Proposal for an Object Oriented Algebraic Library](https://pdfs.semanticscholar.org/6fd2/88960ef83469c898a3d8ed8f0950e7839625.pdf), Niculescu (2003)\n* [On Using Generics for Implementing Algebraic Structures](https://www.cs.ubbcluj.ro/~studia-i/contents/2011-4/02-Niculescu.pdf), Niculescu (2011)\n* [How to turn a scripting language into a domain-specific language for computer algebra](https://arxiv.org/pdf/0811.1061.pdf), Jolly and Kredel (2008)\n* [Evaluation of a Java Computer Algebra System](https://pdfs.semanticscholar.org/ce81/39a9008bdc7d23be0ff05ef5a16d512b352c.pdf), Kredel (2007)\n* [Typesafe Abstractions for Tensor Operations](https://arxiv.org/pdf/1710.06892.pdf), Chen (2017)\n* [Einstein Summation in Numpy](https://obilaniu6266h16.wordpress.com/2016/02/04/einstein-summation-in-numpy/), Bilaniuk (2016)\n* [Issues in Computer Algebra](https://www.cs.rit.edu/~anh/comp_alg.html), Nunes-Harwitt\n* [Term Rewriting and All That](https://www21.in.tum.de/~nipkow/TRaAT/), Baader and Nipkow (1998)\n* [Describing the syntax of programming languages using conjunctive and Boolean grammars](http://users.utu.fi/aleokh/papers/conj_bool_programming.pdf), Okhotin (2016)\n* [Formal languages over GF(2)](https://users.math-cs.spbu.ru/~okhotin/papers/formal_languages_gf2.pdf), Okhotin (2019)\n\n### Symbolic mathematics\n\n* [KMath](https://github.com/altavir/kmath) - Kotlin mathematics extensions library\n* [SymJa](https://github.com/axkr/symja_android_library/) - Computer algebra language \u0026 symbolic math library for Android\n* [tensor](https://github.com/amodeus-science/tensor) - Linear algebra for tensors with symbolic and numeric scalars\n* [Hipparchus](https://github.com/Hipparchus-Math/hipparchus) - An efficient, general-purpose mathematics components library in the Java programming language\n* 
[miniKanren](http://minikanren.org/) - A tool for symbolic computation and logic programming\n* [SymJava](https://github.com/yuemingl/SymJava) - A Java library for fast symbolic-numeric computation\n* [JAS](https://github.com/kredel/java-algebra-system) - Java Algebra System\n* [jalgebra](https://github.com/mdgeorge4153/jalgebra) - An abstract algebra library for Java\n* [COJAC](https://github.com/Cojac/Cojac) - Numerical sniffing tool and Enriching number wrapper for Java\n* [chebfun](https://www.chebfun.org) - Allows representing functions as [Chebyshev polynomials](https://en.wikipedia.org/wiki/Chebyshev_polynomials), for easy symbolic differentiation (or integration)\n* [horeilly1101/deriv](https://github.com/horeilly1101/deriv) - Open source derivative calculator REST API (and Java library)\n\n### Neural networks\n\n* [Hacker's Guide to Neural Networks](https://karpathy.github.io/neuralnets/), Karpathy (2014)\n* [Tricks from Deep Learning](https://arxiv.org/pdf/1611.03777.pdf), Baydin et al. 
(2016)\n* [Practical Dependent Types in Haskell: Type-Safe Neural Networks](https://blog.jle.im/entry/practical-dependent-types-in-haskell-1.html), Le (2016)\n* [A guide to convolutional arithmetic for deep learning](https://arxiv.org/pdf/1603.07285.pdf), Dumoulin and Visin (2018)\n\n### Type systems\n\n* [Generalized Algebraic Data Types and Object-Oriented Programming](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/gadtoop.pdf), Kennedy and Russo (2005)\n* [Java Generics are Turing Complete](https://arxiv.org/pdf/1605.05274.pdf), Grigore (2016)\n* [Dimension Types](https://link.springer.com/content/pdf/10.1007%2F3-540-57880-3_23.pdf), Kennedy (1994)\n* [An algebraic view of dimension types](https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-391.pdf#page=145), Kennedy (1996)\n* [Type Inference and Unification](https://www.cs.cornell.edu/courses/cs3110/2011sp/Lectures/lec26-type-inference/type-inference.htm)\n* [Constructive mathematics and computer programming](https://royalsocietypublishing.org/doi/pdf/10.1098/rsta.1984.0073), Martin-Löf (1984)\n* [Programming in Martin-Löf's Type Theory](http://www.cse.chalmers.se/research/group/logic/book/book.pdf#page=23), Nordström et al. (1990)\n\n### Domain-specific languages\n\n* [Compiling Embedded Languages](http://conal.net/papers/jfp-saig/compile-dsel.pdf), Elliott et al. (2003)\n* [Implicit Staging of EDSL Expressions: A Bridge between Shallow and Deep Embedding](https://static.csg.ci.i.u-tokyo.ac.jp/papers/14/scherr-ecoop2014.pdf), Scherr and Chiba (2014)\n* [DSL Implementation Using Staging and Monads](https://dl.acm.org/doi/pdf/10.1145/331963.331975), Sheard et al. (1999)\n* [Deeply Reifying Running Code for Constructing a Domain-Specific Language](https://dl.acm.org/doi/pdf/10.1145/2972206.2972219), Chiba et al. (2016)\n* [Staged Abstract Interpreters](https://www.cs.purdue.edu/homes/rompf/papers/wei-oopsla19.pdf), Wei et al. 
(2019)\n* [Generating Fluent Embedded Domain-Specific Languages with Subchaining](https://static.csg.ci.i.u-tokyo.ac.jp/papers/19/nakamaru-jcl50.pdf), Nakamaru et al. (2019)\n* [Generating a Generic Fluent API in Java](https://arxiv.org/pdf/2002.06179.pdf), Nakamaru and Chiba (2020)\n* [Fling – A Fluent API Generator](https://drops.dagstuhl.de/opus/volltexte/2019/10805/pdf/LIPIcs-ECOOP-2019-13.pdf), Gil and Roth (2019)\n* [Scripting an IDE for EDSL awareness](https://ilyasergey.net/papers/groovy-dsl.pdf), Sergey et al. (2011)\n\n### Automated testing\n\n* [DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars](https://arxiv.org/pdf/1708.08559.pdf), Tian et al. (2018)\n* [QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs](https://www.eecs.northwestern.edu/~robby/courses/395-495-2009-fall/quick.pdf), Claessen and Hughes (2000)\n* [Learning to Discover Efficient Mathematical Identities](https://papers.nips.cc/paper/5350-learning-to-discover-efficient-mathematical-identities.pdf), Zaremba et al. 
(2014)\n\n### AD libraries\n\n* [TensorFlow.FSharp](https://github.com/fsprojects/TensorFlow.FSharp) - An eDSL for writing numerical models in F# with support for interactive tensor shape-checking\n* [Stalin∇](https://github.com/Functional-AutoDiff/STALINGRAD) - A brutally optimizing compiler for the VLAD language, a pure dialect of Scheme with first-class automatic differentiation operators\n* [Autograd](https://github.com/hips/autograd) - Efficiently computes derivatives of NumPy code\n* [Myia](https://github.com/mila-udem/myia) - SCT based AD, adapted from Pearlmutter \u0026 Siskind's \"Reverse Mode AD in a functional framework\"\n* [JAX](https://github.com/google/jax) - Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more\n* [Dex](https://github.com/google-research/dex-lang) - Research language for array processing in the Haskell/ML family\n* [Nexus](https://github.com/ctongfei/nexus) - Type-safe tensors, deep learning and probabilistic programming in Scala\n* [Tangent](https://github.com/google/tangent) - \"Source-to-Source Debuggable Derivatives in Pure Python\"\n* [Grenade](https://github.com/HuwCampbell/grenade) - Composable, dependently typed, practical, and fast RNNs in Haskell\n* [Lantern](https://feiwang3311.github.io/Lantern/) - A framework in Scala, based on delimited continuations and multi-stage programming\n* [JAutoDiff](https://github.com/uniker9/JAutoDiff) - An Automatic Differentiation Library\n* [DiffSharp](https://github.com/DiffSharp/DiffSharp) - A functional AD library implemented in the F# language\n* [Analitik](https://link.springer.com/content/pdf/10.1007/BF01070461.pdf) - Algebraic language for the description of computing processes using analytical transformations\n\n## Special thanks\n\nThe following individuals have helped shape this project through their enthusiasm and thoughtful feedback. 
Please check out their work.\n\n* [Liam Paull](http://liampaull.ca)\n* [Michalis Famelis](https://michalis.famelis.info/)\n* [Marc Feeley](http://www.iro.umontreal.ca/~feeley/)\n* [Eugene Syriani](http://www-ens.iro.umontreal.ca/~syriani/)\n* [Hanneli Tavante](http://hannelita.com/)\n* [Stefan Monnier](https://www.iro.umontreal.ca/~monnier/)\n* [Alexander Nozik](https://scholar.google.com/citations?user=B-WJi4kAAAAJ)\n* [Erik Meijer](https://twitter.com/headinthebox/)\n* [Krishna Murthy](https://krrish94.github.io/)\n* [Maxime Chevalier-Boisvert](https://pointersgonewild.com/)\n* [Kiran Gopinathan](https://scholar.google.com/citations?user=IcuGXgcAAAAJ\u0026hl=en)\n* [Jacob Miller](https://scholar.google.ca/citations?user=xG3VWpEAAAAJ)\n* [Adam Pocock](http://www.adampocock.com/)\n* [Torsten Scholak](https://tscholak.github.io/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbreandan%2Fkotlingrad","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbreandan%2Fkotlingrad","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbreandan%2Fkotlingrad/lists"}