{"id":16348846,"url":"https://github.com/mstksg/backprop","last_synced_at":"2026-03-05T01:33:58.775Z","repository":{"id":17817920,"uuid":"82775599","full_name":"mstksg/backprop","owner":"mstksg","description":"Heterogeneous automatic differentiation (\"backpropagation\") in Haskell","archived":false,"fork":false,"pushed_at":"2025-06-05T04:02:01.000Z","size":11657,"stargazers_count":191,"open_issues_count":5,"forks_count":22,"subscribers_count":16,"default_branch":"master","last_synced_at":"2025-08-20T19:46:59.270Z","etag":null,"topics":["automatic-differentiation","backprop","backpropagation","deep-learning","gradient-descent","graph","neural-network"],"latest_commit_sha":null,"homepage":"https://backprop.jle.im","language":"Haskell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mstksg.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-02-22T07:43:46.000Z","updated_at":"2025-08-20T13:19:39.000Z","dependencies_parsed_at":"2023-01-11T19:42:05.467Z","dependency_job_id":"470fba5f-c0b1-4d86-85d5-69bd7b354795","html_url":"https://github.com/mstksg/backprop","commit_stats":null,"previous_names":[],"tags_count":21,"template":false,"template_full_name":null,"purl":"pkg:github/mstksg/backprop","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mstksg%2Fbackprop","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mstksg%2Fbackprop/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mstksg%2Fbackprop/releases","manifests_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/mstksg%2Fbackprop/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mstksg","download_url":"https://codeload.github.com/mstksg/backprop/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mstksg%2Fbackprop/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30104536,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-05T01:06:53.091Z","status":"ssl_error","status_checked_at":"2026-03-05T01:02:35.679Z","response_time":59,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automatic-differentiation","backprop","backpropagation","deep-learning","gradient-descent","graph","neural-network"],"created_at":"2024-10-11T00:55:22.838Z","updated_at":"2026-03-05T01:33:53.759Z","avatar_url":"https://github.com/mstksg.png","language":"Haskell","funding_links":[],"categories":["Haskell"],"sub_categories":[],"readme":"[backprop][docs]\n================\n\n[![backprop on Hackage](https://img.shields.io/hackage/v/backprop.svg?maxAge=86400)](https://hackage.haskell.org/package/backprop)\n[![backprop on Stackage LTS 11](http://stackage.org/package/backprop/badge/lts-11)](http://stackage.org/lts-11/package/backprop)\n[![backprop on Stackage 
Nightly](http://stackage.org/package/backprop/badge/nightly)](http://stackage.org/nightly/package/backprop)\n[![Build Status](https://travis-ci.org/mstksg/backprop.svg?branch=master)](https://travis-ci.org/mstksg/backprop)\n\n[![Join the chat at https://gitter.im/haskell-backprop/Lobby](https://badges.gitter.im/haskell-backprop/Lobby.svg)](https://gitter.im/haskell-backprop/Lobby?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge)\n\n[**Documentation and Walkthrough**][docs]\n\n[docs]: https://backprop.jle.im\n\nAutomatic *heterogeneous* back-propagation.\n\nWrite your functions to compute your result, and the library will automatically\ngenerate functions to compute your gradient.\n\nDiffers from [ad][] by offering full heterogeneity -- each intermediate step\nand the resulting value can have different types (matrices, vectors, scalars,\nlists, etc.).\n\n[ad]: http://hackage.haskell.org/package/ad\n\nUseful for applications in [differentiable programming][dp] and deep learning\nfor creating and training numerical models, especially as described in this\nblog post on [a purely functional typed approach to trainable models][models].\nOverall, intended for the implementation of gradient descent and other numeric\noptimization techniques.  Comparable to the python library [autograd][].\n\n[dp]: https://www.facebook.com/yann.lecun/posts/10155003011462143\n[models]: https://blog.jle.im/entry/purely-functional-typed-models-1.html\n[autograd]: https://github.com/HIPS/autograd\n\nCurrently up on [hackage][], with haddock documentation!  However, a proper\nlibrary introduction and usage tutorial [is available here][docs].  See also my\n[introductory blog post][blog].  
You can also find help or support on the\n[gitter channel][gitter].\n\n[hackage]: http://hackage.haskell.org/package/backprop\n[blog]: https://blog.jle.im/entry/introducing-the-backprop-library.html\n[gitter]: https://gitter.im/haskell-backprop/Lobby\n\nIf you want to provide *backprop* for users of your library, see this [guide\nto equipping your library with backprop][library].\n\n[library]: https://backprop.jle.im/08-equipping-your-library.html\n\n\nMNIST Digit Classifier Example\n------------------------------\n\nMy [blog post][blog] introduces the concepts in this library in the context of\ntraining a handwritten digit classifier.  I recommend reading that first.\n\nThere are some [literate haskell examples][mnist-lhs] in the source, though\n([rendered as pdf here][mnist-pdf]), which can be built (if [stack][] is\ninstalled) using:\n\n[mnist-lhs]: https://github.com/mstksg/backprop/blob/master/samples/backprop-mnist.lhs\n[mnist-pdf]: https://github.com/mstksg/backprop/blob/master/renders/backprop-mnist.pdf\n[stack]: http://haskellstack.org/\n\n```bash\n$ ./Build.hs exe\n```\n\nThere is a follow-up tutorial on using the library with more advanced types,\nwith extensible neural networks a la [this blog post][blog], [available as\nliterate haskell][neural-lhs] and also [rendered as a PDF][neural-pdf].\n\n[blog]: https://blog.jle.im/entries/series/+practical-dependent-types-in-haskell.html\n[neural-lhs]: https://github.com/mstksg/backprop/blob/master/samples/extensible-neural.lhs\n[neural-pdf]: https://github.com/mstksg/backprop/blob/master/renders/extensible-neural.pdf\n\nBrief example\n-------------\n\n(This is a really brief version of [the documentation walkthrough][docs] and my\n[blog post][blog])\n\nThe quick example below describes the running of a neural network with one\nhidden layer to calculate its squared error with respect to target `targ`,\nwhich is parameterized by two weight matrices and two bias vectors.\nVector/matrix types are from the *hmatrix* 
package.\n\nLet's make a data type to store our parameters, with convenient accessors using\n*[lens][]*:\n\n[lens]: http://hackage.haskell.org/package/lens\n\n```haskell\nimport Numeric.LinearAlgebra.Static.Backprop\n\ndata Network = Net { _weight1 :: L 20 100\n                   , _bias1   :: R 20\n                   , _weight2 :: L  5  20\n                   , _bias2   :: R  5\n                   }\n\nmakeLenses ''Network\n```\n\n(`R n` is an n-length vector, `L m n` is an m-by-n matrix, etc., `#\u003e` is\nmatrix-vector multiplication)\n\n\"Running\" a network on an input vector might look like this:\n\n```haskell\nrunNet net x = z\n  where\n    y = logistic $ (net ^^. weight1) #\u003e x + (net ^^. bias1)\n    z = logistic $ (net ^^. weight2) #\u003e y + (net ^^. bias2)\n\nlogistic :: Floating a =\u003e a -\u003e a\nlogistic x = 1 / (1 + exp (-x))\n```\n\nAnd that's it!  `runNet` is now backpropagatable!\n\nWe can \"run\" it using `evalBP2`:\n\n```haskell\nevalBP2 runNet :: Network -\u003e R 100 -\u003e R 5\n```\n\nIf we write a function to compute errors:\n\n```haskell\nsquaredError target output = error `dot` error\n  where\n    error = target - output\n```\n\nwe can \"test\" our networks:\n\n```haskell\nnetError target input net = squaredError (auto target)\n                                         (runNet net (auto input))\n```\n\nThis can be run, again:\n\n```haskell\nevalBP (netError myTarget myVector) :: Network -\u003e Double\n```\n\nNow, we just wrote a *normal function to compute the error of our network*.\nWith the *backprop* library, we now also have a way to *compute the gradient*,\nas well!\n\n```haskell\ngradBP (netError myTarget myVector) :: Network -\u003e Network\n```\n\nNow, we can perform gradient descent!\n\n```haskell\ngradDescent\n    :: R 100\n    -\u003e R 5\n    -\u003e Network\n    -\u003e Network\ngradDescent x targ n0 = n0 - 0.1 * gradient\n  where\n    gradient = gradBP (netError targ x) n0\n```\n\nTa dah!  
We were able to compute the gradient of our error function, just by\nsaying how to compute *the error itself*.\n\nFor a more fleshed-out example, see [the documentation][docs], my [blog\npost][blog] and the [MNIST tutorial][mnist-lhs] (also [rendered as a\npdf][mnist-pdf]).\n\nBenchmarks and Performance\n--------------------------\n\nHere are some basic benchmarks comparing the library's automatic\ndifferentiation process to \"manual\" differentiation by hand.  When using the\n[MNIST tutorial][bench] as an example:\n\n[bench]: https://github.com/mstksg/backprop/blob/master/bench/bench.hs\n\n![benchmarks](https://i.imgur.com/rLUx4x4.png)\n\nHere we compare:\n\n1.  \"Manual\" differentiation of a 784 x 300 x 100 x 10 fully-connected\n    feed-forward ANN.\n2.  Automatic differentiation using *backprop* and the lens-based accessor\n    interface\n3.  Automatic differentiation using *backprop* and the \"higher-kinded\n    data\"-based pattern matching interface\n4.  A hybrid approach that manually provides gradients for individual layers\n    but uses automatic differentiation for chaining the layers together.\n\nWe can see that simply *running* the network and functions (using `evalBP`)\nincurs virtually zero overhead.  This means that library authors could actually\nexport *only* backprop-lifted functions, and users would be able to use them\nwithout losing any performance.\n\nAs for computing gradients, there exists some associated overhead, from three\nmain sources.  Of these, the building of the computational graph and the\nWengert Tape wind up being negligible.  
For more information, see [a detailed\nlook at performance, overhead, and optimization techniques][performance] in the\ndocumentation.\n\n[performance]: https://backprop.jle.im/07-performance.html\n\nNote that the manual and hybrid modes almost overlap in the range of their\nrandom variances.\n\nComparisons\n-----------\n\n*backprop* can be compared and contrasted to many other similar libraries with\nsome overlap:\n\n1.  The *[ad][]* library (and variants like *[diffhask][]*) support automatic\n    differentiation, but only for *homogeneous*/*monomorphic* situations.  All\n    values in a computation must be of the same type --- so, your computation\n    might be the manipulation of `Double`s through a `Double -\u003e Double`\n    function.\n\n    *backprop* allows you to mix matrices, vectors, doubles, integers, and even\n    key-value maps as a part of your computation, and they will all be\n    backpropagated properly with the help of the `Backprop` typeclass.\n\n2.  The *[autograd][]* library is a very close equivalent to *backprop*,\n    implemented in Python for Python applications.  The difference between\n    *backprop* and *autograd* is mostly the difference between Haskell and\n    Python --- static types with type inference, purity, etc.\n\n3.  There is a link between *backprop* and deep learning/neural network\n    libraries like *[tensorflow][]*, *[caffe][]*, and *[theano][]*, which all\n    support some form of heterogeneous automatic differentiation.  Haskell\n    libraries doing similar things include *[grenade][]*.\n\n    These are all frameworks for working with neural networks or other\n    gradient-based optimizations --- they include things like built-in\n    optimizers, methods to automate training data, built-in models to use out\n    of the box.  
*backprop* could be used as a *part* of such a framework, like\n    I described in my [A Purely Functional Typed Approach to Trainable\n    Models][models] blog series; however, the *backprop* library itself does\n    not provide any built in models or optimizers or automated data processing\n    pipelines.\n\n[diffhask]: https://hackage.haskell.org/package/diffhask\n[tensorflow]: https://www.tensorflow.org/\n[caffe]: http://caffe.berkeleyvision.org/\n[theano]: http://www.deeplearning.net/software/theano/\n[grenade]: http://hackage.haskell.org/package/grenade\n\nSee [documentation][comparisons] for a more detailed look.\n\n[comparisons]: https://backprop.jle.im/09-comparisons.html\n\nTodo\n----\n\n1.  Benchmark against competing back-propagation libraries like *ad*, and\n    auto-differentiating tensor libraries like *[grenade][]*\n\n    [grenade]: https://github.com/HuwCampbell/grenade\n\n2.  Write tests!\n\n3.  Explore opportunities for parallelization.  There are some naive ways of\n    directly parallelizing right now, but potential overhead should be\n    investigated.\n\n4.  Some open questions:\n\n    a.  Is it possible to support constructors with existential types?\n\n    b.  How to support \"monadic\" operations that depend on results of previous\n        operations? (`ApBP` already exists for situations that don't)\n\n    c.  What needs to be done to allow us to automatically do second,\n        third-order differentiation, as well?  This might be useful for certain\n        ODE solvers which rely on second order gradients and hessians.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmstksg%2Fbackprop","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmstksg%2Fbackprop","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmstksg%2Fbackprop/lists"}