{"id":15100706,"url":"https://github.com/thomasahle/tensorgrad","last_synced_at":"2025-04-08T09:10:57.624Z","repository":{"id":231160277,"uuid":"781088863","full_name":"thomasahle/tensorgrad","owner":"thomasahle","description":"Machine Learning with Symbolic Tensors","archived":false,"fork":false,"pushed_at":"2025-03-03T15:37:27.000Z","size":21272,"stargazers_count":264,"open_issues_count":18,"forks_count":12,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-04-01T07:53:40.435Z","etag":null,"topics":["autograd","derivatives","neurosymbolic","probability","symbolic-computation","tensor"],"latest_commit_sha":null,"homepage":"http://tensorcookbook.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thomasahle.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-02T18:16:37.000Z","updated_at":"2025-03-27T23:28:11.000Z","dependencies_parsed_at":"2024-05-29T01:15:36.083Z","dependency_job_id":"dff73091-1cd9-4bbb-8fed-02285da03a72","html_url":"https://github.com/thomasahle/tensorgrad","commit_stats":null,"previous_names":["thomasahle/tensorgrad"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasahle%2Ftensorgrad","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasahle%2Ftensorgrad/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasahle%2Ftensorgrad/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasahle%2Ftensorgrad/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thomasahle","download_url":"https://codeload.github.com/thomasahle/tensorgrad/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247809964,"owners_count":20999816,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["autograd","derivatives","neurosymbolic","probability","symbolic-computation","tensor"],"created_at":"2024-09-25T18:00:50.147Z","updated_at":"2025-04-08T09:10:57.580Z","avatar_url":"https://github.com/thomasahle.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/thomasahle/tensorgrad/main/docs/images/basics.png\" width=\"100%\"\u003e\n\u003ch3\u003e\n  \n[Book](https://tensorcookbook.com) | [Documentation](https://tensorcookbook.com/docs) | [API Reference](https://tensorcookbook.com/docs/api)\n\n\u003c/h3\u003e\n\u003c/div\u003e\n\n# Tensorgrad\nA Tensor \u0026 Deep Learning framework - It's like PyTorch meets SymPy.\n\nTensorgrad is an open-source python package for symbolic tensor manipulation.\nIt performs any simplification described in the [Tensor Cookbook (draft)](https://github.com/thomasahle/tensorgrad/blob/main/paper/cookbook.pdf)  automatically, and can even be used as a machine learning framework.\n\n## Examples\n\nTo run the examples for yourself, use [the playground](https://tensorcookbook.com/playground.html) or\nsee [this notebook](https://colab.research.google.com/drive/10Lk39tTgRd-cCo5gNNe3KvdDcVP2F5aB?usp=sharing).\n\nInstall tensorgrad with\n```bash\npip install tensorgrad\n```\n\nFor visualizations we need some latex packages:\n```bash\napt-get install texlive-luatex\napt-get install texlive-latex-extra\napt-get install texlive-fonts-extra\napt-get install poppler-utils\n```\n\n### Derivative of L2 Loss\n\n```python\nfrom tensorgrad import Variable\nimport tensorgrad.functions as F\n# ||Ax - y||_2^2\nb, x, y = sp.symbols(\"b x y\")\nX = tg.Variable(\"X\", b, x)\nY = tg.Variable(\"Y\", b, y)\nW = tg.Variable(\"W\", x, y)\nXWmY = X @ W - Y\nl2 = XWmY @ XWmY\ngrad = l2.grad(W)\ndisplay_pdf_image(to_tikz(grad.full_simplify()))\n```\n\nThis will output the tensor diagram:\n\n\u003cimg src=\"https://raw.githubusercontent.com/thomasahle/tensorgrad/main/docs/images/l2_grad_w_single_step.png\" width=\"50%\"\u003e\n\nTensorgrad can also output pytorch code for numerically computing the gradient with respect to W:\n```python\n\u003e\u003e\u003e to_pytorch(grad)\nimport torch\nWX = torch.einsum('xy,bx -\u003e by', W, X)\nsubtraction = WX - Y\nX_subtraction = torch.einsum('bx,by -\u003e xy', X, subtraction)\nfinal_result = 2 * X_subtraction\n```\n\n### Hessian of CE Loss\n\nFor a more complicated example, consider the following program for computing the Entropy of Cross Entropy Loss:\n\n```python\nfrom tensorgrad import Variable\nimport tensorgrad.functions as F\n\nlogits = Variable(\"logits\", [\"C\"])\ntarget = Variable(\"target\", [\"C\"])\n\ne = F.exp(logits)\nsoftmax = e / F.sum(e)\nce = -F.sum(target * F.log(softmax))\n\nH = ce.grad(logits).grad(logits)\n\ndisplay_pdf_image(to_tikz(H.full_simplify()))\n```\n\n\u003cimg src=\"https://raw.githubusercontent.com/thomasahle/tensorgrad/main/docs/images/hess_ce.png\" width=\"50%\"\u003e\n\nThis is tensor diagram notation for `(diag(p) - pp^T) sum(target)`, where `p = softmax(logits)`.\n\n### Expectations\n\nTensorgrad can also take expectations of arbitrary functions with respect to Gaussian tensors.\n\nAs an example, consider the L2 Loss program from before:\n```python\nX = Variable(\"X\", \"b, x\")\nY = Variable(\"Y\", \"b, y\")\nW = Variable(\"W\", \"x, y\")\nmu = Variable(\"mu\", \"x, y\")\nC = Variable(\"C\", \"x, y, x2, y2\")\nXWmY = X @ W - Y\nl2 = F.sum(XWmY * XWmY)\nE = Expectation(l2, W, mu, C)\ndisplay_pdf_image(to_tikz(E.full_simplify()))\n```\n\n\u003cimg src=\"https://raw.githubusercontent.com/thomasahle/tensorgrad/main/docs/images/expectation.png\" width=\"50%\"\u003e\n\nNote that the covariance is a rank-4 tensor (illustrated with a star) since we take the expectation with respect to a matrix.\nThis is different from the normal \"matrix shaped\" covariance you get if you take expectation with respect to a vector.\n\n### Evaluation\n\nTensorgrad can evaluate your diagrams using [Pytorch Named Tensors](https://pytorch.org/docs/stable/named_tensor.html).\nIt uses graph isomorphism detection to eliminated common subexpressions.\n\n### Code Generation\n\nTensorgrad can convert your diagrams back into pytorch code.\nThis gives a super optimized way to do gradients and higher order derivatives in neural networks.\n\n\n### Matrix Calculus\n\nIn Penrose's book, The Road to Reality: A Complete Guide to the Laws of the Universe, he introduces a notation for taking derivatives on tensor networks. In this library we try to follow Penrose's notation, expanding it as needed to handle a full \"chain rule\" on tensor functions.\n\u003cimg style=\"background-color: white\" src=\"https://upload.wikimedia.org/wikipedia/commons/thumb/3/37/Penrose_covariant_derivate.svg/2880px-Penrose_covariant_derivate.svg.png\" width=\"100%\"\u003e\n\nAnother source of inspiration was Yaroslav Bulatov's [derivation of the hessian of neural networks](https://community.wolfram.com/groups/-/m/t/2437093):\n\n\u003cimg src=\"https://raw.githubusercontent.com/thomasahle/tensorgrad/main/docs/images/hessian_yaroslaw.png\"\u003e\n\n# More stuff\n\n## Transformers\n\n\u003cimg src=\"https://raw.githubusercontent.com/thomasahle/tensorgrad/main/docs/images/attention.png\"\u003e\n\n## Convolutional Neural Networks \n\nThe main ingredient in CNNs are the linear operations Fold and Unfold. \nUnfold takes an image, with dimensions HxW and outputs P \"patches\" of size K^2, where K is the kernel size. Fold is the reverse operation. \nSince they are linear operations (they consists only of copying/adding) we can express them as a tensor with shape (H, W, P, K^2).\n\n\u003cimg src=\"https://raw.githubusercontent.com/thomasahle/tensorgrad/main/docs/images/uCrOg.png\" widht=\"80%\"\u003e\n\n\u003ca href=\"https://arxiv.org/abs/1908.04471\"\u003eHayashi et al.\u003c/a\u003e show that if you define a tensor `(∗)_{i,j,k} = [i=j+k]`, then the \"Unfold\" operator factors along the spacial dimensions, and you can write a bunch of different convolutional neural networks easily as tensor networks:\n\u003cimg src=\"https://raw.githubusercontent.com/thomasahle/tensorgrad/main/docs/images/68747470733a2f2f64726976652e676f6f676c652e636f6d2f75633f6578706f72743d766965772669643d3141305235795371446e48715939624650677163546f6b347735416a516c666572.png\"\u003e\n\nWith tensorgrad you can write the \"standard\" convolutional neural network like this:\n```python\ndata = Variable(\"data\", [\"b\", \"c\", \"w\", \"h\"])\nunfold = Convolution(\"w\", \"j\", \"w2\") @ Convolution(\"h\", \"i\", \"h2\")\nkernel = Variable(\"kernel\", [\"c\", \"i\", \"j\", \"c2\"])\nexpr = data @ unfold @ kernel\n```\n\nAnd then easily find the jacobian symbolically with `expr.grad(kernel)`:\n\u003cimg src=\"https://raw.githubusercontent.com/thomasahle/tensorgrad/main/docs/images/conv_jac.png\"\u003e\n\n## Tensor Sketch\n\nTaken from [this Twitter thread](https://twitter.com/thomasahle/status/1674572437953089536):\nI wish I had know about Tensor Graphs back when i worked on Tensor-sketching.\nLet me correct this now and explain dimensionality reduction for tensors using Tensor Networks:\n\n\u003cimg src=\"https://raw.githubusercontent.com/thomasahle/tensorgrad/main/docs/images/ts_simple.png\" width=\"66%\"\u003e\n\nThe second version is the \"original\" Tensor Sketch by \nRasmus Pagh and Ninh Pham. (https://rasmuspagh.net/papers/tensorsketch.pdf) Each fiber is reduced by a JL sketch, and the result is element-wise multiplied.\nNote the output of each JL is larger than in the \"simple\" sketch to give the same output size.\n\n\u003cimg src=\"https://raw.githubusercontent.com/thomasahle/tensorgrad/main/docs/images/ts_pp.png\" width=\"66%\"\u003e\n\nNext we have the \"recursive\" sketch by myself and coauthors in https://thomasahle.com/#paper-tensorsketch-joint.\nIn the paper we sometimes describe this as a tree, but it doesn't really matter. We just had already created the tree-graphic when we realized.\n\n\u003cimg src=\"https://raw.githubusercontent.com/thomasahle/tensorgrad/main/docs/images/ts_tree.png\" width=\"66%\"\u003e\n\nThe main issue with the AKKRVWZ-sketch was that we used order-3 tensors internally, which require more space/time than simple random matrices in the PP-sketch.\nWe can mitigate this issue by replacing each order-3 tensor with a simple order-2 PP-sketch.\n\n\u003cimg src=\"https://raw.githubusercontent.com/thomasahle/tensorgrad/main/docs/images/ts_hybrid.png\" width=\"66%\"\u003e\n\n\nFinally we can speed up each matrix multiplication by using FastJL, which is itself basically an outer product of a bunch of tiny matrices. But at this point my picture is starting to get a bit overwhelming.\n\n## See also\n\n- [Tool for creating tensor diagrams from einsum](https://thomasahle.com/blog/einsum_to_dot.html?q=abc,cde,efg,ghi,ija-%3Ebdfhj\u0026layout=circo) by Thomas Ahle\n- [Ideograph: A Language for Expressing and Manipulating Structured Data](https://arxiv.org/pdf/2303.15784.pdf) by Stephen Mell, Osbert Bastani, Steve Zdancewic\n\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthomasahle%2Ftensorgrad","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthomasahle%2Ftensorgrad","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthomasahle%2Ftensorgrad/lists"}