{"id":13503870,"url":"https://github.com/coreylowman/dfdx","last_synced_at":"2025-05-14T07:08:54.004Z","repository":{"id":37773578,"uuid":"416162048","full_name":"coreylowman/dfdx","owner":"coreylowman","description":"Deep learning in Rust, with shape checked tensors and neural networks","archived":false,"fork":false,"pushed_at":"2024-07-23T02:05:58.000Z","size":2724,"stargazers_count":1809,"open_issues_count":89,"forks_count":106,"subscribers_count":32,"default_branch":"main","last_synced_at":"2025-05-12T13:53:25.340Z","etag":null,"topics":["autodiff","autodifferentiation","autograd","backpropagation","cuda","cuda-kernels","cuda-support","cuda-toolkit","cudnn","deep-learning","deep-neural-networks","gpu","gpu-acceleration","gpu-computing","machine-learning","neural-network","rust","rust-lang","tensor"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/coreylowman.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"coreylowman","patreon":"dfdx","ko_fi":"coreylowman","open_collective":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"lfx_crowdfunding":null,"custom":null}},"created_at":"2021-10-12T02:58:54.000Z","updated_at":"2025-05-11T00:53:32.000Z","dependencies_parsed_at":"2022-07-13T16:44:24.762Z","dependency_job_id":"4cb13fbf-d7f4-4e99-9f6c-61e3d5d2c5f0","html_url":"https://github.com/coreylowman/dfdx","commit_stats":{"total_commits":878,"total_committers":41,"mean_commits":"21.414634146341463","dds":0
.4567198177676538,"last_synced_commit":"4722a99d303f347d6088d95867d007c75ca6dd78"},"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coreylowman%2Fdfdx","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coreylowman%2Fdfdx/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coreylowman%2Fdfdx/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coreylowman%2Fdfdx/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/coreylowman","download_url":"https://codeload.github.com/coreylowman/dfdx/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254092776,"owners_count":22013290,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["autodiff","autodifferentiation","autograd","backpropagation","cuda","cuda-kernels","cuda-support","cuda-toolkit","cudnn","deep-learning","deep-neural-networks","gpu","gpu-acceleration","gpu-computing","machine-learning","neural-network","rust","rust-lang","tensor"],"created_at":"2024-07-31T23:00:48.741Z","updated_at":"2025-05-14T07:08:48.992Z","avatar_url":"https://github.com/coreylowman.png","language":"Rust","funding_links":["https://github.com/sponsors/coreylowman","https://patreon.com/dfdx","https://ko-fi.com/coreylowman"],"categories":["Libraries","Rust","Library / Framework","Other Versions of YOLO","Summary","Frameworks","Machine Learning","Neural 
Networks"],"sub_categories":["Artificial Intelligence","General-Purpose Machine Learning"],"readme":"# dfdx: shape checked deep learning in rust\n\n[![](https://dcbadge.vercel.app/api/server/AtUhGqBDP5)](https://discord.gg/AtUhGqBDP5)\n[![crates.io](https://img.shields.io/crates/v/dfdx?style=for-the-badge)](https://crates.io/crates/dfdx)\n[![docs.rs](https://img.shields.io/docsrs/dfdx?label=docs.rs%20latest\u0026style=for-the-badge)](https://docs.rs/dfdx)\n\n\nErgonomics \u0026 safety focused deep learning in Rust.\n\n**Still in pre-alpha state. The next few releases are planned to be breaking releases.**\n\nFeatures at a glance:\n1. :fire: GPU accelerated tensor library with shapes up to 6d!\n2. Shapes with both compile and runtime sized dimensions. (e.g. `Tensor\u003c(usize, Const\u003c10\u003e)\u003e` and `Tensor\u003cRank2\u003c5, 10\u003e\u003e`)\n3. A large library of tensor operations (including `matmul`, `conv2d`, and much more).\n    1. All tensor operations shape and type checked at compile time!!\n4. Ergonomic neural network building blocks (like `Linear`, `Conv2D`, and `Transformer`).\n5. Standard deep learning optimizers such as `Sgd`, `Adam`, `AdamW`, `RMSprop`, and more.\n\n`dfdx` is on [crates.io](https://crates.io/crates/dfdx)! Use by adding this to your `Cargo.toml`:\n\n```toml\ndfdx = \"0.13.0\"\n```\n\nSee the documentation at [docs.rs/dfdx](https://docs.rs/dfdx).\n\n[1] https://en.wikipedia.org/wiki/Automatic_differentiation#Reverse_accumulation\n\n## Design Goals\n\n1. Ergonomics the whole way down (both frontend interface \u0026 internals).\n2. Check as much at compile time as possible (i.e. don't compile if something is not correct).\n3. Maximize performance.\n4. Minimize unsafe code[1]\n5. Minimize Rc\u003cRefCell\u003cT\u003e\u003e used in internal code[2]\n\n[1] Currently the only unsafe calls are for matrix multiplication.\n\n[2] The only things that use `Arc` are tensors to store their data. 
`Arc` is used instead of `Box` to reduce\nallocations when tensors are cloned.\n\n## GPU acceleration with CUDA\n\nEnable the `cuda` feature to start using the `Cuda` device! This requires NVIDIA's CUDA toolkit to be installed. See [feature flags docs](https://docs.rs/dfdx/latest/dfdx/feature_flags/index.html) for more info.\n\n## API Preview\n\nCheck [examples/](examples/) for more details.\n\n1. 👌 Simple Neural Networks API, completely shape checked at compile time.\n\n```rust\ntype Mlp = (\n    (Linear\u003c10, 32\u003e, ReLU),\n    (Linear\u003c32, 32\u003e, ReLU),\n    (Linear\u003c32, 2\u003e, Tanh),\n);\n\nfn main() {\n    let dev: Cuda = Default::default(); // or `Cpu`\n    let mlp = dev.build_module::\u003cMlp, f32\u003e();\n    // the device type is inferred from `dev`, so it matches `Cuda` or `Cpu`\n    let x: Tensor\u003cRank1\u003c10\u003e, f32, _\u003e = dev.zeros();\n    let y: Tensor\u003cRank1\u003c2\u003e, f32, _\u003e = mlp.forward(x);\n    mlp.save(\"checkpoint.npz\").expect(\"failed to save checkpoint\");\n}\n```\n\n2. 📈 Ergonomic Optimizer API\n\n```rust\ntype Model = ...\nlet mut model = dev.build_module::\u003cModel, f32\u003e();\nlet mut grads = model.alloc_grads();\nlet mut sgd = Sgd::new(\u0026model, SgdConfig {\n    lr: 1e-2,\n    momentum: Some(Momentum::Nesterov(0.9))\n});\n\nlet loss = ...\ngrads = loss.backward();\n\nsgd.update(\u0026mut model, \u0026grads);\n```\n\n3. 💡 Const tensors can be converted to and from normal Rust arrays\n```rust\nlet t0: Tensor\u003cRank0, f32, _\u003e = dev.tensor(0.0);\nassert_eq!(t0.array(), \u00260.0);\n\nlet t1 /*: Tensor\u003cRank1\u003c3\u003e, f32, _\u003e*/ = dev.tensor([1.0, 2.0, 3.0]);\nassert_eq!(t1.array(), [1.0, 2.0, 3.0]);\n\nlet t2: Tensor\u003cRank2\u003c2, 3\u003e, f32, _\u003e = dev.sample_normal();\nassert_ne!(t2.array(), [[0.0; 3]; 2]);\n```\n\n## Fun/notable implementation details\n\n### Module\n\n```rust\npub trait Module\u003cInput\u003e {\n    type Output;\n    fn forward(\u0026self, input: Input) -\u003e Self::Output;\n}\n```\n\nFrom this flexible trait we get:\n1. 
Single \u0026 batched inputs (just have multiple impls!)\n2. Multiple inputs/outputs (multi-headed modules, or RNNs)\n3. Different behavior depending on whether a tape is present (**not** the `.train()`/`.eval()` behavior present in other libraries!).\n\n### Tuples represent feedforward (a.k.a. sequential) modules\n\nSince we can implement traits for tuples, which AFAIK is *not possible in most other languages*, they provide a very nice frontend\nfor sequentially executing modules.\n\n```rust\n// no idea why you would do this, but you could!\ntype Model = (ReLU, Sigmoid, Tanh);\nlet model = dev.build_module::\u003cModel, f32\u003e();\n```\n\n```rust\ntype Model = (Linear\u003c10, 5\u003e, Tanh);\nlet model = dev.build_module::\u003cModel, f32\u003e();\n```\n\nHere is how implementing `Module` for a 2-tuple looks:\n```rust\nimpl\u003cInput, A, B\u003e Module\u003cInput\u003e for (A, B)\nwhere\n    Input: Tensor,\n    A: Module\u003cInput\u003e,        // A is a module that takes Input\n    B: Module\u003cA::Output\u003e,    // B is a module that takes A's Output\n{\n    type Output = B::Output; // the output of this is B's Output\n    fn forward(\u0026self, x: Input) -\u003e Self::Output {\n        let x = self.0.forward(x);\n        let x = self.1.forward(x);\n        x\n    }\n}\n```\n\n`Module` is implemented for tuples of up to 6 elements, but *you can arbitrarily nest them*!\n\n### No `Rc\u003cRefCell\u003cT\u003e\u003e` used - Gradient tape is not kept behind a cell!\n\nOther implementations may store a reference to the gradient tape directly on tensors, which requires mutating tensors or using `Rc`/`RefCell`s all over the place.\n\nWe've figured out an elegant way to avoid this, reducing references and dynamic borrow checks to 0!\n\nSince all operations result in exactly 1 child, we can always move the gradient tape to the child of the last operation. Additionally, no model parameters (all tensors) will ever own the gradient tape because they will never be the result of any operation. 
This means we know exactly which tensor owns the gradient tape, and the tensors that have it will always be intermediate results that don't need to be maintained across gradient computation.\n\n*All of this together gives users unprecedented control/precision over what tensors are recorded on the gradient tape!*\n\nOne advanced use case requires that tensors be re-used multiple times in a computation graph.\nThis can be handled by cloning the tensor, and manually moving the gradient tape around.\n\n### Type checked backward\n\ntl;dr: If you forget to include a call to `trace()` or `traced()`, the program won't compile!\n\n```diff\n-let pred = module.forward(x);\n+let pred = module.forward(x.traced(grads));\nlet loss = (y - pred).square().mean();\nlet gradients = loss.backward();\n```\n\nSince we know exactly what tensors own the gradient tape, we can require the tensor passed into `.backward()` to own the gradient tape!\nAnd further, we can require it be moved into `.backward()`, so it can destruct the tape and construct the gradients!\n\n__All of this can be checked at compile time 🎉__\n\n### 📄 Validated against pytorch\n\nAll functions \u0026 operations are tested against behavior shown by similar code in pytorch.\n\n# License\n\nDual-licensed to be compatible with the Rust project.\n\nLicensed under the Apache License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0 or the MIT license http://opensource.org/licenses/MIT, at your option. This file may not be copied, modified, or distributed except according to those terms.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoreylowman%2Fdfdx","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcoreylowman%2Fdfdx","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoreylowman%2Fdfdx/lists"}