{"id":1374,"url":"https://github.com/palle-k/DL4S","last_synced_at":"2025-08-06T12:32:12.475Z","repository":{"id":46133309,"uuid":"174763595","full_name":"palle-k/DL4S","owner":"palle-k","description":"Accelerated tensor operations and dynamic neural networks based on reverse mode automatic differentiation for every device that can run Swift - from watchOS to Linux","archived":false,"fork":false,"pushed_at":"2023-11-05T15:42:33.000Z","size":20456,"stargazers_count":102,"open_issues_count":2,"forks_count":13,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-05-29T14:02:21.725Z","etag":null,"topics":["autograd","automatic-differentiation","convolutional-neural-networks","deep-learning","deep-neural-networks","derivatives","gradient-descent","machine-learning","neural-networks","optimizers","recurrent-networks","recurrent-neural-networks","swift","swift-machine-learning","tensor"],"latest_commit_sha":null,"homepage":"https://palle-k.github.io/DL4S/","language":"Swift","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/palle-k.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2019-03-10T01:09:01.000Z","updated_at":"2024-05-28T19:53:55.000Z","dependencies_parsed_at":"2023-12-16T00:20:55.624Z","dependency_job_id":null,"html_url":"https://github.com/palle-k/DL4S","commit_stats":{"total_commits":285,"total_committers":7,"mean_commits":"40.714285714285715","dds":"0.34736842105263155","last_synced_commit":"fb3f097bc59b0990a2dd87faaac6516d0675836f"},"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"repository_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/palle-k%2FDL4S","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/palle-k%2FDL4S/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/palle-k%2FDL4S/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/palle-k%2FDL4S/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/palle-k","download_url":"https://codeload.github.com/palle-k/DL4S/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":228898283,"owners_count":17988652,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["autograd","automatic-differentiation","convolutional-neural-networks","deep-learning","deep-neural-networks","derivatives","gradient-descent","machine-learning","neural-networks","optimizers","recurrent-networks","recurrent-neural-networks","swift","swift-machine-learning","tensor"],"created_at":"2024-01-05T20:15:44.999Z","updated_at":"2024-12-09T13:30:35.298Z","avatar_url":"https://github.com/palle-k.png","language":"Swift","funding_links":[],"categories":["Machine Learning","Libs","Data and Storage","AI [🔝](#readme)"],"sub_categories":["Other Hardware","AI"],"readme":"\u003cp align=\"center\"\u003e\n\u003cimg src=\"https://github.com/palle-k/DL4S/blob/develop/.github/logo.png?raw=true\" alt=\"DL4S\" width=\"300\" /\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://github.com/palle-k/DL4S/blob/master/License\"\u003e\u003cimg 
src=\"https://img.shields.io/github/license/palle-k/DL4S.svg\" alt=\"License\"/\u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/palle-k/DL4S/releases\"\u003e\u003cimg src=\"https://img.shields.io/github/v/tag/palle-k/DL4S\" alt=\"Releases\"/\u003e\u003c/a\u003e\n\u003ca href=\"https://palle-k.github.io/DL4S/\"\u003e\u003cimg src=\"https://palle-k.github.io/DL4S/badge.svg\" alt=\"Documentation\" /\u003e\u003c/a\u003e\u003cbr/\u003e\n\u003ca href=\"#installation\"\u003e\u003cimg src=\"https://img.shields.io/badge/platform-Linux%20|%20macOS%20|%20iOS%20|%20tvOS%20|%20watchOS-green.svg\" alt=\"Supports Linux, macOS, iOS, tvOS and watchOS\" /\u003e\u003c/a\u003e\n\u003ca href=\"https://travis-ci.org/palle-k/DL4S\"\u003e\u003cimg src=\"https://travis-ci.org/palle-k/DL4S.svg?branch=master\" alt=\"Build Status\" /\u003e\u003c/a\u003e\n\u003c/p\u003e\n\nDL4S provides a high-level API for many accelerated operations common in neural networks and deep learning.\nIt also has built-in automatic differentiation, which lets you create and train neural networks without implementing\nbackpropagation by hand and without needing a special Swift toolchain.\n\nFeatures include implementations of many basic binary and unary operators,\nbroadcasting, matrix operations, convolutional and recurrent neural networks,\ncommonly used optimizers, second derivatives and much more.\nDL4S also provides implementations of common network architectures, such as VGG, AlexNet, ResNet and Transformers.\n\nWhile its primary purpose is deep learning and optimization, DL4S can also be used as a library for vectorized mathematical operations, similar to NumPy.\n\n[Read the full documentation](https://palle-k.github.io/DL4S/)\n\n## Overview\n1. [Installation](#installation)\n2. [Features](#features)\n    1. Layers\n    2. Optimizers\n    3. Losses\n    4. Tensor Operations\n    5. Engines\n    6. Architectures\n3. [Examples](#examples)\n\n\n## Installation\n\n### iOS / tvOS / macOS\n\n1. 
In Xcode, select \"File\" \u003e \"Swift Packages\" \u003e \"Add Package Dependency\".\n2. Enter `https://github.com/palle-k/DL4S.git` into the Package URL field and click \"Next\".\n3. Select \"Branch\", enter \"master\" and click \"Next\".\n4. Enable the package product DL4S for your app in the \"Add to Target\" column and click \"Next\".\n\n**Note**: Installation via CocoaPods is no longer supported for newer versions.\n\n### Swift Package\nAdd the dependency to your `Package.swift` file:\n\n```swift\n.package(url: \"https://github.com/palle-k/DL4S.git\", .branch(\"master\"))\n```\n\nThen add `DL4S` as a dependency to your target:\n\n```swift\n.target(name: \"MyPackage\", dependencies: [\"DL4S\"])\n```\n\n#### MKL / IPP / OpenMP Support\n\nDL4S can be accelerated with Intel's Math Kernel Library (MKL), Integrated Performance Primitives (IPP) and OpenMP ([Installation Instructions](https://software.intel.com/en-us/articles/installing-intel-free-libs-and-python-apt-repo)).\n\nOn Apple devices, DL4S uses the vectorized functions of the built-in Accelerate framework by default.\nIf no acceleration library is available, a generic fallback implementation is used.\n\nCompiling with MKL/IPP:\n```bash\n# After adding the APT repository as described in the installation instructions\nsudo apt-get install intel-mkl-64bit-2019.5-075 intel-ipp-64bit-2019.5-075 libiomp-dev\n\nexport MKLROOT=/opt/intel/mkl\nexport IPPROOT=/opt/intel/ipp\nexport LD_LIBRARY_PATH=${MKLROOT}/lib/intel64:${IPPROOT}/lib/intel64:${LD_LIBRARY_PATH}\n\nswift build -c release \\\n    -Xswiftc -DMKL_ENABLE \\\n    -Xlinker -L${MKLROOT}/lib/intel64 \\\n    -Xlinker -L${IPPROOT}/lib/intel64\n```\n\n### TensorBoard Support\n\n[DL4S-Tensorboard](https://github.com/palle-k/DL4S-Tensorboard) provides a summary writer that can write TensorBoard-compatible logs.\n\n### LLDB Extension\n\nDL4S includes an LLDB Python script (`util/debugger_support/tensor.py`) that provides custom descriptions for tensors.\n\nTo use enhanced summaries, 
execute `command script import /path/to/DL4S/util/debugger_support/tensor.py`\ndirectly in LLDB, or add the command to your `~/.lldbinit` file.\n\nThen you can use the `print` or `frame variable` commands to print human-readable descriptions of tensors.\n\n## Features\n\n\u003cdetails\u003e\n\u003csummary\u003e\nLayers\n\u003c/summary\u003e\n\u003cp\u003e\n\nCore:\n\n- [x] Convolution\n- [x] Transposed Convolution\n- [x] Dense/Linear/Fully Connected\n- [x] LSTM\n- [x] Gated Recurrent Unit (GRU)\n- [x] Vanilla RNN\n- [x] Embedding\n- [x] Multi-head Attention\n- [x] Transformer Block\n\nPooling:\n\n- [x] Max Pooling\n- [x] Average Pooling\n- [x] Adaptive Max Pooling\n- [x] Adaptive Average Pooling\n\nNorm:\n\n- [x] Batch Norm\n- [x] Layer Norm\n\nUtility:\n\n- [x] Bidirectional RNNs\n- [x] Sequential\n- [x] Lambda\n- [x] Dropout\n\nActivation:\n\n- [x] Relu\n- [x] LeakyRelu\n- [x] Gelu\n- [x] Tanh\n- [x] Sigmoid\n- [x] Softmax\n- [x] Log Softmax\n- [x] Swish\n- [x] Mish\n- [x] LiSHT\n\nTransformer:\n\n- [x] Positional Encoding\n- [x] Scaled Dot Product Attention\n- [x] Multihead Attention\n- [x] Pointwise Feed Forward\n- [x] Transformer Encoder Block\n- [x] Transformer Decoder Block\n\n\u003c/p\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\nOptimizers\n\u003c/summary\u003e\n\u003cp\u003e\n\n- [x] SGD\n- [x] Momentum\n- [x] Adam\n- [x] AMSGrad\n- [x] AdaGrad\n- [x] AdaDelta\n- [x] RMSProp\n\n\u003c/p\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\nLosses\n\u003c/summary\u003e\n\u003cp\u003e\n\n- [x] Binary Cross-Entropy\n- [x] Categorical Cross-Entropy\n- [x] Negative Log Likelihood (NLL Loss)\n- [x] MSE\n- [x] L1 \u0026 L2 regularization\n\n\u003c/p\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\nTensor Operations\n\u003c/summary\u003e\n\u003cp\u003e\n\nThe behavior of broadcast operations follows NumPy broadcasting rules.\n\n- 
[x] broadcast-sub\n- [x] broadcast-mul\n- [x] broadcast-div\n- [x] matmul\n- [x] neg\n- [x] exp\n- [x] pow\n- [x] log\n- [x] sqrt\n- [x] sin\n- [x] cos\n- [x] tan\n- [x] tanh\n- [x] sum\n- [x] max\n- [x] relu\n- [x] leaky relu\n- [x] gelu\n- [x] elu\n- [x] elementwise min\n- [x] elementwise max\n- [x] reduce sum\n- [x] reduce max\n- [x] scatter\n- [x] gather\n- [x] conv2d\n- [x] transposed conv2d\n- [x] max pool\n- [x] avg pool\n- [x] subscript\n- [x] subscript range\n- [x] transpose\n- [x] axis permute\n- [x] reverse\n- [x] im2col\n- [x] col2im\n- [x] stack / concat\n- [x] swish activation\n- [x] mish activation\n- [x] lisht activation\n- [x] diagonal matrix generation\n- [x] diagonal extraction\n- [x] band matrix generation\n\n\u003c/p\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\nEngines\n\u003c/summary\u003e\n\u003cp\u003e\n\n- [x] CPU (Accelerate framework for Apple devices)\n- [x] CPU (Intel Math Kernel Library and Integrated Performance Primitives)\n- [x] CPU (Generic)\n- [ ] GPU (ArrayFire: OpenCL, CUDA)\n\nFor an experimental, early-stage GPU-accelerated version, check out the `feature/arrayfire` branch.\n\n\u003c/p\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\nArchitectures\n\u003c/summary\u003e\n\u003cp\u003e\n\nDefault implementations are provided for the following architectures:\n\n- [x] ResNet18\n- [x] VGG (11, 13, 16, 19)\n- [x] AlexNet\n- [x] Transformer\n\n\u003c/p\u003e\n\u003c/details\u003e\n\n\n## Examples\n\nSome high-level examples have been implemented in separate repositories:\n\n- [Neural Machine Translation](https://github.com/palle-k/Seq2Seq-DL4S) - seq2seq with attention\n- [Generative Adversarial Networks](https://github.com/palle-k/DL4S-WGAN-GP) - Wasserstein GAN with Gradient Penalty (WGAN-GP)\n- [Reinforcement Learning](https://github.com/palle-k/REINFORCE-DL4S) - Trains an agent to find the exit in a 2D grid world.\n\n### Arithmetic \u0026 Differentiation\n\nDL4S provides a high-level 
interface to many vectorized operations on tensors.\n\n```swift\nimport DL4S\n\nlet a = Tensor\u003cFloat, CPU\u003e([[1,2],[3,4],[5,6]], requiresGradient: true)\nlet prod = a.transposed().matrixMultiplied(with: a)\nlet s = prod.reduceSum()\nlet l = log(s)\nprint(l) // 5.1873856\n```\n\nWhen a tensor is marked as requiring a gradient, a compute graph is captured.\nThe graph stores all operations that use that tensor, directly or indirectly, as an operand.\n\nIt is then possible to backpropagate through that graph using the `gradients(of:)` function:\n```swift\n// Backpropagate\nlet dl_da = l.gradients(of: [a])[0]\n\nprint(dl_da)\n/*\n[[0.034, 0.034]\n [0.078, 0.078]\n [0.123, 0.123]]\n*/\n```\n\n#### Second derivatives\n\nThe operations used during backpropagation are themselves differentiable.\nTherefore, second derivatives can be obtained by computing the gradient of the gradient.\n\nWhen higher-order derivatives are required, the compute graph of the backward pass has to be retained explicitly by passing `retainBackwardsGraph: true`.\n```swift\nimport DL4S\n\nlet t = Tensor\u003cFloat, CPU\u003e([1,2,3,4], requiresGradient: true)\n\nlet result = t * t * t\nprint(result) // [1, 8, 27, 64]\n\nlet grad = result.gradients(of: [t], retainBackwardsGraph: true)[0]\nprint(grad) // [3, 12, 27, 48]\n\nlet secondGrad = grad.gradients(of: [t], retainBackwardsGraph: true)[0]\nprint(secondGrad) // [6, 12, 18, 24]\n\nlet thirdGrad = secondGrad.gradients(of: [t])[0]\nprint(thirdGrad) // [6, 6, 6, 6]\n```\n\n\n### Convolutional Networks\n\nExample for MNIST classification\n\n```swift\nimport DL4S\n\n// Input shape must be [batchSize, 1, 28, 28]\nvar model = Sequential {\n   Convolution2D\u003cFloat, CPU\u003e(inputChannels: 1, outputChannels: 6, kernelSize: (5, 5))\n   Relu\u003cFloat, CPU\u003e()\n   MaxPool2D\u003cFloat, CPU\u003e(windowSize: 2, stride: 2)\n   \n   Convolution2D\u003cFloat, CPU\u003e(inputChannels: 6, outputChannels: 16, kernelSize: (5, 5))\n   Relu\u003cFloat, CPU\u003e()\n   MaxPool2D\u003cFloat, CPU\u003e(windowSize: 2, stride: 2)\n   \n   
Flatten\u003cFloat, CPU\u003e()\n   \n   Dense\u003cFloat, CPU\u003e(inputSize: 256, outputSize: 120)\n   Relu\u003cFloat, CPU\u003e()\n   \n   Dense\u003cFloat, CPU\u003e(inputSize: 120, outputSize: 10)\n   LogSoftmax\u003cFloat, CPU\u003e()\n}\n\nvar optimizer = Adam(model: model, learningRate: 0.001)\n\n// Single iteration of minibatch gradient descent\nlet batch: Tensor\u003cFloat, CPU\u003e = ... // shape: [batchSize, 1, 28, 28]\nlet y_true: Tensor\u003cInt32, CPU\u003e = ... // shape: [batchSize]\n\n// Use optimizer.model, not model: the optimizer updates its own copy of the model\nlet pred = optimizer.model(batch)\nlet loss = categoricalNegativeLogLikelihood(expected: y_true, actual: pred)\n\nlet gradients = loss.gradients(of: optimizer.model.parameters)\noptimizer.update(along: gradients)\n```\n\n### Recurrent Networks\n\nExample for MNIST classification\n\nThe gated recurrent unit (GRU) scans the image row by row from top to bottom and uses the final hidden state for classification.\n\n```swift\nimport DL4S\n\nlet model = Sequential {\n    GRU\u003cFloat, CPU\u003e(inputSize: 28, hiddenSize: 128, direction: .forward)\n    Lambda\u003cGRU\u003cFloat, CPU\u003e.Outputs, Tensor\u003cFloat, CPU\u003e, Float, CPU\u003e { inputs in\n        inputs.0\n    }\n    Dense\u003cFloat, CPU\u003e(inputSize: 128, outputSize: 10)\n    LogSoftmax\u003cFloat, CPU\u003e()\n}\n\nvar optimizer = Adam(model: model, learningRate: 0.001)\n\nlet batch: Tensor\u003cFloat, CPU\u003e = ... // shape: [batchSize, 28, 28]\nlet y_true: Tensor\u003cInt32, CPU\u003e = ... 
// shape: [batchSize]\n\nlet x = batch.permuted(to: 1, 0, 2) // Swap first and second axis\nlet pred = optimizer.model(x)\nlet loss = categoricalNegativeLogLikelihood(expected: y_true, actual: pred)\n\nlet gradients = loss.gradients(of: optimizer.model.parameters)\noptimizer.update(along: gradients)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpalle-k%2FDL4S","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpalle-k%2FDL4S","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpalle-k%2FDL4S/lists"}