{"id":13728459,"url":"https://mratsim.github.io/Arraymancer/","last_synced_at":"2025-05-08T00:31:45.990Z","repository":{"id":21203994,"uuid":"88188361","full_name":"mratsim/Arraymancer","owner":"mratsim","description":"A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends","archived":false,"fork":false,"pushed_at":"2024-03-26T11:40:09.000Z","size":3816,"stargazers_count":1187,"open_issues_count":158,"forks_count":95,"subscribers_count":35,"default_branch":"master","last_synced_at":"2024-03-27T09:53:40.303Z","etag":null,"topics":["autograd","automatic-differentiation","cuda","cudnn","deep-learning","gpgpu","gpu-computing","high-performance-computing","iot","linear-algebra","machine-learning","matrix-library","multidimensional-arrays","ndarray","neural-networks","nim","opencl","openmp","parallel-computing","tensor"],"latest_commit_sha":null,"homepage":"https://mratsim.github.io/Arraymancer/","language":"Nim","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mratsim.png","metadata":{"files":{"readme":"README.md","changelog":"changelog.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2017-04-13T17:10:19.000Z","updated_at":"2024-04-15T10:52:39.371Z","dependencies_parsed_at":"2023-09-28T15:59:54.025Z","dependency_job_id":"e8ad7711-51c7-4a6d-be42-0fe165621ca4","html_url":"https://github.com/mratsim/Arraymancer","commit_stats":null,"previous_names":[],"tags_count":42,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mratsim%2FArraymancer","tags_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/mratsim%2FArraymancer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mratsim%2FArraymancer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mratsim%2FArraymancer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mratsim","download_url":"https://codeload.github.com/mratsim/Arraymancer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224616558,"owners_count":17341151,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["autograd","automatic-differentiation","cuda","cudnn","deep-learning","gpgpu","gpu-computing","high-performance-computing","iot","linear-algebra","machine-learning","matrix-library","multidimensional-arrays","ndarray","neural-networks","nim","opencl","openmp","parallel-computing","tensor"],"created_at":"2024-08-03T02:00:42.800Z","updated_at":"2024-11-14T19:31:11.264Z","avatar_url":"https://github.com/mratsim.png","language":"Nim","readme":"[![Join the chat on Discord #nim-science](https://img.shields.io/discord/371759389889003530?color=blue\u0026label=nim-science\u0026logo=discord\u0026logoColor=gold\u0026style=flat-square)](https://discord.gg/f5hA9UK3dY) [![Github Actions CI](https://github.com/mratsim/arraymancer/workflows/Arraymancer%20CI/badge.svg)](https://github.com/mratsim/arraymancer/actions?query=workflow%3A%Arraymancer+CI%22+branch%3Amaster) 
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) ![Stability](https://img.shields.io/badge/stability-experimental-orange.svg)\n\n# Arraymancer - A n-dimensional tensor (ndarray) library.\n\nArraymancer is a tensor (N-dimensional array) project in Nim. The main focus is providing a fast and ergonomic CPU, Cuda and OpenCL ndarray library on which to build a scientific computing ecosystem.\n\nThe library is inspired by Numpy and PyTorch and targets the following use-cases:\n\n- N-dimensional arrays (tensors) for numerical computing\n- machine learning algorithms (as in Scikit-learn: least squares solvers, PCA and dimensionality reduction, classifiers, regressors and clustering algorithms, cross-validation).\n- deep learning\n\nThe ndarray component can be used without the machine learning and deep learning component.\nIt can also use the OpenMP, Cuda or OpenCL backends.\n\nNote: While Nim is compiled and does not offer an interactive REPL yet (like Jupyter), it allows much faster prototyping than C++ due to extremely fast compilation times. Arraymancer compiles in about 5 seconds on my dual-core MacBook.\n\nReminder of supported compilation flags:\n\n- `-d:release`: Nim release mode (no stacktraces and debugging information)\n- `-d:danger`: No runtime checks like array bound checking\n- `-d:blas=blaslibname`: Customize the BLAS library used by Arraymancer. By default (i.e. if you don't define this setting) Arraymancer will try to automatically find a BLAS library (e.g. `blas.so/blas.dll` or `libopenblas.dll`) on your path. You should only set this setting if for some reason you want to use a specific BLAS library. See [nimblas](https://github.com/SciNim/nimblas) for further information\n- `-d:lapack=lapacklibname`: Customize the LAPACK library used by Arraymancer. By default (i.e. if you don't define this setting) Arraymancer will try to automatically find a LAPACK library (e.g. 
`lapack.so/lapack.dll` or `libopenblas.dll`) on your path. You should only set this setting if for some reason you want to use a specific LAPACK library. See [nimlapack](https://github.com/SciNim/nimlapack) for further information\n- `-d:openmp`: Multithreaded compilation\n- `-d:mkl`: Deprecated flag which forces the use of MKL. Implies `-d:openmp`. Use `-d:blas=mkl -d:lapack=mkl` instead, but _only_ if you want to force Arraymancer to use MKL, instead of looking for the available BLAS / LAPACK libraries\n- `-d:openblas`: Deprecated flag which forces the use of OpenBLAS. Use `-d:blas=openblas -d:lapack=openblas` instead, but _only_ if you want to force Arraymancer to use OpenBLAS, instead of looking for the available BLAS / LAPACK libraries\n- `-d:cuda`: Build with Cuda support\n- `-d:cudnn`: Build with CuDNN support, implies `-d:cuda`\n- `-d:avx512`: Build with AVX512 support by supplying the `-mavx512dq` flag\n  to gcc / clang. Without this flag the resulting binary does not use AVX512\n  even on CPUs that support it. Setting this flag, however, makes the binary\n  incompatible with CPUs that do _not_ support AVX512. 
See the comments in #505\n  for a discussion (from `v0.7.9`)\n- You might want to tune library paths in [nim.cfg](nim.cfg) after installation for OpenBLAS, MKL and Cuda compilation.\n  The current defaults should work on Mac and Linux; and on Windows after downloading `libopenblas.dll` or another\n  BLAS / LAPACK DLL (see the [Installation](#installation) section for more information) and copying it into a folder\n  in your path or into the compilation output folder.\n\n## Show me some code\n\nThe Arraymancer tutorial is available [here](https://mratsim.github.io/Arraymancer/tuto.first_steps.html).\n\nHere is a preview of Arraymancer syntax.\n\n### Tensor creation and slicing\n\n```Nim\nimport math, arraymancer\n\nconst\n    x = @[1, 2, 3, 4, 5]\n    y = @[1, 2, 3, 4, 5]\n\nvar\n    vandermonde = newSeq[seq[int]]()\n    row: seq[int]\n\nfor i, xx in x:\n    row = newSeq[int]()\n    vandermonde.add(row)\n    for j, yy in y:\n        vandermonde[i].add(xx^yy)\n\nlet foo = vandermonde.toTensor()\n\necho foo\n\n# Tensor[system.int] of shape \"[5, 5]\" on backend \"Cpu\"\n# |1          1       1       1       1|\n# |2          4       8      16      32|\n# |3          9      27      81     243|\n# |4         16      64     256    1024|\n# |5         25     125     625    3125|\n\necho foo[1..2, 3..4] # slice\n\n# Tensor[system.int] of shape \"[2, 2]\" on backend \"Cpu\"\n# |16      32|\n# |81     243|\n\necho foo[_|-1, _] # reverse the order of the rows\n\n# Tensor[int] of shape \"[5, 5]\" on backend \"Cpu\"\n# |5      25      125     625     3125|\n# |4      16       64     256     1024|\n# |3       9       27      81      243|\n# |2       4        8      16       32|\n# |1       1        1       1        1|\n```\n\n### Reshaping and concatenation\n\n```Nim\nimport arraymancer, sequtils\n\nlet a = toSeq(1..4).toTensor.reshape(2,2)\n\nlet b = toSeq(5..8).toTensor.reshape(2,2)\n\nlet c = toSeq(11..16).toTensor\nlet c0 = c.reshape(3,2)\nlet c1 = c.reshape(2,3)\n\necho 
concat(a,b,c0, axis = 0)\n# Tensor[system.int] of shape \"[7, 2]\" on backend \"Cpu\"\n# |1      2|\n# |3      4|\n# |5      6|\n# |7      8|\n# |11    12|\n# |13    14|\n# |15    16|\n\necho concat(a,b,c1, axis = 1)\n# Tensor[system.int] of shape \"[2, 7]\" on backend \"Cpu\"\n# |1      2     5     6    11    12    13|\n# |3      4     7     8    14    15    16|\n```\n\n### Broadcasting\n\nImage from Scipy\n\n![](https://scipy.github.io/old-wiki/pages/image004de9e.gif)\n\n```Nim\nimport arraymancer\n\nlet j = [0, 10, 20, 30].toTensor.reshape(4,1)\nlet k = [0, 1, 2].toTensor.reshape(1,3)\n\necho j +. k\n# Tensor[system.int] of shape \"[4, 3]\" on backend \"Cpu\"\n# |0      1     2|\n# |10    11    12|\n# |20    21    22|\n# |30    31    32|\n```\n\n### A simple two layers neural network\n\nFrom [example 3](./examples/ex03_simple_two_layers.nim).\n\n```Nim\nimport arraymancer, strformat\n\ndiscard \"\"\"\nA fully-connected ReLU network with one hidden layer, trained to predict y from x\nby minimizing squared Euclidean distance.\n\"\"\"\n\n# ##################################################################\n# Environment variables\n\n# N is batch size; D_in is input dimension;\n# H is hidden dimension; D_out is output dimension.\nlet (N, D_in, H, D_out) = (64, 1000, 100, 10)\n\n# Create the autograd context that will hold the computational graph\nlet ctx = newContext Tensor[float32]\n\n# Create random Tensors to hold inputs and outputs, and wrap them in Variables.\nlet\n  x = ctx.variable(randomTensor[float32](N, D_in, 1'f32))\n  y = randomTensor[float32](N, D_out, 1'f32)\n\n# ##################################################################\n# Define the model\n\nnetwork TwoLayersNet:\n  layers:\n    fc1: Linear(D_in, H)\n    fc2: Linear(H, D_out)\n  forward x:\n    x.fc1.relu.fc2\n\nlet\n  model = ctx.init(TwoLayersNet)\n  optim = model.optimizer(SGD, learning_rate = 1e-4'f32)\n\n# ##################################################################\n# 
Training\n\nfor t in 0 ..\u003c 500:\n  let\n    y_pred = model.forward(x)\n    loss = y_pred.mse_loss(y)\n\n  echo \u0026\"Epoch {t}: loss {loss.value[0]}\"\n\n  loss.backprop()\n  optim.update()\n```\n\n### Teaser A text generated with Arraymancer's recurrent neural network\n\nFrom [example 6](./examples/ex06_shakespeare_generator.nim).\n\nTrained 45 min on my laptop CPU on Shakespeare and producing 4000 characters\n\n```\nWhter!\nTake's servant seal'd, making uponweed but rascally guess-boot,\nBare them be that been all ingal to me;\nYour play to the see's wife the wrong-pars\nWith child of queer wretchless dreadful cold\nCursters will how your part? I prince!\nThis is time not in a without a tands:\nYou are but foul to this.\nI talk and fellows break my revenges, so, and of the hisod\nAs you lords them or trues salt of the poort.\n\nROMEO:\nThou hast facted to keep thee, and am speak\nOf them; she's murder'd of your galla?\n\n# [...] See example 6 for full text generation samples\n```\n\n## Table of Contents\n\u003c!-- TOC --\u003e\n\n- [Arraymancer - A n-dimensional tensor (ndarray) library.](#arraymancer---a-n-dimensional-tensor-ndarray-library)\n  - [Show me some code](#show-me-some-code)\n    - [Tensor creation and slicing](#tensor-creation-and-slicing)\n    - [Reshaping and concatenation](#reshaping-and-concatenation)\n    - [Broadcasting](#broadcasting)\n    - [A simple two layers neural network](#a-simple-two-layers-neural-network)\n    - [Teaser A text generated with Arraymancer's recurrent neural network](#teaser-a-text-generated-with-arraymancers-recurrent-neural-network)\n  - [Table of Contents](#table-of-contents)\n  - [Installation](#installation)\n  - [Full documentation](#full-documentation)\n  - [Features](#features)\n    - [Arraymancer as a Deep Learning library](#arraymancer-as-a-deep-learning-library)\n      - [Fizzbuzz with fully-connected layers (also called Dense, Affine or Linear 
layers)](#fizzbuzz-with-fully-connected-layers-also-called-dense-affine-or-linear-layers)\n      - [Handwritten digit recognition with convolutions](#handwritten-digit-recognition-with-convolutions)\n      - [Sequence classification with stacked Recurrent Neural Networks](#sequence-classification-with-stacked-recurrent-neural-networks)\n      - [Composing models](#composing-models)\n      - [Custom layers](#custom-layers)\n    - [Tensors on CPU, on Cuda and OpenCL](#tensors-on-cpu-on-cuda-and-opencl)\n  - [What's new in Arraymancer](#whats-new-in-arraymancer)\n  - [4 reasons why Arraymancer](#4-reasons-why-arraymancer)\n    - [The Python community is struggling to bring Numpy up-to-speed](#the-python-community-is-struggling-to-bring-numpy-up-to-speed)\n    - [A researcher workflow is a fight against inefficiencies](#a-researcher-workflow-is-a-fight-against-inefficiencies)\n    - [Can be distributed almost dependency free](#can-be-distributed-almost-dependency-free)\n    - [Bridging the gap between deep learning research and production](#bridging-the-gap-between-deep-learning-research-and-production)\n    - [So why Arraymancer ?](#so-why-arraymancer-)\n  - [Future ambitions](#future-ambitions)\n\n\u003c!-- /TOC --\u003e\n\n## Installation\n\nNim is available in some Linux repositories and on Homebrew for macOS.\n\nHowever, I recommend installing Nim in your user profile via [``choosenim``](https://github.com/dom96/choosenim). Once choosenim has installed Nim, you can run `nimble install arraymancer`, which will pull the latest Arraymancer release and all its dependencies.\n\nTo install the Arraymancer development version, you can use `nimble install arraymancer@#head`.\n\nArraymancer requires a BLAS and a LAPACK library.\n\n- On Windows you can get the [OpenBLAS](https://www.openblas.net) library, which combines BLAS and LAPACK into a single DLL (`libopenblas.dll`), from the binary packages section of the OpenBLAS web page. 
Alternatively, you can download separate BLAS and LAPACK libraries from the [LAPACK for Windows](https://icl.cs.utk.edu/lapack-for-windows/lapack/) site. You must then copy or extract those DLLs into a folder on your path or into the folder containing your compilation target.\n- On macOS, the Apple Accelerate framework is included in all macOS versions and provides both.\n- On Linux, you can install libopenblas and liblapack through your package manager.\n\n## Full documentation\n\nThe detailed API reference is available in the official Arraymancer [documentation](https://mratsim.github.io/Arraymancer/). Note: this documentation is only generated for 0.X releases. Check the [examples folder](examples/) for the latest development changes.\n\n## Features\n\nFor now, Arraymancer is mostly at the multidimensional array stage. In particular, it offers the following:\n\n- Basic math operations generalized to tensors (sin, cos, ...)\n- Matrix algebra primitives: Matrix-Matrix, Matrix-Vector multiplication.\n- Easy and efficient slicing, including with ranges and steps.\n- No need to worry about \"vectorized\" operations.\n- Broadcasting support. Unlike Numpy, it is explicit: you just need to use `+.` instead of `+`.\n- Plenty of reshaping operations: concat, reshape, split, chunk, permute, transpose.\n- Supports tensors of up to 6 dimensions. 
For example a stack of 4 3D RGB minifilms of 10 seconds would be 6 dimensions:\n  `[4, 10, 3, 64, 1920, 1080]` for `[nb_movies, time, colors, depth, height, width]`\n- Can read and write .csv, Numpy (.npy) and HDF5 files.\n- OpenCL and Cuda backed tensors (not as feature packed as CPU tensors at the moment).\n- Covariance matrices.\n- Eigenvalues and Eigenvectors decomposition.\n- Least squares solver.\n- K-means and PCA (Principal Component Analysis).\n\n### Arraymancer as a Deep Learning library\n\nDeep learning features can be explored but are considered unstable while I iron out their final interface.\n\nReminder: The final interface is still **work in progress.**\n\nYou can also watch the following animated [neural network demo](https://github.com/Vindaar/NeuralNetworkLiveDemo) which shows live training via [nim-plotly](https://github.com/brentp/nim-plotly).\n\n#### Fizzbuzz with fully-connected layers (also called Dense, Affine or Linear layers)\n\nNeural network definition extracted from [example 4](examples/ex04_fizzbuzz_interview_cheatsheet.nim).\n\n```Nim\nimport arraymancer\n\nconst\n  NumDigits = 10\n  NumHidden = 100\n\nnetwork FizzBuzzNet:\n  layers:\n    hidden: Linear(NumDigits, NumHidden)\n    output: Linear(NumHidden, 4)\n  forward x:\n    x.hidden.relu.output\n\nlet\n  ctx = newContext Tensor[float32]\n  model = ctx.init(FizzBuzzNet)\n  optim = model.optimizer(SGD, 0.05'f32)\n# ....\necho answer\n# @[\"1\", \"2\", \"fizz\", \"4\", \"buzz\", \"6\", \"7\", \"8\", \"fizz\", \"10\",\n#   \"11\", \"12\", \"13\", \"14\", \"15\", \"16\", \"17\", \"fizz\", \"19\", \"buzz\",\n#   \"fizz\", \"22\", \"23\", \"24\", \"buzz\", \"26\", \"fizz\", \"28\", \"29\", \"30\",\n#   \"31\", \"32\", \"fizz\", \"34\", \"buzz\", \"36\", \"37\", \"38\", \"39\", \"40\",\n#   \"41\", \"fizz\", \"43\", \"44\", \"fizzbuzz\", \"46\", \"47\", \"fizz\", \"49\", \"50\",\n#   \"fizz\", \"52\",\"53\", \"54\", \"buzz\", \"56\", \"fizz\", \"58\", \"59\", \"fizzbuzz\",\n#   \"61\", 
\"62\", \"63\", \"64\", \"buzz\", \"fizz\", \"67\", \"68\", \"fizz\", \"buzz\",\n#   \"71\", \"fizz\", \"73\", \"74\", \"75\", \"76\", \"77\",\"fizz\", \"79\", \"buzz\",\n#   \"fizz\", \"82\", \"83\", \"fizz\", \"buzz\", \"86\", \"fizz\", \"88\", \"89\", \"90\",\n#   \"91\", \"92\", \"fizz\", \"94\", \"buzz\", \"fizz\", \"97\", \"98\", \"fizz\", \"buzz\"]\n```\n\n#### Handwritten digit recognition with convolutions\n\nNeural network definition extracted from [example 2](examples/ex02_handwritten_digits_recognition.nim).\n\n```Nim\nimport arraymancer\n\nnetwork DemoNet:\n  layers:\n    cv1:        Conv2D(@[1, 28, 28], out_channels = 20, kernel_size = (5, 5))\n    mp1:        Maxpool2D(cv1.out_shape, kernel_size = (2,2), padding = (0,0), stride = (2,2))\n    cv2:        Conv2D(mp1.out_shape, out_channels = 50, kernel_size = (5, 5))\n    mp2:        MaxPool2D(cv2.out_shape, kernel_size = (2,2), padding = (0,0), stride = (2,2))\n    fl:         Flatten(mp2.out_shape)\n    hidden:     Linear(fl.out_shape[0], 500)\n    classifier: Linear(500, 10)\n  forward x:\n    x.cv1.relu.mp1.cv2.relu.mp2.fl.hidden.relu.classifier\n\nlet\n  ctx = newContext Tensor[float32] # Autograd/neural network graph\n  model = ctx.init(DemoNet)\n  optim = model.optimizer(SGD, learning_rate = 0.01'f32)\n\n# ...\n# Accuracy over 90% in a couple minutes on a laptop CPU\n```\n\n#### Sequence classification with stacked Recurrent Neural Networks\n\nNeural network definition extracted [example 5](examples/ex05_sequence_classification_GRU.nim).\n\n```Nim\nimport arraymancer\n\nconst\n  HiddenSize = 256\n  Layers = 4\n  BatchSize = 512\n\n\nnetwork TheGreatSequencer:\n  layers:\n    gru1: GRULayer(1, HiddenSize, 4) # (num_input_features, hidden_size, stacked_layers)\n    fc1: Linear(HiddenSize, 32)                  # 1 classifier per GRU layer\n    fc2: Linear(HiddenSize, 32)\n    fc3: Linear(HiddenSize, 32)\n    fc4: Linear(HiddenSize, 32)\n    classifier: Linear(32 * 4, 3)                # Stacking a 
classifier which learns from the other 4\n  forward x, hidden0:\n    let\n      (output, hiddenN) = gru1(x, hidden0)\n      clf1 = hiddenN[0, _, _].squeeze(0).fc1.relu\n      clf2 = hiddenN[1, _, _].squeeze(0).fc2.relu\n      clf3 = hiddenN[2, _, _].squeeze(0).fc3.relu\n      clf4 = hiddenN[3, _, _].squeeze(0).fc4.relu\n\n    # Concat all\n    # Since concat backprop is not implemented we cheat by stacking\n    # Then flatten\n    result = stack(clf1, clf2, clf3, clf4, axis = 2)\n    result = classifier(result.flatten)\n\n# Allocate the model\nlet\n  ctx = newContext Tensor[float32]\n  model = ctx.init(TheGreatSequencer)\n  optim = model.optimizer(SGD, 0.01'f32)\n\n# ...\nlet exam = ctx.variable([\n    [float32 0.10, 0.20, 0.30], # increasing\n    [float32 0.10, 0.90, 0.95], # increasing\n    [float32 0.45, 0.50, 0.55], # increasing\n    [float32 0.10, 0.30, 0.20], # non-monotonic\n    [float32 0.20, 0.10, 0.30], # non-monotonic\n    [float32 0.98, 0.97, 0.96], # decreasing\n    [float32 0.12, 0.05, 0.01], # decreasing\n    [float32 0.95, 0.05, 0.07]  # non-monotonic\n  ])\n# ...\necho answer.unsqueeze(1)\n# Tensor[ex05_sequence_classification_GRU.SeqKind] of shape [8, 1] of type \"SeqKind\" on backend \"Cpu\"\n# \t  Increasing|\n# \t  Increasing|\n# \t  Increasing|\n# \t  NonMonotonic|\n# \t  NonMonotonic|\n# \t  Increasing| \u003c----- Wrong!\n# \t  Decreasing|\n# \t  NonMonotonic|\n```\n\n#### Composing models\n\nNetwork models can also act as layers in other network definitions.\nThe handwritten-digit-recognition model above can also be written like this:\n\n```Nim\nimport arraymancer\n\nnetwork SomeConvNet:\n  layers h, w:\n    cv1:        Conv2D(@[1, h, w], 20, (5, 5))\n    mp1:        Maxpool2D(cv1.out_shape, (2,2), (0,0), (2,2))\n    cv2:        Conv2D(mp1.out_shape, 50, (5, 5))\n    mp2:        MaxPool2D(cv2.out_shape, (2,2), (0,0), (2,2))\n    fl:         Flatten(mp2.out_shape)\n  forward x:\n    x.cv1.relu.mp1.cv2.relu.mp2.fl\n\n# this model could be 
initialized like this: let model = ctx.init(SomeConvNet, h = 28, w = 28)\n\n# functions `out_shape` and `in_shape` returning a `seq[int]` are convention (but not strictly necessary)\n# for layers/models that have clearly defined output and input size\nproc out_shape*[T](self: SomeConvNet[T]): seq[int] =\n  self.fl.out_shape\nproc in_shape*[T](self: SomeConvNet[T]): seq[int] =\n  self.cv1.in_shape\n\nnetwork DemoNet:\n  layers:\n  # here we use the previously defined SomeConvNet as a layer\n    cv:         SomeConvNet(28, 28)\n    hidden:     Linear(cv.out_shape[0], 500)\n    classifier: Linear(hidden.out_shape[0], 10)\n  forward x:\n    x.cv.hidden.relu.classifier\n```\n\n#### Custom layers\n\nIt is also possible to create fully custom layers.\nThe documentation for this can be found in the [official API documentation](https://mratsim.github.io/Arraymancer/nn_dsl.html).\n\n### Tensors on CPU, on Cuda and OpenCL\n\nTensors, CudaTensors and CLTensors do not have the same features implemented yet.\nAlso CudaTensors and CLTensors can only be float32 or float64 while CpuTensors can be integers, string, boolean or any custom object.\n\nHere is a comparative table of the core features.\n\n| Action                                            | Tensor                      | CudaTensor                 | ClTensor                   |\n| ------------------------------------------------- | --------------------------- | -------------------------- | -------------------------- |\n| Accessing tensor properties                       | [x]                         | [x]                        | [x]                        |\n| Tensor creation                                   | [x]                         | by converting a cpu Tensor | by converting a cpu Tensor |\n| Accessing or modifying a single value             | [x]                         | []                         | []                         |\n| Iterating on a Tensor                             | [x]                         | 
[]                         | []                         |\n| Slicing a Tensor                                  | [x]                         | [x]                        | [x]                        |\n| Slice mutation `a[1,_] = 10`                      | [x]                         | []                         | []                         |\n| Comparison `==`                                   | [x]                         | []                         | []                         |\n| Element-wise basic operations                     | [x]                         | [x]                        | [x]                        |\n| Universal functions                               | [x]                         | []                         | []                         |\n| Automatically broadcasted operations              | [x]                         | [x]                        | [x]                        |\n| Matrix-Matrix and Matrix-Vector multiplication    | [x]                         | [x]                        | [x]                        |\n| Displaying a tensor                               | [x]                         | [x]                        | [x]                        |\n| Higher-order functions (map, apply, reduce, fold) | [x]                         | internal only              | internal only              |\n| Transposing                                       | [x]                         | [x]                        | []                         |\n| Converting to contiguous                          | [x]                         | [x]                        | []                         |\n| Reshaping                                         | [x]                         | [x]                        | []                         |\n| Explicit broadcast                                | [x]                         | [x]                        | [x]                        |\n| Permuting dimensions                              | [x]                         
| []                         | []                         |\n| Concatenating tensors along existing dimension    | [x]                         | []                         | []                         |\n| Squeezing singleton dimension                     | [x]                         | [x]                        | []                         |\n| Slicing + squeezing                               | [x]                         | []                         | []                         |\n\n## What's new in Arraymancer\n\nThe full changelog is available in [changelog.md](./changelog.md).\n\n## 4 reasons why Arraymancer\n\n### The Python community is struggling to bring Numpy up-to-speed\n\n- Numba JIT compiler\n- Dask delayed parallel computation graph\n- Cython to ease numerical computations in Python\n- Due to the GIL, shared-memory parallelism (OpenMP) is not possible in pure Python\n- Use \"vectorized operations\" (i.e. don't use for loops in Python)\n\nWhy not use a single language that has all the building blocks for the most efficient scientific computing library, with Python ergonomics?\n\nOpenMP batteries included.\n\n### A researcher workflow is a fight against inefficiencies\n\nResearchers in a heavy scientific computing domain often have the following workflow: Mathematica/Matlab/Python/R (prototyping) -\u003e C/C++/Fortran (speed, memory)\n\nWhy not use a language as productive as Python and as fast as C? Code once, and don't spend months redoing the same thing at a lower level.\n\n### Can be distributed almost dependency free\n\nArraymancer models can be packaged in a self-contained binary that only depends on a BLAS library like OpenBLAS, MKL or Apple Accelerate (present on all Macs and iOS devices).\n\nThis means that there is no need to install a huge library or language ecosystem to use Arraymancer. 
This also makes it naturally suitable for resource-constrained devices like mobile phones and Raspberry Pi.\n\n### Bridging the gap between deep learning research and production\n\nDeep learning frameworks currently fall into two camps:\n\n- Research: Theano, Tensorflow, Keras, Torch, PyTorch\n- Production: Caffe, Darknet, (Tensorflow)\n\nFurthermore, Python preprocessing steps, unless using OpenCV, often need a custom implementation (think text/speech preprocessing on phones).\n\n- Managing and deploying Python (2.7, 3.5, 3.6) and package versions in a robust manner requires devops-fu (virtualenv, Docker, ...)\n- The Python data science ecosystem does not run on embedded devices (Nvidia Tegra/drones) or mobile phones, especially preprocessing dependencies.\n- Tensorflow is supposed to bridge the gap between research and production but its syntax and ergonomics are a pain to work with. As with researchers, you need to code twice, \"Prototype in Keras, and when you need low-level --\u003e Tensorflow\".\n- Deployed models are static: no framework offers an interface to add a new observation/training sample. What if you want to use a model as a web service with online learning?\n\n[Relevant XKCD from Apr 30, 2018](https://xkcd.com/1987/)\n\n![Python environment mess](https://imgs.xkcd.com/comics/python_environment.png)\n\n### So why Arraymancer ?\n\nAddressing all those pain points may seem like a huge undertaking. However, thanks to the Nim language, Arraymancer can offer:\n\n- Speed on par with C\n- Accelerated routines with Intel MKL/OpenBLAS or even NNPACK\n- Access to CUDA and CuDNN, and custom CUDA kernels generated on the fly via metaprogramming.\n- Almost dependency-free distribution (BLAS library)\n- A Python-like syntax with custom operators `a * b` for tensor multiplication instead of `a.dot(b)` (Numpy/Tensorflow) or `a.mm(b)` (Torch)\n- Numpy-like slicing ergonomics `t[0..4, 2..10|2]`\n- For everything that Nim doesn't have yet, you can use Nim bindings to C, C++, 
Objective-C or Javascript to bring it to Nim. Nim also has unofficial Python-\u003eNim and Nim-\u003ePython wrappers.\n\n## Future ambitions\n\nBecause apparently to be successful you need a vision, I would like Arraymancer to be:\n\n- The go-to tool for Deep Learning video processing. I.e. `vid = load_video(\"./cats/youtube_cat_video.mkv\")`\n- Target javascript, WebAssembly, Apple Metal, ARM devices, AMD Rocm, OpenCL, you name it.\n- The base of a Starcraft II AI bot.\n- Target cryptominers FPGAs because they drove the price of GPUs for honest deep-learners too high.\n","funding_links":[],"categories":["CUDA Tools Libraries, and Frameworks","CUDA Tools","Tools"],"sub_categories":["viii. Linear Regression","Interfaces","Mesh networks"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/mratsim.github.io%2FArraymancer%2F","html_url":"https://awesome.ecosyste.ms/projects/mratsim.github.io%2FArraymancer%2F","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/mratsim.github.io%2FArraymancer%2F/lists"}