{"id":13736652,"url":"https://github.com/andreaferretti/neo","last_synced_at":"2025-04-07T19:13:37.247Z","repository":{"id":42485783,"uuid":"92522688","full_name":"andreaferretti/neo","owner":"andreaferretti","description":"A matrix library","archived":false,"fork":false,"pushed_at":"2024-08-28T08:04:09.000Z","size":447,"stargazers_count":245,"open_issues_count":19,"forks_count":18,"subscribers_count":21,"default_branch":"master","last_synced_at":"2025-03-31T18:18:30.414Z","etag":null,"topics":["linear-algebra","matrices","nim","vectors"],"latest_commit_sha":null,"homepage":"https://andreaferretti.github.io/neo/","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/andreaferretti.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-05-26T15:12:19.000Z","updated_at":"2025-03-21T03:56:21.000Z","dependencies_parsed_at":"2024-01-06T11:52:44.237Z","dependency_job_id":"b9ad0dad-8c71-4df1-ba55-0482be56e6c1","html_url":"https://github.com/andreaferretti/neo","commit_stats":null,"previous_names":[],"tags_count":18,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andreaferretti%2Fneo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andreaferretti%2Fneo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andreaferretti%2Fneo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andreaferretti%2Fneo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/own
ers/andreaferretti","download_url":"https://codeload.github.com/andreaferretti/neo/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247312077,"owners_count":20918344,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["linear-algebra","matrices","nim","vectors"],"created_at":"2024-08-03T03:01:25.812Z","updated_at":"2025-04-07T19:13:36.639Z","avatar_url":"https://github.com/andreaferretti.png","language":"HTML","funding_links":[],"categories":["Algorithms"],"sub_categories":["Math"],"readme":"# 1. Neo - A Matrix library\n\n![logo](https://raw.githubusercontent.com/andreaferretti/neo/master/img/neo.png)\n\nThis library is meant to provide basic linear algebra operations for Nim\napplications. The ambition would be to become a stable basis on which to\ndevelop a scientific ecosystem for Nim, much like Numpy does for Python.\n\nThe library has been tested on Ubuntu Linux 16.04 and 20.04 64-bit using\neither ATLAS, OpenBlas or Intel MKL. 
It was also tested on OSX Yosemite to Monterey.\nThe GPU support has been tested using NVIDIA CUDA 8.0 up to 10.x.\n\nThe library is currently aligned with latest Nim devel.\n\nAPI documentation is [here](https://andreaferretti.github.io/neo/htmldocs/neo.html)\n\nA lot of examples are available in the tests.\n\nTable of contents\n-----------------\n\u003c!-- TOC depthfrom:2 depthto:6 orderedlist:false updateonsave:true withlinks:true --\u003e\n\n- [Introduction](#introduction)\n- [Working on the CPU](#working-on-the-cpu)\n  - [Dense linear algebra](#dense-linear-algebra)\n    - [Initialization](#initialization)\n    - [Working with 32-bit](#working-with-32-bit)\n    - [Accessors](#accessors)\n    - [Slicing](#slicing)\n    - [Iterators](#iterators)\n    - [Equality](#equality)\n    - [Pretty-print](#pretty-print)\n    - [Reshape operations](#reshape-operations)\n    - [BLAS Operations](#blas-operations)\n    - [Universal functions](#universal-functions)\n    - [Rewrite rules](#rewrite-rules)\n    - [Stacking vectors and matrices](#stacking-vectors-and-matrices)\n    - [Solving linear systems](#solving-linear-systems)\n    - [Computing eigenvalues and eigenvectors](#computing-eigenvalues-and-eigenvectors)\n  - [Sparse linear algebra](#sparse-linear-algebra)\n- [Working on the GPU](#working-on-the-gpu)\n  - [Dense linear algebra](#dense-linear-algebra)\n  - [Sparse linear algebra](#sparse-linear-algebra)\n- [Static typing for dimensions](#static-typing-for-dimensions)\n- [Design](#design)\n  - [On the CPU](#on-the-cpu)\n  - [Why fields are public](#why-fields-are-public)\n  - [On the GPU](#on-the-gpu)\n- [Linking external libraries](#linking-external-libraries)\n  - [Linking BLAS and LAPACK implementations](#linking-blas-and-lapack-implementations)\n  - [Linking CUDA](#linking-cuda)\n- [TODO](#todo)\n- [Contributing](#contributing)\n\n\u003c!-- /TOC --\u003e\n\n## 1.1. 
Introduction\n\nThe library revolves around operations on vectors and matrices of floating\npoint numbers. It allows to compute operations either on the CPU or on the\nGPU offering identical APIs.\n\nThe library defines types `Matrix[A]` and `Vector[A]`, where `A` is sometimes\nrestricted to be `float32` or `float64` (usually to use BLAS and LAPACK\nroutines). Actually, `Vector[A]` is just a small wrapper around `seq[A]`, which\nallows to perform linear algebra operations on standard Nim sequences without\ncopying.\n\nSimilar types exist on the GPU side, and there are facilities to move them\nback and forth from the CPU.\n\nNeo makes use of many standard libraries such as BLAS, LAPACK and CUDA. See\n[this section](#linking-external-libraries) to learn how to link the correct\nimplementation for your platform.\n\n## 1.2. Working on the CPU\n\n### 1.2.1. Dense linear algebra\n\n#### 1.2.1.1. Initialization\n\nHere we show a few ways to create matrices and vectors. All matrices methods\naccept a parameter to define whether to store the matrix in row-major (that is,\ndata are laid out in memory row by row) or column-major order (that is, data\nare laid out in memory column by column). The default is in each case\ncolumn-major.\n\nWhenever possible, we try to deduce whether to use 32 or 64 bits by appropriate\nparameters. 
When this is not possible, there is an optional parameter `float32`\nthat can be passed to specify the precision (the default is 64 bit).\n\nStatic matrices and vectors can be created like this:\n\n```nim\nimport neo\n\nlet\n  v1 = makeVector(5, proc(i: int): float64 = (i * i).float64)\n  v2 = randomVector(7, max = 3.0) # max is optional, default 1\n  v3 = constantVector(5, 3.5)\n  v4 = zeros(8)\n  v5 = ones(9)\n  v6 = vector(1.0, 2.0, 3.0, 4.0, 5.0)\n  v7 = vector([1.2, 3.4, 5.6])\n  m1 = makeMatrix(6, 3, proc(i, j: int): float64 = (i + j).float64)\n  m2 = randomMatrix(2, 8, max = 1.6) # max is optional, default 1\n  m3 = constantMatrix(3, 5, 1.8, order = rowMajor) # order is optional, default colMajor\n  m4 = ones(3, 6)\n  m5 = zeros(5, 2)\n  m6 = eye(7)\n  m7 = matrix(@[\n    @[1.2, 3.5, 4.3],\n    @[1.1, 4.2, 1.7]\n  ])\n```\n\nAll constructors that take as input an existing array or seq perform a copy of\nthe data for memory safety.\n\n#### 1.2.1.2. Working with 32-bit\n\nSome constructors (such as `zeros`) allow a type specifier if one wants to\ncreate a 32-bit vector or matrix. 
The following examples all return 32-bit\nvectors and matrices:\n\n```nim\nimport neo\n\nlet\n  v1 = makeVector(5, proc(i: int): float32 = (i * i).float32)\n  v2 = randomVector(7, max = 3'f32) # max is no longer optional, to distinguish 32/64 bit\n  v3 = constantVector(5, 3.5'f32)\n  v4 = zeros(8, float32)\n  v5 = ones(9, float32)\n  v6 = vector(1'f32, 2'f32, 3'f32, 4'f32, 5'f32)\n  v7 = vector([1.2'f32, 3.4'f32, 5.6'f32])\n  m1 = makeMatrix(6, 3, proc(i, j: int): float32 = (i + j).float32)\n  m2 = randomMatrix(2, 8, max = 1.6'f32)\n  m3 = constantMatrix(3, 5, 1.8'f32, order = rowMajor) # order is optional, default colMajor\n  m4 = ones(3, 6, float32)\n  m5 = zeros(5, 2, float32)\n  m6 = eye(7, float32)\n  m7 = matrix(@[\n    @[1.2'f32, 3.5'f32, 4.3'f32],\n    @[1.1'f32, 4.2'f32, 1.7'f32]\n  ])\n```\n\nOne can convert precision with `to32` or `to64`:\n\n```nim\nlet\n  v64 = randomVector(10)\n  v32 = v64.to32()\n  m32 = randomMatrix(3, 8, max = 1'f32)\n  m64 = m32.to64()\n```\n\nOnce vectors and matrices are created, everything is inferred, so there are no\ndifferences in working with 32-bit or 64-bit. All examples that follow are for\n64-bit, but they would work as well for 32-bit.\n\n#### 1.2.1.3. Accessors\n\nVectors can be accessed as expected:\n\n```nim\nvar v = randomVector(6)\nv[4] = 1.2\necho v[3]\n```\n\nSame for matrices, where `m[i, j]` denotes the item on row `i` and column `j`,\nregardless of the matrix order:\n\n```nim\nvar m = randomMatrix(3, 7)\nm[1, 3] = 0.8\necho m[2, 2]\n```\n\nOne can also map vectors and matrices via a proc:\n\n```nim\nlet\n  v1 = v.map(proc(x: float64): float64 = 2 - 3 * x)\n  m1 = m.map(proc(x: float64): float64 = 1 / x)\n```\n\n#### 1.2.1.4. 
Slicing\n\nThe `row` and `column` procs will return vectors that share memory with their\nparent matrix:\n\n```nim\nlet\n  m = randomMatrix(10, 10)\n  r2 = m.row(2)\n  c5 = m.column(5)\n```\n\nSimilarly, one can slice a matrix with the familiar notation:\n\n```nim\nlet\n  m = randomMatrix(10, 10)\n  m1 = m[2 .. 4, 3 .. 8]\n  m2 = m[All, 1 .. 5]\n```\n\nwhere `All` is a placeholder that denotes that no slicing occurs on that\ndimension.\n\nIn general it is convenient to have slicing, rows and columns that do not\ncopy data but share the underlying data sequence. This can have two possible\ndrawbacks:\n\n* the result may need to be modified while the original matrix stays unchanged,\n  or vice versa;\n* a small matrix or vector may hold a reference to a large data sequence,\n  preventing it from being garbage collected.\n\nIn these cases, it is enough to call the `.clone()` proc to obtain a copy\nof the matrix or vector with its own storage.\n\n#### 1.2.1.5. Iterators\n\nOne can iterate over vector or matrix elements, as well as over rows and columns:\n\n```nim\nlet\n  v = randomVector(6)\n  m = randomMatrix(3, 5)\nfor x in v: echo x\nfor i, x in v: echo i, x\nfor x in m: echo x\nfor t, x in m:\n  let (i, j) = t\n  echo i, j, x\nfor row in m.rows:\n  echo row[0]\nfor column in m.columns:\n  echo column[1]\n```\n\nOne important point about performance: when iterating over rows or columns,\nthe same `ref` is reused throughout - this entails that the loop is\nallocation-free, resulting in orders of magnitude faster loops. 
One should\npay attention not to hold on to these references, because they will be mutated.\n\nThis means that - for instance - the following is correct:\n\n```nim\nlet m = randomMatrix(1000, 1000)\nvar columnSum = zeros(1000)\nfor c in m.columns:\n  columnSum += c\n```\n\nbut the following will give wrong results (all elements of `cols` will be\nidentical at the end):\n\n```nim\nlet m = randomMatrix(1000, 1000)\nvar cols = newSeq[Vector[float64]]()\nfor c in m.columns:\n  cols.add(c)\n```\n\nIf one needs a fresh reference for each element of the iteration, the\n`rowsSlow` and `columnsSlow` iterators are available, hence the\nfollowing modification is ok:\n\n```nim\nlet m = randomMatrix(1000, 1000)\nvar cols = newSeq[Vector[float64]]()\nfor c in m.columnsSlow:\n  cols.add(c)\n```\n\n#### 1.2.1.6. Equality\n\nThere are two kinds of equality. The usual `==` operator will compare the\ncontents of vectors and matrices exactly:\n\n```nim\nlet\n  u = vector(1.0, 2.0, 3.0, 4.0)\n  v = vector(1.0, 2.0, 3.0, 4.0)\n  w = vector(1.0, 3.0, 3.0, 4.0)\nu == v # true\nu == w # false\n```\n\nUsually, though, one wants to take into account the errors introduced by\nfloating point operations. To do this, use the `=~` operator, or its\nnegation `!=~`:\n\n```nim\nlet\n  u = vector(1.0, 2.0, 3.0, 4.0)\n  v = vector(1.0, 2.000000001, 2.99999999, 4.0)\nu == v # false\nu =~ v # true\n```\n\n#### 1.2.1.7. 
Pretty-print\n\nBoth vectors and matrices have a pretty-print operation, so one can do\n\n```nim\nlet m = randomMatrix(3, 7)\necho m\n```\n\nand get something like\n\n    [ [ 0.5024584865674662  0.0798945419892334  0.7512423051567048  0.9119041361916302  0.5868388894943912  0.3600554448403415  0.4419034543022882 ]\n      [ 0.8225964245706265  0.01608615513584155 0.1442007939324697  0.7623388321096165  0.8419745686508193  0.08792951865247645 0.2902529012579151 ]\n      [ 0.8488187232786935  0.422866666087792 0.1057975175658363  0.07968277822379832 0.7526946339452074  0.7698915909784674  0.02831893268471575 ] ]\n\n#### 1.2.1.8. Reshape operations\n\nThe following operations do not change the underlying memory layout of matrices\nand vectors. This means they run in very little time even on big matrices, but\nyou have to pay attention when mutating matrices and vectors produced in this\nway, since the underlying data is shared.\n\n```nim\nlet\n  m1 = randomMatrix(6, 9)\n  m2 = randomMatrix(9, 6)\n  v1 = randomVector(9)\necho m1.t # transpose, done in constant time without copying\necho m1 + m2.t\nlet m3 = m1.reshape(9, 6)\nlet m4 = v1.asMatrix(3, 3)\nlet v2 = m2.asVector\n```\n\nIn case you need to allocate a copy of the original data, say in order to\ntranspose a matrix and then mutate the transpose without altering the original\nmatrix, a `clone` operation is available:\n\n```nim\nlet m5 = m1.clone\n```\n\nNotice that `clone()` will be called internally anyway when using one of the\nreshape operations with a matrix that is not contiguous (that is, a matrix\nobtained by slicing).\n\nThere is also a hard transpose operation which, unlike `t()`, will not try\nto share storage but will always create a new matrix instead and copy the\ndata to the new matrix (this way, it will also preserve the row-major or\ncolumn-major order). 
The hard transpose is denoted `T()`, so that\n\n```nim\nm.t == m.T\n```\n\nalways holds, although the internal representations differ.\n\n#### 1.2.1.9. BLAS Operations\n\nA few linear algebra operations are available, wrapping BLAS libraries:\n\n```nim\nvar v1 = randomVector(7)\nlet\n  v2 = randomVector(7)\n  m1 = randomMatrix(6, 9)\n  m2 = randomMatrix(9, 7)\necho 3.5 * v1\nv1 *= 2.3\necho v1 + v2\necho v1 - v2\necho v1 * v2 # dot product\necho v1 |*| v2 # Hadamard (component-wise) product\necho l_1(v1) # l_1 norm\necho l_2(v1) # l_2 norm\necho m2 * v1 # matrix-vector product\necho m1 * m2 # matrix-matrix product\necho m1 |*| m2 # Hadamard (component-wise) product\necho max(m1)\necho min(v2)\n```\n\n#### 1.2.1.10. Universal functions\n\nUniversal functions are real-valued functions that are extended to vectors\nand matrices by working element-wise. There are many common functions that are\nimplemented as universal functions:\n\n```nim\nsqrt\ncbrt\nlog10\nlog2\nlog\nexp\narccos\narcsin\narctan\ncos\ncosh\nsin\nsinh\ntan\ntanh\nerf\nerfc\nlgamma\ntgamma\ntrunc\nfloor\nceil\ndegToRad\nradToDeg\n```\n\nThis means that, for instance, the following check passes:\n\n```nim\n  let\n    v1 = vector(1.0, 2.3, 4.5, 3.2, 5.4)\n    v2 = log(v1)\n    v3 = v1.map(log)\n\n  assert v2 == v3\n```\n\nUniversal functions work both on 32 and 64 bit precision, on vectors and\nmatrices.\n\nIf you have a function `f` of type `proc(x: float64): float64` you can use\n\n```nim\nmakeUniversal(f)\n```\n\nto turn `f` into a (public) universal function. If you do not want to export\n`f`, there is the equivalent template `makeUniversalLocal`.\n\n#### 1.2.1.11. Rewrite rules\n\nA few rewrite rules allow to optimize a chain of linear algebra operations\ninto a single BLAS call. For instance, if you try\n\n```nim\necho v1 + 5.3 * v2\n```\n\nthis is not implemented as a scalar multiplication followed by a sum, but it\nis turned into a single function call.\n\n#### 1.2.1.12. 
Stacking vectors and matrices\n\nVectors can be stacked both horizontally (which gives a new vector)\n\n```nim\nlet\n  v1 = vector([1.0, 2.0])\n  v2 = vector([5.0, 7.0, 9.0])\n  v3 = vector([9.9, 8.8, 7.7, 6.6])\n\necho hstack(v1, v2, v3) #  vector([1.0, 2.0, 5.0, 7.0, 9.0, 9.9, 8.8, 7.7, 6.6])\n```\n\nor vertically (which gives a matrix having the vectors as rows)\n\n```nim\nlet\n  v1 = vector([1.0, 2.0, 3.0])\n  v2 = vector([5.0, 7.0, 9.0])\n  v3 = vector([9.9, 8.8, 7.7])\n\necho vstack(v1, v2, v3)\n# matrix(@[\n#   @[1.0, 2.0, 3.0],\n#   @[5.0, 7.0, 9.0],\n#   @[9.9, 8.8, 7.7]\n# ])\n```\n\nAlso, `concat` is an alias for `hstack`.\n\nMatrices can be stacked similarly, for instance\n\n```nim\nlet\n  m1 = matrix(@[\n    @[1.0, 2.0],\n    @[3.0, 4.0]\n  ])\n  m2 = matrix(@[\n    @[5.0, 7.0, 9.0],\n    @[6.0, 2.0, 1.0]\n  ])\n  m3 = matrix(@[\n    @[2.0, 2.0],\n    @[1.0, 3.0]\n  ])\necho hstack(m1, m2, m3)\n# m = matrix(@[\n#   @[1.0, 2.0, 5.0, 7.0, 9.0, 2.0, 2.0],\n#   @[3.0, 4.0, 6.0, 2.0, 1.0, 1.0, 3.0]\n# ])\n```\n\nTODO: stack matrices\n\n#### 1.2.1.13. Solving linear systems\n\nSome linear algebraic functions are included, currently for solving systems of\nlinear equations of the form `Ax = b`, for square matrices `A`. Functions to invert\nsquare invertible matrices are also provided. These throw floating-point errors\nin the case of non-invertible matrices.\n\nThese functions require a LAPACK implementation.\n\n```nim\nlet\n  a = randomMatrix(5, 5)\n  b = randomVector(5)\n\necho solve(a, b)\necho a \\ b # equivalent\necho a.inv()\n```\n\n#### 1.2.1.14. Computing eigenvalues and eigenvectors\n\nThese functions require a LAPACK implementation.\n\nTo be documented.\n\n### 1.2.2. Sparse linear algebra\n\nTo be documented.\n\n## 1.3. Working on the GPU\n\n### 1.3.1. 
Dense linear algebra\n\nIf you have a matrix or vector, you can move it to the GPU and back\nlike this:\n\n```nim\nimport neo, neo/cuda\nlet\n  v = randomVector(12, max=1'f32)\n  vOnTheGpu = v.gpu()\n  vBackOnTheCpu = vOnTheGpu.cpu()\n```\n\nVectors and matrices on the GPU support linear-algebraic operations via cuBLAS,\nexactly like their CPU counterparts. A few operations - such as reading a single\nelement - are not supported, as it does not make much sense to copy a single\nvalue back and forth from the GPU. Usually it is advisable to move vectors\nand matrices to the GPU, make as many computations as possible there, and\nfinally move the result back to the CPU.\n\nThe following are all valid operations, assuming `v` and `w` are vectors on the\nGPU, `m` and `n` are matrices on the GPU and the dimensions are compatible:\n\n```nim\nv * 3'f32\nv + w\nv -= w\nm * v\nm - n\nm * n\n```\n\nFor more information, look at the tests in `tests/cudadense`.\n\n### 1.3.2. Sparse linear algebra\n\nTo be documented.\n\n## 1.4. Static typing for dimensions\n\nUnder `neo/statics` there exist types that encode vectors and matrices whose\ndimensions are known at compile time. They are defined as distinct types over their\ndynamic counterparts:\n\n```nim\ntype\n  StaticVector*[N: static[int]; A] = distinct Vector[A]\n  StaticMatrix*[M, N: static[int]; A] = distinct Matrix[A]\n```\n\nIn this way, these types are fully interoperable with the dynamic ones.\nOne can freely convert between the two representations:\n\n```nim\nimport neo, neo/statics\n\nlet\n  u = randomVector(5) # static, of known dimension 5\n  v = u.asDynamic\n  w = v.asStatic(5)\n\nassert(u == w)\n```\n\nAll operations implemented by neo are also available for static vectors and\nmatrices. 
The differences are that:\n\n* operations on static vectors and matrices will not compile if the dimensions\n  do not match\n* operations on static vectors and matrices will return other static vectors and\n  matrices, thereby automatically tracking dimensions.\n\nAn example of an operation that will not compile is\n\n```nim\nimport neo, neo/statics\n\nlet\n  m = statics.randomMatrix(5, 7) # static, of known dimension 5x7\n  n = statics.randomMatrix(4, 6) # static, of known dimension 4x6\n  p = statics.randomMatrix(7, 3) # static, of known dimension 7x3\n\ndiscard m * n # this will not compile\nlet x = m * p # this will infer dimension 5x3\n```\n\nBy converting back and forth between static and dynamic vectors and matrices -\nwhich can be done at no cost - one can incorporate data whose dimension is only\nknown at runtime, while at the same time having guaranteed dimension\ncompatibility whenever enough information is known at compile time.\n\nFor now, statics are only available on the CPU. It would be a nice contribution\nto extend this to GPU types.\n\n## 1.5. Design\n\n### 1.5.1. 
On the CPU\n\nOn the CPU, dense vectors and matrices are stored using this structure:\n\n```nim\ntype\n  MatrixShape* = enum\n    Diagonal, UpperTriangular, LowerTriangular, UpperHessenberg, LowerHessenberg, Symmetric\n  Vector*[A] = ref object\n    data*: seq[A]\n    fp*: ptr A # float pointer\n    len*, step*: int\n  Matrix*[A] = ref object\n    order*: OrderType\n    M*, N*, ld*: int # ld = leading dimension\n    fp*: ptr A # float pointer\n    data*: seq[A]\n    shape*: set[MatrixShape]\n```\n\nEach stores some information on dimensions (`len` for vectors, `M` and `N` for\nmatrices) and a pointer to the beginning of the actual data `fp`.\n\nThe `order` of a matrix can be `colMajor` or `rowMajor`: the first one means\nthat the matrix is stored column by column, the second row by row.\n\nTo make it easier to share data without copying, but still keep the data\ngarbage collected, the actual data is usually allocated in a `seq`, here called\n`data`, which can be shared between matrices and their slices, or row and\ncolumn vectors. The pointer `fp` is usually a pointer somewhere inside this\nsequence, although this is not required.\n\nAll operations are expressed in terms of `fp`, so `data` is not really\nrequired. When the last reference to `data` goes away, though, the sequence\nis garbage collected and there will be no more pointers inside it. If there is\na small vector or matrix holding the last reference to a big chunk of\ndata, it may be more convenient to copy it to a fresh location and free the\ndata itself: this can be done by using the `clone()` operation.\n\nVectors are not required to be contiguous, and they have a `step` parameter\nthat determines how far apart their elements are. This is useful when\ntaking a `row` of a column major matrix or the `column` of a row major one.\n\nMatrices may also be non-contiguous. 
When taking a minor of a column major\nmatrix, one gets a submatrix whose elements are contiguous in a column, but\nwhose columns are not contiguously placed. Rather, the distance (in elements)\nbetween the start of two successive columns is the same as in the parent matrix,\nand is called the leading dimension of the matrix (here stored as `ld`). A\nsimilar remark holds for row major matrices, where `ld` is the number of\nelements between the beginnings of successive rows.\n\nThis design also makes it possible to have matrices or vectors that are not managed by the\ngarbage collector. In this case, it is enough to set `fp` manually, and\nleave `data` nil. This makes it possible to support\n\n* matrices and vectors with data on the stack, which can be constructed\n  using the `stackVector` and `stackMatrix` constructors (and which are\n  only valid as long as the relevant data lives on the stack), and\n* matrices and vectors allocated manually on the shared heap, which can\n  be constructed using the `sharedVector` and `sharedMatrix` constructors,\n  and destructed with `dealloc`.\n\n### 1.5.2. Why fields are public\n\nNotice that all members of the types are public, but in general **it is not\nsafe** to change them if you don't know what you are doing. These fields are\nnot generally meant to be changed, and a previous version of the library\nhad them private. In general, though, a user may need access to some of\nthese fields for performance reasons, so they are exposed.\n\nAn example of this case is the `rows` (or `columns`) iterator. Neo keeps\nvector and matrix types on the heap (they are `ref` types). This prevents\naccidental copying, but has the downside that creating a new one requires\nallocation. When iterating over rows in a loop, one wants to avoid triggering\nmany costly allocations, since the underlying data is always the same, and\nthe only thing that changes is the position of the vectors inside this\ndata array. 
For this reason, the iterator is implemented as follows:\n\n```nim\niterator rows*[A](m: Matrix[A]): auto {. inline .} =\n  let\n    mp = cast[CPointer[A]](m.fp)\n    step = if m.order == rowMajor: m.ld else: 1\n  var v = m.row(0)\n  yield v\n  for i in 1 ..\u003c m.M:\n    v.fp = addr(mp[i * step])\n    yield v\n```\n\nThere is a single vector which is reused at each step and the iterator\nalways yields this vector, where `fp` is changed. A user that wants - say -\nto implement a similar iteration over some minors of a matrix may need\nto perform a similar trick, and preventing to change `fp` would impede\nthis optimization.\n\n### 1.5.3. On the GPU\n\nOn the GPU side, the definitions are similar:\n\n```nim\ntype\n  CudaVector*[A] = object\n    data*: ref[ptr A]\n    fp*: ptr A\n    len, step*: int32\n  CudaMatrix*[A] = object\n    M*, N*, ld*: int32\n    data*: ref[ptr A]\n    fp*: ptr A\n    shape*: set[MatrixShape]\n```\n\nThe main difference here is that one cannot store the underlying data in\na sequence, because data is allocated on a device, and the CUDA api returns\nthe relevant pointers, over which we have no control.\n\nTo have a similar approach to the former case, the CUDA pointer to the data\nis wrapped inside a `ref`. The actual pointer used in computation is, again,\n`fp`, while `data` is shared for slices, or rows and vectors of a matrix.\n\nWhen the last reference to `data` goes away, a finalizer calls the CUDA\nroutines to clean up the allocated memory.\n\nAlso, CUDA matrices are only column major for now, although this is going\nto change in the future.\n\n## 1.6. Linking external libraries\n\n### 1.6.1. Linking BLAS and LAPACK implementations\n\nNeo requires to link some BLAS and LAPACK implementation to perform the actual\nlinear algebra operations. 
By default, it tries to link the default\nsystem-wide implementations.\n\nYou can link against different implementations by a combination of:\n\n* changing the path for linked libraries (use\n  [`--clibdir`](https://nim-lang.org/docs/nimc.html#compiler-usage-command-line-switches)\n  for this)\n* using the `--define:blas` flag. By default, the system tries to load a BLAS\n  library called `blas`, which translates into something called `blas.dll`\n  or `libblas.so` according to the underlying operating system. To link,\n  say, the library `libopenblas.so.3` on Linux, you should pass to Nim the\n  option `--define:blas=openblas`.\n* using the `--define:lapack` flag. By default, the system tries to load a LAPACK\n  library called `lapack`, which translates into something called `lapack.dll`\n  or `liblapack.so` according to the underlying operating system. To link,\n  say, the library `libopenblas.so.3` on Linux, you should pass to Nim the\n  option `--define:lapack=openblas`.\n\nSee the tasks inside [neo.nimble](https://github.com/andreaferretti/neo/blob/master/neo.nimble)\nfor a few examples.\n\nPackages for various BLAS or LAPACK implementations are available from the package\nmanagers of many Linux distributions. On OSX one can add the brew formulas\nfrom [Homebrew Science](https://github.com/Homebrew/homebrew-science), such\nas `brew install homebrew/science/openblas`.\n\nYou may also need to add suitable paths for the includes and library dirs.\nOn OSX, this should do the trick:\n\n```nim\nswitch(\"clibdir\", \"/usr/local/opt/openblas/lib\")\nswitch(\"cincludes\", \"/usr/local/opt/openblas/include\")\n```\n\nIf you have problems with MKL, you may want to link it statically. 
Just pass\nthe options\n\n```nim\n--dynlibOverride:mkl_intel_lp64\n--passL:${PATH_TO_MKL}/libmkl_intel_lp64.a\n```\n\nto enable static linking.\n\nOn Windows, it is recommended to use [MSYS2](https://www.msys2.org/) to install\nthe mingw compiler toolchain and compatible OpenBLAS library. For 64-bit builds,\nthis would be:\n\n```\npacman -S mingw-w64-x86_64-gcc mingw-w64-x86_64-openblas\n```\n\nYou should then add `MSYS2_ROOT\\mingw64\\bin` to your PATH. Programs using nimblas\ncan then be compiled using the `-d:blas=libopenblas` switch. At runtime, `libopenblas.dll`\nshould be loaded from the mingw64 bin directory you added to your PATH, though it\nis suggested to distribute this DLL file alongside your executable if you are\npublishing binary packages.\n\n### 1.6.2. Linking CUDA\n\nIt is possible to delegate work to the GPU using CUDA. The library has been\ntested to work with NVIDIA CUDA 8.0, but it is possible that earlier\nversions will work as well. In order to compile and link against CUDA, you\nshould make the appropriate headers and libraries available. If they are not\nglobally set, you can pass suitable options to the Nim compiler, such as\n\n```\n--cincludes:\"/usr/local/cuda/include\"\n--clibdir:\"/usr/local/cuda/lib64\"\n```\n\nSupport for CUDA is under the package `neo/cuda`, that needs to be imported\nexplicitly.\n\n## 1.7. TODO\n\nSee the [issue list](https://github.com/andreaferretti/neo/issues)\n\n## 1.8. Contributing\n\nEvery contribution is very much appreciated! 
This can range from:\n\n* using the library and reporting any issues and any configuration on which\n  it works fine\n* building other parts of the scientific environment on top of it\n* writing blog posts and tutorials\n* helping with the documentation\n* contributing actual code (see the\n  [issue list](https://github.com/andreaferretti/neo/issues) section)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandreaferretti%2Fneo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fandreaferretti%2Fneo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandreaferretti%2Fneo/lists"}