{"id":17113019,"url":"https://github.com/emmt/mayoptimize.jl","last_synced_at":"2025-04-09T20:33:47.537Z","repository":{"id":44412703,"uuid":"238525369","full_name":"emmt/MayOptimize.jl","owner":"emmt","description":"Conditionally optimize Julia code","archived":false,"fork":false,"pushed_at":"2024-05-02T17:04:50.000Z","size":1222,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-23T22:34:45.750Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Julia","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/emmt.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-02-05T18:58:17.000Z","updated_at":"2024-05-02T17:04:53.000Z","dependencies_parsed_at":"2024-05-02T18:37:29.789Z","dependency_job_id":null,"html_url":"https://github.com/emmt/MayOptimize.jl","commit_stats":{"total_commits":63,"total_committers":1,"mean_commits":63.0,"dds":0.0,"last_synced_commit":"b5248566f2b3c600da5ad115c5a457668f3d60d5"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emmt%2FMayOptimize.jl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emmt%2FMayOptimize.jl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emmt%2FMayOptimize.jl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emmt%2FMayOptimize.jl/manifests","owner_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/owners/emmt","download_url":"https://codeload.github.com/emmt/MayOptimize.jl/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248107879,"owners_count":21049026,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-14T17:02:20.148Z","updated_at":"2025-04-09T20:33:47.512Z","avatar_url":"https://github.com/emmt.png","language":"Julia","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Conditionally optimize Julia code\n\n[![License](http://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat)](./LICENSE.md)\n[![Build Status](https://github.com/emmt/MayOptimize.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/emmt/MayOptimize.jl/actions/workflows/CI.yml?query=branch%3Amaster)\n[![Build Status](https://ci.appveyor.com/api/projects/status/github/emmt/MayOptimize.jl?branch=master)](https://ci.appveyor.com/project/emmt/MayOptimize-jl/branch/master)\n[![Coverage](https://codecov.io/gh/emmt/MayOptimize.jl/branch/main/graph/badge.svg)](https://codecov.io/gh/emmt/MayOptimize.jl)\n\nWhen writing high-performance [Julia][julia-url] code, you may want to keep a\nreference version that performs bound checking, another version that assumes valid\nindices (and thus avoids bound checking) and perhaps a more heavily optimized\nversion that requires loop vectorization.  
The [MayOptimize][repository-url]\npackage lets you have the 3 variants available with a *single* version of the\ncode.\n\n\n## Documentation\n\nThe usage of [MayOptimize][repository-url] is summarized in the following short\nexample:\n\n```julia\nusing MayOptimize\n\nfunction foo!(::Type{P}, x::AbstractArray{T}) where {T\u003c:Real, P\u003c:OptimLevel}\n    s = zero(T)\n    # Loop 1: compute the sum of values.\n    @maybe_inbounds P for i in eachindex(x)\n        s += x[i]\n    end\n    # Loop 2: add the sum to every value.\n    @maybe_vectorized P for i in eachindex(x)\n        x[i] += s\n    end\n    return x, s\nend\n```\n\nNote that the two loops above are preceded by the macros `@maybe_inbounds` and\n`@maybe_vectorized`, which both take 2 arguments: a parameter `P` and an\nexpression or a block of code (the 2nd argument must be a simple `for` loop for\nthe `@maybe_vectorized` macro).\n\nHow the expression or the block of code is compiled is determined by the\ntype parameter `P`:\n\n- `P \u003c: Debug` for debugging or reference code that performs bound checking and\n  no vectorization.\n\n- `P \u003c: InBounds` for code that assumes valid indices and thus avoids bound\n  checking.\n\n- `P \u003c: Vectorize` for code that assumes valid indices and requires\n  vectorization.\n\nA block of code provided to the `@maybe_inbounds` macro will be compiled with\nbound checking (and thus no vectorization) if `P \u003c: Debug` and without bound\nchecking (as if `@inbounds` was specified) if `P \u003c: InBounds`.  
Since\n`Vectorize \u003c: InBounds`, specifying `Vectorize` in `@maybe_inbounds` also avoids\nbound checking.\n\nA block of code provided to the `@maybe_vectorized` macro will be compiled with\nbound checking and no vectorization if `P \u003c: Debug`, with no bound checking if\n`P \u003c: InBounds` (as if `@inbounds` was specified) and with no bound checking\nand vectorization if `P \u003c: Vectorize` (as if both `@inbounds` and `@simd` were\nspecified).\n\nHence which version of `foo!` is called is decided by Julia's method dispatch\naccording to the abstract types `Debug`, `InBounds` or `Vectorize` exported by\n`MayOptimize`.  Calling:\n\n```julia\nfoo!(Debug, x)\n```\n\nexecutes a version that checks bounds and does no vectorization, while calling:\n\n```julia\nfoo!(InBounds, x)\n```\n\nexecutes a version that avoids bound checking (in the 2 loops) and finally\ncalling:\n\n```julia\nfoo!(Vectorize, x)\n```\n\nexecutes a version that avoids bound checking (in the 2 loops) and vectorizes\nthe second loop.\n\nIt is easy to provide a default version so that other users need not\nbother choosing which version to use.  For instance, assuming that you have\nchecked that your code has no issues with indexing but that vectorization makes\nalmost no difference, you may write:\n\n```julia\nfoo!(x::AbstractArray{T}) where {T\u003c:Real} = foo!(InBounds, x)\n```\n\nand decide later to change the default optimization level.\n\n\n## Installation\n\nIn Julia, hit the `]` key to switch to the package manager REPL (you should get\na `... 
pkg\u003e` prompt) and type:\n\n```julia\nadd MayOptimize\n```\n\nNo other packages are needed.\n\n\n## Examples\n\n### Left division by a triangular matrix\n\n`MayOptimize` extends a few base linear algebra methods such as the `ldiv!`\nmethod to perform the left division of a vector `b` by a matrix `A` and can be\ncalled as:\n\n```julia\nusing MayOptimize, LinearAlgebra\nldiv!(opt, A, b)\nldiv!(opt, y, A, b)\n```\n\nIn the first case, the operation is done in-place and `b` is overwritten with\n`A\\b`; in the second case, `A\\b` is stored in `y`.  Argument `opt` can be\n`MayOptimize.Standard` to use the standard Julia method (probably BLAS), or `Debug`,\n`InBounds`, or `Vectorize` to compile Julia code in `MayOptimize` with\ndifferent optimization settings.  The following figures (obtained with Julia\n1.6.3 on an AMD Ryzen Threadripper 2950X 16-Core processor) show how efficient\nJulia code can be when compiled with well chosen optimization settings (note\nthe 1.7 gain compared to the standard implementation when `@simd` is used in\nthe innermost loop level).  
Having a look at [`src/linalg.jl`](src/linalg.jl),\nyou can see that the code is identical for the `Debug`, `InBounds` or\n`Vectorize` settings (only the `opt` argument changes) and that this code turns\nout to be pretty straightforward.\n\n![Left division by a lower triangular matrix](figs/ldiv-L-median.png \"\")\n\n![Left division by the transpose of a lower triangular matrix](figs/ldiv-Lt-median.png \"\")\n\n![Left division by an upper triangular matrix](figs/ldiv-R-median.png \"\")\n\n![Left division by the transpose of an upper triangular matrix](figs/ldiv-Rt-median.png \"\")\n\n\n### Cholesky decomposition\n\n`MayOptimize` also extends the `cholesky` and `cholesky!` methods to perform\nthe Cholesky decomposition (without pivoting) of a Hermitian matrix `A` by\nregular Julia code and with optimization level `opt`:\n\n```julia\nusing MayOptimize, LinearAlgebra\ncholesky!(opt, A)\nB = cholesky(opt, A)\n```\n\nIn the first case, the decomposition is done in-place and the upper or lower\ntriangular part of `A` is overwritten with one factor of its Cholesky\ndecomposition, which is returned.  In the second case, `A` is left unchanged.\nApart from the `opt` argument (which also avoids *type-piracy*) and rounding\nerrors, the result is the same as with the standard method provided by\n`LinearAlgebra`, which calls BLAS.  As illustrated below, the Julia code may\nbe much faster than BLAS for matrices of size smaller than or equal to 200×200 in spite\nof the fact that BLAS may run on several threads whereas the optimized Julia\ncode is executed on a single thread.\n\nThe `opt` argument specifies the optimization level (`Debug`, `InBounds`, or\n`Vectorize`) and/or the algorithm used for the decomposition\n(`CholeskyBanachiewiczLowerI`, `CholeskyBanachiewiczLowerII`,\n`CholeskyBanachiewiczUpper`, `CholeskyCroutLower`, `CholeskyCroutUpperI`, or\n`CholeskyCroutUpperII`).  
For instance, choose `opt` to be\n`CholeskyBanachiewiczLower(Vectorize)` to compute the `L'⋅L` Cholesky\nfactorization with `L` lower triangular by the Cholesky-Banachiewicz (row-wise)\nalgorithm with loop vectorization.  If only an algorithm is specified without\noptimization level, the best optimization level for this algorithm is used.\nConversely, if only the optimization level is specified, the fastest algorithm\nis used.  However, these default choices are optimal for a test machine\nwhich may be different from yours.\n\nThe following figures (obtained with Julia 1.6.3 with `Float32` values on an\nAMD Ryzen Threadripper 2950X 16-Core processor) show how efficient Julia\ncode can be when compiled with well chosen optimization settings (note the 200% gain\ncompared to the BLAS implementation when `@simd` is used in the innermost loop\nlevels for 100×100 matrices).\n\n![Cholesky decomposition with no optimization](figs/cholesky-debug-median.png \"\")\n\n![Cholesky decomposition with in-bounds optimization](figs/cholesky-inbounds-median.png \"\")\n\n![Cholesky decomposition with SIMD vectorization](figs/cholesky-vectorize-median.png \"\")\n\n\n[repository-url]:  https://github.com/emmt/MayOptimize.jl\n\n[julia-url]: https://julialang.org/\n[julia-pkgs-url]: https://pkg.julialang.org/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Femmt%2Fmayoptimize.jl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Femmt%2Fmayoptimize.jl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Femmt%2Fmayoptimize.jl/lists"}