{"id":13648723,"url":"https://github.com/bitshifter/mathbench-rs","last_synced_at":"2025-04-13T05:07:17.336Z","repository":{"id":54548372,"uuid":"173564894","full_name":"bitshifter/mathbench-rs","owner":"bitshifter","description":"Comparing performance of Rust math libraries for common 3D game and graphics tasks","archived":false,"fork":false,"pushed_at":"2024-10-29T09:21:54.000Z","size":293,"stargazers_count":208,"open_issues_count":7,"forks_count":15,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-13T05:07:10.853Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bitshifter.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-03-03T11:16:57.000Z","updated_at":"2025-04-08T19:24:35.000Z","dependencies_parsed_at":"2024-10-29T10:47:46.853Z","dependency_job_id":null,"html_url":"https://github.com/bitshifter/mathbench-rs","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bitshifter%2Fmathbench-rs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bitshifter%2Fmathbench-rs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bitshifter%2Fmathbench-rs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bitshifter%2Fmathbench-rs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bitsh
ifter","download_url":"https://codeload.github.com/bitshifter/mathbench-rs/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248665747,"owners_count":21142123,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T01:04:28.959Z","updated_at":"2025-04-13T05:07:17.304Z","avatar_url":"https://github.com/bitshifter.png","language":"Rust","readme":"# mathbench\n\n[![Build Status]][travis-ci]\n\n`mathbench` is a suite of unit tests and benchmarks comparing the output and\nperformance of a number of different Rust linear algebra libraries for common\ngame and graphics development tasks.\n\n`mathbench` is written by the author of [`glam`][glam] and has been used to\ncompare the performance of `glam` with other similar 3D math libraries targeting\ngames and graphics development, including:\n\n* [`cgmath`][cgmath]\n* [`euclid`][euclid]\n* [`nalgebra`][nalgebra]\n* [`pathfinder_geometry`][pathfinder_geometry]\n* [`static-math`][static-math]\n* [`ultraviolet`][ultraviolet]\n* [`vek`][vek]\n\n[Build Status]: https://travis-ci.org/bitshifter/mathbench-rs.svg?branch=master\n[travis-ci]: https://travis-ci.org/bitshifter/mathbench-rs\n[cgmath]: https://crates.io/crates/cgmath\n[euclid]: https://crates.io/crates/euclid\n[glam]: https://github.com/bitshifter/glam-rs\n[nalgebra]: https://nalgebra.org\n[pathfinder_geometry]: https://crates.io/crates/pathfinder_geometry\n[static-math]: https://crates.io/crates/static-math\n[ultraviolet]: https://crates.io/crates/ultraviolet\n[vek]: 
https://crates.io/crates/vek\n\n## The benchmarks\n\nAll benchmarks are performed using [Criterion.rs]. Benchmarks are logically divided into\nthe following categories:\n\n* return self - attempts to measure overhead of benchmarking each type.\n* single operations - measure the performance of single common operations on\n  types, e.g. a matrix inverse, vector normalization or multiplying two\n  matrices.\n* throughput operations - measure the performance of common operations on\n  batches of data. These measure operations that would commonly be processing\n  batches of input, for example transforming a number of vectors with the same\n  matrix.\n* workload operations - these attempt to recreate common workloads found in game\n  development to try and demonstrate performance on real world tasks.\n\nDespite best attempts, take the results of micro benchmarks with a pinch of\nsalt.\n\n[Criterion.rs]: https://bheisler.github.io/criterion.rs/book/index.html\n\n### Operation benchmarks\n\n* `matrix benches` - performs common matrix operations such as transpose,\n  inverse, determinant and multiply.\n* `rotation 3d benches` - perform common 3D rotation operations.\n* `transform 2d \u0026 3d benches` - bench special purpose 2D and 3D transform types.\n  These can be compared to 3x3 and 4x4 matrix benches to some extent.\n* `transformations benches` - performs affine transformations on vectors - uses\n  the best available type for the job, either matrix or transform types\n  depending on the library.\n* `vector benches` - perform common vector operations.\n\n### Workload benchmarks\n\n* `euler bench` - performs an Euler integration on arrays of 2D and 3D vectors\n\nThe benchmarks are currently focused on `f32` types as that is all `glam`\ncurrently supports.\n\n## Crate differences\n\nDifferent libraries have different features and different ways of achieving the\nsame goal. 
For the purpose of trying to get a performance comparison sometimes\n`mathbench` compares similar functionality, but sometimes it's not exactly the\nsame. Below is a list of differences between libraries that are notable for\nperformance comparisons.\n\n### Matrices versus transforms\n\nThe `euclid` library does not support generic square matrix types like the other\nlibraries tested. Rather it has 2D and 3D transform types which can transform 2D\nand 3D vector and point types. Each library has different types for supporting\ntransforms but `euclid` is unique amongst the libraries tested in that it\ndoesn't have generic square matrix types.\n\nThe `Transform2D` is stored as a 3x2 row major matrix that can be used to\ntransform 2D vectors and points.\n\nSimilarly `Transform3D` is used for transforming 3D vectors and points. This\nis represented as a 4x4 matrix so it is more directly comparable to the other\nlibraries; however, it doesn't support some operations like transpose.\n\nThere is no equivalent to a 2x2 matrix type in `euclid`.\n\n### Matrix inverse\n\nNote that `cgmath` and `nalgebra` matrix inverse methods return an `Option`\nwhereas `glam` and `euclid` do not. If a non-invertible matrix is inverted by\n`glam` or `euclid` the result will be invalid (it will contain NaNs).\n\n### Quaternions versus rotors\n\nMost libraries provide quaternions for performing rotations except for\n`ultraviolet` which provides rotors.\n\n## Wide benchmarks\n\nAll benchmarks are gated as either \"wide\" or \"scalar\". 
This division allows us\nto more fairly compare these different styles of libraries.\n\n\"scalar\" benchmarks operate on standard scalar `f32` values, doing calculations\non one piece of data at a time (or in the case of a \"horizontal\" SIMD library\nlike `glam`, one `Vec3`/`Vec4` at a time).\n\n\"wide\" benchmarks operate in a \"vertical\" AoSoA (Array-of-Struct-of-Arrays)\nfashion, which is a programming model that allows the potential to more fully\nuse the advantages of SIMD operations. However, it has the cost of making\nalgorithm design harder, as scalar algorithms cannot be directly used by \"wide\"\narchitectures. Because of this difference in algorithms, we also can't really\n*directly* compare the performance of \"scalar\" vs \"wide\" types because they\ndon't *quite* do the same thing (wide types operate on multiple pieces of data\nat the same time).\n\nThe \"wide\" benchmarks still include `glam`, a scalar-only library, as a\ncomparison. Even though the comparison is somewhat apples-to-oranges, in each of\nthese cases, when running \"wide\" benchmark variants, `glam` is configured to do\nthe exact same *amount* of final work, producing the same outputs that the\n\"wide\" versions would. The purpose is to give an idea of the possible throughput\nbenefits of \"wide\" types compared to writing the same algorithms with a scalar\ntype, at the cost of extra care being needed to write the algorithm.\n\nTo learn more about AoSoA architecture, see [this blog\npost](https://www.rustsim.org/blog/2020/03/23/simd-aosoa-in-nalgebra/) by the\nauthor of `nalgebra` which goes more in depth to how AoSoA works and its\npossible benefits. 
Also take a look at the [\"Examples\"\nsection](https://github.com/termhn/ultraviolet#examples) of `ultraviolet`'s\nREADME, which contains a discussion of how to port scalar algorithms to wide\nones, with the examples of the Euler integration and ray-sphere intersection\nbenchmarks from `mathbench`.\n\nNote that the `nalgebra_f32x4` and `nalgebra_f32x8` benchmarks require a Rust\n\nAdditionally the `f32x8` benchmarks will require the `AVX2` instruction set; to\nenable it you will need to build with `RUSTFLAGS='-C target-feature=+avx2'`.\n\n## Build settings\n\nThe default `profile.bench` settings are used; these are documented in the\n[cargo reference].\n\nSome math libraries are optimized to use specific instruction sets and may\nbenefit from building with settings different to the defaults. Typically a game team\nwill need to decide on a minimum specification that they will target. Deciding\non a minimum specification dictates the potential audience size for a project.\nThis is an important decision for any game and it will be different for every\nproject. `mathbench` doesn't want to make assumptions about what build settings\nany particular project may want to use, which is why default settings are used.\n\nI would encourage users who want to use build settings different to the defaults to\nrun the benchmarks themselves and consider publishing their results.\n\n[cargo reference]: https://doc.rust-lang.org/cargo/reference/profiles.html#bench\n\n## Benchmark results\n\nThe following is a table of benchmarks produced by `mathbench` comparing `glam`\nperformance to `cgmath`, `nalgebra`, `euclid`, `vek`, `pathfinder_geometry`,\n`static-math` and `ultraviolet` on `f32` data.\n\nThese benchmarks were performed on an [Intel i7-4710HQ] CPU on Linux. They were\ncompiled with the `1.56.1 (59eed8a2a 2021-11-01)` Rust compiler. 
Lower\n(better) numbers are highlighted within a 2.5% range of the minimum for each\nrow.\n\nThe versions of the libraries tested were:\n\n* `cgmath` - `0.18.0`\n* `euclid` - `0.22.6`\n* `glam` - `0.20.1`\n* `nalgebra` - `0.29.0`\n* `pathfinder_geometry` - `0.5.1`\n* `static-math` - `0.2.3`\n* `ultraviolet` - `0.8.1`\n* `vek` - `0.15.3` (`repr_c` types)\n\nSee the full [mathbench report] for more detailed results.\n\n### Scalar benchmarks\n\nRun with the command:\n\n```sh\ncargo bench --features scalar scalar\n```\n\n| benchmark                      |          glam   |        cgmath   |      nalgebra   |       euclid   |           vek   |    pathfinder   |   static-math   |   ultraviolet   |\n|--------------------------------|-----------------|-----------------|-----------------|----------------|-----------------|-----------------|-----------------|-----------------|\n| euler 2d x10000                |      16.23 us   |      16.13 us   |    __9.954 us__ |     16.18 us   |       16.2 us   |      10.42 us   |     __9.97 us__ |      16.17 us   |\n| euler 3d x10000                |    __15.95 us__ |      32.11 us   |      32.13 us   |     32.13 us   |      32.13 us   |    __16.27 us__ |      32.16 us   |      32.11 us   |\n| matrix2 determinant            |   __2.0386 ns__ |     2.0999 ns   |     2.1018 ns   |      N/A       |     2.0997 ns   |     2.0987 ns   |     2.0962 ns   |     2.1080 ns   |\n| matrix2 inverse                |   __2.8226 ns__ |     8.4418 ns   |     7.6303 ns   |      N/A       |       N/A       |     3.3459 ns   |     9.4636 ns   |     5.8796 ns   |\n| matrix2 mul matrix2            |   __2.6036 ns__ |     5.0007 ns   |     4.8172 ns   |      N/A       |     9.3814 ns   |   __2.5516 ns__ |     4.7274 ns   |     4.9428 ns   |\n| matrix2 mul vector2 x1         |     2.4904 ns   |     2.6144 ns   |     2.8714 ns   |      N/A       |     4.2139 ns   |   __2.0839 ns__ |     2.8873 ns   |     2.6250 ns   |\n| matrix2 mul vector2 x100       |   
227.5271 ns   |   243.3579 ns   |   265.1698 ns   |      N/A       |   400.6940 ns   | __219.7127 ns__ |   267.8780 ns   |   243.9880 ns   |\n| matrix2 return self            |   __2.4235 ns__ |     2.8841 ns   |     2.8756 ns   |      N/A       |     2.8754 ns   |   __2.4147 ns__ |     2.8717 ns   |     2.8697 ns   |\n| matrix2 transpose              |   __2.2887 ns__ |     3.0645 ns   |     7.9154 ns   |      N/A       |     2.9635 ns   |       N/A       |     3.0637 ns   |     3.0652 ns   |\n| matrix3 determinant            |     3.9129 ns   |   __3.8107 ns__ |   __3.8191 ns__ |      N/A       |   __3.8180 ns__ |       N/A       |   __3.8151 ns__ |     8.9368 ns   |\n| matrix3 inverse                |    17.5373 ns   |    18.6931 ns   |  __12.3183 ns__ |      N/A       |       N/A       |       N/A       |    12.8195 ns   |    21.9098 ns   |\n| matrix3 mul matrix3            |     9.9578 ns   |    13.3648 ns   |     7.8154 ns   |      N/A       |    35.5802 ns   |       N/A       |   __6.4938 ns__ |    10.0527 ns   |\n| matrix3 mul vector3 x1         |     4.8090 ns   |     4.9339 ns   |   __4.5046 ns__ |      N/A       |    12.5518 ns   |       N/A       |     4.8002 ns   |     4.8118 ns   |\n| matrix3 mul vector3 x100       |   __0.4836 us__ |   __0.4808 us__ |   __0.4755 us__ |      N/A       |      1.247 us   |       N/A       |   __0.4816 us__ |   __0.4755 us__ |\n| matrix3 return self            |   __5.4421 ns__ |   __5.4469 ns__ |   __5.4526 ns__ |      N/A       |   __5.4656 ns__ |       N/A       |   __5.4718 ns__ |   __5.4043 ns__ |\n| matrix3 transpose              |   __9.9567 ns__ |  __10.0794 ns__ |    10.9704 ns   |      N/A       |   __9.9257 ns__ |       N/A       |    10.7350 ns   |    10.5334 ns   |\n| matrix4 determinant            |   __6.2050 ns__ |    11.1041 ns   |    69.2549 ns   |   17.1809 ns   |    18.5233 ns   |       N/A       |    16.5331 ns   |     8.2704 ns   |\n| matrix4 inverse                |  __16.4386 ns__ |    47.0674 ns  
 |    71.8174 ns   |   64.1356 ns   |   284.3703 ns   |       N/A       |    52.6993 ns   |    41.1780 ns   |\n| matrix4 mul matrix4            |   __7.7715 ns__ |    26.7308 ns   |     8.6500 ns   |   10.4414 ns   |    86.1501 ns   |       N/A       |    21.7985 ns   |    26.8056 ns   |\n| matrix4 mul vector4 x1         |   __3.0303 ns__ |     7.7400 ns   |     3.4091 ns   |      N/A       |    21.0968 ns   |       N/A       |     6.2971 ns   |     6.2537 ns   |\n| matrix4 mul vector4 x100       |   __0.6136 us__ |     0.9676 us   |    __0.627 us__ |      N/A       |      2.167 us   |       N/A       |     0.7893 us   |     0.8013 us   |\n| matrix4 return self            |     7.1741 ns   |   __6.8838 ns__ |     7.5030 ns   |      N/A       |     7.0410 ns   |       N/A       |   __6.7768 ns__ |     6.9508 ns   |\n| matrix4 transpose              |   __6.6826 ns__ |    12.4966 ns   |    15.3265 ns   |      N/A       |    12.6386 ns   |       N/A       |    15.2657 ns   |    12.3396 ns   |\n| ray-sphere intersection x10000 |       56.2 us   |       55.7 us   |    __15.32 us__ |     55.45 us   |      56.02 us   |       N/A       |       N/A       |      50.94 us   |\n| rotation3 inverse              |   __2.3113 ns__ |     3.1752 ns   |     3.3292 ns   |    3.3311 ns   |     3.1808 ns   |       N/A       |     8.7109 ns   |     3.6535 ns   |\n| rotation3 mul rotation3        |   __3.6584 ns__ |     7.5255 ns   |     7.4808 ns   |    8.1393 ns   |    14.1636 ns   |       N/A       |     6.8044 ns   |     7.6386 ns   |\n| rotation3 mul vector3 x1       |   __6.4950 ns__ |     7.6808 ns   |     7.5784 ns   |    7.5746 ns   |    18.2547 ns   |       N/A       |     7.2727 ns   |     8.9732 ns   |\n| rotation3 mul vector3 x100     |   __0.6465 us__ |     0.7844 us   |     0.7573 us   |    0.7533 us   |      1.769 us   |       N/A       |     0.7317 us   |     0.9416 us   |\n| rotation3 return self          |   __2.4928 ns__ |     2.8740 ns   |     2.8687 ns   |      N/A  
     |     2.8724 ns   |       N/A       |     4.7868 ns   |     2.8722 ns   |\n| transform point2 x1            |     2.7854 ns   |     2.8878 ns   |     4.4207 ns   |    2.8667 ns   |    11.9427 ns   |   __2.3601 ns__ |       N/A       |     4.1770 ns   |\n| transform point2 x100          |     0.3316 us   |     0.3574 us   |     0.4445 us   |  __0.3008 us__ |      1.212 us   |     0.3184 us   |       N/A       |     0.4332 us   |\n| transform point3 x1            |   __2.9619 ns__ |    10.6812 ns   |     6.1037 ns   |    7.7051 ns   |    13.2607 ns   |     3.0934 ns   |       N/A       |     6.8419 ns   |\n| transform point3 x100          |   __0.6095 us__ |       1.27 us   |     0.8064 us   |    0.7674 us   |      1.446 us   |   __0.6189 us__ |       N/A       |     0.8899 us   |\n| transform vector2 x1           |   __2.4944 ns__ |       N/A       |     3.7174 ns   |    2.6273 ns   |    11.9424 ns   |       N/A       |       N/A       |     3.0458 ns   |\n| transform vector2 x100         |     0.3125 us   |       N/A       |     0.3871 us   |  __0.2817 us__ |      1.213 us   |       N/A       |       N/A       |     0.3649 us   |\n| transform vector3 x1           |   __2.8091 ns__ |     7.7343 ns   |     5.5064 ns   |    4.4810 ns   |    15.4097 ns   |       N/A       |       N/A       |     4.8819 ns   |\n| transform vector3 x100         |   __0.6035 us__ |     0.9439 us   |     0.7573 us   |    0.6327 us   |       1.63 us   |       N/A       |       N/A       |     0.6703 us   |\n| transform2 inverse             |   __9.0256 ns__ |       N/A       |    12.2614 ns   |    9.4803 ns   |       N/A       |   __8.9047 ns__ |       N/A       |       N/A       |\n| transform2 mul transform2      |     4.5111 ns   |       N/A       |     8.1434 ns   |    5.8677 ns   |       N/A       |   __3.8513 ns__ |       N/A       |       N/A       |\n| transform2 return self         |   __4.1707 ns__ |       N/A       |     5.4356 ns   |    4.2775 ns   |       N/A       |   
__4.1117 ns__ |       N/A       |       N/A       |\n| transform3 inverse             |  __10.9869 ns__ |       N/A       |    71.4437 ns   |   56.0136 ns   |       N/A       |    23.0392 ns   |       N/A       |       N/A       |\n| transform3 mul transform3d     |   __6.5903 ns__ |       N/A       |     8.5673 ns   |   10.1802 ns   |       N/A       |     7.6587 ns   |       N/A       |       N/A       |\n| transform3 return self         |   __7.1828 ns__ |       N/A       |   __7.2619 ns__ |  __7.2407 ns__ |       N/A       |   __7.3214 ns__ |       N/A       |       N/A       |\n| vector3 cross                  |   __2.4257 ns__ |     3.6842 ns   |     3.7945 ns   |    3.6821 ns   |     3.8323 ns   |       N/A       |     3.8622 ns   |     3.6927 ns   |\n| vector3 dot                    |   __2.1055 ns__ |     2.3179 ns   |     2.3174 ns   |    2.3190 ns   |     2.3195 ns   |       N/A       |     2.3204 ns   |     2.3160 ns   |\n| vector3 length                 |   __2.5020 ns__ |   __2.5002 ns__ |     2.5986 ns   |  __2.5013 ns__ |   __2.5021 ns__ |       N/A       |   __2.5036 ns__ |   __2.5017 ns__ |\n| vector3 normalize              |   __4.0454 ns__ |     5.8411 ns   |     8.4069 ns   |    8.0679 ns   |     8.8137 ns   |       N/A       |       N/A       |     5.8440 ns   |\n| vector3 return self            |   __2.4087 ns__ |     3.1021 ns   |     3.1061 ns   |      N/A       |     3.1052 ns   |       N/A       |     3.1136 ns   |     3.1071 ns   |\n\n### Wide benchmarks\n\nThese benchmarks were performed on an [Intel i7-4710HQ] CPU on Linux. They were\ncompiled with the `1.59.0-nightly (207c80f10 2021-11-30)` Rust compiler. 
Lower\n(better) numbers are highlighted within a 2.5% range of the minimum for each\nrow.\n\nThe versions of the libraries tested were:\n\n* `glam` - `0.20.1`\n* `nalgebra` - `0.29.0`\n* `ultraviolet` - `0.8.1`\n\nRun with the command:\n\n```sh\nRUSTFLAGS='-C target-feature=+avx2' cargo +nightly bench --features wide wide\n```\n\n| benchmark                      |    glam_f32x1   |   ultraviolet_f32x4   |   nalgebra_f32x4   |   ultraviolet_f32x8   |   nalgebra_f32x8   |\n|--------------------------------|-----------------|-----------------------|--------------------|-----------------------|--------------------|\n| euler 2d x80000                |      142.7 us   |          __63.47 us__ |       __63.94 us__ |            69.27 us   |         69.25 us   |\n| euler 3d x80000                |      141.2 us   |          __97.18 us__ |       __95.78 us__ |            103.7 us   |         105.7 us   |\n| matrix2 determinant x16        |    18.6849 ns   |          11.4259 ns   |          N/A       |         __9.9982 ns__ |          N/A       |\n| matrix2 inverse x16            |    39.1219 ns   |          29.8933 ns   |          N/A       |        __22.8757 ns__ |          N/A       |\n| matrix2 mul matrix2 x16        |    42.7342 ns   |          36.4879 ns   |          N/A       |        __33.4814 ns__ |          N/A       |\n| matrix2 mul matrix2 x256       |   959.1663 ns   |         935.4148 ns   |          N/A       |       __862.0910 ns__ |          N/A       |\n| matrix2 mul vector2 x16        |    41.2464 ns   |          18.2382 ns   |          N/A       |        __17.2550 ns__ |          N/A       |\n| matrix2 mul vector2 x256       |   698.1177 ns   |       __544.5315 ns__ |          N/A       |       __540.9743 ns__ |          N/A       |\n| matrix2 return self x16        |    32.7553 ns   |          29.5064 ns   |          N/A       |        __21.4492 ns__ |          N/A       |\n| matrix2 transpose x16          |    32.3247 ns   |          46.4836 ns   |        
  N/A       |        __20.0852 ns__ |          N/A       |\n| matrix3 determinant x16        |    53.2366 ns   |          25.0158 ns   |          N/A       |        __22.1503 ns__ |          N/A       |\n| matrix3 inverse x16            |   275.9330 ns   |          78.3532 ns   |          N/A       |        __69.2627 ns__ |          N/A       |\n| matrix3 mul matrix3 x16        |   239.6124 ns   |       __115.2934 ns__ |          N/A       |       __116.6237 ns__ |          N/A       |\n| matrix3 mul matrix3 x256       |       3.26 us   |          __1.959 us__ |          N/A       |          __1.963 us__ |          N/A       |\n| matrix3 mul vector3 x16        |    78.4972 ns   |        __40.4734 ns__ |          N/A       |          47.0164 ns   |          N/A       |\n| matrix3 mul vector3 x256       |      1.293 us   |            __1.0 us__ |          N/A       |          __1.007 us__ |          N/A       |\n| matrix3 return self x16        |   112.4312 ns   |          78.4870 ns   |          N/A       |        __67.3272 ns__ |          N/A       |\n| matrix3 transpose x16          |   116.9654 ns   |         100.1097 ns   |          N/A       |        __67.4544 ns__ |          N/A       |\n| matrix4 determinant x16        |    98.8388 ns   |        __56.1177 ns__ |          N/A       |        __55.7623 ns__ |          N/A       |\n| matrix4 inverse x16            |   276.2637 ns   |         191.7471 ns   |          N/A       |       __163.8408 ns__ |          N/A       |\n| matrix4 mul matrix4 x16        |   230.9916 ns   |       __222.3948 ns__ |          N/A       |       __221.8563 ns__ |          N/A       |\n| matrix4 mul matrix4 x256       |      3.793 us   |          __3.545 us__ |          N/A       |             3.67 us   |          N/A       |\n| matrix4 mul vector4 x16        |    92.9485 ns   |        __87.7341 ns__ |          N/A       |          90.4404 ns   |          N/A       |\n| matrix4 mul vector4 x256       |     __1.58 us__ |          
__1.542 us__ |          N/A       |            1.596 us   |          N/A       |\n| matrix4 return self x16        |   175.6153 ns   |       __158.7861 ns__ |          N/A       |         167.6639 ns   |          N/A       |\n| matrix4 transpose x16          |   184.0498 ns   |         193.5497 ns   |          N/A       |       __147.1365 ns__ |          N/A       |\n| ray-sphere intersection x80000 |      567.9 us   |            154.8 us   |          N/A       |          __61.49 us__ |          N/A       |\n| rotation3 inverse x16          |    32.7517 ns   |          32.8107 ns   |          N/A       |        __22.3662 ns__ |          N/A       |\n| rotation3 mul rotation3 x16    |    58.9408 ns   |          38.6848 ns   |          N/A       |        __34.3223 ns__ |          N/A       |\n| rotation3 mul vector3 x16      |   130.6707 ns   |          36.7861 ns   |          N/A       |        __26.1154 ns__ |          N/A       |\n| rotation3 return self x16      |    32.4345 ns   |          32.5213 ns   |          N/A       |        __21.8325 ns__ |          N/A       |\n| transform point2 x16           |    52.6534 ns   |        __31.4527 ns__ |          N/A       |          32.7317 ns   |          N/A       |\n| transform point2 x256          |   888.5654 ns   |       __831.9341 ns__ |          N/A       |       __848.0397 ns__ |          N/A       |\n| transform point3 x16           |    96.9017 ns   |        __81.6828 ns__ |          N/A       |        __82.8904 ns__ |          N/A       |\n| transform point3 x256          |      1.567 us   |          __1.398 us__ |          N/A       |           __1.43 us__ |          N/A       |\n| transform vector2 x16          |    43.7679 ns   |        __29.9349 ns__ |          N/A       |          31.8630 ns   |          N/A       |\n| transform vector2 x256         |   858.5660 ns   |       __825.0261 ns__ |          N/A       |         851.7501 ns   |          N/A       |\n| transform vector3 x16          |    96.5535 
ns   |        __80.1612 ns__ |          N/A       |          85.0659 ns   |          N/A       |\n| transform vector3 x256         |      1.557 us   |          __1.394 us__ |          N/A       |            1.438 us   |          N/A       |\n| vector3 cross x16              |    42.1941 ns   |          26.6677 ns   |          N/A       |        __22.0924 ns__ |          N/A       |\n| vector3 dot x16                |    29.1805 ns   |          12.7972 ns   |          N/A       |        __12.2872 ns__ |          N/A       |\n| vector3 length x16             |    32.6014 ns   |           9.7692 ns   |          N/A       |         __9.4271 ns__ |          N/A       |\n| vector3 normalize x16          |    65.8815 ns   |          24.1661 ns   |          N/A       |        __20.3579 ns__ |          N/A       |\n| vector3 return self x16        |    32.0051 ns   |          42.9462 ns   |          N/A       |        __16.7808 ns__ |          N/A       |\n\n[Intel i7-4710HQ]: https://ark.intel.com/content/www/us/en/ark/products/78930/intel-core-i7-4710hq-processor-6m-cache-up-to-3-50-ghz.html\n[mathbench report]: https://bitshifter.github.io/mathbench/0.4.1/report/index.html\n\n## Running the benchmarks\n\nThe benchmarks use the criterion crate which works on stable Rust; they can be\nrun with:\n\n```sh\ncargo bench\n```\n\nFor the best results close other applications on the machine you are using to\nbenchmark!\n\nWhen running \"wide\" benchmarks, be sure you compile with the appropriate\n`target-feature`s enabled, e.g. `+avx2`, for best results.\n\nThere is a script in `scripts/summary.py` to summarize the results in a nice\nfashion. It requires Python 3 and the `prettytable` Python module, then can\nbe run to generate an ASCII output.\n\n## Default and optional features\n\nAll libraries except for `glam` are optional for running benchmarks. The default\nfeatures include `cgmath`, `ultraviolet` and `nalgebra`. 
These can be disabled\nwith:\n\n```sh\ncargo bench --no-default-features\n```\n\nTo selectively enable a specific default feature again use:\n\n```sh\ncargo bench --no-default-features --features nalgebra\n```\n\nNote that you can filter which benchmarks to run at runtime by using\nCriterion's filtering feature. For example, to only run scalar benchmarks\nand not wide ones, use:\n\n```sh\ncargo bench \"scalar\"\n```\n\nYou can also get more granular. For example to only run wide matrix2 benchmarks,\nuse:\n\n```sh\ncargo bench --features wide \"wide matrix2\"\n```\n\nor to only run the scalar \"vec3 length\" benchmark for `glam`, use:\n\n```sh\ncargo bench \"scalar vec3 length/glam\"\n```\n\n### Crate features\n\nThere are a few extra features in addition to the direct features referring to\neach benchmarked library.\n\n* `ultraviolet_f32x4`, `ultraviolet_f32x8`, `nalgebra_f32x4`,\n  `nalgebra_f32x8` - these each enable benchmarking specific wide types from\n  each of `ultraviolet` or `nalgebra`.\n* `ultraviolet_wide`, `nalgebra_wide` - these enable benchmarking all wide\n  types from `ultraviolet` or `nalgebra` respectively.\n* `wide` - enables all \"wide\" type benchmarks\n* `all` - enables all supported libraries, including wide and scalar ones.\n* `unstable` - see next section\n\n#### `unstable` feature\n\nThe `unstable` feature requires a nightly compiler, and it allows us to tell\nrustc not to inline certain functions within hot benchmark loops. 
This is used\nin the ray-sphere intersection benchmark in order to simulate situations where\nthe autovectorizer would not be able to properly vectorize your code.\n\n## Running the tests\n\nThe tests can be run using:\n\n```sh\ncargo test\n```\n\n## Publishing results\n\nWhen publishing benchmark results it is important to document the details of how\nthe benchmarks were run, including:\n\n* The version of `mathbench` used\n* The versions of all libraries benched\n* The Rust version\n* The build settings used, especially when they differ from the defaults\n* The specification of the hardware that was used\n* The output of `scripts/summary.py`\n* The full Criterion output from `target/criterion`\n\n## Adding a new library\n\nThere are different steps involved for adding unit tests and benchmarks for a\nnew library.\n\nBenchmarks require an implementation of the `mathbench::RandomVec` trait for the\ntypes you want to benchmark. If the type implements the `rand` crate\n`distribution::Distribution` trait for `Standard` then you can simply use the\n`impl_random_vec!` macro in `src/lib.rs`. Otherwise you can provide a function\nthat generates a new random value of your type and pass that to `impl_random_vec!`.\n\nTo add the new library type to a benchmark, add another `bench_function` call to\nthe `Criterion` `BenchmarkGroup`.\n\nIncrement the patch version number of `mathbench` in the `Cargo.toml`.\n\nUpdate `CHANGELOG.md`.\n\n## Build times\n\n`mathbench` also includes a tool for comparing full build times in\n`tools/buildbench`. Incremental build times are not measured as it would be\nnon-trivial to create a meaningful test across different math crates.\n\nThe `buildbench` tool uses the `-Z timings` feature of the nightly build of\n`cargo`, thus you need a nightly build to run it.\n\n`buildbench` generates a `Cargo.toml` and empty `src/lib.rs` in a temporary\ndirectory for each library, recording some build time information which is\nincluded in the summary table below. 
The temporary directory is created every\ntime the tool is run so this is a full build from a clean state.\n\nEach library is only built once so you may wish to run `buildbench` multiple\ntimes to ensure results are consistent.\n\nBy default crates are built using the `release` profile with default features\nenabled. There are options for building the `dev` profile or without default\nfeatures; see `buildbench --help` for more information.\n\nThe output columns include the total build time, the self build time which is\nthe time it took to build the crate on its own excluding dependencies, and the\nnumber of units which is the number of dependencies (this will be 2 at minimum).\n\nWhen comparing build times keep in mind that each library has different feature\nsets and that naturally larger libraries will take longer to build. For many\ncrates tested the dependencies take longer than the math crate. Also keep in\nmind if you are already building one of the dependencies in your project you\nwon't pay the build cost twice (unless it's a different version).\n\n| crate               | version | total (s) | self (s) | units |\n|:--------------------|:--------|----------:|---------:|------:|\n| cgmath              | 0.17.0  |       6.8 |      3.0 |    17 |\n| euclid              | 0.22.1  |       3.4 |      1.0 |     4 |\n| glam                | 0.9.4   |       1.1 |      0.6 |     2 |\n| nalgebra            | 0.22.0  |      24.2 |     18.0 |    24 |\n| pathfinder_geometry | 0.5.1   |       3.0 |      0.3 |     8 |\n| static-math         | 0.1.6   |       6.9 |      1.7 |    10 |\n| ultraviolet         | 0.5.1   |       2.5 |      1.3 |     4 |\n| vek                 | 0.12.0  |      34.4 |     10.1 |    16 |\n\nThese benchmarks were performed on an [Intel i7-4710HQ] CPU with 16GB RAM and a\nToshiba MQ01ABD100 HDD (SATA 3Gbps 5400RPM) on Linux.\n\n## License\n\nLicensed under either of\n\n* Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE)\n  or 
http://www.apache.org/licenses/LICENSE-2.0)\n* MIT license ([LICENSE-MIT](LICENSE-MIT)\n  or http://opensource.org/licenses/MIT)\n\nat your option.\n\n## Contribution\n\nContributions in any form (issues, pull requests, etc.) to this project must\nadhere to Rust's [Code of Conduct].\n\nUnless you explicitly state otherwise, any contribution intentionally submitted\nfor inclusion in the work by you, as defined in the Apache-2.0 license, shall be\ndual licensed as above, without any additional terms or conditions.\n\n[Code of Conduct]: https://www.rust-lang.org/en-US/conduct.html\n\n## Support\n\nIf you are interested in contributing or have a request or suggestion\n[create an issue] on github.\n\n[create an issue]: https://github.com/bitshifter/mathbench-rs/issues\n","funding_links":[],"categories":["Rust","Scientific Computation"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbitshifter%2Fmathbench-rs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbitshifter%2Fmathbench-rs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbitshifter%2Fmathbench-rs/lists"}