https://github.com/bitshifter/mathbench-rs

Comparing performance of Rust math libraries for common 3D game and graphics tasks
https://github.com/bitshifter/mathbench-rs
Last synced: 11 months ago
JSON representation
Comparing performance of Rust math libraries for common 3D game and graphics tasks
Host: GitHub
URL: https://github.com/bitshifter/mathbench-rs
Owner: bitshifter
License: other
Created: 2019-03-03T11:16:57.000Z (about 7 years ago)
Default Branch: master
Last Pushed: 2024-10-29T09:21:54.000Z (over 1 year ago)
Last Synced: 2025-04-13T05:07:10.853Z (11 months ago)
Language: Rust
Size: 286 KB
Stars: 208
Watchers: 4
Forks: 15
Open Issues: 7
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE-APACHE
Awesome Lists containing this project

awesome-rust-list - mathbench - rs?style=social"/> : mathbench is a suite of unit tests and benchmarks comparing the output and performance of a number of different Rust linear algebra libraries for common game and graphics development tasks. (Scientific Computation)
AwesomeCppGameDev - mathbench-rs
README

          # mathbench

[![Build Status]][travis-ci]

`mathbench` is a suite of unit tests and benchmarks comparing the output and

performance of a number of different Rust linear algebra libraries for common

game and graphics development tasks.

`mathbench` is written by the author of [`glam`][glam] and has been used to

compare the performance of `glam` with other similar 3D math libraries targeting

games and graphics development, including:

* [`cgmath`][cgmath]

* [`euclid`][euclid]

* [`nalgebra`][nalgebra]

* [`pathfinder_geometry`][pathfinder_geometry]

* [`static-math`][static-math]

* [`ultraviolet`][ultraviolet]

* [`vek`][vek]

[Build Status]: https://travis-ci.org/bitshifter/mathbench-rs.svg?branch=master

[travis-ci]: https://travis-ci.org/bitshifter/mathbench-rs

[cgmath]: https://crates.io/crates/cgmath

[euclid]: https://crates.io/crates/euclid

[glam]: https://github.com/bitshifter/glam-rs

[nalgebra]: https://nalgebra.org

[pathfinder_geometry]: https://crates.io/crates/pathfinder_geometry

[static-math]: https://crates.io/crates/static-math

[ultraviolet]: https://crates.io/crates/ultraviolet

[vek]: https://crates.io/crates/vek

## The benchmarks

All benchmarks are performed using [Criterion.rs]. Benchmarks are logically into

the following categories:

* return self - attempts to measure overhead of benchmarking each type.

* single operations - measure the performance of single common operations on

  types, e.g. a matrix inverse, vector normalization or multiplying two

  matrices.

* throughput operations - measure the performance of common operations on

  batches of data. These measure operations that would commonly be processing

  batches of input, for example transforming a number of vectors with the same

  matrix.

* workload operations - these attempt to recreate common workloads found in game

  development to try and demonstrate performance on real world tasks.

Despite best attempts, take the results of micro benchmarks with a pinch of

salt.

[Criterion.rs]: https://bheisler.github.io/criterion.rs/book/index.html

### Operation benchmarks

* `matrix benches` - performs common matrix operations such as transpose,

  inverse, determinant and multiply.

* `rotation 3d benches` - perform common 3D rotation operations.

* `transform 2d & 3d benches` - bench special purpose 2D and 3D transform types.

  These can be compared to 3x3 and 4x4 matrix benches to some extent.

* `transformations benches` - performs affine transformations on vectors - uses

  the best available type for the job, either matrix or transform types

  depending on the library.

* `vector benches` - perform common vector operations.

### Workload benchmarks

* `euler bench` - performs an Euler integration on arrays of 2D and 3D vectors

The benchmarks are currently focused on `f32` types as that is all `glam`

currently supports.

## Crate differences

Different libraries have different features and different ways of achieving the

same goal. For the purpose of trying to get a performance comparison sometimes

`mathbench` compares similar functionality, but sometimes it's not exactly the

same. Below is a list of differences between libraries that are notable for

performance comparisons.

### Matrices versus transforms

The `euclid` library does not support generic square matrix types like the other

libraries tested. Rather it has 2D and 3D transform types which can transform 2D

and 3D vector and point types. Each library has different types for supporting

transforms but `euclid` is unique amongst the libraries tested in that is

doesn't have generic square matrix types.

The `Transform2D` is stored as a 3x2 row major matrix that can be used to

transform 2D vectors and points.

Similarly `Transform3D` is used for transforming 3D vectors and points. This

is represented as a 4x4 matrix so it is more directly comparable to the other

libraries however it doesn't support some operations like transpose.

There is no equivalent to a 2x2 matrix type in `euclid`.

### Matrix inverse

Note that `cgmath` and `nalgebra` matrix inverse methods return an `Option`

whereas `glam` and `euclid` do not. If a non-invertible matrix is inverted by

`glam` or `euclid` the result will be invalid (it will contain NaNs).

### Quaternions versus rotors

Most libraries provide quaternions for performing rotations except for

`ultraviolet` which provides rotors.

## Wide benchmarks

All benchmarks are gated as either "wide" or "scalar". This division allows us

to more fairly compare these different styles of libraries.

"scalar" benchmarks operate on standard scalar `f32` values, doing calculations

on one piece of data at a time (or in the case of a "horizontal" SIMD library

like `glam`, one `Vec3`/`Vec4` at a time).

"wide" benchmarks operate in a "vertical" AoSoA (Array-of-Struct-of-Arrays)

fashion, which is a programming model that allows the potential to more fully

use the advantages of SIMD operations. However, it has the cost of making

algorithm design harder, as scalar algorithms cannot be directly used by "wide"

architectures. Because of this difference in algorithms, we also can't really

*directly* compare the performance of "scalar" vs "wide" types because they

don't *quite* do the same thing (wide types operate on multiple pieces of data

at the same time).

The "wide" benchmarks still include `glam`, a scalar-only library, as a

comparison. Even though the comparison is somewhat apples-to-oranges, in each of

these cases, when running "wide" benchmark variants, `glam` is configured to do

the exact same *amount* of final work, producing the same outputs that the

"wide" versions would. The purpose is to give an idea of the possible throughput

benefits of "wide" types compared to writing the same algorithms with a scalar

type, at the cost of extra care being needed to write the algorithm.

To learn more about AoSoA architecture, see [this blog

post](https://www.rustsim.org/blog/2020/03/23/simd-aosoa-in-nalgebra/) by the

author of `nalgebra` which goes more in depth to how AoSoA works and its

possible benefits. Also take a look at the ["Examples"

section](https://github.com/termhn/ultraviolet#examples) of `ultraviolet`'s

README, which contains a discussion of how to port scalar algorithms to wide

ones, with the examples of the Euler integration and ray-sphere intersection

benchmarks from `mathbench`.

Note that the `nalgebra_f32x4` and `nalgebra_f32x8` benchmarks require a Rust

Additionally the `f32x8` benchmarks will require the `AVX2` instruction set, to

enable that you will need to build with `RUSTFLAGS='-C target-feature=+avx2`.

## Build settings

The default `profile.bench` settings are used, these are documented in the

[cargo reference].

Some math libraries are optimized to use specific instruction sets and may

benefit building with settings different to the defaults. Typically a game team

will need to decided on a minimum specification that they will target. Deciding

on a minimum specifiction dictates the potential audience size for a project.

This is an important decision for any game and it will be different for every

project. `mathbench` doesn't want to make assumptions about what build settings

any particular project may want to use which is why default settings are used.

I would encourage users who to use build settigs different to the defaults to

run the benchmarks themselves and consider publishing their results.

[cargo reference]: https://doc.rust-lang.org/cargo/reference/profiles.html#bench

## Benchmark results

The following is a table of benchmarks produced by `mathbench` comparing `glam`

performance to `cgmath`, `nalgebra`, `euclid`, `vek`, `pathfinder_geometry`,

`static-math` and `ultraviolet` on `f32` data.

These benchmarks were performed on an [Intel i7-4710HQ] CPU on Linux. They were

compiled with the `1.56.1 (59eed8a2a 2021-11-01)` Rust compiler. Lower

(better) numbers are highlighted within a 2.5% range of the minimum for each

row.

The versions of the libraries tested were:

* `cgmath` - `0.18.0`

* `euclid` - `0.22.6`

* `glam` - `0.20.1`

* `nalgebra` - `0.29.0`

* `pathfinder_geometry` - `0.5.1`

* `static-math` - `0.2.3`

* `ultraviolet` - `0.8.1`

* `vek` - `0.15.3` (`repr_c` types)

See the full [mathbench report] for more detailed results.

### Scalar benchmarks

Run with the command:

```sh

cargo bench --features scalar scalar

```

| benchmark 
|-------------------- 
| euler 2d x10000 
| euler 3d x10000 
| matrix2 determinant 
| matrix2 inverse 
| matrix2 mul matrix2 
| matrix2 mul vector2 x1 
| matrix2 mul vector2 x100 
| matrix2 return self 
| matrix2 transpose 
| matrix3 determinant 
| matrix3 inverse 
| matrix3 mul matrix3 
| matrix3 mul vector3 x1 
| matrix3 mul vector3 x100 
| matrix3 return self 
| matrix3 transpose 
| matrix4 determinant 
| matrix4 inverse 
| matrix4 mul matrix4 
| matrix4 mul vector4 x1 
| matrix4 mul vector4 x100 
| matrix4 return self 
| matrix4 transpose 
| ray-sphere 
| rotation3 inverse 
| rotation3 mul rotation3 
| rotation3 mul vector3 x1 
| rotation3 mul vector3 x100 
| rotation3 return self 
| transform point2 x1 
| transform point2 x100 
| transform point3 x1 
| transform point3 x100 
| transform vector2 x1 
| transform vector2 x100 
| transform vector3 x1 
| transform vector3 x100 
| transform2 inverse 
| transform2 mul transform2 
| transform2 return self 
| transform3 inverse 
| transform3 mul transform3d 
| transform3 return self 
| vector3 cross 
| vector3 dot 
| vector3 length 
| vector3 normalize 
| vector3 return self

|          glam   |        cgmath   |      nalgebra   |       euclid   |           vek   |    pathfinder   |   static-math   |   ultraviolet   | ------------|-----------------|-----------------|-----------------|----------------|-----------------|-----------------|-----------------|-----------------| |      16.23 us   |      16.13 us   |    __9.954 us__ |     16.18 us   |       16.2 us   |      10.42 us   |     __9.97 us__ |      16.17 us   | |    __15.95 us__ |      32.11 us   |      32.13 us   |     32.13 us   |      32.13 us   |    __16.27 us__ |      32.16 us   |      32.11 us   | |   __2.0386 ns__ |     2.0999 ns   |     2.1018 ns   |      N/A       |     2.0997 ns   |     2.0987 ns   |     2.0962 ns   |     2.1080 ns   | |   __2.8226 ns__ |     8.4418 ns   |     7.6303 ns   |      N/A       |       N/A       |     3.3459 ns   |     9.4636 ns   |     5.8796 ns   | |   __2.6036 ns__ |     5.0007 ns   |     4.8172 ns   |      N/A       |     9.3814 ns   |   __2.5516 ns__ |     4.7274 ns   |     4.9428 ns   | |     2.4904 ns   |     2.6144 ns   |     2.8714 ns   |      N/A       |     4.2139 ns   |   __2.0839 ns__ |     2.8873 ns   |     2.6250 ns   | |   227.5271 ns   |   243.3579 ns   |   265.1698 ns   |      N/A       |   400.6940 ns   | __219.7127 ns__ |   267.8780 ns   |   243.9880 ns   | |   __2.4235 ns__ |     2.8841 ns   |     2.8756 ns   |      N/A       |     2.8754 ns   |   __2.4147 ns__ |     2.8717 ns   |     2.8697 ns   | |   __2.2887 ns__ |     3.0645 ns   |     7.9154 ns   |      N/A       |     2.9635 ns   |       N/A       |     3.0637 ns   |     3.0652 ns   | |     3.9129 ns   |   __3.8107 ns__ |   __3.8191 ns__ |      N/A       |   __3.8180 ns__ |       N/A       |   __3.8151 ns__ |     8.9368 ns   | |    17.5373 ns   |    18.6931 ns   |  __12.3183 ns__ |      N/A       |       N/A       |       N/A       |    12.8195 ns   |    21.9098 ns   | |     9.9578 ns   |    13.3648 ns   |     7.8154 ns   |      N/A       |    35.5802 ns   |       N/A       |   __6.4938 ns__ |    10.0527 ns   | |     4.8090 ns   |     4.9339 ns   |   __4.5046 ns__ |      N/A       |    12.5518 ns   |       N/A       |     4.8002 ns   |     4.8118 ns   | |   __0.4836 us__ |   __0.4808 us__ |   __0.4755 us__ |      N/A       |      1.247 us   |       N/A       |   __0.4816 us__ |   __0.4755 us__ | |   __5.4421 ns__ |   __5.4469 ns__ |   __5.4526 ns__ |      N/A       |   __5.4656 ns__ |       N/A       |   __5.4718 ns__ |   __5.4043 ns__ | |   __9.9567 ns__ |  __10.0794 ns__ |    10.9704 ns   |      N/A       |   __9.9257 ns__ |       N/A       |    10.7350 ns   |    10.5334 ns   | |   __6.2050 ns__ |    11.1041 ns   |    69.2549 ns   |   17.1809 ns   |    18.5233 ns   |       N/A       |    16.5331 ns   |     8.2704 ns   | |  __16.4386 ns__ |    47.0674 ns   |    71.8174 ns   |   64.1356 ns   |   284.3703 ns   |       N/A       |    52.6993 ns   |    41.1780 ns   | |   __7.7715 ns__ |    26.7308 ns   |     8.6500 ns   |   10.4414 ns   |    86.1501 ns   |       N/A       |    21.7985 ns   |    26.8056 ns   | |   __3.0303 ns__ |     7.7400 ns   |     3.4091 ns   |      N/A       |    21.0968 ns   |       N/A       |     6.2971 ns   |     6.2537 ns   | |   __0.6136 us__ |     0.9676 us   |    __0.627 us__ |      N/A       |      2.167 us   |       N/A       |     0.7893 us   |     0.8013 us   | |     7.1741 ns   |   __6.8838 ns__ |     7.5030 ns   |      N/A       |     7.0410 ns   |       N/A       |   __6.7768 ns__ |     6.9508 ns   | |   __6.6826 ns__ |    12.4966 ns   |    15.3265 ns   |      N/A       |    12.6386 ns   |       N/A       |    15.2657 ns   |    12.3396 ns   | intersection x10000 |       56.2 us   |       55.7 us   |    __15.32 us__ |     55.45 us   |      56.02 us   |       N/A       |       N/A       |      50.94 us   | |   __2.3113 ns__ |     3.1752 ns   |     3.3292 ns   |    3.3311 ns   |     3.1808 ns   |       N/A       |     8.7109 ns   |     3.6535 ns   | |   __3.6584 ns__ |     7.5255 ns   |     7.4808 ns   |    8.1393 ns   |    14.1636 ns   |       N/A       |     6.8044 ns   |     7.6386 ns   | |   __6.4950 ns__ |     7.6808 ns   |     7.5784 ns   |    7.5746 ns   |    18.2547 ns   |       N/A       |     7.2727 ns   |     8.9732 ns   | |   __0.6465 us__ |     0.7844 us   |     0.7573 us   |    0.7533 us   |      1.769 us   |       N/A       |     0.7317 us   |     0.9416 us   | |   __2.4928 ns__ |     2.8740 ns   |     2.8687 ns   |      N/A       |     2.8724 ns   |       N/A       |     4.7868 ns   |     2.8722 ns   | |     2.7854 ns   |     2.8878 ns   |     4.4207 ns   |    2.8667 ns   |    11.9427 ns   |   __2.3601 ns__ |       N/A       |     4.1770 ns   | |     0.3316 us   |     0.3574 us   |     0.4445 us   |  __0.3008 us__ |      1.212 us   |     0.3184 us   |       N/A       |     0.4332 us   | |   __2.9619 ns__ |    10.6812 ns   |     6.1037 ns   |    7.7051 ns   |    13.2607 ns   |     3.0934 ns   |       N/A       |     6.8419 ns   | |   __0.6095 us__ |       1.27 us   |     0.8064 us   |    0.7674 us   |      1.446 us   |   __0.6189 us__ |       N/A       |     0.8899 us   | |   __2.4944 ns__ |       N/A       |     3.7174 ns   |    2.6273 ns   |    11.9424 ns   |       N/A       |       N/A       |     3.0458 ns   | |     0.3125 us   |       N/A       |     0.3871 us   |  __0.2817 us__ |      1.213 us   |       N/A       |       N/A       |     0.3649 us   | |   __2.8091 ns__ |     7.7343 ns   |     5.5064 ns   |    4.4810 ns   |    15.4097 ns   |       N/A       |       N/A       |     4.8819 ns   | |   __0.6035 us__ |     0.9439 us   |     0.7573 us   |    0.6327 us   |       1.63 us   |       N/A       |       N/A       |     0.6703 us   | |   __9.0256 ns__ |       N/A       |    12.2614 ns   |    9.4803 ns   |       N/A       |   __8.9047 ns__ |       N/A       |       N/A       | |     4.5111 ns   |       N/A       |     8.1434 ns   |    5.8677 ns   |       N/A       |   __3.8513 ns__ |       N/A       |       N/A       | |   __4.1707 ns__ |       N/A       |     5.4356 ns   |    4.2775 ns   |       N/A       |   __4.1117 ns__ |       N/A       |       N/A       | |  __10.9869 ns__ |       N/A       |    71.4437 ns   |   56.0136 ns   |       N/A       |    23.0392 ns   |       N/A       |       N/A       | |   __6.5903 ns__ |       N/A       |     8.5673 ns   |   10.1802 ns   |       N/A       |     7.6587 ns   |       N/A       |       N/A       | |   __7.1828 ns__ |       N/A       |   __7.2619 ns__ |  __7.2407 ns__ |       N/A       |   __7.3214 ns__ |       N/A       |       N/A       | |   __2.4257 ns__ |     3.6842 ns   |     3.7945 ns   |    3.6821 ns   |     3.8323 ns   |       N/A       |     3.8622 ns   |     3.6927 ns   | |   __2.1055 ns__ |     2.3179 ns   |     2.3174 ns   |    2.3190 ns   |     2.3195 ns   |       N/A       |     2.3204 ns   |     2.3160 ns   | |   __2.5020 ns__ |   __2.5002 ns__ |     2.5986 ns   |  __2.5013 ns__ |   __2.5021 ns__ |       N/A       |   __2.5036 ns__ |   __2.5017 ns__ | |   __4.0454 ns__ |     5.8411 ns   |     8.4069 ns   |    8.0679 ns   |     8.8137 ns   |       N/A       |       N/A       |     5.8440 ns   | |   __2.4087 ns__ |     3.1021 ns   |     3.1061 ns   |      N/A       |     3.1052 ns   |       N/A       |     3.1136 ns   |     3.1071 ns   |

### Wide benchmarks

These benchmarks were performed on an [Intel i7-4710HQ] CPU on Linux. They were

compiled with the `1.59.0-nightly (207c80f10 2021-11-30)` Rust compiler. Lower

(better) numbers are highlighted within a 2.5% range of the minimum for each

row.

The versions of the libraries tested were:

* `glam` - `0.20.1`

* `nalgebra` - `0.29.0`

* `ultraviolet` - `0.8.1`

Run with the command:

```sh

RUSTFLAGS='-C target-feature=+avx2' cargo +nightly bench --features wide wide

```

| benchmark                      |    glam_f32x1   |   ultraviolet_f32x4   |   nalgebra_f32x4   |   ultraviolet_f32x8   |   nalgebra_f32x8   |

|--------------------------------|-----------------|-----------------------|--------------------|-----------------------|--------------------|

| euler 2d x80000                |      142.7 us   |          __63.47 us__ |       __63.94 us__ |            69.27 us   |         69.25 us   |

| euler 3d x80000                |      141.2 us   |          __97.18 us__ |       __95.78 us__ |            103.7 us   |         105.7 us   |

| matrix2 determinant x16        |    18.6849 ns   |          11.4259 ns   |          N/A       |         __9.9982 ns__ |          N/A       |

| matrix2 inverse x16            |    39.1219 ns   |          29.8933 ns   |          N/A       |        __22.8757 ns__ |          N/A       |

| matrix2 mul matrix2 x16        |    42.7342 ns   |          36.4879 ns   |          N/A       |        __33.4814 ns__ |          N/A       |

| matrix2 mul matrix2 x256       |   959.1663 ns   |         935.4148 ns   |          N/A       |       __862.0910 ns__ |          N/A       |

| matrix2 mul vector2 x16        |    41.2464 ns   |          18.2382 ns   |          N/A       |        __17.2550 ns__ |          N/A       |

| matrix2 mul vector2 x256       |   698.1177 ns   |       __544.5315 ns__ |          N/A       |       __540.9743 ns__ |          N/A       |

| matrix2 return self x16        |    32.7553 ns   |          29.5064 ns   |          N/A       |        __21.4492 ns__ |          N/A       |

| matrix2 transpose x16          |    32.3247 ns   |          46.4836 ns   |          N/A       |        __20.0852 ns__ |          N/A       |

| matrix3 determinant x16        |    53.2366 ns   |          25.0158 ns   |          N/A       |        __22.1503 ns__ |          N/A       |

| matrix3 inverse x16            |   275.9330 ns   |          78.3532 ns   |          N/A       |        __69.2627 ns__ |          N/A       |

| matrix3 mul matrix3 x16        |   239.6124 ns   |       __115.2934 ns__ |          N/A       |       __116.6237 ns__ |          N/A       |

| matrix3 mul matrix3 x256       |       3.26 us   |          __1.959 us__ |          N/A       |          __1.963 us__ |          N/A       |

| matrix3 mul vector3 x16        |    78.4972 ns   |        __40.4734 ns__ |          N/A       |          47.0164 ns   |          N/A       |

| matrix3 mul vector3 x256       |      1.293 us   |            __1.0 us__ |          N/A       |          __1.007 us__ |          N/A       |

| matrix3 return self x16        |   112.4312 ns   |          78.4870 ns   |          N/A       |        __67.3272 ns__ |          N/A       |

| matrix3 transpose x16          |   116.9654 ns   |         100.1097 ns   |          N/A       |        __67.4544 ns__ |          N/A       |

| matrix4 determinant x16        |    98.8388 ns   |        __56.1177 ns__ |          N/A       |        __55.7623 ns__ |          N/A       |

| matrix4 inverse x16            |   276.2637 ns   |         191.7471 ns   |          N/A       |       __163.8408 ns__ |          N/A       |

| matrix4 mul matrix4 x16        |   230.9916 ns   |       __222.3948 ns__ |          N/A       |       __221.8563 ns__ |          N/A       |

| matrix4 mul matrix4 x256       |      3.793 us   |          __3.545 us__ |          N/A       |             3.67 us   |          N/A       |

| matrix4 mul vector4 x16        |    92.9485 ns   |        __87.7341 ns__ |          N/A       |          90.4404 ns   |          N/A       |

| matrix4 mul vector4 x256       |     __1.58 us__ |          __1.542 us__ |          N/A       |            1.596 us   |          N/A       |

| matrix4 return self x16        |   175.6153 ns   |       __158.7861 ns__ |          N/A       |         167.6639 ns   |          N/A       |

| matrix4 transpose x16          |   184.0498 ns   |         193.5497 ns   |          N/A       |       __147.1365 ns__ |          N/A       |

| ray-sphere intersection x80000 |      567.9 us   |            154.8 us   |          N/A       |          __61.49 us__ |          N/A       |

| rotation3 inverse x16          |    32.7517 ns   |          32.8107 ns   |          N/A       |        __22.3662 ns__ |          N/A       |

| rotation3 mul rotation3 x16    |    58.9408 ns   |          38.6848 ns   |          N/A       |        __34.3223 ns__ |          N/A       |

| rotation3 mul vector3 x16      |   130.6707 ns   |          36.7861 ns   |          N/A       |        __26.1154 ns__ |          N/A       |

| rotation3 return self x16      |    32.4345 ns   |          32.5213 ns   |          N/A       |        __21.8325 ns__ |          N/A       |

| transform point2 x16           |    52.6534 ns   |        __31.4527 ns__ |          N/A       |          32.7317 ns   |          N/A       |

| transform point2 x256          |   888.5654 ns   |       __831.9341 ns__ |          N/A       |       __848.0397 ns__ |          N/A       |

| transform point3 x16           |    96.9017 ns   |        __81.6828 ns__ |          N/A       |        __82.8904 ns__ |          N/A       |

| transform point3 x256          |      1.567 us   |          __1.398 us__ |          N/A       |           __1.43 us__ |          N/A       |

| transform vector2 x16          |    43.7679 ns   |        __29.9349 ns__ |          N/A       |          31.8630 ns   |          N/A       |

| transform vector2 x256         |   858.5660 ns   |       __825.0261 ns__ |          N/A       |         851.7501 ns   |          N/A       |

| transform vector3 x16          |    96.5535 ns   |        __80.1612 ns__ |          N/A       |          85.0659 ns   |          N/A       |

| transform vector3 x256         |      1.557 us   |          __1.394 us__ |          N/A       |            1.438 us   |          N/A       |

| vector3 cross x16              |    42.1941 ns   |          26.6677 ns   |          N/A       |        __22.0924 ns__ |          N/A       |

| vector3 dot x16                |    29.1805 ns   |          12.7972 ns   |          N/A       |        __12.2872 ns__ |          N/A       |

| vector3 length x16             |    32.6014 ns   |           9.7692 ns   |          N/A       |         __9.4271 ns__ |          N/A       |

| vector3 normalize x16          |    65.8815 ns   |          24.1661 ns   |          N/A       |        __20.3579 ns__ |          N/A       |

| vector3 return self x16        |    32.0051 ns   |          42.9462 ns   |          N/A       |        __16.7808 ns__ |          N/A       |

[Intel i7-4710HQ]: https://ark.intel.com/content/www/us/en/ark/products/78930/intel-core-i7-4710hq-processor-6m-cache-up-to-3-50-ghz.html

[mathbench report]: https://bitshifter.github.io/mathbench/0.4.1/report/index.html

## Running the benchmarks

The benchmarks use the criterion crate which works on stable Rust, they can be

run with:

```sh

cargo bench

```

For the best results close other applications on the machine you are using to

benchmark!

When running "wide" benchmarks, be sure you compile with with the appropriate

`target-feature`s enabled, e.g. `+avx2`, for best results.

There is a script in `scripts/summary.py` to summarize the results in a nice

fashion. It requires Python 3 and the `prettytable` Python module, then can

be run to generate an ASCII output.

## Default and optional features

All libraries except for `glam` are optional for running benchmarks. The default

features include `cgmath`, `ultraviolet` and `nalgebra`. These can be disabled

with:

```sh

cargo bench --no-default-features

```

To selectively enable a specific default feature again use:

```sh

cargo bench --no-default-features --features nalgebra

```

Note that you can filter which benchmarks to run at runtime by using

Criterion's filtering feature. For example, to only run scalar benchmarks

and not wide ones, use:

```sh

cargo bench "scalar"

```

You can also get more granular. For example to only run wide matrix2 benchmarks,

use:

```sh

cargo bench --features wide "wide matrix2"

```

or to only run the scalar "vec3 length" benchmark for `glam`, use:

```sh

cargo bench "scalar vec3 length/glam"

```

### Crate features

There are a few extra features in addition to the direct features referring to

each benchmarked library.

* `ultraviolet_f32x4`, `ultraviolet_f32x8`, `nalgebra_f32x4`,

  `nalgebra_f32x8` - these each enable benchmarking specific wide types from

  each of `ultraviolet` or `nalgebra`.

* `ultraviolet_wide`, `nalgebra_wide` - these enable benchmarking all wide

  types from `ultraviolet` or `nalgebra` respectively.

* `wide` - enables all "wide" type benchmarks

* `all` - enables all supported libraries, including wide and scalar ones.

* `unstable` - see next section

#### `unstable` feature

The `unstable` feature requires a nightly compiler, and it allows us to tell

rustc not to inline certain functions within hot benchmark loops. This is used

in the ray-sphere intersection benchmark in order to simulate situations where

the autovectorizer would not be able to properly vectorize your code.

## Running the tests

The tests can be run using:

```sh

cargo test

```

## Publishing results

When publishing benchmark results it is important to document the details of how

the benchmarks were run, including:

* The version of `mathbench` used

* The versions of all libraries benched

* The Rust version

* The build settings used, especially when they differ from the defaults

* The specification of the hardware that was used

* The output of `scripts/summary.py`

* The full Criterion output from `target/criterion`

## Adding a new library

There are different steps involved for adding a unit tests and benchmarks for a

new library.

Benchmarks require an implementation of the `mathbench::RandomVec` trait for the

types you want to benchmark. If the type implements the `rand` crate

`distribution::Distribution` trait for `Standard` then you can simply use the

`impl_random_vec!` macro in `src/lib.rs`. Otherwise you can provide a function

that generates a new random value of your type pass that to `impl_random_vec!`.

To add the new libary type to a benchmark, add another `bench_function` call to

the `Criterion` `BenchmarkGroup`.

Increment the patch version number of `mathbench` in the `Cargo.toml`.

Update `CHANGELOG.md`.

## Build times

`mathbench` also includes a tool for comparing full build times in

`tools/buildbench`. Incremental build times are not measured as it would be non

trivial to create a meaningful test across different math crates.

The `buildbench` tool uses the `-Z timings` feature of the nightly build of

`cargo`, thus you need a nightly build to run it.

`buildbench` generates a `Cargo.toml` and empty `src/lib.rs` in a temporary

directory for each library, recording some build time information which is

included in the summary table below. The temporary directory is created every

time the tool is run so this is a full build from a clean state.

Each library is only built once so you may wish to run `buildbench` multiple

times to ensure results are consistent.

By default crates are built using the `release` profile with default features

enabled. There are options for building the `dev` profile or without default

features, see `buildbench --help` for more information.

The columns outputted include the total build time, the self build time which is

the time it took to build the crate on it's own excluding dependencies, and the

number of units which is the number of dependencies (this will be 2 at minimum).

When comparing build times keep in mind that each library has different feature

sets and that naturally larger libraries will take longer to build. For many

crates tested the dependencies take longer than the math crate. Also keep in

mind if you are already building one of the dependencies in your project you

won't pay the build cost twice (unless it's a different version).

| crate               | version | total (s) | self (s) | units |

|:--------------------|:--------|----------:|---------:|------:|

| cgmath              | 0.17.0  |       6.8 |      3.0 |    17 |

| euclid              | 0.22.1  |       3.4 |      1.0 |     4 |

| glam                | 0.9.4   |       1.1 |      0.6 |     2 |

| nalgebra            | 0.22.0  |      24.2 |     18.0 |    24 |

| pathfinder_geometry | 0.5.1   |       3.0 |      0.3 |     8 |

| static-math         | 0.1.6   |       6.9 |      1.7 |    10 |

| ultraviolet         | 0.5.1   |       2.5 |      1.3 |     4 |

| vek                 | 0.12.0  |      34.4 |     10.1 |    16 |

These benchmarks were performed on an [Intel i7-4710HQ] CPU with 16GB RAM and a

Toshiba MQ01ABD100 HDD (SATA 3Gbps 5400RPM) on Linux.

## License

Licensed under either of

* Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE)

  or http://www.apache.org/licenses/LICENSE-2.0)

* MIT license ([LICENSE-MIT](LICENSE-MIT)

  or http://opensource.org/licenses/MIT)

at your option.

## Contribution

Contributions in any form (issues, pull requests, etc.) to this project must

adhere to Rust's [Code of Conduct].

Unless you explicitly state otherwise, any contribution intentionally submitted

for inclusion in the work by you, as defined in the Apache-2.0 license, shall be

dual licensed as above, without any additional terms or conditions.

[Code of Conduct]: https://www.rust-lang.org/en-US/conduct.html

## Support

If you are interested in contributing or have a request or suggestion

[create an issue] on github.

[create an issue]: https://github.com/bitshifter/mathbench-rs/issues
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/bitshifter/mathbench-rs

Awesome Lists containing this project

README