https://github.com/jeroenbakker-atmind/rs-fractal-julia

Julia fractal generator written in Rust.

The project started while adding huge image support to the Blender image engine. During development
it wasn't easy to generate or download huge images that were also interesting to show.

A second goal of the project is to practice writing vectorized code (AVX1/AVX2) in assembly.
Rust is mostly used for IO and the outer loops. There are also some native Rust kernels for timing
comparison.
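As a rough illustration of what a native Rust kernel and its outer loops could look like, here is a minimal sketch. The function names, the escape radius of 2 and the mapping of pixels to the range [-1.5, 1.5) are assumptions made for this example and are not taken from the repository.

```rust
/// Counts how many times z = z*z + c can be applied before |z| exceeds 2,
/// capped at `max_iter`. Hypothetical helper, not the repository's kernel.
fn julia_iterations(mut zr: f64, mut zi: f64, cr: f64, ci: f64, max_iter: u32) -> u32 {
    let mut i = 0;
    // Compare |z|^2 against 4 to avoid a square root.
    while i < max_iter && zr * zr + zi * zi <= 4.0 {
        let next_zr = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = next_zr;
        i += 1;
    }
    i
}

/// Outer loops: map every pixel to a point in the complex plane and run the kernel.
/// The viewport [-1.5, 1.5) on both axes is an assumption for this sketch.
fn render(width: usize, height: usize, cr: f64, ci: f64, max_iter: u32) -> Vec<u32> {
    let mut out = Vec::with_capacity(width * height);
    for y in 0..height {
        let zi = (y as f64 / height as f64) * 3.0 - 1.5;
        for x in 0..width {
            let zr = (x as f64 / width as f64) * 3.0 - 1.5;
            out.push(julia_iterations(zr, zi, cr, ci, max_iter));
        }
    }
    out
}
```

A packed assembly kernel performs the same iteration, but on several neighbouring pixels per instruction.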

Benchmark results on an Intel(R) Core(TM) i7-8550U CPU (slowest on top):

```
test benchmark::bench_native_f64 ... bench: 26,907,733 ns/iter (+/- 1,039,052)
test benchmark::bench_cpu_f64 ... bench: 26,821,107 ns/iter (+/- 971,015)
test benchmark::bench_cpu_f32 ... bench: 22,525,174 ns/iter (+/- 961,333)
test benchmark::bench_native_f32 ... bench: 22,246,048 ns/iter (+/- 1,016,533)
test benchmark::bench_asm_xmm_f64_scalar ... bench: 21,793,094 ns/iter (+/- 929,025)
test benchmark::bench_asm_xmm_f32_scalar ... bench: 21,760,044 ns/iter (+/- 1,005,370)
test benchmark::bench_asm_xmm_f64_packed ... bench: 12,938,314 ns/iter (+/- 685,412)
test benchmark::bench_asm_ymm_f64_packed ... bench: 7,932,974 ns/iter (+/- 401,188)
test benchmark::bench_asm_xmm_f32_packed ... bench: 7,837,280 ns/iter (+/- 429,013)
test benchmark::bench_asm_ymm_f32_packed ... bench: 4,821,649 ns/iter (+/- 338,150)
```
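
To read these timings as relative speedups, divide a baseline time by the kernel time. By that measure `bench_asm_ymm_f32_packed` is roughly 5.6 times as fast as `bench_native_f64`, and `bench_asm_ymm_f64_packed` roughly 3.4 times. A minimal sketch of the arithmetic follows; the function and constant names are made up for illustration, and the values are the mean ns/iter figures from the output above, ignoring the reported variance.

```rust
/// Speedup of a kernel relative to a baseline: baseline time / kernel time.
fn speedup(baseline_ns: u64, kernel_ns: u64) -> f64 {
    baseline_ns as f64 / kernel_ns as f64
}

// Mean ns/iter values copied from the benchmark output.
const NATIVE_F64_NS: u64 = 26_907_733;
const YMM_F64_PACKED_NS: u64 = 7_932_974;
const YMM_F32_PACKED_NS: u64 = 4_821_649;
```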

Remarkably, double precision is slower than single precision in the Rust kernels. This could be due to
poor vectorization; the generated assembly still needs to be inspected. Note that SSE and AVX provide
native instructions for both f32 and f64, and a packed register holds twice as many f32 values as f64 values.

The best kernel (performance vs precision) is julia_ymm_f64_packed. A packed kernel only supports
calculating a number of items that is a multiple of its packing width (scalars per register):

```
xmm_f32 = 4
xmm_f64 = 2
ymm_f32 = 8
ymm_f64 = 4
```
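
These widths follow from the register size divided by the element size: xmm registers are 128 bits wide and ymm registers 256 bits. One way a caller could meet the multiple-of-width requirement is to round the item count up, as sketched below. The helper names are hypothetical, and the repository may handle the remainder differently.

```rust
const XMM_BITS: usize = 128; // SSE register width in bits
const YMM_BITS: usize = 256; // AVX register width in bits

/// Scalars per register = register width in bits / element width in bits.
const fn lanes(register_bits: usize, element_bits: usize) -> usize {
    register_bits / element_bits
}

/// Rounds `items` up to the next multiple of `lanes` (hypothetical helper).
/// `lanes` must be greater than zero.
fn pad_to_lanes(items: usize, lanes: usize) -> usize {
    (items + lanes - 1) / lanes * lanes
}
```

For example, a row of 1921 pixels processed with the ymm_f32 kernel would be padded to `pad_to_lanes(1921, lanes(YMM_BITS, 32))` items.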

The kernels are optimized for readability; their performance can still be improved.

Note: This has been developed on Linux and hasn't been tested on other operating systems.
Other operating systems and linkers require different stack management.

Note: It will only compile and run on x86 processors with AVX2. There is no check whether your
CPU is supported.
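
A runtime check could be added with the standard library's `is_x86_feature_detected!` macro. The sketch below is not part of the project; the function names `avx2_supported` and `require_avx2` are made up for illustration.

```rust
/// Returns true when the running CPU reports AVX2 support.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn avx2_supported() -> bool {
    is_x86_feature_detected!("avx2")
}

/// On non-x86 targets the detection macro is unavailable, so report no support.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn avx2_supported() -> bool {
    false
}

/// Hypothetical guard to call before dispatching to an assembly kernel.
fn require_avx2() -> Result<(), String> {
    if avx2_supported() {
        Ok(())
    } else {
        Err(String::from("this CPU does not support AVX2"))
    }
}
```

Calling `require_avx2()` at startup would turn an illegal-instruction crash into a readable error message.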