https://github.com/samhaswon/simd_blend_modes
- Host: GitHub
- URL: https://github.com/samhaswon/simd_blend_modes
- Owner: samhaswon
- License: mit
- Created: 2026-02-02T19:12:20.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-02-05T00:23:09.000Z (4 months ago)
- Last Synced: 2026-02-05T11:59:05.548Z (4 months ago)
- Language: C
- Size: 4.85 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Agents: AGENTS.md
# SIMD Blend Modes
This project reimplements the blend modes from [`blend_modes`](https://github.com/flrs/blend_modes) with C kernels and SIMD
(SSE4.2/AVX2) acceleration. It supports uint8 and float32 NumPy inputs in the range 0..255
and returns output dtype/channel count matching the background image. Missing alpha channels
are treated as fully opaque (255). Opacity defaults to 1.0.
This is intended to be a mostly drop-in replacement, but with a more permissive
API that lets you go faster if you don't need FP32 arrays or the information of an
alpha channel for some layers.
## Build and Install
### General
```bash
pip install simd-blend-modes
```
### Development
```bash
pip install -r requirements-dev.txt
pip install -e .
```
## Usage
```python
import numpy as np
import simd_blend_modes as sbm
background = np.zeros((512, 512, 4), dtype=np.uint8)
foreground = np.zeros((512, 512, 4), dtype=np.uint8)
out = sbm.screen(background, foreground, 0.5)
```
Inputs:
- Dtypes: `np.uint8` or `np.float32` only.
- Value range: 0..255 for both dtypes.
- float32 inputs are expected to be uint8 values cast to float32 (still 0..255), not normalized to 0..1.
- Shapes: `H x W x C` with `C` = 3 (RGB) or 4 (RGBA).
- Output: dtype and channel count match the background image.
- Alpha: if a source is RGB (3 channels), alpha is treated as 255 (fully opaque).
- Opacity: the third argument is optional; defaults to `1.0`.
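The input rules above can be restated as a small checker. This is only a sketch of the documented contract, written against plain shape tuples and dtype names so it runs without NumPy. `check_image`, `output_spec`, and `effective_alpha` are hypothetical helpers, not part of `simd_blend_modes`, and the library's own validation may be stricter or differ in detail.

```python
ALLOWED_DTYPES = {"uint8", "float32"}
ALLOWED_CHANNELS = {3, 4}  # RGB or RGBA


def check_image(shape, dtype):
    """Raise ValueError if (shape, dtype) violates the documented input rules."""
    if dtype not in ALLOWED_DTYPES:
        raise ValueError(f"unsupported dtype: {dtype}")
    if len(shape) != 3 or shape[2] not in ALLOWED_CHANNELS:
        raise ValueError(f"expected H x W x C with C in (3, 4), got {shape}")


def output_spec(bg_shape, bg_dtype, fg_shape, fg_dtype):
    """Return the (shape, dtype) the README describes for the result."""
    check_image(bg_shape, bg_dtype)
    check_image(fg_shape, fg_dtype)
    # Output dtype and channel count follow the background, not the foreground.
    # Assumption: this sketch does not check that both images share H and W.
    return tuple(bg_shape), bg_dtype


def effective_alpha(pixel):
    """Alpha used for one pixel: the 4th channel, or 255 when the source is RGB."""
    return pixel[3] if len(pixel) == 4 else 255


# RGB background with an RGBA foreground: the result stays RGB uint8.
print(output_spec((512, 512, 3), "uint8", (512, 512, 4), "uint8"))
# ((512, 512, 3), 'uint8')
```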
Supported blend modes:
- [`normal`](https://en.wikipedia.org/wiki/Blend_modes#Normal_blend_mode)
- [`soft_light`](https://en.wikipedia.org/wiki/Blend_modes#Soft_Light)
- [`lighten_only`](https://en.wikipedia.org/wiki/Blend_modes#Lighten_Only)
- [`screen`](https://en.wikipedia.org/wiki/Blend_modes#Screen)
- [`dodge`](https://en.wikipedia.org/wiki/Blend_modes#Dodge_and_burn)
- [`addition`](https://en.wikipedia.org/wiki/Blend_modes#Addition)
- [`darken_only`](https://en.wikipedia.org/wiki/Blend_modes#Darken_Only)
- [`multiply`](https://en.wikipedia.org/wiki/Blend_modes#Multiply)
- [`hard_light`](https://en.wikipedia.org/wiki/Blend_modes#Hard_Light)
- [`difference`](https://en.wikipedia.org/wiki/Blend_modes#Difference)
- [`subtract`](https://en.wikipedia.org/wiki/Blend_modes#Subtract)
- `grain_extract` (known from GIMP)
- `grain_merge` (known from GIMP)
- [`divide`](https://en.wikipedia.org/wiki/Blend_modes#Divide)
- [`overlay`](https://en.wikipedia.org/wiki/Blend_modes#Overlay)
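For orientation, here is a per-channel sketch of one of these modes, `screen`, on the 0..255 scale. It uses the textbook formula from the linked Wikipedia article and ignores alpha compositing entirely. Treating opacity as a linear mix between the background and the blended value is an assumption of this sketch, and `screen_channel` is a hypothetical name; none of this is the library's actual kernel.

```python
def screen_channel(bg, fg, opacity=1.0):
    """Screen-blend one channel value (0..255) of fg over bg.

    Textbook screen on normalized values: 1 - (1 - a) * (1 - b).
    Opacity is applied as a linear mix between the background and the
    blended value (an assumption of this sketch, alpha is ignored).
    """
    a = bg / 255.0
    b = fg / 255.0
    blended = 1.0 - (1.0 - a) * (1.0 - b)
    mixed = a + (blended - a) * opacity
    return mixed * 255.0


# Screening never darkens: the result is at least as bright as either input.
value = screen_channel(100, 200, 0.5)
print(100 <= value <= 255)
```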
You can force a kernel by passing a string (or `KernelKind` value):
```python
out = sbm.screen(background, foreground, 0.5, "avx2")
```
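When no kernel is forced, the Notes section says selection falls back from AVX2 to SSE4.2 to scalar. The sketch below models that ordering in plain Python. The `Kernel` enum is a hypothetical stand-in for the library's `KernelKind` (its real members are not shown in this README), and raising `ValueError` for an unsupported forced kernel is this sketch's choice, not documented behavior.

```python
from enum import Enum


class Kernel(Enum):
    # Hypothetical stand-in for KernelKind; values mirror the kernel
    # names used in this README's examples and tables.
    AVX2 = "avx2"
    SSE42 = "sse42"
    SCALAR = "scalar"


# Preference order from the Notes section: AVX2, then SSE4.2, then scalar.
PREFERENCE = (Kernel.AVX2, Kernel.SSE42, Kernel.SCALAR)


def pick_kernel(available, forced=None):
    """Choose a kernel from the set the CPU supports.

    `available` is a set of Kernel values; scalar is always usable.
    `forced` may be a Kernel member or its string value.
    """
    usable = set(available) | {Kernel.SCALAR}
    if forced is not None:
        kernel = Kernel(forced)  # accepts a member or a string such as "avx2"
        if kernel not in usable:
            raise ValueError(f"{kernel.value} is not supported on this CPU")
        return kernel
    # First preferred kernel that is usable; scalar guarantees a match.
    return next(k for k in PREFERENCE if k in usable)


print(pick_kernel({Kernel.SSE42}).value)  # sse42
```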
## Tests
Correctness and performance:
```bash
python3 -m unittest discover tests/
```
Performance:
```bash
python3 -m unittest tests.test_performance
```
The performance test prints a markdown table of per-kernel speedups vs the NumPy reference
for common square sizes and screen resolutions.
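The Speedup and Percent Change columns can be derived from the two timing columns. The sketch below shows one derivation that is consistent with the published rows; `speedup_columns` is a hypothetical helper, not the code in `tests/test_performance`, and because the published times are rounded, a recomputed cell may differ in the last digit for some rows.

```python
def speedup_columns(ref_seconds, kernel_seconds):
    """Return the (Speedup, Percent Change) cells for one table row."""
    # Speedup: how many times faster the kernel is than the reference.
    speedup = ref_seconds / kernel_seconds
    # Percent change: relative change in runtime, negative when faster.
    percent_change = (kernel_seconds - ref_seconds) / ref_seconds * 100.0
    return f"{speedup:.2f}x", f"{percent_change:.2f}%"


# First row of the Performance table: normal / scalar.
print(speedup_columns(0.167029, 0.033657))  # ('4.96x', '-79.85%')
```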
## ARM
ARM isn't properly supported, as I do not have a new enough ARM CPU to test on,
nor do I wish to use a cloud VM to test it. So, if you want ARM support, open a PR.
It should build and be faster, but there's no SIMD support there (yet).
ARM builds run in scalar-only mode (x86 SIMD is compile-time gated). To test ARM under Docker,
enable emulation and then build with the ARM platform.
If you don't already have buildx/binfmt configured, run:
```bash
docker run --privileged --rm tonistiigi/binfmt --install arm64
```
Then build or run the ARM container:
```bash
docker compose up --build
```
This is incredibly slow. I wouldn't actually do this, but it's here.
## Notes
- SIMD kernels are selected at runtime: AVX2 → SSE4.2 → scalar.
- ARM builds are supported in scalar-only mode; x86 SIMD is compile-time gated. CI does not emit
ARM artifacts.
- Reference tests adapted from the original project live in `tests/reference_blend_modes_tests.py`
and are skipped unless the `blend_modes` package and test assets are available.
- The SIMD paths currently assume contiguous arrays (the input validation enforces this).
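To make the contiguity requirement concrete, the sketch below checks whether a shape and byte strides describe a dense row-major layout. It assumes "contiguous" here means C-contiguous, which the README does not state explicitly, and `is_c_contiguous` is a hypothetical helper rather than the library's validation code.

```python
def is_c_contiguous(shape, strides, itemsize):
    """True if byte strides describe a dense row-major (C-order) layout."""
    expected = itemsize
    # Walk from the innermost axis outwards; each stride must equal the
    # number of bytes spanned by all axes inside it. Axes of length 1
    # are skipped because their stride is never used.
    for dim, stride in zip(reversed(shape), reversed(strides)):
        if dim != 1 and stride != expected:
            return False
        expected *= dim
    return True


# A 512x512 RGBA uint8 image stored densely.
print(is_c_contiguous((512, 512, 4), (2048, 4, 1), 1))  # True
# The same buffer viewed with every second column skipped.
print(is_c_contiguous((512, 256, 4), (2048, 8, 1), 1))  # False
```

In NumPy terms this is what `arr.flags["C_CONTIGUOUS"]` reports, and `np.ascontiguousarray(arr)` returns a dense copy when a view such as a strided slice is not contiguous.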
## Performance
| Mode | Kernel | Ref (s) | Kernel (s) | Speedup | Percent Change |
| ------------- | ------ | -------- | ---------- | ------- | -------------- |
| normal | scalar | 0.167029 | 0.033657 | 4.96x | -79.85% |
| normal | sse42 | 0.167029 | 0.011444 | 14.60x | -93.15% |
| normal | avx2 | 0.167029 | 0.010915 | 15.30x | -93.47% |
| soft_light | scalar | 0.227108 | 0.040065 | 5.67x | -82.36% |
| soft_light | sse42 | 0.227108 | 0.012021 | 18.89x | -94.71% |
| soft_light | avx2 | 0.227108 | 0.011137 | 20.39x | -95.10% |
| lighten_only | scalar | 0.168433 | 0.043472 | 3.87x | -74.19% |
| lighten_only | sse42 | 0.168433 | 0.011929 | 14.12x | -92.92% |
| lighten_only | avx2 | 0.168433 | 0.010955 | 15.38x | -93.50% |
| screen | scalar | 0.179914 | 0.038705 | 4.65x | -78.49% |
| screen | sse42 | 0.179914 | 0.012018 | 14.97x | -93.32% |
| screen | avx2 | 0.179914 | 0.011221 | 16.03x | -93.76% |
| dodge | scalar | 0.182316 | 0.041261 | 4.42x | -77.37% |
| dodge | sse42 | 0.182316 | 0.012758 | 14.29x | -93.00% |
| dodge | avx2 | 0.182316 | 0.011420 | 15.96x | -93.74% |
| addition | scalar | 0.174717 | 0.061780 | 2.83x | -64.64% |
| addition | sse42 | 0.174717 | 0.012799 | 13.65x | -92.67% |
| addition | avx2 | 0.174717 | 0.011310 | 15.45x | -93.53% |
| darken_only | scalar | 0.172792 | 0.043941 | 3.93x | -74.57% |
| darken_only | sse42 | 0.172792 | 0.011946 | 14.46x | -93.09% |
| darken_only | avx2 | 0.172792 | 0.011097 | 15.57x | -93.58% |
| multiply | scalar | 0.174347 | 0.039088 | 4.46x | -77.58% |
| multiply | sse42 | 0.174347 | 0.011828 | 14.74x | -93.22% |
| multiply | avx2 | 0.174347 | 0.010915 | 15.97x | -93.74% |
| hard_light | scalar | 0.255683 | 0.076160 | 3.36x | -70.21% |
| hard_light | sse42 | 0.255683 | 0.012848 | 19.90x | -94.98% |
| hard_light | avx2 | 0.255683 | 0.011286 | 22.65x | -95.59% |
| difference | scalar | 0.230410 | 0.038175 | 6.04x | -83.43% |
| difference | sse42 | 0.230410 | 0.011963 | 19.26x | -94.81% |
| difference | avx2 | 0.230410 | 0.010998 | 20.95x | -95.23% |
| subtract | scalar | 0.172605 | 0.039590 | 4.36x | -77.06% |
| subtract | sse42 | 0.172605 | 0.012698 | 13.59x | -92.64% |
| subtract | avx2 | 0.172605 | 0.011318 | 15.25x | -93.44% |
| grain_extract | scalar | 0.177783 | 0.051094 | 3.48x | -71.26% |
| grain_extract | sse42 | 0.177783 | 0.012071 | 14.73x | -93.21% |
| grain_extract | avx2 | 0.177783 | 0.010952 | 16.23x | -93.84% |
| grain_merge | scalar | 0.178960 | 0.050727 | 3.53x | -71.65% |
| grain_merge | sse42 | 0.178960 | 0.012008 | 14.90x | -93.29% |
| grain_merge | avx2 | 0.178960 | 0.011038 | 16.21x | -93.83% |
| divide | scalar | 0.181181 | 0.040103 | 4.52x | -77.87% |
| divide | sse42 | 0.181181 | 0.012229 | 14.82x | -93.25% |
| divide | avx2 | 0.181181 | 0.011248 | 16.11x | -93.79% |
| overlay | scalar | 0.237504 | 0.072933 | 3.26x | -69.29% |
| overlay | sse42 | 0.237504 | 0.012377 | 19.19x | -94.79% |
| overlay | avx2 | 0.237504 | 0.011132 | 21.33x | -95.31% |
### Per-kernel, size, and type results
| Case | Input | Channels | Opacity | Mode | Kernel | Ref (s) | Kernel (s) | Speedup | Percent Change |
| --------- | ------- | -------- | ------- | ------------- | ------ | -------- | ---------- | ------- | -------------- |
| 256x256 | uint8 | 3 | 0.50 | normal | scalar | 0.007060 | 0.001647 | 4.29x | -76.67% |
| 256x256 | uint8 | 3 | 0.50 | normal | sse42 | 0.007060 | 0.000707 | 9.98x | -89.98% |
| 256x256 | uint8 | 3 | 0.50 | normal | avx2 | 0.007060 | 0.000709 | 9.96x | -89.96% |
| 256x256 | uint8 | 3 | 0.50 | soft_light | scalar | 0.008574 | 0.001827 | 4.69x | -78.69% |
| 256x256 | uint8 | 3 | 0.50 | soft_light | sse42 | 0.008574 | 0.000833 | 10.30x | -90.29% |
| 256x256 | uint8 | 3 | 0.50 | soft_light | avx2 | 0.008574 | 0.000738 | 11.62x | -91.39% |
| 256x256 | uint8 | 3 | 0.50 | lighten_only | scalar | 0.007031 | 0.001946 | 3.61x | -72.33% |
| 256x256 | uint8 | 3 | 0.50 | lighten_only | sse42 | 0.007031 | 0.000790 | 8.90x | -88.76% |
| 256x256 | uint8 | 3 | 0.50 | lighten_only | avx2 | 0.007031 | 0.000704 | 9.99x | -89.99% |
| 256x256 | uint8 | 3 | 0.50 | screen | scalar | 0.007561 | 0.001829 | 4.13x | -75.81% |
| 256x256 | uint8 | 3 | 0.50 | screen | sse42 | 0.007561 | 0.000814 | 9.29x | -89.23% |
| 256x256 | uint8 | 3 | 0.50 | screen | avx2 | 0.007561 | 0.000739 | 10.23x | -90.22% |
| 256x256 | uint8 | 3 | 0.50 | dodge | scalar | 0.007170 | 0.001871 | 3.83x | -73.91% |
| 256x256 | uint8 | 3 | 0.50 | dodge | sse42 | 0.007170 | 0.000812 | 8.83x | -88.67% |
| 256x256 | uint8 | 3 | 0.50 | dodge | avx2 | 0.007170 | 0.000732 | 9.79x | -89.79% |
| 256x256 | uint8 | 3 | 0.50 | addition | scalar | 0.007163 | 0.002554 | 2.80x | -64.35% |
| 256x256 | uint8 | 3 | 0.50 | addition | sse42 | 0.007163 | 0.000820 | 8.74x | -88.56% |
| 256x256 | uint8 | 3 | 0.50 | addition | avx2 | 0.007163 | 0.000731 | 9.79x | -89.79% |
| 256x256 | uint8 | 3 | 0.50 | darken_only | scalar | 0.007034 | 0.001954 | 3.60x | -72.22% |
| 256x256 | uint8 | 3 | 0.50 | darken_only | sse42 | 0.007034 | 0.000771 | 9.12x | -89.03% |
| 256x256 | uint8 | 3 | 0.50 | darken_only | avx2 | 0.007034 | 0.000691 | 10.18x | -90.18% |
| 256x256 | uint8 | 3 | 0.50 | multiply | scalar | 0.006933 | 0.001771 | 3.91x | -74.46% |
| 256x256 | uint8 | 3 | 0.50 | multiply | sse42 | 0.006933 | 0.000770 | 9.01x | -88.90% |
| 256x256 | uint8 | 3 | 0.50 | multiply | avx2 | 0.006933 | 0.000690 | 10.05x | -90.05% |
| 256x256 | uint8 | 3 | 0.50 | hard_light | scalar | 0.008719 | 0.003057 | 2.85x | -64.94% |
| 256x256 | uint8 | 3 | 0.50 | hard_light | sse42 | 0.008719 | 0.000834 | 10.46x | -90.44% |
| 256x256 | uint8 | 3 | 0.50 | hard_light | avx2 | 0.008719 | 0.000727 | 11.99x | -91.66% |
| 256x256 | uint8 | 3 | 0.50 | difference | scalar | 0.008743 | 0.001775 | 4.92x | -79.70% |
| 256x256 | uint8 | 3 | 0.50 | difference | sse42 | 0.008743 | 0.000801 | 10.91x | -90.84% |
| 256x256 | uint8 | 3 | 0.50 | difference | avx2 | 0.008743 | 0.000719 | 12.16x | -91.78% |
| 256x256 | uint8 | 3 | 0.50 | subtract | scalar | 0.007122 | 0.001672 | 4.26x | -76.52% |
| 256x256 | uint8 | 3 | 0.50 | subtract | sse42 | 0.007122 | 0.000816 | 8.73x | -88.54% |
| 256x256 | uint8 | 3 | 0.50 | subtract | avx2 | 0.007122 | 0.000699 | 10.19x | -90.19% |
| 256x256 | uint8 | 3 | 0.50 | grain_extract | scalar | 0.006934 | 0.002214 | 3.13x | -68.07% |
| 256x256 | uint8 | 3 | 0.50 | grain_extract | sse42 | 0.006934 | 0.000794 | 8.74x | -88.55% |
| 256x256 | uint8 | 3 | 0.50 | grain_extract | avx2 | 0.006934 | 0.000696 | 9.96x | -89.96% |
| 256x256 | uint8 | 3 | 0.50 | grain_merge | scalar | 0.006903 | 0.002156 | 3.20x | -68.76% |
| 256x256 | uint8 | 3 | 0.50 | grain_merge | sse42 | 0.006903 | 0.000813 | 8.49x | -88.22% |
| 256x256 | uint8 | 3 | 0.50 | grain_merge | avx2 | 0.006903 | 0.000754 | 9.16x | -89.08% |
| 256x256 | uint8 | 3 | 0.50 | divide | scalar | 0.007122 | 0.001819 | 3.92x | -74.46% |
| 256x256 | uint8 | 3 | 0.50 | divide | sse42 | 0.007122 | 0.000836 | 8.51x | -88.25% |
| 256x256 | uint8 | 3 | 0.50 | divide | avx2 | 0.007122 | 0.000758 | 9.40x | -89.36% |
| 256x256 | uint8 | 3 | 0.50 | overlay | scalar | 0.008676 | 0.002970 | 2.92x | -65.77% |
| 256x256 | uint8 | 3 | 0.50 | overlay | sse42 | 0.008676 | 0.000793 | 10.94x | -90.86% |
| 256x256 | uint8 | 3 | 0.50 | overlay | avx2 | 0.008676 | 0.000706 | 12.29x | -91.86% |
| 256x256 | uint8 | 4 | 0.50 | normal | scalar | 0.003088 | 0.001293 | 2.39x | -58.13% |
| 256x256 | uint8 | 4 | 0.50 | normal | sse42 | 0.003088 | 0.000182 | 17.01x | -94.12% |
| 256x256 | uint8 | 4 | 0.50 | normal | avx2 | 0.003088 | 0.000162 | 19.11x | -94.77% |
| 256x256 | uint8 | 4 | 0.50 | soft_light | scalar | 0.006625 | 0.001629 | 4.07x | -75.42% |
| 256x256 | uint8 | 4 | 0.50 | soft_light | sse42 | 0.006625 | 0.000221 | 29.92x | -96.66% |
| 256x256 | uint8 | 4 | 0.50 | soft_light | avx2 | 0.006625 | 0.000207 | 32.03x | -96.88% |
| 256x256 | uint8 | 4 | 0.50 | lighten_only | scalar | 0.005348 | 0.001731 | 3.09x | -67.63% |
| 256x256 | uint8 | 4 | 0.50 | lighten_only | sse42 | 0.005348 | 0.000195 | 27.45x | -96.36% |
| 256x256 | uint8 | 4 | 0.50 | lighten_only | avx2 | 0.005348 | 0.000186 | 28.72x | -96.52% |
| 256x256 | uint8 | 4 | 0.50 | screen | scalar | 0.005297 | 0.001556 | 3.40x | -70.62% |
| 256x256 | uint8 | 4 | 0.50 | screen | sse42 | 0.005297 | 0.000218 | 24.26x | -95.88% |
| 256x256 | uint8 | 4 | 0.50 | screen | avx2 | 0.005297 | 0.000193 | 27.46x | -96.36% |
| 256x256 | uint8 | 4 | 0.50 | dodge | scalar | 0.005452 | 0.001668 | 3.27x | -69.40% |
| 256x256 | uint8 | 4 | 0.50 | dodge | sse42 | 0.005452 | 0.000248 | 22.01x | -95.46% |
| 256x256 | uint8 | 4 | 0.50 | dodge | avx2 | 0.005452 | 0.000206 | 26.49x | -96.23% |
| 256x256 | uint8 | 4 | 0.50 | addition | scalar | 0.005437 | 0.001983 | 2.74x | -63.53% |
| 256x256 | uint8 | 4 | 0.50 | addition | sse42 | 0.005437 | 0.000265 | 20.53x | -95.13% |
| 256x256 | uint8 | 4 | 0.50 | addition | avx2 | 0.005437 | 0.000199 | 27.30x | -96.34% |
| 256x256 | uint8 | 4 | 0.50 | darken_only | scalar | 0.005319 | 0.001718 | 3.10x | -67.71% |
| 256x256 | uint8 | 4 | 0.50 | darken_only | sse42 | 0.005319 | 0.000199 | 26.78x | -96.27% |
| 256x256 | uint8 | 4 | 0.50 | darken_only | avx2 | 0.005319 | 0.000187 | 28.39x | -96.48% |
| 256x256 | uint8 | 4 | 0.50 | multiply | scalar | 0.005352 | 0.001621 | 3.30x | -69.70% |
| 256x256 | uint8 | 4 | 0.50 | multiply | sse42 | 0.005352 | 0.000212 | 25.27x | -96.04% |
| 256x256 | uint8 | 4 | 0.50 | multiply | avx2 | 0.005352 | 0.000191 | 27.97x | -96.42% |
| 256x256 | uint8 | 4 | 0.50 | hard_light | scalar | 0.007153 | 0.002625 | 2.72x | -63.30% |
| 256x256 | uint8 | 4 | 0.50 | hard_light | sse42 | 0.007153 | 0.000242 | 29.58x | -96.62% |
| 256x256 | uint8 | 4 | 0.50 | hard_light | avx2 | 0.007153 | 0.000199 | 36.03x | -97.22% |
| 256x256 | uint8 | 4 | 0.50 | difference | scalar | 0.007306 | 0.001604 | 4.55x | -78.04% |
| 256x256 | uint8 | 4 | 0.50 | difference | sse42 | 0.007306 | 0.000205 | 35.70x | -97.20% |
| 256x256 | uint8 | 4 | 0.50 | difference | avx2 | 0.007306 | 0.000191 | 38.29x | -97.39% |
| 256x256 | uint8 | 4 | 0.50 | subtract | scalar | 0.005437 | 0.001492 | 3.64x | -72.56% |
| 256x256 | uint8 | 4 | 0.50 | subtract | sse42 | 0.005437 | 0.000267 | 20.35x | -95.08% |
| 256x256 | uint8 | 4 | 0.50 | subtract | avx2 | 0.005437 | 0.000227 | 23.97x | -95.83% |
| 256x256 | uint8 | 4 | 0.50 | grain_extract | scalar | 0.005462 | 0.001929 | 2.83x | -64.68% |
| 256x256 | uint8 | 4 | 0.50 | grain_extract | sse42 | 0.005462 | 0.000212 | 25.73x | -96.11% |
| 256x256 | uint8 | 4 | 0.50 | grain_extract | avx2 | 0.005462 | 0.000205 | 26.69x | -96.25% |
| 256x256 | uint8 | 4 | 0.50 | grain_merge | scalar | 0.005275 | 0.001919 | 2.75x | -63.63% |
| 256x256 | uint8 | 4 | 0.50 | grain_merge | sse42 | 0.005275 | 0.000218 | 24.22x | -95.87% |
| 256x256 | uint8 | 4 | 0.50 | grain_merge | avx2 | 0.005275 | 0.000201 | 26.22x | -96.19% |
| 256x256 | uint8 | 4 | 0.50 | divide | scalar | 0.005521 | 0.001627 | 3.39x | -70.54% |
| 256x256 | uint8 | 4 | 0.50 | divide | sse42 | 0.005521 | 0.000221 | 24.99x | -96.00% |
| 256x256 | uint8 | 4 | 0.50 | divide | avx2 | 0.005521 | 0.000201 | 27.50x | -96.36% |
| 256x256 | uint8 | 4 | 0.50 | overlay | scalar | 0.006683 | 0.002572 | 2.60x | -61.51% |
| 256x256 | uint8 | 4 | 0.50 | overlay | sse42 | 0.006683 | 0.000227 | 29.46x | -96.61% |
| 256x256 | uint8 | 4 | 0.50 | overlay | avx2 | 0.006683 | 0.000205 | 32.67x | -96.94% |
| 256x256 | float32 | 3 | 0.50 | normal | scalar | 0.005614 | 0.000506 | 11.10x | -90.99% |
| 256x256 | float32 | 3 | 0.50 | normal | sse42 | 0.005614 | 0.000225 | 25.01x | -96.00% |
| 256x256 | float32 | 3 | 0.50 | normal | avx2 | 0.005614 | 0.000144 | 38.94x | -97.43% |
| 256x256 | float32 | 3 | 0.50 | soft_light | scalar | 0.008346 | 0.000616 | 13.55x | -92.62% |
| 256x256 | float32 | 3 | 0.50 | soft_light | sse42 | 0.008346 | 0.000127 | 65.64x | -98.48% |
| 256x256 | float32 | 3 | 0.50 | soft_light | avx2 | 0.008346 | 0.000078 | 107.67x | -99.07% |
| 256x256 | float32 | 3 | 0.50 | lighten_only | scalar | 0.006958 | 0.000748 | 9.30x | -89.25% |
| 256x256 | float32 | 3 | 0.50 | lighten_only | sse42 | 0.006958 | 0.000107 | 64.94x | -98.46% |
| 256x256 | float32 | 3 | 0.50 | lighten_only | avx2 | 0.006958 | 0.000067 | 104.50x | -99.04% |
| 256x256 | float32 | 3 | 0.50 | screen | scalar | 0.007108 | 0.000657 | 10.82x | -90.75% |
| 256x256 | float32 | 3 | 0.50 | screen | sse42 | 0.007108 | 0.000115 | 61.63x | -98.38% |
| 256x256 | float32 | 3 | 0.50 | screen | avx2 | 0.007108 | 0.000069 | 103.07x | -99.03% |
| 256x256 | float32 | 3 | 0.50 | dodge | scalar | 0.006983 | 0.000635 | 11.00x | -90.90% |
| 256x256 | float32 | 3 | 0.50 | dodge | sse42 | 0.006983 | 0.000126 | 55.61x | -98.20% |
| 256x256 | float32 | 3 | 0.50 | dodge | avx2 | 0.006983 | 0.000079 | 87.97x | -98.86% |
| 256x256 | float32 | 3 | 0.50 | addition | scalar | 0.007158 | 0.001560 | 4.59x | -78.20% |
| 256x256 | float32 | 3 | 0.50 | addition | sse42 | 0.007158 | 0.000110 | 65.25x | -98.47% |
| 256x256 | float32 | 3 | 0.50 | addition | avx2 | 0.007158 | 0.000078 | 91.57x | -98.91% |
| 256x256 | float32 | 3 | 0.50 | darken_only | scalar | 0.006830 | 0.000772 | 8.84x | -88.69% |
| 256x256 | float32 | 3 | 0.50 | darken_only | sse42 | 0.006830 | 0.000112 | 60.80x | -98.36% |
| 256x256 | float32 | 3 | 0.50 | darken_only | avx2 | 0.006830 | 0.000065 | 104.68x | -99.04% |
| 256x256 | float32 | 3 | 0.50 | multiply | scalar | 0.006988 | 0.000569 | 12.29x | -91.86% |
| 256x256 | float32 | 3 | 0.50 | multiply | sse42 | 0.006988 | 0.000109 | 64.39x | -98.45% |
| 256x256 | float32 | 3 | 0.50 | multiply | avx2 | 0.006988 | 0.000068 | 102.93x | -99.03% |
| 256x256 | float32 | 3 | 0.50 | hard_light | scalar | 0.008962 | 0.001796 | 4.99x | -79.97% |
| 256x256 | float32 | 3 | 0.50 | hard_light | sse42 | 0.008962 | 0.000134 | 66.84x | -98.50% |
| 256x256 | float32 | 3 | 0.50 | hard_light | avx2 | 0.008962 | 0.000074 | 121.92x | -99.18% |
| 256x256 | float32 | 3 | 0.50 | difference | scalar | 0.008783 | 0.000583 | 15.06x | -93.36% |
| 256x256 | float32 | 3 | 0.50 | difference | sse42 | 0.008783 | 0.000181 | 48.62x | -97.94% |
| 256x256 | float32 | 3 | 0.50 | difference | avx2 | 0.008783 | 0.000067 | 130.85x | -99.24% |
| 256x256 | float32 | 3 | 0.50 | subtract | scalar | 0.007209 | 0.000675 | 10.68x | -90.64% |
| 256x256 | float32 | 3 | 0.50 | subtract | sse42 | 0.007209 | 0.000113 | 63.60x | -98.43% |
| 256x256 | float32 | 3 | 0.50 | subtract | avx2 | 0.007209 | 0.000068 | 106.31x | -99.06% |
| 256x256 | float32 | 3 | 0.50 | grain_extract | scalar | 0.007080 | 0.001008 | 7.02x | -85.76% |
| 256x256 | float32 | 3 | 0.50 | grain_extract | sse42 | 0.007080 | 0.000120 | 58.96x | -98.30% |
| 256x256 | float32 | 3 | 0.50 | grain_extract | avx2 | 0.007080 | 0.000076 | 93.38x | -98.93% |
| 256x256 | float32 | 3 | 0.50 | grain_merge | scalar | 0.007030 | 0.001011 | 6.95x | -85.62% |
| 256x256 | float32 | 3 | 0.50 | grain_merge | sse42 | 0.007030 | 0.000130 | 54.09x | -98.15% |
| 256x256 | float32 | 3 | 0.50 | grain_merge | avx2 | 0.007030 | 0.000066 | 105.80x | -99.05% |
| 256x256 | float32 | 3 | 0.50 | divide | scalar | 0.007192 | 0.000630 | 11.41x | -91.24% |
| 256x256 | float32 | 3 | 0.50 | divide | sse42 | 0.007192 | 0.000161 | 44.69x | -97.76% |
| 256x256 | float32 | 3 | 0.50 | divide | avx2 | 0.007192 | 0.000073 | 98.93x | -98.99% |
| 256x256 | float32 | 3 | 0.50 | overlay | scalar | 0.008223 | 0.001632 | 5.04x | -80.16% |
| 256x256 | float32 | 3 | 0.50 | overlay | sse42 | 0.008223 | 0.000124 | 66.30x | -98.49% |
| 256x256 | float32 | 3 | 0.50 | overlay | avx2 | 0.008223 | 0.000070 | 118.17x | -99.15% |
| 256x256 | float32 | 4 | 0.50 | normal | scalar | 0.004337 | 0.000616 | 7.04x | -85.79% |
| 256x256 | float32 | 4 | 0.50 | normal | sse42 | 0.004337 | 0.000136 | 31.80x | -96.86% |
| 256x256 | float32 | 4 | 0.50 | normal | avx2 | 0.004337 | 0.000147 | 29.53x | -96.61% |
| 256x256 | float32 | 4 | 0.50 | soft_light | scalar | 0.006553 | 0.000705 | 9.30x | -89.24% |
| 256x256 | float32 | 4 | 0.50 | soft_light | sse42 | 0.006553 | 0.000179 | 36.70x | -97.27% |
| 256x256 | float32 | 4 | 0.50 | soft_light | avx2 | 0.006553 | 0.000175 | 37.39x | -97.33% |
| 256x256 | float32 | 4 | 0.50 | lighten_only | scalar | 0.005270 | 0.000780 | 6.76x | -85.21% |
| 256x256 | float32 | 4 | 0.50 | lighten_only | sse42 | 0.005270 | 0.000162 | 32.49x | -96.92% |
| 256x256 | float32 | 4 | 0.50 | lighten_only | avx2 | 0.005270 | 0.000180 | 29.34x | -96.59% |
| 256x256 | float32 | 4 | 0.50 | screen | scalar | 0.005236 | 0.000669 | 7.83x | -87.23% |
| 256x256 | float32 | 4 | 0.50 | screen | sse42 | 0.005236 | 0.000179 | 29.25x | -96.58% |
| 256x256 | float32 | 4 | 0.50 | screen | avx2 | 0.005236 | 0.000182 | 28.84x | -96.53% |
| 256x256 | float32 | 4 | 0.50 | dodge | scalar | 0.005545 | 0.000833 | 6.65x | -84.97% |
| 256x256 | float32 | 4 | 0.50 | dodge | sse42 | 0.005545 | 0.000242 | 22.91x | -95.64% |
| 256x256 | float32 | 4 | 0.50 | dodge | avx2 | 0.005545 | 0.000189 | 29.32x | -96.59% |
| 256x256 | float32 | 4 | 0.50 | addition | scalar | 0.006072 | 0.001358 | 4.47x | -77.64% |
| 256x256 | float32 | 4 | 0.50 | addition | sse42 | 0.006072 | 0.000189 | 32.15x | -96.89% |
| 256x256 | float32 | 4 | 0.50 | addition | avx2 | 0.006072 | 0.000193 | 31.54x | -96.83% |
| 256x256 | float32 | 4 | 0.50 | darken_only | scalar | 0.005460 | 0.000911 | 6.00x | -83.32% |
| 256x256 | float32 | 4 | 0.50 | darken_only | sse42 | 0.005460 | 0.000174 | 31.43x | -96.82% |
| 256x256 | float32 | 4 | 0.50 | darken_only | avx2 | 0.005460 | 0.000187 | 29.15x | -96.57% |
| 256x256 | float32 | 4 | 0.50 | multiply | scalar | 0.005718 | 0.000650 | 8.80x | -88.63% |
| 256x256 | float32 | 4 | 0.50 | multiply | sse42 | 0.005718 | 0.000172 | 33.16x | -96.98% |
| 256x256 | float32 | 4 | 0.50 | multiply | avx2 | 0.005718 | 0.000194 | 29.48x | -96.61% |
| 256x256 | float32 | 4 | 0.50 | hard_light | scalar | 0.007159 | 0.001851 | 3.87x | -74.14% |
| 256x256 | float32 | 4 | 0.50 | hard_light | sse42 | 0.007159 | 0.000225 | 31.80x | -96.85% |
| 256x256 | float32 | 4 | 0.50 | hard_light | avx2 | 0.007159 | 0.000188 | 38.03x | -97.37% |
| 256x256 | float32 | 4 | 0.50 | difference | scalar | 0.007116 | 0.000657 | 10.83x | -90.77% |
| 256x256 | float32 | 4 | 0.50 | difference | sse42 | 0.007116 | 0.000163 | 43.73x | -97.71% |
| 256x256 | float32 | 4 | 0.50 | difference | avx2 | 0.007116 | 0.000196 | 36.31x | -97.25% |
| 256x256 | float32 | 4 | 0.50 | subtract | scalar | 0.005387 | 0.000843 | 6.39x | -84.35% |
| 256x256 | float32 | 4 | 0.50 | subtract | sse42 | 0.005387 | 0.000187 | 28.74x | -96.52% |
| 256x256 | float32 | 4 | 0.50 | subtract | avx2 | 0.005387 | 0.000188 | 28.66x | -96.51% |
| 256x256 | float32 | 4 | 0.50 | grain_extract | scalar | 0.005355 | 0.001089 | 4.92x | -79.66% |
| 256x256 | float32 | 4 | 0.50 | grain_extract | sse42 | 0.005355 | 0.000175 | 30.69x | -96.74% |
| 256x256 | float32 | 4 | 0.50 | grain_extract | avx2 | 0.005355 | 0.000180 | 29.67x | -96.63% |
| 256x256 | float32 | 4 | 0.50 | grain_merge | scalar | 0.005238 | 0.001067 | 4.91x | -79.62% |
| 256x256 | float32 | 4 | 0.50 | grain_merge | sse42 | 0.005238 | 0.000168 | 31.14x | -96.79% |
| 256x256 | float32 | 4 | 0.50 | grain_merge | avx2 | 0.005238 | 0.000174 | 30.08x | -96.68% |
| 256x256 | float32 | 4 | 0.50 | divide | scalar | 0.005559 | 0.000801 | 6.94x | -85.58% |
| 256x256 | float32 | 4 | 0.50 | divide | sse42 | 0.005559 | 0.000180 | 30.88x | -96.76% |
| 256x256 | float32 | 4 | 0.50 | divide | avx2 | 0.005559 | 0.000182 | 30.54x | -96.73% |
| 256x256 | float32 | 4 | 0.50 | overlay | scalar | 0.006694 | 0.001743 | 3.84x | -73.97% |
| 256x256 | float32 | 4 | 0.50 | overlay | sse42 | 0.006694 | 0.000191 | 35.01x | -97.14% |
| 256x256 | float32 | 4 | 0.50 | overlay | avx2 | 0.006694 | 0.000180 | 37.11x | -97.31% |
| 512x512 | uint8 | 3 | 0.50 | normal | scalar | 0.032529 | 0.006406 | 5.08x | -80.31% |
| 512x512 | uint8 | 3 | 0.50 | normal | sse42 | 0.032529 | 0.002739 | 11.87x | -91.58% |
| 512x512 | uint8 | 3 | 0.50 | normal | avx2 | 0.032529 | 0.002812 | 11.57x | -91.36% |
| 512x512 | uint8 | 3 | 0.00 | normal | scalar | 0.032365 | 0.002500 | 12.95x | -92.28% |
| 512x512 | uint8 | 3 | 0.00 | normal | sse42 | 0.032365 | 0.002463 | 13.14x | -92.39% |
| 512x512 | uint8 | 3 | 0.00 | normal | avx2 | 0.032365 | 0.002465 | 13.13x | -92.39% |
| 512x512 | uint8 | 3 | 1.00 | normal | scalar | 0.031105 | 0.002495 | 12.47x | -91.98% |
| 512x512 | uint8 | 3 | 1.00 | normal | sse42 | 0.031105 | 0.002602 | 11.96x | -91.64% |
| 512x512 | uint8 | 3 | 1.00 | normal | avx2 | 0.031105 | 0.002519 | 12.35x | -91.90% |
| 512x512 | uint8 | 3 | 0.50 | soft_light | scalar | 0.049049 | 0.007300 | 6.72x | -85.12% |
| 512x512 | uint8 | 3 | 0.50 | soft_light | sse42 | 0.049049 | 0.003426 | 14.32x | -93.01% |
| 512x512 | uint8 | 3 | 0.50 | soft_light | avx2 | 0.049049 | 0.002923 | 16.78x | -94.04% |
| 512x512 | uint8 | 3 | 0.00 | soft_light | scalar | 0.044721 | 0.002523 | 17.72x | -94.36% |
| 512x512 | uint8 | 3 | 0.00 | soft_light | sse42 | 0.044721 | 0.002661 | 16.81x | -94.05% |
| 512x512 | uint8 | 3 | 0.00 | soft_light | avx2 | 0.044721 | 0.002501 | 17.88x | -94.41% |
| 512x512 | uint8 | 3 | 1.00 | soft_light | scalar | 0.042222 | 0.007468 | 5.65x | -82.31% |
| 512x512 | uint8 | 3 | 1.00 | soft_light | sse42 | 0.042222 | 0.003163 | 13.35x | -92.51% |
| 512x512 | uint8 | 3 | 1.00 | soft_light | avx2 | 0.042222 | 0.002817 | 14.99x | -93.33% |
| 512x512 | uint8 | 3 | 0.50 | lighten_only | scalar | 0.037859 | 0.007898 | 4.79x | -79.14% |
| 512x512 | uint8 | 3 | 0.50 | lighten_only | sse42 | 0.037859 | 0.003189 | 11.87x | -91.58% |
| 512x512 | uint8 | 3 | 0.50 | lighten_only | avx2 | 0.037859 | 0.002969 | 12.75x | -92.16% |
| 512x512 | uint8 | 3 | 0.00 | lighten_only | scalar | 0.043547 | 0.002659 | 16.38x | -93.89% |
| 512x512 | uint8 | 3 | 0.00 | lighten_only | sse42 | 0.043547 | 0.002652 | 16.42x | -93.91% |
| 512x512 | uint8 | 3 | 0.00 | lighten_only | avx2 | 0.043547 | 0.002530 | 17.21x | -94.19% |
| 512x512 | uint8 | 3 | 1.00 | lighten_only | scalar | 0.035719 | 0.007904 | 4.52x | -77.87% |
| 512x512 | uint8 | 3 | 1.00 | lighten_only | sse42 | 0.035719 | 0.003054 | 11.70x | -91.45% |
| 512x512 | uint8 | 3 | 1.00 | lighten_only | avx2 | 0.035719 | 0.002744 | 13.02x | -92.32% |
| 512x512 | uint8 | 3 | 0.50 | screen | scalar | 0.041262 | 0.007197 | 5.73x | -82.56% |
| 512x512 | uint8 | 3 | 0.50 | screen | sse42 | 0.041262 | 0.003170 | 13.02x | -92.32% |
| 512x512 | uint8 | 3 | 0.50 | screen | avx2 | 0.041262 | 0.002898 | 14.24x | -92.98% |
| 512x512 | uint8 | 3 | 0.00 | screen | scalar | 0.041981 | 0.002535 | 16.56x | -93.96% |
| 512x512 | uint8 | 3 | 0.00 | screen | sse42 | 0.041981 | 0.002532 | 16.58x | -93.97% |
| 512x512 | uint8 | 3 | 0.00 | screen | avx2 | 0.041981 | 0.002885 | 14.55x | -93.13% |
| 512x512 | uint8 | 3 | 1.00 | screen | scalar | 0.036188 | 0.007125 | 5.08x | -80.31% |
| 512x512 | uint8 | 3 | 1.00 | screen | sse42 | 0.036188 | 0.003290 | 11.00x | -90.91% |
| 512x512 | uint8 | 3 | 1.00 | screen | avx2 | 0.036188 | 0.002892 | 12.51x | -92.01% |
| 512x512 | uint8 | 3 | 0.50 | dodge | scalar | 0.039442 | 0.007667 | 5.14x | -80.56% |
| 512x512 | uint8 | 3 | 0.50 | dodge | sse42 | 0.039442 | 0.003526 | 11.18x | -91.06% |
| 512x512 | uint8 | 3 | 0.50 | dodge | avx2 | 0.039442 | 0.003565 | 11.06x | -90.96% |
| 512x512 | uint8 | 3 | 0.00 | dodge | scalar | 0.037772 | 0.002517 | 15.01x | -93.34% |
| 512x512 | uint8 | 3 | 0.00 | dodge | sse42 | 0.037772 | 0.002493 | 15.15x | -93.40% |
| 512x512 | uint8 | 3 | 0.00 | dodge | avx2 | 0.037772 | 0.002480 | 15.23x | -93.43% |
| 512x512 | uint8 | 3 | 1.00 | dodge | scalar | 0.036383 | 0.007272 | 5.00x | -80.01% |
| 512x512 | uint8 | 3 | 1.00 | dodge | sse42 | 0.036383 | 0.003223 | 11.29x | -91.14% |
| 512x512 | uint8 | 3 | 1.00 | dodge | avx2 | 0.036383 | 0.002854 | 12.75x | -92.15% |
| 512x512 | uint8 | 3 | 0.50 | addition | scalar | 0.036252 | 0.010036 | 3.61x | -72.32% |
| 512x512 | uint8 | 3 | 0.50 | addition | sse42 | 0.036252 | 0.003142 | 11.54x | -91.33% |
| 512x512 | uint8 | 3 | 0.50 | addition | avx2 | 0.036252 | 0.002881 | 12.58x | -92.05% |
| 512x512 | uint8 | 3 | 0.00 | addition | scalar | 0.038296 | 0.002557 | 14.98x | -93.32% |
| 512x512 | uint8 | 3 | 0.00 | addition | sse42 | 0.038296 | 0.002573 | 14.89x | -93.28% |
| 512x512 | uint8 | 3 | 0.00 | addition | avx2 | 0.038296 | 0.002731 | 14.02x | -92.87% |
| 512x512 | uint8 | 3 | 1.00 | addition | scalar | 0.037970 | 0.013665 | 2.78x | -64.01% |
| 512x512 | uint8 | 3 | 1.00 | addition | sse42 | 0.037970 | 0.003161 | 12.01x | -91.68% |
| 512x512 | uint8 | 3 | 1.00 | addition | avx2 | 0.037970 | 0.002805 | 13.54x | -92.61% |
| 512x512 | uint8 | 3 | 0.50 | darken_only | scalar | 0.038589 | 0.007867 | 4.91x | -79.61% |
| 512x512 | uint8 | 3 | 0.50 | darken_only | sse42 | 0.038589 | 0.003126 | 12.34x | -91.90% |
| 512x512 | uint8 | 3 | 0.50 | darken_only | avx2 | 0.038589 | 0.002807 | 13.75x | -92.73% |
| 512x512 | uint8 | 3 | 0.00 | darken_only | scalar | 0.036788 | 0.002476 | 14.86x | -93.27% |
| 512x512 | uint8 | 3 | 0.00 | darken_only | sse42 | 0.036788 | 0.002491 | 14.77x | -93.23% |
| 512x512 | uint8 | 3 | 0.00 | darken_only | avx2 | 0.036788 | 0.002502 | 14.71x | -93.20% |
| 512x512 | uint8 | 3 | 1.00 | darken_only | scalar | 0.040669 | 0.008091 | 5.03x | -80.11% |
| 512x512 | uint8 | 3 | 1.00 | darken_only | sse42 | 0.040669 | 0.003341 | 12.17x | -91.78% |
| 512x512 | uint8 | 3 | 1.00 | darken_only | avx2 | 0.040669 | 0.002901 | 14.02x | -92.87% |
| 512x512 | uint8 | 3 | 0.50 | multiply | scalar | 0.039158 | 0.007244 | 5.41x | -81.50% |
| 512x512 | uint8 | 3 | 0.50 | multiply | sse42 | 0.039158 | 0.003220 | 12.16x | -91.78% |
| 512x512 | uint8 | 3 | 0.50 | multiply | avx2 | 0.039158 | 0.002846 | 13.76x | -92.73% |
| 512x512 | uint8 | 3 | 0.00 | multiply | scalar | 0.037589 | 0.002511 | 14.97x | -93.32% |
| 512x512 | uint8 | 3 | 0.00 | multiply | sse42 | 0.037589 | 0.002518 | 14.93x | -93.30% |
| 512x512 | uint8 | 3 | 0.00 | multiply | avx2 | 0.037589 | 0.002487 | 15.11x | -93.38% |
| 512x512 | uint8 | 3 | 1.00 | multiply | scalar | 0.036258 | 0.007306 | 4.96x | -79.85% |
| 512x512 | uint8 | 3 | 1.00 | multiply | sse42 | 0.036258 | 0.003086 | 11.75x | -91.49% |
| 512x512 | uint8 | 3 | 1.00 | multiply | avx2 | 0.036258 | 0.002786 | 13.01x | -92.32% |
| 512x512 | uint8 | 3 | 0.50 | hard_light | scalar | 0.046371 | 0.012216 | 3.80x | -73.66% |
| 512x512 | uint8 | 3 | 0.50 | hard_light | sse42 | 0.046371 | 0.003281 | 14.13x | -92.92% |
| 512x512 | uint8 | 3 | 0.50 | hard_light | avx2 | 0.046371 | 0.002917 | 15.90x | -93.71% |
| 512x512 | uint8 | 3 | 0.00 | hard_light | scalar | 0.048398 | 0.002487 | 19.46x | -94.86% |
| 512x512 | uint8 | 3 | 0.00 | hard_light | sse42 | 0.048398 | 0.002474 | 19.56x | -94.89% |
| 512x512 | uint8 | 3 | 0.00 | hard_light | avx2 | 0.048398 | 0.002482 | 19.50x | -94.87% |
| 512x512 | uint8 | 3 | 1.00 | hard_light | scalar | 0.044786 | 0.012263 | 3.65x | -72.62% |
| 512x512 | uint8 | 3 | 1.00 | hard_light | sse42 | 0.044786 | 0.003228 | 13.88x | -92.79% |
| 512x512 | uint8 | 3 | 1.00 | hard_light | avx2 | 0.044786 | 0.002860 | 15.66x | -93.61% |
| 512x512 | uint8 | 3 | 0.50 | difference | scalar | 0.043277 | 0.006947 | 6.23x | -83.95% |
| 512x512 | uint8 | 3 | 0.50 | difference | sse42 | 0.043277 | 0.003069 | 14.10x | -92.91% |
| 512x512 | uint8 | 3 | 0.50 | difference | avx2 | 0.043277 | 0.002823 | 15.33x | -93.48% |
| 512x512 | uint8 | 3 | 0.00 | difference | scalar | 0.043158 | 0.002487 | 17.35x | -94.24% |
| 512x512 | uint8 | 3 | 0.00 | difference | sse42 | 0.043158 | 0.002572 | 16.78x | -94.04% |
| 512x512 | uint8 | 3 | 0.00 | difference | avx2 | 0.043158 | 0.002478 | 17.42x | -94.26% |
| 512x512 | uint8 | 3 | 1.00 | difference | scalar | 0.044465 | 0.007097 | 6.27x | -84.04% |
| 512x512 | uint8 | 3 | 1.00 | difference | sse42 | 0.044465 | 0.003064 | 14.51x | -93.11% |
| 512x512 | uint8 | 3 | 1.00 | difference | avx2 | 0.044465 | 0.002764 | 16.09x | -93.78% |
| 512x512 | uint8 | 3 | 0.50 | subtract | scalar | 0.035946 | 0.006777 | 5.30x | -81.15% |
| 512x512 | uint8 | 3 | 0.50 | subtract | sse42 | 0.035946 | 0.003167 | 11.35x | -91.19% |
| 512x512 | uint8 | 3 | 0.50 | subtract | avx2 | 0.035946 | 0.002782 | 12.92x | -92.26% |
| 512x512 | uint8 | 3 | 0.00 | subtract | scalar | 0.036983 | 0.002518 | 14.69x | -93.19% |
| 512x512 | uint8 | 3 | 0.00 | subtract | sse42 | 0.036983 | 0.002463 | 15.01x | -93.34% |
| 512x512 | uint8 | 3 | 0.00 | subtract | avx2 | 0.036983 | 0.002484 | 14.89x | -93.28% |
| 512x512 | uint8 | 3 | 1.00 | subtract | scalar | 0.036562 | 0.006835 | 5.35x | -81.31% |
| 512x512 | uint8 | 3 | 1.00 | subtract | sse42 | 0.036562 | 0.003133 | 11.67x | -91.43% |
| 512x512 | uint8 | 3 | 1.00 | subtract | avx2 | 0.036562 | 0.002789 | 13.11x | -92.37% |
| 512x512 | uint8 | 3 | 0.50 | grain_extract | scalar | 0.036522 | 0.008680 | 4.21x | -76.23% |
| 512x512 | uint8 | 3 | 0.50 | grain_extract | sse42 | 0.036522 | 0.003140 | 11.63x | -91.40% |
| 512x512 | uint8 | 3 | 0.50 | grain_extract | avx2 | 0.036522 | 0.002845 | 12.84x | -92.21% |
| 512x512 | uint8 | 3 | 0.00 | grain_extract | scalar | 0.036388 | 0.002515 | 14.47x | -93.09% |
| 512x512 | uint8 | 3 | 0.00 | grain_extract | sse42 | 0.036388 | 0.002517 | 14.46x | -93.08% |
| 512x512 | uint8 | 3 | 0.00 | grain_extract | avx2 | 0.036388 | 0.002611 | 13.94x | -92.83% |
| 512x512 | uint8 | 3 | 1.00 | grain_extract | scalar | 0.036183 | 0.008655 | 4.18x | -76.08% |
| 512x512 | uint8 | 3 | 1.00 | grain_extract | sse42 | 0.036183 | 0.003286 | 11.01x | -90.92% |
| 512x512 | uint8 | 3 | 1.00 | grain_extract | avx2 | 0.036183 | 0.002869 | 12.61x | -92.07% |
| 512x512 | uint8 | 3 | 0.50 | grain_merge | scalar | 0.036480 | 0.008743 | 4.17x | -76.03% |
| 512x512 | uint8 | 3 | 0.50 | grain_merge | sse42 | 0.036480 | 0.003159 | 11.55x | -91.34% |
| 512x512 | uint8 | 3 | 0.50 | grain_merge | avx2 | 0.036480 | 0.002815 | 12.96x | -92.28% |
| 512x512 | uint8 | 3 | 0.00 | grain_merge | scalar | 0.036196 | 0.002545 | 14.22x | -92.97% |
| 512x512 | uint8 | 3 | 0.00 | grain_merge | sse42 | 0.036196 | 0.002515 | 14.39x | -93.05% |
| 512x512 | uint8 | 3 | 0.00 | grain_merge | avx2 | 0.036196 | 0.002473 | 14.64x | -93.17% |
| 512x512 | uint8 | 3 | 1.00 | grain_merge | scalar | 0.036108 | 0.008733 | 4.13x | -75.81% |
| 512x512 | uint8 | 3 | 1.00 | grain_merge | sse42 | 0.036108 | 0.003146 | 11.48x | -91.29% |
| 512x512 | uint8 | 3 | 1.00 | grain_merge | avx2 | 0.036108 | 0.002781 | 12.99x | -92.30% |
| 512x512 | uint8 | 3 | 0.50 | divide | scalar | 0.036733 | 0.007486 | 4.91x | -79.62% |
| 512x512 | uint8 | 3 | 0.50 | divide | sse42 | 0.036733 | 0.003182 | 11.55x | -91.34% |
| 512x512 | uint8 | 3 | 0.50 | divide | avx2 | 0.036733 | 0.002818 | 13.03x | -92.33% |
| 512x512 | uint8 | 3 | 0.00 | divide | scalar | 0.037447 | 0.002502 | 14.97x | -93.32% |
| 512x512 | uint8 | 3 | 0.00 | divide | sse42 | 0.037447 | 0.002483 | 15.08x | -93.37% |
| 512x512 | uint8 | 3 | 0.00 | divide | avx2 | 0.037447 | 0.002493 | 15.02x | -93.34% |
| 512x512 | uint8 | 3 | 1.00 | divide | scalar | 0.037044 | 0.007217 | 5.13x | -80.52% |
| 512x512 | uint8 | 3 | 1.00 | divide | sse42 | 0.037044 | 0.003158 | 11.73x | -91.48% |
| 512x512 | uint8 | 3 | 1.00 | divide | avx2 | 0.037044 | 0.002807 | 13.20x | -92.42% |
| 512x512 | uint8 | 3 | 0.50 | overlay | scalar | 0.042984 | 0.011756 | 3.66x | -72.65% |
| 512x512 | uint8 | 3 | 0.50 | overlay | sse42 | 0.042984 | 0.003192 | 13.47x | -92.57% |
| 512x512 | uint8 | 3 | 0.50 | overlay | avx2 | 0.042984 | 0.002828 | 15.20x | -93.42% |
| 512x512 | uint8 | 3 | 0.00 | overlay | scalar | 0.043079 | 0.002489 | 17.30x | -94.22% |
| 512x512 | uint8 | 3 | 0.00 | overlay | sse42 | 0.043079 | 0.002481 | 17.36x | -94.24% |
| 512x512 | uint8 | 3 | 0.00 | overlay | avx2 | 0.043079 | 0.002479 | 17.38x | -94.25% |
| 512x512 | uint8 | 3 | 1.00 | overlay | scalar | 0.044043 | 0.011972 | 3.68x | -72.82% |
| 512x512 | uint8 | 3 | 1.00 | overlay | sse42 | 0.044043 | 0.003373 | 13.06x | -92.34% |
| 512x512 | uint8 | 3 | 1.00 | overlay | avx2 | 0.044043 | 0.002933 | 15.02x | -93.34% |
| 512x512 | uint8 | 4 | 0.50 | normal | scalar | 0.023745 | 0.005202 | 4.56x | -78.09% |
| 512x512 | uint8 | 4 | 0.50 | normal | sse42 | 0.023745 | 0.000695 | 34.16x | -97.07% |
| 512x512 | uint8 | 4 | 0.50 | normal | avx2 | 0.023745 | 0.000626 | 37.95x | -97.37% |
| 512x512 | uint8 | 4 | 0.00 | normal | scalar | 0.023453 | 0.000049 | 476.37x | -99.79% |
| 512x512 | uint8 | 4 | 0.00 | normal | sse42 | 0.023453 | 0.000052 | 450.93x | -99.78% |
| 512x512 | uint8 | 4 | 0.00 | normal | avx2 | 0.023453 | 0.000045 | 516.78x | -99.81% |
| 512x512 | uint8 | 4 | 1.00 | normal | scalar | 0.023454 | 0.005247 | 4.47x | -77.63% |
| 512x512 | uint8 | 4 | 1.00 | normal | sse42 | 0.023454 | 0.000697 | 33.67x | -97.03% |
| 512x512 | uint8 | 4 | 1.00 | normal | avx2 | 0.023454 | 0.000630 | 37.22x | -97.31% |
| 512x512 | uint8 | 4 | 0.50 | soft_light | scalar | 0.034040 | 0.006527 | 5.21x | -80.82% |
| 512x512 | uint8 | 4 | 0.50 | soft_light | sse42 | 0.034040 | 0.000888 | 38.31x | -97.39% |
| 512x512 | uint8 | 4 | 0.50 | soft_light | avx2 | 0.034040 | 0.000825 | 41.28x | -97.58% |
| 512x512 | uint8 | 4 | 0.00 | soft_light | scalar | 0.033842 | 0.000045 | 748.05x | -99.87% |
| 512x512 | uint8 | 4 | 0.00 | soft_light | sse42 | 0.033842 | 0.000044 | 767.73x | -99.87% |
| 512x512 | uint8 | 4 | 0.00 | soft_light | avx2 | 0.033842 | 0.000044 | 768.29x | -99.87% |
| 512x512 | uint8 | 4 | 1.00 | soft_light | scalar | 0.034055 | 0.006618 | 5.15x | -80.57% |
| 512x512 | uint8 | 4 | 1.00 | soft_light | sse42 | 0.034055 | 0.000887 | 38.41x | -97.40% |
| 512x512 | uint8 | 4 | 1.00 | soft_light | avx2 | 0.034055 | 0.000824 | 41.34x | -97.58% |
| 512x512 | uint8 | 4 | 0.50 | lighten_only | scalar | 0.027618 | 0.006816 | 4.05x | -75.32% |
| 512x512 | uint8 | 4 | 0.50 | lighten_only | sse42 | 0.027618 | 0.000772 | 35.77x | -97.20% |
| 512x512 | uint8 | 4 | 0.50 | lighten_only | avx2 | 0.027618 | 0.000739 | 37.38x | -97.32% |
| 512x512 | uint8 | 4 | 0.00 | lighten_only | scalar | 0.026958 | 0.000053 | 505.80x | -99.80% |
| 512x512 | uint8 | 4 | 0.00 | lighten_only | sse42 | 0.026958 | 0.000052 | 516.31x | -99.81% |
| 512x512 | uint8 | 4 | 0.00 | lighten_only | avx2 | 0.026958 | 0.000056 | 484.14x | -99.79% |
| 512x512 | uint8 | 4 | 1.00 | lighten_only | scalar | 0.030627 | 0.007331 | 4.18x | -76.06% |
| 512x512 | uint8 | 4 | 1.00 | lighten_only | sse42 | 0.030627 | 0.000850 | 36.04x | -97.23% |
| 512x512 | uint8 | 4 | 1.00 | lighten_only | avx2 | 0.030627 | 0.000783 | 39.13x | -97.44% |
| 512x512 | uint8 | 4 | 0.50 | screen | scalar | 0.035865 | 0.006361 | 5.64x | -82.26% |
| 512x512 | uint8 | 4 | 0.50 | screen | sse42 | 0.035865 | 0.000909 | 39.46x | -97.47% |
| 512x512 | uint8 | 4 | 0.50 | screen | avx2 | 0.035865 | 0.000775 | 46.26x | -97.84% |
| 512x512 | uint8 | 4 | 0.00 | screen | scalar | 0.028354 | 0.000046 | 621.58x | -99.84% |
| 512x512 | uint8 | 4 | 0.00 | screen | sse42 | 0.028354 | 0.000056 | 505.59x | -99.80% |
| 512x512 | uint8 | 4 | 0.00 | screen | avx2 | 0.028354 | 0.000045 | 632.38x | -99.84% |
| 512x512 | uint8 | 4 | 1.00 | screen | scalar | 0.027972 | 0.006233 | 4.49x | -77.72% |
| 512x512 | uint8 | 4 | 1.00 | screen | sse42 | 0.027972 | 0.000841 | 33.27x | -96.99% |
| 512x512 | uint8 | 4 | 1.00 | screen | avx2 | 0.027972 | 0.000787 | 35.52x | -97.18% |
| 512x512 | uint8 | 4 | 0.50 | dodge | scalar | 0.027978 | 0.006564 | 4.26x | -76.54% |
| 512x512 | uint8 | 4 | 0.50 | dodge | sse42 | 0.027978 | 0.000949 | 29.48x | -96.61% |
| 512x512 | uint8 | 4 | 0.50 | dodge | avx2 | 0.027978 | 0.000828 | 33.77x | -97.04% |
| 512x512 | uint8 | 4 | 0.00 | dodge | scalar | 0.028653 | 0.000055 | 521.45x | -99.81% |
| 512x512 | uint8 | 4 | 0.00 | dodge | sse42 | 0.028653 | 0.000055 | 525.70x | -99.81% |
| 512x512 | uint8 | 4 | 0.00 | dodge | avx2 | 0.028653 | 0.000066 | 431.93x | -99.77% |
| 512x512 | uint8 | 4 | 1.00 | dodge | scalar | 0.032164 | 0.006602 | 4.87x | -79.47% |
| 512x512 | uint8 | 4 | 1.00 | dodge | sse42 | 0.032164 | 0.000944 | 34.07x | -97.06% |
| 512x512 | uint8 | 4 | 1.00 | dodge | avx2 | 0.032164 | 0.000820 | 39.20x | -97.45% |
| 512x512 | uint8 | 4 | 0.50 | addition | scalar | 0.027064 | 0.007971 | 3.40x | -70.55% |
| 512x512 | uint8 | 4 | 0.50 | addition | sse42 | 0.027064 | 0.001047 | 25.85x | -96.13% |
| 512x512 | uint8 | 4 | 0.50 | addition | avx2 | 0.027064 | 0.000778 | 34.77x | -97.12% |
| 512x512 | uint8 | 4 | 0.00 | addition | scalar | 0.027140 | 0.000049 | 550.01x | -99.82% |
| 512x512 | uint8 | 4 | 0.00 | addition | sse42 | 0.027140 | 0.000047 | 583.17x | -99.83% |
| 512x512 | uint8 | 4 | 0.00 | addition | avx2 | 0.027140 | 0.000046 | 589.86x | -99.83% |
| 512x512 | uint8 | 4 | 1.00 | addition | scalar | 0.027049 | 0.009348 | 2.89x | -65.44% |
| 512x512 | uint8 | 4 | 1.00 | addition | sse42 | 0.027049 | 0.001046 | 25.87x | -96.13% |
| 512x512 | uint8 | 4 | 1.00 | addition | avx2 | 0.027049 | 0.000780 | 34.69x | -97.12% |
| 512x512 | uint8 | 4 | 0.50 | darken_only | scalar | 0.027085 | 0.006858 | 3.95x | -74.68% |
| 512x512 | uint8 | 4 | 0.50 | darken_only | sse42 | 0.027085 | 0.000794 | 34.10x | -97.07% |
| 512x512 | uint8 | 4 | 0.50 | darken_only | avx2 | 0.027085 | 0.000784 | 34.55x | -97.11% |
| 512x512 | uint8 | 4 | 0.00 | darken_only | scalar | 0.032720 | 0.000083 | 396.00x | -99.75% |
| 512x512 | uint8 | 4 | 0.00 | darken_only | sse42 | 0.032720 | 0.000062 | 531.23x | -99.81% |
| 512x512 | uint8 | 4 | 0.00 | darken_only | avx2 | 0.032720 | 0.000054 | 609.34x | -99.84% |
| 512x512 | uint8 | 4 | 1.00 | darken_only | scalar | 0.033020 | 0.007077 | 4.67x | -78.57% |
| 512x512 | uint8 | 4 | 1.00 | darken_only | sse42 | 0.033020 | 0.000794 | 41.60x | -97.60% |
| 512x512 | uint8 | 4 | 1.00 | darken_only | avx2 | 0.033020 | 0.000768 | 43.00x | -97.67% |
| 512x512 | uint8 | 4 | 0.50 | multiply | scalar | 0.027894 | 0.006336 | 4.40x | -77.28% |
| 512x512 | uint8 | 4 | 0.50 | multiply | sse42 | 0.027894 | 0.000853 | 32.69x | -96.94% |
| 512x512 | uint8 | 4 | 0.50 | multiply | avx2 | 0.027894 | 0.000754 | 37.01x | -97.30% |
| 512x512 | uint8 | 4 | 0.00 | multiply | scalar | 0.029920 | 0.000049 | 611.83x | -99.84% |
| 512x512 | uint8 | 4 | 0.00 | multiply | sse42 | 0.029920 | 0.000088 | 340.20x | -99.71% |
| 512x512 | uint8 | 4 | 0.00 | multiply | avx2 | 0.029920 | 0.000050 | 600.03x | -99.83% |
| 512x512 | uint8 | 4 | 1.00 | multiply | scalar | 0.030346 | 0.006455 | 4.70x | -78.73% |
| 512x512 | uint8 | 4 | 1.00 | multiply | sse42 | 0.030346 | 0.000835 | 36.35x | -97.25% |
| 512x512 | uint8 | 4 | 1.00 | multiply | avx2 | 0.030346 | 0.000756 | 40.14x | -97.51% |
| 512x512 | uint8 | 4 | 0.50 | hard_light | scalar | 0.037962 | 0.010628 | 3.57x | -72.00% |
| 512x512 | uint8 | 4 | 0.50 | hard_light | sse42 | 0.037962 | 0.000997 | 38.07x | -97.37% |
| 512x512 | uint8 | 4 | 0.50 | hard_light | avx2 | 0.037962 | 0.000798 | 47.54x | -97.90% |
| 512x512 | uint8 | 4 | 0.00 | hard_light | scalar | 0.036588 | 0.000048 | 767.70x | -99.87% |
| 512x512 | uint8 | 4 | 0.00 | hard_light | sse42 | 0.036588 | 0.000051 | 716.00x | -99.86% |
| 512x512 | uint8 | 4 | 0.00 | hard_light | avx2 | 0.036588 | 0.000045 | 809.16x | -99.88% |
| 512x512 | uint8 | 4 | 1.00 | hard_light | scalar | 0.036749 | 0.010341 | 3.55x | -71.86% |
| 512x512 | uint8 | 4 | 1.00 | hard_light | sse42 | 0.036749 | 0.000950 | 38.67x | -97.41% |
| 512x512 | uint8 | 4 | 1.00 | hard_light | avx2 | 0.036749 | 0.000783 | 46.94x | -97.87% |
| 512x512 | uint8 | 4 | 0.50 | difference | scalar | 0.035225 | 0.006220 | 5.66x | -82.34% |
| 512x512 | uint8 | 4 | 0.50 | difference | sse42 | 0.035225 | 0.000844 | 41.75x | -97.61% |
| 512x512 | uint8 | 4 | 0.50 | difference | avx2 | 0.035225 | 0.000744 | 47.37x | -97.89% |
| 512x512 | uint8 | 4 | 0.00 | difference | scalar | 0.037970 | 0.000044 | 863.58x | -99.88% |
| 512x512 | uint8 | 4 | 0.00 | difference | sse42 | 0.037970 | 0.000044 | 865.95x | -99.88% |
| 512x512 | uint8 | 4 | 0.00 | difference | avx2 | 0.037970 | 0.000073 | 518.39x | -99.81% |
| 512x512 | uint8 | 4 | 1.00 | difference | scalar | 0.035258 | 0.006492 | 5.43x | -81.59% |
| 512x512 | uint8 | 4 | 1.00 | difference | sse42 | 0.035258 | 0.000841 | 41.92x | -97.61% |
| 512x512 | uint8 | 4 | 1.00 | difference | avx2 | 0.035258 | 0.000758 | 46.54x | -97.85% |
| 512x512 | uint8 | 4 | 0.50 | subtract | scalar | 0.033773 | 0.005954 | 5.67x | -82.37% |
| 512x512 | uint8 | 4 | 0.50 | subtract | sse42 | 0.033773 | 0.001065 | 31.71x | -96.85% |
| 512x512 | uint8 | 4 | 0.50 | subtract | avx2 | 0.033773 | 0.000875 | 38.58x | -97.41% |
| 512x512 | uint8 | 4 | 0.00 | subtract | scalar | 0.027810 | 0.000059 | 469.30x | -99.79% |
| 512x512 | uint8 | 4 | 0.00 | subtract | sse42 | 0.027810 | 0.000059 | 472.44x | -99.79% |
| 512x512 | uint8 | 4 | 0.00 | subtract | avx2 | 0.027810 | 0.000084 | 331.23x | -99.70% |
| 512x512 | uint8 | 4 | 1.00 | subtract | scalar | 0.032943 | 0.006019 | 5.47x | -81.73% |
| 512x512 | uint8 | 4 | 1.00 | subtract | sse42 | 0.032943 | 0.001047 | 31.46x | -96.82% |
| 512x512 | uint8 | 4 | 1.00 | subtract | avx2 | 0.032943 | 0.000869 | 37.91x | -97.36% |
| 512x512 | uint8 | 4 | 0.50 | grain_extract | scalar | 0.028985 | 0.007856 | 3.69x | -72.90% |
| 512x512 | uint8 | 4 | 0.50 | grain_extract | sse42 | 0.028985 | 0.000861 | 33.65x | -97.03% |
| 512x512 | uint8 | 4 | 0.50 | grain_extract | avx2 | 0.028985 | 0.000815 | 35.55x | -97.19% |
| 512x512 | uint8 | 4 | 0.00 | grain_extract | scalar | 0.034620 | 0.000063 | 549.21x | -99.82% |
| 512x512 | uint8 | 4 | 0.00 | grain_extract | sse42 | 0.034620 | 0.000046 | 748.99x | -99.87% |
| 512x512 | uint8 | 4 | 0.00 | grain_extract | avx2 | 0.034620 | 0.000048 | 716.20x | -99.86% |
| 512x512 | uint8 | 4 | 1.00 | grain_extract | scalar | 0.027931 | 0.007720 | 3.62x | -72.36% |
| 512x512 | uint8 | 4 | 1.00 | grain_extract | sse42 | 0.027931 | 0.000875 | 31.93x | -96.87% |
| 512x512 | uint8 | 4 | 1.00 | grain_extract | avx2 | 0.027931 | 0.000807 | 34.62x | -97.11% |
| 512x512 | uint8 | 4 | 0.50 | grain_merge | scalar | 0.033088 | 0.007670 | 4.31x | -76.82% |
| 512x512 | uint8 | 4 | 0.50 | grain_merge | sse42 | 0.033088 | 0.000845 | 39.15x | -97.45% |
| 512x512 | uint8 | 4 | 0.50 | grain_merge | avx2 | 0.033088 | 0.000803 | 41.20x | -97.57% |
| 512x512 | uint8 | 4 | 0.00 | grain_merge | scalar | 0.028199 | 0.000045 | 631.75x | -99.84% |
| 512x512 | uint8 | 4 | 0.00 | grain_merge | sse42 | 0.028199 | 0.000058 | 483.28x | -99.79% |
| 512x512 | uint8 | 4 | 0.00 | grain_merge | avx2 | 0.028199 | 0.000044 | 634.38x | -99.84% |
| 512x512 | uint8 | 4 | 1.00 | grain_merge | scalar | 0.027789 | 0.007571 | 3.67x | -72.76% |
| 512x512 | uint8 | 4 | 1.00 | grain_merge | sse42 | 0.027789 | 0.000834 | 33.32x | -97.00% |
| 512x512 | uint8 | 4 | 1.00 | grain_merge | avx2 | 0.027789 | 0.000790 | 35.16x | -97.16% |
| 512x512 | uint8 | 4 | 0.50 | divide | scalar | 0.028336 | 0.006427 | 4.41x | -77.32% |
| 512x512 | uint8 | 4 | 0.50 | divide | sse42 | 0.028336 | 0.000861 | 32.91x | -96.96% |
| 512x512 | uint8 | 4 | 0.50 | divide | avx2 | 0.028336 | 0.000786 | 36.06x | -97.23% |
| 512x512 | uint8 | 4 | 0.00 | divide | scalar | 0.028811 | 0.000055 | 524.65x | -99.81% |
| 512x512 | uint8 | 4 | 0.00 | divide | sse42 | 0.028811 | 0.000063 | 457.90x | -99.78% |
| 512x512 | uint8 | 4 | 0.00 | divide | avx2 | 0.028811 | 0.000044 | 656.18x | -99.85% |
| 512x512 | uint8 | 4 | 1.00 | divide | scalar | 0.029632 | 0.006640 | 4.46x | -77.59% |
| 512x512 | uint8 | 4 | 1.00 | divide | sse42 | 0.029632 | 0.000894 | 33.14x | -96.98% |
| 512x512 | uint8 | 4 | 1.00 | divide | avx2 | 0.029632 | 0.000797 | 37.17x | -97.31% |
| 512x512 | uint8 | 4 | 0.50 | overlay | scalar | 0.035896 | 0.010404 | 3.45x | -71.02% |
| 512x512 | uint8 | 4 | 0.50 | overlay | sse42 | 0.035896 | 0.000903 | 39.74x | -97.48% |
| 512x512 | uint8 | 4 | 0.50 | overlay | avx2 | 0.035896 | 0.000803 | 44.72x | -97.76% |
| 512x512 | uint8 | 4 | 0.00 | overlay | scalar | 0.034562 | 0.000048 | 715.24x | -99.86% |
| 512x512 | uint8 | 4 | 0.00 | overlay | sse42 | 0.034562 | 0.000054 | 639.35x | -99.84% |
| 512x512 | uint8 | 4 | 0.00 | overlay | avx2 | 0.034562 | 0.000048 | 724.51x | -99.86% |
| 512x512 | uint8 | 4 | 1.00 | overlay | scalar | 0.034522 | 0.010361 | 3.33x | -69.99% |
| 512x512 | uint8 | 4 | 1.00 | overlay | sse42 | 0.034522 | 0.000931 | 37.07x | -97.30% |
| 512x512 | uint8 | 4 | 1.00 | overlay | avx2 | 0.034522 | 0.000804 | 42.95x | -97.67% |
| 512x512 | float32 | 3 | 0.50 | normal | scalar | 0.029636 | 0.002367 | 12.52x | -92.01% |
| 512x512 | float32 | 3 | 0.50 | normal | sse42 | 0.029636 | 0.000926 | 31.99x | -96.87% |
| 512x512 | float32 | 3 | 0.50 | normal | avx2 | 0.029636 | 0.000616 | 48.14x | -97.92% |
| 512x512 | float32 | 3 | 0.00 | normal | scalar | 0.031087 | 0.001073 | 28.97x | -96.55% |
| 512x512 | float32 | 3 | 0.00 | normal | sse42 | 0.031087 | 0.000797 | 39.00x | -97.44% |
| 512x512 | float32 | 3 | 0.00 | normal | avx2 | 0.031087 | 0.001083 | 28.70x | -96.52% |
| 512x512 | float32 | 3 | 1.00 | normal | scalar | 0.036363 | 0.000714 | 50.95x | -98.04% |
| 512x512 | float32 | 3 | 1.00 | normal | sse42 | 0.036363 | 0.000531 | 68.53x | -98.54% |
| 512x512 | float32 | 3 | 1.00 | normal | avx2 | 0.036363 | 0.000591 | 61.55x | -98.38% |
| 512x512 | float32 | 3 | 0.50 | soft_light | scalar | 0.047936 | 0.003135 | 15.29x | -93.46% |
| 512x512 | float32 | 3 | 0.50 | soft_light | sse42 | 0.047936 | 0.000654 | 73.27x | -98.64% |
| 512x512 | float32 | 3 | 0.50 | soft_light | avx2 | 0.047936 | 0.000522 | 91.91x | -98.91% |
| 512x512 | float32 | 3 | 0.00 | soft_light | scalar | 0.044895 | 0.000825 | 54.41x | -98.16% |
| 512x512 | float32 | 3 | 0.00 | soft_light | sse42 | 0.044895 | 0.000371 | 121.11x | -99.17% |
| 512x512 | float32 | 3 | 0.00 | soft_light | avx2 | 0.044895 | 0.000371 | 121.16x | -99.17% |
| 512x512 | float32 | 3 | 1.00 | soft_light | scalar | 0.047523 | 0.003003 | 15.82x | -93.68% |
| 512x512 | float32 | 3 | 1.00 | soft_light | sse42 | 0.047523 | 0.000582 | 81.65x | -98.78% |
| 512x512 | float32 | 3 | 1.00 | soft_light | avx2 | 0.047523 | 0.000500 | 95.00x | -98.95% |
| 512x512 | float32 | 3 | 0.50 | lighten_only | scalar | 0.034085 | 0.003333 | 10.23x | -90.22% |
| 512x512 | float32 | 3 | 0.50 | lighten_only | sse42 | 0.034085 | 0.000607 | 56.14x | -98.22% |
| 512x512 | float32 | 3 | 0.50 | lighten_only | avx2 | 0.034085 | 0.000534 | 63.86x | -98.43% |
| 512x512 | float32 | 3 | 0.00 | lighten_only | scalar | 0.036531 | 0.000765 | 47.76x | -97.91% |
| 512x512 | float32 | 3 | 0.00 | lighten_only | sse42 | 0.036531 | 0.000503 | 72.68x | -98.62% |
| 512x512 | float32 | 3 | 0.00 | lighten_only | avx2 | 0.036531 | 0.000392 | 93.22x | -98.93% |
| 512x512 | float32 | 3 | 1.00 | lighten_only | scalar | 0.037231 | 0.003540 | 10.52x | -90.49% |
| 512x512 | float32 | 3 | 1.00 | lighten_only | sse42 | 0.037231 | 0.000649 | 57.39x | -98.26% |
| 512x512 | float32 | 3 | 1.00 | lighten_only | avx2 | 0.037231 | 0.000723 | 51.52x | -98.06% |
| 512x512 | float32 | 3 | 0.50 | screen | scalar | 0.045132 | 0.003194 | 14.13x | -92.92% |
| 512x512 | float32 | 3 | 0.50 | screen | sse42 | 0.045132 | 0.000745 | 60.55x | -98.35% |
| 512x512 | float32 | 3 | 0.50 | screen | avx2 | 0.045132 | 0.000599 | 75.39x | -98.67% |
| 512x512 | float32 | 3 | 0.00 | screen | scalar | 0.040082 | 0.000671 | 59.76x | -98.33% |
| 512x512 | float32 | 3 | 0.00 | screen | sse42 | 0.040082 | 0.000579 | 69.20x | -98.56% |
| 512x512 | float32 | 3 | 0.00 | screen | avx2 | 0.040082 | 0.000447 | 89.61x | -98.88% |
| 512x512 | float32 | 3 | 1.00 | screen | scalar | 0.036561 | 0.002715 | 13.47x | -92.57% |
| 512x512 | float32 | 3 | 1.00 | screen | sse42 | 0.036561 | 0.000556 | 65.79x | -98.48% |
| 512x512 | float32 | 3 | 1.00 | screen | avx2 | 0.036561 | 0.000475 | 77.04x | -98.70% |
| 512x512 | float32 | 3 | 0.50 | dodge | scalar | 0.034943 | 0.002936 | 11.90x | -91.60% |
| 512x512 | float32 | 3 | 0.50 | dodge | sse42 | 0.034943 | 0.000538 | 64.90x | -98.46% |
| 512x512 | float32 | 3 | 0.50 | dodge | avx2 | 0.034943 | 0.000541 | 64.61x | -98.45% |
| 512x512 | float32 | 3 | 0.00 | dodge | scalar | 0.034853 | 0.000674 | 51.71x | -98.07% |
| 512x512 | float32 | 3 | 0.00 | dodge | sse42 | 0.034853 | 0.000367 | 95.09x | -98.95% |
| 512x512 | float32 | 3 | 0.00 | dodge | avx2 | 0.034853 | 0.000390 | 89.48x | -98.88% |
| 512x512 | float32 | 3 | 1.00 | dodge | scalar | 0.037254 | 0.002882 | 12.93x | -92.27% |
| 512x512 | float32 | 3 | 1.00 | dodge | sse42 | 0.037254 | 0.000581 | 64.12x | -98.44% |
| 512x512 | float32 | 3 | 1.00 | dodge | avx2 | 0.037254 | 0.000584 | 63.79x | -98.43% |
| 512x512 | float32 | 3 | 0.50 | addition | scalar | 0.035951 | 0.006628 | 5.42x | -81.56% |
| 512x512 | float32 | 3 | 0.50 | addition | sse42 | 0.035951 | 0.000579 | 62.09x | -98.39% |
| 512x512 | float32 | 3 | 0.50 | addition | avx2 | 0.035951 | 0.000556 | 64.68x | -98.45% |
| 512x512 | float32 | 3 | 0.00 | addition | scalar | 0.034454 | 0.000816 | 42.20x | -97.63% |
| 512x512 | float32 | 3 | 0.00 | addition | sse42 | 0.034454 | 0.000462 | 74.57x | -98.66% |
| 512x512 | float32 | 3 | 0.00 | addition | avx2 | 0.034454 | 0.000357 | 96.45x | -98.96% |
| 512x512 | float32 | 3 | 1.00 | addition | scalar | 0.033838 | 0.009849 | 3.44x | -70.89% |
| 512x512 | float32 | 3 | 1.00 | addition | sse42 | 0.033838 | 0.000636 | 53.18x | -98.12% |
| 512x512 | float32 | 3 | 1.00 | addition | avx2 | 0.033838 | 0.000486 | 69.69x | -98.57% |
| 512x512 | float32 | 3 | 0.50 | darken_only | scalar | 0.036435 | 0.003423 | 10.64x | -90.61% |
| 512x512 | float32 | 3 | 0.50 | darken_only | sse42 | 0.036435 | 0.000571 | 63.85x | -98.43% |
| 512x512 | float32 | 3 | 0.50 | darken_only | avx2 | 0.036435 | 0.000792 | 46.00x | -97.83% |
| 512x512 | float32 | 3 | 0.00 | darken_only | scalar | 0.039181 | 0.000786 | 49.85x | -97.99% |
| 512x512 | float32 | 3 | 0.00 | darken_only | sse42 | 0.039181 | 0.000503 | 77.88x | -98.72% |
| 512x512 | float32 | 3 | 0.00 | darken_only | avx2 | 0.039181 | 0.000470 | 83.39x | -98.80% |
| 512x512 | float32 | 3 | 1.00 | darken_only | scalar | 0.039675 | 0.003594 | 11.04x | -90.94% |
| 512x512 | float32 | 3 | 1.00 | darken_only | sse42 | 0.039675 | 0.000757 | 52.41x | -98.09% |
| 512x512 | float32 | 3 | 1.00 | darken_only | avx2 | 0.039675 | 0.000717 | 55.36x | -98.19% |
| 512x512 | float32 | 3 | 0.50 | multiply | scalar | 0.043701 | 0.002715 | 16.10x | -93.79% |
| 512x512 | float32 | 3 | 0.50 | multiply | sse42 | 0.043701 | 0.000853 | 51.21x | -98.05% |
| 512x512 | float32 | 3 | 0.50 | multiply | avx2 | 0.043701 | 0.000691 | 63.22x | -98.42% |
| 512x512 | float32 | 3 | 0.00 | multiply | scalar | 0.040699 | 0.001370 | 29.70x | -96.63% |
| 512x512 | float32 | 3 | 0.00 | multiply | sse42 | 0.040699 | 0.000699 | 58.23x | -98.28% |
| 512x512 | float32 | 3 | 0.00 | multiply | avx2 | 0.040699 | 0.000629 | 64.71x | -98.45% |
| 512x512 | float32 | 3 | 1.00 | multiply | scalar | 0.038357 | 0.002792 | 13.74x | -92.72% |
| 512x512 | float32 | 3 | 1.00 | multiply | sse42 | 0.038357 | 0.000782 | 49.04x | -97.96% |
| 512x512 | float32 | 3 | 1.00 | multiply | avx2 | 0.038357 | 0.000663 | 57.85x | -98.27% |
| 512x512 | float32 | 3 | 0.50 | hard_light | scalar | 0.053099 | 0.007897 | 6.72x | -85.13% |
| 512x512 | float32 | 3 | 0.50 | hard_light | sse42 | 0.053099 | 0.001718 | 30.90x | -96.76% |
| 512x512 | float32 | 3 | 0.50 | hard_light | avx2 | 0.053099 | 0.000791 | 67.10x | -98.51% |
| 512x512 | float32 | 3 | 0.00 | hard_light | scalar | 0.054133 | 0.001356 | 39.91x | -97.49% |
| 512x512 | float32 | 3 | 0.00 | hard_light | sse42 | 0.054133 | 0.000577 | 93.75x | -98.93% |
| 512x512 | float32 | 3 | 0.00 | hard_light | avx2 | 0.054133 | 0.000693 | 78.11x | -98.72% |
| 512x512 | float32 | 3 | 1.00 | hard_light | scalar | 0.052803 | 0.007986 | 6.61x | -84.88% |
| 512x512 | float32 | 3 | 1.00 | hard_light | sse42 | 0.052803 | 0.000971 | 54.39x | -98.16% |
| 512x512 | float32 | 3 | 1.00 | hard_light | avx2 | 0.052803 | 0.000696 | 75.87x | -98.68% |
| 512x512 | float32 | 3 | 0.50 | difference | scalar | 0.049789 | 0.003223 | 15.45x | -93.53% |
| 512x512 | float32 | 3 | 0.50 | difference | sse42 | 0.049789 | 0.000895 | 55.64x | -98.20% |
| 512x512 | float32 | 3 | 0.50 | difference | avx2 | 0.049789 | 0.000667 | 74.61x | -98.66% |
| 512x512 | float32 | 3 | 0.00 | difference | scalar | 0.048538 | 0.000710 | 68.39x | -98.54% |
| 512x512 | float32 | 3 | 0.00 | difference | sse42 | 0.048538 | 0.000444 | 109.42x | -99.09% |
| 512x512 | float32 | 3 | 0.00 | difference | avx2 | 0.048538 | 0.000462 | 105.09x | -99.05% |
| 512x512 | float32 | 3 | 1.00 | difference | scalar | 0.042908 | 0.002621 | 16.37x | -93.89% |
| 512x512 | float32 | 3 | 1.00 | difference | sse42 | 0.042908 | 0.000648 | 66.23x | -98.49% |
| 512x512 | float32 | 3 | 1.00 | difference | avx2 | 0.042908 | 0.000492 | 87.21x | -98.85% |
| 512x512 | float32 | 3 | 0.50 | subtract | scalar | 0.036473 | 0.003255 | 11.21x | -91.08% |
| 512x512 | float32 | 3 | 0.50 | subtract | sse42 | 0.036473 | 0.000595 | 61.25x | -98.37% |
| 512x512 | float32 | 3 | 0.50 | subtract | avx2 | 0.036473 | 0.000564 | 64.63x | -98.45% |
| 512x512 | float32 | 3 | 0.00 | subtract | scalar | 0.035796 | 0.000725 | 49.37x | -97.97% |
| 512x512 | float32 | 3 | 0.00 | subtract | sse42 | 0.035796 | 0.000446 | 80.25x | -98.75% |
| 512x512 | float32 | 3 | 0.00 | subtract | avx2 | 0.035796 | 0.000630 | 56.83x | -98.24% |
| 512x512 | float32 | 3 | 1.00 | subtract | scalar | 0.036637 | 0.003848 | 9.52x | -89.50% |
| 512x512 | float32 | 3 | 1.00 | subtract | sse42 | 0.036637 | 0.001029 | 35.60x | -97.19% |
| 512x512 | float32 | 3 | 1.00 | subtract | avx2 | 0.036637 | 0.000844 | 43.43x | -97.70% |
| 512x512 | float32 | 3 | 0.50 | grain_extract | scalar | 0.039107 | 0.004715 | 8.29x | -87.94% |
| 512x512 | float32 | 3 | 0.50 | grain_extract | sse42 | 0.039107 | 0.000756 | 51.70x | -98.07% |
| 512x512 | float32 | 3 | 0.50 | grain_extract | avx2 | 0.039107 | 0.000594 | 65.84x | -98.48% |
| 512x512 | float32 | 3 | 0.00 | grain_extract | scalar | 0.043381 | 0.000802 | 54.12x | -98.15% |
| 512x512 | float32 | 3 | 0.00 | grain_extract | sse42 | 0.043381 | 0.000521 | 83.19x | -98.80% |
| 512x512 | float32 | 3 | 0.00 | grain_extract | avx2 | 0.043381 | 0.000422 | 102.74x | -99.03% |
| 512x512 | float32 | 3 | 1.00 | grain_extract | scalar | 0.036042 | 0.004422 | 8.15x | -87.73% |
| 512x512 | float32 | 3 | 1.00 | grain_extract | sse42 | 0.036042 | 0.000639 | 56.43x | -98.23% |
| 512x512 | float32 | 3 | 1.00 | grain_extract | avx2 | 0.036042 | 0.000500 | 72.04x | -98.61% |
| 512x512 | float32 | 3 | 0.50 | grain_merge | scalar | 0.035073 | 0.004403 | 7.97x | -87.45% |
| 512x512 | float32 | 3 | 0.50 | grain_merge | sse42 | 0.035073 | 0.000597 | 58.77x | -98.30% |
| 512x512 | float32 | 3 | 0.50 | grain_merge | avx2 | 0.035073 | 0.000602 | 58.30x | -98.28% |
| 512x512 | float32 | 3 | 0.00 | grain_merge | scalar | 0.035051 | 0.000654 | 53.57x | -98.13% |
| 512x512 | float32 | 3 | 0.00 | grain_merge | sse42 | 0.035051 | 0.000364 | 96.37x | -98.96% |
| 512x512 | float32 | 3 | 0.00 | grain_merge | avx2 | 0.035051 | 0.000377 | 93.00x | -98.92% |
| 512x512 | float32 | 3 | 1.00 | grain_merge | scalar | 0.035542 | 0.004582 | 7.76x | -87.11% |
| 512x512 | float32 | 3 | 1.00 | grain_merge | sse42 | 0.035542 | 0.000696 | 51.09x | -98.04% |
| 512x512 | float32 | 3 | 1.00 | grain_merge | avx2 | 0.035542 | 0.000559 | 63.59x | -98.43% |
| 512x512 | float32 | 3 | 0.50 | divide | scalar | 0.036829 | 0.002974 | 12.38x | -91.92% |
| 512x512 | float32 | 3 | 0.50 | divide | sse42 | 0.036829 | 0.001021 | 36.07x | -97.23% |
| 512x512 | float32 | 3 | 0.50 | divide | avx2 | 0.036829 | 0.000475 | 77.51x | -98.71% |
| 512x512 | float32 | 3 | 0.00 | divide | scalar | 0.035238 | 0.000883 | 39.92x | -97.49% |
| 512x512 | float32 | 3 | 0.00 | divide | sse42 | 0.035238 | 0.000419 | 84.10x | -98.81% |
| 512x512 | float32 | 3 | 0.00 | divide | avx2 | 0.035238 | 0.000370 | 95.30x | -98.95% |
| 512x512 | float32 | 3 | 1.00 | divide | scalar | 0.034906 | 0.002944 | 11.86x | -91.57% |
| 512x512 | float32 | 3 | 1.00 | divide | sse42 | 0.034906 | 0.000537 | 64.98x | -98.46% |
| 512x512 | float32 | 3 | 1.00 | divide | avx2 | 0.034906 | 0.000496 | 70.38x | -98.58% |
| 512x512 | float32 | 3 | 0.50 | overlay | scalar | 0.041415 | 0.006894 | 6.01x | -83.35% |
| 512x512 | float32 | 3 | 0.50 | overlay | sse42 | 0.041415 | 0.000572 | 72.46x | -98.62% |
| 512x512 | float32 | 3 | 0.50 | overlay | avx2 | 0.041415 | 0.000637 | 65.03x | -98.46% |
| 512x512 | float32 | 3 | 0.00 | overlay | scalar | 0.042382 | 0.000667 | 63.55x | -98.43% |
| 512x512 | float32 | 3 | 0.00 | overlay | sse42 | 0.042382 | 0.000449 | 94.37x | -98.94% |
| 512x512 | float32 | 3 | 0.00 | overlay | avx2 | 0.042382 | 0.000395 | 107.35x | -99.07% |
| 512x512 | float32 | 3 | 1.00 | overlay | scalar | 0.041686 | 0.006919 | 6.02x | -83.40% |
| 512x512 | float32 | 3 | 1.00 | overlay | sse42 | 0.041686 | 0.000643 | 64.78x | -98.46% |
| 512x512 | float32 | 3 | 1.00 | overlay | avx2 | 0.041686 | 0.000553 | 75.39x | -98.67% |
| 512x512 | float32 | 4 | 0.50 | normal | scalar | 0.021964 | 0.002692 | 8.16x | -87.74% |
| 512x512 | float32 | 4 | 0.50 | normal | sse42 | 0.021964 | 0.000801 | 27.41x | -96.35% |
| 512x512 | float32 | 4 | 0.50 | normal | avx2 | 0.021964 | 0.000689 | 31.88x | -96.86% |
| 512x512 | float32 | 4 | 0.00 | normal | scalar | 0.021605 | 0.000557 | 38.81x | -97.42% |
| 512x512 | float32 | 4 | 0.00 | normal | sse42 | 0.021605 | 0.000288 | 74.93x | -98.67% |
| 512x512 | float32 | 4 | 0.00 | normal | avx2 | 0.021605 | 0.000300 | 71.90x | -98.61% |
| 512x512 | float32 | 4 | 1.00 | normal | scalar | 0.022705 | 0.002738 | 8.29x | -87.94% |
| 512x512 | float32 | 4 | 1.00 | normal | sse42 | 0.022705 | 0.000749 | 30.32x | -96.70% |
| 512x512 | float32 | 4 | 1.00 | normal | avx2 | 0.022705 | 0.000738 | 30.75x | -96.75% |
| 512x512 | float32 | 4 | 0.50 | soft_light | scalar | 0.032561 | 0.003347 | 9.73x | -89.72% |
| 512x512 | float32 | 4 | 0.50 | soft_light | sse42 | 0.032561 | 0.000798 | 40.80x | -97.55% |
| 512x512 | float32 | 4 | 0.50 | soft_light | avx2 | 0.032561 | 0.000809 | 40.23x | -97.51% |
| 512x512 | float32 | 4 | 0.00 | soft_light | scalar | 0.032386 | 0.000616 | 52.58x | -98.10% |
| 512x512 | float32 | 4 | 0.00 | soft_light | sse42 | 0.032386 | 0.000280 | 115.74x | -99.14% |
| 512x512 | float32 | 4 | 0.00 | soft_light | avx2 | 0.032386 | 0.000465 | 69.67x | -98.56% |
| 512x512 | float32 | 4 | 1.00 | soft_light | scalar | 0.033570 | 0.003039 | 11.05x | -90.95% |
| 512x512 | float32 | 4 | 1.00 | soft_light | sse42 | 0.033570 | 0.000818 | 41.02x | -97.56% |
| 512x512 | float32 | 4 | 1.00 | soft_light | avx2 | 0.033570 | 0.000775 | 43.30x | -97.69% |
| 512x512 | float32 | 4 | 0.50 | lighten_only | scalar | 0.025674 | 0.003749 | 6.85x | -85.40% |
| 512x512 | float32 | 4 | 0.50 | lighten_only | sse42 | 0.025674 | 0.000757 | 33.91x | -97.05% |
| 512x512 | float32 | 4 | 0.50 | lighten_only | avx2 | 0.025674 | 0.000781 | 32.87x | -96.96% |
| 512x512 | float32 | 4 | 0.00 | lighten_only | scalar | 0.025554 | 0.000560 | 45.63x | -97.81% |
| 512x512 | float32 | 4 | 0.00 | lighten_only | sse42 | 0.025554 | 0.000264 | 96.73x | -98.97% |
| 512x512 | float32 | 4 | 0.00 | lighten_only | avx2 | 0.025554 | 0.000339 | 75.37x | -98.67% |
| 512x512 | float32 | 4 | 1.00 | lighten_only | scalar | 0.025370 | 0.003504 | 7.24x | -86.19% |
| 512x512 | float32 | 4 | 1.00 | lighten_only | sse42 | 0.025370 | 0.000756 | 33.58x | -97.02% |
| 512x512 | float32 | 4 | 1.00 | lighten_only | avx2 | 0.025370 | 0.000768 | 33.03x | -96.97% |
| 512x512 | float32 | 4 | 0.50 | screen | scalar | 0.026660 | 0.003025 | 8.81x | -88.65% |
| 512x512 | float32 | 4 | 0.50 | screen | sse42 | 0.026660 | 0.000818 | 32.59x | -96.93% |
| 512x512 | float32 | 4 | 0.50 | screen | avx2 | 0.026660 | 0.000828 | 32.20x | -96.89% |
| 512x512 | float32 | 4 | 0.00 | screen | scalar | 0.026701 | 0.000545 | 49.01x | -97.96% |
| 512x512 | float32 | 4 | 0.00 | screen | sse42 | 0.026701 | 0.000306 | 87.32x | -98.85% |
| 512x512 | float32 | 4 | 0.00 | screen | avx2 | 0.026701 | 0.000270 | 98.98x | -98.99% |
| 512x512 | float32 | 4 | 1.00 | screen | scalar | 0.026448 | 0.002945 | 8.98x | -88.86% |
| 512x512 | float32 | 4 | 1.00 | screen | sse42 | 0.026448 | 0.000831 | 31.84x | -96.86% |
| 512x512 | float32 | 4 | 1.00 | screen | avx2 | 0.026448 | 0.000935 | 28.29x | -96.47% |
| 512x512 | float32 | 4 | 0.50 | dodge | scalar | 0.025941 | 0.003192 | 8.13x | -87.70% |
| 512x512 | float32 | 4 | 0.50 | dodge | sse42 | 0.025941 | 0.000965 | 26.87x | -96.28% |
| 512x512 | float32 | 4 | 0.50 | dodge | avx2 | 0.025941 | 0.000809 | 32.06x | -96.88% |
| 512x512 | float32 | 4 | 0.00 | dodge | scalar | 0.026250 | 0.000536 | 48.99x | -97.96% |
| 512x512 | float32 | 4 | 0.00 | dodge | sse42 | 0.026250 | 0.000344 | 76.22x | -98.69% |
| 512x512 | float32 | 4 | 0.00 | dodge | avx2 | 0.026250 | 0.000278 | 94.57x | -98.94% |
| 512x512 | float32 | 4 | 1.00 | dodge | scalar | 0.026583 | 0.003528 | 7.53x | -86.73% |
| 512x512 | float32 | 4 | 1.00 | dodge | sse42 | 0.026583 | 0.000962 | 27.64x | -96.38% |
| 512x512 | float32 | 4 | 1.00 | dodge | avx2 | 0.026583 | 0.000780 | 34.07x | -97.07% |
| 512x512 | float32 | 4 | 0.50 | addition | scalar | 0.025528 | 0.005588 | 4.57x | -78.11% |
| 512x512 | float32 | 4 | 0.50 | addition | sse42 | 0.025528 | 0.001059 | 24.11x | -95.85% |
| 512x512 | float32 | 4 | 0.50 | addition | avx2 | 0.025528 | 0.000945 | 27.01x | -96.30% |
| 512x512 | float32 | 4 | 0.00 | addition | scalar | 0.025426 | 0.000525 | 48.42x | -97.93% |
| 512x512 | float32 | 4 | 0.00 | addition | sse42 | 0.025426 | 0.000282 | 90.32x | -98.89% |
| 512x512 | float32 | 4 | 0.00 | addition | avx2 | 0.025426 | 0.000634 | 40.11x | -97.51% |
| 512x512 | float32 | 4 | 1.00 | addition | scalar | 0.025302 | 0.007214 | 3.51x | -71.49% |
| 512x512 | float32 | 4 | 1.00 | addition | sse42 | 0.025302 | 0.000871 | 29.04x | -96.56% |
| 512x512 | float32 | 4 | 1.00 | addition | avx2 | 0.025302 | 0.000819 | 30.89x | -96.76% |
| 512x512 | float32 | 4 | 0.50 | darken_only | scalar | 0.026224 | 0.003672 | 7.14x | -86.00% |
| 512x512 | float32 | 4 | 0.50 | darken_only | sse42 | 0.026224 | 0.000743 | 35.30x | -97.17% |
| 512x512 | float32 | 4 | 0.50 | darken_only | avx2 | 0.026224 | 0.000839 | 31.26x | -96.80% |
| 512x512 | float32 | 4 | 0.00 | darken_only | scalar | 0.026766 | 0.000630 | 42.48x | -97.65% |
| 512x512 | float32 | 4 | 0.00 | darken_only | sse42 | 0.026766 | 0.000294 | 91.16x | -98.90% |
| 512x512 | float32 | 4 | 0.00 | darken_only | avx2 | 0.026766 | 0.000308 | 86.86x | -98.85% |
| 512x512 | float32 | 4 | 1.00 | darken_only | scalar | 0.026007 | 0.003348 | 7.77x | -87.13% |
| 512x512 | float32 | 4 | 1.00 | darken_only | sse42 | 0.026007 | 0.000909 | 28.61x | -96.50% |
| 512x512 | float32 | 4 | 1.00 | darken_only | avx2 | 0.026007 | 0.000817 | 31.82x | -96.86% |
| 512x512 | float32 | 4 | 0.50 | multiply | scalar | 0.027552 | 0.003122 | 8.83x | -88.67% |
| 512x512 | float32 | 4 | 0.50 | multiply | sse42 | 0.027552 | 0.000844 | 32.63x | -96.94% |
| 512x512 | float32 | 4 | 0.50 | multiply | avx2 | 0.027552 | 0.000856 | 32.20x | -96.89% |
| 512x512 | float32 | 4 | 0.00 | multiply | scalar | 0.028558 | 0.000590 | 48.42x | -97.93% |
| 512x512 | float32 | 4 | 0.00 | multiply | sse42 | 0.028558 | 0.000382 | 74.83x | -98.66% |
| 512x512 | float32 | 4 | 0.00 | multiply | avx2 | 0.028558 | 0.000306 | 93.26x | -98.93% |
| 512x512 | float32 | 4 | 1.00 | multiply | scalar | 0.032243 | 0.004246 | 7.59x | -86.83% |
| 512x512 | float32 | 4 | 1.00 | multiply | sse42 | 0.032243 | 0.001103 | 29.23x | -96.58% |
| 512x512 | float32 | 4 | 1.00 | multiply | avx2 | 0.032243 | 0.000941 | 34.28x | -97.08% |
| 512x512 | float32 | 4 | 0.50 | hard_light | scalar | 0.036698 | 0.007803 | 4.70x | -78.74% |
| 512x512 | float32 | 4 | 0.50 | hard_light | sse42 | 0.036698 | 0.000971 | 37.81x | -97.36% |
| 512x512 | float32 | 4 | 0.50 | hard_light | avx2 | 0.036698 | 0.001150 | 31.91x | -96.87% |
| 512x512 | float32 | 4 | 0.00 | hard_light | scalar | 0.036374 | 0.000606 | 60.01x | -98.33% |
| 512x512 | float32 | 4 | 0.00 | hard_light | sse42 | 0.036374 | 0.000285 | 127.77x | -99.22% |
| 512x512 | float32 | 4 | 0.00 | hard_light | avx2 | 0.036374 | 0.000384 | 94.67x | -98.94% |
| 512x512 | float32 | 4 | 1.00 | hard_light | scalar | 0.036135 | 0.007678 | 4.71x | -78.75% |
| 512x512 | float32 | 4 | 1.00 | hard_light | sse42 | 0.036135 | 0.000925 | 39.06x | -97.44% |
| 512x512 | float32 | 4 | 1.00 | hard_light | avx2 | 0.036135 | 0.000923 | 39.13x | -97.44% |
| 512x512 | float32 | 4 | 0.50 | difference | scalar | 0.033423 | 0.003019 | 11.07x | -90.97% |
| 512x512 | float32 | 4 | 0.50 | difference | sse42 | 0.033423 | 0.000960 | 34.80x | -97.13% |
| 512x512 | float32 | 4 | 0.50 | difference | avx2 | 0.033423 | 0.000935 | 35.76x | -97.20% |
| 512x512 | float32 | 4 | 0.00 | difference | scalar | 0.033351 | 0.000560 | 59.53x | -98.32% |
| 512x512 | float32 | 4 | 0.00 | difference | sse42 | 0.033351 | 0.000312 | 107.05x | -99.07% |
| 512x512 | float32 | 4 | 0.00 | difference | avx2 | 0.033351 | 0.000321 | 103.85x | -99.04% |
| 512x512 | float32 | 4 | 1.00 | difference | scalar | 0.033759 | 0.002906 | 11.62x | -91.39% |
| 512x512 | float32 | 4 | 1.00 | difference | sse42 | 0.033759 | 0.000860 | 39.26x | -97.45% |
| 512x512 | float32 | 4 | 1.00 | difference | avx2 | 0.033759 | 0.000843 | 40.06x | -97.50% |
| 512x512 | float32 | 4 | 0.50 | subtract | scalar | 0.025841 | 0.003811 | 6.78x | -85.25% |
| 512x512 | float32 | 4 | 0.50 | subtract | sse42 | 0.025841 | 0.000886 | 29.16x | -96.57% |
| 512x512 | float32 | 4 | 0.50 | subtract | avx2 | 0.025841 | 0.000819 | 31.54x | -96.83% |
| 512x512 | float32 | 4 | 0.00 | subtract | scalar | 0.025174 | 0.000657 | 38.30x | -97.39% |
| 512x512 | float32 | 4 | 0.00 | subtract | sse42 | 0.025174 | 0.000290 | 86.68x | -98.85% |
| 512x512 | float32 | 4 | 0.00 | subtract | avx2 | 0.025174 | 0.000280 | 89.79x | -98.89% |
| 512x512 | float32 | 4 | 1.00 | subtract | scalar | 0.025820 | 0.003715 | 6.95x | -85.61% |
| 512x512 | float32 | 4 | 1.00 | subtract | sse42 | 0.025820 | 0.000824 | 31.33x | -96.81% |
| 512x512 | float32 | 4 | 1.00 | subtract | avx2 | 0.025820 | 0.000788 | 32.76x | -96.95% |
| 512x512 | float32 | 4 | 0.50 | grain_extract | scalar | 0.026379 | 0.004427 | 5.96x | -83.22% |
| 512x512 | float32 | 4 | 0.50 | grain_extract | sse42 | 0.026379 | 0.000804 | 32.81x | -96.95% |
| 512x512 | float32 | 4 | 0.50 | grain_extract | avx2 | 0.026379 | 0.000796 | 33.14x | -96.98% |
| 512x512 | float32 | 4 | 0.00 | grain_extract | scalar | 0.025827 | 0.000635 | 40.64x | -97.54% |
| 512x512 | float32 | 4 | 0.00 | grain_extract | sse42 | 0.025827 | 0.000268 | 96.32x | -98.96% |
| 512x512 | float32 | 4 | 0.00 | grain_extract | avx2 | 0.025827 | 0.000297 | 86.98x | -98.85% |
| 512x512 | float32 | 4 | 1.00 | grain_extract | scalar | 0.025750 | 0.004435 | 5.81x | -82.78% |
| 512x512 | float32 | 4 | 1.00 | grain_extract | sse42 | 0.025750 | 0.000837 | 30.77x | -96.75% |
| 512x512 | float32 | 4 | 1.00 | grain_extract | avx2 | 0.025750 | 0.000826 | 31.16x | -96.79% |
| 512x512 | float32 | 4 | 0.50 | grain_merge | scalar | 0.026609 | 0.004507 | 5.90x | -83.06% |
| 512x512 | float32 | 4 | 0.50 | grain_merge | sse42 | 0.026609 | 0.000845 | 31.48x | -96.82% |
| 512x512 | float32 | 4 | 0.50 | grain_merge | avx2 | 0.026609 | 0.001360 | 19.56x | -94.89% |
| 512x512 | float32 | 4 | 0.00 | grain_merge | scalar | 0.025526 | 0.000654 | 39.02x | -97.44% |
| 512x512 | float32 | 4 | 0.00 | grain_merge | sse42 | 0.025526 | 0.000286 | 89.29x | -98.88% |
| 512x512 | float32 | 4 | 0.00 | grain_merge | avx2 | 0.025526 | 0.000275 | 92.74x | -98.92% |
| 512x512 | float32 | 4 | 1.00 | grain_merge | scalar | 0.026052 | 0.004448 | 5.86x | -82.93% |
| 512x512 | float32 | 4 | 1.00 | grain_merge | sse42 | 0.026052 | 0.000935 | 27.87x | -96.41% |
| 512x512 | float32 | 4 | 1.00 | grain_merge | avx2 | 0.026052 | 0.000757 | 34.39x | -97.09% |
| 512x512 | float32 | 4 | 0.50 | divide | scalar | 0.026574 | 0.003417 | 7.78x | -87.14% |
| 512x512 | float32 | 4 | 0.50 | divide | sse42 | 0.026574 | 0.000837 | 31.76x | -96.85% |
| 512x512 | float32 | 4 | 0.50 | divide | avx2 | 0.026574 | 0.000781 | 34.01x | -97.06% |
| 512x512 | float32 | 4 | 0.00 | divide | scalar | 0.026413 | 0.000533 | 49.52x | -97.98% |
| 512x512 | float32 | 4 | 0.00 | divide | sse42 | 0.026413 | 0.000294 | 89.73x | -98.89% |
| 512x512 | float32 | 4 | 0.00 | divide | avx2 | 0.026413 | 0.000279 | 94.57x | -98.94% |
| 512x512 | float32 | 4 | 1.00 | divide | scalar | 0.026254 | 0.003185 | 8.24x | -87.87% |
| 512x512 | float32 | 4 | 1.00 | divide | sse42 | 0.026254 | 0.000857 | 30.65x | -96.74% |
| 512x512 | float32 | 4 | 1.00 | divide | avx2 | 0.026254 | 0.000924 | 28.42x | -96.48% |
| 512x512 | float32 | 4 | 0.50 | overlay | scalar | 0.033545 | 0.007163 | 4.68x | -78.65% |
| 512x512 | float32 | 4 | 0.50 | overlay | sse42 | 0.033545 | 0.000877 | 38.24x | -97.38% |
| 512x512 | float32 | 4 | 0.50 | overlay | avx2 | 0.033545 | 0.000784 | 42.77x | -97.66% |
| 512x512 | float32 | 4 | 0.00 | overlay | scalar | 0.033006 | 0.000606 | 54.43x | -98.16% |
| 512x512 | float32 | 4 | 0.00 | overlay | sse42 | 0.033006 | 0.000435 | 75.95x | -98.68% |
| 512x512 | float32 | 4 | 0.00 | overlay | avx2 | 0.033006 | 0.000281 | 117.45x | -99.15% |
| 512x512 | float32 | 4 | 1.00 | overlay | scalar | 0.033288 | 0.007227 | 4.61x | -78.29% |
| 512x512 | float32 | 4 | 1.00 | overlay | sse42 | 0.033288 | 0.000821 | 40.53x | -97.53% |
| 512x512 | float32 | 4 | 1.00 | overlay | avx2 | 0.033288 | 0.000809 | 41.16x | -97.57% |
| 1024x1024 | uint8 | 3 | 0.50 | normal | scalar | 0.100895 | 0.025941 | 3.89x | -74.29% |
| 1024x1024 | uint8 | 3 | 0.50 | normal | sse42 | 0.100895 | 0.011313 | 8.92x | -88.79% |
| 1024x1024 | uint8 | 3 | 0.50 | normal | avx2 | 0.100895 | 0.011015 | 9.16x | -89.08% |
| 1024x1024 | uint8 | 3 | 0.50 | soft_light | scalar | 0.134943 | 0.028514 | 4.73x | -78.87% |
| 1024x1024 | uint8 | 3 | 0.50 | soft_light | sse42 | 0.134943 | 0.012639 | 10.68x | -90.63% |
| 1024x1024 | uint8 | 3 | 0.50 | soft_light | avx2 | 0.134943 | 0.011191 | 12.06x | -91.71% |
| 1024x1024 | uint8 | 3 | 0.50 | lighten_only | scalar | 0.106948 | 0.031183 | 3.43x | -70.84% |
| 1024x1024 | uint8 | 3 | 0.50 | lighten_only | sse42 | 0.106948 | 0.012454 | 8.59x | -88.35% |
| 1024x1024 | uint8 | 3 | 0.50 | lighten_only | avx2 | 0.106948 | 0.011166 | 9.58x | -89.56% |
| 1024x1024 | uint8 | 3 | 0.50 | screen | scalar | 0.120596 | 0.028797 | 4.19x | -76.12% |
| 1024x1024 | uint8 | 3 | 0.50 | screen | sse42 | 0.120596 | 0.012766 | 9.45x | -89.41% |
| 1024x1024 | uint8 | 3 | 0.50 | screen | avx2 | 0.120596 | 0.011323 | 10.65x | -90.61% |
| 1024x1024 | uint8 | 3 | 0.50 | dodge | scalar | 0.119609 | 0.029297 | 4.08x | -75.51% |
| 1024x1024 | uint8 | 3 | 0.50 | dodge | sse42 | 0.119609 | 0.012742 | 9.39x | -89.35% |
| 1024x1024 | uint8 | 3 | 0.50 | dodge | avx2 | 0.119609 | 0.011526 | 10.38x | -90.36% |
| 1024x1024 | uint8 | 3 | 0.50 | addition | scalar | 0.114068 | 0.044153 | 2.58x | -61.29% |
| 1024x1024 | uint8 | 3 | 0.50 | addition | sse42 | 0.114068 | 0.018078 | 6.31x | -84.15% |
| 1024x1024 | uint8 | 3 | 0.50 | addition | avx2 | 0.114068 | 0.013282 | 8.59x | -88.36% |
| 1024x1024 | uint8 | 3 | 0.50 | darken_only | scalar | 0.121125 | 0.032461 | 3.73x | -73.20% |
| 1024x1024 | uint8 | 3 | 0.50 | darken_only | sse42 | 0.121125 | 0.012626 | 9.59x | -89.58% |
| 1024x1024 | uint8 | 3 | 0.50 | darken_only | avx2 | 0.121125 | 0.011556 | 10.48x | -90.46% |
| 1024x1024 | uint8 | 3 | 0.50 | multiply | scalar | 0.118224 | 0.029421 | 4.02x | -75.11% |
| 1024x1024 | uint8 | 3 | 0.50 | multiply | sse42 | 0.118224 | 0.013164 | 8.98x | -88.86% |
| 1024x1024 | uint8 | 3 | 0.50 | multiply | avx2 | 0.118224 | 0.012601 | 9.38x | -89.34% |
| 1024x1024 | uint8 | 3 | 0.50 | hard_light | scalar | 0.171536 | 0.052825 | 3.25x | -69.20% |
| 1024x1024 | uint8 | 3 | 0.50 | hard_light | sse42 | 0.171536 | 0.013300 | 12.90x | -92.25% |
| 1024x1024 | uint8 | 3 | 0.50 | hard_light | avx2 | 0.171536 | 0.012897 | 13.30x | -92.48% |
| 1024x1024 | uint8 | 3 | 0.50 | difference | scalar | 0.136830 | 0.027840 | 4.91x | -79.65% |
| 1024x1024 | uint8 | 3 | 0.50 | difference | sse42 | 0.136830 | 0.012209 | 11.21x | -91.08% |
| 1024x1024 | uint8 | 3 | 0.50 | difference | avx2 | 0.136830 | 0.011299 | 12.11x | -91.74% |
| 1024x1024 | uint8 | 3 | 0.50 | subtract | scalar | 0.107973 | 0.026498 | 4.07x | -75.46% |
| 1024x1024 | uint8 | 3 | 0.50 | subtract | sse42 | 0.107973 | 0.012590 | 8.58x | -88.34% |
| 1024x1024 | uint8 | 3 | 0.50 | subtract | avx2 | 0.107973 | 0.011188 | 9.65x | -89.64% |
| 1024x1024 | uint8 | 3 | 0.50 | grain_extract | scalar | 0.107538 | 0.034598 | 3.11x | -67.83% |
| 1024x1024 | uint8 | 3 | 0.50 | grain_extract | sse42 | 0.107538 | 0.012513 | 8.59x | -88.36% |
| 1024x1024 | uint8 | 3 | 0.50 | grain_extract | avx2 | 0.107538 | 0.011168 | 9.63x | -89.61% |
| 1024x1024 | uint8 | 3 | 0.50 | grain_merge | scalar | 0.110132 | 0.036160 | 3.05x | -67.17% |
| 1024x1024 | uint8 | 3 | 0.50 | grain_merge | sse42 | 0.110132 | 0.012568 | 8.76x | -88.59% |