https://github.com/samhaswon/normal_grain_merge

A combined version of the blend modes normal and grain merge.

Host: GitHub
URL: https://github.com/samhaswon/normal_grain_merge
Owner: samhaswon
License: mit
Created: 2025-09-07T00:35:21.000Z (9 months ago)
Default Branch: main
Last Pushed: 2025-10-15T01:02:39.000Z (8 months ago)
Last Synced: 2025-11-21T16:16:05.108Z (6 months ago)
Language: C
Size: 3.16 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# normal_grain_merge

This implements a combined version of the blend modes normal and grain merge.

Grain merge is performed on the skin (*s*) and the texture (*t*), and the result is normal-merged with the base (*b*).

Subscripts indicate channels, with alpha (α) channels broadcast to three channels.

$$
(((\mathrm{t_{rgb}} + \mathrm{s_{rgb}} - 0.5) \cdot \mathrm{t_\alpha} + \mathrm{t_{rgb}} \cdot (1 - \mathrm{t_\alpha})) \cdot (1 - 0.3) + \mathrm{s_{rgb}} \cdot 0.3) \cdot \mathrm{t_\alpha} + \mathrm{b_{rgb}} \cdot (1 - \mathrm{t_\alpha})
$$
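
As a minimal reference sketch, the formula can be evaluated for a single pixel in plain Python. The function name `blend_pixel`, its argument names, and the reading of *s*, *t*, and *b* as skin, texture, and base are assumptions made for illustration and are not part of this module's API. Channel values are floats in the range 0 to 1, and no clamping is applied because the formula does not include one.

```python
def blend_pixel(base, texture, skin, texture_alpha, skin_weight=0.3):
    """Evaluate the README formula for one pixel (hypothetical helper).

    base, texture, skin: (r, g, b) tuples of floats in [0, 1].
    texture_alpha: scalar alpha in [0, 1], applied to every channel.
    skin_weight: the 0.3 constant that appears in the formula.
    """
    result = []
    for b, t, s in zip(base, texture, skin):
        # Grain merge of texture and skin, weighted by the texture alpha.
        grain = (t + s - 0.5) * texture_alpha + t * (1.0 - texture_alpha)
        # Mix a fraction of the original skin back in.
        mixed = grain * (1.0 - skin_weight) + s * skin_weight
        # Normal blend of the mixed value over the base.
        result.append(mixed * texture_alpha + b * (1.0 - texture_alpha))
    return tuple(result)


# With zero alpha the base passes through unchanged.
transparent = blend_pixel((0.2, 0.4, 0.6), (0.9, 0.9, 0.9), (0.1, 0.1, 0.1), 0.0)

# With full alpha the base no longer contributes.
opaque = blend_pixel((0.0, 0.0, 0.0), (0.6, 0.6, 0.6), (0.5, 0.5, 0.5), 1.0)
```

A per-pixel loop like this is far too slow for real images. It is only meant to make the order of operations in the formula easy to check against.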

## Installation

```shell
pip install normal-grain-merge
```

## Usage

```py
import numpy as np
from normal_grain_merge import normal_grain_merge, KernelKind

# Example arrays
base = np.zeros((100, 100, 3), dtype=np.uint8)
texture = np.zeros((100, 100, 3), dtype=np.uint8)
skin = np.zeros((100, 100, 4), dtype=np.uint8)
im_alpha = np.zeros((100, 100), dtype=np.uint8)

result_scalar = normal_grain_merge(base, texture, skin, im_alpha, KernelKind.KERNEL_SCALAR.value)
print(result_scalar.shape, result_scalar.dtype)
```

Three kernels are implemented in this module, plus an automatic selection mode. `KernelKind` defines all four options:

- `KERNEL_AUTO`: Automatically chooses the kernel, preferring AVX2.
- `KERNEL_SCALAR`: Portable scalar implementation.
- `KERNEL_SSE42`: SSE4.2 intrinsics kernel. Likely better on AMD CPUs.
- `KERNEL_AVX2`: AVX2 intrinsics kernel. Likely better on Intel CPUs.

### Parameters

All input matrices should have the same height and width.

#### `base`

RGB or RGBA, dropping the alpha channel if it exists.

The base image for application.

#### `texture`

RGB or RGBA, applying the alpha if it exists.

This is the texture to be applied.

#### `skin`

RGBA, the segmented portion of base to texture.

The "skin" of the object the texture is to be applied to.

#### `im_alpha`

The alpha of parameter `skin`. 

This is mostly a holdover from the Python implementation to deal with NumPy.

#### `kernel`

One of `KernelKind`.

## Performance

The entire reason for me writing this was NumPy being slow when this operation is in the hot path.

So, I decided to write a SIMD version that does the type casting outside NumPy with only the intermediate values being in FP32.

How much of a speedup is this? All numbers are from a Ryzen 7 4800H running Ubuntu 24.04 and Python 3.12.3.

| Method/Kernel     | Average Iteration Time (RGB) | Average Iteration Time (RGBA) |
|-------------------|------------------------------|-------------------------------|
| C scalar kernel   | 0.016109s                    | 0.016679s                     |
| C SSE4.2 kernel   | 0.002446s                    | 0.002478s                     |
| C AVX2 kernel     | 0.002336s                    | 0.002520s                     |
| NumPy version     | 0.160623s                    | 0.258044s                     |
| Old NumPy version | 0.248160s                    | 0.232046s                     |

The speedup percentages below are the reduction in average iteration time when moving from the first method in each row to the second.

| Method Comparison  | Speedup (RGB) | Speedup (RGBA) |
|--------------------|---------------|----------------|
| NumPy -> scalar    | 89.9709%      | 93.5363%       |
| NumPy -> SSE4.2    | 98.4769%      | 99.0397%       |
| NumPy -> AVX2      | 98.5454%      | 99.0235%       |
| Old np -> SSE4.2   | 99.0142%      | 98.9321%       |
| Old np -> AVX2     | 99.0585%      | 98.9141%       |
| C scalar -> SSE4.2 | 84.8135%      | 85.1437%       |
| C scalar -> AVX2   | 85.4964%      | 84.8923%       |
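
The percentages in the comparison table appear to be reductions in iteration time, computed as `(1 - new / old) * 100` from the average times in the first table. The short sketch below reproduces that arithmetic and also expresses the same comparison as a "times faster" factor. The helper names are hypothetical, and the rounded times from the table are the only inputs.

```python
def time_reduction_percent(old_seconds, new_seconds):
    """Percentage of iteration time saved when moving from old to new."""
    return (1.0 - new_seconds / old_seconds) * 100.0


def speedup_factor(old_seconds, new_seconds):
    """How many times faster the new method is than the old one."""
    return old_seconds / new_seconds


# Average RGB iteration times copied from the table above.
numpy_rgb = 0.160623
scalar_rgb = 0.016109
avx2_rgb = 0.002336

numpy_to_scalar = time_reduction_percent(numpy_rgb, scalar_rgb)
numpy_to_avx2_factor = speedup_factor(numpy_rgb, avx2_rgb)

print(f"NumPy -> scalar: {numpy_to_scalar:.4f}% less time per iteration")
print(f"NumPy -> AVX2:   {numpy_to_avx2_factor:.1f}x faster")
```

Read this way, a reduction of about 98.5% for NumPy to AVX2 corresponds to the AVX2 kernel being roughly 69 times faster on RGB input.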
