https://github.com/aybe/firconvolution
Faster FIR filter convolution for Unity.
- Host: GitHub
- URL: https://github.com/aybe/firconvolution
- Owner: aybe
- License: mit
- Created: 2023-05-14T14:09:09.000Z (about 3 years ago)
- Default Branch: develop
- Last Pushed: 2023-09-23T21:49:34.000Z (over 2 years ago)
- Last Synced: 2025-07-30T02:59:48.949Z (11 months ago)
- Topics: convolution, filter, unity3d
- Language: C#
- Homepage:
- Size: 11.8 MB
- Stars: 4
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG
- License: LICENSE
# FIRConvolution
Faster FIR filter convolution for Unity.

## Description
This project is a collection of 12 algorithms for FIR filter convolution, with a focus on half-band filtering.
Most of the algorithms are vectorized, leveraging SIMD extensions through the Unity Burst compiler.
Check the sample scene to see them perform filtering from within `MonoBehaviour.OnAudioFilterRead`.
## Installation
**Consumer audience:**
Add the package to your Unity project using the following Git URL:
`https://github.com/aybe/FIRConvolution.git?path=Assets/FIRConvolution`
**Developer audience:**
The project uses symbolic links; on Windows, you can set them up [this way](https://stackoverflow.com/a/59761201).
(this alleviates the deficiencies of Unity [sample package authoring](https://docs.unity3d.com/Manual/cus-samples.html))
Also included is an [MSTest project setup](Projects) that tests directly against the Unity code.
(this neat trick allows testing the code in a much friendlier environment)
## Performance
### Motivation
Implementing a fast FIR filter convolution purely in managed code turned out to be an impossible task: as soon as one uses a high-quality filter with many taps, the audio DSP CPU usage immediately climbs to between 30% and 50%, no matter which optimizations are applied to speed up the processing.
### Profiling environment
The candidate is a high-quality half-band FIR filter with 461 taps, applied to 1 channel @ 44100 Hz.
To mimic typical use, profiling runs 10 measurements and 1000 iterations over blocks of 1024 samples.
Both managed and native implementations are tested to give the audience an overall comparison.
### Profiling results
The results are surprising: some algorithms that ought to be slower actually perform better than others.
Overall, an algorithm is fit for use without noticeable impact when it takes less than 10 seconds in this benchmark.
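As a rough sanity check of that threshold, the figures above can be turned into a real-time load. This assumes the 10 seconds covers all 10 × 1000 iterations of 1024 samples, which the text does not state explicitly:

$$
T_\text{audio} = \frac{10 \times 1000 \times 1024}{44100\ \text{Hz}} \approx 232.2\ \text{s},
\qquad
\frac{10\ \text{s}}{232.2\ \text{s}} \approx 4.3\%
$$

Under that reading, a 10-second result corresponds to roughly 4.3% of the time available to process one mono stream in real time. If the 10 seconds instead applied to a single measurement of 1000 iterations (about 23.2 s of audio), the load would be roughly 43%.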
**Legend:**
- `[Scalar|Vector]`
  - `Scalar`: 1 sample at a time
  - `Vector`: 4 samples at a time
- `[Full|Half]`
  - `Full`: full-band filter
  - `Half`: half-band filter
- `[Full|Half]` (only for half-band filter)
  - `Full`: iterating the taps loop fully, i.e. using 50% of the taps
    - in a half-band filter, half of the taps are zeros and thus can be ignored
  - `Half`: iterating the first half of the taps loop, i.e. using 25% of the taps
    - in addition to the above, leveraging taps symmetry to halve the iterations
- `[Inner|Outer|OuterInner]`
  - `Inner`: taps loop vectorized
  - `Outer`: samples loop vectorized
  - `OuterInner`: both loops vectorized
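To make the scalar entries of the legend concrete, here is a minimal sketch of the three loop shapes in plain C#. It is not the package's code: the class and method names are hypothetical. It assumes an odd number of symmetric taps whose values at an even, non-zero distance from the center are zero (the usual half-band layout), and an `input` buffer that holds `taps.Length - 1` history samples in front of the block to filter.

```csharp
using System;

// Hypothetical names, for illustration only (not the package's API).
public static class ScalarFirSketch
{
    // Full-band: every tap is multiplied, zeros included.
    public static void Full(float[] input, float[] taps, float[] output)
    {
        int n = taps.Length;
        for (int i = 0; i < output.Length; i++)
        {
            float sum = 0f;
            for (int k = 0; k < n; k++)
                sum += taps[k] * input[i + n - 1 - k];
            output[i] = sum;
        }
    }

    // Half-band, full loop: center tap plus every other tap (about 50% of the taps).
    public static void HalfFullLoop(float[] input, float[] taps, float[] output)
    {
        int n = taps.Length, c = (n - 1) / 2;
        int first = (c + 1) % 2; // non-zero taps sit at an odd distance from the center
        for (int i = 0; i < output.Length; i++)
        {
            float sum = taps[c] * input[i + c];
            for (int k = first; k < n; k += 2)
                sum += taps[k] * input[i + n - 1 - k];
            output[i] = sum;
        }
    }

    // Half-band, half loop: the two samples sharing a tap value are added first,
    // so only about 25% of the taps are iterated.
    public static void HalfHalfLoop(float[] input, float[] taps, float[] output)
    {
        int n = taps.Length, c = (n - 1) / 2;
        int first = (c + 1) % 2;
        for (int i = 0; i < output.Length; i++)
        {
            float sum = taps[c] * input[i + c];
            for (int k = first; k < c; k += 2)
                sum += taps[k] * (input[i + k] + input[i + n - 1 - k]);
            output[i] = sum;
        }
    }
}
```

All three methods should produce the same output up to floating-point rounding, since they only differ in which multiplications by zero are skipped and in the order of summation.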
**Managed, alphabetically:**

**Managed, fastest to slowest:**

**Native, alphabetically:**

**Native, fastest to slowest:**

**Versus, alphabetically:**

**Versus, fastest to slowest:**

### Conclusions
On the managed side:
- scalar half-band
  - almost twice as fast as the full-band variant, which is totally expected
  - the half loop variant's gain is marginal despite the halved tap iterations
- vectorized
  - the outer loop variant is the slowest in most cases, whereas it's the opposite for native
  - SIMD isn't used, whereas in vanilla .NET it is, judging by the generated code

On the native side:
- scalar half-band
  - little gain, likely due to short input and overhead
- vectorized, outer/inner variant
  - full-band, performs worse than the other variants
  - half-band, performs better but the gain is marginal

Overall, considering native implementations:
- full-band: outer variant is the fastest
  - the outer/inner variant ought to be the fastest but really isn't in the end
- half-band: half loop, outer/inner variant is the fastest
  - a case where a substantially longer/trickier algorithm ends up being faster
## Notes
Porting it to vanilla .NET should be easy, as it already works in the MSTest project:
1. extend the pattern of [shim Unity types](Projects/FIRConvolution/Fakes) to `float2`, `float4` and `math.dot`
2. use the aligned memory allocator [for vanilla .NET](Assets/FIRConvolution/Runtime/MemoryAllocatorNet.cs) instead of the Unity one
(this is in order to avoid the gray area of using Unity assemblies outside Unity)
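
As an illustration of step 1, a shim for `float4` and `math.dot` could look roughly like the sketch below. The type and method names mirror Unity.Mathematics, but the bodies are a hypothetical minimal stand-in and not the code found in the repository's Fakes folder; `InnerLoopSketch` is likewise an invented name that only shows how such a shim would be consumed by a taps loop that handles four taps at a time.

```csharp
using System;

// Hypothetical stand-in mirroring the Unity.Mathematics name.
public struct float4
{
    public float x, y, z, w;

    public float4(float x, float y, float z, float w)
    {
        this.x = x;
        this.y = y;
        this.z = z;
        this.w = w;
    }
}

// Hypothetical stand-in for the static math class; only dot is sketched here.
public static class math
{
    public static float dot(float4 a, float4 b)
    {
        return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
    }
}

// Illustrative consumer: one output sample, taps loop taken four taps at a time.
public static class InnerLoopSketch
{
    public static float Sample(float[] window, float[] taps)
    {
        float sum = 0f;
        int k = 0;
        for (; k + 4 <= taps.Length; k += 4)
        {
            var s = new float4(window[k], window[k + 1], window[k + 2], window[k + 3]);
            var t = new float4(taps[k], taps[k + 1], taps[k + 2], taps[k + 3]);
            sum += math.dot(s, t);
        }
        for (; k < taps.Length; k++) // scalar tail when the tap count is not a multiple of 4
            sum += window[k] * taps[k];
        return sum;
    }
}
```

This shim is plain scalar code, so it only restores source compatibility. Whether a given runtime turns it into SIMD instructions is not guaranteed, and backing the struct with `System.Numerics.Vector4` would be one option to explore.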
## Credits
- https://fiiir.com (filter design)
- https://thewolfsound.com/fir-filter-with-simd (filter vectorization)
- https://github.com/Rabadash8820/UnityAssemblies (Unity bindings)