{"id":23255343,"url":"https://github.com/michelerenzullo/fastboxblur","last_synced_at":"2025-08-20T12:32:58.112Z","repository":{"id":94421767,"uuid":"542343847","full_name":"michelerenzullo/FastBoxBlur","owner":"michelerenzullo","description":"Fast Box Blur using a sliding accumulator and with reflected borders policy","archived":false,"fork":false,"pushed_at":"2024-11-06T13:44:54.000Z","size":8804,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-11-06T14:41:01.166Z","etag":null,"topics":["accumulator","blur","box","cpp","image-processing","opencv","padding"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/michelerenzullo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-09-28T00:37:23.000Z","updated_at":"2024-10-30T04:32:53.000Z","dependencies_parsed_at":"2023-03-24T23:19:20.375Z","dependency_job_id":null,"html_url":"https://github.com/michelerenzullo/FastBoxBlur","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michelerenzullo%2FFastBoxBlur","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michelerenzullo%2FFastBoxBlur/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michelerenzullo%2FFastBoxBlur/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michelerenzullo%2FFastBoxBlur/manifests","own
er_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/michelerenzullo","download_url":"https://codeload.github.com/michelerenzullo/FastBoxBlur/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230424184,"owners_count":18223545,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["accumulator","blur","box","cpp","image-processing","opencv","padding"],"created_at":"2024-12-19T11:20:12.156Z","updated_at":"2024-12-19T11:20:13.204Z","avatar_url":"https://github.com/michelerenzullo.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n  \n\n# Fast Box Blur - reflection padding\n\n  \n\n  \n\nHeader-only C++ implementation of a fast box blur in linear time. It is designed to be a fully portable, lightweight, and faster replacement for `cv::blur`. It stays accurate on image boundaries by emulating reflection (mirrored) padding without allocating extra memory for it.\n\nThe main code is based on [FastGaussianBlur](https://github.com/bfraboni/FastGaussianBlur) by @bfraboni and on a blog post by Ivan Kutskir: [blog](http://blog.ivank.net/fastest-gaussian-blur.html), which refers to a presentation by Wojciech Jarosz: [slides](http://elynxsdk.free.fr/ext-docs/Blur/Fast_box_blur.pdf). 
The code uses STB_IMAGE and STB_IMAGE_WRITE by stb for image manipulation: [stb github](https://github.com/nothings/stb).\n\n  \n\n  \n\n## Algorithm\n\n  \n\n  \n  \n\nA 2D box blur is a separable convolution, so it is usually performed as a horizontal 1D box blur pass followed by a vertical 1D box blur pass. With N box blur passes, the process would normally alternate between these horizontal and vertical passes.\n\n  \n\nHowever, thanks to the properties of the box blur, the horizontal and vertical passes can be performed in any order without changing the result.\n\n  \n\nHence, for performance purposes, I came up with the following algorithm:\n\n  \n\n1. apply the horizontal box blur N times (horizontal passes)\n\n  \n\n2. flip the image buffer (transposition)\n\n  \n\n3. apply the horizontal box blur N times (these act as the vertical passes)\n\n  \n\n4. flip the image buffer (transposition)\n\n  \n\n  \n\nSteps 1. and 3. are performed with the `horizontal_blur` function, which is a fast 1D box blur pass with a sliding accumulator.\n\n  \n\nSteps 2. and 4. are performed with the `flip_block` function, which is a fast image buffer transposition, processed per block so that it better preserves cache coherency.\n\n  \n\n  \n\n**Note:** This is the main difference from @bfraboni's repository. 
The fast box blur algorithm is accurate on image boundaries: it emulates reflection (mirrored) padding, so the radius has to be \u003c the width (or height) in order to read correctly inside the image buffer. For this reason I set a maximum.\n\n  \n\nExample of maximum radius:\n\nkernel = 9, radius = 4, width (or height) = 5\n\ne d c b | a b c d e | d c b a\n\nThe reflected padding is read from inside the image buffer!\n\n ### WIP - Experimental\n \nIf `DOUBLE_ACCUMULATOR` is defined, an alternative function is available: \n```c++\n template \u003ctypename  T, int  C\u003e \n void  horizontal_blur_kernel_reflect_double(const  T  *in, T  *out, const  int  w, const  int  h, const  int  ksize)\n```\n[@TJCoding](https://github.com/TJCoding)'s original idea was to avoid re-running the function for each pass by doing 2 (or ideally N) accumulations at once.  \n\nThis has been achieved using:\n1) A \"rough and easy\" circular buffer implementation that stores the first-pass output sums in a deque\n2) A lookup table used to get the correct index at the bounds. 
\n\nBoth implementations have their limits. The first is inefficient, popping and pushing at every iteration. The latter might be good, but the modulo operation `% lut.size()` eats into the improvement; a possible solution might be to create a bigger lookup table and avoid the modulo altogether, at the cost of higher memory usage.\nIn conclusion, it is slower than the original algorithm, but I left it for documentation purposes. Maybe it can be optimized further or new ideas will come up; I'm open to them.\n  \n  \n  \n\nFor further details please refer to:\n\n  \n\n- http://blog.ivank.net/fastest-gaussian-blur.html\n\n  \n\n- https://www.peterkovesi.com/papers/FastGaussianSmoothing.pdf\n\n  \n\n  \n\n## Implementation\n\n  \n\n  \n\nThe implementation is defined in the `fast_box_blur.h` header, which contains the fastest templated cache coherent version I could make.\n\n  \n\nThe main exposed function is defined as:\n\n  \n\n```c++\n\n  \n\ntemplate\u003ctypename  T\u003e\n\nvoid  fastboxblur(T  *  in, const  int  w, const  int  h, const  int  channels, const  int  ksize, const  int  passes = 1);\n\n  \n\n```\n\n  \n\nwhere the arguments are:\n\n  \n\n-  `in` is a pointer to the source buffer, which is transformed in place,\n\n  \n\n-  `w` is the image width,\n\n  \n\n-  `h` is the image height,\n\n  \n\n-  `channels` is the number of image channels,\n\n  \n\n-  `ksize` is the desired box blur kernel size,\n\n  \n\n-  `passes` is the number of box blur passes to perform.\n\n  \n\n  \n\n  \n\nA SIMD vectorized or a GPU version of this algorithm could be significantly faster (but may be painful for the developer to write for arbitrary channel counts / data sizes).\n\n  \n  \n\n  \n  \n\n  \n\n## Usage\n\n  \n\n  \n\nRun the program with the following command:\n\n  \n\n  \n\n`./fastboxblur \u003cinput_filename\u003e \u003coutput_filename\u003e \u003cksize\u003e \u003cpasses = 1\u003e`\n\n  \n\n  \n\n- input_image_filename should be any of [.jpg, .png, .bmp, .tga, .psd, .gif, .hdr, .pic, .pnm].\n\n  
\n\n- output_image_filename should be any of [.png, .jpg, .bmp] (unknown extensions will be saved as .png by default).\n\n  \n\n- ksize is the desired box blur kernel size.\n\n  \n\n- passes is an optional argument that controls the number of box blur passes (should be positive). Default is 1.\n\n  \n\n  \n\n## Performance\n\n  \n\nThe algorithm is designed to be a portable, easy, and faster replacement for `cv::blur`. On an i7-10750H, it is around 2x - 3x faster.\n\nBenchmark with 45 three-channel images from 1500 x 1000 px to 11400 x 7600 px, box car size = (2 * width - 1)\n\n  \n\n![](data/bench.png)\n\n  \n\n  \n\n## Acknowledgments\n\n  \n\n  \n\nSpecial thanks to @bfraboni for our insightful discussions and his main repository [Fast Gaussian Blur](https://github.com/bfraboni/FastGaussianBlur).\n\n  \n\n  \n\n## Licence\n\n  \n\n  \n\nYou may use, distribute and modify this code under the terms of the MIT license. For further details please refer to: https://mit-license.org/\n\n  \n\n  \n\n## References\n\n  \n\n  \n  \n\n- [Fast O(1) bilateral filtering using trigonometric range kernels](http://bigwww.epfl.ch/chaudhury/Fast%20bilateral%20filtering.pdf)\n\n  \n  \n\n- [Filtering by repeated integration](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.72.4795)\n\n  \n\n- [Fast Filter Spreading and its Applications](https://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-54.pdf)\n\n  \n  \n\n- [Fast image convolutions](http://elynxsdk.free.fr/ext-docs/Blur/Fast_box_blur.pdf)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmichelerenzullo%2Ffastboxblur","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmichelerenzullo%2Ffastboxblur","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmichelerenzullo%2Ffastboxblur/lists"}