# xFilters
**Convolution** is a popular array operation used in signal processing, digital recording, image/video processing, and computer vision. This repository provides a **2D convolution algorithm** written from scratch in **C++ (for CPU)** and **CUDA C++ (for GPU)**, which can be used to apply **filters** to **high-resolution** images.

**Tested on NVIDIA RTX 3090 using Ubuntu 24.04.1 LTS with nvidia-driver-560 and CUDA 12.6.**

> Images are first converted to grayscale, and then the filter is applied.

**Implementations**

Each number matches the prefix of the corresponding benchmark target (for example, `00_cpu_conv2d_benchmark.out`).

0. Naive 2D convolution on a CPU.
1. Naive 2D convolution on a GPU.
2. 2D convolution on a GPU using constant memory for the filter matrix.
3. 2D convolution on a GPU using constant memory for the filter matrix and tiling to make use of shared memory.
4. Naive 2D convolution on a GPU (using pinned memory).
5. 2D convolution on a GPU using constant memory for the filter matrix (using pinned memory).
6. 2D convolution on a GPU using constant memory for the filter matrix and tiling to make use of shared memory (using pinned memory).

## Example Run
**CPU/GPU Filter**
1. In the terminal, run `make filters_cpu` or `make filters_gpu`.
2. When prompted, enter the path to the image, for example `data/8k.jpg`.
3. When prompted, type the filter name. The supported filters are:

    ### Supported Filters
    #### Sharpen
    <img src="https://raw.githubusercontent.com/tgautam03/xFilters/refs/heads/master/data/8k.jpg" width="390" height="250">
    <img src="https://raw.githubusercontent.com/tgautam03/xFilters/refs/heads/master/data/Sharpen_filtered_img.png" width="390" height="250">

    #### High-pass (edge detection)
    <img src="https://raw.githubusercontent.com/tgautam03/xFilters/refs/heads/master/data/8k.jpg" width="390" height="250">
    <img src="https://raw.githubusercontent.com/tgautam03/xFilters/refs/heads/master/data/High-pass_filtered_img.png" width="390" height="250">

    #### Low-pass
    <img src="https://raw.githubusercontent.com/tgautam03/xFilters/refs/heads/master/data/8k.jpg" width="390" height="250">
    <img src="https://raw.githubusercontent.com/tgautam03/xFilters/refs/heads/master/data/Low-pass_filtered_img.png" width="390" height="250">

    #### Gaussian (image blurring)
    <img src="https://raw.githubusercontent.com/tgautam03/xFilters/refs/heads/master/data/8k.jpg" width="390" height="250">
    <img src="https://raw.githubusercontent.com/tgautam03/xFilters/refs/heads/master/data/Gaussian_filtered_img.png" width="390" height="250">

    #### Derivative of Gaussian (edge detection)
    <img src="https://raw.githubusercontent.com/tgautam03/xFilters/refs/heads/master/data/8k.jpg" width="390" height="250">
    <img src="https://raw.githubusercontent.com/tgautam03/xFilters/refs/heads/master/data/d_Gaussian_filtered_img.png" width="390" height="250">

## Benchmarks

All benchmarks were run on a 2048 × 1328 image, as reported at the top of each log.

### Runtime Overview (time in seconds)

| Stage | CPU | GPU (Naive) | GPU (Constant Memory) | GPU (Constant Memory + Tiling) | GPU (Naive + Pinned Memory) | GPU (Constant + Pinned Memory) | GPU (Constant + Pinned Memory + Tiling) |
|-|-|-|-|-|-|-|-|
| Allocating GPU memory | n/a | 0.00044032 | 0.000191488 | 0.000313344 | 0.000217088 | 0.000176064 | 0.000154464 |
| Moving input to GPU memory | n/a | 0.0028009 | 0.00271984 | 0.00283443 | 0.00265677 | 0.00267555 | 0.0026567 |
| Moving filter to GPU memory | n/a | 8.736e-06 | 0.000128704 | 0.0002504 | 9.632e-06 | 0.000199776 | 0.000105152 |
| Kernel execution | 0.0607285 | 5.20294e-05 | 5.16403e-05 | 5.53062e-05 | 4.50765e-05 | 4.3735e-05 | 5.37395e-05 |
| Moving output to CPU memory | n/a | 0.00601299 | 0.00601722 | 0.0065999 | 0.00249299 | 0.00250381 | 0.0024945 |
| Total | 0.0607285 | 0.00931497 | 0.00910889 | 0.0100534 | 0.00542156 | 0.00559894 | 0.00546456 |

### Naive CPU
```bash
make 00_cpu_conv2d_benchmark.out
```
```
Loaded image with Width: 2048 and Height: 1328

Applying filter...
Time for kernel execution (seconds): 0.0607285

---------------------
Benchmarking details:
---------------------
FPS (total): 16.4667
GFLOPS (kernel): 1.2432
------------------------------------
```

### Naive GPU
```bash
make 01_gpu_conv2d_benchmark.out
```
```
Loaded image with Width: 2048 and Height: 1328

Allocating GPU memory...
Time for GPU memory allocation (seconds): 0.00044032

Moving input to GPU memory...
Time for input data transfer (seconds): 0.0028009

Moving filter to GPU memory...
Time for filter data transfer (seconds): 8.736e-06

Applying filter...
Time for kernel execution (seconds): 5.20294e-05

Moving result to CPU memory...
Time for output data transfer (seconds): 0.00601299

---------------------
Benchmarking details:
---------------------
Time (total): 0.00931497
FPS (total): 107.354

Time (kernel): 5.20294e-05
FPS (kernel): 19219.9
GFLOPS (kernel): 1451.05
------------------------------------
```

### GPU using constant memory
```bash
make 02_gpu_conv2d_constMem_benchmark.out
```
```
Loaded image with Width: 2048 and Height: 1328

Allocating GPU memory...
Time for GPU memory allocation (seconds): 0.000191488

Moving input to GPU memory...
Time for input data transfer (seconds): 0.00271984

Moving filter to GPU memory...
Time for filter data transfer (seconds): 0.000128704

Applying filter...
Time for kernel execution (seconds): 5.16403e-05

Moving result to CPU memory...
Time for output data transfer (seconds): 0.00601722

---------------------
Benchmarking details:
---------------------
Time (total): 0.00910889
FPS (total): 109.783

Time (kernel): 5.16403e-05
FPS (kernel): 19364.7
GFLOPS (kernel): 1461.99
------------------------------------
```

### GPU using constant memory and tiling
```bash
make 03_gpu_conv2d_tiled_benchmark.out
```
```
Loaded image with Width: 2048 and Height: 1328

Allocating GPU memory...
Time for GPU memory allocation (seconds): 0.000313344

Moving input to GPU memory...
Time for input data transfer (seconds): 0.00283443

Moving filter to GPU memory...
Time for filter data transfer (seconds): 0.0002504

Applying filter...
Time for kernel execution (seconds): 5.53062e-05

Moving result to CPU memory...
Time for output data transfer (seconds): 0.0065999

---------------------
Benchmarking details:
---------------------
Time (total): 0.0100534
FPS (total): 99.469

Time (kernel): 5.53062e-05
FPS (kernel): 18081.1
GFLOPS (kernel): 1365.08
------------------------------------
```

### Naive GPU (pinned memory)
```bash
make 04_gpu_conv2d_pinnedMem_benchmark.out
```
```
Loaded image with Width: 2048 and Height: 1328

Allocating GPU memory...
Time for GPU memory allocation (seconds): 0.000217088

Moving input to GPU memory...
Time for input data transfer (seconds): 0.00265677

Moving filter to GPU memory...
Time for filter data transfer (seconds): 9.632e-06

Applying filter...
Time for kernel execution (seconds): 4.50765e-05

Moving result to CPU memory...
Time for output data transfer (seconds): 0.00249299

---------------------
Benchmarking details:
---------------------
Time (total): 0.00542156
FPS (total): 184.449

Time (kernel): 4.50765e-05
FPS (kernel): 22184.5
GFLOPS (kernel): 1674.88
------------------------------------
```

### GPU using constant memory (pinned memory)
```bash
make 05_gpu_conv2d_pinnedConstMem_benchmark.out
```
```
Loaded image with Width: 2048 and Height: 1328

Allocating GPU memory...
Time for GPU memory allocation (seconds): 0.000176064

Moving input to GPU memory...
Time for input data transfer (seconds): 0.00267555

Moving filter to GPU memory...
Time for filter data transfer (seconds): 0.000199776

Applying filter...
Time for kernel execution (seconds): 4.3735e-05

Moving result to CPU memory...
Time for output data transfer (seconds): 0.00250381

---------------------
Benchmarking details:
---------------------
Time (total): 0.00559894
FPS (total): 178.605

Time (kernel): 4.3735e-05
FPS (kernel): 22865
GFLOPS (kernel): 1726.25
------------------------------------
```

### GPU using constant memory and tiling (pinned memory)
```bash
make 06_gpu_conv2d_pinnedTiled_benchmark.out
```
```
Loaded image with Width: 2048 and Height: 1328

Allocating GPU memory...
Time for GPU memory allocation (seconds): 0.000154464

Moving input to GPU memory...
Time for input data transfer (seconds): 0.0026567

Moving filter to GPU memory...
Time for filter data transfer (seconds): 0.000105152

Applying filter...
Time for kernel execution (seconds): 5.37395e-05

Moving result to CPU memory...
Time for output data transfer (seconds): 0.0024945

---------------------
Benchmarking details:
---------------------
Time (total): 0.00546456
FPS (total): 182.997

Time (kernel): 5.37395e-05
FPS (kernel): 18608.3
GFLOPS (kernel): 1404.88
------------------------------------
```

## References
- Images are loaded and saved with the [stb single-file public domain libraries for C/C++](https://github.com/nothings/stb). See [lib](https://github.com/tgautam03/xFilters/tree/master/lib) for the specific source files.

- Example images in [data](https://github.com/tgautam03/xFilters/tree/master/data):
    - [Image by Eberhard Grossgasteiger](https://www.pexels.com/photo/mountain-at-night-under-a-starry-sky-1624496/)
    - [Image by Pok Rie](https://www.pexels.com/photo/seawaves-on-sands-982263/)