{"id":51330644,"url":"https://github.com/tenxlenx/gpudct","last_synced_at":"2026-07-01T22:32:25.023Z","repository":{"id":195273272,"uuid":"684013928","full_name":"tenxlenx/GpuDct","owner":"tenxlenx","description":"A library to extract DCT hashes with CUDA","archived":false,"fork":false,"pushed_at":"2025-10-21T04:35:40.000Z","size":213,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-21T06:16:25.921Z","etag":null,"topics":["computer-vision","cpp","cuda","image-feature","image-processing","image-similarity","perceptual-hashing"],"latest_commit_sha":null,"homepage":"","language":"Cuda","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tenxlenx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-08-28T09:13:32.000Z","updated_at":"2025-10-21T05:34:27.000Z","dependencies_parsed_at":"2023-09-17T08:54:33.113Z","dependency_job_id":null,"html_url":"https://github.com/tenxlenx/GpuDct","commit_stats":null,"previous_names":["tenxlenx/gpudct"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/tenxlenx/GpuDct","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tenxlenx%2FGpuDct","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tenxlenx%2FGpuDct/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tenxlenx%2FGpuDct/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tenxlenx%2FGpuDct/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tenxlenx","download_url":"https://codeload.github.com/tenxlenx/GpuDct/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tenxlenx%2FGpuDct/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35025980,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-01T02:00:05.325Z","response_time":130,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","cpp","cuda","image-feature","image-processing","image-similarity","perceptual-hashing"],"created_at":"2026-07-01T22:32:23.105Z","updated_at":"2026-07-01T22:32:25.012Z","avatar_url":"https://github.com/tenxlenx.png","language":"Cuda","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GpuDct: CUDA DCT Hashing Library\n\nGpuDct is a CUDA C++20 library that computes 64-bit perceptual hashes from square images using fused Discrete Cosine Transform (DCT) kernels. Each kernel evaluates the full T * A * T' pipeline, extracts the 8x8 low-frequency block on device, and emits a median-threshold signature without extra launches or host round trips.\n\n## Highlights\n- Fused single-pass kernels for 32, 64, 128, and 256 sized images with constant-memory transforms\n- Stream-ordered temporary allocations via CUDA memory pools (no hot-path malloc)\n- In-kernel 8x8 hashing and median selection yielding a 64-bit binary fingerprint\n- Batch and multi-stream helpers for high-throughput pipelines\n- Benchmarks instrumented with CUDA events for precise GPU time attribution\n- CMake package configured for CUDA + C++20, friendly with FetchContent and install exports\n\n## Requirements\n- NVIDIA GPU with compute capability 7.5 or newer (tune `CMAKE_CUDA_ARCHITECTURES` as needed)\n- CUDA Toolkit 12.x (tested) with `nvcc`\n- CMake 3.18 or newer\n- Host compiler with full C++20 support (GCC 11+, Clang 14+, MSVC 19.3+)\n- No bundled image-processing dependencies. Provide your own contiguous buffers from any loader you prefer (stb_image, OpenCV, etc.).\n\n## Quick Start\n\n```bash\ncmake -S . -B build -DCMAKE_BUILD_TYPE=Release\ncmake --build build -j$(nproc)\n```\n\nOutputs include `libGpuDct.a` and sample binaries under `build/examples/`. Sanity check performance with:\n\n```bash\n./build/examples/gpu_dct_benchmark        # defaults to 32x32\n./build/examples/gpu_dct_benchmark 256    # alternate size\n```\n\n## Basic Tutorial\n\nThe primary entry point is `gpu_dct::GpuDct\u003cT\u003e`. Supported image sizes are 32, 64, 128, and 256.\n\n### 1. Single image hashing from host memory\n\n```cpp\n#include \u003cgpu_dct.cuh\u003e\n#include \u003cvector\u003e\n#include \u003ccstdint\u003e\n#include \u003ciostream\u003e\n\nint main() {\n    constexpr int N = 32;\n    gpu_dct::GpuDct\u003cfloat\u003e dct(N);\n\n    std::vector\u003cfloat\u003e image(N * N);\n    for (size_t i = 0; i \u003c image.size(); ++i) {\n        image[i] = static_cast\u003cfloat\u003e(i % 256);\n    }\n\n    const uint64_t hash = dct.dct_host(image.data());\n    std::cout \u003c\u003c \"hash: 0x\" \u003c\u003c std::hex \u003c\u003c hash \u003c\u003c std::dec \u003c\u003c \"\\n\";\n    return 0;\n}\n```\n\n`dct_host` is synchronous and optionally accepts a CUDA stream to integrate with existing GPU work.\n\n### 2. Batched host processing\n\n```cpp\nconstexpr int N = 64;\nconstexpr int batch = 16;\ngpu_dct::GpuDct\u003cfloat\u003e dct(N);\n\nstd::vector\u003cfloat\u003e images(static_cast\u003csize_t\u003e(N) * N * batch);\nstd::vector\u003cuint64_t\u003e hashes(batch);\n\ndct.batch_dct_host(images.data(), hashes.data(), batch);\n```\n\nThe helper stages data through stream-ordered pools, launches fused kernels for the entire batch, and returns once hashes are copied back.\n\n### 3. Device-to-device workflows and multi-stream execution\n\n```cpp\n#include \u003carray\u003e\n\ngpu_dct::GpuDct\u003cfloat\u003e dct(128);\nconstexpr int batch = 64;\n\nfloat* d_images = nullptr;\nuint64_t* d_hashes = nullptr;\ncudaMalloc(\u0026d_images, 128 * 128 * batch * sizeof(float));\ncudaMalloc(\u0026d_hashes, batch * sizeof(uint64_t));\n\n// populate d_images on device...\n\ndct.batch_dct_device(d_images, d_hashes, batch);\n\nstd::array\u003ccudaStream_t, 4\u003e streams{};\nfor (auto\u0026 s : streams) {\n    cudaStreamCreate(\u0026s);\n}\n\ndct.batch_dct_device_multistream(d_images, d_hashes, batch, streams);\n\nfor (auto s : streams) {\n    cudaStreamDestroy(s);\n}\n\ncudaFree(d_images);\ncudaFree(d_hashes);\n```\n\nHashes remain on the device, enabling additional GPU-side comparisons before any host transfer.\n\n### 4. Hashing a real image\n\nDownload any public grayscale or RGB square image and feed it through the helper utility:\n\n```bash\ncmake --build build -j$(nproc)\n./build/examples/gpu_dct_hash_image path/to/lena.jpg 256\n```\n\nThe tool uses stb_image to decode the asset, converts it to grayscale, downsamples to the requested DCT size (32, 64, 128, or 256), and prints the 64-bit perceptual hash so you can cross-check against other implementations.\n\n### Feeding data from image libraries (optional)\n\nGpuDct only expects a contiguous buffer of pixel intensities, so you can lift data from whatever host-side library you already use without additional dependencies. For example, with OpenCV:\n\n```cpp\ncv::Mat gray = cv::imread(path, cv::IMREAD_GRAYSCALE);\nif (!gray.data || gray.rows != N || gray.cols != N) {\n    throw std::runtime_error(\"unexpected image dimensions\");\n}\n\nstd::vector\u003cfloat\u003e image(gray.rows * gray.cols);\nstd::transform(gray.begin\u003cuint8_t\u003e(), gray.end\u003cuint8_t\u003e(), image.begin(),\n               [](uint8_t v) { return static_cast\u003cfloat\u003e(v); });\n\ngpu_dct::GpuDct\u003cfloat\u003e dct(N);\nconst uint64_t hash = dct.dct_host(image.data());\n```\n\nAny loader that produces a contiguous block (stb_image, libpng, custom CUDA pipelines) can be wired up the same way.\n\n## Using GpuDct in another CMake project\n\n```cmake\ninclude(FetchContent)\nFetchContent_Declare(\n    GpuDct\n    GIT_REPOSITORY https://github.com/tenxlenx/GpuDct.git\n    GIT_TAG main\n)\n\nFetchContent_MakeAvailable(GpuDct)\n\nadd_executable(hash_demo main.cpp)\ntarget_link_libraries(hash_demo PRIVATE GpuDct CUDA::cudart)\nset_property(TARGET hash_demo PROPERTY CXX_STANDARD 20)\n```\n\nOverride `CMAKE_CUDA_ARCHITECTURES` in the parent project to match deployment hardware.\n\n## Benchmarking\n\n`examples/gpu_dct_benchmark` exercises single images, batched runs, and multi-stream scenarios with CUDA event profiling on every test. CLI usage:\n\n```\n./gpu_dct_benchmark                # 32x32, default iterations\n./gpu_dct_benchmark 128            # choose image size\n./gpu_dct_benchmark 64 --streams 4 # adjust streams or iterations\n```\n\nThe tool reports per-image latency, throughput, and data type comparisons for quick regression checks.\n\n## Troubleshooting\n- Mismatch between compiled and runtime GPU architectures: set `CMAKE_CUDA_ARCHITECTURES` explicitly.\n- Out-of-memory during large batches: raise the CUDA malloc heap limit or reduce concurrent streams.\n- Integrating with pre-existing CUDA streams: pass your stream to constructors or method overloads to preserve ordering.\n\n## License\n\nMIT. See `LICENSE` for details.\n\nThe repository vendors `stb_image.h` (public-domain / MIT dual licensed) in `third_party/` for sample image decoding.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftenxlenx%2Fgpudct","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftenxlenx%2Fgpudct","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftenxlenx%2Fgpudct/lists"}