{"id":20884823,"url":"https://github.com/morgwai/gpu-samples","last_synced_at":"2025-06-29T00:37:34.377Z","repository":{"id":68078864,"uuid":"413797226","full_name":"morgwai/gpu-samples","owner":"morgwai","description":"some GPU processing using JOCL (openCL) and Aparapi","archived":false,"fork":false,"pushed_at":"2021-10-19T07:48:50.000Z","size":170,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-12T17:50:21.398Z","etag":null,"topics":["aparapi","concurrency","concurrent-programming","gpu","gpu-programming","java","multithreading","pram"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/morgwai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-05T11:53:32.000Z","updated_at":"2023-09-20T15:56:00.000Z","dependencies_parsed_at":null,"dependency_job_id":"7548c1a5-9bf3-4fce-82d4-ce1cdcda47a8","html_url":"https://github.com/morgwai/gpu-samples","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/morgwai/gpu-samples","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/morgwai%2Fgpu-samples","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/morgwai%2Fgpu-samples/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/morgwai%2Fgpu-samples/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/morgwai%2Fgpu-samples/manifests
","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/morgwai","download_url":"https://codeload.github.com/morgwai/gpu-samples/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/morgwai%2Fgpu-samples/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262518105,"owners_count":23323301,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aparapi","concurrency","concurrent-programming","gpu","gpu-programming","java","multithreading","pram"],"created_at":"2024-11-18T08:11:42.449Z","updated_at":"2025-06-29T00:37:34.348Z","avatar_url":"https://github.com/morgwai.png","language":"Java","readme":"# GPU samples\n\nParallel reduction and [pointer jumping](https://en.wikipedia.org/wiki/Pointer_jumping) algorithms that summarize the values of an array, adapted to run on a GPU using [Aparapi](https://aparapi.com/) and [JOCL](http://www.jocl.org/) (frontends to [OpenCL](https://www.khronos.org/opencl/)).\n\n\n## building and running the comparison of various sync methods in OpenCL parallel reduction\n\nFirst, make sure that you have an OpenCL driver for your GPU installed: [Nvidia](https://developer.nvidia.com/cuda-downloads), [AMD Linux](https://www.amd.com/en/support/kb/release-notes/rn-amdgpu-unified-linux-21-30) (on Windows, the AMD driver should hopefully be available by default).\n\n```bash\n./mvnw package\njava -jar target/pointer-jumping-gpu-1.0-SNAPSHOT-jar-with-dependencies.jar\n```\nThis will run parallel reduction kernels using 3 different approaches to synchronization on\narrays of various sizes\n
from 32k to 128M elements, 50 times for each size. On my machine it\ntakes about 5 minutes. For each size it will output the average time for each sync method.\n\nThese are the times I got on my integrated Intel GPU:\n\n32k element array:\n\n```\nBARRIER average:     403076\n   SIMD average:     295953\n HYBRID average:     269073\n    CPU average:      62924\n```\n128k:\n\n```\nBARRIER average:     768170\n   SIMD average:     483343\n HYBRID average:     433704\n    CPU average:     175977\n```\n256k:\n\n```\nBARRIER average:    1018578\n   SIMD average:     793267\n HYBRID average:     738423\n    CPU average:     367999\n```\n512k:\n\n```\nBARRIER average:    1191166\n   SIMD average:    1019678\n HYBRID average:     828609\n    CPU average:     780270\n```\n1M:\n\n```\nBARRIER average:    1759843\n   SIMD average:    1580668\n HYBRID average:    1366559\n    CPU average:    1288948\n```\n2M:\n\n```\nBARRIER average:    3406786\n   SIMD average:    3070155\n HYBRID average:    2398054\n    CPU average:    2674748\n```\n3M:\n\n```\nBARRIER average:    4166284\n   SIMD average:    4192948\n HYBRID average:    3480526\n    CPU average:    3575055\n```\n4M-4k:\n\n```\nBARRIER average:    6573353 (1 recursive step on HYBRID)\n   SIMD average:    6758205\n HYBRID average:    5653419\n    CPU average:    5582159\n```\n4M:\n\n```\nBARRIER average:   13797841\n   SIMD average:   13367851\n HYBRID average:   12600975\n    CPU average:    5427631\n```\n32M:\n\n```\nBARRIER average:  102840013\n   SIMD average:  103991061\n HYBRID average:   95481061\n    CPU average:   41226782\n```\n128M:\n\n```\nBARRIER average:  363563970\n   SIMD average:  387534517\n HYBRID average:  344870087\n    CPU average:  160136923\n```\n255M:\n\n```\nBARRIER average:  878887550 (1 recursive step on HYBRID)\n   SIMD average:  819652415\n HYBRID average:  730983353\n    CPU average:  323803437\n```\n
","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmorgwai%2Fgpu-samples","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmorgwai%2Fgpu-samples","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmorgwai%2Fgpu-samples/lists"}