{"id":26953551,"url":"https://github.com/tornikeo/sample-openmp-in-cuda","last_synced_at":"2025-08-01T17:37:53.081Z","repository":{"id":282395066,"uuid":"948446417","full_name":"tornikeo/sample-openmp-in-cuda","owner":"tornikeo","description":"Sample of using OpenMP and CUDA: single GPU, multiple CPU","archived":false,"fork":false,"pushed_at":"2025-03-17T10:49:11.000Z","size":90,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-14T10:22:52.147Z","etag":null,"topics":["cuda","meson","openmp"],"latest_commit_sha":null,"homepage":"","language":"Cuda","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tornikeo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-03-14T10:57:54.000Z","updated_at":"2025-03-17T13:24:44.000Z","dependencies_parsed_at":null,"dependency_job_id":"9321b55e-ff6e-4658-86b0-0346bd563bda","html_url":"https://github.com/tornikeo/sample-openmp-in-cuda","commit_stats":null,"previous_names":["tornikeo/sample-openmp-in-cuda"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/tornikeo/sample-openmp-in-cuda","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tornikeo%2Fsample-openmp-in-cuda","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tornikeo%2Fsample-openmp-in-cuda/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tornikeo%2Fsample-openmp-in-cuda/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/tornikeo%2Fsample-openmp-in-cuda/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tornikeo","download_url":"https://codeload.github.com/tornikeo/sample-openmp-in-cuda/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tornikeo%2Fsample-openmp-in-cuda/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268268938,"owners_count":24223156,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-01T02:00:08.611Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","meson","openmp"],"created_at":"2025-04-03T01:30:11.417Z","updated_at":"2025-08-01T17:37:53.073Z","avatar_url":"https://github.com/tornikeo.png","language":"Cuda","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Minimal OpenMP + CUDA sample in C++\n\nShows how to use OpenMP with CUDA: multiple CPU threads batch their requests so that the GPU handles all of them in a single launch. In short:\n\n1. Parallel CPU threads create jobs\n2. Main thread concatenates the jobs and sends them to the GPU\n3. GPU execution finishes\n4. Main thread splits the job results back out to the threads\n5. 
Parallel CPU threads finalize their jobs\n\nSee the [starter repo](https://github.com/tornikeo/minimal-vscode-cuda-meson) for setting this up with VS Code + Meson.\n\n# Compile and run\n\n```sh\nmeson setup builddir\nmeson compile -C builddir\nmeson test -C builddir --verbose\n```\nThe test should print output similar to the following (device details depend on your GPU, and thread order varies between runs):\n\n```text\n# \u003cbuild steps\u003e ...\n\nStart\nDevice Number: 0\n  Device name: NVIDIA GeForce GTX 1050 Ti with Max-Q Design\n  Memory Clock Rate (KHz): 3504000\n  Memory Bus Width (bits): 128\n  Peak Memory Bandwidth (GB/s): 112.128000\n\nThread 0 is working...\nThread 7 is working...\nThread 2 is working...\nThread 5 is working...\nThread 6 is working...\nThread 3 is working...\nThread 1 is working...\nThread 4 is working...\nNumbers going into GPU:\n0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \n1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 \n2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 \n3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 \n4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 \n5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 \n6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 \n7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 \nLaunching GPU...\nWe expect the kernel to sum the values in each row...\nThread 0 verified sum: 0 ✅\nThread 1 verified sum: 32 ✅\nThread 2 verified sum: 64 ✅\nThread 7 verified sum: 224 ✅\nThread 4 verified sum: 128 ✅\nThread 6 verified sum: 192 ✅\nThread 5 verified sum: 160 ✅\nThread 3 verified sum: 96 ✅\n# ...\n```\n\n# Prerequisites\n\n- cudatoolkit, cudatoolkit-dev (e.g. from micromamba or conda)\n- g++-11 
(build-essential)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftornikeo%2Fsample-openmp-in-cuda","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftornikeo%2Fsample-openmp-in-cuda","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftornikeo%2Fsample-openmp-in-cuda/lists"}