{"id":17203910,"url":"https://github.com/nachovizzo/saxpy_openacc_cpp","last_synced_at":"2025-09-04T15:38:13.422Z","repository":{"id":143265649,"uuid":"321721679","full_name":"nachovizzo/saxpy_openacc_cpp","owner":"nachovizzo","description":"My way of thinking about OpenACC, C++, and Parallel computing in general","archived":false,"fork":false,"pushed_at":"2020-12-22T01:23:59.000Z","size":17,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-01-30T09:11:17.783Z","etag":null,"topics":["cpp","cuda","gpu","openacc"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nachovizzo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-12-15T16:16:11.000Z","updated_at":"2023-12-26T07:07:16.000Z","dependencies_parsed_at":"2023-05-10T10:32:29.627Z","dependency_job_id":null,"html_url":"https://github.com/nachovizzo/saxpy_openacc_cpp","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nachovizzo%2Fsaxpy_openacc_cpp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nachovizzo%2Fsaxpy_openacc_cpp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nachovizzo%2Fsaxpy_openacc_cpp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nachovizzo%2Fsaxpy_openacc_cpp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/owners/nachovizzo","download_url":"https://codeload.github.com/nachovizzo/saxpy_openacc_cpp/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245437098,"owners_count":20615210,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp","cuda","gpu","openacc"],"created_at":"2024-10-15T02:19:54.930Z","updated_at":"2025-03-25T09:42:33.897Z","avatar_url":"https://github.com/nachovizzo.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Saxpy C vs C++ Using OpenACC\n\n- [Saxpy C vs C++ Using OpenACC](#saxpy-c-vs-c-using-openacc)\n  - [Why C++](#why-c)\n  - [What do I need to build this project?](#what-do-i-need-to-build-this-project)\n  - [What if my compiler doesn't support OpenACC?](#what-if-my-compiler-doesnt-support-openacc)\n  - [Bbbbbbbut I was told `C` is faster than `C++`](#bbbbbbbut-i-was-told-c-is-faster-than-c)\n  - [What is so exciting about pragma based directive parallelism?](#what-is-so-exciting-about-pragma-based-directive-parallelism)\n    - [Don't believe my words, look this](#dont-believe-my-words-look-this)\n      - [Run the code with a GPU](#run-the-code-with-a-gpu)\n      - [Run the code with a **multicore** CPU](#run-the-code-with-a-multicore-cpu)\n      - [Run the code with a **singlecore** CPU](#run-the-code-with-a-singlecore-cpu)\n      - [Virtually with any kind of accelerator](#virtually-with-any-kind-of-accelerator)\n  - [ToDo](#todo)\n\nThis is just a small example of how **I** think you should be using C++ over\nC even when working with HPC 
applications, using OpenACC. I strongly believe\nthat one should do the things one would like the world to be doing, and for\nme, this is one of them :).\n\n## Why C++\n\nWell, you'd better check my [C++ Course FAQ](https://www.ipb.uni-bonn.de/teaching/cpp-2020/faq/)\n\n## What do I need to build this project?\n\nA compiler that supports the OpenACC standard... sadly, the only mature one\nyou might be able to use is the PGI compiler suite (now part of the NVIDIA HPC\nSDK).\n\nOnce you have installed the NVIDIA HPC SDK, you can build this project like\nthis:\n\n```sh\nmkdir -p build \u0026\u0026 cd build\ncmake -DCMAKE_CXX_COMPILER=nvc++ -DCMAKE_C_COMPILER=nvc -DENABLE_OPENACC=ON ..\nmake all\n```\n\nThe compiler output should look similar to:\n\n```sh\nsaxpy:\n      8, Generating implicit copy(y[:n]) [if not already present]\n         Generating implicit copyin(x[:n]) [if not already present]\n     10, Loop is parallelizable\n         Generating Tesla code\n         10, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */\n```\n\n## What if my compiler doesn't support OpenACC?\n\nWell... then there is no reason for you to be here :)\n\n## Bbbbbbbut I was told `C` is faster than `C++`\n\nAnd do you still believe it? C might be faster than C++ under some particular\nassumptions; one of them is that you are living in 1990 and not in 2020... If\nyou don't believe me, you can benchmark this application on your own:\n\n**C++ Version:**\n\n```sh\ntime                 121.0 ms   (98.66 ms .. 149.2 ms)\n                     0.953 R²   (0.855 R² .. 0.999 R²)\nmean                 138.5 ms   (130.5 ms .. 152.5 ms)\nstd dev              16.58 ms   (8.188 ms .. 24.11 ms)\nvariance introduced by outliers: 36% (moderately inflated)\n```\n\n**C Version:**\n\n```sh\nbenchmarking ./saxpy_c\ntime                 134.0 ms   (113.3 ms .. 157.7 ms)\n                     0.962 R²   (0.891 R² .. 0.997 R²)\nmean                 130.3 ms   (122.9 ms .. 
139.2 ms)\nstd dev              12.42 ms   (8.789 ms .. 17.49 ms)\nvariance introduced by outliers: 24% (moderately inflated)\n```\n\nOn this run the two versions are within measurement noise of each other: the\ndifference between the means (about 8 ms) is smaller than either standard\ndeviation.\n\nAnyway, I don't have anything in particular against `C` per se. If you love\nit, or you are still convinced that `C++` adds \"extra overhead\" just because\nit supports the keyword `class`, or you don't believe in the \"zero-cost\nabstraction\" principle we `C++` programmers believe in, then that's fine. I'm\nsorry if I hurt your feelings.\n\n## What is so exciting about pragma based directive parallelism?\n\nGo and check out any modern high-performance library, especially one that\nrelies heavily on the GPU: it usually has 2 versions of the library\nfunctionality, one for the CPU and one for the GPU (just check PyTorch,\nOpen3D, etc.). Of course, this is time-consuming to maintain and therefore\ncostly. What is worse, the GPU code is usually written in CUDA, which I think\nis a horrible extension to the C/C++ languages; plus, it's 100% tied to a\nprivate company, which is no good. On the other hand, the CPU implementation\noften relies on `OpenMP` directives... but not always, which makes that CPU\ncode highly inefficient, especially when compared against its CUDA\ncounterpart.\n\nWhat is so beautiful about open standards like `OpenMP` and `OpenACC` is that\nthey allow you to express parallel code in a much nicer way, without marrying\nyou to any particular vendor. Nowadays, the only \"good\" compiler that fully\nsupports `OpenACC` is the one from the NVIDIA HPC SDK. So we are still going\nin circles, but if we developers push for a better programming style, support\nin great compilers like `gcc` and `clang` will come in no time. Nowadays\neveryone still uses CUDA, and that's the problem (from my perspective).\n\n### Don't believe my words, look this\n\nSo: the same piece of code, only one module to maintain. Let's take\n[saxpy.cpp](saxpy.cpp) as the victim here. 
You first write your code, and then\nexpress how you would like it to run in parallel, with almost no intrusion\ninto the original code. Then you can:\n\n#### Run the code with a GPU\n\n```sh\ncmake -DCMAKE_CXX_COMPILER=nvc++ -DCMAKE_C_COMPILER=nvc -DENABLE_OPENACC=ON ..\nmake all\nACC_DEVICE_TYPE=nvidia ./saxpy_cpp # Runs on your NVIDIA GPU\n```\n\n#### Run the code with a **multicore** CPU\n\n```sh\ncmake -DCMAKE_CXX_COMPILER=nvc++ -DCMAKE_C_COMPILER=nvc -DENABLE_OPENACC=ON ..\nmake all\nACC_DEVICE_TYPE=host ./saxpy_cpp # Runs on multicore, a la OpenMP\n```\n\n#### Run the code with a **singlecore** CPU\n\n```sh\ncmake -DCMAKE_CXX_COMPILER=nvc++ -DCMAKE_C_COMPILER=nvc -DENABLE_OPENACC=OFF ..\nmake all\n./saxpy_cpp # Runs on a single core, since OpenACC was disabled at configure time\n```\n\n#### Virtually with any kind of accelerator\n\nFPGAs, DSPs, AMD GPUs, etc., as long as your compiler supports that target.\nI've never tried this myself, though.\n\n## ToDo\n\n- [ ] Add `CUDA` example.\n- [ ] Add `cuBLAS` example.\n- [x] Add `std::execution::par` C++17 example.\n- [ ] Add `OpenMP 5.0` example.\n- [ ] Add pybind11 bindings.\n- [ ] Add Cython bindings.\n- [ ] Add Jupyter Notebook to benchmark all examples.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnachovizzo%2Fsaxpy_openacc_cpp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnachovizzo%2Fsaxpy_openacc_cpp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnachovizzo%2Fsaxpy_openacc_cpp/lists"}