{"id":13533712,"url":"https://github.com/ParRes/Kernels","last_synced_at":"2025-04-01T22:30:33.916Z","repository":{"id":11006199,"uuid":"13330867","full_name":"ParRes/Kernels","owner":"ParRes","description":"This is a set of simple programs that can be used to explore the features of a parallel platform.","archived":false,"fork":false,"pushed_at":"2024-10-29T13:39:50.000Z","size":14715,"stargazers_count":410,"open_issues_count":31,"forks_count":108,"subscribers_count":39,"default_branch":"default","last_synced_at":"2024-10-29T16:12:16.969Z","etag":null,"topics":["c","c-plus-plus","coarray-fortran","fortran2008","hpc","julia","kokkos","mpi","openacc","opencl","openmp","parallel","parallel-programming","pgas","python3","shmem","sycl","threading"],"latest_commit_sha":null,"homepage":"https://groups.google.com/forum/#!forum/parallel-research-kernels","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ParRes.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":"SUPPORT.md","governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2013-10-04T18:00:10.000Z","updated_at":"2024-10-29T13:39:57.000Z","dependencies_parsed_at":"2023-02-14T07:16:16.894Z","dependency_job_id":"e58d0704-2b98-4c13-a09b-8f9d89b3cc67","html_url":"https://github.com/ParRes/Kernels","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ParRes%2FKernels","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ParRes%2FKernels/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositori
es/ParRes%2FKernels/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ParRes%2FKernels/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ParRes","download_url":"https://codeload.github.com/ParRes/Kernels/tar.gz/refs/heads/default","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246720416,"owners_count":20822898,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c","c-plus-plus","coarray-fortran","fortran2008","hpc","julia","kokkos","mpi","openacc","opencl","openmp","parallel","parallel-programming","pgas","python3","shmem","sycl","threading"],"created_at":"2024-08-01T07:01:22.399Z","updated_at":"2025-04-01T22:30:33.686Z","avatar_url":"https://github.com/ParRes.png","language":"C","readme":"![PRK logo.](https://github.com/ParRes/Kernels/blob/default/logo/PRK%20logo.png)\n\n[![license](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://github.com/ParRes/Kernels/blob/master/COPYING)\n[![GitHub contributors](https://img.shields.io/github/contributors/ParRes/Kernels.svg)]()\n[![GitHub language count](https://img.shields.io/github/languages/count/ParRes/Kernels.svg)]()\n[![GitHub top language](https://img.shields.io/github/languages/top/ParRes/Kernels.svg)]()\n\n# Overview\n\nThis suite contains a number of kernel operations, called Parallel\nResearch Kernels, plus a simple build system intended for a Linux-compatible environment.\nMost of the code relies on open standard programming models and thus can be\nexecuted on many computing systems.\n\nThese programs 
should not be used as benchmarks.  They are operations to \nexplore features of a hardware platform, but they do not define \nfixed problems that can be used to rank systems.  Furthermore, \nthey have not been optimized for the features of any particular system.\n\n# Build Instructions\n\nTo build the codes the user needs to make certain changes by editing text\nfiles. Assuming the source tree is untarred in directory `$PRK`, the\nfollowing file needs to be copied to `$PRK/common/make.defs` and edited.\n\n`$PRK/common/make.defs.in` -- This file specifies the names of the C\ncompiler (`CC`) and of the MPI (Message Passing Interface) compiler `MPICC`\nor compile script. If MPI is not going to be used, the user can ignore\nthe value of `MPICC`. The compilers should already be in your path. That\nis, if you define `CC=icc`, then typing `which icc` should show a\nvalid path where that compiler is installed.\nSpecial instructions for building and running codes using Charm++, Grappa, \nOpenSHMEM, or Fine-Grain MPI are in `README.special`.\n\nWe provide working examples for a number of programming environments.\nSome of these are tested more than others.\nIf you are looking for the simplest option, try `make.defs.gcc`.\n\n| File (in `./common/`) | Environment |  \n|----------------------|-------------------------|  \n| `make.defs.cray`     | Cray toolchain (rarely tested). |\n| `make.defs.cuda`     | GCC with the CUDA compiler (only used in C++/CUDA implementation). |\n| `make.defs.gcc`      | GCC compiler toolchain, which supports essentially all implementations (tested often). |\n| `make.defs.freebsd`  | FreeBSD (rarely tested). |\n| `make.defs.ibmbg`    | IBM Blue Gene/Q compiler toolchain (deprecated). |\n| `make.defs.ibmp9nv`  | IBM compilers for POWER9 and NVIDIA Volta platforms (rarely tested). |\n| `make.defs.intel`    | Intel Parallel Studio toolchain, which supports most implementations (tested often). 
|\n| `make.defs.llvm`     | LLVM compiler toolchain, which supports most implementations (tested often). |\n| `make.defs.musl`     | GCC compiler toolchain with MUSL as the C standard library, which was required to use C11 threads. |\n| `make.defs.nvhpc`    | [NVIDIA HPC SDK](https://developer.nvidia.com/nvidia-hpc-sdk-downloads), which supports most implementations (tested often). |\n| `make.defs.oneapi`   | Intel [oneAPI](https://software.intel.com/oneapi/hpc-kit). |\n| `make.defs.pgi`      | PGI compiler toolchain (infrequently tested). |\n| `make.defs.hip`      | HIP compiler toolchain (infrequently tested). |\n\nSome of the C++ implementations require you to install Boost, RAJA, Kokkos, or Parallel STL,\nand then modify `make.defs` appropriately.  Please see the documentation in the\n[`doc`](https://github.com/ParRes/Kernels/tree/default/doc) subdirectory.\n\nYou can refer to the `travis` subdirectory for install scripts that can be readily modified\nto install any of the dependencies in your local environment.\n\n# Supported Programming Models\n\nThe suite of kernels currently has complete parallel implementations in \n[OpenMP](http://openmp.org/), \n[MPI](http://www.mpi-forum.org/), Adaptive MPI, and \n[Fine-Grain MPI](http://www.cs.ubc.ca/~humaira/fgmpi.html). \nThere is also a SERIAL reference implementation. \n\nThe suite is currently being extended to include \n[Charm++](http://charm.cs.illinois.edu/research/charm),\nMPI+OpenMP, \n[OpenSHMEM](http://openshmem.org/), UPC, and\n[Grappa](http://grappa.io/), \nFortran with coarrays,\nas well as three new variations of MPI:\n\n  1. MPI with one-sided communications (MPIRMA) \n  2. MPI with direct use of shared memory inside coherency domains (MPISHM)\n  3. 
MPI with OpenMP inside coherency domains (MPIOPENMP)\n\nThese extensions are not yet complete.\n\nMore recently, we have implemented many single-node programming models in modern languages.\n\n## Modern C++\n\ny = yes\n\ni = in-progress, incomplete, incorrect, or incredibly slow\n\nf = see footnotes\n\n| Parallelism          | p2p | stencil | transpose | nstream | sparse | dgemm | PIC |\n|----------------------|-----|---------|-----------|---------|--------|-------|-----|\n| None                 |  y  |    y    |     y     |    y    |    y   |   y   |  y  |\n| C++11 threads, async |     |         |     y     |         |        |       |     |\n| OpenMP               |  y  |    y    |     y     |    y    |        |       |     |\n| OpenMP tasks         |  y  |    y    |     y     |    y    |        |       |     |\n| OpenMP target        |  y  |    y    |     y     |    y    |        |       |     |\n| OpenCL 1.x           |  i  |    y    |     y     |    y    |        |       |     |\n| SYCL                 |  i  |    y    |     y     |    y    |        |   y   |  y  |\n| Boost.Compute        |     |         |           |    y    |        |       |     |\n| Parallel STL         |  y  |    y    |     y     |    y    |        |       |     |\n| Thrust               |     |         |     i     |    y    |        |       |     |\n| TBB                  |  y  |    y    |     y     |    y    |        |       |     |\n| Kokkos               |  y  |    y    |     y     |    y    |        |       |     |\n| RAJA                 |  y  |    y    |     y     |    y    |        |       |     |\n| CUDA                 |  i  |    y    |     y     |    y    |        |       |     |\n| CUBLAS               |     |         |     y     |    y    |        |   y   |     |\n| HIP                  |  i  |    y    |     y     |    y    |        |       |     |\n| HIPBLAS              |     |         |     y     |    y    |        |   y   |     |\n| CBLAS                |     |         |     
y     |         |        |   y   |     |\n| OpenACC              |  y  |         |           |         |        |       |     |\n| MPI (RMA)            |     |         |           |    y    |        |       |     |\n\n* [SYCL](http://sycl.tech/)\n* [Boost.Compute](http://boostorg.github.io/compute/)\n* [TBB](https://www.threadingbuildingblocks.org/)\n* [Kokkos](https://github.com/kokkos/kokkos)\n* [RAJA](https://github.com/LLNL/RAJA)\n\n## Modern C\n\n| Parallelism          | p2p | stencil | transpose | nstream | sparse |\n|----------------------|-----|---------|-----------|---------|--------|\n| None                 |  y  |    y    |     y     |    y    |        |\n| C11 threads          |     |         |     y     |         |        |\n| OpenMP               |  y  |    y    |     y     |    y    |        |\n| OpenMP tasks         |  y  |    y    |     y     |    y    |        |\n| OpenMP target        |  y  |    y    |     y     |    y    |        |\n| Cilk                 |     |    y    |     y     |         |        |\n| ISPC                 |     |         |     y     |         |        |\n| MPI                  |     |         |           |    y    |        |\n| PETSc                |     |         |     i     |    y    |        |\n\nThere are versions of nstream with OpenMP that support memory allocation\nusing [mmap](http://man7.org/linux/man-pages/man2/mmap.2.html)\nand [memkind](https://github.com/memkind/memkind), which can be used\nfor testing novel memory systems, including persistent memory.\n\n* [ISPC](https://ispc.github.io/)\n\n## Modern Fortran\n\n| Parallelism          | p2p | stencil | transpose | nstream | sparse | dgemm |\n|----------------------|-----|---------|-----------|---------|--------|-------|\n| None                 |  y  |    y    |     y     |    y    |        |   y   |\n| Intrinsics           |     |         |     y     |    y    |        |   y   |\n| coarrays             |  y  |    y    |     y     |         |        |       |\n| 
Global Arrays        |     |         |     y     |    y    |        |       |\n| OpenMP               |  y  |    y    |     y     |    y    |        |   y   |\n| OpenMP tasks         |  y  |    y    |     y     |    y    |        |       |\n| OpenMP target        |  y  |    y    |     y     |    y    |        |       |\n| OpenACC              |     |    y    |     y     |    y    |        |       |\n\nBy intrinsics, we mean the language built-in features, such as colon notation or the `TRANSPOSE` intrinsic.\nWe use `DO CONCURRENT` in a few places.\n\n## Other languages\n\nx = externally supported (in the Chapel repo)\n\n| Parallelism          | p2p | stencil | transpose | nstream | sparse | dgemm |\n|----------------------|-----|---------|-----------|---------|--------|-------|\n| Python 3             |  y  |    y    |     y     |    y    |    y   |   y   |\n| Python 3 w/ Numpy    |  y  |    y    |     y     |    y    |    y   |   y   |\n| Python 3 w/ mpi4py   |     |    y    |     y     |    y    |        |       |\n| Julia                |  y  |    y    |     y     |         |        |       |\n| Octave (Matlab)      |  y  |    y    |     y     |         |        |       |\n| Rust                 |  y  |    y    |     y     |         |        |       |\n| Go                   |     |         |     y     |    y    |        |   y   |\n| C#                   |     |         |     y     |    y    |        |       |\n| Chapel               |  x  |    x    |     x     |         |        |       |\n| Java                 |  y  |    y    |     y     |    y    |        |       |\n| Lua                  |     |         |           |    y    |        |       |\n\n## Global make\n\nPlease run `make help` in the top directory for the latest information.\n\nTo build all available kernels of a certain version, type in the root\ndirectory:\n\n| Command              | Effect |  \n|----------------------|-------------------------|  \n| `make all`           | builds all kernels. 
|  \n| `make allserial`     | builds all serial kernels. |  \n| `make allopenmp`     | builds all OpenMP kernels. |  \n| `make allmpi`        | builds all conventional two-sided MPI kernels. |  \n| `make allmpi1`       | builds all MPI kernels. |  \n| `make allfgmpi`      | builds all Fine-Grain MPI kernels. | \n| `make allampi`       | builds all Adaptive MPI kernels. |  \n| `make allmpiopenmp`  | builds all hybrid MPI+OpenMP kernels. |  \n| `make allmpirma`     | builds all MPI-3 kernels with one-sided communications. |  \n| `make allmpishm`     | builds all kernels with MPI-3 shared memory. | \n| `make allshmem`      | builds all OpenSHMEM kernels. |  \n| `make allupc`        | builds all Unified Parallel C (UPC) kernels. |  \n| `make allcharm++`    | builds all Charm++ kernels. |  \n| `make allgrappa`     | builds all Grappa kernels. |  \n| `make allfortran`    | builds all Fortran kernels. |\n| `make allc1x`        | builds all C99/C11 kernels. |\n| `make allcxx`        | builds all C++11 kernels. |\n\nThe global make process uses a single set of optimization flags for all\nkernels. For more control, the user should consider individual makes\n(see below), carefully choosing the right parameters in each Makefile.\nIf a single set of optimization flags different from the default is\ndesired, the command line can be adjusted:\n`make all\u003cversion\u003e default_opt_flags=\u003clist of optimization flags\u003e` \n\nThe global make process uses some defaults for the Branch kernel\n(see Makefile in that directory). These can be overridden by adjusting\nthe command line: \n`make all\u003cversion\u003e matrix_rank=\u003cn\u003e number_of_functions=\u003cm\u003e`\nNote that no new values for `matrix_rank` or `number_of_functions` will\nbe used unless a `make veryclean` has been issued.\n\n## Individual make\n\nDescend into the desired sub-tree and `cd` to the kernel(s) of interest. \nEach kernel has its own Makefile. 
There are a number of parameters \nthat determine the behavior of the kernel that need to be known at \ncompile time. These are explained succinctly in the Makefile itself. Edit \nthe Makefile to activate certain parameters, and/or to set their values.\n\nTyping `make` without parameters in each leaf directory will prompt\nthe user for the correct parameter syntax. Once the code has been\nbuilt, typing the name of the executable without any parameters will \nprompt the user for the correct parameter syntax.\n\n# Running the test suite\n\nAfter the desired kernels have been built, they can be tested by\nexecuting scripts in the `scripts` subdirectory from the root of the\nkernels package. Currently two types of run scripts are supported:\n\n- `scripts/small`: tests only very small examples that should complete in just a few seconds. This merely tests the functionality of kernels and installed runtimes.\n- `scripts/wide`: tests examples that will take up most of the memory on a single node with 64 GB of memory.\n\nOnly a few parameters can be changed globally; for rigorous testing, \nthe user should run each kernel individually, carefully choosing the \nright parameters. 
This may involve editing the individual Makefiles \nand rerunning the kernels.\n\n# Example build and runs\n\n```sh\nmake all default_opt_flags=\"-O2\" \"matrix_rank=7\" \"number_of_functions=200\" \n./scripts/small/runopenmp\n./scripts/small/runmpi1\n./scripts/wide/runserial\n./scripts/small/runcharm++\n./scripts/wide/runmpiopenmp\n```\n\nTo exercise all kernels, type\n```sh\n./scripts/small/runall\n./scripts/wide/runall\n```\n\n# Quality Control\n\nWe have a rather massive test matrix running in Travis CI.\nUnfortunately, the Travis CI environment may vary with time and occasionally differs\nfrom what we are running locally, which makes debugging tricky.\nIf the status of the project is not passing, please inspect the [details](https://travis-ci.org/ParRes/Kernels),\nbecause this may not be an indication of an issue with our project, but rather\nsomething in Travis CI.\n\n# License\n\nSee [COPYING](https://github.com/ParRes/Kernels/blob/master/COPYING) for licensing information.\n\n## Note on stream\n\nNote that while our `nstream` operations are based on the well-known\nSTREAM benchmark by John D. McCalpin, we modified the source \ncode and do not follow the run-rules associated with this benchmark.\nHence, according to the rules defined in the STREAM license (see \nclause 3b), you must never report the results of our nstream \noperations as official \"STREAM Benchmark\" results. The results must \nbe clearly labelled whenever they are published.  
Examples of proper \nlabelling include: \n\n      \"tuned STREAM benchmark results\" \n      \"based on a variant of the STREAM benchmark code\" \n\nOther comparable, clear, and reasonable labelling is acceptable.\n","funding_links":[],"categories":["Software"],"sub_categories":["Trends"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FParRes%2FKernels","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FParRes%2FKernels","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FParRes%2FKernels/lists"}