https://github.com/pkestene/mandelbrot_kokkos
- Host: GitHub
- URL: https://github.com/pkestene/mandelbrot_kokkos
- Owner: pkestene
- Created: 2016-11-03T21:03:27.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2017-05-10T01:55:12.000Z (over 8 years ago)
- Last Synced: 2025-02-10T22:33:56.772Z (8 months ago)
- Topics: cuda, gpu, gpu-computing, kokkos, mandelbrot, openmp, performance-portability
- Language: C++
- Homepage:
- Size: 14.6 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
A minimalistic Kokkos example that computes the Mandelbrot set and illustrates asynchronous memory copy (i.e. overlap between a computational functor and a deep copy operation).
Four versions are provided:
* basic: the Mandelbrot set is computed in a single Kokkos functor. Can be used with either the OpenMP or the CUDA backend.
* basic_mdrange: same as the basic version, but illustrates the use of Kokkos::Experimental::MDRange.
* pipeline0: performs the computation piece by piece.
* pipeline1 (only meaningful when built with CUDA+OpenMP): performs the computation piece by piece, but the loop over the pieces is parallelized with an OpenMP Kokkos functor, so that different pieces can be computed in different CUDA streams.

# What is Kokkos?
A modern C++-based programming model for HPC applications, designed for portability across multiple hardware architectures (multicore CPUs, GPUs, KNL, Power8, ...) while delivering the best possible performance on each of them.
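The basic version computes the whole image in a single Kokkos functor. The sketch below shows, in plain standard C++ with no Kokkos dependency, what such a per-pixel functor might look like; it is not the repository's code. The names `MandelbrotFunctor` and `compute_mandelbrot`, the complex-plane window [-2, 1] x [-1.5, 1.5], and the raw pointer standing in for a `Kokkos::View` are all assumptions. The serial loop stands in for a `Kokkos::parallel_for` over `width * height` indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain C++ sketch (no Kokkos dependency) of a Mandelbrot "functor": one call
// to operator()(index) computes the escape-time count of one pixel.
// The image is a flat, row-major array of width * height ints.
struct MandelbrotFunctor {
  int width;
  int height;
  int max_iter;
  int* image;  // hypothetical raw pointer standing in for a Kokkos::View

  void operator()(int index) const {
    const int i = index % width;  // column
    const int j = index / width;  // row
    // Map pixel (i, j) onto the rectangle [-2, 1] x [-1.5, 1.5] (assumed window).
    const double cx = -2.0 + 3.0 * i / width;
    const double cy = -1.5 + 3.0 * j / height;
    double x = 0.0, y = 0.0;
    int iter = 0;
    // Iterate z <- z^2 + c until |z| > 2 or the iteration budget is exhausted.
    while (x * x + y * y <= 4.0 && iter < max_iter) {
      const double xnew = x * x - y * y + cx;
      y = 2.0 * x * y + cy;
      x = xnew;
      ++iter;
    }
    image[index] = iter;
  }
};

// Serial stand-in for Kokkos::parallel_for(width * height, functor).
std::vector<int> compute_mandelbrot(int width, int height, int max_iter) {
  assert(width > 0 && height > 0 && max_iter > 0);
  std::vector<int> image(static_cast<std::size_t>(width) * height, 0);
  const MandelbrotFunctor functor{width, height, max_iter, image.data()};
  for (int index = 0; index < width * height; ++index) {
    functor(index);  // each index is independent, hence trivially parallel
  }
  return image;
}
```

For example, `compute_mandelbrot(4, 4, 100)` yields 1 for the top-left pixel (c = -2 - 1.5i escapes after one iteration) and 100 for pixel index 10 (c = -0.5, inside the set). With a rank-2 MDRange policy, as in basic_mdrange, the functor would receive (i, j) directly instead of recovering them from a flat index.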
# Build the basic version
0. Install [Kokkos](https://github.com/kokkos/kokkos)
   * the Kokkos backend can be CUDA, OpenMP, ...
   * the compiler can be nvcc_wrapper, g++, xlc++, ...
1. Set the environment variable KOKKOS_PATH to the root directory where Kokkos is installed
2. `cd basic; make`
3. Run `./mandelbrot.omp` (or `./mandelbrot.cuda`)
With the default parameters (8192x8192 image), timings of the basic version:
* Nvidia K80: 1.5 seconds
* Power8 (g++ 4.8.5):
  * 20 threads: 50.1 seconds
  * 40 threads: 27.7 seconds
  * 60 threads: 16.9 seconds
  * 160 threads: 10.5 seconds
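To put the timings above in perspective, the helpers below compute speedup, efficiency relative to the 20-thread run, and throughput in megapixels per second from the quoted numbers. This is plain C++ arithmetic, not part of the repository; the names `Timing`, `kPower8`, `speedup`, `relative_efficiency`, `mpixels_per_second` and `print_power8_scaling` are hypothetical.

```cpp
#include <cassert>
#include <cstdio>

// Timings quoted above for the basic version (8192x8192 image, Power8, g++ 4.8.5).
struct Timing {
  int threads;
  double seconds;
};

constexpr Timing kPower8[] = {{20, 50.1}, {40, 27.7}, {60, 16.9}, {160, 10.5}};

// Speedup of a run taking t seconds relative to a reference run taking t_ref.
double speedup(double t_ref, double t) { return t_ref / t; }

// Efficiency relative to the reference run: speedup divided by the thread ratio.
double relative_efficiency(const Timing& ref, const Timing& run) {
  assert(ref.threads > 0 && run.threads > 0);
  const double thread_ratio = static_cast<double>(run.threads) / ref.threads;
  return speedup(ref.seconds, run.seconds) / thread_ratio;
}

// Throughput in megapixels per second for an n x n image.
double mpixels_per_second(int n, double seconds) {
  return static_cast<double>(n) * n / seconds / 1e6;
}

// Prints one line per Power8 run, using the 20-thread run as the reference.
void print_power8_scaling() {
  const Timing& ref = kPower8[0];
  for (const Timing& run : kPower8) {
    std::printf("%3d threads: %5.1f s, speedup %.2f, rel. efficiency %.2f, %.1f Mpixel/s\n",
                run.threads, run.seconds, speedup(ref.seconds, run.seconds),
                relative_efficiency(ref, run), mpixels_per_second(8192, run.seconds));
  }
}
```

From these figures, going from 20 to 160 threads gives a speedup of about 4.8 (relative efficiency about 0.60 for 8 times as many threads), and the K80 run (about 44.7 Mpixel/s) is 7 times faster than the 160-thread Power8 run (about 6.4 Mpixel/s).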
NB: version pipeline1 requires Kokkos to be configured with both the CUDA and OpenMP backends, and with lambda support enabled. Example configuration command line:

`generate_makefile.bash --with-cuda --arch=${YOUR_CUDA_ARCH} --prefix=${SOMEWHERE} --with-cuda-options=enable_lambda --with-openmp`
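The piece-by-piece idea behind the pipeline versions, i.e. overlapping the computation of one piece with the copy of the previous one, can be sketched in plain C++ as below. This is an illustration under stated assumptions, not the repository's implementation: a `std::vector` plays the role of device memory, `std::async` plays the role of an asynchronous deep copy (no CUDA streams are involved), the image is split into horizontal bands of rows, and `height` is assumed divisible by `num_pieces`. The names `compute_rows` and `pipelined_mandelbrot` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical per-piece compute step: fills rows [row_begin, row_end) of a
// row-major "device" buffer with escape-time counts.
void compute_rows(std::vector<int>& device, int width, int height, int max_iter,
                  int row_begin, int row_end) {
  for (int j = row_begin; j < row_end; ++j) {
    for (int i = 0; i < width; ++i) {
      const double cx = -2.0 + 3.0 * i / width;
      const double cy = -1.5 + 3.0 * j / height;
      double x = 0.0, y = 0.0;
      int iter = 0;
      while (x * x + y * y <= 4.0 && iter < max_iter) {
        const double xnew = x * x - y * y + cx;
        y = 2.0 * x * y + cy;
        x = xnew;
        ++iter;
      }
      device[static_cast<std::size_t>(j) * width + i] = iter;
    }
  }
}

// Pipelined driver: the copy of piece k to the host buffer runs in the
// background while piece k + 1 is being computed.
std::vector<int> pipelined_mandelbrot(int width, int height, int max_iter, int num_pieces) {
  assert(num_pieces > 0 && height % num_pieces == 0);
  const int rows = height / num_pieces;
  std::vector<int> device(static_cast<std::size_t>(width) * height);
  std::vector<int> host(device.size());

  // Copies piece k (rows [k * rows, (k + 1) * rows)) from "device" to host.
  auto copy_piece = [&](int k) {
    const std::ptrdiff_t offset = static_cast<std::ptrdiff_t>(k) * rows * width;
    const std::ptrdiff_t count = static_cast<std::ptrdiff_t>(rows) * width;
    std::copy(device.begin() + offset, device.begin() + offset + count,
              host.begin() + offset);
  };

  std::future<void> pending_copy;
  for (int k = 0; k < num_pieces; ++k) {
    compute_rows(device, width, height, max_iter, k * rows, (k + 1) * rows);
    if (pending_copy.valid()) pending_copy.get();  // copy of piece k - 1 must finish
    pending_copy = std::async(std::launch::async, copy_piece, k);  // overlaps next compute
  }
  if (pending_copy.valid()) pending_copy.get();  // drain the last copy
  return host;
}
```

`pipelined_mandelbrot(8, 8, 50, 4)` should produce the same image as computing all rows at once; only the scheduling differs. Compute and copy touch disjoint row bands, so they can safely run concurrently. In the real pipeline1 version the pieces are additionally spread over several CUDA streams by an OpenMP loop, which this sketch does not model.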