https://github.com/pkestene/mandelbrot_kokkos

cuda gpu gpu-computing kokkos mandelbrot openmp performance-portability


Host: GitHub
URL: https://github.com/pkestene/mandelbrot_kokkos
Owner: pkestene
Created: 2016-11-03T21:03:27.000Z (almost 9 years ago)
Default Branch: master
Last Pushed: 2017-05-10T01:55:12.000Z (over 8 years ago)
Last Synced: 2025-02-10T22:33:56.772Z (8 months ago)
Topics: cuda, gpu, gpu-computing, kokkos, mandelbrot, openmp, performance-portability
Language: C++
Homepage:
Size: 14.6 KB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

A minimalistic Kokkos example that computes the Mandelbrot set and illustrates asynchronous memory copy (i.e. overlap between a computational functor and a deep copy operation).

Four versions are provided:

* basic : the Mandelbrot set is computed in a single Kokkos functor. Can be used
with either the OpenMP or the CUDA backend.

* basic_mdrange : same as basic, but written to illustrate the use of Kokkos::Experimental::MDRange

* pipeline0 : performs the computation piece by piece

* pipeline1 (only meaningful when used with CUDA+OpenMP) : performs the computation piece by piece, but the loop over the pieces
is parallelized using an OpenMP Kokkos functor, so that the different pieces
can be computed in different CUDA streams.
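
To make the functor and piece-by-piece ideas concrete, here is a minimal serial sketch in plain C++ (no Kokkos dependency, so it is not the repository's code). The names `MandelbrotFunctor` and `render_in_pieces`, the mapped region of the complex plane, and the color encoding are all assumptions chosen for illustration. In a Kokkos version, the `operator()` would typically be marked `KOKKOS_INLINE_FUNCTION`, the inner loop would become a `Kokkos::parallel_for`, and the image would be stored in a `Kokkos::View`.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Per-pixel escape-time kernel, written as a functor over a flat pixel index
// (the same call shape a Kokkos functor uses). All names are hypothetical.
struct MandelbrotFunctor {
    unsigned char* image;  // row-major output, one byte per pixel
    int width;
    int height;
    int max_iter;          // assumed iteration cap

    void operator()(int index) const {
        const int i = index % width;  // column
        const int j = index / width;  // row
        // Assumed mapping of the image onto [-2, 1] x [-1.5, 1.5].
        const double cx = -2.0 + 3.0 * i / width;
        const double cy = -1.5 + 3.0 * j / height;
        double zx = 0.0, zy = 0.0;
        int n = 0;
        // Iterate z <- z^2 + c until |z| > 2 or the cap is reached.
        while (n < max_iter && zx * zx + zy * zy <= 4.0) {
            const double tmp = zx * zx - zy * zy + cx;
            zy = 2.0 * zx * zy + cy;
            zx = tmp;
            ++n;
        }
        // Points that never escaped are treated as inside the set (255).
        image[index] =
            static_cast<unsigned char>(n == max_iter ? 255 : n % 255);
    }
};

// Piece-by-piece driver in the spirit of pipeline0: the rows are split into
// num_pieces contiguous blocks that are computed one after the other.
std::vector<unsigned char> render_in_pieces(int width, int height,
                                            int max_iter, int num_pieces) {
    std::vector<unsigned char> image(
        static_cast<std::size_t>(width) * height, 0);
    MandelbrotFunctor functor{image.data(), width, height, max_iter};
    const int rows_per_piece = (height + num_pieces - 1) / num_pieces;
    for (int p = 0; p < num_pieces; ++p) {
        const int row_begin = p * rows_per_piece;
        const int row_end = std::min(height, row_begin + rows_per_piece);
        // Serial stand-in for a parallel dispatch over this piece.
        for (int index = row_begin * width; index < row_end * width; ++index) {
            functor(index);
        }
        // A pipelined GPU version would start copying this piece back to the
        // host here while the next piece is being computed.
    }
    return image;
}
```

Calling `render_in_pieces(8192, 8192, 256, 8)` would render the default image size in eight pieces. The iteration cap and piece count in that call are illustrative values, not the repository's defaults. Splitting the work changes only the schedule, so the resulting image is the same for any number of pieces.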

# What is Kokkos?

A modern C++ programming model for HPC applications, designed for portability across multiple hardware architectures (multicore, GPU, KNL, Power8, ...) while remaining as efficient as possible on each of them.

# Build the basic version

0. Install [Kokkos](https://github.com/kokkos/kokkos)

* Kokkos backend can be CUDA, OpenMP, ...
* Compiler can be nvcc_wrapper, g++, xlc++, ...

1. Set the environment variable KOKKOS_PATH to the root directory where Kokkos is installed

2. cd basic; make

3. Run the executable:

./mandelbrot.omp (or ./mandelbrot.cuda)

With default parameters (image of size 8192x8192), run times of the basic version:
* Nvidia K80 : 1.5 seconds
* Power8 (g++ 4.8.5)
  * 20 threads : 50.1 seconds
  * 40 threads : 27.7 seconds
  * 60 threads : 16.9 seconds
  * 160 threads : 10.5 seconds

NB: the pipeline1 version requires Kokkos to be configured with both the CUDA and OpenMP
backends, and with lambda functions enabled. Example configuration command line:

generate_makefile.bash --with-cuda --arch=${YOUR_CUDA_ARCH} --prefix=${SOMEWHERE} --with-cuda-options=enable_lambda --with-openmp
