https://github.com/larsgeb/fd-wave-modelling-gpu

Forward 2D elastic wave equation modelling using either OpenMP or OpenACC. Compiles with PGI compiler.

gpu-acceleration nvidia-cuda openacc openmp seismic-waves wave-propagation

Host: GitHub
URL: https://github.com/larsgeb/fd-wave-modelling-gpu
Owner: larsgeb
License: bsd-3-clause
Created: 2018-12-30T16:14:23.000Z (about 7 years ago)
Default Branch: master
Last Pushed: 2019-08-16T07:15:46.000Z (over 6 years ago)
Last Synced: 2025-02-10T21:19:31.998Z (about 1 year ago)
Topics: gpu-acceleration, nvidia-cuda, openacc, openmp, seismic-waves, wave-propagation
Language: C++
Size: 28.3 KB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# fd-wave-modelling-gpu

Forward 2D elastic wave equation modelling using either OpenMP or OpenACC. Compiles with PGI compiler.

Compilation is fairly easy with CMake. Just make sure CMakeLists.txt points to your C++ and C compilers. Then compile with:

```
    $ cmake . -DFLOATS=OFF    # or -DFLOATS=ON
    $ make gpuWave
    $ make cpuWave
```

Running the GPU code is straightforward:

```
    $ ./gpuWave
```

Running the CPU code requires setting the OMP_NUM_THREADS environment variable to your preferred thread count (usually the number of physical, not logical, cores in your machine). In my case, I use an Intel i7-8850H with 6 cores and 12 threads. Although I could use 12 threads, it probably won't be any faster, as 6 threads already occupy all available physical cores. To use 6 threads for just one run:

```
    $ OMP_NUM_THREADS=6 ./cpuWave
```
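
To check which thread count a run will actually pick up, a small helper can report it. This is a minimal sketch, not code from this repository: the function names `omp_thread_count` and `logical_core_count` are hypothetical, and the OpenMP call is only used when the compiler defines `_OPENMP`.

```cpp
#include <thread>
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of threads an OpenMP parallel region would use by default.
// With OpenMP enabled this reflects OMP_NUM_THREADS; otherwise it is 1.
int omp_thread_count() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Logical cores as reported by the standard library (0 if unknown).
// Note that this counts hardware threads, not physical cores.
unsigned logical_core_count() {
    return std::thread::hardware_concurrency();
}
```

Printing both values at start-up makes it easy to see whether the thread count matches the physical core count you intended.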

 ## Main controls on speed

GPUs are very fast in some very specific cases. They are fastest when there is a lot of work (computation) to be done with few memory copies to the host machine. Conditional statements typically decrease GPU performance. However, porting the wave propagation code required minimal alteration of the CPU code. One source code can now be compiled for both targets.
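
As an illustration of how one source can serve both targets, the sketch below guards the two directive styles with the standard `_OPENACC` and `_OPENMP` macros. It is an assumption about the general pattern, not the repository's actual kernel: the `USE_FLOATS` macro, the `real` alias, and the `add_scaled` function are hypothetical names.

```cpp
// Precision switch; USE_FLOATS is a hypothetical macro standing in for
// whatever the FLOATS CMake option defines.
#ifdef USE_FLOATS
using real = float;
#else
using real = double;
#endif

// Adds dt * accel[i] to vx[i] for every grid point.
// The same loop body is offloaded with OpenACC, threaded with OpenMP,
// or run serially, depending on how the file is compiled.
void add_scaled(real* vx, const real* accel, real dt, int n) {
#if defined(_OPENACC)
#pragma acc parallel loop copy(vx[0:n]) copyin(accel[0:n])
#elif defined(_OPENMP)
#pragma omp parallel for
#endif
    for (int i = 0; i < n; ++i) {
        vx[i] += dt * accel[i];
    }
}
```

The per-call `copy` clauses keep the sketch short. A time-stepping code would more likely keep the arrays resident on the device with an enclosing `acc data` region, so that copies to the host stay rare.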

  

GPUs are fastest when the blocks they work on are not so small that they must shift position often, but also not so big that only a few blocks fit in the computational domain. Very small physical problems will therefore likely run faster on the CPU.

   

The type of computation performed also affects running time. GPUs are ideal for float operations, but are on par with CPUs on double operations. See also the benchmarks below.

For extended computations, the GPU seems to perform better even on doubles.

 

 

 ## Benchmark

Benchmark on a Dell Precision 5530, comparing a Quadro P2000 (4 GB) against an Intel i7-8850H with 16 GB RAM. The wave problem solved had dimensions:

```
    nt = 250
    nx = 4096
    nz = 1024
```

The dimension nt affects run time only linearly, and typically does not affect memory usage when wavefields are not stored.
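
The memory side of this can be made concrete with a little arithmetic. The sketch below computes the size of a single nx-by-nz field; `field_bytes` is a hypothetical helper, and the number of fields the solver allocates is not stated here, so no total is derived.

```cpp
#include <cstddef>

// Bytes needed for one nx-by-nz field with the given element size.
// nt does not appear: without stored wavefields, each field is
// overwritten in place at every time step.
constexpr std::size_t field_bytes(std::size_t nx, std::size_t nz,
                                  std::size_t elem_size) {
    return nx * nz * elem_size;
}
```

For the grid above, 4096 x 1024 = 4,194,304 points, so one field takes 32 MiB in double precision (8 bytes per value) and 16 MiB in single precision (4 bytes per value).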

 

The number shown at the end of the computation is the sum over one array of the wavefield, vx, used to check that the computation is deterministic. Rerunning should give the same result, CPU and GPU should give the same result, and double vs. float should not give the same result.
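
Such a check value can be sketched as a plain sum over the field. This is an assumption about the idea, not the repository's implementation: the `checksum` name is hypothetical, and the serial loop with a double accumulator is chosen here only because a fixed summation order gives a repeatable result.

```cpp
#include <cstddef>

// Sums all n values of a field in a fixed order.
// Accumulating in double keeps the sketch independent of the
// precision the field itself is stored in.
template <typename Real>
double checksum(const Real* field, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += static_cast<double>(field[i]);
    }
    return sum;
}
```

Calling `checksum(vx, nx * nz)` after the last time step and printing the result gives one number that can be compared between runs.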

 

 

 **Using floats:**

 

```
$ ./gpuWave && ./cpuWave
OpenACC acceleration enabled from cmake, code should run on GPU.
Code compiled with f (d for double, accurate, f for float, fast)
Seconds elapsed for wave simulation: 2.27787
-3.28162e-17
OpenACC acceleration not enabled from cmake, code should run on CPU.
Code compiled with f (d for double, accurate, f for float, fast)
Seconds elapsed for wave simulation: 5.87679
-3.28162e-17
```

**Using doubles:**

```
$ ./gpuWave && ./cpuWave
OpenACC acceleration enabled from cmake, code should run on GPU.
Code compiled with d (d for double, accurate, f for float, fast)
Seconds elapsed for wave simulation: 7.25039
-3.2829e-17
OpenACC acceleration not enabled from cmake, code should run on CPU.
Code compiled with d (d for double, accurate, f for float, fast)
Seconds elapsed for wave simulation: 7.20166
-3.2829e-17
```

As expected, different precisions give different deterministic results, to within 1%.
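
The size of that difference can be checked from the two printed sums. The helper below is a minimal sketch with a hypothetical name, `relative_difference`, that measures the gap relative to the second value.

```cpp
#include <cmath>

// Relative difference of a with respect to the reference value b.
double relative_difference(double a, double b) {
    return std::fabs(a - b) / std::fabs(b);
}
```

With the float sum -3.28162e-17 and the double sum -3.2829e-17 from the runs above, `relative_difference(-3.28162e-17, -3.2829e-17)` comes out at about 0.04%, well inside the 1% bound.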

Running nvidia-smi during a GPU run shows full GPU utilization (100%), but memory use far below capacity (747 MiB of 4042 MiB):

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P2000        Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   54C    P0    N/A /  N/A |    747MiB /  4042MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     xxxxx      G   ---- other processes ----                    166MiB |
|    0     xxxxx      G   ---- other processes ----                     84MiB |
|    0     xxxxx      G   ---- other processes ----                      4MiB |
|    0     xxxxx      G   ---- other processes ----                     44MiB |
|    0     xxxxx      G   ---- other processes ----                     35MiB |
|    0     28689      C   ./gpuWave                                    401MiB |
+-----------------------------------------------------------------------------+
```

**Large computations: GPU outperforms CPU on doubles**

Rerunning with:

```
    nt = 2500   // This changed
    nx = 4096
    nz = 1024
```

Gives:

```
$ ./gpuWave && ./cpuWave
OpenACC acceleration enabled from cmake, code should run on GPU.
Code compiled with d (d for double, accurate, f for float, fast)
Seconds elapsed for wave simulation: 64.0353
-5.00058e-20
OpenACC acceleration not enabled from cmake, code should run on CPU.
Code compiled with d (d for double, accurate, f for float, fast)
Seconds elapsed for wave simulation: 78.9906
-5.00058e-20
```

Faster on GPU!

Also on floats of course:

```
$ ./gpuWave && ./cpuWave
OpenACC acceleration enabled from cmake, code should run on GPU.
Code compiled with f (d for double, accurate, f for float, fast)
Seconds elapsed for wave simulation: 20.8963
-7.10227e-20
OpenACC acceleration not enabled from cmake, code should run on CPU.
Code compiled with f (d for double, accurate, f for float, fast)
Seconds elapsed for wave simulation: 62.6243
-7.10227e-20
```

Mind the strong deviation between the float and double sums in this longer run (-7.10227e-20 vs. -5.00058e-20).
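
The timings printed above can be reduced to speedup factors. The helper is a minimal sketch with a hypothetical name, `speedup`. It divides CPU time by GPU time, so values above 1 mean the GPU was faster.

```cpp
// Ratio of CPU run time to GPU run time; > 1 means the GPU was faster.
constexpr double speedup(double cpu_seconds, double gpu_seconds) {
    return cpu_seconds / gpu_seconds;
}
```

Applied to the four runs, this gives roughly 2.6x for floats and 0.99x for doubles at nt = 250, and roughly 3.0x for floats and 1.2x for doubles at nt = 2500.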
