{"id":16073324,"url":"https://github.com/larsgeb/fd-wave-modelling-gpu","last_synced_at":"2025-04-05T10:26:39.476Z","repository":{"id":106139847,"uuid":"163600492","full_name":"larsgeb/fd-wave-modelling-gpu","owner":"larsgeb","description":"Forward 2D elastic wave equation modelling using either OpenMP or OpenACC. Compiles with PGI compiler.","archived":false,"fork":false,"pushed_at":"2019-08-16T07:15:46.000Z","size":29,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-10T21:19:31.998Z","etag":null,"topics":["gpu-acceleration","nvidia-cuda","openacc","openmp","seismic-waves","wave-propagation"],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/larsgeb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-12-30T16:14:23.000Z","updated_at":"2022-06-16T06:25:07.000Z","dependencies_parsed_at":null,"dependency_job_id":"7e4cdb43-32f6-47bc-9307-75bfd736e31e","html_url":"https://github.com/larsgeb/fd-wave-modelling-gpu","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/larsgeb%2Ffd-wave-modelling-gpu","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/larsgeb%2Ffd-wave-modelling-gpu/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/larsgeb%2Ffd-wave-modelling-gpu/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/larsgeb%2Ffd-wave-modelling-gpu/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/larsgeb","download_url":"https://codeload.github.com/larsgeb/fd-wave-modelling-gpu/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247321400,"owners_count":20919984,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gpu-acceleration","nvidia-cuda","openacc","openmp","seismic-waves","wave-propagation"],"created_at":"2024-10-09T08:06:42.072Z","updated_at":"2025-04-05T10:26:39.456Z","avatar_url":"https://github.com/larsgeb.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# fd-wave-modelling-gpu\nForward 2D elastic wave equation modelling using either OpenMP or OpenACC. Compiles with the PGI compiler.\n\nCompilation is fairly easy with CMake. Just make sure to point to your C++ and C compilers in the CMakeLists.txt. Compilation is done by:\n\n```\n    $ cmake . -DFLOATS=OFF    // or -DFLOATS=ON\n    $ make gpuWave\n    $ make cpuWave\n```\n\nRunning the GPU code is straightforward:\n```\n    $ ./gpuWave\n```\nRunning the CPU code requires setting the OMP_NUM_THREADS environment variable to the number of threads you want (usually the number of physical, \nnot logical, cores in your machine). In my case, I use an Intel i7-8850H with 6 cores and 12 threads. Although I could use 12 threads, it probably won't be any \nfaster, as the process would be using all available physical cores anyway. 
If I want to use 6 threads for \njust one run:\n```\n    $ OMP_NUM_THREADS=6 ./cpuWave\n ```\n ## Main controls on speed\nGPUs are very fast in some very specific cases. They are fastest when there is a lot of work (computations) to be done, with limited memory \ncopies to the host machine. Conditional statements typically decrease GPU performance. However, porting the wave propagation code required minimal\nalteration from the CPU code. A single source can now be compiled for both targets.\n  \nGPUs are fastest when the blocks they work on are not so small that they must shift positions often, but also not so big that only a\nfew blocks fit in the computational domain. Very small physical problems will therefore likely be faster on the CPU.\n   \nThe type of computation performed also affects running time. GPUs are ideal for float operations, but are on par with CPUs on double operations. \nSee also the benchmarks below. \n \nFor longer computations, the GPU seems to perform better even on doubles. \n \n \n ## Benchmark\n Benchmark on a Dell Precision 5530 using a Quadro P2000 (4 GB) vs. an Intel i7-8850H, 16 GB RAM. The wave problem solved had dimensions:\n ```\n    nt = 250\n    nx = 4096\n    nz = 1024\n```\n The dimension nt affects run time only linearly and typically does not affect memory usage when wavefields are not stored.\n \n The number shown at the end of the computation is the sum over one array of the wavefield vx, printed to check that the computation is deterministic. Rerunning should give the same result, CPU/GPU should give the same result, double vs. 
float should not give the same result.\n \n \n **Using floats:**\n \n ```\n$ ./gpuWave \u0026\u0026 ./cpuWave \n\nOpenACC acceleration enabled from cmake, code should run on GPU.\nCode compiled with f (d for double, accurate, f for float, fast)\nSeconds elapsed for wave simulation: 2.27787\n-3.28162e-17\n\nOpenACC acceleration not enabled from cmake, code should run on CPU.\nCode compiled with f (d for double, accurate, f for float, fast)\nSeconds elapsed for wave simulation: 5.87679\n-3.28162e-17\n```\n**Using doubles:**\n```\n$ ./gpuWave \u0026\u0026 ./cpuWave \n\nOpenACC acceleration enabled from cmake, code should run on GPU.\nCode compiled with d (d for double, accurate, f for float, fast)\nSeconds elapsed for wave simulation: 7.25039\n-3.2829e-17\n\nOpenACC acceleration not enabled from cmake, code should run on CPU.\nCode compiled with d (d for double, accurate, f for float, fast)\nSeconds elapsed for wave simulation: 7.20166\n-3.2829e-17\n \n```\nAs expected, the two precisions give different but deterministic results, agreeing to within 1%.\n\n\nRunning nvidia-smi during a GPU run shows full utilization of the cores, but far from full utilization of memory:\n```\n+-----------------------------------------------------------------------------+\n| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |\n|-------------------------------+----------------------+----------------------+\n| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |\n| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. 
|\n|===============================+======================+======================|\n|   0  Quadro P2000        Off  | 00000000:01:00.0 Off |                  N/A |\n| N/A   54C    P0    N/A /  N/A |    747MiB /  4042MiB |    100%      Default |\n+-------------------------------+----------------------+----------------------+\n                                                                               \n+-----------------------------------------------------------------------------+\n| Processes:                                                       GPU Memory |\n|  GPU       PID   Type   Process name                             Usage      |\n|=============================================================================|\n|    0     xxxxx      G   ---- other processes ----                    166MiB |\n|    0     xxxxx      G   ---- other processes ----                     84MiB |\n|    0     xxxxx      G   ---- other processes ----                      4MiB |\n|    0     xxxxx      G   ---- other processes ----                     44MiB |\n|    0     xxxxx      G   ---- other processes ----                     35MiB |\n|    0     28689      C   ./gpuWave                                    401MiB |\n+-----------------------------------------------------------------------------+\n```\n\n\n**Large computations: GPU outperforms CPU on doubles**\n\nRerunning with:\n ```\n    nt = 2500   // This changed\n    nx = 4096\n    nz = 1024\n```\nGives:\n```\n$ ./gpuWave \u0026\u0026 ./cpuWave \n\nOpenACC acceleration enabled from cmake, code should run on GPU.\nCode compiled with d (d for double, accurate, f for float, fast)\nSeconds elapsed for wave simulation: 64.0353\n-5.00058e-20\n\nOpenACC acceleration not enabled from cmake, code should run on CPU.\nCode compiled with d (d for double, accurate, f for float, fast)\nSeconds elapsed for wave simulation: 78.9906\n-5.00058e-20\n```\nFaster on GPU!\n\nAlso on floats of course:\n```\n$ ./gpuWave \u0026\u0026 ./cpuWave 
\n\nOpenACC acceleration enabled from cmake, code should run on GPU.\nCode compiled with f (d for double, accurate, f for float, fast)\nSeconds elapsed for wave simulation: 20.8963\n-7.10227e-20\n\nOpenACC acceleration not enabled from cmake, code should run on CPU.\nCode compiled with f (d for double, accurate, f for float, fast)\nSeconds elapsed for wave simulation: 62.6243\n-7.10227e-20\n```\nMind the strong deviation in deterministic sums.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flarsgeb%2Ffd-wave-modelling-gpu","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flarsgeb%2Ffd-wave-modelling-gpu","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flarsgeb%2Ffd-wave-modelling-gpu/lists"}