Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
- Host: GitHub
- URL: https://github.com/fzj-jsc/tutorial-multi-gpu
- Owner: FZJ-JSC
- License: MIT
- Created: 2021-09-23T08:12:06.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2025-12-03T14:53:48.000Z (2 months ago)
- Last Synced: 2025-12-06T19:23:31.923Z (2 months ago)
- Topics: cuda, exascale-computing, gpu, hpc, isc22, isc23, isc24, isc25, mpi, multi-gpu, nccl, nvshmem, sc21, sc22, sc23, sc24, sc25, supercomputing
- Language: Cuda
- Homepage:
- Size: 182 MB
- Stars: 335
- Watchers: 13
- Forks: 68
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
- Zenodo: .zenodo.json
# SC25 Tutorial: Efficient Distributed GPU Programming for Exascale
[DOI: 10.5281/zenodo.5745504](https://doi.org/10.5281/zenodo.5745504)
Repository with the talks and exercises of our *Efficient Distributed GPU Programming for Exascale* tutorial, to be held at [SC25](https://sc25.conference-program.com/presentation/?id=tut113&sess=sess252).
## Coordinates
* Date: 16 November 2025
* Occasion: SC25 Tutorial
* Tutors: Simon Garcia de Gonzalo (SNL), Andreas Herten (JSC), Lena Oden (Uni Hagen), David Appelhans (NVIDIA); with support by Markus Hrywniak (NVIDIA) and Jiri Kraus (NVIDIA)
## Setup
The tutorial is interactive, combining introductory lectures with practical exercises in which the presented techniques are applied. The exercises are derived from the Jacobi solver implementations available in [NVIDIA/multi-gpu-programming-models](https://github.com/NVIDIA/multi-gpu-programming-models); a sketch of the underlying stencil update follows below.
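For orientation, here is a minimal sketch of the kind of single-GPU Jacobi update the exercises start from. It is illustrative only, not the exact exercise code; the names `a`, `a_new`, `nx`, `iy_start`, and `iy_end` are assumptions.

```cuda
// Illustrative 5-point Jacobi update of rows [iy_start, iy_end) of an nx-wide grid.
// The hands-on exercises extend this single-GPU pattern step by step to multiple GPUs.
__global__ void jacobi_kernel(float* __restrict__ a_new, const float* __restrict__ a,
                              int nx, int iy_start, int iy_end) {
    const int ix = blockIdx.x * blockDim.x + threadIdx.x + 1;          // skip left boundary column
    const int iy = blockIdx.y * blockDim.y + threadIdx.y + iy_start;   // first row handled by this launch
    if (ix < nx - 1 && iy < iy_end) {
        a_new[iy * nx + ix] = 0.25f * (a[iy * nx + ix + 1] + a[iy * nx + ix - 1] +
                                       a[(iy + 1) * nx + ix] + a[(iy - 1) * nx + ix]);
    }
}
```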
Walk-through (only possible on-site at SC25!):
* Sign up at JuDoor
* Open Jupyter JSC: https://jupyter.jsc.fz-juelich.de
* Create new Jupyter instance on [JUWELS, using training26XX account, on **LoginNodeBooster**](https://jupyter.jsc.fz-juelich.de/workshops/sc25mg)
* Source course environment: `source $PROJECT_training26XX/env.sh`
* Sync material: `jsc-material-sync`
* Locally install NVIDIA Nsight Systems: https://developer.nvidia.com/nsight-systems
## Curriculum
1. Lecture: Tutorial Overview, Introduction to System + Onboarding *Andreas*
2. Lecture: MPI-Distributed Computing with GPUs *Simon*
3. Hands-on: Multi-GPU Parallelization
4. Lecture: Performance / Debugging Tools *David*
5. Lecture: Optimization Techniques for Multi-GPU Applications *Simon*
6. Hands-on: Overlap Communication and Computation with MPI (see the overlap sketch below)
7. Lecture: Overview of NCCL and NVSHMEM in MPI *Lena*
8. Hands-on: Using NCCL and NVSHMEM (see the NCCL sketch below)
9. Lecture: Device-initiated Communication with NVSHMEM *David*
10. Hands-on: Using Device-Initiated Communication with NVSHMEM (see the NVSHMEM sketch below)
11. Lecture: Conclusion and Outline of Advanced Topics *Andreas*
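A hedged sketch of the pattern practised in the overlap hands-on (item 6): the boundary rows are updated on their own stream and exchanged with CUDA-aware MPI while the much larger interior update runs concurrently. It reuses the illustrative `jacobi_kernel` from the Setup section; the 1D row decomposition (`iy_start`/`iy_end`), the neighbour ranks `top`/`bottom`, and the helper name are assumptions, not the exercise code:

```cuda
#include <mpi.h>
#include <cuda_runtime.h>

__global__ void jacobi_kernel(float*, const float*, int, int, int);  // as sketched in the Setup section

// One Jacobi iteration with communication/computation overlap (assumes CUDA-aware MPI
// and device pointers a / a_new that include one halo row on each side of [iy_start, iy_end)).
void jacobi_step_overlapped(float* a_new, const float* a, int nx, int iy_start, int iy_end,
                            int top, int bottom,
                            cudaStream_t compute_stream, cudaStream_t comm_stream) {
    const dim3 block(32, 8);
    const dim3 grid_boundary((nx + block.x - 1) / block.x, 1);
    const dim3 grid_interior((nx + block.x - 1) / block.x,
                             (iy_end - iy_start - 2 + block.y - 1) / block.y);

    // Update the two boundary rows first, on their own stream, so they can be sent early.
    jacobi_kernel<<<grid_boundary, block, 0, comm_stream>>>(a_new, a, nx, iy_start, iy_start + 1);
    jacobi_kernel<<<grid_boundary, block, 0, comm_stream>>>(a_new, a, nx, iy_end - 1, iy_end);
    // Update the (much larger) interior concurrently.
    jacobi_kernel<<<grid_interior, block, 0, compute_stream>>>(a_new, a, nx, iy_start + 1, iy_end - 1);

    // Halo exchange with CUDA-aware MPI while the interior kernel is still running.
    cudaStreamSynchronize(comm_stream);  // boundary rows must be complete before sending
    MPI_Sendrecv(a_new + iy_start * nx,       nx, MPI_FLOAT, top,    0,
                 a_new + iy_end * nx,         nx, MPI_FLOAT, bottom, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(a_new + (iy_end - 1) * nx,   nx, MPI_FLOAT, bottom, 0,
                 a_new + (iy_start - 1) * nx, nx, MPI_FLOAT, top,    0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    cudaStreamSynchronize(compute_stream);  // interior done; the caller swaps a and a_new
}
```

The small boundary update and the halo exchange are thereby taken off the critical path of the interior update; a profiler such as Nsight Systems (see the tools lecture) makes the resulting overlap visible in the timeline.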
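For the NCCL part (items 7 and 8), a common pattern is to bootstrap the NCCL communicator out of MPI and then run collectives asynchronously on a CUDA stream. A minimal sketch, assuming one GPU per MPI rank; the helper names `make_nccl_comm` and `allreduce_residual` are illustrative:

```cuda
#include <mpi.h>
#include <nccl.h>
#include <cuda_runtime.h>

// Create a NCCL communicator spanning all ranks of an existing MPI communicator.
ncclComm_t make_nccl_comm(MPI_Comm mpi_comm) {
    int rank, size;
    MPI_Comm_rank(mpi_comm, &rank);
    MPI_Comm_size(mpi_comm, &size);

    ncclUniqueId id;
    if (rank == 0) ncclGetUniqueId(&id);                 // rank 0 creates the unique id...
    MPI_Bcast(&id, sizeof(id), MPI_BYTE, 0, mpi_comm);   // ...and MPI distributes it to all ranks

    ncclComm_t comm;
    ncclCommInitRank(&comm, size, id, rank);
    return comm;
}

// Sum a single-element device residual across all ranks, in place, on the given stream.
void allreduce_residual(float* d_residual, ncclComm_t comm, cudaStream_t stream) {
    ncclAllReduce(d_residual, d_residual, 1, ncclFloat, ncclSum, comm, stream);
    cudaStreamSynchronize(stream);  // only needed if the host reads the result right away
}
```

Unlike a blocking `MPI_Allreduce` on a host value, the NCCL call is enqueued on a stream, so it can be ordered and overlapped like any other GPU work.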
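For device-initiated communication (items 9 and 10), the sketch below follows NVSHMEM's canonical ring-shift example: each PE writes its ID into a symmetric buffer on its right neighbour, with the put issued from inside the kernel. Build details (device linking with `nvcc -rdc=true`, linking against NVSHMEM, launching via an NVSHMEM-aware launcher) are system-dependent and not shown:

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <nvshmemx.h>

// Each PE pushes its ID to the next PE in a ring; the put is initiated on the GPU.
__global__ void push_to_neighbour(int* destination) {
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    int peer = (mype + 1) % npes;
    nvshmem_int_p(destination, mype, peer);   // one-sided, device-initiated put
}

int main() {
    nvshmem_init();
    cudaSetDevice(nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE));  // one GPU per PE on the node

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    int* destination = (int*)nvshmem_malloc(sizeof(int));   // symmetric allocation on all PEs

    push_to_neighbour<<<1, 1, 0, stream>>>(destination);
    nvshmemx_barrier_all_on_stream(stream);                 // make all puts visible before reading

    int msg;
    cudaMemcpyAsync(&msg, destination, sizeof(int), cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);
    printf("PE %d received %d\n", nvshmem_my_pe(), msg);

    nvshmem_free(destination);
    nvshmem_finalize();
    return 0;
}
```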