# SC25 Tutorial: Efficient Distributed GPU Programming for Exascale

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5745504.svg)](https://doi.org/10.5281/zenodo.5745504)

Repository with talks and exercises of our tutorial *Efficient Distributed GPU Programming for Exascale*, to be held at [SC25](https://sc25.conference-program.com/presentation/?id=tut113&sess=sess252).

## Coordinates

* Date: 16 November 2025
* Occasion: SC25 Tutorial
* Tutors: Simon Garcia de Gonzalo (SNL), Andreas Herten (JSC), Lena Oden (Uni Hagen), David Appelhans (NVIDIA); with support by Markus Hrywniak (NVIDIA) and Jiri Kraus (NVIDIA)

## Setup

The tutorial is interactive, combining introductory lectures with practical hands-on exercises to apply the presented material. The exercises are derived from the Jacobi solver implementations available in [NVIDIA/multi-gpu-programming-models](https://github.com/NVIDIA/multi-gpu-programming-models).
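
As a rough orientation, the communication pattern at the heart of these exercises is a halo exchange between neighbouring ranks of a 1-D, row-wise domain decomposition. The sketch below is purely illustrative (the helper and names such as `d_a`, `nx`, `chunk_rows` are not taken from the exercise code) and assumes a CUDA-aware MPI, so device pointers can be passed to MPI calls directly:

```cuda
// Minimal sketch: halo exchange for a 1-D (row-wise) domain decomposition.
// Assumes a CUDA-aware MPI so device pointers can be passed to MPI calls.
// Names (d_a, nx, chunk_rows) are illustrative, not the tutorial's code.
#include <mpi.h>
#include <cuda_runtime.h>

void exchange_halos(double *d_a, int nx, int chunk_rows,
                    int rank, int size, MPI_Comm comm)
{
    const int top    = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    const int bottom = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    // Rows 1 .. chunk_rows are interior; rows 0 and chunk_rows+1 are halos.
    double *first_row = d_a + 1 * nx;                  // send up
    double *last_row  = d_a + chunk_rows * nx;         // send down
    double *top_halo  = d_a;                           // receive from above
    double *bot_halo  = d_a + (chunk_rows + 1) * nx;   // receive from below

    MPI_Sendrecv(first_row, nx, MPI_DOUBLE, top,    0,
                 bot_halo,  nx, MPI_DOUBLE, bottom, 0, comm, MPI_STATUS_IGNORE);
    MPI_Sendrecv(last_row,  nx, MPI_DOUBLE, bottom, 1,
                 top_halo,  nx, MPI_DOUBLE, top,    1, comm, MPI_STATUS_IGNORE);
}
```

On a multi-GPU node, each rank would additionally pin itself to one GPU (e.g. `cudaSetDevice` based on the node-local rank) before allocating `d_a`.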

Walk-through (only possible on-site at SC25!):

* Sign up at JuDoor
* Open Jupyter JSC: https://jupyter.jsc.fz-juelich.de
* Create new Jupyter instance on [JUWELS, using training26XX account, on **LoginNodeBooster**](https://jupyter.jsc.fz-juelich.de/workshops/sc25mg)
* Source course environment: `source $PROJECT_training26XX/env.sh`
* Sync material: `jsc-material-sync`
* Locally install NVIDIA Nsight Systems: https://developer.nvidia.com/nsight-systems

## Curriculum

1. Lecture: Tutorial Overview, Introduction to System + Onboarding *Andreas*
2. Lecture: MPI-Distributed Computing with GPUs *Simon*
3. Hands-on: Multi-GPU Parallelization
4. Lecture: Performance / Debugging Tools *David*
5. Lecture: Optimization Techniques for Multi-GPU Applications *Simon*
6. Hands-on: Overlap Communication and Computation with MPI (see sketch below)
7. Lecture: Overview of NCCL and NVSHMEM in MPI *Lena*
8. Hands-on: Using NCCL and NVSHMEM (see NCCL sketch below)
9. Lecture: Device-initiated Communication with NVSHMEM *David*
10. Hands-on: Using Device-Initiated Communication with NVSHMEM (see NVSHMEM sketch below)
11. Lecture: Conclusion and Outline of Advanced Topics *Andreas*
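
The sketches below illustrate, in a deliberately simplified and non-authoritative form, the communication techniques exercised in the hands-on sessions; all names are placeholders rather than the tutorial's actual code. For the overlap hands-on, the usual idea is to update the boundary rows first on a separate (ideally high-priority) stream, exchange halos while the interior update still runs, and only then synchronize. This sketch reuses the `exchange_halos()` helper from above and an illustrative `jacobi_kernel` updating a given row range:

```cuda
// Sketch: overlapping halo exchange with interior computation using two CUDA
// streams. Reuses the exchange_halos() helper sketched above; all names are
// illustrative. Assumes the same (chunk_rows + 2) x nx row-major local layout.
#include <mpi.h>
#include <cuda_runtime.h>

// Illustrative 5-point Jacobi update of rows [row_begin, row_end).
__global__ void jacobi_kernel(double *a_new, const double *a_old,
                              int nx, int row_begin, int row_end)
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int iy = blockIdx.y * blockDim.y + threadIdx.y + row_begin;
    if (ix >= 1 && ix < nx - 1 && iy < row_end)
        a_new[iy * nx + ix] = 0.25 * (a_old[iy * nx + ix - 1] + a_old[iy * nx + ix + 1]
                                    + a_old[(iy - 1) * nx + ix] + a_old[(iy + 1) * nx + ix]);
}

void exchange_halos(double *, int, int, int, int, MPI_Comm); // from the sketch above

void step_overlapped(double *d_new, const double *d_old, int nx, int chunk_rows,
                     int rank, int size, cudaStream_t compute_stream,
                     cudaStream_t comm_stream, dim3 grid_bnd, dim3 grid_int, dim3 block)
{
    // 1. Boundary rows first, on the communication stream.
    jacobi_kernel<<<grid_bnd, block, 0, comm_stream>>>(d_new, d_old, nx, 1, 2);
    jacobi_kernel<<<grid_bnd, block, 0, comm_stream>>>(d_new, d_old, nx,
                                                       chunk_rows, chunk_rows + 1);
    // 2. Interior rows concurrently on the compute stream.
    jacobi_kernel<<<grid_int, block, 0, compute_stream>>>(d_new, d_old, nx, 2, chunk_rows);

    // 3. As soon as the boundary rows are done, exchange halos with MPI while
    //    the interior kernel keeps the GPU busy.
    cudaStreamSynchronize(comm_stream);
    exchange_halos(d_new, nx, chunk_rows, rank, size, MPI_COMM_WORLD);

    // 4. Wait for the interior update before swapping buffers / next iteration.
    cudaStreamSynchronize(compute_stream);
}
```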
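
With NCCL, the same exchange can be expressed as stream-ordered point-to-point operations (`ncclSend`/`ncclRecv` fused with `ncclGroupStart`/`ncclGroupEnd`), so communication is enqueued on a CUDA stream instead of blocking the host. A minimal sketch, assuming an already initialized `ncclComm_t` with one rank per GPU and the same local layout as above:

```cuda
// Sketch: halo exchange expressed with NCCL point-to-point calls.
// Assumes an initialized ncclComm_t; buffer names are illustrative.
#include <nccl.h>
#include <cuda_runtime.h>

void exchange_halos_nccl(double *d_a, int nx, int chunk_rows,
                         int rank, int size, ncclComm_t comm, cudaStream_t stream)
{
    const int top    = rank - 1;   // neighbour above (if any)
    const int bottom = rank + 1;   // neighbour below (if any)

    ncclGroupStart();              // fuse sends/recvs to avoid deadlock
    if (top >= 0) {
        ncclSend(d_a + 1 * nx,                nx, ncclDouble, top,    comm, stream);
        ncclRecv(d_a,                         nx, ncclDouble, top,    comm, stream);
    }
    if (bottom < size) {
        ncclSend(d_a + chunk_rows * nx,       nx, ncclDouble, bottom, comm, stream);
        ncclRecv(d_a + (chunk_rows + 1) * nx, nx, ncclDouble, bottom, comm, stream);
    }
    ncclGroupEnd();                // communication executes on `stream`
}
```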
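
Device-initiated communication with NVSHMEM goes one step further: the kernel itself writes boundary values into the neighbour PE's halo on the symmetric heap, taking the host out of the communication path. A minimal sketch, assuming `d_a` was allocated with `nvshmem_malloc()`, every PE uses the same local layout as above, and the names are again illustrative:

```cuda
// Sketch: device-initiated halo exchange with NVSHMEM. Each thread writes its
// boundary value straight into the neighbour PE's halo row. Assumes d_a lives
// on the symmetric heap (nvshmem_malloc) with identical layout on every PE.
#include <nvshmem.h>
#include <nvshmemx.h>

__global__ void push_halos(double *d_a, int nx, int chunk_rows, int pe, int npes)
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    if (ix >= nx) return;

    if (pe > 0)          // my first interior row -> upper PE's bottom halo
        nvshmem_double_p(d_a + (chunk_rows + 1) * nx + ix,
                         d_a[1 * nx + ix], pe - 1);
    if (pe < npes - 1)   // my last interior row -> lower PE's top halo
        nvshmem_double_p(d_a + 0 * nx + ix,
                         d_a[chunk_rows * nx + ix], pe + 1);
}

// Host side (illustrative): launch, then make the puts visible before reuse, e.g.
//   push_halos<<<grid, block>>>(d_a, nx, chunk_rows, nvshmem_my_pe(), nvshmem_n_pes());
//   nvshmem_barrier_all();
```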