https://github.com/marcotallone/collective-operations-latency

Modelling of MPI collective operations latencies: Broadcast and Reduce operations. UniTS, SDIC, 2023-2024

broadcast collective-communication hpc hpc-cluster latency modelling mpi osu reduce



Host: GitHub
URL: https://github.com/marcotallone/collective-operations-latency
Owner: marcotallone
License: mit
Created: 2024-06-12T19:16:57.000Z (almost 2 years ago)
Default Branch: master
Last Pushed: 2024-06-14T06:30:20.000Z (almost 2 years ago)
Last Synced: 2025-03-01T03:29:01.901Z (over 1 year ago)
Topics: broadcast, collective-communication, hpc, hpc-cluster, latency, modelling, mpi, osu, reduce
Language: Jupyter Notebook
Homepage:
Size: 18.1 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE



[![Forks][forks-shield]][forks-url]
[![Stargazers][stars-shield]][stars-url]
[![Issues][issues-shield]][issues-url]
[![MIT License][license-shield]][license-url]
[![LinkedIn][linkedin-shield]][linkedin-url]
[![Gmail][gmail-shield]][gmail-url]

Modeling of MPI Collective Operations

A case study of the MPI Broadcast and MPI Reduce collective operations.


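📑 Table of Contents

- [About The Project](#about-the-project)
  - [Built With](#built-with)
- [Getting Started](#getting-started)
  - [Prerequisites](#prerequisites)
  - [Installation](#installation)
- [Usage](#usage)
- [License](#license)
- [Contact](#contact)
- [References](#references)
- [Acknowledgments](#acknowledgments)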

## About The Project

This repository contains the collected data and the analysis of the latencies of the `MPI_Bcast` and `MPI_Reduce` collective operations. This study was conducted as part of the final exam for the High Performance Computing (HPC) course held at the University of Trieste (**U**ni**TS**) during the academic year 2023-2024.\
The data were mainly collected on `EPYC` nodes of the **ORFEO** cluster at AREA Science Park, Basovizza (TS), in January/February 2024 using the well-known `OSU` benchmark suite, and are available in the `datasets/` folder. These nodes are equipped with AMD EPYC 7H12 (Rome) processors.\
More in-depth information and further details are available in the [attached report](./mpi-collective-op-marco-tallone.pdf) in this repository.\
This repository also contains a `Python` package named `epyc`, which implements a small simulation model of `EPYC` nodes that was used to carry out the computations needed for the analysis.
The module is essentially a collection of classes and methods that allow the user to simulate MPI core allocation on a real `EPYC` machine. In particular, it allows the user to create `Node` objects and to initialize a given number of processes on them according to different mapping policies, as done by the `--map-by` option of Open MPI's `mpirun` command.\
The module can also simulate the latency of the `MPI_Bcast` and `MPI_Reduce` collective operations on the `EPYC` nodes, using the data collected on the **ORFEO** cluster. The latency is predicted with a point-to-point communication model; once more, further details on the model are available in the report in this repository. The data were collected by submitting several jobs, which can be found in the `jobs/` folder.\
The module also offers a few utility functions to plot the collected latency data and to perform statistical analysis on it.\
Most of the implemented functions and classes are documented, so further information about their inputs and usage can be obtained with Python's built-in `help` function.\
Some usage examples can be found in the `apps/` folder or in the Jupyter notebooks in the `notebooks/` folder.
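
The mapping policies mentioned above can be illustrated with a short, standalone sketch. It is not the `epyc` API: the function `place` and its policy names are hypothetical, and the node geometry (2 sockets with 64 cores each, i.e. 128 cores per node) is assumed from the AMD EPYC 7H12 nodes and from the example output shown in the Usage section. Each rank is assigned a `(node, socket, core)` triple.

```python
from typing import List, Tuple

# Assumed node geometry (hypothetical constants): 2 sockets x 64 cores.
SOCKETS = 2
CORES_PER_SOCKET = 64


def place(n_procs: int, policy: str, n_nodes: int = 2) -> List[Tuple[int, int, int]]:
    """Return a (node, socket, core) triple for each rank 0..n_procs-1.

    Hypothetical sketch of three placement policies (not the epyc API):
      - "core":   fill node 0 core by core, then node 1, and so on
      - "socket": alternate between the sockets of a node, then move on
      - "node":   alternate between nodes (round-robin)
    """
    per_node = SOCKETS * CORES_PER_SOCKET
    if n_procs > n_nodes * per_node:
        raise ValueError("not enough cores for the requested processes")
    placement = []
    for rank in range(n_procs):
        if policy == "core":
            node, local = divmod(rank, per_node)
            socket, core = divmod(local, CORES_PER_SOCKET)
        elif policy == "socket":
            node, local = divmod(rank, per_node)
            # consecutive ranks alternate between socket 0 and socket 1
            core, socket = divmod(local, SOCKETS)
        elif policy == "node":
            local, node = divmod(rank, n_nodes)
            # within a node, the ranks it receives fill cores sequentially
            socket, core = divmod(local, CORES_PER_SOCKET)
        else:
            raise ValueError(f"unknown policy: {policy}")
        placement.append((node, socket, core))
    return placement


if __name__ == "__main__":
    # Last four ranks of a 132-process run mapped by socket
    for rank, where in enumerate(place(132, "socket")[-4:], start=128):
        print(rank, where)
```

With this sketch, `place(132, "socket")` puts ranks 0, 2, ..., 126 on socket 0 and ranks 1, 3, ..., 127 on socket 1 of node 0, then ranks 128 to 131 on node 1, which matches the layout printed by `apps/example.py`. How Open MPI actually places and binds ranks also depends on options such as `--bind-to` and `--rank-by`, which this sketch ignores.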

### Built With

![C](https://img.shields.io/badge/C-CBF9DA?style=for-the-badge&logo=c&logoColor=darkblue)
![Python](https://img.shields.io/badge/Python-3DD2CC?style=for-the-badge&logo=python&logoColor=white)
![NeoVim](https://img.shields.io/badge/NeoVim-3E6B89?style=for-the-badge&logo=neovim&logoColor=white)
![Bash](https://img.shields.io/badge/Bash-122D42?style=for-the-badge&logo=gnu-bash&logoColor=white)


## Getting Started

To use the implemented `epyc` module, follow the steps below. It is also recommended to take a look at the scripts in the `apps/` folder for [usage examples](./apps/example.py).


### Prerequisites

The following are needed to repeat the measurements and the data analysis:

* `Python 3.10` or higher
* `OSU MPI Benchmarks` installed on the target machine (version 7.3)
* `OpenMPI` library installed on the target machine
* Access to the **ORFEO** cluster or any equivalent HPC platform with the **SLURM** scheduler
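
As a rough illustration of how the measurements could be launched and post-processed, the sketch below builds an Open MPI `mpirun` command line for the OSU `osu_bcast`/`osu_reduce` benchmarks and parses their plain-text output. It is not taken from the `jobs/` scripts: the helper names are hypothetical, the benchmark executables are assumed to be on the `PATH`, and the output is assumed to use the default layout (comment lines starting with `#`, then rows with the message size and the average latency in microseconds). The MCA parameters `coll_tuned_use_dynamic_rules` and `coll_tuned_<collective>_algorithm` select a specific algorithm of Open MPI's `tuned` collective component; the numeric algorithm IDs can differ between Open MPI versions, so they should be checked with `ompi_info --param coll tuned --level 9`.

```python
import shlex
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def osu_command(
    n_procs: int,
    map_by: str,
    collective: str = "bcast",
    algorithm: Optional[int] = None,
    max_size: int = 1 << 20,
) -> List[str]:
    """Build an mpirun command line for osu_bcast or osu_reduce (hypothetical helper)."""
    if collective not in ("bcast", "reduce"):
        raise ValueError(f"unsupported collective: {collective}")
    # --map-by core|socket|node chooses the process placement policy
    cmd = ["mpirun", "-np", str(n_procs), "--map-by", map_by]
    if algorithm is not None:
        # Force a specific algorithm of Open MPI's "tuned" collective component
        cmd += [
            "--mca", "coll_tuned_use_dynamic_rules", "1",
            "--mca", f"coll_tuned_{collective}_algorithm", str(algorithm),
        ]
    # -m MIN:MAX sets the range of message sizes (in bytes) to measure
    cmd += [f"osu_{collective}", "-m", f"1:{max_size}"]
    return cmd


def parse_osu(text: str) -> List[Tuple[int, float]]:
    """Return (size in bytes, average latency in us) rows from OSU output."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        rows.append((int(fields[0]), float(fields[1])))
    return rows


def summarize(runs: Iterable[str]) -> Dict[int, Tuple[float, float]]:
    """Mean and sample standard deviation of the latency per message size,
    across repeated runs of the same benchmark configuration."""
    by_size: Dict[int, List[float]] = defaultdict(list)
    for text in runs:
        for size, latency in parse_osu(text):
            by_size[size].append(latency)
    return {
        size: (statistics.mean(v), statistics.stdev(v) if len(v) > 1 else 0.0)
        for size, v in sorted(by_size.items())
    }


if __name__ == "__main__":
    print(shlex.join(osu_command(256, "core", "bcast", algorithm=1)))
```

Inside a SLURM allocation, the returned list could be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`, and the `stdout` of each repetition fed to `summarize`.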

### Installation

The module comes with a `setup.py` file in the root directory, so it can be installed in editable mode with the following command:

```bash
pip install -e .
```

from the root directory of the project.
After that, the module can be imported in any Python script or notebook with the following command:

```python
import epyc
```

Alternatively, to also use the utility functions, one can import:

```python
from epyc import *
from utils import *
```

Alternatively, the modules can be used without installing them by manually updating the `PYTHONPATH` environment variable before running the scripts or notebooks.


## Usage

The `example.py` script in the `apps/` folder contains some [usage examples](apps/example.py) of the implemented classes and methods. The script can be run with the following command:

```bash
python apps/example.py
```

from the root directory. Running the script will produce the following output:

```terminal
Now these nodes are empty:

Node 0:
┌──────────────────┐ ┌──────────────────┐
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
└──────────────────┘ └──────────────────┘

Node 1:
┌──────────────────┐ ┌──────────────────┐
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
└──────────────────┘ └──────────────────┘
Now we initialize the nodes with 2 processes each and mapby node:

Node 0:
┌──────────────────┐ ┌──────────────────┐
│ ✅⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
└──────────────────┘ └──────────────────┘

Node 1:
┌──────────────────┐ ┌──────────────────┐
│ ✅⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
│ ⬛⬛⬛⬛⬛⬛⬛⬛ │ │ ⬛⬛⬛⬛⬛⬛⬛⬛ │
└──────────────────┘ └──────────────────┘
Now we re-allocate the processes with socket mapping and going up to 132 processes
Let's see where each process is:

Node 0:
┌─────────────────────────────────┐ ┌─────────────────────────────────┐
│ 0 2 4 6 8 10 12 14 │ │ 1 3 5 7 9 11 13 15 │
│ 16 18 20 22 24 26 28 30 │ │ 17 19 21 23 25 27 29 31 │
│ 32 34 36 38 40 42 44 46 │ │ 33 35 37 39 41 43 45 47 │
│ 48 50 52 54 56 58 60 62 │ │ 49 51 53 55 57 59 61 63 │
│ 64 66 68 70 72 74 76 78 │ │ 65 67 69 71 73 75 77 79 │
│ 80 82 84 86 88 90 92 94 │ │ 81 83 85 87 89 91 93 95 │
│ 96 98 100 102 104 106 108 110 │ │ 97 99 101 103 105 107 109 111 │
│ 112 114 116 118 120 122 124 126 │ │ 113 115 117 119 121 123 125 127 │
└─────────────────────────────────┘ └─────────────────────────────────┘

Node 1:
┌─────────────────────────────────┐ ┌─────────────────────────────────┐
│ 128 130 │ │ 129 131 │
│ │ │ │
│ │ │ │
│ │ │ │
│ │ │ │
│ │ │ │
│ │ │ │
│ │ │ │
└─────────────────────────────────┘ └─────────────────────────────────┘
We now re-allocated the node filling them completely with 256 processes and mapby core
In fact here is their status:

Node 0:
┌──────────────────┐ ┌──────────────────┐
│ ✅✅✅✅✅✅✅✅ │ │ ✅✅✅✅✅✅✅✅ │
│ ✅✅✅✅✅✅✅✅ │ │ ✅✅✅✅✅✅✅✅ │
│ ✅✅✅✅✅✅✅✅ │ │ ✅✅✅✅✅✅✅✅ │
│ ✅✅✅✅✅✅✅✅ │ │ ✅✅✅✅✅✅✅✅ │
│ ✅✅✅✅✅✅✅✅ │ │ ✅✅✅✅✅✅✅✅ │
│ ✅✅✅✅✅✅✅✅ │ │ ✅✅✅✅✅✅✅✅ │
│ ✅✅✅✅✅✅✅✅ │ │ ✅✅✅✅✅✅✅✅ │
│ ✅✅✅✅✅✅✅✅ │ │ ✅✅✅✅✅✅✅✅ │
└──────────────────┘ └──────────────────┘
Active cores: 128 / 128
Empty cores: 0 /128
Active sockets: 2 / 2
Empty sockets: 0 / 2

Node 1:
┌──────────────────┐ ┌──────────────────┐
│ ✅✅✅✅✅✅✅✅ │ │ ✅✅✅✅✅✅✅✅ │
│ ✅✅✅✅✅✅✅✅ │ │ ✅✅✅✅✅✅✅✅ │
│ ✅✅✅✅✅✅✅✅ │ │ ✅✅✅✅✅✅✅✅ │
│ ✅✅✅✅✅✅✅✅ │ │ ✅✅✅✅✅✅✅✅ │
│ ✅✅✅✅✅✅✅✅ │ │ ✅✅✅✅✅✅✅✅ │
│ ✅✅✅✅✅✅✅✅ │ │ ✅✅✅✅✅✅✅✅ │
│ ✅✅✅✅✅✅✅✅ │ │ ✅✅✅✅✅✅✅✅ │
│ ✅✅✅✅✅✅✅✅ │ │ ✅✅✅✅✅✅✅✅ │
└──────────────────┘ └──────────────────┘
Active cores: 128 / 128
Empty cores: 0 /128
Active sockets: 2 / 2
Empty sockets: 0 / 2
We can simulate different collective operations seeing their latency, for instance here for a message size of 1B:
- linear broadcast latency: 50.573211704623475 us
- chain broadcast latency: 55.030291447907615 us
- binary broadcast latency: 11.795976127944595 us
- linear reduce latency: 0.29013244483611006 us
- chain reduce latency: 34.846116497184674 us
- binary reduce latency: 1.785089908861254 us
```
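
The output above compares linear, chain and binary-tree broadcasts. The sketch below shows, under simplifying assumptions, how such estimates can be assembled from point-to-point latencies; it is not necessarily the model described in the report. It assumes a hypothetical function `p2p(i, j)` that gives the latency of a single message from rank `i` to rank `j`, and that each process performs its sends one after the other with no overlap. `toy_p2p` uses made-up costs for same-socket, cross-socket and cross-node messages, assuming ranks mapped by core on 128-core nodes; real values would come from measurements such as those in `datasets/`.

```python
from typing import Callable, Dict

# Hypothetical point-to-point model: p2p(i, j) = latency (us) from rank i to rank j
P2P = Callable[[int, int], float]


def linear_bcast(n: int, p2p: P2P) -> float:
    """Root 0 sends to every other rank, one message after the other."""
    return sum(p2p(0, j) for j in range(1, n))


def chain_bcast(n: int, p2p: P2P) -> float:
    """Each rank i-1 forwards the message to rank i after receiving it."""
    return sum(p2p(i - 1, i) for i in range(1, n))


def binary_bcast(n: int, p2p: P2P) -> float:
    """Rank i sends to its children 2i+1 and 2i+2, left child first.

    A child's arrival time is its parent's arrival time plus the parent's
    sends up to and including the one to that child. The broadcast ends
    when the last rank has received the message.
    """
    arrival: Dict[int, float] = {0: 0.0}
    for parent in range(n):  # a parent always has a smaller rank than its children
        t = arrival[parent]
        for child in (2 * parent + 1, 2 * parent + 2):
            if child < n:
                t += p2p(parent, child)
                arrival[child] = t
    return max(arrival.values())


def toy_p2p(i: int, j: int) -> float:
    """Made-up costs (NOT measured ORFEO values), ranks mapped by core."""
    if i // 128 != j // 128:
        return 2.0  # different nodes
    if (i % 128) // 64 != (j % 128) // 64:
        return 0.5  # same node, different sockets
    return 0.2  # same socket


if __name__ == "__main__":
    n = 256  # two fully populated nodes
    for name, model in (("linear", linear_bcast), ("chain", chain_bcast), ("binary", binary_bcast)):
        print(f"{name} broadcast: {model(n, toy_p2p):.2f} us (toy costs)")
```

In this simplified model a reduce can be treated analogously by reversing the direction of the messages (for example, a linear reduce sums `p2p(j, 0)` over all non-root ranks); the cost of the reduction arithmetic itself is ignored.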


## License

Distributed under the MIT License. See `LICENSE` for more information.


## Contact

| Contact Me | |
| --- | --- |
| Mail | [marcotallone85@gmail.com](mailto:marcotallone85@gmail.com) |
| LinkedIn | [LinkedIn Page](https://linkedin.com/in/marco-tallone-40312425b) |
| GitHub | [marcotallone](https://github.com/marcotallone) |


## References

- [OSU Benchmark](http://mvapich.cse.ohio-state.edu/benchmarks/)
- [MPI Forum](https://www.mpi-forum.org/)
- [AREA Science Park](https://www.area.trieste.it/it/infrastrutture/orfeo)
- [ORFEO Documentation](https://orfeo-doc.areasciencepark.it/)
- [HPC UniTS](https://github.com/Foundations-of-HPC/High-Performance-Computing-2023)


## Acknowledgments

* [Best-README-Template](https://github.com/othneildrew/Best-README-Template?tab=readme-ov-file)


[contributors-shield]: https://img.shields.io/github/contributors/marcotallone/collective-operations-latency.svg?style=for-the-badge
[contributors-url]: https://github.com/marcotallone/collective-operations-latency/graphs/contributors
[forks-shield]: https://img.shields.io/github/forks/marcotallone/collective-operations-latency.svg?style=for-the-badge
[forks-url]: https://github.com/marcotallone/collective-operations-latency/network/members
[stars-shield]: https://img.shields.io/github/stars/marcotallone/collective-operations-latency.svg?style=for-the-badge
[stars-url]: https://github.com/marcotallone/collective-operations-latency/stargazers
[issues-shield]: https://img.shields.io/github/issues/marcotallone/collective-operations-latency.svg?style=for-the-badge
[issues-url]: https://github.com/marcotallone/collective-operations-latency/issues
[license-shield]: https://img.shields.io/github/license/marcotallone/collective-operations-latency.svg?style=for-the-badge
[license-url]: https://github.com/marcotallone/collective-operations-latency/blob/master/LICENSE

[linkedin-shield]: https://img.shields.io/badge/-LinkedIn-blue?style=for-the-badge&logo=linkedin&logoColor=white&colorB=0077B5
[linkedin-url]: https://linkedin.com/in/marco-tallone-40312425b

[gmail-shield]: https://img.shields.io/badge/-Gmail-red?style=for-the-badge&logo=gmail&logoColor=white&colorB=red
[gmail-url]: mailto:marcotallone85@gmail.com
