https://github.com/piyueh/score_es_benchmark

Self-contained reproducibility package for the following poster:
> P.-Y. Chuang, N. Shah, A. Attia, E. Constantinescu, & W. Feng (2023). A Recipe
> for Transforming High-Productivity DSL Code into High-Performance Code for
> Exascale Systems. _2023 SciDAC PI Meeting_

#### Disclaimer

The results are mostly computational-performance measurements. Therefore,
running on different hardware may yield different values in the figures.

#### License Notice

No license. Except for the implicit permissions granted by GitHub's Terms of
Service, all other uses of this repository require explicit permission from the
author.

## Preparation

Most of the dependencies can be managed with Anaconda or its derived tools.
From the top-level repository directory, create an environment named
`scoring-test` with the following command in the terminal:
```shell
conda env create -n scoring-test -f environment.yaml
```

Then activate the conda environment:
```shell
conda activate scoring-test
```

Next, compile the required C extensions. Both the GCC and Clang compilers are
needed because the benchmark also compares their performance. With both
compilers available on the host computer, run
```shell
source build.sh
```
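
For context, the essence of such a dual-compiler build is to compile the same C extensions once per compiler, typically by pointing the `CC` environment variable at each compiler in turn. The snippet below is only an illustrative sketch under that assumption (the `setup.py`-based build is hypothetical); `build.sh` is the authoritative build procedure.
```python
# Illustrative sketch only: build the same C extensions with GCC and then Clang
# by setting the CC environment variable (assumes a hypothetical setuptools-based
# setup.py; the repository's build.sh is authoritative).
import os
import subprocess

for compiler in ("gcc", "clang"):
    env = dict(os.environ, CC=compiler)  # setuptools/distutils honor CC for C extensions
    subprocess.run(
        ["python", "setup.py", "build_ext", "--inplace"],
        env=env,
        check=True,  # abort if either compiler fails
    )
```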

Lastly, create the training data (i.e., mock experimental data):
```shell
python data_generator.py surface
python data_generator.py training
```
The data will be saved to a folder called `data`.
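
As a quick sanity check (not part of the original workflow), you can list what the generator wrote to `data`; the exact file names and formats are whatever `data_generator.py` produces.
```python
# Quick sanity check (not part of the original workflow): list the generated
# files; the exact names and formats are whatever data_generator.py writes.
from pathlib import Path

for path in sorted(Path("data").iterdir()):
    print(f"{path.name}: {path.stat().st_size / 1e6:.1f} MB")
```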

## Speedups of the loss function alone

These steps create the figure of the speedups of the loss function alone.

First, do the calculation to obtain the results:
```shell
mpiexec -n <N> python earth_mover_perf.py
```
`<N>` is the desired number of MPI tasks. Note that this is single-core
profiling, so when launching the MPI tasks, make sure they are configured
(e.g., pinned) so that the tasks do not interfere with one another. Also,
several of the single-core target functions require as much as 30 GB of memory.
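
Conceptually, the profiling follows a simple pattern: each MPI rank times one target implementation on a single core, and rank 0 gathers the timings. Below is a minimal sketch of that pattern with `mpi4py`; the function and names are placeholders, not the actual code in `earth_mover_perf.py`.
```python
# Minimal sketch of the single-core-per-rank timing pattern (placeholder names;
# the actual benchmark is in earth_mover_perf.py).
import time

from mpi4py import MPI


def loss_variant(rank: int) -> float:
    """Placeholder for the loss implementation assigned to this rank."""
    return float(sum(i * i for i in range(10_000_000)))  # stand-in workload


comm = MPI.COMM_WORLD

start = time.perf_counter()
loss_variant(comm.rank)  # each rank runs its variant on a single core
elapsed = time.perf_counter() - start

timings = comm.gather(elapsed, root=0)  # rank 0 collects every rank's timing
if comm.rank == 0:
    for rank, seconds in enumerate(timings):
        print(f"rank {rank}: {seconds:.3f} s")
```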

To save the hassle, you can use the script `earth_mover_perf.sh` to submit the
calculation to a Slurm-based cluster. The resources (task topology, memory,
etc.) are pre-configured, but you may still need to edit the script to match
the actual configuration of your cluster.

Then, create the figure with
```shell
python postprocess_1.py
```
The figure will be saved to `images/earth_mover_speedups.png`.

## Surface of the loss function

These steps create the figure of the loss-function surface.

This step is parallelized not with MPI but with Python's built-in `multiprocessing`. Do
```shell
python earth_mover_surface.py
```
That being said, this script only runs on a single node, so the number of
worker processes cannot exceed what a single node provides.
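
The usual single-node pattern here is a `multiprocessing.Pool` that maps the loss over a grid of parameter samples. The following is only a hedged sketch of that pattern; the loss and the grid are placeholders, not the actual code in `earth_mover_surface.py`.
```python
# Sketch of the single-node pattern: map a loss over a parameter grid with a
# process pool (placeholder loss and grid; the actual code is in
# earth_mover_surface.py).
import itertools
import multiprocessing as mp


def loss(params):
    """Placeholder for the actual loss evaluation."""
    a, b = params
    return (a - 1.0) ** 2 + (b + 2.0) ** 2


if __name__ == "__main__":
    grid = list(itertools.product(range(-5, 6), range(-5, 6)))
    with mp.Pool(processes=4) as pool:  # worker count is bounded by one node's cores
        surface = pool.map(loss, grid)
    print(f"evaluated {len(surface)} grid points")
```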

Once the calculation is done, create the figure with
```shell
python postprocess_2.py
```
The figure will be saved to `images/earth_mover_surface.png`.

## Whole-workflow profiling before and after optimizing the loss

This step creates two figures: the whole-workflow profiling before optimization
and the overall speedups after optimization.

Do the calculations one by one with
```shell
python stats_workflow_perf.py small numpy
python stats_workflow_perf.py small c
python stats_workflow_perf.py medium numpy
python stats_workflow_perf.py medium c
python stats_workflow_perf.py large c
python stats_workflow_perf.py large numpy
```
Note that `python stats_workflow_perf.py large numpy` needs about 80 GB of
memory and takes about 3.5 hours to finish on an AMD EPYC 7702. Alternatively,
you can use the script `stats_workflow_perf.sh` to submit a batch job to a
Slurm-based cluster. Again, the resources are pre-configured, but you may still
need to edit the script to match your cluster.
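
Each invocation above picks a problem size (`small`, `medium`, or `large`) and a loss implementation (`numpy` or `c`). A hypothetical sketch of that kind of command-line dispatch is shown below; the real logic lives in `stats_workflow_perf.py`.
```python
# Hypothetical sketch of the size/backend dispatch implied by the commands above
# (the real implementation is stats_workflow_perf.py).
import argparse


def run(size: str, backend: str) -> None:
    """Placeholder for the profiled workflow."""
    print(f"profiling the {size} problem with the {backend} loss implementation")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("size", choices=("small", "medium", "large"))
    parser.add_argument("backend", choices=("numpy", "c"))
    args = parser.parse_args()
    run(args.size, args.backend)
```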

Once the calculations are done, create the figures with
```shell
python postprocess_3.py
python postprocess_4.py
```
The figures will be saved to `images/profiling_before.png` and
`images/overall_speedups.png`.

## Ensemble analysis

This step creates the figure of the ensemble analysis.

This calculation is parallelized with MPI. Do
```shell
mpiexec -n <N> python stats_workflow_uq.py
```
Again, `<N>` denotes the desired number of MPI processes. The calculation took
about 70 minutes with 512 CPU cores (AMD EPYC 7702). Alternatively, the script
`stats_workflow_uq.sh` can be used to submit the job to a Slurm-based cluster;
it needs to be edited according to the cluster's configuration.
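
The ensemble evaluation follows the common pattern of splitting ensemble members across MPI ranks and gathering the results on rank 0. Below is a minimal sketch of that pattern; the member count and model evaluation are placeholders, and the actual analysis is in `stats_workflow_uq.py`.
```python
# Minimal sketch of distributing ensemble members across MPI ranks (placeholder
# model and ensemble size; the actual analysis is in stats_workflow_uq.py).
from mpi4py import MPI


def evaluate_member(member_id: int) -> float:
    """Placeholder for evaluating one ensemble member."""
    return float(member_id) ** 0.5


comm = MPI.COMM_WORLD
n_members = 1000  # placeholder ensemble size

my_members = range(comm.rank, n_members, comm.size)  # round-robin assignment
my_results = [evaluate_member(m) for m in my_members]

gathered = comm.gather(my_results, root=0)  # rank 0 collects all chunks
if comm.rank == 0:
    results = [r for chunk in gathered for r in chunk]
    print(f"collected {len(results)} ensemble results")
```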

Once the calculation is done, do
```shell
python postprocess_5.py
```
The figure will be saved to `images/ensemble_analysis.png`.

## Contact

Pi-Yueh Chuang