https://github.com/hdfgroup/hdf5-iotest

HDF5 Performance Analysis Checklist
https://github.com/hdfgroup/hdf5-iotest
Last synced: 10 months ago
JSON representation
HDF5 Performance Analysis Checklist
Host: GitHub
URL: https://github.com/hdfgroup/hdf5-iotest
Owner: HDFGroup
License: other
Created: 2020-07-21T00:33:26.000Z (almost 6 years ago)
Default Branch: master
Last Pushed: 2024-12-23T23:27:45.000Z (over 1 year ago)
Last Synced: 2025-08-12T21:11:35.264Z (11 months ago)
Language: Jupyter Notebook
Homepage:
Size: 1.3 MB
Stars: 13
Watchers: 4
Forks: 9
Open Issues: 6
Metadata Files:
- Readme: README.org
- License: COPYING
Awesome Lists containing this project

README

          #+TITLE:  HDF5 I/O Test

#+AUTHOR: Gerd Heber

#+EMAIL: gheber@hdfgroup.org

#+DATE: [2020-12-30 Wed]

#+PROPERTY: header-args :eval never-export

* Description

This is a simple I/O performance tester for HDF5.  (See also [[https://github.com/ornladios/ADIOS2/tree/master/source/utils/adios_iotest][ADIOS IOTEST]].) Its

purpose is to assess the /performance variability/ of a set of logically

equivalent HDF5 representations of a common pattern. The test repeatedly writes

(and reads) in parallel a set of 2D array variables in a tiled fashion, over a

set of "time steps." A schematic is shown below.

#+begin_src plantuml :hidden :file ./img/flow.png :exports results

start

:read configuration;

repeat

  :set internal parameters;

  :create HDF5 file;

  repeat

    :[create group];

    repeat

      :[create dataset(s)];

      fork

        :write 2D tile 1;

      fork again

        :...;

      fork again

        :write 2D tile P;

      end fork

    repeat while (last 2D array?) is (no)

    ->yes;

  repeat while (last time step?) is (no)

  ->yes;

  :close file;

  :...

  read back the data

  **details not shown**;

  :report timings;

repeat while (last combination?) is (no)

->yes;

stop

#+end_src

#+RESULTS:

[[file:./img/flow.png]]

A configuration file (see below) is used to control parameters such as the

number of steps, the count and shape of the 2D array variables, etc. (See

section [[sec:parameters]].) The test then uses between 48 and 192 different

combinations of up to 7 internal parameters (see section [[sec:internal-parameters]])

to configure the HDF5 library and output, to perform write and read operations,

and to report certain timings. (The number of combinations is different for

sequential and parallel runs, and depends on other choices.)

For a /baseline/, this test can (and should!) be run with a single MPI process

using the POSIX, core, and MPI-IO VFDs. Typically, each baseline variant covers

a different aspect of the underlying system, its configuration, and the

particular HDF5 library version. Doing MPI-parallel runs without this baseline

is the proverbial "getting off on the wrong foot."

* Installation

While it can be built manually, we recommend building =hdf5_iotest= from [[https://computing.llnl.gov/projects/spack-hpc-package-manager][Spack]].

1. Create a new Spack package

   #+begin_src sh

   spack create --name hdf5iotest

   #+end_src

   This creates a boilerplate =package.py= file, which we'll need to edit.

2. Edit the freshly minted =package.py= via

   #+begin_src sh

   spack edit hdf5iotest

   #+end_src

   =package.py= should look like this:

   #+begin_src python

   # Copyright 2013-2020 Lawrence Livermore National Security, LLC and other

   # Spack Project Developers. See the top-level COPYRIGHT file for details.

   #

   # SPDX-License-Identifier: (Apache-2.0 OR MIT)

   from spack import *

   class Hdf5iotest(AutotoolsPackage):

       """

       A simple I/O performance test for HDF5.

       Run with 'hdf5_iotest '. A sample INI file called

       hdf5_iotest.ini can be found in the share/hdf5-iotest subdirectory.

       It is recommended to run multiple configurations. The combinator.sh

       script in share/hdf5-iotest creates 24 "standard" parameter cominations.

       """

       homepage = "https://github.com/HDFGroup/hdf5-iotest"

       git      = "https://github.com/HDFGroup/hdf5-iotest.git"

       maintainers = ['gheber']

       version('master', branch='master')

       depends_on('autoconf', type='build')

       depends_on('automake', type='build')

       depends_on('libtool',  type='build')

       depends_on('m4',       type='build')

       depends_on('mpi')

       depends_on('hdf5')

       depends_on('util-linux-uuid')

       def configure_args(self):

           args = []

           if "+gperftools" not in self.spec:

               args += ["--enable-gperftools"]

           return args

       def test(self):

           pass

   #+end_src

3. Show the explicit dependency configuration (libraries, versions, compiler, architecture, etc. ) via

   #+begin_src sh

   spack spec hdf5iotest

   #+end_src

   Sample output can be found in the [[sec:spack-spec-out][appendix]].

4. Install the hdf5iotest package and all dependencies

   #+begin_src sh

   spack install hdf5iotest

   #+end_src

** Installation w/o Spack

You need a working installation of parallel HDF5 (which depends on MPI).

#+begin_src sh

./autogen.sh

CC=h5pcc ./configure --prefix=

make

make install

#+end_src

* Usage

=hdf5_iotest= accepts a single argument, the name of a configuration file. If no

configuration file name is provided, it looks for a configuration named

=hdf5_iotest.ini= in the current working directory.

#+begin_src sh

hdf5_iotest [INI file]

#+end_src

The configuration file syntax is shown [[sec:parameters][below]]:

#+begin_src conf-unix :tangle src/hdf5_iotest.ini :noweb no-export

[DEFAULT]

<>

#+end_src

A complete configuration can be found [[https://raw.githubusercontent.com/HDFGroup/hdf5-iotest/master/src/hdf5_iotest.ini][here]].

** Custom Settings

Rather than modifying the =[DEFAULT]= section, we recommend that testers create

a new section, e.g., =[CUSTOM]=, and overwrite the default values as needed.

#+begin_example

[DEFAULT]

...

[CUSTOM]

...

#+end_example

Any parameter specified in the =[CUSTOM]= section overwrites its =[DEFAULT]=

counterpart.

** Parameters<>

The following configuration parameters are supported.

- Version :: The HDF5 I/O test configuration version

    #+begin_src conf-unix :noweb-ref hdf5-iotest-conf

 version = 0

    #+end_src

    Currently, 0 is the only valid version.

- Steps :: The number of steps or repetitions, a positive integer.

    #+begin_src conf-unix :noweb-ref hdf5-iotest-conf

 steps = 20

    #+end_src

- Number of 2D Array Variables :: The number of 2D array variables to be

  written, a positive integer.

    #+begin_src conf-unix :noweb-ref hdf5-iotest-conf

 arrays = 500

    #+end_src

- Array Rows :: HDF5 I/O test can be run in /strong/ or /weak/ scaling mode (see

  [[sec:scaling][below]]). For /strong/ scaling, this is the total number (across all MPI ranks)

  of rows of each 2D array variable. For /weak/ scaling, this is the number of

  rows per MPI process per 2D array variable.

    #+begin_src conf-unix :noweb-ref hdf5-iotest-conf

 rows = 100

    #+end_src

- Array Columns :: HDF5 I/O test can be run in /strong/ or /weak/ scaling mode

  (see [[sec:scaling][below]]). For /strong/ scaling, this is the total number (across all MPI

  ranks) of columns of each 2D array variable. For /weak/ scaling, this is the

  number of columns per MPI process per 2D array variable.

    #+begin_src conf-unix :noweb-ref hdf5-iotest-conf

 columns = 200

    #+end_src

- Number of MPI Process Rows :: HDF5 I/O test is run over a logical 2D grid

     of MPI processes. This is the number of MPI process rows.

    #+begin_src conf-unix :noweb-ref hdf5-iotest-conf

 process-rows = 1

    #+end_src

    For strong scaling, the =rows= must be divisible by =process-rows=.

- Number of MPI Process Columns :: HDF5 I/O test is run over a logical 2D grid

     of MPI processes. This is the number of MPI process columns.

    #+begin_src conf-unix :noweb-ref hdf5-iotest-conf

 process-columns = 1

    #+end_src

    For strong scaling, the =columns= parameter must be divisible by

  =process-columns=.

- Scaling<> :: HDF5 I/O test can be run with strong or weak

  scaling. In /strong scaling/ mode, the total amount of data written and read

  is independent of the number of MPI processes, i.e., the per process I/O share

  diminishes with an increase in the number of I/O processes. In /weak scaling/

  mode, the amount of data written and read by each MPI-process is kept

  constant, and the total I/O increases with the number of MPI processes.

    #+begin_src conf-unix :noweb-ref hdf5-iotest-conf

    # [weak, strong]

    scaling = weak

    #+end_src

- Alignment Increment :: Align HDF5 objects greater than or equal to an

  alignment threshold on addresses which are a multiple of this increment.

    #+begin_src conf-unix :noweb-ref hdf5-iotest-conf

    # align along increment [bytes] boundaries

    alignment-increment = 1

    #+end_src

  By default, there are no alignment restrictions in effect, and only

  increments greater than 1 have any effect.

- Alignment Threshold :: The minimum object size (in bytes) for which alignment

  constraints will be enforced. A threshold of 0 forces everything to be

  aligned.

    #+begin_src conf-unix :noweb-ref hdf5-iotest-conf

    # minimum object size [bytes] to force alignment (0=all objects)

    alignment-threshold = 0

    #+end_src

- Meta Block Size :: The minimum size of metadata block allocations

  when ~H5FD_FEAT_AGGREGATE_METADATA~ is set by a VFL driver.

    #+begin_src conf-unix :noweb-ref hdf5-iotest-conf

    # minimum metadata block allocation size in [bytes]

    meta-block-size = 2048

    #+end_src

  Setting the value to 0 with this function will turn off metadata

  aggregation, even if the VFL driver attempts to use the metadata

  aggregation strategy.

  The POSIX, core, and MPI-IO VFDs all support metadata allocation

  aggregation.

- Single Process I/O :: The I/O driver or mode to be used when running with a

  single process.

    #+begin_src conf-unix :noweb-ref hdf5-iotest-conf

    # [posix, core, mpi-io-uni]

    single-process = posix

    #+end_src

    This setting is important when establishing a single-process

  /baseline/. =posix= uses the default POSIX VFD. =core= uses a memory-backed

  HDF5 file where the underlying memory buffer grows in 64 MB

  increments. =mpi-io-uni= uses the MPI-IO VFD (with a single process).

- HDF5 Output File Name :: The default HDF5 output file name is

     =hdf5_iotest.h5=. Use this parameter to select a different name.

     *Note*: The character "#" in the filename is reserved for creating an HDF5 

     with the "#" replaced with the accumulated case number, for example *hdf5_iotest.#.h5*.

    #+begin_src conf-unix :noweb-ref hdf5-iotest-conf

    hdf5-file = hdf5_iotest.h5

    #+end_src

- Results File :: When running the HDF5 I/O test, certain metrics are printed to

                  =stdout=. To simplify the analysis of results from multiple

                  runs, they are also written to a CSV file whose name is

                  configurable.

    #+begin_src conf-unix :noweb-ref hdf5-iotest-conf

 csv-file = hdf5_iotest.csv

    #+end_src

- Restart :: The simulations will resume from (and including) the last successful

             entry in the result's CSV file. A value of 1 indicates a restart run,

             and 0 is no restart. If the keyword is not present, the default is 

             not a restart.

    #+begin_src conf-unix :noweb-ref hdf5-iotest-conf

    # [0, 1]

    restart = 1

    #+end_src

- Split :: The split I/O driver will be used [1]. Zero indicates to not use

    the split file diriver, default.

    #+begin_src conf-unix :noweb-ref hdf5-iotest-conf

    # [0, 1]

    split = 1

    #+end_src

- One-case :: The simulation will run only one case in the parameter space.

    A value not equal to 0 indicates which parameter case to run, and the

    value is a cumulative counter of the nested loops over the parameter space.

    For example, to rerun the 100th case in the CSV file, the value would be 100.

    If the keyword is not present, the default is to do all the cases.

    #+begin_src conf-unix :noweb-ref hdf5-iotest-conf

    # case number in the parameter space to run.

    one-case = 100

    #+end_src

- Compression ::  Specifies the compression filter for chunked datasets and

    currently supports /gzip/ and /szip/. The value corresponds to

    parameters in the corresponding HDF5 API. Valid parameters for "/gzip/" is an

    integer for the /level/ (see H5Pset_deflate). For "/szip/" value is

    /options_mask/, and /pixels_per_block/ (see H5Pset_szip).

    #+begin_src conf-unix :noweb-ref hdf5-iotest-conf

    # compression filter (gzip, szip).

    szip = H5_SZIP_NN_OPTION_MASK, 8

    #+end_src

- Async :: Specifies calling the async APIs (requires HDF5 version > 1.12) and

    [[https://github.com/hpc-io/vol-async][ASYNC VOL]]

    #+begin_src conf-unix :noweb-ref hdf5-iotest-conf

    # enable async [0 - no, 1 - yes]

    async = 1

    #+end_src

- Delay :: Add a delay between time steps. Helpful in simulating a computing phase

    when doing async I/O.

    #+begin_src conf-unix :noweb-ref hdf5-iotest-conf

    # delay in the form of an integer and time unit second 's 'or milliseconds 'ms'

    delay = 1s

    #+end_src

* Internal Parameters<>

Currently, the I/O test varies the following parameters:

- Dataset Rank :: The 2D array variables can be stored individually, or embedded

  into 3D or 4D datasets. In other words, the rank can be 2, 3, or 4.

- Slowest Dimension :: The slowest dimension can be array (count) or time.

- Initialization with Fill Values :: The default behavior of the HDF5 library is

  to initialize storage with the default or a user-specified fill value. This

  incurs additional I/O and may reduce performance.

- Storage Layout :: The dataset storage layout in the HDF5 file can be chunked

  or contiguous (or compact or virtual or user-defined).

- Alignment :: HDF5 objects greater than or equal to an alignment threshold can

  be aligned on addresses that are a multiple of a certain increment.

- Lower Library Version Bound  :: The HDF5 library can be configured to use the

  earliest or latest available file format micro-versions when generating

  objects.

- MPI I/O Operations :: With MPI, the write and read operations can be collective

  or independent.

Since there is no shortage of knobs in the HDF5 API, other parameters might be

added in the future.

* Appendix <>

** Sample =spack spec hdf5iotest= output <>

#+begin_example

==> Using specified package name: 'hdf5iotest'

==> Created template for hdf5iotest package

==> Created package file: /home/gerdheber/GitHub/spack/var/spack/repos/builtin/packages/hdf5iotest/package.py

Waiting for Emacs...

% spack spec hdf5iotest

Input spec

--------------------------------

hdf5iotest

Concretized

--------------------------------

hdf5iotest@spack%gcc@8.3.0 arch=linux-debian10-skylake

 ^autoconf@2.69%gcc@8.3.0 arch=linux-debian10-skylake

     ^m4@1.4.18%gcc@8.3.0+sigsegv patches=3877ab548f88597ab2327a2230ee048d2d07ace1062efe81fc92e91b7f39cd00,fc9b61654a3ba1a8d6cd78ce087e7c96366c290bc8d2c299f09828d793b853c8 arch=linux-debian10-skylake

         ^libsigsegv@2.12%gcc@8.3.0 arch=linux-debian10-skylake

     ^perl@5.32.0%gcc@8.3.0+cpanm+shared+threads arch=linux-debian10-skylake

         ^berkeley-db@18.1.40%gcc@8.3.0 arch=linux-debian10-skylake

         ^gdbm@1.18.1%gcc@8.3.0 arch=linux-debian10-skylake

             ^readline@8.0%gcc@8.3.0 arch=linux-debian10-skylake

                 ^ncurses@6.2%gcc@8.3.0~symlinks+termlib arch=linux-debian10-skylake

                     ^pkgconf@1.7.3%gcc@8.3.0 arch=linux-debian10-skylake

 ^automake@1.16.3%gcc@8.3.0 arch=linux-debian10-skylake

 ^hdf5@1.10.7%gcc@8.3.0~cxx~debug~fortran~hl~java+mpi+pic+shared~szip~threadsafe api=none arch=linux-debian10-skylake

     ^openmpi@4.0.5%gcc@8.3.0~atomics~cuda~cxx~cxx_exceptions+gpfs~java~legacylaunchers~lustre~memchecker~pmi~singularity~sqlite3+static~thread_multiple+vt+wrapper-rpath fabrics=none schedulers=none arch=linux-debian10-skylake

         ^hwloc@2.2.0%gcc@8.3.0~cairo~cuda~gl~libudev+libxml2~netloc~nvml+pci+shared arch=linux-debian10-skylake

             ^libpciaccess@0.16%gcc@8.3.0 arch=linux-debian10-skylake

                 ^libtool@2.4.6%gcc@8.3.0 arch=linux-debian10-skylake

                 ^util-macros@1.19.1%gcc@8.3.0 arch=linux-debian10-skylake

             ^libxml2@2.9.10%gcc@8.3.0~python arch=linux-debian10-skylake

                 ^libiconv@1.16%gcc@8.3.0 arch=linux-debian10-skylake

                 ^xz@5.2.5%gcc@8.3.0~pic arch=linux-debian10-skylake

                 ^zlib@1.2.11%gcc@8.3.0+optimize+pic+shared arch=linux-debian10-skylake

         ^numactl@2.0.14%gcc@8.3.0 patches=4e1d78cbbb85de625bad28705e748856033eaafab92a66dffd383a3d7e00cc94 arch=linux-debian10-skylake

#+end_example

** Sample CSV Output

The CSV output looks like this. It's then easy to concatenate several of these

(after stripping out the header), and load them into [[https://pandas.pydata.org/][pandas]] or [[https://www.r-project.org/][R]].

#+begin_example

steps,arrays,rows,cols,scaling,proc-rows,proc-cols,slowdim,rank,version,alignment-increment,alignment-threshold,layout,fill,fmt,io,wall [s],fsize [B],write-phase-min [s],write-phase-max [s],creat-min [s],creat-max [s],write-min [s],write-max [s],read-phase-min [s],read-phase-max [s],read-min [s],read-max [s]

20,500,100,200,weak,1,1,step,2,"1.8.22",1,0,contiguous,true,earliest,core,15.37,1689710848,14.10,14.10,7.25,7.25,0.77,0.77,1.27,1.27,0.27,0.27

20,500,100,200,weak,1,1,step,2,"1.8.22",1,0,contiguous,true,latest,core,20.70,1682116757,19.53,19.53,12.08,12.08,0.78,0.78,1.17,1.17,0.28,0.28

...

#+end_example

*** Metrics

All timings are obtained via =MPI_Wtime=. For some metrics, we record minima and

maxima across MPI ranks. The columns =steps= through =mpi-io= are just

reiterations of the configuration parameters. The remaining columns are as

follows:

- =wall [s]= :: Wall time in seconds.

- =fsize [B]= :: The HDF5 output file size in bytes

- =write-phase-min [s],write-phase-max [s]= :: The fastest and slowest

  cumulative write phase time in seconds. This includes the time for file and

  dataset creation(s).

- =creat-min [s],creat-max [s]= :: The fastest and slowest time spent in

  =H5Fcreate=, =H5Fclose=, and =H5Dcreate= in seconds

- =write-min [s],write-max [s]= :: The fastest and slowest cumulative =H5Dwrite=

  time in seconds

- =read-phase-min [s],read-phase-max [s]= :: The fastest and slowest cumulative

  read phase time in seconds. This includes the times for opening and closing

  the HDF5 file and for creating dataset selections.

- =read-min [s],read-max [s]= :: The fastest and slowest cumulative =H5Dread=

  time in seconds
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/hdfgroup/hdf5-iotest

Awesome Lists containing this project

README