https://github.com/tacc/t3pio

Date: 8/15/2012

T3PIO Library: TACC's Terrific Tool for Parallel I/O
----------------------------------------------------
Robert McLay
[email protected]

This software (T3PIO) is a library that improves the performance of parallel
MPI I/O.  Since Parallel HDF5 uses MPI I/O, the library improves its performance
as well.  The library interacts with the Lustre filesystem to match the
application's I/O to the filesystem.

The parallel I/O performance depends on three parameters:

1) Number of writers
2) Number of stripes
3) Stripe size

This library focuses on the writing of files.  The moment an application creates
a file is the only time the number of stripes can be set.  The library therefore
gathers information from the Lustre filesystem and from your parallel program at
file creation and sets all three parameters to improve performance.

Using the library is straightforward: a single extra call to the T3PIO library
is all that is needed to improve your I/O performance.
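
For context, these three parameters are the same knobs one would otherwise set by
hand through MPI-IO hints before creating the file.  Below is a minimal C sketch of
that manual approach, assuming the standard ROMIO hint names striping_factor,
striping_unit, and cb_nodes; the names and values here are illustrative only and
are not what T3PIO computes:

#include <mpi.h>

/* Sketch only: setting the striping hints by hand, without T3PIO.  The hint
 * names "striping_factor", "striping_unit", and "cb_nodes" are standard
 * ROMIO hints; the values below are arbitrary examples, not values that
 * T3PIO would choose.                                                      */
void manual_hints(MPI_Comm comm, const char *fileName)
{
  MPI_Info info;
  MPI_File fh;

  MPI_Info_create(&info);
  MPI_Info_set(info, "striping_factor", "16");      /* number of stripes        */
  MPI_Info_set(info, "striping_unit", "1048576");   /* stripe size in bytes     */
  MPI_Info_set(info, "cb_nodes", "16");             /* collective writers       */

  MPI_File_open(comm, fileName, MPI_MODE_CREATE | MPI_MODE_RDWR, info, &fh);
  MPI_File_close(&fh);
  MPI_Info_free(&info);
}

T3PIO replaces the guesswork above with a single call that picks values matched to
the filesystem and the job.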

Using T3PIO with HDF5:
----------------------

In Fortran, it looks like this (a complete example can be found in
cubeTestHDF5/writer.F90, lines 52-140):

  subroutine hdf5_writer(....)
    use mpi
    use hdf5
    use t3pio
    integer         :: globalSize   ! Estimate of global size of file (MB)
    integer         :: info         ! MPI Info object
    integer         :: comm         ! MPI Communicator
    integer(hid_t)  :: plist_id     ! Property list identifier
    character*(256) :: dir          ! Directory of output file
    ...

    comm = MPI_COMM_WORLD

    ! Initialize the info object.
    call MPI_Info_create(info, ierr)

    ! Use the library to fill info with the number of writers, stripes, etc.
    dir = "./"
    call t3pio_set_info(comm, info, dir, ierr, &   ! <--- T3PIO library call
                        GLOBAL_SIZE=globalSize)

    call H5open_f(ierr)
    call H5Pcreate_f(H5P_FILE_ACCESS_F, plist_id, ierr)
    call H5Pset_fapl_mpio_f(plist_id, comm, info, ierr)
    call H5Fcreate_f(fileName, H5F_ACC_TRUNC_F, file_id, ierr, &
                     access_prp = plist_id)

Essentially, you make the normal calls to create an HDF5 file. The
only addition is to call "t3pio_set_info" with the communicator, an
info object, and the directory where the file is to be written.
Optionally, you can pass an estimate of the global size of the file; if you do,
the stripe size is increased to improve performance.

In C/C++ it is very similar (a complete example can be found in
unstructTestHDF5/h5writer.C, lines 71-95).  Note that the first three
arguments to t3pio_set_info (the communicator, the info object, and the
directory) are required; all remaining options are optional.

#include "t3pio.h"
#include "hdf5.h"
void hdf5_writer(....)
{
MPI_Info info = MPI_INFO_NULL;
const char *dir;
hid_t plist_id;
...

MPI_Info_create(&info);
dir = "./";
ierr = t3pio_set_info(comm, info, dir, /* <-- T3PIO call */
T3PIO_GLOBAL_SIZE, globalSize);
plist_id = H5Pcreate(H5P_FILE_ACCESS);
ierr = H5Pset_fapl_mpio(plist_id, comm, info);
file_id = H5Fcreate(fileName, H5F_ACC_TRUNC,
H5P_DEFAULT, plist_id);
...
}
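
If you want to see which hints the library actually set, the info object can be
dumped right after the t3pio_set_info call using only standard MPI info routines.
A small sketch is shown below; which hint keys appear depends on T3PIO and on your
Lustre configuration:

#include <stdio.h>
#include <mpi.h>

/* Optional sanity check (not part of T3PIO): print every key/value pair
 * in an MPI info object, e.g. call print_hints(info) on rank 0 after
 * t3pio_set_info returns.                                               */
static void print_hints(MPI_Info info)
{
  int  i, nkeys, flag;
  char key[MPI_MAX_INFO_KEY + 1];
  char value[MPI_MAX_INFO_VAL + 1];

  MPI_Info_get_nkeys(info, &nkeys);
  for (i = 0; i < nkeys; ++i) {
    MPI_Info_get_nthkey(info, i, key);
    MPI_Info_get(info, key, MPI_MAX_INFO_VAL, value, &flag);
    if (flag)
      printf("hint: %s = %s\n", key, value);
  }
}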

Using T3PIO with MPI I/O:
-------------------------

Using T3PIO with MPI I/O is essentially the same as with HDF5.  In Fortran 90
it looks like this (see writer.F90, lines 300-317):

  subroutine mpiio_writer(....)
    use mpi
    use t3pio
    integer         :: info         ! MPI Info object
    integer         :: comm         ! MPI Communicator
    integer         :: iTotalSz     ! File size in MB
    integer         :: filehandle   ! MPI file handle
    character*(256) :: dir          ! Directory of output file
    ...

    comm = MPI_COMM_WORLD
    call MPI_Info_create(info, ierr)

    iTotalSz = totalSz / (1024*1024)
    dir = "./"
    call t3pio_set_info(comm, info, dir, ierr, &
                        global_size = iTotalSz)

    call MPI_File_open(comm, fn, MPI_MODE_CREATE+MPI_MODE_RDWR, &
                       info, filehandle, ierr)
    ...

As the examples show, the T3PIO library is used the same way whether you write
with MPI I/O or HDF5.
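
The same pattern in C, for completeness.  This is a sketch only: it reuses the
variadic t3pio_set_info call shown in the C/HDF5 example above, and the function
name, arguments, and error handling are placeholders:

#include <mpi.h>
#include "t3pio.h"

/* Minimal MPI I/O writer in C.  fileName, globalSize, and the actual
 * writes are placeholders.                                             */
void mpiio_writer(MPI_Comm comm, const char *fileName, int globalSize)
{
  MPI_Info info = MPI_INFO_NULL;
  MPI_File fh;
  int      ierr;

  MPI_Info_create(&info);
  ierr = t3pio_set_info(comm, info, "./",            /* <-- T3PIO call */
                        T3PIO_GLOBAL_SIZE, globalSize);

  ierr = MPI_File_open(comm, fileName, MPI_MODE_CREATE | MPI_MODE_RDWR,
                       info, &fh);

  /* ... collective writes (e.g. MPI_File_write_at_all) go here ... */

  MPI_File_close(&fh);
  MPI_Info_free(&info);
}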

Installing T3PIO:
-----------------

The library can be downloaded from GitHub:

$ git clone git://github.com/TACC/t3pio.git

To configure the package do:

$ FC=mpif90 F77=mpif77 CC=mpicc CXX=mpicxx ./configure [options]

Here you must specify the four MPI compiler wrappers.  The options are:

--prefix=/path/to/install/dir

Path to where you wish the library and test programs to be installed

--with-phdf5=/path/to/parallel/hdf5/dir

The library itself does not require Parallel HDF5, but the test programs do.
Set this to the directory where parallel HDF5 is installed; the package expects
the "include" and "lib" directories to be directly below the directory specified
here.

--with-node-memory=N (where N is in MB)

This option is not required; it avoids having the library run "free -m" to
determine the amount of memory.  N should be the memory size, in MB, of the
compute nodes.

After configure do:

$ make; make install

This will build and install the library and the two test programs: cubeTest and unstructTest.

Test Program (1): cubeTest
----------------------------

To run the structured grid test code do the following to get help:

$ cubeTest -H



cubeTest version 1.0

Usage: cubeTest [options]
options:
  -v            : version
  -H            : This message and quit
  --help        : This message and quit
  --dim num     : number of dimensions (2 or 3)
  -g num        : number of points in each direction globally
                  (no default)
  -l num        : number of points in each direction locally
                  (default = 5)
  -f num        : number of stripes per writer
                  (default = 2)
  --stripes num : Allow no more than num stripes
                  (file system limit by default)
  --h5chunk     : use HDF5 with chunks
  --h5slab      : use HDF5 with slab
                  (default)
  --romio       : use MPI I/O
  --numvar num  : number of variables 1 to 9
                  (default = 1)

Test Program (2): unstructTest
------------------------------

To run the unstructured grid test code do the following to get help:

$ unstructTest -h

Usage:
./unstructTest [options]

Options:
-h, -? : Print Usage
-v : Print Version
-C : use h5 chunk
-S : use h5 slab (default)
-l num : local size is num (default=10)
-f factor : number of stripes per writer (default=2)
-s num : maximum number of stripes