{"id":23278479,"url":"https://github.com/hdfgroup/vol-cache","last_synced_at":"2025-08-21T11:31:29.760Z","repository":{"id":46823106,"uuid":"319366733","full_name":"HDFGroup/vol-cache","owner":"HDFGroup","description":"HDF5 Cache VOL connector for caching data on fast storage layers and moving data asynchronously to the parallel file system to hide I/O overhead. ","archived":false,"fork":false,"pushed_at":"2024-12-09T23:13:22.000Z","size":3239,"stargazers_count":19,"open_issues_count":6,"forks_count":8,"subscribers_count":11,"default_branch":"develop","last_synced_at":"2024-12-10T00:22:19.760Z","etag":null,"topics":["hpc","parallel-io"],"latest_commit_sha":null,"homepage":"https://vol-cache.readthedocs.io","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HDFGroup.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-12-07T15:45:52.000Z","updated_at":"2024-12-09T23:13:26.000Z","dependencies_parsed_at":"2024-06-11T17:01:30.393Z","dependency_job_id":"88b375cd-0fc3-4308-8c76-ca148175e5a3","html_url":"https://github.com/HDFGroup/vol-cache","commit_stats":null,"previous_names":["hdfgroup/vol-cache"],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HDFGroup%2Fvol-cache","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HDFGroup%2Fvol-cache/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HDFGroup%2Fvol-cache/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/HDFGroup%2Fvol-cache/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HDFGroup","download_url":"https://codeload.github.com/HDFGroup/vol-cache/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230508022,"owners_count":18237108,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hpc","parallel-io"],"created_at":"2024-12-19T22:31:52.995Z","updated_at":"2024-12-19T22:31:53.490Z","avatar_url":"https://github.com/HDFGroup.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# HDF5 Cache VOL: Efficient Parallel I/O through Caching Data on Fast Storage Layers\n\nDocumentation: \u003chttps://vol-cache.readthedocs.io\u003e\n\nThis is the public repo for Cache VOL, a software package developed in the ```ExaIO``` Exascale Computing Project. 
Cache VOL's main objective is to incorporate fast storage layers (e.g., burst buffer, node-local storage) into parallel I/O workflows for caching and staging data to improve I/O efficiency.\n\nThe design, implementation, and performance evaluation of Cache VOL are presented in our CCGrid'2022 paper:\nHuihuo Zheng, Venkatram Vishwanath, Quincey Koziol, Houjun Tang, John Ravi, John Mainzer, Suren Byna, \"HDF5 Cache VOL: Efficient and Scalable Parallel\nI/O through Caching Data on Node-local Storage,\" 2022 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2022, doi:10.1109/CCGrid54584.2022.00015\n\n## Files under this folder\n\n* ./src - Cache VOL source files\n  * cache_utils.c, cache_utils.h -- utility functions\n  * H5VLcache_ext.c, H5VLcache_ext.h -- cache VOL\n  * H5LS.c, H5LS.h -- functions for managing cache storage\n  * cache_new_h5api.h, cache_new_h5api.c -- new public API functions specific to the cache VOL.\n\n* ./benchmarks - microbenchmark codes\n  * write_cache.cpp -- testing code for parallel write\n  * read_cache.cpp, read_cache.py -- benchmark code for parallel read\n\n* ./docs/ - Documentation\n  * cache_vol.tex (OLD) -- prototype design based on explicit APIs and initial performance evaluation.\n  * readthedoc -- \u003chttps://vol-cache.readthedocs.io\u003e\n* ./tests - a set of tests for different functions.\n\n## Building the Cache VOL\n\nWe outline below some basic information about how to use Cache VOL. 
Detailed instructions are available at \u003chttps://vol-cache.readthedocs.io\u003e.\n\nFor cmake to find the dependent libraries, the user has to define the following environment variables:\n\n```bash\nHDF5_DIR # prefix for installing the HDF5 library\nHDF5_ROOT # set to be the same as HDF5_DIR\nABT_DIR # prefix for installing the Argobots library\nHDF5_VOL_DIR # prefix for installing the VOL connectors\n```\n\n### Building the HDF5 shared library\n\nCurrently, the cache VOL requires HDF5 version 1.14 or later, or the *develop* branch of HDF5:\n\n```bash\ngit clone -b develop https://github.com/HDFGroup/hdf5.git\ncd hdf5\n./autogen.sh\n./configure --prefix=$HDF5_DIR --enable-parallel --enable-threadsafe --enable-unsupported CC=mpicc\nmake all install\n```\n\nWhen running configure, make sure you **DO NOT** pass the option \"--disable-shared\".\n\n### Building the Argobots library\n\n```bash\ngit clone https://github.com/pmodels/argobots.git\ncd argobots\n./autogen.sh\n./configure --prefix=$ABT_DIR\nmake all install\n```\n\n### Building the Async VOL library\n\n```bash\ngit clone https://github.com/hpc-io/vol-async.git\nmkdir -p vol-async/build\ncd vol-async/build\ncmake .. -DCMAKE_INSTALL_PREFIX=$HDF5_VOL_DIR\nmake all install\n```\n\nHere, HDF5_VOL_DIR is the prefix for installing all the VOL connectors.\n\n### Building the cache VOL library\n\n```bash\ngit clone https://github.com/hpc-io/vol-cache.git\nmkdir -p vol-cache/build\ncd vol-cache/build\ncmake .. 
-DCMAKE_INSTALL_PREFIX=$HDF5_VOL_DIR\nmake all install\n```\n\nTo run the demo, set the following environment variables first:\n\n```bash\nexport HDF5_PLUGIN_PATH=$HDF5_VOL_DIR/lib\nexport HDF5_VOL_CONNECTOR=\"cache_ext config=config_1.cfg;under_vol=512;under_info={under_vol=0;under_info={}};\"\nexport LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HDF5_ROOT/lib:$HDF5_PLUGIN_PATH\n```\n\nIn this case, we have stacked Async VOL (VOL ID: 512) under the cache VOL to perform the data migration between the node-local storage and the global parallel file system.\n\nBy default, the debugging mode is enabled to ensure the VOL connector is working. To disable it, simply remove the $(DEBUG) option from the CC line, and rerun make.\n\nAll local storage settings are specified in ```config_1.cfg```. Below is an example config file:\n\n```config\nHDF5_CACHE_STORAGE_SCOPE: LOCAL # the scope of the storage [LOCAL|GLOBAL]\nHDF5_CACHE_STORAGE_PATH: /local/scratch # path of local storage\nHDF5_CACHE_STORAGE_SIZE: 128188383838 # size of the storage space in bytes\nHDF5_CACHE_STORAGE_TYPE: SSD # local storage type [SSD|BURST_BUFFER|MEMORY|GPU], default SSD\nHDF5_CACHE_REPLACEMENT_POLICY: LRU # [LRU|LFU|FIFO|LIFO]\n```\n\n## Running the parallel HDF5 benchmarks\n\n### Environment variables\n\nCurrently, we use environment variables to enable and disable the cache functionality.\n\n* HDF5_CACHE_RD [yes|no]: Whether to turn on caching for read. [default=no]\n* HDF5_CACHE_WR [yes|no]: Whether to turn on caching for write. [default=no]\n\n### Parallel write\n\n* **write_cache.cpp** is the benchmark code for evaluating the parallel write performance. In this test case, each MPI rank has a local\n   buffer BI to be written into an HDF5 file organized in the following way: [B0|B1|B2|B3]|[B0|B1|B2|B3]|...|[B0|B1|B2|B3]. 
The number of repetitions of [B0|B1|B2|B3] equals the number of iterations.\n  * --dim D1 D2: dimension of the 2D array [BI] // this is the local buffer size\n  * --niter NITER: number of iterations. Note that the data is written to the file cumulatively.\n  * --scratch PATH: the location of the raw data\n  * --sleep [seconds]: time to sleep between iterations\n  * --collective: whether to use collective I/O or not.\n\n### Parallel read\n\n* **prepare_dataset.cpp** prepares the dataset for the parallel read benchmark.\n\n```bash\nmpirun -np 4 ./prepare_dataset --num_images 8192 --sz 224 --output images.h5\n```\n\nThis will generate an HDF5 file, images.h5, which contains 8192 samples, each of shape 224x224x3 (an image-based dataset).\n\n* **read_cache.cpp, read_cache.py** are the benchmark codes for evaluating the parallel read performance. We assume that the dataset is set us\n  * --input: HDF5 file [Default: images.h5]\n  * --dataset: the name of the dataset in the HDF5 file [Default: dataset]\n  * --num_epochs [Default: 2]: Number of epochs (at each epoch/iteration, we sweep through the dataset)\n  * --num_batches [Default: 16]: Number of batches to read per epoch\n  * --batch_size [Default: 32]: Number of samples per batch\n  * --shuffle: Whether to shuffle the samples at the beginning of each epoch.\n  * --local_storage [Default: ./]: The path of the local storage.\n\nTo accurately assess the read benchmark, isolating the effects of DRAM caching is crucial. By default, during the first iteration, the system caches all data in memory (RSS) unless the memory capacity is insufficient to store all the data. As a result, the second iteration achieves a very high bandwidth, regardless of the location of the node-local storage.\n\nTo remove the cache / buffering effect for read benchmarks, one can allocate a big array close to the RAM size so that the system has no spare memory to cache the input HDF5 file. 
This can be achieved by setting ```MEMORY_PER_PROC``` (memory per process in gigabytes). **However, this might cause the compute node to crash.** Another way is to read dummy files by setting ```CACHE_NUM_FILES``` (number of dummy files to read per process).\n\n## Citation\nIf you use Cache VOL, please cite the following paper:\n```\nH. Zheng et al., \"HDF5 Cache VOL: Efficient and Scalable Parallel I/O through Caching Data on Node-local Storage,\" 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), Taormina, Italy, 2022, pp. 61-70, doi: 10.1109/CCGrid54584.2022.00015.\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhdfgroup%2Fvol-cache","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhdfgroup%2Fvol-cache","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhdfgroup%2Fvol-cache/lists"}