Benchmark implementation of CosmoFlow in TensorFlow Keras
https://github.com/sparticlesteve/cosmoflow-benchmark
- Host: GitHub
- URL: https://github.com/sparticlesteve/cosmoflow-benchmark
- Owner: sparticlesteve
- License: apache-2.0
- Created: 2019-06-11T14:46:53.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-02-07T22:43:41.000Z (9 months ago)
- Last Synced: 2024-08-01T16:46:43.217Z (3 months ago)
- Language: Jupyter Notebook
- Size: 3.26 MB
- Stars: 19
- Watchers: 7
- Forks: 10
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
# CosmoFlow TensorFlow Keras benchmark implementation
**WARNING: this repo is old. For the latest MLPerf HPC reference implementation of cosmoflow, see https://github.com/mlcommons/hpc/tree/main/cosmoflow**
This is an implementation of the
[CosmoFlow](https://arxiv.org/abs/1808.04728) 3D convolutional neural network
for benchmarking. It is written in TensorFlow with the Keras API and uses
[Horovod](https://github.com/horovod/horovod) for distributed training.

You can find the previous TensorFlow implementation which accompanied the
CosmoFlow paper at https://github.com/NERSC/CosmoFlow
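For orientation, the standard Horovod + Keras pattern that this kind of distributed training builds on looks roughly like the following. This is a minimal sketch, not the repository's actual training script; the tiny model here is a stand-in for the real CosmoFlow network.

```python
# Minimal sketch of the standard Horovod + Keras setup (illustrative only;
# the tiny model below is a stand-in, not the actual CosmoFlow network).
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

# Pin each worker process to a single GPU.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

model = tf.keras.Sequential([
    tf.keras.layers.Conv3D(16, 3, activation='relu',
                           input_shape=(128, 128, 128, 4)),
    tf.keras.layers.GlobalAveragePooling3D(),
    tf.keras.layers.Dense(4),  # CosmoFlow regresses 4 cosmological parameters
])

# Scale the learning rate with worker count and wrap the optimizer so
# gradients are averaged across all ranks each step.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(1e-3 * hvd.size()))
model.compile(optimizer=opt, loss='mse')

callbacks = [
    # Broadcast initial weights from rank 0 so all workers start in sync.
    hvd.callbacks.BroadcastGlobalVariablesCallback(0),
]
# model.fit(train_dataset, callbacks=callbacks, ...)
```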
## Datasets
The dataset we use for this benchmark comes from simulations run by the
ExaLearn group and hosted at NERSC. The following web portal describes the
technical content of the dataset and provides links to the raw data:

https://portal.nersc.gov/project/m3363/
For this benchmark we currently use a preprocessed version of the dataset which
generates crops of size (128, 128, 128, 4) and stores them in TFRecord format.
This preprocessing is done using the [prepare.py](prepare.py) script included
in this package. We describe here how to get access to this processed dataset,
but please refer to the ExaLearn web portal for additional technical details.

Globus is the currently recommended way to transfer the dataset locally.
There is a Globus endpoint at:

https://app.globus.org/file-manager?origin_id=d0b1b73a-efd3-11e9-993f-0a8c187e8c12&origin_path=%2F
The contents are also available via HTTPS at:
https://portal.nersc.gov/project/dasrepo/cosmoflow-benchmark/
### MLPerf HPC v1.0 preliminary dataset
Preprocessed TFRecord files are available in a 1.7 TB tarball named
`cosmoUniverse_2019_05_4parE_tf_v2.tar`. It contains subfolders for the
train/val/test file splits.

In this preparation, there are 524288 samples for training and 65536 samples
for validation. The TFRecord files are written with gzip compression to reduce
total storage size.
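As a rough sketch of how such gzip-compressed TFRecord crops can be read back with `tf.data`: the feature keys (`x`, `y`), dtypes, and serialization below are assumptions for illustration, not necessarily what `prepare.py` writes.

```python
# Illustrative reader for gzip-compressed TFRecord crops. The feature keys
# ('x', 'y') and the float32 dtype are assumptions, not necessarily what
# prepare.py actually writes.
import tensorflow as tf

def parse_sample(record):
    features = tf.io.parse_single_example(record, {
        'x': tf.io.FixedLenFeature([], tf.string),   # serialized 3D crop
        'y': tf.io.FixedLenFeature([4], tf.float32), # target parameters
    })
    x = tf.io.decode_raw(features['x'], tf.float32)
    x = tf.reshape(x, (128, 128, 128, 4))
    return x, features['y']

dataset = (
    tf.data.TFRecordDataset(tf.io.gfile.glob('train/*.tfrecord'),
                            compression_type='GZIP')  # files are gzip-compressed
    .map(parse_sample, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(4)
    .prefetch(tf.data.AUTOTUNE)
)
```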
### MLPerf HPC v0.7 dataset

The pre-processed dataset in TFRecord format is in the
`cosmoUniverse_2019_05_4parE_tf` folder, which contains training and validation
subfolders. There are 262144 samples for training and 65536 samples for
validation/testing. The combined size of the dataset is 5.1 TB.

For getting started, there is also a small tarball (179 MB) with 32 training
samples and 32 validation samples, called
`cosmoUniverse_2019_05_4parE_tf_small.tgz`.
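As a quick start, assuming the small tarball sits at the top level of the HTTPS mirror listed above (an assumption; check the portal listing for the exact path), fetching and unpacking it might look like:

```python
# Fetch and unpack the small starter dataset. The exact path on the HTTPS
# mirror is an assumption based on the listing above.
import tarfile
import urllib.request

base = 'https://portal.nersc.gov/project/dasrepo/cosmoflow-benchmark/'
name = 'cosmoUniverse_2019_05_4parE_tf_small.tgz'
urllib.request.urlretrieve(base + name, name)
with tarfile.open(name, 'r:gz') as tar:
    tar.extractall('data')  # 32 training + 32 validation samples
```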
## Running the benchmark

Submission scripts are in `scripts`. YAML configuration files go in `configs`.
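The configuration schema is defined by the YAML files in `configs`; purely as an illustration (the filename and keys here are hypothetical, not the repository's actual schema), a training script might consume one like this:

```python
# Hypothetical config loading; the filename and keys are illustrative,
# not the repository's actual schema. Requires PyYAML.
import yaml

with open('configs/cosmo.yaml') as f:  # hypothetical config file
    config = yaml.safe_load(f)

# A config would typically carry data paths, model hyperparameters,
# and optimizer settings for the training script to read.
data_dir = config.get('data', {}).get('data_dir')
```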
### Running at NERSC
`sbatch -N 64 scripts/train_cori.sh`