https://github.com/reger-men/tensorflow_benchmark

TensorFlow benchmark scripts for single and multi nodes with multi GPUs

multi-gpus multi-nodes tensorflow-benchmark tensorflow2

Last synced: about 1 year ago

Host: GitHub
URL: https://github.com/reger-men/tensorflow_benchmark
Owner: reger-men
License: mit
Created: 2019-11-28T14:52:03.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2023-12-20T10:55:57.000Z (over 2 years ago)
Last Synced: 2024-11-02T17:12:46.503Z (over 1 year ago)
Topics: multi-gpus, multi-nodes, tensorflow-benchmark, tensorflow2
Language: Python
Size: 64.5 KB
Stars: 6
Watchers: 3
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# TensorFlow v2 benchmark
![Supports TFv2](https://img.shields.io/badge/Supports-tensorflow%20v2-blue.svg)

TensorFlow benchmark scripts for single-node and multi-node training with multiple GPUs.

#### Usage
##### Clone repo
```git clone https://github.com/reger-men/tensorflow_benchmark.git```

##### Prerequisites
```pip3 install -r requirements.txt```

##### Single Node Single GPU
Train with a custom loop:

```python3 train.py -train_mode loop```

Train with Keras fit:

```python3 train.py -train_mode fit```

##### Single Node Multi-GPUs
When ```num_gpus>1```, the Mirrored strategy is used by default:

```python3 train.py -train_mode fit -num_gpus 2```

```python3 train.py -train_mode loop -num_gpus 2```
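
The default described above can be read as a small selection rule. The sketch below is an assumption about how such a rule might look, not the code in `train.py`. The helper name `resolve_strategy` is hypothetical; only the strategy names (`Mirrored`, `MultiWorker`, `OneDevice`) and the `num_gpus` flag come from this README.

```python
def resolve_strategy(num_gpus: int, requested: str = "OneDevice") -> str:
    """Return the strategy name to use (hypothetical helper, not from train.py)."""
    valid = {"Mirrored", "MultiWorker", "OneDevice"}
    if requested not in valid:
        raise ValueError(f"unknown strategy: {requested!r}")
    # Assumption: an explicit MultiWorker request is never overridden.
    if requested == "MultiWorker":
        return requested
    # README rule: with more than one GPU, Mirrored is used by default.
    if num_gpus > 1:
        return "Mirrored"
    return requested


if __name__ == "__main__":
    print(resolve_strategy(num_gpus=2))
    print(resolve_strategy(num_gpus=1))
```

In TensorFlow 2 these names most plausibly correspond to `tf.distribute.MirroredStrategy`, `tf.distribute.MultiWorkerMirroredStrategy`, and `tf.distribute.OneDeviceStrategy`, although the README does not state the mapping.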

##### Multi-Node Multi-GPUs
Experimental: launch multi-node training from the chief worker node:

```python3 train.py --train_mode=fit --workers="localhost:122,localhost:123" --w_type="worker" --w_index=0 --distribution_strategy=MultiWorker```

Or with a custom loop:

```python3 train.py --train_mode=loop --workers="localhost:122,localhost:123" --w_type="worker" --w_index=0 --distribution_strategy=MultiWorker```
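
TensorFlow's multi-worker strategy reads the cluster layout from the `TF_CONFIG` environment variable. As a minimal sketch, the values passed to `--workers`, `--w_type`, and `--w_index` could be turned into that variable as shown below. The helper `build_tf_config` is hypothetical, and whether `train.py` does exactly this is an assumption. The sketch also assumes that every host is listed under the `worker` job.

```python
import json
import os


def build_tf_config(workers: str, w_type: str = "worker", w_index: int = 0) -> dict:
    """Build a TF_CONFIG dictionary from README-style flag values (hypothetical helper)."""
    # "host:port,host:port" -> ["host:port", "host:port"]
    hosts = [h.strip() for h in workers.split(",") if h.strip()]
    if not 0 <= w_index < len(hosts):
        raise ValueError(f"w_index {w_index} is out of range for {len(hosts)} workers")
    return {
        # Assumption: all hosts belong to the "worker" job.
        "cluster": {"worker": hosts},
        "task": {"type": w_type, "index": w_index},
    }


if __name__ == "__main__":
    config = build_tf_config("localhost:122,localhost:123", "worker", 0)
    # TF_CONFIG must be set before the distribution strategy is created.
    os.environ["TF_CONFIG"] = json.dumps(config)
    print(os.environ["TF_CONFIG"])
```

Every worker process needs the same cluster list and its own task index, so a second worker would use `w_index=1` with an identical `workers` string.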

##### Help Flags
```python3 train.py --helpfull```

```
train.py:
  --batch_size: Batch Size
    (default: '128')
    (an integer)
  --buffer_size: Shuffle buffer size
    (default: '50000')
    (an integer)
  --display_every: Number of steps after which progress is printed out
    (default: '20')
    (an integer)
  --distribution_strategy: Can be: Mirrored, MultiWorker, OneDevice
    (default: 'OneDevice')
  --epochs: Number of epochs
    (default: '1')
    (an integer)
  --num_gpus: Number of GPUs. 0 will run on CPU
    (default: '1')
    (an integer)
  --[no]setup_cluster: Setup the cluster from the chief worker or not. This is an experimental feature
    (default: 'true')
  --train_mode: Use either keras fit or loop training
    (default: 'fit')
  --verbose: Set verbosity level
    (default: '0')
    (an integer)
  --w_index: Worker index. 0 is appointed as the chief worker
    (default: '0')
    (an integer)
  --w_type: Task type
    (default: 'worker')
  --workers: List of workers IP:Port
    (default: 'localhost:122,localhost:123')
```
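
To benchmark several configurations in one go, a small driver can assemble command lines from the flags listed above. The following is a sketch under stated assumptions rather than part of the repository: the helper `build_commands` is hypothetical, and only the flag names `--train_mode`, `--num_gpus`, and `--batch_size` are taken from the help output. It only prints the commands and does not run them.

```python
import itertools
import shlex


def build_commands(train_modes, gpu_counts, batch_sizes, script="train.py"):
    """Yield one argv list per combination of flag values (hypothetical helper)."""
    for mode, gpus, batch in itertools.product(train_modes, gpu_counts, batch_sizes):
        yield [
            "python3", script,
            f"--train_mode={mode}",
            f"--num_gpus={gpus}",
            f"--batch_size={batch}",
        ]


if __name__ == "__main__":
    for argv in build_commands(["fit", "loop"], [1, 2], [128]):
        # Print only; pass argv to subprocess.run(argv, check=True) to execute.
        print(shlex.join(argv))
```

Keeping each command as a list avoids shell quoting problems if the commands are later executed with `subprocess.run`.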
