https://github.com/reger-men/tensorflow_benchmark
TensorFlow benchmark scripts for single and multi nodes with multi GPUs
https://github.com/reger-men/tensorflow_benchmark
multi-gpus multi-nodes tensorflow-benchmark tensorflow2
Last synced: about 1 year ago
JSON representation
TensorFlow benchmark scripts for single and multi nodes with multi GPUs
- Host: GitHub
- URL: https://github.com/reger-men/tensorflow_benchmark
- Owner: reger-men
- License: mit
- Created: 2019-11-28T14:52:03.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2023-12-20T10:55:57.000Z (over 2 years ago)
- Last Synced: 2024-11-02T17:12:46.503Z (over 1 year ago)
- Topics: multi-gpus, multi-nodes, tensorflow-benchmark, tensorflow2
- Language: Python
- Size: 64.5 KB
- Stars: 6
- Watchers: 3
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Tensorflow v2 benchmark

TensorFlow benchmark scripts for single and multi Nodes with Multi GPUs
#### Usage
##### Clone repo
```git clone https://github.com/reger-men/tensorflow_benchmark.git```
##### Pre-requirement
```pip3 install -r requirements.txt```
##### Single Node Single GPU
Train with custom loop:
```python3 train.py -train_mode loop```
Train with Keras fit:
```python3 train.py -train_mode fit```
##### Single Node Multi-GPUs
Mirrored strategy will be used as default with ```num_gpus>1```
```python3 train.py -train_mode fit -num_gpus 2```
```python3 train.py -train_mode loop -num_gpus 2```
##### Multi-Node Multi-GPUs
Experimental: launch Multi-Nodes training from the chief Worker Node
```python3 train.py --train_mode=fit --workers="localhost:122,localhost:123" --w_type="worker" --w_index=0 --distribution_strategy=MultiWorker```
Or with custom loop:
```python3 train.py --train_mode=loop --workers="localhost:122,localhost:123" --w_type="worker" --w_index=0 --distribution_strategy=MultiWorker```
##### Help Flags
```python3 train.py --helpfull```
```
train.py:
--batch_size: Batch Size
(default: '128')
(an integer)
--buffer_size: Shuffle buffer size
(default: '50000')
(an integer)
--display_every: Number of steps after which progress is printed out
(default: '20')
(an integer)
--distribution_strategy: Can be: Mirrored, MultiWorker, OneDevice
(default: 'OneDevice')
--epochs: Number of epochs
(default: '1')
(an integer)
--num_gpus: Number of GPUs. 0 will run on CPU
(default: '1')
(an integer)
--[no]setup_cluster: Setup the cluster from the chief worker or not. This is an expiremental feature
(default: 'true')
--train_mode: Use either keras fit or loop training
(default: 'fit')
--verbose: Set verbosity level
(default: '0')
(an integer)
--w_index: Worker index. 0 is appointed as the chief worker
(default: '0')
(an integer)
--w_type: Task type
(default: 'worker')
--workers: List of workers IP:Port
(default: 'localhost:122,localhost:123')
```