Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/outerbounds/metaflow-tensorflow
- Host: GitHub
- URL: https://github.com/outerbounds/metaflow-tensorflow
- Owner: outerbounds
- License: apache-2.0
- Created: 2023-09-24T02:57:02.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-11-02T23:13:17.000Z (about 1 year ago)
- Last Synced: 2023-11-03T00:23:03.390Z (about 1 year ago)
- Language: Python
- Size: 23.4 KB
- Stars: 0
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Metaflow TensorFlow decorator
The [`tf.distribute.Strategy`](https://www.tensorflow.org/api_docs/python/tf/distribute/Strategy) API lets TensorFlow developers distribute model training across multiple GPUs, TPUs, and machines. This repository implements the Metaflow `@tensorflow` decorator, which configures a multi-node Metaflow step to use this functionality.
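
`tf.distribute.MultiWorkerMirroredStrategy` learns about the other machines in a job from the `TF_CONFIG` environment variable: a JSON document with a `cluster` spec that every worker shares, plus a `task` entry that identifies the current process. The sketch below builds that value for each worker in a small cluster. It assumes, without confirmation from this README, that a multi-node `@tensorflow` step connects workers through this standard variable; in that case the decorator, not user code, would do this setup. The helper `make_tf_config` and the addresses are hypothetical.

```python
import json
from typing import List


def make_tf_config(worker_addresses: List[str], index: int) -> str:
    """Return the TF_CONFIG JSON string for worker `index` (hypothetical helper)."""
    if not 0 <= index < len(worker_addresses):
        raise ValueError(f"index {index} out of range for {len(worker_addresses)} workers")
    config = {
        # Every worker sees the same cluster spec, in the same order.
        "cluster": {"worker": list(worker_addresses)},
        # Only the task entry differs from one worker to the next.
        "task": {"type": "worker", "index": index},
    }
    return json.dumps(config)


if __name__ == "__main__":
    hosts = ["10.0.0.1:2222", "10.0.0.2:2222"]  # hypothetical host:port pairs
    for i in range(len(hosts)):
        # Each worker process would receive one of these strings as TF_CONFIG.
        print(make_tf_config(hosts, i))
```

Keeping the cluster list identical across workers matters: TensorFlow uses the position of each address, together with `task.index`, to decide which worker is which.
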
### Installation
Install this experimental module:
```
pip install metaflow-tensorflow
```

### Getting Started
This package adds a Metaflow extension to your existing Metaflow installation, which makes the `@tensorflow` decorator importable from `metaflow`:

```python
# Import @tensorflow alongside any other Metaflow names your flow uses.
from metaflow import FlowSpec, step, tensorflow
```

The rest of this `README.md` describes how to use TensorFlow with Metaflow in both the single-node case and the multi-node case, which requires `@tensorflow`.
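
Code inside a step can tell which of these two cases it is running in. `tf.distribute.MultiWorkerMirroredStrategy` reads cluster membership from the `TF_CONFIG` environment variable, and when that variable is unset the process runs as a single-machine job. The sketch below assumes, without confirmation from this README, that a multi-node `@tensorflow` step exposes each worker's role through this standard variable; `read_worker_role` and `WorkerRole` are hypothetical names. The chief rule (worker 0 when the cluster has no explicit `chief` task) follows the convention in TensorFlow's multi-worker Keras tutorial, where the chief is responsible for writing the final saved model.

```python
import json
import os
from typing import Mapping, NamedTuple, Optional


class WorkerRole(NamedTuple):
    task_type: Optional[str]
    task_index: Optional[int]
    num_workers: int
    is_chief: bool


def read_worker_role(env: Mapping[str, str] = os.environ) -> WorkerRole:
    """Describe this process's role from TF_CONFIG (hypothetical helper)."""
    raw = env.get("TF_CONFIG")
    if not raw:
        # No TF_CONFIG: a single-node run, e.g. MirroredStrategy on one machine.
        return WorkerRole(None, None, 1, True)
    config = json.loads(raw)
    cluster = config.get("cluster", {})
    task = config.get("task", {})
    task_type = task.get("type")
    task_index = task.get("index")
    num_workers = len(cluster.get("worker", [])) + len(cluster.get("chief", []))
    if "chief" in cluster:
        # An explicit "chief" task is the chief.
        is_chief = task_type == "chief"
    else:
        # Otherwise worker 0 acts as the chief.
        is_chief = task_type == "worker" and task_index == 0
    return WorkerRole(task_type, task_index, num_workers, is_chief)


if __name__ == "__main__":
    example_env = {
        "TF_CONFIG": json.dumps({
            "cluster": {"worker": ["10.0.0.1:2222", "10.0.0.2:2222"]},
            "task": {"type": "worker", "index": 1},
        })
    }
    print(read_worker_role(example_env))
    # WorkerRole(task_type='worker', task_index=1, num_workers=2, is_chief=False)
```

Passing the environment as a mapping, rather than reading `os.environ` directly, keeps the helper easy to exercise outside a real multi-node run.
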
# TensorFlow Distributed on Metaflow guide
The examples in this repository are based on the [original TensorFlow examples](https://www.tensorflow.org/guide/distributed_training#examples_and_tutorials).

### Examples and guides
| Strategy (example directory) | Description |
| :--- | :--- |
| [MirroredStrategy](examples/single-node/README.md) | Synchronous distributed training on multiple GPUs on one machine. |
| [MultiWorkerMirroredStrategy](examples/multi-node/README.md) | Synchronous distributed training across multiple workers, each with potentially multiple GPUs. |

#### Parameter Server
Training with a parameter server has not been tested yet. Please reach out to the Outerbounds team if you need help with it.

#### Installing TensorFlow for GPU usage in Metaflow
> From the [TensorFlow documentation](https://www.tensorflow.org/install/pip): Do not install TensorFlow with conda. It may not have the latest stable version. pip is recommended since TensorFlow is only officially released to PyPI.

We have found that the easiest way to install TensorFlow for GPU use is the prebuilt Docker image `tensorflow/tensorflow:latest-gpu`.
#### Fault Tolerance
See the [TensorFlow documentation on fault tolerance](https://www.tensorflow.org/tutorials/distribute/multi_worker_with_keras#fault_tolerance).
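
The fault tolerance described there comes down to periodically saving training state to durable storage, so that a restarted worker resumes where it left off instead of starting over; in Keras, this is what `tf.keras.callbacks.BackupAndRestore` automates. The standard-library sketch below shows that resume logic in isolation. The `ckpt-<epoch>` file naming, the helper names, and the toy training loop are hypothetical and are not part of this package or of TensorFlow.

```python
import os
import re
from typing import List, Optional

# Hypothetical naming scheme: one file per completed epoch, e.g. "ckpt-3".
_CKPT_PATTERN = re.compile(r"^ckpt-(\d+)$")


def latest_completed_epoch(checkpoint_dir: str) -> Optional[int]:
    """Return the highest epoch with a saved checkpoint, or None if none exist."""
    if not os.path.isdir(checkpoint_dir):
        return None
    epochs = [
        int(m.group(1))
        for name in os.listdir(checkpoint_dir)
        if (m := _CKPT_PATTERN.match(name))
    ]
    return max(epochs, default=None)


def train(checkpoint_dir: str, total_epochs: int) -> List[int]:
    """Run or resume a toy training loop, saving a checkpoint after each epoch."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    last = latest_completed_epoch(checkpoint_dir)
    start = 0 if last is None else last + 1  # resume after the last saved epoch
    ran = []
    for epoch in range(start, total_epochs):
        # ... one epoch of real training would happen here ...
        with open(os.path.join(checkpoint_dir, f"ckpt-{epoch}"), "w") as f:
            f.write("state")  # stand-in for real model and optimizer state
        ran.append(epoch)
    return ran
```

If a run that targets five epochs fails after saving epoch 1, calling `train(checkpoint_dir, 5)` again only runs epochs 2 through 4. Checkpoints must live on storage that survives the failed worker, which is why the TensorFlow guide stresses a distributed file system.
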
The TL;DR is to use one of the `tf.distribute.Strategy` subclasses, which implement mechanisms to handle worker failures gracefully.

### License
`metaflow-tensorflow` is distributed under the Apache License 2.0.