
# :warning: Fairing has moved! :warning:

Fairing is now part of the Kubeflow organisation; the new repository for the project is https://github.com/kubeflow/fairing

# Fairing

Easily train and serve ML models on Kubernetes, directly from your Python code.

This project uses [Metaparticle](http://metaparticle.io/) behind the scenes.

`fairing` lets you express how you want your model to be trained and served using native Python decorators.

## Table of Contents

- [Requirements](#requirements)
- [Getting `fairing`](#getting-fairing)
- [Training](#training)
  - [Simple Training](#simple-training)
  - [Hyperparameters Tuning](#hyperparameters-tuning)
  - [Population Based Training](#population-based-training)
- [Usage with Kubeflow](#usage-with-kubeflow)
  - [Simple TfJob](#simple-tfjob)
  - [Distributed Training](#distributed-training)
  - [From a Jupyter Notebook](#from-a-jupyter-notebook)
- [Monitoring with TensorBoard](#tensorboard)

## Requirements

If you are going to use `fairing` on your local machine (as opposed to, for example, from a Jupyter Notebook deployed inside a Kubernetes cluster), you will need access to a deployed Kubernetes cluster and the `kubeconfig` for this cluster on your machine.

You will also need to have Docker installed locally.

## Getting `fairing`

**Note**: This project requires Python 3.

```bash
pip install fairing
```

Or, in a Jupyter Notebook, create a new cell and execute: `!pip install fairing`.

## Training

`fairing` provides a `@Train` class decorator that lets you specify how you want your model to be packaged and trained.
Your model needs to be defined as a class to work with `fairing`.

This constraint makes it possible to support more complex training strategies and simplifies usage from within a Jupyter Notebook.

The following examples should help you understand how `fairing` works.

#### Simple Training

Your class needs to define a `train` method that will be called during training:

```python
from fairing.train import Train

@Train(repository='')
class MyModel(object):
    def train(self):
        # Training logic goes here
        pass
```

Complete example: [examples/simple-training/main.py](./examples/simple-training/main.py)
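
In the Metaparticle style that `fairing` builds on, running the decorated script is what triggers packaging and deployment. A minimal sketch of an entrypoint, assuming the decorated class is instantiated and invoked like a regular Python object:

```python
if __name__ == '__main__':
    model = MyModel()
    # Calling train() on the decorated class is expected to package the
    # code into a Docker image, push it to `repository`, and run it on
    # the cluster rather than executing the logic locally.
    model.train()
```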

#### Hyperparameters Tuning

Allows you to run multiple trainings in parallel, each one with different values for your hyperparameters.

Your class should define a `hyperparameters` method that returns a dictionary of hyperparameters and their values.
This dictionary will be automatically passed to your `train` method.
Don't forget to add a new argument to your `train` method to receive the hyperparameters.

```python
import random

from fairing.train import Train
from fairing.strategies.hp import HyperparameterTuning

@Train(
    repository='',
    strategy=HyperparameterTuning(runs=6),
)
class MyModel(object):
    def hyperparameters(self):
        return {
            'learning_rate': random.normalvariate(0.5, 0.45)
        }

    def train(self, hp):
        # Training logic goes here
        pass
```

To specify that we want to train our model using hyperparameter tuning, and not just a simple training,
we passed a `strategy` parameter to the `@Train` decorator and specified the number of runs we wish to create.

Complete example: [examples/hyperparameter-tuning/main.py](./examples/hyperparameter-tuning/main.py)

#### Population Based Training

We can also ask `fairing` to train our code using [Population Based Training](https://deepmind.com/blog/population-based-training-neural-networks/).

This is a more advanced training strategy that needs to hook into different lifecycle steps of your model, so we need to define several additional methods on our model class.

The name of a PVC that supports multiple readers and writers (`ReadWriteMany`) needs to be passed to the `PopulationBasedTraining` strategy. It is used to store and exchange the models generated during training, enabling the `explore/exploit` mechanism of Population Based Training.

```python
from fairing.train import Train
from fairing.strategies.pbt import PopulationBasedTraining

MODEL_PATH = '/mnt/models'  # example value: a path on the shared PVC

@Train(
    repository='',
    strategy=PopulationBasedTraining(
        population_size=10,
        exploit_count=6,
        steps_per_exploit=5000,
        pvc_name='',
        model_path=MODEL_PATH
    )
)
class MyModel(object):
    def hyperparameters(self):
        # Return the dictionary of hyperparameters
        pass

    def build(self, hp):
        # Build the model
        pass

    def train(self, hp):
        # Training logic
        pass

    def save(self):
        # Save the model at MODEL_PATH
        pass

    def restore(self, model_path):
        # Restore the model from model_path
        pass
```

Complete example: [examples/population-based-training/main.py](./examples/population-based-training/main.py)
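
For concreteness, here is a hedged sketch of what `save` and `restore` might look like using the TensorFlow 1.x `Saver` API; the actual example may differ, and it assumes `build` created a graph with variables and stored a `tf.Session` on `self.session`:

```python
import tensorflow as tf

class MyModel(object):
    # ...hyperparameters/build/train as above...

    def save(self):
        # Persist the current weights to the shared PVC so other
        # members of the population can exploit them.
        saver = tf.train.Saver()
        saver.save(self.session, MODEL_PATH)

    def restore(self, model_path):
        # Load the weights of a better-performing population member.
        saver = tf.train.Saver()
        saver.restore(self.session, model_path)
```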

## Usage with Kubeflow

### Simple TfJob

Instead of creating native `Jobs`, `fairing` can leverage Kubeflow's `TfJobs`, assuming you have Kubeflow installed in your cluster.
Simply pass the Kubeflow architecture to the `@Train` decorator (note that you can still use all the training strategies mentioned above):

```python
from fairing.train import Train
from fairing.architectures.kubeflow.basic import BasicArchitecture

@Train(repository='wbuchwalter', architecture=BasicArchitecture())
class MyModel(object):
    def train(self):
        # Training logic
        pass
```

### Distributed Training

Using Kubeflow, we can also ask `fairing` to start [distributed trainings](https://www.tensorflow.org/deploy/distributed) instead.
Simply import the `DistributedTraining` architecture instead of `BasicArchitecture`:

```python
from fairing.train import Train
from fairing.architectures.kubeflow.distributed import DistributedTraining

@Train(
    repository='',
    architecture=DistributedTraining(ps_count=2, worker_count=5),
)
class MyModel(object):
    ...
```

Specify the desired number of parameter servers with `ps_count` and the number of workers with `worker_count`.
An additional instance of type `master` will always be created.

See [Modifying your model to use TfJob's TF_CONFIG](https://github.com/Azure/kubeflow-labs/tree/master/7-distributed-tensorflow#modifying-your-model-to-use-tfjobs-tf_config) to understand how you need to modify your model to support distributed training with Kubeflow.
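
As a rough illustration of what that modification involves, here is a hedged sketch of reading `TF_CONFIG` with the TensorFlow 1.x distributed API; the exact wiring depends on your model:

```python
import json
import os

import tensorflow as tf

# TfJob injects the cluster layout into each pod via the TF_CONFIG
# environment variable.
tf_config = json.loads(os.environ.get('TF_CONFIG', '{}'))
cluster = tf_config.get('cluster', {})
task = tf_config.get('task', {})

cluster_spec = tf.train.ClusterSpec(cluster)
server = tf.train.Server(cluster_spec,
                         job_name=task.get('type'),
                         task_index=task.get('index'))

if task.get('type') == 'ps':
    # Parameter servers only host variables and wait for workers.
    server.join()
else:
    # Workers (and the master) run the training loop, placing
    # variables on the parameter servers.
    with tf.device(tf.train.replica_device_setter(cluster=cluster_spec)):
        pass  # build and train the model here
```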

Complete example: [examples/distributed-training/main.py](./examples/distributed-training/main.py)

### From a Jupyter Notebook

To make `fairing` work from a Jupyter Notebook deployed with Kubeflow, a few more components are required (such as Knative Build).
Refer [to the dedicated documentation and example](examples/kubeflow-jupyter-notebook/).

## TensorBoard

You can easily attach a TensorBoard instance to monitor your training:

```python
@Train(
    repository='',
    tensorboard={
        'log_dir': LOG_DIR,
        'pvc_name': '',
        'public': True  # Request a public IP
    }
)
class MyModel(object):
    ...
```
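
For TensorBoard to have anything to display, your `train` method has to write event files under `LOG_DIR` (assumed here to be the same path passed to the decorator, on the PVC named by `pvc_name`). A minimal sketch using the TensorFlow 1.x summary API:

```python
import tensorflow as tf

LOG_DIR = '/tmp/logs'  # hypothetical value; must match the decorator's log_dir

def log_loss_curve():
    loss = tf.placeholder(tf.float32, name='loss')
    summary_op = tf.summary.scalar('loss', loss)
    with tf.Session() as sess:
        writer = tf.summary.FileWriter(LOG_DIR, sess.graph)
        for step in range(100):
            # Write one scalar per step; TensorBoard tails these event files.
            summary = sess.run(summary_op, feed_dict={loss: 1.0 / (step + 1)})
            writer.add_summary(summary, step)
        writer.close()
```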