{"id":18334153,"url":"https://github.com/wbuchwalter/fairing","last_synced_at":"2025-04-06T03:34:52.899Z","repository":{"id":66049806,"uuid":"131087394","full_name":"wbuchwalter/fairing","owner":"wbuchwalter","description":"👩‍🔬[Experimental] Easily train and serve ML models on Kubernetes, directly from your python code.","archived":false,"fork":false,"pushed_at":"2018-11-08T15:36:51.000Z","size":178,"stargazers_count":31,"open_issues_count":6,"forks_count":4,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-03-29T00:14:00.878Z","etag":null,"topics":["kubeflow","kubernetes","machine-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wbuchwalter.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-04-26T02:13:01.000Z","updated_at":"2022-01-05T17:04:21.000Z","dependencies_parsed_at":"2023-04-20T21:02:58.925Z","dependency_job_id":null,"html_url":"https://github.com/wbuchwalter/fairing","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wbuchwalter%2Ffairing","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wbuchwalter%2Ffairing/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wbuchwalter%2Ffairing/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wbuchwalter%2Ffairing/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/o
wners/wbuchwalter","download_url":"https://codeload.github.com/wbuchwalter/fairing/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247430838,"owners_count":20937873,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["kubeflow","kubernetes","machine-learning"],"created_at":"2024-11-05T19:47:06.030Z","updated_at":"2025-04-06T03:34:52.893Z","avatar_url":"https://github.com/wbuchwalter.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# :warning: Fairing has moved!:warning: \n \nFairing is now part of the Kubeflow organisation. The new repository for the project is https://github.com/kubeflow/fairing \n\n\n# Fairing\n\nEasily train and serve ML models on Kubernetes, directly from your Python code.  \n\nThis project uses [Metaparticle](http://metaparticle.io/) behind the scenes.\n\n`fairing` lets you express how you want your model to be trained and served using native Python decorators.  
\n\n\n## Table of Contents\n\n- [Requirements](#requirements)\n- [Getting `fairing`](#getting-fairing)\n- [Training](#training)\n  - [Simple Training](#simple-training)\n  - [Hyperparameters Tuning](#hyperparameters-tuning)\n  - [Population Based Training](#population-based-training)\n- [Usage with Kubeflow](#usage-with-kubeflow)\n  - [Simple TfJob](#simple-tfjob)\n  - [Distributed Training](#distributed-training)\n  - [From a Jupyter Notebook](#from-a-jupyter-notebook)\n- [Monitoring with TensorBoard](#tensorboard)\n\n## Requirements\n\nIf you are going to use `fairing` on your local machine (as opposed to from a Jupyter Notebook deployed inside a Kubernetes cluster, for example), you will need \nto have access to a deployed Kubernetes cluster, and have the `kubeconfig` for this cluster on your machine.\n\nYou will also need to have Docker installed locally.\n\n## Getting `fairing`\n\n**Note**: This project requires Python 3.\n\n```bash\npip install fairing\n```\n\nOr, in a Jupyter Notebook, create a new cell and execute: `!pip install fairing`.\n\n## Training\n\n`fairing` provides a `@Train` class decorator that lets you specify how you want your model to be packaged and trained.  \nYour model needs to be defined as a class to work with `fairing`. \n\nThis limitation is needed to enable more complex training strategies and to simplify usage from within a Jupyter Notebook.\n\nThe following examples should help you understand how `fairing` works.\n\u003c!-- The train decorator \n* `package`: Defines the repository (this could be your DockerHub username, or something like `somerepo.acr.io` on Azure for example) and name that should be used to build the image. You can control whether you want to publish the image by setting `publish` to `True`.\n* `strategy`: Specify which training strategy should be used (more details below).\n* `architecture`: Specify which architecture should be used. 
(more details below)\n* `tensorboard`: [Optional] If specified, will spawn an instance of TensorBoard to monitor your training runs\n  * `log_dir`: Directory where the summaries are saved.\n  * `pvc_name`: Name of an existing `PersistentVolumeClaim` that should be mounted.\n  * `public`: If set to `True` then a public IP will be created for TensorBoard (provided your Kubernetes cluster supports this). Otherwise only a private IP will be created. --\u003e\n\n\u003c!-- ### Training Strategies --\u003e\n\n#### Simple Training\n\nYour class needs to define a `train` method that will be called during training:\n\n```python\nfrom fairing.train import Train\n\n@Train(repository='\u003cyour-repo-name\u003e')\nclass MyModel(object):\n    def train(self):\n      # Training logic goes here\n      pass\n```\n\u003c!-- No `strategy` is specified here, since the default `strategy` is `basicTrainingStrategy`. --\u003e\n\nComplete example: [examples/simple-training/main.py](./examples/simple-training/main.py)\n\n\n#### Hyperparameters Tuning\n\nThis strategy lets you run multiple training runs in parallel, each one with different values for your hyperparameters.\n\nYour class should define a `hyperparameters` method that returns a dictionary of hyperparameters and their values.\nThis dictionary will be automatically passed to your `train` method. 
\nDon't forget to add a new argument to your `train` method to receive the hyperparameters.\n\n```python\nimport random\n\nfrom fairing.train import Train\nfrom fairing.strategies.hp import HyperparameterTuning\n\n@Train(\n    repository='\u003cyour-repo-name\u003e',\n    strategy=HyperparameterTuning(runs=6),\n)\nclass MyModel(object):\n    def hyperparameters(self):\n      return {\n        'learning_rate': random.normalvariate(0.5, 0.45)\n      }\n\n    def train(self, hp):\n      # Training logic goes here\n      pass\n```\n\nTo train the model with hyperparameter tuning rather than a simple training run, \nwe pass a `strategy` parameter to the `@Train` decorator and specify the number of runs to create.\n\n\nComplete example: [examples/hyperparameter-tuning/main.py](./examples/hyperparameter-tuning/main.py)\n\n#### Population Based Training\n\nWe can also ask `fairing` to train our code using [Population Based Training](https://deepmind.com/blog/population-based-training-neural-networks/).\n\nThis is a more advanced training strategy that needs to hook into different lifecycle steps of your model, so we need to define several additional methods in our model class.\n\nThe name of a PVC that supports multiple readers and writers must be passed to the `PopulationBasedTraining` strategy. 
This PVC is used to store and exchange the different models generated by our training to enable the `explore/exploit` mechanism of Population Based Training.\n\n```python\nfrom fairing.train import Train\nfrom fairing.strategies.pbt import PopulationBasedTraining\n\n@Train(\n    repository='\u003cyour-repo-name\u003e',\n    strategy=PopulationBasedTraining(\n        population_size=10,\n        exploit_count=6,\n        steps_per_exploit=5000,\n        pvc_name='\u003cpvc-name\u003e',\n        model_path=MODEL_PATH\n    )\n)\nclass MyModel(object):\n    def hyperparameters(self):\n      # returns the dictionary of hyperparameters\n      pass\n    \n    def build(self, hp):\n      # build the model\n      pass\n    \n    def train(self, hp):\n      # training logic\n      pass\n    \n    def save(self):\n      # save the model at MODEL_PATH\n      pass\n    \n    def restore(self, model_path):\n      # restore the model from model_path\n      pass\n```\n\nComplete example: [examples/population-based-training/main.py](./examples/population-based-training/main.py)\n\n\n\u003c!-- ### Training Architectures\n\n#### Basic Architecture\n\nThis is the default `architecture`: each training run will be a single container acting in isolation.\nNo `architecture` is specified since this is the default value.\n\n```python\n# Note: we are not specifying any architecture since this is the default value\n@Train(package={'name': '\u003cyour-image-name\u003e', 'repository': '\u003cyour-repo-name\u003e', 'publish': True})\nclass MyModel(object):\n    ...\n```\n\nComplete example: [examples/simple-training/main.py](./examples/simple-training/main.py) --\u003e\n\n## Usage with Kubeflow\n\n### Simple TfJob\n\nInstead of creating native `Jobs`, `fairing` can leverage Kubeflow's `TfJobs`, assuming you have Kubeflow installed in your cluster.\nSimply pass the Kubeflow architecture to the train decorator (note that you can still use all the training strategies mentioned above):\n\n```python\nfrom fairing.train import Train\nfrom 
fairing.architectures.kubeflow.basic import BasicArchitecture\n\n@Train(repository='wbuchwalter', architecture=BasicArchitecture())\nclass MyModel(object):\n    def train(self):\n       # training logic\n       pass\n```\n\n\n### Distributed Training\n\nUsing Kubeflow, we can also ask `fairing` to start [distributed trainings](https://www.tensorflow.org/deploy/distributed) instead.\nSimply import the `DistributedTraining` architecture instead of `BasicArchitecture`:\n\n```python\nfrom fairing.train import Train\nfrom fairing.architectures.kubeflow.distributed import DistributedTraining\n\n@Train(\n    repository='\u003cyour-repo-name\u003e',\n    architecture=DistributedTraining(ps_count=2, worker_count=5),\n)\nclass MyModel(object):\n    ...\n```\n\nSpecify the number of desired parameter servers with `ps_count` and the number of workers with `worker_count`.\nAn additional instance of type master will always be created.\n\nSee [https://github.com/Azure/kubeflow-labs/tree/master/7-distributed-tensorflow#modifying-your-model-to-use-tfjobs-tf_config](https://github.com/Azure/kubeflow-labs/tree/master/7-distributed-tensorflow#modifying-your-model-to-use-tfjobs-tf_config) to understand how you need to modify your model to support distributed training with Kubeflow.\n\nComplete example: [examples/distributed-training/main.py](./examples/distributed-training/main.py)\n\n### From a Jupyter Notebook\n\nTo make `fairing` work from a Jupyter Notebook deployed with Kubeflow, a few more requirements must be met (for example, Knative Build must be deployed).\nRefer [to the dedicated documentation and example](examples/kubeflow-jupyter-notebook/).\n\n## TensorBoard\n\nYou can easily attach a TensorBoard instance to monitor your training:\n\n```python\nfrom fairing.train import Train\n\n@Train(\n    repository='\u003cyour-repo-name\u003e',\n    tensorboard={\n      'log_dir': LOG_DIR,\n      'pvc_name': '\u003cpvc-name\u003e',\n      'public': True # Request a public IP\n    }\n)\nclass MyModel(object):\n    
...\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwbuchwalter%2Ffairing","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwbuchwalter%2Ffairing","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwbuchwalter%2Ffairing/lists"}