{"id":16597956,"url":"https://github.com/alexioannides/kubernetes-mlops","last_synced_at":"2025-05-15T23:08:37.033Z","repository":{"id":53662047,"uuid":"167996749","full_name":"AlexIoannides/kubernetes-mlops","owner":"AlexIoannides","description":"MLOps tutorial using Python, Docker and Kubernetes.","archived":false,"fork":false,"pushed_at":"2024-10-18T14:44:54.000Z","size":155,"stargazers_count":388,"open_issues_count":7,"forks_count":108,"subscribers_count":14,"default_branch":"master","last_synced_at":"2025-04-08T10:37:14.637Z","etag":null,"topics":["cloud-platform","docker","flask","gcp","helm","kubernetes","machine-learning","mlops","python","seldon","seldon-core"],"latest_commit_sha":null,"homepage":"https://alexioannides.com/2019/01/10/deploying-python-ml-models-with-flask-docker-and-kubernetes/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AlexIoannides.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-01-28T16:40:35.000Z","updated_at":"2025-04-05T15:03:11.000Z","dependencies_parsed_at":"2024-10-26T20:29:57.957Z","dependency_job_id":"a072babe-7562-4802-896d-2ef66ccbe431","html_url":"https://github.com/AlexIoannides/kubernetes-mlops","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlexIoannides%2Fkubernetes-mlops","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlexIoannides%2Fkubernetes-mlops/tags","releases_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/AlexIoannides%2Fkubernetes-mlops/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlexIoannides%2Fkubernetes-mlops/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AlexIoannides","download_url":"https://codeload.github.com/AlexIoannides/kubernetes-mlops/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254436949,"owners_count":22070947,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cloud-platform","docker","flask","gcp","helm","kubernetes","machine-learning","mlops","python","seldon","seldon-core"],"created_at":"2024-10-12T00:07:11.255Z","updated_at":"2025-05-15T23:08:30.211Z","avatar_url":"https://github.com/AlexIoannides.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Deploying Machine Learning Models on Kubernetes\n\nA common pattern for deploying Machine Learning (ML) models into production environments - e.g. ML models trained using the SciKit Learn or Keras packages (for Python), that are ready to provide predictions on new data - is to expose these ML models as RESTful API microservices, hosted from within [Docker](https://www.docker.com) containers. These can then be deployed to a cloud environment for handling everything required for maintaining continuous availability - e.g. fault-tolerance, auto-scaling, load balancing and rolling service updates.\n\nThe configuration details for a continuously available cloud deployment are specific to the targeted cloud provider(s) - e.g. 
the deployment process and topology for Amazon Web Services is not the same as that for Microsoft Azure, which in-turn is not the same as that for Google Cloud Platform. This constitutes knowledge that needs to be acquired for every cloud provider. Furthermore, it is difficult (some would say near impossible) to test entire deployment strategies locally, which makes issues such as networking hard to debug.\n\n[Kubernetes](https://kubernetes.io) is a container orchestration platform that seeks to address these issues. Briefly, it provides a mechanism for defining **entire** microservice-based application deployment topologies and their service-level requirements for maintaining continuous availability. It is agnostic to the targeted cloud provider, can be run on-premises and even locally on your laptop - all that's required is a cluster of virtual machines running Kubernetes - i.e. a Kubernetes cluster.\n\nThis README is designed to be read in conjunction with the code in this repository, that contains the Python modules, Docker configuration files and Kubernetes instructions for demonstrating how a simple Python ML model can be turned into a production-grade RESTful model-scoring (or prediction) API service, using Docker and Kubernetes - both locally and with Google Cloud Platform (GCP). It is not a comprehensive guide to Kubernetes, Docker or ML - think of it more as a 'ML on Kubernetes 101' for demonstrating capability and allowing newcomers to Kubernetes (e.g. data scientists who are more focused on building models as opposed to deploying them), to get up-and-running quickly and become familiar with the basic concepts and patterns.\n\nWe will demonstrate ML model deployment using two different approaches: a first principles approach using Docker and Kubernetes; and then a deployment using the [Seldon-Core](https://www.seldon.io) Kubernetes native framework for streamlining the deployment of ML services. 
The former will help you to appreciate the latter, which constitutes a powerful framework for deploying and performance-monitoring many complex ML model pipelines.\n\nThis work was initially committed in 2018 and has since formed the basis of [Bodywork](https://github.com/bodywork-ml/bodywork-core) - an open-source MLOps tool for deploying machine learning projects developed in Python, to Kubernetes. Bodywork automates a lot of the steps that this project has demonstrated to the many machine learning engineers that have used it over the years - take a look at the [documentation](https://bodywork.readthedocs.io/en/latest/).\n\n## Containerising a Simple ML Model Scoring Service using Flask and Docker\n\nWe start by demonstrating how to achieve this basic competence using the simple Python ML model scoring REST API contained in the `api.py` module, together with the `Dockerfile`, both within the `py-flask-ml-score-api` directory, whose core contents are as follows,\n\n```bash\npy-flask-ml-score-api/\n | Dockerfile\n | Pipfile\n | Pipfile.lock\n | api.py\n```\n\nIf you're already feeling lost then these files are discussed in the points below, otherwise feel free to skip to the next section.\n\n### Defining the Flask Service in the `api.py` Module\n\nThis is a Python module that uses the [Flask](http://flask.pocoo.org) framework for defining a web service (`app`), with a function (`score`) that executes in response to an HTTP request to a specific URL (or 'route'), thanks to being wrapped by the `app.route` decorator. For reference, the relevant code is reproduced below,\n\n```python\nfrom flask import Flask, jsonify, make_response, request\n\napp = Flask(__name__)\n\n\n@app.route('/score', methods=['POST'])\ndef score():\n    features = request.json['X']\n    return make_response(jsonify({'score': features}))\n\n\nif __name__ == '__main__':\n    app.run(host='0.0.0.0', port=5000)\n```\n\nIf running locally - e.g. 
by starting the web service using `python api.py` - we would be able to reach our function (or 'endpoint') at `http://localhost:5000/score`. This function takes data sent to it as JSON (that has been automatically de-serialised into a Python dict, available as `request.json` within our function definition), and returns a response (automatically serialised as JSON).\n\nIn our example function, we expect an array of features, `X`, that we pass to a ML model, which in our example returns those same features back to the caller - i.e. our chosen ML model is the identity function, which we have chosen for purely demonstrative purposes. We could just as easily have loaded a pickled SciKit-Learn or Keras model and passed the data to the appropriate `predict` method, returning a score for the feature-data as JSON - see [here](https://github.com/AlexIoannides/ml-workflow-automation/blob/master/deploy/py-sklearn-flask-ml-service/api.py) for an example of this in action.\n\n### Defining the Docker Image with the `Dockerfile`\n\nA `Dockerfile` is essentially the configuration file used by Docker, that allows you to define the contents and configure the operation of a Docker container, when operational. This static data, when not executed as a container, is referred to as the 'image'. For reference, the `Dockerfile` is reproduced below,\n\n```docker\nFROM python:3.6-slim\nWORKDIR /usr/src/app\nCOPY . 
.\nRUN pip install pipenv\nRUN pipenv install\nEXPOSE 5000\nCMD [\"pipenv\", \"run\", \"python\", \"api.py\"]\n```\n\nIn our example `Dockerfile` we:\n\n - start by using a pre-configured Docker image (`python:3.6-slim`) that has a slimmed-down version of the [Debian](https://www.debian.org) Linux distribution with Python already installed;\n - then copy the contents of the `py-flask-ml-score-api` local directory to a directory on the image called `/usr/src/app`;\n - then use `pip` to install the [Pipenv](https://pipenv.readthedocs.io/en/latest/) package for Python dependency management (see the appendix at the bottom for more information on how we use Pipenv);\n - then use Pipenv to install the dependencies described in `Pipfile.lock` into a virtual environment on the image;\n - configure port 5000 to be exposed to the 'outside world' on the running container; and finally,\n - start our Flask RESTful web service - `api.py`. Note, that here we are relying on Flask's internal [WSGI](https://en.wikipedia.org/wiki/Web_Server_Gateway_Interface) server, whereas in a production setting we would recommend configuring a more robust option (e.g. Gunicorn), [as discussed here](https://pythonspeed.com/articles/gunicorn-in-docker/).\n\nBuilding this custom image and asking the Docker daemon to run it (remember that a running image is a 'container'), will expose our RESTful ML model scoring service on port 5000 as if it were running on a dedicated virtual machine. Refer to the official [Docker documentation](https://docs.docker.com/get-started/) for a more comprehensive discussion of these core concepts.\n\n### Building a Docker Image for the ML Scoring Service\n\nWe assume that [Docker is running locally](https://www.docker.com) (both Docker client and daemon), that the client is logged into an account on [DockerHub](https://hub.docker.com) and that there is a terminal open in this project's root directory. 
To build the image described in the `Dockerfile` run,\n\n```bash\ndocker build --tag alexioannides/test-ml-score-api py-flask-ml-score-api\n```\n\nWhere 'alexioannides' refers to the name of the DockerHub account that we will push the image to, once we have tested it.\n\n#### Testing\n\nTo test that the image can be used to create a Docker container that functions as we expect, use,\n\n```bash\ndocker run --rm --name test-api -p 5000:5000 -d alexioannides/test-ml-score-api\n```\n\nWhere we have mapped port 5000 from the Docker container - i.e. the port our ML model scoring service is listening on - to port 5000 on our host machine (localhost). Then check that the container is listed as running using,\n\n```bash\ndocker ps\n```\n\nAnd then test the exposed API endpoint using,\n\n```bash\ncurl http://localhost:5000/score \\\n    --request POST \\\n    --header \"Content-Type: application/json\" \\\n    --data '{\"X\": [1, 2]}'\n```\n\nWhere you should expect a response along the lines of,\n\n```json\n{\"score\":[1,2]}\n```\n\nAll our test model does is return the input data - i.e. it is the identity function. Only a few lines of additional code are required to modify this service to load a SciKit Learn model from disk and pass new data to its 'predict' method for generating predictions - see [here](https://github.com/AlexIoannides/ml-workflow-automation/blob/master/deploy/py-sklearn-flask-ml-service/api.py) for an example. Now that the container has been confirmed as operational, we can stop it,\n\n```bash\ndocker stop test-api\n```\n\n#### Pushing the Image to the DockerHub Registry\n\nIn order for a remote Docker host or Kubernetes cluster to have access to the image we've created, we need to publish it to an image registry. All cloud computing providers that offer managed Docker-based services will provide private image registries, but we will use the public image registry at DockerHub, for convenience. 
To push our new image to DockerHub (where my account ID is 'alexioannides') use,\n\n```bash\ndocker push alexioannides/test-ml-score-api\n```\n\nWhere we can now see that our chosen naming convention for the image is intrinsically linked to our target image registry (you will need to insert your own account ID where required). Once the upload is finished, log onto DockerHub to confirm that the upload has been successful via the [DockerHub UI](https://hub.docker.com/u/alexioannides).\n\n## Installing Kubernetes for Local Development and Testing\n\nThere are two options for installing a single-node Kubernetes cluster that is suitable for local development and testing: via the [Docker Desktop](https://www.docker.com/products/docker-desktop) client, or via [Minikube](https://github.com/kubernetes/minikube).\n\n### Installing Kubernetes via Docker Desktop\n\nIf you have been using Docker on a Mac, then the chances are that you will have been doing this via the Docker Desktop application. If not (e.g. if you installed Docker Engine via Homebrew), then Docker Desktop can be downloaded [here](https://www.docker.com/products/docker-desktop). Docker Desktop now comes bundled with Kubernetes, which can be activated by going to `Preferences -\u003e Kubernetes` and selecting `Enable Kubernetes`. It will take a while for Docker Desktop to download the Docker images required to run Kubernetes, so be patient. After it has finished, go to `Preferences -\u003e Advanced` and ensure that at least 2 CPUs and 4 GiB of memory have been allocated to the Docker Engine, which are the minimum resources required to deploy a single Seldon ML component.\n\nTo interact with the Kubernetes cluster you will need the `kubectl` Command Line Interface (CLI) tool, which will need to be downloaded separately. The easiest way to do this on a Mac is via Homebrew - i.e. with `brew install kubernetes-cli`. 
Once you have `kubectl` installed and a Kubernetes cluster up-and-running, test that everything is working as expected by running,\n\n```bash\nkubectl cluster-info\n```\n\nWhich ought to return something along the lines of,\n\n```bash\nKubernetes master is running at https://kubernetes.docker.internal:6443\nKubeDNS is running at https://kubernetes.docker.internal:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy\n\nTo further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.\n```\n\n### Installing Kubernetes via Minikube\n\nOn Mac OS X, the steps required to get up-and-running with Minikube are as follows:\n\n- make sure the [Homebrew](https://brew.sh) package manager for OS X is installed; then,\n- install VirtualBox using, `brew cask install virtualbox` (you may need to approve installation via OS X System Preferences); and then,\n- install Minikube using, `brew cask install minikube`.\n\nTo start the test cluster run,\n\n```bash\nminikube start --memory 4096\n```\n\nWhere we have specified the minimum amount of memory required to deploy a single Seldon ML component. Be patient - Minikube may take a while to start. 
To test that the cluster is operational run,\n\n```bash\nkubectl cluster-info\n```\n\nWhere `kubectl` is the standard Command Line Interface (CLI) client for interacting with the Kubernetes API (which was installed as part of Minikube, but is also available separately).\n\n### Deploying the Containerised ML Model Scoring Service to Kubernetes\n\nTo launch our test model scoring service on Kubernetes, we will start by deploying the containerised service within a Kubernetes [Pod](https://kubernetes.io/docs/concepts/workloads/pods/pod-overview/), whose rollout is managed by a [Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/), which in turn creates a [ReplicaSet](https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/) - a Kubernetes resource that ensures a specified number of pods (or replicas) running our service are operational at any given time. This is achieved with,\n\n```bash\nkubectl create deployment test-ml-score-api --image=alexioannides/test-ml-score-api:latest\n```\n\nTo check on the status of the deployment run,\n\n```bash\nkubectl rollout status deployment test-ml-score-api\n```\n\nAnd to see the pods that it has created run,\n\n```bash\nkubectl get pods\n```\n\nIt is possible to use [port forwarding](https://en.wikipedia.org/wiki/Port_forwarding) to test an individual container without exposing it to the public internet. To use this, open a separate terminal and run (for example),\n\n```bash\nkubectl port-forward test-ml-score-api-szd4j 5000:5000\n```\n\nWhere `test-ml-score-api-szd4j` is the precise name of the pod currently active on the cluster, as determined from the `kubectl get pods` command. 
Then, from your original terminal, repeat our test request against the same container running on Kubernetes with,\n\n```bash\ncurl http://localhost:5000/score \\\n    --request POST \\\n    --header \"Content-Type: application/json\" \\\n    --data '{\"X\": [1, 2]}'\n```\n\nTo expose the container as a (load balanced) [service](https://kubernetes.io/docs/concepts/services-networking/service/) to the outside world, we have to create a Kubernetes service that references it. This is achieved with the following command,\n\n```bash\nkubectl expose deployment test-ml-score-api --port 5000 --type=LoadBalancer --name test-ml-score-api-lb\n```\n\nIf you are using Docker Desktop, then this will automatically emulate a load balancer at `http://localhost:5000`. To find where Minikube has exposed its emulated load balancer run,\n\n```bash\nminikube service list\n```\n\nNow we test our new service - for example (with Docker Desktop),\n\n```bash\ncurl http://localhost:5000/score \\\n    --request POST \\\n    --header \"Content-Type: application/json\" \\\n    --data '{\"X\": [1, 2]}'\n```\n\nNote, neither Docker Desktop nor Minikube sets up a real-life load balancer (which is what would happen if we made this request on a cloud platform). To tear-down the load balancer, deployment and pod, run the following commands in sequence,\n\n```bash\nkubectl delete deployment test-ml-score-api\nkubectl delete service test-ml-score-api-lb\n```\n\n## Configuring a Multi-Node Cluster on Google Cloud Platform\n\nThe easiest way to perform testing on a real-world Kubernetes cluster, with far greater resources than those available on a laptop, is to use a managed Kubernetes platform from a cloud provider. We will use Kubernetes Engine on [Google Cloud Platform (GCP)](https://cloud.google.com).\n\n### Getting Up-and-Running with Google Cloud Platform\n\nBefore we can use Google Cloud Platform, sign up for an account and create a project specifically for this work. 
Next, make sure that the GCP SDK is installed on your local machine - e.g.,\n\n```bash\nbrew cask install google-cloud-sdk\n```\n\nOr by downloading an installation image [directly from GCP](https://cloud.google.com/sdk/docs/quickstart-macos). Note, that if you haven't already installed Kubectl, then you will need to do so now, which can be done using the GCP SDK,\n\n```bash\ngcloud components install kubectl\n```\n\nWe then need to initialise the SDK,\n\n```bash\ngcloud init\n```\n\nWhich will open a browser and guide you through the necessary authentication steps. Make sure you pick the project you created, together with a default zone and region (if this has not been set via Compute Engine -\u003e Settings).\n\n### Initialising a Kubernetes Cluster\n\nFirstly, within the GCP UI visit the Kubernetes Engine page to trigger the Kubernetes API to start-up. From the command line we then start a cluster using,\n\n```bash\ngcloud container clusters create k8s-test-cluster --num-nodes 3 --machine-type g1-small\n```\n\nAnd then go make a cup of coffee while you wait for the cluster to be created. Note, that this will automatically switch your `kubectl` context to point to the cluster on GCP, as you will see if you run, `kubectl config get-contexts`. 
To switch back to the Docker Desktop client use `kubectl config use-context docker-desktop`.\n\n### Launching the Containerised ML Model Scoring Service on GCP\n\nThis is largely the same as what we did for running the test service locally - run the following commands in sequence,\n\n```bash\nkubectl create deployment test-ml-score-api --image=alexioannides/test-ml-score-api:latest\nkubectl expose deployment test-ml-score-api --port 5000 --type=LoadBalancer --name test-ml-score-api-lb\n```\n\nBut, to find the external IP address for the GCP cluster we will need to use,\n\n```bash\nkubectl get services\n```\n\nAnd then we can test our service on GCP - for example,\n\n```bash\ncurl http://35.246.92.213:5000/score \\\n    --request POST \\\n    --header \"Content-Type: application/json\" \\\n    --data '{\"X\": [1, 2]}'\n```\n\nOr, we could again use port forwarding to attach to a single pod - for example,\n\n```bash\nkubectl port-forward test-ml-score-api-nl4sc 5000:5000\n```\n\nAnd then in a separate terminal,\n\n```bash\ncurl http://localhost:5000/score \\\n    --request POST \\\n    --header \"Content-Type: application/json\" \\\n    --data '{\"X\": [1, 2]}'\n```\n\nFinally, we tear-down the deployment and load balancer,\n\n```bash\nkubectl delete deployment test-ml-score-api\nkubectl delete service test-ml-score-api-lb\n```\n\n## Switching Between Kubectl Contexts\n\nIf you are working with both a local Kubernetes cluster and a cluster on GCP, then you can switch Kubectl [context](https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/) from one cluster to the other, as follows,\n\n```bash\nkubectl config use-context docker-desktop\n```\n\nWhere the list of available contexts can be found using,\n\n```bash\nkubectl config get-contexts\n```\n\n## Using YAML Files to Define and Deploy the ML Model Scoring Service\n\nUp to this point we have been using Kubectl commands to define and deploy a basic version of our ML model 
scoring service. This is fine for demonstrative purposes, but quickly becomes limiting, as well as unmanageable. In practice, the standard way of defining entire Kubernetes deployments is with YAML files, posted to the Kubernetes API. The `py-flask-ml-score.yaml` file in the `py-flask-ml-score-api` directory is an example of how our ML model scoring service can be defined in a single YAML file. This can now be deployed using a single command,\n\n```bash\nkubectl apply -f py-flask-ml-score-api/py-flask-ml-score.yaml\n```\n\nNote, that we have defined three separate Kubernetes components in this single file: a [namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/), a deployment and a load-balanced service, using `---` to delimit the definition of each separate component. To see all components deployed into this namespace use,\n\n```bash\nkubectl get all --namespace test-ml-app\n```\n\nAnd likewise set the `--namespace` flag when using any `kubectl get` command to inspect the different components of our test app. Alternatively, we can set our new namespace as the default for the current context,\n\n```bash\nkubectl config set-context $(kubectl config current-context) --namespace=test-ml-app\n```\n\nAnd then run,\n\n```bash\nkubectl get all\n```\n\nWhere we can switch back to the default namespace using,\n\n```bash\nkubectl config set-context $(kubectl config current-context) --namespace=default\n```\n\nTo tear-down this application we can then use,\n\n```bash\nkubectl delete -f py-flask-ml-score-api/py-flask-ml-score.yaml\n```\n\nWhich saves us from having to use multiple commands to delete each component individually. 
Refer to the [official documentation for the Kubernetes API](https://kubernetes.io/docs/home/) to understand the contents of this YAML file in greater depth.\n\n## Using Helm Charts to Define and Deploy the ML Model Scoring Service\n\nWriting YAML files for Kubernetes can get repetitive and hard to manage, especially if there is a lot of 'copy-paste' involved, when only a handful of parameters need to be changed from one deployment to the next, but there is a 'wall of YAML' that needs to be modified. Enter [Helm](https://helm.sh/) - a framework for creating, executing and managing Kubernetes deployment templates. What follows is a very high-level demonstration of how Helm can be used to deploy our ML model scoring service - for a comprehensive discussion of Helm's full capabilities (and there are a lot of them), please refer to the [official documentation](https://docs.helm.sh). Seldon-Core can also be deployed using Helm and we will cover this in more detail later on.\n\n### Installing Helm\n\nAs before, the easiest way to install Helm onto Mac OS X is to use the Homebrew package manager,\n\n```bash\nbrew install kubernetes-helm\n```\n\nHelm v2, which is the version used here, relies on a dedicated deployment server, referred to as the 'Tiller' (removed in Helm v3), to be running within the same Kubernetes cluster we wish to deploy our applications to. Before we deploy Tiller we need to create a cluster-wide super-user role to assign to it, so that it can create and modify Kubernetes resources in any namespace. To achieve this, we start by creating a [Service Account](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/) that is destined for our Tiller. A Service Account is a means by which a pod (and any service running within it), when associated with a Service Account, can authenticate itself to the Kubernetes API, to be able to view, create and modify resources. 
We create this in the `kube-system` namespace (a common convention) as follows,\n\n```bash\nkubectl --namespace kube-system create serviceaccount tiller\n```\n\nWe then create a binding between this Service Account and the `cluster-admin` [Cluster Role](https://kubernetes.io/docs/reference/access-authn-authz/rbac/), which, as the name suggests, grants cluster-wide admin rights,\n\n```bash\nkubectl create clusterrolebinding tiller \\\n    --clusterrole cluster-admin \\\n    --serviceaccount=kube-system:tiller\n```\n\nWe can now deploy the Helm Tiller to a Kubernetes cluster, with the desired access rights using,\n\n```bash\nhelm init --service-account tiller\n```\n\n### Deploying with Helm\n\nTo create a fresh Helm deployment definition - referred to as a 'chart' in Helm terminology - run,\n\n```bash\nhelm create NAME-OF-YOUR-HELM-CHART\n```\n\nThis creates a new directory - e.g. `helm-ml-score-app` as included with this repository - with the following high-level directory structure,\n\n```bash\nhelm-ml-score-app/\n | -- charts/\n | -- templates/\n | Chart.yaml\n | values.yaml\n```\n\nBriefly, the `charts` directory contains other charts that our new chart will depend on (we will not make use of this), the `templates` directory contains our Helm templates, `Chart.yaml` contains core information for our chart (e.g. name and version information) and `values.yaml` contains default values to render our templates with (in the case that no values are set from the command line).\n\nThe next step is to delete all of the files in the `templates` directory (apart from `NOTES.txt`), and to replace them with our own. We start with `namespace.yaml` for declaring a namespace for our app,\n\n```yaml\napiVersion: v1\nkind: Namespace\nmetadata:\n  name: {{ .Values.app.namespace }}\n```\n\nAnyone familiar with HTML template frameworks (e.g. Jinja), will be familiar with the use of ``{{}}`` for defining values that will be injected into the rendered template. 
In this specific instance `.Values.app.namespace` injects the `app.namespace` variable, whose default value is defined in `values.yaml`. Next we define a deployment of pods in `deployment.yaml`,\n\n```yaml\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  labels:\n    app: {{ .Values.app.name }}\n    env: {{ .Values.app.env }}\n  name: {{ .Values.app.name }}\n  namespace: {{ .Values.app.namespace }}\nspec:\n  replicas: 1\n  selector:\n    matchLabels:\n      app: {{ .Values.app.name }}\n  template:\n    metadata:\n      labels:\n        app: {{ .Values.app.name }}\n        env: {{ .Values.app.env }}\n    spec:\n      containers:\n      - image: {{ .Values.app.image }}\n        name: {{ .Values.app.name }}\n        ports:\n        - containerPort: {{ .Values.containerPort }}\n          protocol: TCP\n```\n\nAnd the details of the load balancer service in `service.yaml`,\n\n```yaml\napiVersion: v1\nkind: Service\nmetadata:\n  name: {{ .Values.app.name }}-lb\n  labels:\n    app: {{ .Values.app.name }}\n  namespace: {{ .Values.app.namespace }}\nspec:\n  type: LoadBalancer\n  ports:\n  - port: {{ .Values.containerPort }}\n    targetPort: {{ .Values.targetPort }}\n  selector:\n    app: {{ .Values.app.name }}\n```\n\nWhat we have done, in essence, is to split out each component of the deployment details from `py-flask-ml-score.yaml` into its own file and then define template variables for each parameter of the configuration that is most likely to change from one deployment to the next. To test and examine the rendered template, without having to attempt a deployment, run,\n\n```bash\nhelm install helm-ml-score-app --debug --dry-run\n```\n\nIf you are happy with the results of the 'dry run', then execute the deployment and generate a release from the chart using,\n\n```bash\nhelm install helm-ml-score-app --name test-ml-app\n```\n\nThis will automatically print the status of the release, together with the name that Helm has ascribed to it (e.g. 
'willing-yak') and the contents of `NOTES.txt` rendered to the terminal. To list all available Helm releases and their names use,\n\n```bash\nhelm list\n```\n\nAnd to see the status of all their constituent components (e.g. pods, replication controllers, services, etc.) use for example,\n\n```bash\nhelm status test-ml-app\n```\n\nThe ML scoring service can now be tested in exactly the same way as we have done previously (above). Once you have convinced yourself that it's working as expected, the release can be deleted using,\n\n```bash\nhelm delete test-ml-app\n```\n\n## Using Seldon to Deploy the ML Model Scoring Service to Kubernetes\n\nSeldon's core mission is to simplify the repeated deployment and management of complex ML prediction pipelines on top of Kubernetes. In this demonstration we are going to focus on the simplest possible example - i.e. the simple ML model scoring API we have already been using.\n\n### Building an ML Component for Seldon\n\nTo deploy an ML component using Seldon, we need to create Seldon-compatible Docker images. We start by following [these guidelines](https://docs.seldon.io/projects/seldon-core/en/latest/python/python_wrapping_docker.html) for defining a Python class that wraps an ML model targeted for deployment with Seldon. This is contained within the `seldon-ml-score-component` directory, whose contents are similar to those in `py-flask-ml-score-api`,\n\n```bash\nseldon-ml-score-component/\n | Dockerfile\n | MLScore.py\n | Pipfile\n | Pipfile.lock\n```\n\n#### Building the Docker Image for use with Seldon\n\nSeldon requires that the Docker image for the ML scoring service be structured in a particular way:\n\n- the ML model has to be wrapped in a Python class with a `predict` method with a particular signature (or interface) - for example, in `MLScore.py` (deliberately named after the Python class contained within it) we have,\n\n```python\nclass MLScore:\n    \"\"\"\n    Model template. 
You can load your model parameters in __init__ from\n    a location accessible at runtime\n    \"\"\"\n\n    def __init__(self):\n        \"\"\"\n        Load models and add any initialization parameters (these will\n        be passed at runtime from the graph definition parameters\n        defined in your seldondeployment kubernetes resource manifest).\n        \"\"\"\n        print(\"Initializing\")\n\n    def predict(self, X, features_names):\n        \"\"\"\n        Return a prediction.\n\n        Parameters\n        ----------\n        X : array-like\n        features_names : array of feature names (optional)\n        \"\"\"\n        print(\"Predict called - will run identity function\")\n        return X\n```\n\n- the `seldon-core` Python package must be installed (we use `pipenv` to manage dependencies as discussed above and in the Appendix below); and,\n- the container starts by running the Seldon service using the `seldon-core-microservice` entry-point provided by the `seldon-core` package - both this and the point above can be seen in the `Dockerfile`,\n\n```docker\nFROM python:3.6-slim\nCOPY . /app\nWORKDIR /app\nRUN pip install pipenv\nRUN pipenv install\nEXPOSE 5000\n\n# Define environment variable\nENV MODEL_NAME MLScore\nENV API_TYPE REST\nENV SERVICE_TYPE MODEL\nENV PERSISTENCE 0\n\nCMD pipenv run seldon-core-microservice $MODEL_NAME $API_TYPE --service-type $SERVICE_TYPE --persistence $PERSISTENCE\n```\n\nFor the precise details refer to the [official Seldon documentation](https://docs.seldon.io/projects/seldon-core/en/latest/python/index.html). Next, build this image,\n\n```bash\ndocker build seldon-ml-score-component -t alexioannides/test-ml-score-seldon-api:latest\n```\n\nBefore we push this image to our registry, we need to make sure that it's working as expected. 
Start the image on the local Docker daemon,\n\n```bash\ndocker run --rm -p 5000:5000 -d alexioannides/test-ml-score-seldon-api:latest\n```\n\nAnd then send it a request (using a different request format to the ones we've used thus far),\n\n```bash\ncurl -g http://localhost:5000/predict \\\n    --data-urlencode 'json={\"data\":{\"names\":[\"a\",\"b\"],\"tensor\":{\"shape\":[2,2],\"values\":[0,0,1,1]}}}'\n```\n\nIf the response is as expected (i.e. it contains the same payload as the request), then push the image,\n\n```bash\ndocker push alexioannides/test-ml-score-seldon-api:latest\n```\n\n### Deploying a ML Component with Seldon Core\n\nWe now move on to deploying our Seldon-compatible ML component to a Kubernetes cluster and creating a fault-tolerant and scalable service from it. To achieve this, we will [deploy Seldon-Core using Helm charts](https://docs.seldon.io/projects/seldon-core/en/latest/workflow/install.html). We start by creating a namespace that will contain the `seldon-core-operator`, a custom Kubernetes resource required to deploy any ML model using Seldon,\n\n```bash\nkubectl create namespace seldon-core\n```\n\nThen we deploy Seldon-Core using Helm and the official Seldon Helm chart repository hosted at `https://storage.googleapis.com/seldon-charts`,\n\n```bash\nhelm install seldon-core-operator \\\n  --name seldon-core \\\n  --repo https://storage.googleapis.com/seldon-charts \\\n  --set usageMetrics.enabled=false \\\n  --namespace seldon-core\n```\n\nNext, we deploy the Ambassador API gateway for Kubernetes, which will act as a single point of entry into our Kubernetes cluster and will be able to route requests to any ML model we have deployed using Seldon. 
We will create a dedicated namespace for the Ambassador deployment,\n\n```bash\nkubectl create namespace ambassador\n```\n\nAnd then deploy Ambassador using the most recent charts in the official Helm repository,\n\n```bash\nhelm install stable/ambassador \\\n  --name ambassador \\\n  --set crds.keep=false \\\n  --namespace ambassador\n```\n\nIf we now run `helm list --namespace seldon-core` we should see that Seldon-Core has been deployed and is waiting for Seldon ML components to be deployed. To deploy our Seldon ML model scoring service we create a separate namespace for it,\n\n```bash\nkubectl create namespace test-ml-seldon-app\n```\n\nAnd then configure and deploy another official Seldon Helm chart as follows,\n\n```bash\nhelm install seldon-single-model \\\n  --name test-ml-seldon-app \\\n  --repo https://storage.googleapis.com/seldon-charts \\\n  --set model.image.name=alexioannides/test-ml-score-seldon-api:latest \\\n  --namespace test-ml-seldon-app\n```\n\nNote that multiple ML models can now be deployed using Seldon by repeating the last two steps and they will all be automatically reachable via the same Ambassador API gateway, which we will now use to test our Seldon ML model scoring service.\n\n### Testing the API via the Ambassador Gateway API\n\nTo test the Seldon-based ML model scoring service, we follow the same general approach as we did for our first-principles Kubernetes deployments above, but we will route our requests via the Ambassador API gateway. To find the IP address for the Ambassador service, run,\n\n```bash\nkubectl -n ambassador get service ambassador\n```\n\nWhich will be `localhost:80` if using Docker Desktop, or an IP address if running on GCP or Minikube (where you will need to remember to use `minikube service list` in the latter case). 
Now test the prediction end-point - for example,\n\n```bash\ncurl http://35.246.28.247:80/seldon/test-ml-seldon-app/test-ml-seldon-app/api/v0.1/predictions \\\n    --request POST \\\n    --header \"Content-Type: application/json\" \\\n    --data '{\"data\":{\"names\":[\"a\",\"b\"],\"tensor\":{\"shape\":[2,2],\"values\":[0,0,1,1]}}}'\n```\n\nIf you want to understand the full logic behind the routing see the [Seldon documentation](https://docs.seldon.io/projects/seldon-core/en/latest/workflow/serving.html), but the URL is essentially assembled using,\n\n```html\nhttp://\u003cambassadorEndpoint\u003e/seldon/\u003cnamespace\u003e/\u003cdeploymentName\u003e/api/v0.1/predictions\n```\n\nIf your request has been successful, then you should see a response along the lines of,\n\n```json\n{\n  \"meta\": {\n    \"puid\": \"hsu0j9c39a4avmeonhj2ugllh9\",\n    \"tags\": {\n    },\n    \"routing\": {\n    },\n    \"requestPath\": {\n      \"classifier\": \"alexioannides/test-ml-score-seldon-api:latest\"\n    },\n    \"metrics\": []\n  },\n  \"data\": {\n    \"names\": [\"t:0\", \"t:1\"],\n    \"tensor\": {\n      \"shape\": [2, 2],\n      \"values\": [0.0, 0.0, 1.0, 1.0]\n    }\n  }\n}\n```\n\n## Tear Down\n\nTo delete a single Seldon ML model and its namespace, deployed using the steps above, run,\n\n```bash\nhelm delete test-ml-seldon-app --purge \u0026\u0026\n  kubectl delete namespace test-ml-seldon-app\n```\n\nFollow the same pattern to remove the Seldon Core Operator and Ambassador,\n\n```bash\nhelm delete seldon-core --purge \u0026\u0026 kubectl delete namespace seldon-core\nhelm delete ambassador --purge \u0026\u0026 kubectl delete namespace ambassador\n```\n\nIf there is a GCP cluster that needs to be killed run,\n\n```bash\ngcloud container clusters delete k8s-test-cluster\n```\n\nAnd likewise if working with Minikube,\n\n```bash\nminikube stop\nminikube delete\n```\n\nIf running on Docker Desktop, navigate to `Preferences -\u003e Reset` to reset the cluster.\n\n## 
Where to go from Here\n\nThe following list of resources will help you dive deeply into the subjects we skimmed over above:\n\n- the full set of functionality provided by [Seldon](https://www.seldon.io/open-source/);\n- running multi-stage containerised workflows (e.g. for data engineering and model training) using [Argo Workflows](https://argoproj.github.io/argo);\n- the excellent '_Kubernetes in Action_' by Marko Lukša [available from Manning Publications](https://www.manning.com/books/kubernetes-in-action);\n- '_Docker in Action_' by Jeff Nickoloff and Stephen Kuenzli [also available from Manning Publications](https://www.manning.com/books/docker-in-action-second-edition); and,\n- '_Flask Web Development_' by Miguel Grinberg [available from O'Reilly](http://shop.oreilly.com/product/0636920089056.do).\n\nThis work was initially committed in 2018 and has since formed the basis of [Bodywork](https://www.bodyworkml.com) - an open-source MLOps tool for deploying machine learning projects developed in Python, to Kubernetes.\n\n## Appendix - Using Pipenv for Managing Python Package Dependencies\n\nWe use [pipenv](https://docs.pipenv.org) for managing project dependencies and Python environments (i.e. virtual environments). All of the direct package dependencies required to run the code (e.g. Flask or Seldon-Core), as well as any packages that could have been used during development (e.g. flake8 for code linting and IPython for interactive console sessions), are described in the `Pipfile`. Their **precise** downstream dependencies are described in `Pipfile.lock`.\n\n### Installing Pipenv\n\nTo get started with Pipenv, first of all download it - assuming that there is a global version of Python available on your system and on the PATH, then this can be achieved by running the following command,\n\n```bash\npip3 install pipenv\n```\n\nPipenv is also available to install from many non-Python package managers. 
For example, on OS X it can be installed using the [Homebrew](https://brew.sh) package manager, with the following terminal command,\n\n```bash\nbrew install pipenv\n```\n\nFor more information, including advanced configuration options, see the [official pipenv documentation](https://docs.pipenv.org).\n\n### Installing Projects Dependencies\n\nIf you want to experiment with the Python code in the `py-flask-ml-score-api` or `seldon-ml-score-component` directories, then make sure that you're in the appropriate directory and then run,\n\n```bash\npipenv install\n```\n\nThis will install all of the direct project dependencies.\n\n### Running Python, IPython and JupyterLab from the Project's Virtual Environment\n\nIn order to continue development in a Python environment that precisely mimics the one the project was initially developed with, use Pipenv from the command line as follows,\n\n```bash\npipenv run python3\n```\n\nThe `python3` command could just as well be `seldon-core-microservice` or any other entry-point provided by the `seldon-core` package - for example, in the `Dockerfile` for the `seldon-ml-score-component` we start the Seldon-based ML model scoring service using,\n\n```bash\npipenv run seldon-core-microservice ...\n```\n\n### Pipenv Shells\n\nPrepending `pipenv run` to every command you want to run within the context of your Pipenv-managed virtual environment can get very tedious. This can be avoided by entering into a Pipenv-managed shell,\n\n```bash\npipenv shell\n```\n\nwhich is equivalent to 'activating' the virtual environment. Any command will now be executed within the virtual environment. Use `exit` to leave the shell session.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexioannides%2Fkubernetes-mlops","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falexioannides%2Fkubernetes-mlops","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexioannides%2Fkubernetes-mlops/lists"}