{"id":14958816,"url":"https://github.com/google-aai/tf-serving-k8s-tutorial","last_synced_at":"2025-04-13T09:05:13.164Z","repository":{"id":67595848,"uuid":"122511921","full_name":"google-aai/tf-serving-k8s-tutorial","owner":"google-aai","description":"A Tutorial for Serving Tensorflow Models using Kubernetes","archived":false,"fork":false,"pushed_at":"2025-03-11T16:22:29.000Z","size":2278,"stargazers_count":87,"open_issues_count":4,"forks_count":29,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-03-27T00:34:03.253Z","etag":null,"topics":["docker-image","google-cloud","keras-tensorflow","kubeflow","kubeflow-community","kubernetes","tensorflow-api","tensorflow-serving","tensorflow-tutorials"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/google-aai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-02-22T17:32:24.000Z","updated_at":"2024-06-26T03:54:54.000Z","dependencies_parsed_at":"2024-09-22T08:00:53.413Z","dependency_job_id":null,"html_url":"https://github.com/google-aai/tf-serving-k8s-tutorial","commit_stats":{"total_commits":33,"total_committers":2,"mean_commits":16.5,"dds":0.06060606060606055,"last_synced_commit":"058110c0c88757c0d81dd19f435f3f47c171a537"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-aai%2Ftf-serving-k8s-tutorial","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/google-aai%2Ftf-serving-k8s-tutorial/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-aai%2Ftf-serving-k8s-tutorial/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-aai%2Ftf-serving-k8s-tutorial/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/google-aai","download_url":"https://codeload.github.com/google-aai/tf-serving-k8s-tutorial/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248688576,"owners_count":21145766,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker-image","google-cloud","keras-tensorflow","kubeflow","kubeflow-community","kubernetes","tensorflow-api","tensorflow-serving","tensorflow-tutorials"],"created_at":"2024-09-24T13:18:20.622Z","updated_at":"2025-04-13T09:05:13.144Z","avatar_url":"https://github.com/google-aai.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Deploying Tensorflow Serving using Kubernetes Tutorial\n\n### What this project covers\nThis project is a tutorial that walks through the steps required to deploy and\n[serve a TensorFlow model](https://www.tensorflow.org/serving/serving_basic)\nusing [Kubernetes (K8s)](https://kubernetes.io/). 
The latest version runs the\nentire exercise on a K8s cluster using [KubeFlow](https://kubeflow.org).\n\nThe key concepts covered in this tutorial are as follows:\n\n* Convert a TensorFlow (TF) graph trained through various APIs into a servable\nmodel\n* Serve the model using TensorFlow Serving\n* Send online prediction requests to the cluster via a client. Profile latency\nand throughput. Experiment with different batch sizes.\n* Visualize image pixel contributions to model predictions to explain how the\nmodel sees the world.\n\n## Setup\n\nBefore the fun begins, we need to set up our environment and K8s clusters.\n\nAs a developer, it often helps to create a local deployment prior to deploying\non the cloud. We offer some guides on setting up a local K8s cluster, as well\nas setting up a cluster on Google Cloud Platform (GCP) below. You can also\nexperiment with this tutorial on [Amazon EKS](https://aws.amazon.com/eks/) or\n[Microsoft AKS](https://azure.microsoft.com/en-us/services/kubernetes-service/).\nFeedback is welcome!\n\n* [Local: Minikube](LOCAL_SETUP.md)\n* [Cloud: Google K8s Engine](GKE_SETUP.md)\n\n### Additional Required Software\n\n* [Python 2.7+](https://www.python.org/): The TensorFlow Serving API currently\nruns in Python 2, so you will need to make client requests to your model server\nusing Python 2.\n* [Docker](https://www.docker.com/): to build images that can be deployed on K8s\n* [Git](https://git-scm.com/): so you can access this project and other projects\n* [Pip](https://pip.pypa.io/en/stable/installing/): to install Python packages\nrequired for the tutorial\n* [Ksonnet](https://github.com/ksonnet/ksonnet): follow the install instructions\non GitHub\n\n## Deploy KubeFlow onto your cluster\n\n[KubeFlow](https://kubeflow.org) is under active development. For this\nexercise, we will be using the stable tag v0.2.5. 
Run the following command to deploy\nKubeFlow onto your Kubernetes cluster:\n\n```\ncd ~\ncurl https://raw.githubusercontent.com/kubeflow/kubeflow/v0.2.5/scripts/deploy.sh | bash\n```\n\nNext, run the following command to check on your pods and services. The services\nshould be up almost immediately, but pods may take longer to create.\n\n```\nkubectl get svc\n```\n\n```\nkubectl get pods\n```\n\nOnce your pods are all up and running, continue.\n\n**Troubleshooting:** see [here](TROUBLESHOOTING.md#kubeflow-install).\n\n## Accessing JupyterHub and Spinning up a Notebook\n\n![notebook everywhere](img/notebook_everywhere.png)\n\nJupyterHub is a popular K8s notebook spawner that can spin up Jupyter\nservers on demand for an entire team of data scientists, each server having its\nisolated set of customized resources (disk, memory, accelerators).\nWhen a data scientist is finally done training and fine-tuning a model for\nproduction, it is important to be able to export this model for serving. The\nfollowing exercises will be running through how to export a model from a local\nJupyter server into a location that K8s can access and serve using the\nTensorFlow Serving app.\n\nTo access your Jupyter notebook environment, you will need to use port forwarding\nfrom your local computer. \n\n```\nJUPYTER_POD=`kubectl get pods --selector=\"app=tf-hub\" --output=template --template=\"{{with index .items 0}}{{.metadata.name}}{{end}}\"`\nkubectl port-forward ${JUPYTER_POD} 8000:8000\n```\n\nGo to your browser and access JupyterHub here:\n\n```\nlocalhost:8000\n```\n\nYou should see a login screen like this:\n![jupyter login](img/jupyterhub_login.png)\n\nType in any username and password to spin up a local environment (Note: if the\nusername does not exist, a new Jupyter server will be created for that user\nwith its own isolated resources. 
If the username already exists, the password\nwill be required to match the one used to create the server, and you will end up\nin the previously created server environment.)\n\nThe next page, if you are creating a new environment, should have a drop down\nmenu with images, and separate CPU, memory, and extra resource text boxes.\nIn the dropdown menu, select the TensorFlow 1.8 CPU image:\n\n![jupyterhub tf 1.8 cpu](img/jupyterhub_kubeflow_img.png)\n\nSet CPU and memory to whatever your cluster can handle. 1 CPU and 2Gi memory is\nsufficient, but if you are on a cluster and can afford more resources, more CPU\nand memory will allow you to spin up and run notebooks faster without having to\nshut down notebook kernels. (The minikube setup used above will probably only\nallow 1 CPU for Jupyter without interfering with the model serving process.)\n\nThe notebook server may take a few minutes to spin up depending on cluster\nresources, so be patient. Once it has spun up, go into the work directory,\nand upload the local notebook `./jupyter/download_models_and_notebooks.ipynb`\ninto this directory. In your notebook, run all the cells to download the\nResnet50 models, project notebooks, and library dependencies required for the\nnext part of this exercise.\n\n## Create a Servable Resnet Model from Estimator and Keras APIs\n\n### Motivation\n\nYour data scientists have used JupyterHub to scale out training and exploration\nof datasets, and have finally built a few lovely models that meet performance\nrequirements. It is now time to serve models live in production! The first step\nis to take their trained models, at whatever checkpoint they deem most optimal,\nand package it for optimal serving performance using\n[TensorFlow Serving](https://www.tensorflow.org/serving/).\n\nAs a data scientist or production engineer, it helps to understand how to bridge\nthe gap between training and serving. 
The following exercises will guide you on\ndoing so.\n\n### Exercises\n\n[Run the exercises here](SERVABLE_MODEL.md) (solutions included).\n\n\n## Serving Your Model\n\nNow that you've packaged your data scientists' models into a servable format,\nyou're faced with an issue: JupyterHub by default, keeps each Jupyter\nenvironment and resources separate from all other environments (for good\nreason). In K8s, this works as follows:\n\nJupyterHub creates a\n[persistent volume (pv)](https://kubernetes.io/docs/concepts/storage/persistent-volumes/)\nand attaches a\n[persistent volume claim (pvc)](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims)\nto that volume which only allows the user who spawned the notebook to access\nthat volume. (To be more precise, the pv and pvc are attached to the \n[pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/) that is used to\nhold the user's requested cluster resources and isolate the user's work\nenvironment). Unfortunately, no other users/pods are allowed to access this\nenvironment without setting permissions without reconfiguring the volume to\naccept multiple pods to read and write.\n\nFortunately, we have two options to handle this: by changing options for PVs\n(general solution that works on most if not all clusters), or by serving the\nmodel from a cloud storage system (s3, gs, etc.). The steps in the below\nsection will guide you through the process for serving on prem or on cloud.\n\n### Setting up TF Serving for your Model\n\nTo deploy TensorFlow Serving with KubeFlow, we will need a number of steps.\nFirst cd into your `kubeflow-ks-app` directory, and generate a tf-serving\ntemplate fron the KubeFlow repository.\n\n```\nks generate tf-serving my-tf-serving\n```\n\n`tf-serving` is the KubeFlow template name, and `my-tf-serving` is the\napp name (which you can name whatever you want).\n\nNext, we will need to configure the app using [ksonnet](ksonnet.io). 
Choose one of the\ntwo options below depending on where you are running this exercise:\n\n[Follow the instructions here](SERVING_ON_PREM.md) for a general TensorFlow\nServing deployment using KubeFlow for local (minikube) and on-prem clusters.\n\n[Follow the instructions here](SERVING_ON_GKE.md) for serving on the Google\nCloud Platform using GKE. Note that the on-prem instructions above also work\nfor GKE, but serving your model from cloud storage offers flexibility to\nsecurely share your models with other users, projects, and k8s clusters, in case\nthere is a need to scale out to other environments.\n\n## TF Client\n\n### Setup\n\n**Note:** If you choose to use Google Cloud to host a client, you can spin up\nthe VM using [Compute Engine](https://cloud.google.com/compute/), ssh into it,\nand run a [setup script](gcp/setup_client_vm.sh) to deploy all required\nlibraries. You can then skip the rest of the setup steps\nbelow. However, you may want to change your `my-tf-serving` service type to\n`LoadBalancer` to ensure that your VM can reach it, since you are not running\nkubectl from the VM. 
For simplicity and as a best practice, however, we recommend\nrunning your client locally if possible.\n\nThe steps below are for setting up a client locally.\n\nSet up port forwarding on port 9000 to enable secure access to your model server\nvia a local port:\n\n```\nSERVING_POD=`kubectl get pods --selector=\"app=my-tf-serving\" --output=template --template=\"{{with index .items 0}}{{.metadata.name}}{{end}}\"`\nkubectl port-forward ${SERVING_POD} 9000:9000\n```\n\nSince your port forwarding must be kept alive, open up a new terminal shell to\ncontinue.\n\nFirst, create a *Python 2* virtual environment.\n\n```\ndeactivate  # Run only if you need to exit the Python 3 virtual environment.\nvirtualenv \u003cpath/to/client-virtualenv\u003e\nsource \u003cpath/to/client-virtualenv\u003e/bin/activate\n```\n\ncd into the tutorial project directory (`tf-serving-k8s-tutorial`), and run:\n\n```\npip install -r client_requirements.txt\n```\n\nThen cd into the client directory within the tutorial project:\n\n```\ncd client\n```\n\nTo test the client, make sure the `MODEL_TYPE` shell variable is set to the type\nof model you generated. This is because the default pre-trained models for\nEstimator and Keras APIs have classes that are labeled slightly differently,\nso the client needs to interpret the results that it receives from the models.\nIf the `MODEL_TYPE` variable is unset, set it again:\n\n```\nMODEL_TYPE=\u003cestimator | keras\u003e  # choose one\n```\n\nThen enter the command (note that 127.0.0.1 is used below assuming\nyou have set up port forwarding with kubectl):\n\n```\npython resnet_client.py \\\n--server 127.0.0.1 \\\n--port 9000 \\\n--model_type ${MODEL_TYPE} \\\ncat_sample.jpg\n```\n\nThe server should return the top 5 classes along with the confidence\n(probability) that the image belongs to each class.\n\nYou can also batch predict by specifying multiple image paths/URLs. 
For example:\n\n```\npython resnet_client.py \\\n--server 127.0.0.1 \\\n--port 9000 \\\n--model_type ${MODEL_TYPE} \\\ncat_sample.jpg \\\n\"https://www.popsci.com/sites/popsci.com/files/styles/1000_1x_/public/images/2017/09/depositphotos_33210141_original.jpg?itok=MLFznqbL\u0026fc=50,50\"\n```\n\nwill do batch prediction on a local cat image and a squirrel image online.\n\nCongratulations!\n\n![you just got served](img/you_got_served.png)\n\n### Profiling Latency\n\n**Exercise:** Look at the profiler [resnet_profiler.py](client/resnet_profiler.py).\nTry sending requests with the profiler and compute latency and throughput.\nLatency is simply the round-trip delay returned by the profiler. An\napproximation of throughput is the number of images in a batch divided by\nlatency when a large batch size is used. Try varying batch sizes between 1 and 256.\nHere is an example configuration:\n\n```\npython resnet_profiler.py \\\n--server 127.0.0.1 \\\n--port 9000 \\\n--model_type ${MODEL_TYPE} \\\n--replications 16 \\\n--num_trials 10 \\\ncat_sample.jpg\n```\n\nWhat do you notice about CPU/GPU performance with different batch sizes?\n\n**Remark:** Profiling is a very important step when you are trying to set up a\nrobust server. GPUs are great performers, but stop providing gains after a\ncertain batch size. Furthermore, servers can run out of memory, in which case TF\nServing often crashes silently, i.e. `kubectl logs \u003cpod\u003e` won't return anything\nuseful. When you deploy your own Kubernetes system, you will need to ensure that\nyour machine can load your model and process requested batch sizes.\n\n## Model Understanding and Visualization\n\nAs a bonus feature, we offer ways to validate a served model through\nvisualization! Run this notebook:\n\n```\nresnet_model_understanding.ipynb\n```\n\nThe notebook runs a visualization of pixels that are important in determining\nthat the image belongs to a particular class. 
More specifically, the visible\npixels correspond to the highest partial derivatives of the logit of the most\nprobable class with respect to each pixel, integrated over a path of\nimages from a blank image (e.g. all grey pixels) to the actual image.\nThe visualization is based on a research paper by M. Sundararajan,\nA. Taly, and Q. Yan: [Axiomatic Attribution for Deep Networks](https://arxiv.org/pdf/1703.01365.pdf).\n\n## General Disclaimers and Pitfalls\n\n* TensorFlow Serving is written in C++, so any TensorFlow code in your\nmodel that has Python libraries embedded in it (e.g. using `tf.py_func()`) will\nFAIL! Make sure that the entire TensorFlow graph being imported runs\nin the TensorFlow C++ environment.\n* Security is an issue! If you decide not to use port forwarding and instead\nreconfigure your `my-tf-serving` k8s service to `LoadBalancer` on a cloud k8s\nengine, make sure to enable identity-aware access control for your K8s cluster,\nsuch as Google Cloud IAP.\n[See the Kubeflow IAP documentation](https://github.com/kubeflow/kubeflow/blob/master/docs/gke/iap.md)\nfor more information.\n\n## Additional KubeFlow Resources\n\nTo better appreciate KubeFlow, it is important to have a firm grasp of\n[k8s](https://kubernetes.io/), and a good understanding of how to use\n[ksonnet](https://ksonnet.io) to create apps and deployments on k8s.\n\nFor more KubeFlow examples, go to [kubeflow.org](https://kubeflow.org). Also,\n[join the community](https://github.com/kubeflow/community) for discussions,\nupdates, or ways to help contribute and improve the project.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-aai%2Ftf-serving-k8s-tutorial","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgoogle-aai%2Ftf-serving-k8s-tutorial","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-aai%2Ftf-serving-k8s-tutorial/lists"}