{"id":13599221,"url":"https://github.com/mrdbourke/cs329s-ml-deployment-tutorial","last_synced_at":"2025-04-04T20:15:38.787Z","repository":{"id":49325041,"uuid":"337292051","full_name":"mrdbourke/cs329s-ml-deployment-tutorial","owner":"mrdbourke","description":"Code and files to go along with CS329s machine learning model deployment tutorial.","archived":false,"fork":false,"pushed_at":"2022-11-12T18:37:21.000Z","size":70376,"stargazers_count":605,"open_issues_count":5,"forks_count":184,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-03-28T19:12:23.546Z","etag":null,"topics":["cloud-services","deployment-tutorial","food-vision","google-cloud","machine-learning","machine-learning-deployment","machine-learning-tutorial","tensorflow"],"latest_commit_sha":null,"homepage":"https://youtu.be/fw6NMQrYc6w","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mrdbourke.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-02-09T04:29:26.000Z","updated_at":"2025-03-17T14:29:03.000Z","dependencies_parsed_at":"2022-08-29T04:40:28.697Z","dependency_job_id":null,"html_url":"https://github.com/mrdbourke/cs329s-ml-deployment-tutorial","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrdbourke%2Fcs329s-ml-deployment-tutorial","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrdbourke%2Fcs329s-ml-deployment-tutorial/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrdbourke%2Fcs329s-ml-deployment-tutorial/releases","manif
ests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrdbourke%2Fcs329s-ml-deployment-tutorial/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mrdbourke","download_url":"https://codeload.github.com/mrdbourke/cs329s-ml-deployment-tutorial/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247242681,"owners_count":20907134,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cloud-services","deployment-tutorial","food-vision","google-cloud","machine-learning","machine-learning-deployment","machine-learning-tutorial","tensorflow"],"created_at":"2024-08-01T17:01:01.025Z","updated_at":"2025-04-04T20:15:33.780Z","avatar_url":"https://github.com/mrdbourke.png","language":"Jupyter Notebook","readme":"# CS329s Machine Learning Model Deployment Tutorial\n\n**Warning:** Following the steps of what's in here may cost you money (Google Cloud is a paid service), be sure to shut down any Google Cloud service you no longer need to use to avoid charges.\n\n**Thank you to:** [Mark Douthwaite's incredible ML + software engineering blog](https://mark.douthwaite.io/), [Lj Miranda's amazing post on software engineering tools for data scientists](https://ljvmiranda921.github.io/notebook/2020/11/15/data-science-swe/), [Chip Huyen](https://huyenchip.com/) and Ashik Shafi's gracious feedback on the raw materials of this tutorial.\n\n## What is in here?\n\nCode and files to go along with [CS329s machine learning model deployment tutorial](https://stanford-cs329s.github.io/syllabus.html).\n\n* Watch 
the [video tutorial on YouTube](https://youtu.be/fw6NMQrYc6w)\n* See the [slides](https://github.com/mrdbourke/cs329s-ml-deployment-tutorial/blob/main/CS329s-deploying-ml-models-tutorial.pdf)\n* Get the [model training code](https://github.com/mrdbourke/cs329s-ml-deployment-tutorial/blob/main/model_training.ipynb)\n\n## What do I need to get started?\n\n* A [Google Cloud account](https://cloud.google.com/gcp) and a [Google Cloud Project](https://cloud.google.com/resource-manager/docs/creating-managing-projects)\n* [Google Cloud SDK installed](https://cloud.google.com/sdk/docs/install) (gcloud CLI utility)\n* Trained [machine learning model(s)](https://github.com/mrdbourke/cs329s-ml-deployment-tutorial/blob/main/model_training.ipynb), our app uses an image classification model trained on a number of different classes of food from the [Food101 dataset](https://www.kaggle.com/dansbecker/food-101)\n* [Docker installed](https://docs.docker.com/get-docker/)\n\n**Warning (again):** Using Google Cloud services costs money. If you don't have credits (you get $300 USD when you first sign up), you will be charged. Delete and shut down your work when finished to avoid charges.\n\n## What will I end up with?\n\nIf you go through the steps below without fail, you should end up with a [Streamlit](http://streamlit.io/)-powered web application (Food Vision 🍔👁) for classifying images of food (deployed on Google Cloud if you want).\n\nOur app running locally making a prediction on an image of ice cream (using a machine learning model deployed on Google Cloud):\n![food vision demo](https://github.com/mrdbourke/cs329s-ml-deployment-tutorial/raw/main/images/food-vision-demo-cropped.gif)\n\n## Okay, I'm in, how can I use it?\n\nWe're going to tackle this in 3 parts:\n1. Getting the app running (running Streamlit on our local machines)\n2. Deploying a machine learning model to AI Platform (getting Google Cloud to host one of our models)\n3. 
Deploying our app to App Engine (getting our app on the internet)\n\n### 1. Getting the app running\n\n1. Clone this repo\n```\ngit clone https://github.com/mrdbourke/cs329s-ml-deployment-tutorial\n```\n\n2. Change into the `food-vision` directory\n```\ncd cs329s-ml-deployment-tutorial/food-vision\n```\n\n3. Create and activate a virtual environment (call it what you want, I called mine \"env\")\n```\npip install virtualenv\nvirtualenv \u003cENV-NAME\u003e\nsource \u003cENV-NAME\u003e/bin/activate\n```\n4. Install the required dependencies (Streamlit, TensorFlow, etc)\n```\npip install -r requirements.txt\n```\n5. Activate Streamlit and run `app.py`\n```\nstreamlit run app.py\n``` \nRunning the above command should result in you seeing the following:\n![](https://raw.githubusercontent.com/mrdbourke/cs329s-ml-deployment-tutorial/main/images/streamlit-app-what-you-should-see.png)\n\nThis is Food Vision 🍔👁, the app we're making.\n\n6. Try uploading an image (e.g. one of the ones in [`food-images/`](https://github.com/mrdbourke/cs329s-ml-deployment-tutorial/tree/main/food-images) such as [`ice_cream.jpeg`](https://github.com/mrdbourke/cs329s-ml-deployment-tutorial/blob/main/food-images/ice_cream.jpeg)) and it should load.\n\n7. Notice a \"Predict\" button appears when you upload an image to the app, click it and see what happens.\n\n8. The app breaks because it tries to contact Google Cloud Platform (GCP) looking for a machine learning model and it either:\n * won't be able to find the model (wrong API call or the model doesn't exist)\n * won't be able to use the existing model because the credentials are wrong (seen below)\n![credential error](https://raw.githubusercontent.com/mrdbourke/cs329s-ml-deployment-tutorial/main/images/streamlit-app-first-error-youll-run-into.png)\n \nThis is a good thing! It means our app is trying to contact GCP (using functions in `food-vision/app.py` and `food-vision/utils.py`). \n\nNow let's learn how to get a model hosted on GCP.\n\n### 2. 
Getting a machine learning model hosted on GCP\n \n\u003e How do I fix this error? (Streamlit can't access your model) \n\nTo fix it, we're going to need a few things:\n* A trained machine learning model (suited to our problem, we'll be uploading this to Google Storage)\n* A Google Storage bucket (to store our trained model)\n* A hosted model on Google AI Platform (we'll connect the model in our Google Storage bucket to here)\n* A service key to access our hosted model on Google AI Platform\n\nLet's see how we can get the above.\n\n1. To train a machine learning model and save it in the [`SavedModel`](https://www.tensorflow.org/guide/saved_model) format (this is TensorFlow specific, do what you need for PyTorch), we can follow the steps in [`model_training.ipynb`](https://github.com/mrdbourke/cs329s-ml-deployment-tutorial/blob/main/model_training.ipynb).\n\n2. Once we've got a `SavedModel`, we'll upload it to Google Storage but before we do that, we'll need to [create a Google Storage Bucket](https://cloud.google.com/storage/docs/creating-buckets) (a bucket is like a hard drive on the cloud).\n\n![creating a bucket on google cloud](https://raw.githubusercontent.com/mrdbourke/cs329s-ml-deployment-tutorial/main/images/gcp-creating-a-bucket.png)\n\nCall your bucket whatever you like (e.g. my_cool_bucket_name). You'll want to store your data in a region which is either closest to you or wherever you're allowed to store data (if this doesn't make sense, store it in the US).\n\n3. 
With a bucket created, we can [copy our model to the bucket](https://cloud.google.com/storage/docs/uploading-objects#gsutil).\n```\n## Uploading a model to Google Storage from within Colab ##\n\n# Authorize Colab and initialize gcloud (enter the appropriate inputs when asked)\nfrom google.colab import auth\nauth.authenticate_user()\n!curl https://sdk.cloud.google.com | bash\n!gcloud init\n\n# Upload SavedModel to Google Storage Bucket\n!gsutil cp -r \u003cYOUR_MODEL_PATH\u003e \u003cYOUR_GOOGLE_STORAGE_BUCKET\u003e\n```\n\n4. [Connect the model in the bucket to AI Platform](https://cloud.google.com/ai-platform/prediction/docs/deploying-models) (this'll make our model accessible via an API call, if you're not sure what an API call is, imagine writing a function that could trigger our model from anywhere on the internet)\n * Don't like clicking around Google Cloud's console? You can also [use `gcloud` to create a model in AI Platform](https://cloud.google.com/sdk/gcloud/reference/ai-platform/models/create) on the command line \n* Create a model on AI Platform (choose a region which is closest to you or where you'd like your model to be accessed from):\n![creating a model on AI Platform](https://raw.githubusercontent.com/mrdbourke/cs329s-ml-deployment-tutorial/main/images/gcp-creating-a-model-on-ai-platform.png)\n* Once you've got a model on AI Platform (above), you'll need to create a model version which matches up with what your model was trained with (e.g. choose TensorFlow if your model is trained with TensorFlow):\n![creating a model version on AI Platform](https://raw.githubusercontent.com/mrdbourke/cs329s-ml-deployment-tutorial/main/images/gcp-creating-a-model-version.png)\n* And then link your model version to your trained model in Google Storage:\n![linking a model version to Google Storage](https://raw.githubusercontent.com/mrdbourke/cs329s-ml-deployment-tutorial/main/images/gcp-connecting-a-model-version-to-google-storage.png)\n\n5. 
Create a [service account to access AI Platform](https://cloud.google.com/iam/docs/creating-managing-service-accounts) (GCP loves permissions, it's for the security of your app)\n * You'll want to make a service account with permissions to use the \"ML Engine Developer\" role\n\n![ml developer role permission](https://raw.githubusercontent.com/mrdbourke/cs329s-ml-deployment-tutorial/main/images/gcp-ml-engine-permissions.png)\n\n6. Once you've got an active service account, [create and download its key](https://cloud.google.com/iam/docs/creating-managing-service-account-keys) (this will come in the form of a .JSON file)\n * 🔑 **Note:** Service keys grant access to your GCP account, keep this file private (e.g. add `*.json` to your `.gitignore` so you don't accidentally add it to GitHub)\n\n7. Update the following variables:\n * In `app.py`, change the existing GCP key path to your key path:\n```\n# Google Cloud Services look for these when your app runs\n\n# Old\nos.environ[\"GOOGLE_APPLICATION_CREDENTIALS\"] = \"daniels-dl-playground-4edbcb2e6e37.json\"\n\n# New \nos.environ[\"GOOGLE_APPLICATION_CREDENTIALS\"] = \"\u003cPATH_TO_YOUR_KEY\u003e\"\n```\n * In `app.py`, change the GCP project and region to your GCP project and region\n```\n# Old\nPROJECT = \"daniels-dl-playground\"\nREGION = \"us-central1\" \n\n# New\nPROJECT = \"\u003cYOUR_GCP_PROJECT_NAME\u003e\"\nREGION = \"\u003cYOUR_GCP_REGION\u003e\"\n```\n * In `utils.py`, change the `\"model_name\"` key of `\"model_1\"` to your model name:\n ```\n # Old\n classes_and_models = {\n    \"model_1\": {\n        \"classes\": base_classes,\n        \"model_name\": \"efficientnet_model_1_10_classes\" \n    }\n }\n \n # New\n classes_and_models = {\n    \"model_1\": {\n        \"classes\": base_classes,\n        \"model_name\": \"\u003cYOUR_AI_PLATFORM_MODEL_NAME\u003e\" \n    }\n }\n```\n\n8. 
Retry the app to see if it works (refresh the Streamlit app by pressing R or refreshing the page and then reupload an image and click \"Predict\")\n\n![what you'll see when you click the predict button and your model is hosted correctly](https://raw.githubusercontent.com/mrdbourke/cs329s-ml-deployment-tutorial/main/images/streamlit-predict-button-clicked.png)\n  \n### 3. Deploying the whole app to GCP\n\n\u003e Okay, I've fixed the permissions error, how do I deploy my model/app?\n \nI'm glad you asked...\n \n1. Run `make gcloud-deploy`... wait 5-10 mins and your app will be on App Engine (as long as you've activated the App Engine API)\n\n...and you're done\n \n\u003e But wait, what happens when you run `make gcloud-deploy`?\n\nWhen you run `make gcloud-deploy`, the `gcloud-deploy` command within the Makefile ([`food-vision/Makefile`](https://github.com/mrdbourke/cs329s-ml-deployment-tutorial/blob/main/food-vision/Makefile)) gets triggered. \n\n`make gcloud-deploy` is actually an alias for running:\n\n```\ngcloud app deploy app.yaml\n```\n\nThis is `gcloud`'s way of saying \"Hey, Google Cloud, kick off the steps you need to do to get our locally running app (`food-vision/app.py`) running on App Engine.\"\n\nTo do this, the `gcloud app deploy` command does a number of things:\n* Our app is put into a [Docker container](https://www.docker.com/resources/what-container) defined by [`food-vision/Dockerfile`](https://github.com/mrdbourke/cs329s-ml-deployment-tutorial/blob/main/food-vision/Dockerfile) (imagine a Docker container as a box which contains our locally running app and everything it needs to run, once it's in the box, the box can be run anywhere Docker is available and it should work and the Dockerfile defines how the container should be created).\n* Once the Docker container is created, it becomes a Docker image (confusing, I know but think of a Docker image as an immutable Docker container, e.g. 
it won't change when we move it somewhere).\n* The Docker image is then uploaded to [Google Container Registry (GCR)](https://cloud.google.com/container-registry), Google's place for hosting Docker images.\n* Once our Docker image is hosted on GCR, it gets deployed to an App Engine instance (think a computer just like ours but running online, where other people can access it).\n* The App Engine instance is defined by the instructions in [`food-vision/app.yaml`](https://github.com/mrdbourke/cs329s-ml-deployment-tutorial/blob/main/food-vision/app.yaml), if you check out this file you'll notice it's quite simple, it has two lines:\n```\nruntime: custom # we want to run our own custom Docker container\nenv: flex # we want our App Engine to be flexible and install our various dependencies (in requirements.txt)\n```\n\nSeems like a lot, right?\n\nAnd it is, but once you've had a little practice with each, you'll start to realise there's a specific reason behind each of them.\n\nIf all the steps executed correctly, you should see your app running live on App Engine under a URL similar to:\n\n```\nhttp://\u003cYOUR_PROJECT_NAME\u003e.ue.r.appspot.com/\n```\n\nWhich should look exactly like our app running locally!\n\n![our streamlit app running on App Engine](https://raw.githubusercontent.com/mrdbourke/cs329s-ml-deployment-tutorial/main/images/streamlit-app-on-app-engine.png)\n \n## Breaking down `food-vision`\n\n\u003e What do all the files in `food-vision` do?\n\nThere's a bunch of files in our [`food-vision` directory](https://github.com/mrdbourke/cs329s-ml-deployment-tutorial/tree/main/food-vision) and seeing them for the first time can be confusing. 
So here's a quick one-liner for each.\n\n* `.dockerignore` - files/folders to ignore when our Docker container is being created (similar to how `.gitignore` tells Git what files/folders to ignore when committing).\n* `Dockerfile` - instructions for how our Docker container (a box with all of what our app needs to run) should be created.\n* `Makefile` - a handy script for executing commands like `make gcloud-deploy` on the command line which run larger commands (this saves us typing large commands all the time, see [What is a Makefile?](https://www.google.com/search?client=safari\u0026rls=en\u0026q=what+is+a+makefile\u0026ie=UTF-8\u0026oe=UTF-8) for more).\n* `SessionState.py`- a Python script to help our Streamlit app maintain state (not delete everything) when we click a button, see the [Streamlit forums for more](https://discuss.streamlit.io/t/is-there-any-working-example-for-session-state-for-streamlit-version-0-63-1/4551/2).\n* `app.py` - our Food Vision 👁🍔 app built with [Streamlit](http://streamlit.io/).\n* `app.yaml` - the instructions for what type of instance App Engine should create when we deploy our app.\n* `requirements.txt`- all of the dependencies required to run `app.py`.\n* `utils.py` - helper functions used in `app.py` (this prevents our app from getting too large).\n\n## Where else your app will break\n\nDuring the tutorial (see [timestamp 1:32:31](https://youtu.be/fw6NMQrYc6w?t=5551)), we saw the app we've deployed is far from perfect and we saw a couple of places where our app will break, but there's one more:\n\nThe default app (the one you'll get when you clone the repo) works with 3 models:\n * Model 1: 10 food classes from [Food101](https://www.kaggle.com/dansbecker/food-101).\n * Model 2: 11 food classes from Food101.\n * Model 3: 11 food classes from Food101 + 1 not_food class (random images from ImageNet).\n \nAll of these models can be trained using 
[`model_training.ipynb`](https://github.com/mrdbourke/cs329s-ml-deployment-tutorial/blob/main/model_training.ipynb), however, if you do have access to all 3, your app will break if you choose anything other than Model 1 in the sidebar (the app requires at least 1 model to run).\n\n## Learn more\n\n\u003e Where can I learn all of this?\n\nJust like there are infinite ways you can construct deep learning neural networks with different layers, what we've done here is only *one* way you can deploy machine learning models/applications with Google Cloud (other cloud services have similar offerings as well).\n\nIf you'd like to learn more about Google Cloud, I'd recommend [Google's Qwiklabs](https://google.qwiklabs.com/), here you'll get hands-on experience using Google Cloud for different use cases (all for free).\n\nIf you'd like to learn more about how software engineering crosses over with machine learning, I'd recommend the following blogs:\n\n* LJ Miranda's [How to improve software engineering skills as a researcher](https://ljvmiranda921.github.io/notebook/2020/11/15/data-science-swe/) \n* Mark Douthwaite's [software engineering and machine learning blog](https://mark.douthwaite.io/)\n\nFor more on the concept of the \"data flywheel\" (discussed during the tutorial), check out Josh Tobin's talk [A Missing Link in the Machine Learning Infrastructure Stack](https://youtu.be/o4q_ljRkXqw).\n\n## Extensions\n\n\u003e How can I extend this app?\n\n**CI/CD** - you'll hear this a lot when you start building and shipping software. It stands for \"continuous integration/continuous delivery\". I think of it like this, say you make a change to your app and you'd like to push it to your users immediately, you could have a service such as [GitHub Actions](https://github.com/features/actions) watch for changes in your GitHub repo. 
If a change occurs on a certain branch, GitHub Actions performs steps very similar to what we've done here and redeploys your (updated) app automatically.\n * Mark Douthwaite has a great blog post on [CI/CD with GitHub Actions](https://mark.douthwaite.io/continuous-training-and-delivery/).\n\n**Codify everything!** - when deploying our app, we did a lot of clicking around the Google Cloud console, however you can do all of what we did using the [`gcloud` SDK](https://cloud.google.com/sdk), this means you could automate everything we've done and make the whole process far less manual!\n\n## Questions?\n\nStart a [discussion](https://github.com/mrdbourke/cs329s-ml-deployment-tutorial/discussions) or send me a message: daniel at mrdbourke dot com.\n","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmrdbourke%2Fcs329s-ml-deployment-tutorial","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmrdbourke%2Fcs329s-ml-deployment-tutorial","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmrdbourke%2Fcs329s-ml-deployment-tutorial/lists"}