{"id":24081727,"url":"https://github.com/sagor0078/fastapi-mlops-docker-k8s","last_synced_at":"2026-04-13T03:03:24.254Z","repository":{"id":270931588,"uuid":"906266032","full_name":"Sagor0078/fastapi-mlops-docker-k8s","owner":"Sagor0078","description":"This application demonstrates deploying machine learning models using FastAPI, Docker, and Kubernetes. It includes background task processing with Celery and Redis, and provides endpoints for making predictions and retrieving results. The application uses the Breast Cancer Wisconsin (Diagnostic) dataset for model training and evaluation","archived":false,"fork":false,"pushed_at":"2025-01-21T09:13:27.000Z","size":841,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-21T10:21:58.122Z","etag":null,"topics":["background-jobs","celery","docker","docker-compose","fastapi","k8s","k8s-cluster","minikube","minikube-cluster","ml","mlops","pylint","pytest","python3","redis","unit-testing"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Sagor0078.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-20T14:12:50.000Z","updated_at":"2025-01-21T09:13:31.000Z","dependencies_parsed_at":"2025-01-04T19:45:20.070Z","dependency_job_id":null,"html_url":"https://github.com/Sagor0078/fastapi-mlops-docker-k8s","commit_stats":null,"previous_names":["sagor0078/fastapi-mlops-docker-k8s"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/repositories/Sagor0078%2Ffastapi-mlops-docker-k8s","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sagor0078%2Ffastapi-mlops-docker-k8s/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sagor0078%2Ffastapi-mlops-docker-k8s/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sagor0078%2Ffastapi-mlops-docker-k8s/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Sagor0078","download_url":"https://codeload.github.com/Sagor0078/fastapi-mlops-docker-k8s/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240958402,"owners_count":19884906,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["background-jobs","celery","docker","docker-compose","fastapi","k8s","k8s-cluster","minikube","minikube-cluster","ml","mlops","pylint","pytest","python3","redis","unit-testing"],"created_at":"2025-01-09T23:25:52.495Z","updated_at":"2026-04-13T03:03:24.192Z","avatar_url":"https://github.com/Sagor0078.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Deploying ML models using FastAPI, Docker and k8s\n\n## Application View\n\n[![Directory docs](img/backend_view.png)](https://github.com/Sagor0078/fastapi-mlops-docker-k8s)\n\n\n\n## Dataset\n\nFor this repo, we are going to work with the following dataset:\n\nhttps://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)\n\nFeatures are computed from a digitized image of a fine needle aspirate 
(FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.\nThe linear program used to obtain the separating plane in the 3-dimensional space is that described in: [K. P. Bennett and O. L. Mangasarian: \"Robust Linear Programming Discrimination of Two Linearly Inseparable Sets\", Optimization Methods and Software 1, 1992, 23-34].\n\n### Attribute Information:\n\n1) ID number\n2) Diagnosis (M = malignant, B = benign)\n\nTen real-valued features are computed for each cell nucleus:\n```\na) radius (mean of distances from center to points on the perimeter)\nb) texture (standard deviation of gray-scale values)\nc) perimeter\nd) area\ne) smoothness (local variation in radius lengths)\nf) compactness (perimeter^2 / area - 1.0)\ng) concavity (severity of concave portions of the contour)\nh) concave points (number of concave portions of the contour)\ni) symmetry\nj) fractal dimension (\"coastline approximation\" - 1)\n```\n## Preparing the environment\n\n1. Clone the repository, and navigate to the downloaded folder.\n```\ngit clone https://github.com/Sagor0078/fastapi-mlops-docker-k8s\ncd fastapi-mlops-docker-k8s\n```\n2. Create the virtual environment - Run the following command in the terminal to create a virtual environment named env:\n\n```bash\npython3 -m venv env\n```\n\n3. Activate the virtual environment - Activate the virtual environment using the following command:\n\n```bash\nsource env/bin/activate\n```\n4. 
Install dependencies - Install the required dependencies using the pip install command:\n\n```bash\npip install -r requirements.txt\n```\n\n\n# Train\n\nAfter installing all the dependencies, we can run the script in `core/train.py`. This script takes the input data and outputs a trained model and a pipeline for our web service.\n\n```bash\npython core/train.py\n```\n\n## Web application\n\nFinally, we can test our web application by running:\n\n```bash\nfastapi dev core/main.py\n```\n\n## Docker\n\nNow that we have our web application running, we can use the Dockerfile to create an image for running our web application inside a container:\n\n```bash\ndocker build . -t ml_fastapi_docker\n```\n\nAnd now we can test our application using Docker:\n\n```bash\ndocker run -p 8000:8000 ml_fastapi_docker\n```\n\n## docker-compose\n\nI also wrote a docker-compose file that mounts our model and runs our service. We will be using the model that we trained earlier. We can copy it to our `/var/tmp` directory, so we'll have a common directory to mount into the container. 
We can use the command below for Mac and Linux:\n\n```bash\ncp -R ./model /var/tmp\n```\n\nThen run the following docker-compose command:\n\n```bash\ndocker-compose up\n```\n\n\n## Test!\n\nRun the following commands in a terminal or run the `core/test.py` script:\n\n\n```bash\n# GET method info\ncurl -XGET http://localhost:8000/info\n\n# GET method health\ncurl -XGET http://localhost:8000/health\n\n# POST method predict\ncurl -H \"Content-Type: application/json\" -d '{\n  \"concavity_mean\": 0.3001,\n  \"concave_points_mean\": 0.1471,\n  \"perimeter_se\": 8.589,\n  \"area_se\": 153.4,\n  \"texture_worst\": 17.33,\n  \"area_worst\": 2019.0\n}' -XPOST http://localhost:8000/predict\n```\n\n# Deploying on Kubernetes\n\n* set up Kubernetes on our local machine for learning and development\n* create Kubernetes objects using YAML files\n* deploy containers\n* access the deployment using a NodePort service\n* autoscale the deployment to dynamically handle incoming traffic\n\n## Installation\n\nFirst, we will set up our machine to run a local Kubernetes cluster. There are several Kubernetes distributions, and the one best suited for our purpose is [Minikube](https://minikube.sigs.k8s.io/docs/), a great tool for learning and for local development.\n\n\nYou will need to install the following tools:\n\n* **curl** - a command-line tool for transferring data using various network protocols. You may have already installed this, but in case you haven't, [here is one reference](https://reqbin.com/Article/InstallCurl) to do so. You will use this to query your model later.\n\n* **Virtualbox** - Minikube is meant to run in a virtual machine (VM) so you will need virtualization software to act as the VM driver. While you can also specify `docker` as the VM driver, we found that it has limitations, so it's best to use Virtualbox instead. Installation instructions can be found [here](https://www.virtualbox.org/wiki/Downloads). 
When prompted by your OS, make sure to allow network traffic for this software, so you won't have firewall issues later on.\n\n* **kubectl** - the command line tool for interacting with Kubernetes clusters. Installation instructions can be found [here](https://kubernetes.io/docs/tasks/tools/).\n\n* **Minikube** - a Kubernetes distribution geared towards new users and development work. It is not meant for production deployments, however, since it can only run a single-node cluster on our machine. Installation instructions [here](https://minikube.sigs.k8s.io/docs/start/).\n\n\n## Architecture\n\nThe application we'll be building will look like the figure below:\n\n\u003cimg src='img/kubernetes.png' alt='img/kubernetes.png'\u003e\n\nWe will create a deployment that spins up containers that run a model server. In this case, that will be from the `ml_fastapi_docker` image we already built in the Docker section. The deployment can be accessed by external terminals (i.e. our users) through an exposed service. This brings inference requests to the model servers and responds with predictions from our model.\n\nLastly, the deployment will spin up or spin down pods based on CPU utilization. It will start with one pod but when the load exceeds a pre-defined point, it will spin up additional pods to share the load.\n\n## Start Minikube\n\nWe are now almost ready to start our Kubernetes cluster. There is just one more step. As mentioned earlier, Minikube runs inside a virtual machine. That implies that the pods we will create later on will only see the volumes inside this VM. Thus, if we want to load a model into our pods, then we should first mount the location of this model inside Minikube's VM. Let's set that up now.\n\nWe will be using the model that we trained earlier. We can copy it to our `/var/tmp` directory, so we'll have a common directory to mount to the VM. 
We can use the command below for Mac and Linux:\n\n```bash\ncp -R ./model /var/tmp\n```\n\n\nNow we're ready to start Minikube! Run the command below to initialize the VM with Virtualbox and mount the folder containing our model file:\n\n\n```bash\nminikube start --mount=True --mount-string=\"/var/tmp:/var/tmp\" --vm-driver=virtualbox\n```\n```bash\nminikube mount /var/tmp:/var/tmp\n```\n\n## Creating Objects with YAML files\n\nIn the official Kubernetes basics tutorial, you mainly used `kubectl` to create objects such as pods, deployments, and services. While this definitely works, our setup will be more portable and easier to maintain if we configure them using [YAML](https://yaml.org/spec/1.2/spec.html) files. I've included these in the `infra` directory of this repository, so we can peruse how these are constructed. The [Kubernetes API](https://kubernetes.io/docs/reference/kubernetes-api/) also documents the supported fields for each object. For example, the API for Pods can be found [here](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/).\n\nOne way to generate a YAML file when we don't have a template to begin with is to run the `kubectl` command with the `-o yaml` flag, which outputs the YAML for us. For example, the [kubectl cheatsheet](https://kubernetes.io/docs/reference/kubectl/cheatsheet/) shows that we can generate the YAML for a pod running an `nginx` image with this command:\n\n```bash\nkubectl run nginx --image=nginx --dry-run=client -o yaml \u003e pod.yaml\n```\n\nAll the objects we need are already provided, and you are free to modify them later when you want to practice different settings. Let's go through them one by one in the next sections.\n\n### Config Maps\n\nFirst, we will create a [config map](https://kubernetes.io/docs/concepts/configuration/configmap/) that defines a `MODEL_NAME` and `MODEL_PATH` variable. This is needed because of how the docker image is configured. 
\n\nIt basically starts up the model server and uses the environment variables `MODEL_PATH` and `MODEL_NAME` to find the model. Though we can explicitly define these in the `Deployment` YAML file as well, it is more organized to have them in a configmap, so we can plug it in later. Please open `infra/configmap.yaml` to see the syntax.\n\nWe can create the object now using `kubectl` as shown below. Notice the `-f` flag to specify a filename. We can also specify a directory but we'll do that later.\n\n```bash\nkubectl apply -f infra/configmap.yaml\n```\n\nWith that, you should be able to `get` and `describe` the object as before. For instance, `kubectl describe cm mlserving-configs` should show you:\n\n```\nName:         mlserving-configs\nNamespace:    default\nLabels:       \u003cnone\u003e\nAnnotations:  \u003cnone\u003e\n\nData\n====\nMODEL_NAME:\n----\nbreast_model\nMODEL_PATH:\n----\n/model/model_binary.dat.gz\n\nBinaryData\n====\n\nEvents:  \u003cnone\u003e\n```\n\n### Create a Deployment\n\n#### (Optional): Build and use the local docker images for minikube \nTo use a docker image without uploading it, you can follow these steps:\n\n- Set the environment variables with `eval $(minikube docker-env)`\n- Build the image with the Docker daemon of Minikube (e.g. `docker build -t ml_fastapi_docker .`)\n- Set the image in the pod spec to match the build tag (e.g. `ml_fastapi_docker`)\n- Set the `imagePullPolicy` to `Never`, otherwise Kubernetes will try to download the image.\n- **Important note**: You have to run `eval $(minikube docker-env)` in each terminal you want to use, since it only sets the environment variables for the current shell session.\n\n#### Creating deployment\nWe will now create the deployment for our application. Please open `infra/deployment.yaml` to see the spec for this object. 
You will see that it starts up one replica, uses `localhost:5000/fastapi-mlops-docker-k8s:latest` as the container image and defines environment variables via the `envFrom` tag. It also exposes port `8000` of the container because we will be sending HTTP requests to it later on. It also defines cpu and memory limits and mounts the volume from the Minikube VM to the container.\n\nAs before, we can apply this file to create the object:\n\n```bash\nkubectl apply -f infra/deployment.yaml\n```\n\nRunning `kubectl get deploy` after around 90 seconds should show you something like below to tell you that the deployment is ready.\n\n```\nNAME                    READY   UP-TO-DATE   AVAILABLE   AGE\nml-serving-deployment   1/1     1            1           15s\n```\n\nTroubleshooting commands: \n\n```bash\nkubectl get pods\nkubectl describe pods\nkubectl logs -f deployments/ml-serving-deployment\n```\n\n### Background Processing with Celery and Redis\n\nThis project uses Celery for background task processing and Redis as the message broker. Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation but supports scheduling as well.\n\n#### Setting Up Redis\n\nFirst, ensure that Redis is running. You can use the provided Kubernetes deployment configuration to set up Redis:\n\n```bash\nkubectl apply -f infra/redis-deployment.yaml\n```\n\nVerify that the Redis service is running:\n\n```bash\nkubectl get svc redis\n```\n\nYou should see something like this:\n\n```\nNAME    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE\nredis   ClusterIP   10.96.232.136   \u003cnone\u003e        6379/TCP   1m\n```\n\n#### Configuring Celery\n\nCelery is configured in the `celery.py` file. 
The Celery instance is created with Redis as the broker and backend:\n\n```python\nfrom celery import Celery\n\n# Create a Celery instance\ncelery_app = Celery(\n    'tasks',\n    broker='redis://localhost:6379/0',\n    backend='redis://localhost:6379/0'\n)\n\n# Auto-discover task modules.\ncelery_app.autodiscover_tasks()\n```\n\n#### Defining Tasks\n\nTasks are defined in the `tasks.py` file. Here is an example task that processes model responses:\n\n```python\nfrom celery import Celery\nfrom utils.functions import get_model_response\n\ncelery_app = Celery('tasks', broker='redis://localhost:6379/0')\n\n@celery_app.task(name=\"tasks.get_model_response_task\")\ndef get_model_response_task(data):\n    return get_model_response(data)\n```\n\n#### Using Celery in FastAPI\n\nIn the `app.py` file, Celery tasks are used for background processing, for example in the endpoint that processes predictions.\n\n\n### Expose the deployment through a service\n\nWe will need to create a service so our application can be accessible outside the cluster. I've included `infra/service.yaml` for that. It defines a [NodePort](https://kubernetes.io/docs/concepts/services-networking/service/#nodeport) service which exposes the node's port `30001`. Requests sent to this port will be sent to the containers' specified `targetPort` which is `8000`. \n\nApply `infra/service.yaml`:\n```bash\nkubectl apply -f infra/service.yaml\n```\nand run \n```bash\nkubectl get svc ml-serving-service\n```\n\nYou should see something like this:\n\n```\nNAME                 TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE\nml-serving-service   NodePort   10.102.161.7   \u003cnone\u003e        8000:30001/TCP   20m\n```\n\nWe can try accessing the deployment now as a sanity check. 
The following `curl` command will send a row of data as an inference request to the NodePort service:\n\n```bash\ncurl -H \"Content-Type: application/json\" -d '{\n  \"concavity_mean\": 0.3001,\n  \"concave_points_mean\": 0.1471,\n  \"perimeter_se\": 8.589,\n  \"area_se\": 153.4,\n  \"texture_worst\": 17.33,\n  \"area_worst\": 2019.0\n}' -XPOST $(minikube ip):30001/predict\n```\n\nIf the command above does not work, you can run `minikube ip` first to get the IP address of the Minikube node. It should return a local IP address like `192.168.59.101`. You can then plug this in the command above by replacing the `$(minikube ip)` string. For example:\n\n```bash\ncurl -H \"Content-Type: application/json\" -d '{\n  \"concavity_mean\": 0.3001,\n  \"concave_points_mean\": 0.1471,\n  \"perimeter_se\": 8.589,\n  \"area_se\": 153.4,\n  \"texture_worst\": 17.33,\n  \"area_worst\": 2019.0\n}' -XPOST 192.168.59.101:30001/predict\n```\n\n\n\nIf the command is successful, you should see the results returned by the model:\n\n```\n{\"label\":\"M\",\"prediction\":1}\n```\n\nGreat! Our application is successfully running and can be accessed outside the cluster!\n\n### Horizontal Pod Autoscaler\n\nOne of the great advantages of container orchestration is that it allows us to scale our application depending on user needs. Kubernetes provides a [Horizontal Pod Autoscaler (HPA)](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) to add or remove pod replicas based on observed metrics. To do this, the HPA queries a [Metrics Server](https://kubernetes.io/docs/tasks/debug-application-cluster/resource-metrics-pipeline/#metrics-server) to measure resource utilization such as CPU and memory. The Metrics Server is not launched by default in Minikube and needs to be enabled with the following command:\n\n```bash\nminikube addons enable metrics-server\n```\n\nYou should see a prompt saying `🌟  The 'metrics-server' addon is enabled` shortly. 
This launches a `metrics-server` deployment in the `kube-system` namespace. Run the command below and wait for the deployment to be ready.\n\n```bash\nkubectl get deployment metrics-server -n kube-system\n```\n\nYou should see something like:\n\n```\nNAME             READY   UP-TO-DATE   AVAILABLE   AGE\nmetrics-server   1/1     1            1           76s\n```\n\nWith that, we can now create our autoscaler by applying `infra/autoscale.yaml`:\n```bash\nkubectl apply -f infra/autoscale.yaml\n```\n\nPlease wait for about a minute, so it can query the metrics server. Running `kubectl get hpa` should show: \n\n```\nNAME             REFERENCE                          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE\nml-serving-hpa   Deployment/ml-serving-deployment   0%/2%    1         3         1          38s\n\n```\n\nIf it's showing `Unknown` instead of `0%` in the `TARGETS` column, you can try sending a few curl commands as you did earlier then wait for another minute.\n\n\n### Stress Test\n\nTo test the autoscaling capability of our deployment, I provided a short bash script (`request.sh`) that will just persistently send requests to our application. Please open a new terminal window, make sure that you're in the root directory of this repository, then run this command (for Linux and Mac). Alternatively, we can use Locust for stress testing.\n\n```bash\n/bin/bash request.sh\n```\n\n\nYou should see results being printed in quick succession.\n\nIf you're seeing connection refused, make sure that our service is still running with `kubectl get svc ml-serving-service`.\n\n### Dashboard\nThere are several ways to monitor this but the easiest would be to use Minikube's built-in dashboard. 
We can launch it by running:\n\n```\nminikube dashboard\n```\n\nIf you launched this immediately after you ran the request script, you should initially see a single replica running in the `Deployments` and `Pods` section:\n\n\u003cimg src='img/initial_load.png'\u003e\n\nAfter about a minute of running the script, you will observe that the CPU utilization will reach 5 to 6m. This is more than the 20% that we set in the HPA so it will trigger spinning up the additional replicas:\n\n\u003cimg src='img/autoscale_start.png'\u003e\n\nFinally, all 3 pods will be ready to accept requests and will be sharing the load. See that each pod below shows `2.00m` CPU Usage.\n\n\u003cimg src='img/autoscaled.png'\u003e\n\nWe can now stop the `request.sh` script by pressing `Ctrl/Cmd + C`. Unlike scaling up, scaling down the number of pods will take longer before it is executed. You will wait around 5 minutes (where the CPU usage is below 1m) before you see that there is only one pod running again. This is the behavior for the `autoscaling/v1` API version we are using. The newer `autoscaling/v2` API, stable since Kubernetes 1.23, lets you configure this scaling behavior, and you can read more about it [here](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#api-object).\n\n### Tear Down\n\nAfter we're done experimenting, we can destroy the resources we created. We can simply call `kubectl delete -f infra` to delete all resources defined in the `infra` folder. You should see something like this:\n\n```\nhorizontalpodautoscaler.autoscaling \"ml-serving-hpa\" deleted\nconfigmap \"mlserving-configs\" deleted\ndeployment.apps \"ml-serving-deployment\" deleted\nservice \"ml-serving-service\" deleted\n```\n\nWe can then re-create them all next time with one command by running `kubectl apply -f infra`. Just remember to check if `metrics-server` is enabled and running.\n\nIf we also want to destroy the VM, then we can run `minikube delete`. 
\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsagor0078%2Ffastapi-mlops-docker-k8s","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsagor0078%2Ffastapi-mlops-docker-k8s","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsagor0078%2Ffastapi-mlops-docker-k8s/lists"}