{"id":15034068,"url":"https://github.com/gokumohandas/mlops-course","last_synced_at":"2025-05-15T02:06:23.186Z","repository":{"id":37384095,"uuid":"312152973","full_name":"GokuMohandas/mlops-course","owner":"GokuMohandas","description":"Learn how to design, develop, deploy and iterate on production-grade ML applications.","archived":false,"fork":false,"pushed_at":"2024-08-16T05:27:56.000Z","size":7509,"stargazers_count":3080,"open_issues_count":1,"forks_count":540,"subscribers_count":54,"default_branch":"main","last_synced_at":"2025-04-11T14:17:03.842Z","etag":null,"topics":["data-engineering","data-quality","data-science","deep-learning","distributed-ml","llms","machine-learning","mlops","natural-language-processing","python","pytorch","ray"],"latest_commit_sha":null,"homepage":"https://madewithml.com","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/GokuMohandas.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-11-12T03:05:00.000Z","updated_at":"2025-04-08T09:30:04.000Z","dependencies_parsed_at":"2023-02-16T21:01:35.963Z","dependency_job_id":"feb7da8d-085e-42b6-a3eb-53dac280237f","html_url":"https://github.com/GokuMohandas/mlops-course","commit_stats":null,"previous_names":["gokumohandas/mlops"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GokuMohandas%2Fmlops-course","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GokuMohandas%2Fmlops-course/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GokuMohandas%2Fmlops-course/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GokuMohandas%2Fmlops-course/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/GokuMohandas","download_url":"https://codeload.github.com/GokuMohandas/mlops-course/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254259370,"owners_count":22040819,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-engineering","data-quality","data-science","deep-learning","distributed-ml","llms","machine-learning","mlops","natural-language-processing","python","pytorch","ray"],"created_at":"2024-09-24T20:23:49.018Z","updated_at":"2025-05-15T02:06:23.169Z","avatar_url":"https://github.com/GokuMohandas.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MLOps Course\n\nLearn how to combine machine learning with software engineering to design, develop, deploy and iterate on production-grade ML applications.\n\n- Lessons: https://madewithml.com/\n- Code: [GokuMohandas/Made-With-ML](https://github.com/GokuMohandas/Made-With-ML)\n\n\u003ca href=\"https://madewithml.com/#course\"\u003e\n  \u003cimg src=\"https://madewithml.com/static/images/lessons.png\" alt=\"lessons\"\u003e\n\u003c/a\u003e\n\n## Overview\n\nIn this course, we'll go from experimentation (model design + development) to production (model deployment + iteration). We'll do this iteratively by motivating the components that will enable us to build a *reliable* production system.\n\n\u003cblockquote\u003e\n  \u003cimg width=20 src=\"https://upload.wikimedia.org/wikipedia/commons/thumb/0/09/YouTube_full-color_icon_%282017%29.svg/640px-YouTube_full-color_icon_%282017%29.svg.png\"\u003e\u0026nbsp; Be sure to watch the video below for a quick overview of what we'll be building.\n\u003c/blockquote\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003ca href=\"https://youtu.be/AWgkt8H8yVo\"\u003e\u003cimg src=\"https://img.youtube.com/vi/AWgkt8H8yVo/0.jpg\" alt=\"Course overview video\"\u003e\u003c/a\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\n- **💡 First principles**: before we jump straight into the code, we develop a first principles understanding for every machine learning concept.\n- **💻 Best practices**: implement software engineering best practices as we develop and deploy our machine learning models.\n- **📈 Scale**: easily scale ML workloads (data, train, tune, serve) in Python without having to learn completely new languages.\n- **⚙️ MLOps**: connect MLOps components (tracking, testing, serving, orchestration, etc.) as we build an end-to-end machine learning system.\n- **🚀 Dev to Prod**: learn how to quickly and reliably go from development to production without any changes to our code or infra management.\n- **🐙 CI/CD**: learn how to create mature CI/CD workflows to continuously train and deploy better models in a modular way that integrates with any stack.\n\n## Audience\n\nMachine learning is not a separate industry, instead, it's a powerful way of thinking about data that's not reserved for any one type of person.\n\n- **👩‍💻 All developers**: whether software/infra engineer or data scientist, ML is increasingly becoming a key part of the products that you'll be developing.\n- **👩‍🎓 College graduates**: learn the practical skills required for industry and bridge gap between the university curriculum and what industry expects.\n- **👩‍💼 Product/Leadership**: who want to develop a technical foundation so that they can build amazing (and reliable) products powered by machine learning.\n\n## Set up\n\nBe sure to go through the [course](https://madewithml/#course) for a much more detailed walkthrough of the content on this repository. We will have instructions for both local laptop and Anyscale clusters for the sections below, so be sure to toggle the ► dropdown based on what you're using (Anyscale instructions will be toggled on by default). If you do want to run this course with Anyscale, where we'll provide the **structure**, **compute (GPUs)** and **community** to learn everything in one weekend, join our next upcoming live cohort → [sign up here](https://4190urw86oh.typeform.com/madewithml)!\n\n### Cluster\n\nWe'll start by setting up our cluster with the environment and compute configurations.\n\n\u003cdetails\u003e\n  \u003csummary\u003eLocal\u003c/summary\u003e\u003cbr\u003e\n  Your personal laptop (single machine) will act as the cluster, where one CPU will be the head node and some of the remaining CPU will be the worker nodes. All of the code in this course will work in any personal laptop though it will be slower than executing the same workloads on a larger cluster.\n\u003c/details\u003e\n\n\u003cdetails open\u003e\n  \u003csummary\u003eAnyscale\u003c/summary\u003e\u003cbr\u003e\n\n  We can create an [Anyscale Workspace](https://docs.anyscale.com/develop/workspaces/get-started) using the [webpage UI](https://console.anyscale.com/o/madewithml/workspaces/add/blank).\n\n  ```md\n  - Workspace name: `madewithml`\n  - Project: `madewithml`\n  - Cluster environment name: `madewithml-cluster-env`\n  # Toggle `Select from saved configurations`\n  - Compute config: `madewithml-cluster-compute`\n  ```\n\n  \u003e Alternatively, we can use the [CLI](https://docs.anyscale.com/reference/anyscale-cli) to create the workspace via `anyscale workspace create ...`\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003eOther (cloud platforms, K8s, on-prem)\u003c/summary\u003e\u003cbr\u003e\n\n  If you don't want to do this course locally or via Anyscale, you have the following options:\n\n  - On [AWS and GCP](https://docs.ray.io/en/latest/cluster/vms/index.html#cloud-vm-index). Community-supported Azure and Aliyun integrations also exist.\n  - On [Kubernetes](https://docs.ray.io/en/latest/cluster/kubernetes/index.html#kuberay-index), via the officially supported KubeRay project.\n  - Deploy Ray manually [on-prem](https://docs.ray.io/en/latest/cluster/vms/user-guides/launching-clusters/on-premises.html#on-prem) or onto platforms [not listed here](https://docs.ray.io/en/latest/cluster/vms/user-guides/community/index.html#ref-cluster-setup).\n\n\u003c/details\u003e\n\n### Git setup\n\nCreate a repository by following these instructions: [Create a new repository](https://github.com/new) → name it `Made-With-ML` → Toggle `Add a README file` (**very important** as this creates a `main` branch) → Click `Create repository` (scroll down)\n\nNow we're ready to clone the repository that has all of our code:\n\n```bash\ngit clone https://github.com/GokuMohandas/Made-With-ML.git .\ngit remote set-url origin https://github.com/GITHUB_USERNAME/Made-With-ML.git  # \u003c-- CHANGE THIS to your username\ngit checkout -b dev\n```\n\n### Virtual environment\n\n\u003cdetails\u003e\n  \u003csummary\u003eLocal\u003c/summary\u003e\u003cbr\u003e\n\n  ```bash\n  export PYTHONPATH=$PYTHONPATH:$PWD\n  python3 -m venv venv  # recommend using Python 3.10\n  source venv/bin/activate  # on Windows: venv\\Scripts\\activate\n  python3 -m pip install --upgrade pip setuptools wheel\n  python3 -m pip install -r requirements.txt\n  pre-commit install\n  pre-commit autoupdate\n  ```\n\n  \u003e Highly recommend using Python `3.10` and using [pyenv](https://github.com/pyenv/pyenv) (mac) or [pyenv-win](https://github.com/pyenv-win/pyenv-win) (windows).\n\n\u003c/details\u003e\n\n\u003cdetails open\u003e\n  \u003csummary\u003eAnyscale\u003c/summary\u003e\u003cbr\u003e\n\n  Our environment with the appropriate Python version and libraries is already all set for us through the cluster environment we used when setting up our Anyscale Workspace. So we just need to run these commands:\n  ```bash\n  export PYTHONPATH=$PYTHONPATH:$PWD\n  pre-commit install\n  pre-commit autoupdate\n  ```\n\n\u003c/details\u003e\n\n## Notebook\n\nStart by exploring the [jupyter notebook](notebooks/madewithml.ipynb) to interactively walkthrough the core machine learning workloads.\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://madewithml.com/static/images/mlops/systems-design/workloads.png\"\u003e\n\u003c/div\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003eLocal\u003c/summary\u003e\u003cbr\u003e\n\n  ```bash\n  # Start notebook\n  jupyter lab notebooks/madewithml.ipynb\n```\n\n\u003c/details\u003e\n\n\u003cdetails open\u003e\n  \u003csummary\u003eAnyscale\u003c/summary\u003e\u003cbr\u003e\n\n  Click on the Jupyter icon \u0026nbsp;\u003cimg width=15 src=\"https://upload.wikimedia.org/wikipedia/commons/thumb/3/38/Jupyter_logo.svg/1200px-Jupyter_logo.svg.png\"\u003e\u0026nbsp; at the top right corner of our Anyscale Workspace page and this will open up our JupyterLab instance in a new tab. Then navigate to the `notebooks` directory and open up the `madewithml.ipynb` notebook.\n\n\u003c/details\u003e\n\n\n## Scripts\n\nNow we'll execute the same workloads using the clean Python scripts following software engineering best practices (testing, documentation, logging, serving, versioning, etc.) The code we've implemented in our notebook will be refactored into the following scripts:\n\n```bash\nmadewithml\n├── config.py\n├── data.py\n├── evaluate.py\n├── models.py\n├── predict.py\n├── serve.py\n├── train.py\n├── tune.py\n└── utils.py\n```\n\n**Note**: Change the `--num-workers`, `--cpu-per-worker`, and `--gpu-per-worker` input argument values below based on your system's resources. For example, if you're on a local laptop, a reasonable configuration would be `--num-workers 6 --cpu-per-worker 1 --gpu-per-worker 0`.\n\n### Training\n```bash\nexport EXPERIMENT_NAME=\"llm\"\nexport DATASET_LOC=\"https://raw.githubusercontent.com/GokuMohandas/Made-With-ML/main/datasets/dataset.csv\"\nexport TRAIN_LOOP_CONFIG='{\"dropout_p\": 0.5, \"lr\": 1e-4, \"lr_factor\": 0.8, \"lr_patience\": 3}'\npython madewithml/train.py \\\n    --experiment-name \"$EXPERIMENT_NAME\" \\\n    --dataset-loc \"$DATASET_LOC\" \\\n    --train-loop-config \"$TRAIN_LOOP_CONFIG\" \\\n    --num-workers 1 \\\n    --cpu-per-worker 3 \\\n    --gpu-per-worker 1 \\\n    --num-epochs 10 \\\n    --batch-size 256 \\\n    --results-fp results/training_results.json\n```\n\n### Tuning\n```bash\nexport EXPERIMENT_NAME=\"llm\"\nexport DATASET_LOC=\"https://raw.githubusercontent.com/GokuMohandas/Made-With-ML/main/datasets/dataset.csv\"\nexport TRAIN_LOOP_CONFIG='{\"dropout_p\": 0.5, \"lr\": 1e-4, \"lr_factor\": 0.8, \"lr_patience\": 3}'\nexport INITIAL_PARAMS=\"[{\\\"train_loop_config\\\": $TRAIN_LOOP_CONFIG}]\"\npython madewithml/tune.py \\\n    --experiment-name \"$EXPERIMENT_NAME\" \\\n    --dataset-loc \"$DATASET_LOC\" \\\n    --initial-params \"$INITIAL_PARAMS\" \\\n    --num-runs 2 \\\n    --num-workers 1 \\\n    --cpu-per-worker 3 \\\n    --gpu-per-worker 1 \\\n    --num-epochs 10 \\\n    --batch-size 256 \\\n    --results-fp results/tuning_results.json\n```\n\n### Experiment tracking\n\nWe'll use [MLflow](https://mlflow.org/) to track our experiments and store our models and the [MLflow Tracking UI](https://www.mlflow.org/docs/latest/tracking.html#tracking-ui) to view our experiments. We have been saving our experiments to a local directory but note that in an actual production setting, we would have a central location to store all of our experiments. It's easy/inexpensive to spin up your own MLflow server for all of your team members to track their experiments on or use a managed solution like [Weights \u0026 Biases](https://wandb.ai/site), [Comet](https://www.comet.ml/), etc.\n\n```bash\nexport MODEL_REGISTRY=$(python -c \"from madewithml import config; print(config.MODEL_REGISTRY)\")\nmlflow server -h 0.0.0.0 -p 8080 --backend-store-uri $MODEL_REGISTRY\n```\n\n\u003cdetails\u003e\n  \u003csummary\u003eLocal\u003c/summary\u003e\u003cbr\u003e\n\n  If you're running this notebook on your local laptop then head on over to \u003ca href=\"http://localhost:8080/\" target=\"_blank\"\u003ehttp://localhost:8080/\u003c/a\u003e to view your MLflow dashboard.\n\n\u003c/details\u003e\n\n\u003cdetails open\u003e\n  \u003csummary\u003eAnyscale\u003c/summary\u003e\u003cbr\u003e\n\n  If you're on \u003ca href=\"https://docs.anyscale.com/develop/workspaces/get-started\" target=\"_blank\"\u003eAnyscale Workspaces\u003c/a\u003e, then we need to first expose the port of the MLflow server. Run the following command on your Anyscale Workspace terminal to generate the public URL to your MLflow server.\n\n  ```bash\n  APP_PORT=8080\n  echo https://$APP_PORT-port-$ANYSCALE_SESSION_DOMAIN\n  ```\n\n\u003c/details\u003e\n\n### Evaluation\n```bash\nexport EXPERIMENT_NAME=\"llm\"\nexport RUN_ID=$(python madewithml/predict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\nexport HOLDOUT_LOC=\"https://raw.githubusercontent.com/GokuMohandas/Made-With-ML/main/datasets/holdout.csv\"\npython madewithml/evaluate.py \\\n    --run-id $RUN_ID \\\n    --dataset-loc $HOLDOUT_LOC \\\n    --results-fp results/evaluation_results.json\n```\n```json\n{\n  \"timestamp\": \"June 09, 2023 09:26:18 AM\",\n  \"run_id\": \"6149e3fec8d24f1492d4a4cabd5c06f6\",\n  \"overall\": {\n    \"precision\": 0.9076136428670714,\n    \"recall\": 0.9057591623036649,\n    \"f1\": 0.9046792827719773,\n    \"num_samples\": 191.0\n  },\n...\n```\n\n### Inference\n```bash\n# Get run ID\nexport EXPERIMENT_NAME=\"llm\"\nexport RUN_ID=$(python madewithml/predict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\npython madewithml/predict.py predict \\\n    --run-id $RUN_ID \\\n    --title \"Transfer learning with transformers\" \\\n    --description \"Using transformers for transfer learning on text classification tasks.\"\n```\n```json\n[{\n  \"prediction\": [\n    \"natural-language-processing\"\n  ],\n  \"probabilities\": {\n    \"computer-vision\": 0.0009767753,\n    \"mlops\": 0.0008223939,\n    \"natural-language-processing\": 0.99762577,\n    \"other\": 0.000575123\n  }\n}]\n```\n\n### Serving\n\n\u003cdetails\u003e\n  \u003csummary\u003eLocal\u003c/summary\u003e\u003cbr\u003e\n\n  ```bash\n  # Start\n  ray start --head\n  ```\n\n  ```bash\n  # Set up\n  export EXPERIMENT_NAME=\"llm\"\n  export RUN_ID=$(python madewithml/predict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\n  python madewithml/serve.py --run_id $RUN_ID\n  ```\n\n  While the application is running, we can use it via cURL, Python, etc.:\n\n  ```bash\n  # via cURL\n  curl -X POST -H \"Content-Type: application/json\" -d '{\n    \"title\": \"Transfer learning with transformers\",\n    \"description\": \"Using transformers for transfer learning on text classification tasks.\"\n  }' http://127.0.0.1:8000/predict\n  ```\n\n  ```python\n  # via Python\n  import json\n  import requests\n  title = \"Transfer learning with transformers\"\n  description = \"Using transformers for transfer learning on text classification tasks.\"\n  json_data = json.dumps({\"title\": title, \"description\": description})\n  requests.post(\"http://127.0.0.1:8000/predict\", data=json_data).json()\n  ```\n\n  ```bash\n  ray stop  # shutdown\n  ```\n\n```bash\nexport HOLDOUT_LOC=\"https://raw.githubusercontent.com/GokuMohandas/Made-With-ML/main/datasets/holdout.csv\"\ncurl -X POST -H \"Content-Type: application/json\" -d '{\n    \"dataset_loc\": \"https://raw.githubusercontent.com/GokuMohandas/Made-With-ML/main/datasets/holdout.csv\"\n  }' http://127.0.0.1:8000/evaluate\n```\n\n\u003c/details\u003e\n\n\u003cdetails open\u003e\n  \u003csummary\u003eAnyscale\u003c/summary\u003e\u003cbr\u003e\n\n  In Anyscale Workspaces, Ray is already running so we don't have to manually start/shutdown like we have to do locally.\n\n  ```bash\n  # Set up\n  export EXPERIMENT_NAME=\"llm\"\n  export RUN_ID=$(python madewithml/predict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\n  python madewithml/serve.py --run_id $RUN_ID\n  ```\n\n  While the application is running, we can use it via cURL, Python, etc.:\n\n  ```bash\n  # via cURL\n  curl -X POST -H \"Content-Type: application/json\" -d '{\n    \"title\": \"Transfer learning with transformers\",\n    \"description\": \"Using transformers for transfer learning on text classification tasks.\"\n  }' http://127.0.0.1:8000/predict\n  ```\n\n  ```python\n  # via Python\n  import json\n  import requests\n  title = \"Transfer learning with transformers\"\n  description = \"Using transformers for transfer learning on text classification tasks.\"\n  json_data = json.dumps({\"title\": title, \"description\": description})\n  requests.post(\"http://127.0.0.1:8000/predict\", data=json_data).json()\n  ```\n\n\u003c/details\u003e\n\n### Testing\n```bash\n# Code\npython3 -m pytest tests/code --verbose --disable-warnings\n\n# Data\nexport DATASET_LOC=\"https://raw.githubusercontent.com/GokuMohandas/Made-With-ML/main/datasets/dataset.csv\"\npytest --dataset-loc=$DATASET_LOC tests/data --verbose --disable-warnings\n\n# Model\nexport EXPERIMENT_NAME=\"llm\"\nexport RUN_ID=$(python madewithml/predict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\npytest --run-id=$RUN_ID tests/model --verbose --disable-warnings\n\n# Coverage\npython3 -m pytest --cov madewithml --cov-report html\n```\n\n## Production\n\nFrom this point onwards, in order to deploy our application into production, we'll need to either be on Anyscale or on a [cloud VM](https://docs.ray.io/en/latest/cluster/vms/index.html#cloud-vm-index) / [on-prem](https://docs.ray.io/en/latest/cluster/vms/user-guides/launching-clusters/on-premises.html#on-prem) cluster you manage yourself (w/ Ray). If not on Anyscale, the commands will be [slightly different](https://docs.ray.io/en/latest/cluster/running-applications/job-submission/index.html) but the concepts will be the same.\n\n\u003e If you don't want to set up all of this yourself, we highly recommend joining our [upcoming live cohort](https://4190urw86oh.typeform.com/madewithml){:target=\"_blank\"} where we'll provide an environment with all of this infrastructure already set up for you so that you just focused on the machine learning.\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://madewithml.com/static/images/mlops/jobs_and_services/manual.png\"\u003e\n\u003c/div\u003e\n\n### Authentication\n\nThese credentials below are **automatically** set for us if we're using Anyscale Workspaces. We **do not** need to set these credentials explicitly on Workspaces but we do if we're running this locally or on a cluster outside of where our Anyscale Jobs and Services are configured to run.\n\n``` bash\nexport ANYSCALE_HOST=https://console.anyscale.com\nexport ANYSCALE_CLI_TOKEN=$YOUR_CLI_TOKEN  # retrieved from Anyscale credentials page\n```\n\n### Cluster environment\n\nThe cluster environment determines **where** our workloads will be executed (OS, dependencies, etc.) We've already created this [cluster environment](./deploy/cluster_env.yaml) for us but this is how we can create/update one ourselves.\n\n```bash\nexport CLUSTER_ENV_NAME=\"madewithml-cluster-env\"\nanyscale cluster-env build deploy/cluster_env.yaml --name $CLUSTER_ENV_NAME\n```\n\n### Compute configuration\n\nThe compute configuration determines **what** resources our workloads will be executes on. We've already created this [compute configuration](./deploy/cluster_compute.yaml) for us but this is how we can create it ourselves.\n\n```bash\nexport CLUSTER_COMPUTE_NAME=\"madewithml-cluster-compute\"\nanyscale cluster-compute create deploy/cluster_compute.yaml --name $CLUSTER_COMPUTE_NAME\n```\n\n### Anyscale jobs\n\nNow we're ready to execute our ML workloads. We've decided to combine them all together into one [job](./deploy/jobs/workloads.yaml) but we could have also created separate jobs for each workload (train, evaluate, etc.) We'll start by editing the `$GITHUB_USERNAME` slots inside our [`workloads.yaml`](./deploy/jobs/workloads.yaml) file:\n```yaml\nruntime_env:\n  working_dir: .\n  upload_path: s3://madewithml/$GITHUB_USERNAME/jobs  # \u003c--- CHANGE USERNAME (case-sensitive)\n  env_vars:\n    GITHUB_USERNAME: $GITHUB_USERNAME  # \u003c--- CHANGE USERNAME (case-sensitive)\n```\n\nThe `runtime_env` here specifies that we should upload our current `working_dir` to an S3 bucket so that all of our workers when we execute an Anyscale Job have access to the code to use. The `GITHUB_USERNAME` is used later to save results from our workloads to S3 so that we can retrieve them later (ex. for serving).\n\nNow we're ready to submit our job to execute our ML workloads:\n```bash\nanyscale job submit deploy/jobs/workloads.yaml\n```\n\n### Anyscale Services\n\nAnd after our ML workloads have been executed, we're ready to launch our serve our model to production. Similar to our Anyscale Jobs configs, be sure to change the `$GITHUB_USERNAME` in [`serve_model.yaml`](./deploy/services/serve_model.yaml).\n\n```yaml\nray_serve_config:\n  import_path: deploy.services.serve_model:entrypoint\n  runtime_env:\n    working_dir: .\n    upload_path: s3://madewithml/$GITHUB_USERNAME/services  # \u003c--- CHANGE USERNAME (case-sensitive)\n    env_vars:\n      GITHUB_USERNAME: $GITHUB_USERNAME  # \u003c--- CHANGE USERNAME (case-sensitive)\n```\n\nNow we're ready to launch our service:\n```bash\n# Rollout service\nanyscale service rollout -f deploy/services/serve_model.yaml\n\n# Query\ncurl -X POST -H \"Content-Type: application/json\" -H \"Authorization: Bearer $SECRET_TOKEN\" -d '{\n  \"title\": \"Transfer learning with transformers\",\n  \"description\": \"Using transformers for transfer learning on text classification tasks.\"\n}' $SERVICE_ENDPOINT/predict/\n\n# Rollback (to previous version of the Service)\nanyscale service rollback -f $SERVICE_CONFIG --name $SERVICE_NAME\n\n# Terminate\nanyscale service terminate --name $SERVICE_NAME\n```\n\n### CI/CD\n\nWe're not going to manually deploy our application every time we make a change. Instead, we'll automate this process using GitHub Actions!\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://madewithml.com/static/images/mlops/cicd/cicd.png\"\u003e\n\u003c/div\u003e\n\n1. We'll start by adding the necessary credentials to the [`/settings/secrets/actions`](https://github.com/GokuMohandas/Made-With-ML/settings/secrets/actions) page of our GitHub repository.\n\n``` bash\nexport ANYSCALE_HOST=https://console.anyscale.com\nexport ANYSCALE_CLI_TOKEN=$YOUR_CLI_TOKEN  # retrieved from https://console.anyscale.com/o/madewithml/credentials\n```\n\n2. Now we can make changes to our code (not on `main` branch) and push them to GitHub. But in order to push our code to GitHub, we'll need to first authenticate with our credentials before pushing to our repository:\n\n```bash\ngit config --global user.name \"Your Name\"  # \u003c-- CHANGE THIS to your name\ngit config --global user.email you@example.com  # \u003c-- CHANGE THIS to your email\ngit add .\ngit commit -m \"\"  # \u003c-- CHANGE THIS to your message\ngit push origin dev\n```\n\nNow you will be prompted to enter your username and password (personal access token). Follow these steps to get personal access token: [New GitHub personal access token](https://github.com/settings/tokens/new) → Add a name → Toggle `repo` and `workflow` → Click `Generate token` (scroll down) → Copy the token and paste it when prompted for your password.\n\n3. Now we can start a PR from this branch to our `main` branch and this will trigger the [workloads workflow](/.github/workflows/workloads.yaml). If the workflow (Anyscale Jobs) succeeds, this will produce comments with the training and evaluation results directly on the PR.\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://madewithml.com/static/images/mlops/cicd/comments.png\"\u003e\n\u003c/div\u003e\n\n4. If we like the results, we can merge the PR into the `main` branch. This will trigger the [serve workflow](/.github/workflows/serve.yaml) which will rollout our new service to production!\n\n### Continual learning\n\nWith our CI/CD workflow in place to deploy our application, we can now focus on continually improving our model. It becomes really easy to extend on this foundation to connect to scheduled runs (cron), [data pipelines](https://madewithml.com/courses/mlops/data-engineering/), drift detected through [monitoring](https://madewithml.com/courses/mlops/monitoring/), [online evaluation](https://madewithml.com/courses/mlops/evaluation/#online-evaluation), etc. And we can easily add additional context such as comparing any experiment with what's currently in production (directly in the PR even), etc.\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://madewithml.com/static/images/mlops/cicd/continual.png\"\u003e\n\u003c/div\u003e\n\n## FAQ\n\n### Jupyter notebook kernels\n\nIssues with configuring the notebooks with jupyter? By default, jupyter will use the kernel with our virtual environment but we can also manually add it to jupyter:\n```bash\npython3 -m ipykernel install --user --name=venv\n```\nNow we can open up a notebook → Kernel (top menu bar) → Change Kernel → `venv`. To ever delete this kernel, we can do the following:\n```bash\njupyter kernelspec list\njupyter kernelspec uninstall venv\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgokumohandas%2Fmlops-course","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgokumohandas%2Fmlops-course","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgokumohandas%2Fmlops-course/lists"}