{"id":18339947,"url":"https://github.com/mxagar/mlflow_guide","last_synced_at":"2025-08-20T17:12:28.247Z","repository":{"id":224547616,"uuid":"763544486","full_name":"mxagar/mlflow_guide","owner":"mxagar","description":"My personal notes on how to use MLflow, compiled after following courses \u0026 tutorials, and after making personal experiences.","archived":false,"fork":false,"pushed_at":"2024-03-14T14:07:08.000Z","size":1926,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-09T20:47:44.673Z","etag":null,"topics":["data-science","guide","machine-learning","mlflow","mlops","tracking","tutorial"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mxagar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-26T13:51:58.000Z","updated_at":"2024-11-19T16:29:56.000Z","dependencies_parsed_at":"2024-02-26T15:28:04.981Z","dependency_job_id":"22279db2-1ad5-486d-be87-d500b5177749","html_url":"https://github.com/mxagar/mlflow_guide","commit_stats":null,"previous_names":["mxagar/mlflow_guide"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mxagar/mlflow_guide","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mxagar%2Fmlflow_guide","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mxagar%2Fmlflow_guide/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mxagar%2Fmlflow_guide/releases","manifests_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mxagar%2Fmlflow_guide/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mxagar","download_url":"https://codeload.github.com/mxagar/mlflow_guide/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mxagar%2Fmlflow_guide/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263421504,"owners_count":23464013,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","guide","machine-learning","mlflow","mlops","tracking","tutorial"],"created_at":"2024-11-05T20:19:56.908Z","updated_at":"2025-07-03T23:34:08.014Z","avatar_url":"https://github.com/mxagar.png","language":null,"readme":"# MLflow\n\nThese are my personal notes on how to use MLflow, compiled after following courses and tutorials, as well as from personal experience.\n\n**The main course I followed to structure the guide is [MLflow in Action - Master the art of MLOps using MLflow tool](https://www.udemy.com/course/mlflow-course), created by J Garg and published on Udemy.**\n\nI also followed the official MLflow tutorials as well as other resources; in any case, these are all referenced.\n\nIn addition to the current repository, you might be interested in my notes on the Udacity ML DevOps Nanodegree, which briefly introduces MLflow, [mlops_udacity](https://github.com/mxagar/mlops_udacity):\n\n- [Reproducible Model Workflows](https://github.com/mxagar/mlops_udacity/blob/main/02_Reproducible_Pipelines/MLOpsND_ReproduciblePipelines.md). 
While the current guide focuses on tracking and model handling, the Udacity notes focus more on how the project pipelines can be built using MLflow. Among other things, sophisticated pipelines can be defined so that several components/modules are run one after the other, each storing artifacts used by the ones that come later.\n- [Deploying a Scalable ML Pipeline in Production](https://github.com/mxagar/mlops_udacity/blob/main/03_Deployment/MLOpsND_Deployment.md)\n\n## Overview\n\n- [MLflow](#mlflow)\n  - [Overview](#overview)\n  - [1. Introduction to MLOps](#1-introduction-to-mlops)\n  - [2. Introduction to MLflow](#2-introduction-to-mlflow)\n    - [Components](#components)\n    - [Setup](#setup)\n  - [3. MLflow Tracking Component](#3-mlflow-tracking-component)\n    - [Basic Tracking - 01\\_tracking](#basic-tracking---01_tracking)\n    - [MLflow UI - 01\\_tracking](#mlflow-ui---01_tracking)\n    - [Extra: MLflow Tracking Quickstart with Server, Model Registration and Loading](#extra-mlflow-tracking-quickstart-with-server-model-registration-and-loading)\n  - [4. MLflow Logging Functions](#4-mlflow-logging-functions)\n    - [Get and Set Tracking URI - 02\\_logging](#get-and-set-tracking-uri---02_logging)\n    - [Experiment: Creating and Setting - 02\\_logging](#experiment-creating-and-setting---02_logging)\n    - [Runs: Starting and Ending - 02\\_logging](#runs-starting-and-ending---02_logging)\n    - [Logging Parameters, Metrics, Artifacts and Tags](#logging-parameters-metrics-artifacts-and-tags)\n  - [5. Launch Multiple Experiments and Runs - 03\\_multiple\\_runs](#5-launch-multiple-experiments-and-runs---03_multiple_runs)\n  - [6. Autologging in MLflow - 04\\_autolog](#6-autologging-in-mlflow---04_autolog)\n  - [7. Tracking Server of MLflow](#7-tracking-server-of-mlflow)\n  - [8. 
MLflow Model Component](#8-mlflow-model-component)\n    - [Storage Format: How the Models are Packages and Saved](#storage-format-how-the-models-are-packages-and-saved)\n    - [Model Signatures - 05\\_signatures](#model-signatures---05_signatures)\n    - [Model API](#model-api)\n  - [9. Handling Customized Models in MLflow](#9-handling-customized-models-in-mlflow)\n    - [Example: Custom Python Model - 06\\_custom\\_libraries](#example-custom-python-model---06_custom_libraries)\n    - [Custom Flavors](#custom-flavors)\n  - [10. MLflow Model Evaluation](#10-mlflow-model-evaluation)\n    - [Example: Evaluation of a Python Model - 07\\_evaluation](#example-evaluation-of-a-python-model---07_evaluation)\n    - [Example: Custom Evaluation Metrics and Artifacts - 07\\_evaluation](#example-custom-evaluation-metrics-and-artifacts---07_evaluation)\n    - [Example: Evaluation against Baseline - 07\\_evaluation](#example-evaluation-against-baseline---07_evaluation)\n  - [11. MLflow Registry Component](#11-mlflow-registry-component)\n    - [Registering via UI](#registering-via-ui)\n    - [Registering via API - 08\\_registry](#registering-via-api---08_registry)\n  - [12. MLflow Project Component](#12-mlflow-project-component)\n    - [CLI Options and Environment Variables](#cli-options-and-environment-variables)\n    - [Example: Running a Project with the CLI - 09\\_projects](#example-running-a-project-with-the-cli---09_projects)\n    - [Example: Running a Project with the Python API - 09\\_projects](#example-running-a-project-with-the-python-api---09_projects)\n    - [More Advanced Project Setups](#more-advanced-project-setups)\n  - [13. MLflow Client](#13-mlflow-client)\n  - [14. MLflow CLI Commands](#14-mlflow-cli-commands)\n  - [15. 
AWS Integration with MLflow](#15-aws-integration-with-mlflow)\n    - [AWS Account Setup](#aws-account-setup)\n    - [Setup AWS CodeCommit, S3, and EC2](#setup-aws-codecommit-s3-and-ec2)\n    - [Code Respository and Development](#code-respository-and-development)\n      - [Data Preprocessing](#data-preprocessing)\n      - [Training](#training)\n      - [MLproject file and Running Locally](#mlproject-file-and-running-locally)\n    - [Setup AWS Sagemaker](#setup-aws-sagemaker)\n    - [Training on AWS Sagemaker](#training-on-aws-sagemaker)\n    - [Model Comparison and Evaluation](#model-comparison-and-evaluation)\n    - [Deployment on AWS Sagemaker](#deployment-on-aws-sagemaker)\n    - [Model Inference](#model-inference)\n    - [Clean Up](#clean-up)\n  - [Authorship](#authorship)\n  - [Interesting Links](#interesting-links)\n\nThe examples are located in [`examples/`](./examples/).\n\n## 1. Introduction to MLOps\n\nSee [Building a Reproducible Model Workflow](https://github.com/mxagar/mlops_udacity/blob/main/02_Reproducible_Pipelines/MLOpsND_ReproduciblePipelines.md).\n\n## 2. Introduction to MLflow\n\n[MLflow](https://mlflow.org/docs/latest/index.html) was created by Databricks in 2018 and is still maintained by them; as they describe it...\n\n\u003e MLflow is an open-source platform, purpose-built to assist machine learning practitioners and teams in handling the complexities of the machine learning process. 
\n\u003e MLflow focuses on the full lifecycle for machine learning projects, ensuring that each phase is manageable, traceable, and reproducible.\n\u003e MLflow provides a unified platform to navigate the intricate maze of model development, deployment, and management.\n\nMain MLflow components:\n\n- Tracking: track experiments and compare parameters and results/metrics.\n- Projects: package code to ensure reusability and reproducibility.\n- Model and model registry: packaging for deployment, storing, and reusing models.\n\nAdditional (newer) components:\n\n- MLflow Deployments for LLMs\n- Evaluate\n- Prompt Engineering UI\n- Recipes\n\nMLflow...\n\n- is language-agnostic: it follows a modular, API-first approach, can be used with any language, and requires only minor changes in our code.\n- is compatible: it can be used in combination with any ML library/framework (PyTorch, Keras/TF, ...).\n- supports integration tools: Docker containers, Spark, Kubernetes, etc.\n\n### Components\n\nAs mentioned, the main/foundational components are:\n\n- Tracking\n- Projects\n- Model\n- Model registry\n\nOther points:\n\n- Local and remote tracking servers can be set up.\n- There is a UI.\n- There is a CLI.\n- Packaged models support many framework-specific model *flavors*, and can be served in various forms, such as Docker containers and REST APIs.\n\n### Setup\n\nIn order to use MLflow, we need to set up a Python environment and install MLflow using the [`requirements.txt`](./requirements.txt) file; here is a quick recipe with the [conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) environment manager and [pip-tools](https://github.com/jazzband/pip-tools):\n\n```bash\n# Set proxy, if required\n\n# Create environment, e.g., with conda, to control Python version\nconda create -n mlflow python=3.10 pip\nconda activate mlflow\n\n# Install pip-tools\npython -m pip install -U pip-tools\n\n# Generate pinned requirements.txt\npip-compile requirements.in\n\n# Install pinned 
requirements, as always\npython -m pip install -r requirements.txt\n\n# If required, add new dependencies to requirements.in and sync\n# i.e., update environment\npip-compile requirements.in\npip-sync requirements.txt\npython -m pip install -r requirements.txt\n\n# To delete the conda environment, if required\nconda remove --name mlflow --all\n```\n\n## 3. MLflow Tracking Component\n\n### Basic Tracking - 01_tracking\n\nMLflow distinguishes:\n\n- Experiments: logical groups of runs\n- Runs: a run is a single code execution; an experiment can have several runs,\n  - each with a defined set of hyperparameters, which can be specific to the run and a specific code version,\n  - and where run metrics can be saved.\n\nIn the section example, a regularized linear regression is run using `ElasticNet` from `sklearn` (it combines L1 and L2 regularizations).\n\nSummary of [`01_tracking/basic_regression_mlflow.py`](./examples/01_tracking/basic_regression_mlflow.py):\n\n```python\n# Imports\nimport mlflow\nimport mlflow.sklearn\nfrom mlflow.models import infer_signature\n\n# ...\n# Fit model\n# It is recommended to fit and evaluate the model outside\n# of the `with` context in which the run is logged:\n# in case something goes wrong, no run is created\nlr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)\nlr.fit(train_x, train_y)\n# Predict and evaluate\npredicted_qualities = lr.predict(test_x)\n(rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)\n\n# Create experiment, if not existent, else set it\nexp = mlflow.set_experiment(experiment_name=\"experiment_1\")\n\n# Infer the model signature: model input \u0026 output schemas\nsignature = infer_signature(train_x, lr.predict(train_x))\n\n# Log the run inside a with context\nwith mlflow.start_run(experiment_id=exp.experiment_id):\n    # Log: parameters, metrics, model itself\n    mlflow.log_param(\"alpha\", alpha)\n    mlflow.log_param(\"l1_ratio\", l1_ratio)\n    mlflow.log_metric(\"rmse\", rmse)\n    
mlflow.log_metric(\"r2\", r2)\n    mlflow.log_metric(\"mae\", mae)\n    mlflow.sklearn.log_model(\n        sk_model=lr,\n        artifact_path=\"wine_model\", # dir name in the artifacts where the model is dumped\n        signature=signature,\n        input_example=train_x[:2],\n        # If registered_model_name is given, the model is registered!\n        #registered_model_name=f\"elasticnet-{alpha}-{l1_ratio}\",\n    )\n```\n\nWe can run the script as follows:\n\n```bash\nconda activate mlflow\ncd .../examples/01_tracking\n# Run 1\npython ./basic_regression_mlflow.py # default parameters: 0.7, 0.7\n# Run 2\npython ./basic_regression_mlflow.py --alpha 0.5 --l1_ratio 0.1\n# Run 3\npython ./basic_regression_mlflow.py --alpha 0.1 --l1_ratio 0.9\n```\n\nThen, a folder `mlruns` is created, which contains all the information of the experiments we create and the associated runs we execute.\n\nThis `mlruns` folder is very important; it contains the following:\n\n```\n.trash/             # deleted info of experiments, runs, etc.\n0/                  # default experiment, ignore it\n99xxx/              # our experiment, hashed id\n  meta.yaml         # experiment YAML: id, name, creation time, etc.\n  8c3xxx/           # a run; for each run we get a folder with an id\n    meta.yaml       # run YAML: id, name, experiment_id, time, etc.\n    artifacts/\n      mymodel/      # dumped model: PKL, MLmodel, conda.yaml, requirements.txt, etc.\n        ...\n    metrics/        # one ASCII file for each logged metric\n    params/         # one ASCII file for each logged param\n    tags/           # metadata tags, e.g.: run name, commit hash, filename, ...\n  6bdxxx/           # another run\n    ...\nmodels/             # model registry, if we have registered any model\n  \u003cmodel_name\u003e/\n    meta.yaml\n    version-x/\n      meta.yaml\n```\n\nNotes:\n\n- Even though here it's not obvious, MLflow works with a client-server architecture: the server is a tracking server, which can be 
remote, and we use a local client that can send/get the data to/from the server backend. The client is usually our program where we call the Python API.\n- We can specify where this `mlruns` folder is created; it can even be a remote folder.\n- If we log the artifacts using certain calls, the folder `mlartifacts` might be created, next to `mlruns`; then, the artifacts are stored in `mlartifacts` and they end up being referenced in the `mlruns` YAML with a URI of the form `mlflow-artifacts:/xxx`.\n- Usually, the `mlruns` and `mlartifacts` folders should be in a remote server; if local, we should add them to `.gitignore`.\n- **Note that `artifacts/` contains everything necessary to re-create the environment and load the trained model!**\n- **We have logged the model, but it's not registered unless the parameter `registered_model_name` is passed, i.e., there's no central model registry yet without the registering name!**\n- Usually, the UI is used to visualize the metrics; see below.\n\n### MLflow UI - 01_tracking\n\nThe results of executing different runs can be viewed in a web UI:\n\n```bash\nconda activate mlflow\n# Go to the folder where the experiment/runs are, e.g., we should see the mlruns/ folder\ncd .../examples/01_tracking\n# Serve web UI\nmlflow ui\n# Open http://127.0.0.1:5000 in browser\n# The experiments and runs saved in the local mlruns folder are loaded\n```\n\nThe UI has two main tabs: `Experiments` and `Models`.\n\nIn `Models`, we can see the registered models.\n\nIn `Experiments`, we can select our `experiment_1` and run information is shown:\n\n- We see each run has a (default) name assigned, if not given explicitly.\n- The creation timestamp appears.\n- We can add param/metric columns.\n- We can filter/sort with column values.\n- We can select Table/Chart/Evaluation views.\n- We can download the runs as a CSV.\n- We can select \u003e= 2 runs and click on `Compare`; different comparison plots are possible: \n  - Parallel plot\n  - Scatter plot\n  - Box 
plot\n  - Contour plot\n- We can click on each run and view its details:\n  - Parameters\n  - Metrics\n  - Artifacts: here we see the model and we can register it if we consider the run produced a good model.\n\n![MLflow Experiments: UI](./assets/mlflow_experiments_ui.jpg)\n\n![MLflow Experiments Plots: UI](./assets/mlflow_experiments_ui_plots.jpg)\n\n![MLflow Experiment Run: UI](./assets/mlflow_experiments_ui_run.jpg)\n\n### Extra: MLflow Tracking Quickstart with Server, Model Registration and Loading\n\nSource: [MLflow Tracking Quickstart](https://mlflow.org/docs/latest/getting-started/intro-quickstart/index.html)\n\nIn addition to the example above, this other (official) example is also interesting: the Iris dataset is used to fit a logistic regression. These new points are shown:\n\n- A dedicated server is started with `mlflow server`; beforehand, we did not explicitly start a server, i.e., the library operated without any server. We can start a server to, e.g., have a local/remote server instance. In the following example, a local server is started. In those cases, we need to explicitly use the server URI in the code. 
Additionally, since we now have a server, we don't run `mlflow ui`, but we simply open the server URI.\n- MLflow tracking/logging is done using the server URI.\n- The model is loaded using `mlflow.pyfunc.load_model()` and used to generate some predictions.\n\nA server is created as follows:\n\n```bash\nmlflow server --host 127.0.0.1 --port 8080\n# URI: http://127.0.0.1:8080, http://localhost:8080\n# To open the UI go to that URI with the browser\n```\n\nEven though, for the user, starting or not starting the server seems to have minimal effect on the operations (only the URI needs to be set), the underlying architecture is different:\n\n- When no server is launched, `mlflow` is used as a library which creates/stores some files.\n- When a server is launched, the `mlflow` library communicates with a server (REST) which creates/stores some files.\n\nFor more information on the **tracking server**, see the section [7. Tracking Server of MLflow](#7-tracking-server-of-mlflow).\n\nExample code:\n\n1. ML training and evaluation\n2. MLflow tracking with model registration\n3. MLflow model loading and using\n\n```python\nimport mlflow\nfrom mlflow.models import infer_signature\n\nimport pandas as pd\nfrom sklearn import datasets\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.linear_model import LogisticRegression\nfrom sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score\n\n### -- 1. 
ML Training and evaluation\n\n# Load the Iris dataset\nX, y = datasets.load_iris(return_X_y=True)\n\n# Split the data into training and test sets\nX_train, X_test, y_train, y_test = train_test_split(\n    X, y, test_size=0.2, random_state=42\n)\n\n# Define the model hyperparameters\nparams = {\n    \"solver\": \"lbfgs\",\n    \"max_iter\": 1000,\n    \"multi_class\": \"auto\",\n    \"random_state\": 8888,\n}\n\n# Train the model\nlr = LogisticRegression(**params)\nlr.fit(X_train, y_train)\n\n# Predict on the test set\ny_pred = lr.predict(X_test)\n\n# Calculate metrics\naccuracy = accuracy_score(y_test, y_pred)\n\n### -- 2. MLflow tracking with model registration\n\n# Set our tracking server uri for logging\nmlflow.set_tracking_uri(uri=\"http://127.0.0.1:8080\")\n\n# Create a new MLflow Experiment\nmlflow.set_experiment(\"MLflow Quickstart\")\n\n# Start an MLflow run\nwith mlflow.start_run():\n    # Log the hyperparameters\n    mlflow.log_params(params)\n\n    # Log the loss metric\n    mlflow.log_metric(\"accuracy\", accuracy)\n\n    # Set a tag that we can use to remind ourselves what this run was for\n    mlflow.set_tag(\"Training Info\", \"Basic LR model for iris data\")\n\n    # Infer the model signature: model input and output schemas\n    signature = infer_signature(X_train, lr.predict(X_train))\n\n    # Log the model\n    model_info = mlflow.sklearn.log_model(\n        sk_model=lr,\n        artifact_path=\"iris_model\",\n        signature=signature,\n        input_example=X_train,\n        registered_model_name=\"tracking-quickstart\",\n    )\n\n### -- 3. 
MLflow model loading and using\n\n# Load the model back for predictions as a generic Python Function model\nloaded_model = mlflow.pyfunc.load_model(model_info.model_uri)\n\npredictions = loaded_model.predict(X_test)\n\niris_feature_names = datasets.load_iris().feature_names\n\nresult = pd.DataFrame(X_test, columns=iris_feature_names)\nresult[\"actual_class\"] = y_test\nresult[\"predicted_class\"] = predictions\n\nprint(result[:4])\n```\n\n## 4. MLflow Logging Functions\n\nIn this section `mlflow.log_*` functions are explained in detail.\n\n### Get and Set Tracking URI - 02_logging\n\nWe can use MLflow tracking in different ways:\n\n- If we simply write Python code, `mlruns` is created locally and all information is stored there. Then, we start `mlflow ui` in the terminal, in the folder which contains `mlruns`, to visualize the results.\n- We can also run `mlflow server --host \u003cHOST\u003e --port \u003cPORT\u003e`; in that case, in our code we need to call `mlflow.set_tracking_uri(\"http://\u003cHOST\u003e:\u003cPORT\u003e\")` to connect to the tracking server, and to open the UI we browse to `http://\u003cHOST\u003e:\u003cPORT\u003e`.\n- Additionally, we can use `set_tracking_uri()` to define in the code where the data is/should be stored. 
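For illustration, here is a minimal stdlib sketch of building a portable `file://` URI of the kind that can be passed as a tracking location (the folder name `my_tracking_store` is hypothetical):

```python
from pathlib import Path

# Build an absolute file:// URI for a local folder (name is illustrative);
# as_uri() yields something like file:///home/user/project/my_tracking_store
tracking_uri = Path.cwd().joinpath("my_tracking_store").as_uri()
print(tracking_uri)
```

Such `file://` URIs are one of the accepted value forms listed below; using an absolute URI avoids ambiguity about where the tracking data lands.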
Similarly, `get_tracking_uri()` retrieves the location.\n\nPossible parameter values for `set_tracking_uri()`:\n\n    empty string: data saved automatically in ./mlruns\n    local folder name: \"./my_folder\"\n    file path: \"file:/Users/username/path/to/file\" (no C:)\n    URL:\n      (local) \"http://localhost:5000\"\n      (remote) \"https://my-tracking-server:5000\"\n    databricks workspace: \"databricks://\u003cprofileName\u003e\"\n\nThe file [`02_logging/uri.py`](./examples/02_logging/uri.py) is the same as [`01_tracking/basic_regression_mlflow.py`](./examples/01_tracking/basic_regression_mlflow.py), but with these new lines:\n\n```python\n# We set the empty URI\nmlflow.set_tracking_uri(uri=\"\")\n# We get the URI\nprint(\"The set tracking uri is \", mlflow.get_tracking_uri()) # \"\"\n# Create experiment, if not existent, else set it\nexp = mlflow.set_experiment(experiment_name=\"experiment_1\")\n```\n\nIf we change:\n\n- `uri=\"my_tracking\"`\n- `experiment_name=\"experiment_2\"`\n\nThen, we're going to get a new folder `my_tracking` beside the usual `mlruns`.\n\nWe can run the script as follows:\n\n```bash\nconda activate mlflow\ncd .../examples/02_logging\npython ./uri.py\n\n# To start the UI pointing to that tracking folder\nmlflow ui --backend-store-uri 'my_tracking'\n```\n\n### Experiment: Creating and Setting - 02_logging\n\nOriginal MLflow documentation:\n\n- [Creating Experiments](https://mlflow.org/docs/latest/getting-started/logging-first-model/step3-create-experiment.html)\n- [`mlflow.create_experiment()`](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.create_experiment)\n- [`mlflow.set_experiment()`](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_experiment)\n\nThe file [`02_logging/experiment.py`](./examples/02_logging/experiment.py) is the same as [`01_tracking/basic_regression_mlflow.py`](./examples/01_tracking/basic_regression_mlflow.py), but with these new lines:\n\n```python\nfrom pathlib import Path\n\n# 
...\n\n# Create new experiment\n# - name: unique name\n# - artifact_location: location to store run artifacts, default: artifacts\n# - tags: optional dictionary of string keys and values to set tags\n# Return: id\nexp_id = mlflow.create_experiment(\n    name=\"exp_create_exp_artifact\",\n    tags={\"version\": \"v1\", \"priority\": \"p1\"},\n    artifact_location=Path.cwd().joinpath(\"myartifacts\").as_uri() # must be a URI: file://...\n)\n\nexp = mlflow.get_experiment(exp_id)\nprint(\"Name: {}\".format(exp.name)) # exp_create_exp_artifact\nprint(\"Experiment_id: {}\".format(exp.experiment_id)) # 473668474626843335\nprint(\"Artifact Location: {}\".format(exp.artifact_location)) # file:///C:/Users/.../mlflow_guide/examples/02_logging/myartifacts\nprint(\"Tags: {}\".format(exp.tags)) # {'priority': 'p1', 'version': 'v1'}\nprint(\"Lifecycle_stage: {}\".format(exp.lifecycle_stage)) # active\nprint(\"Creation timestamp: {}\".format(exp.creation_time)) # 1709202652141\n\n# Set an existing experiment; if not existent, it is created\n# - experiment_name\n# - experiment_id\n# Return: experiment object itself, not the id as in create_experiment!\nexp = mlflow.set_experiment(\n    experiment_name=\"exp_create_exp_artifact\"\n)\n```\n\n### Runs: Starting and Ending - 02_logging\n\nWe can also start runs outside of `with` contexts.\n\nOriginal documentation links:\n\n- [`mlflow.start_run()`](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.start_run)\n- [`mlflow.end_run()`](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.end_run)\n- [`mlflow.active_run()`](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.active_run)\n- [`mlflow.last_active_run()`](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.last_active_run)\n\nThe file [`02_logging/run.py`](./examples/02_logging/run.py) is the same as [`01_tracking/basic_regression_mlflow.py`](./examples/01_tracking/basic_regression_mlflow.py), but with these new lines:\n\n```python\n# Start a run\n# - run_id: optional; 
we can set it to overwrite runs, for instance\n# - experiment_id: optional\n# - run_name: optional, if run_id not specified\n# - nested: to create a run within a run, set it to True\n# - tags\n# - description\n# Returns: mlflow.ActiveRun context manager that can be used in `with` block\nactive_run = mlflow.start_run()\n\n# If we don't call start_run() inside a `with` block, we need to end it explicitly\n# - status = \"FINISHED\" (default), \"SCHEDULED\", \"FAILED\", \"KILLED\"\nmlflow.end_run()\n\n# Get current active run\n# Returns ActiveRun context manager\nactive_run = mlflow.active_run()\n\n# Get the last run which was active, called after end_run()\n# Returns Run object\nmlflow.end_run()\nrun = mlflow.last_active_run()\nprint(\"Active run id is {}\".format(run.info.run_id)) # 02ae930f5f2348c6bc3b411bb7de297a\nprint(\"Active run name is {}\".format(run.info.run_name)) # traveling-tern-43\n```\n\n### Logging Parameters, Metrics, Artifacts and Tags\n\nWe have several options to log parameters, metrics and artifacts, as shown below.\n\nThe file [`02_logging/artifact.py`](./examples/02_logging/artifact.py) is similar to [`01_tracking/basic_regression_mlflow.py`](./examples/01_tracking/basic_regression_mlflow.py); these are the relevant calls:\n\n```python\n# ...\n\n## -- Parameters\n\n# Hyperparameters passed as key-value pairs\nmlflow.log_param(key: str, value: Any) # single hyperparam -\u003e Returns logged param!\nmlflow.log_params(params: Dict[str, Any]) # multiple hyperparams -\u003e No return\nmlflow.log_params(params={\"alpha\": alpha, \"l1_ratio\": l1_ratio})\n\n## -- Metrics\n\n# Metrics passed as key-value pairs: RMSE, etc.\nmlflow.log_metric(key: str, value: float, step: Optional[int] = None) # single -\u003e No return\nmlflow.log_metrics(metrics: Dict[str, float], step: Optional[int] = None) # multiple -\u003e No return\nmlflow.log_metrics(metrics={\"mae\": mae, \"r2\": r2})\n\n## -- Artifacts\n\n# Log artifacts: datasets, etc.\n# We can also log models, but it's better to use 
mlflow.\u003cframework\u003e.log_model for that\n# We pass the local_path where the artifact is\n# and it will be stored in the mlruns folder, in the default path for the artifacts,\n# unless we specify a different artifact_path\nmlflow.log_artifact(local_path: str, artifact_path: Optional[str] = None) # single -\u003e No return\n# The multiple case takes a directory, and all the files within it are stored\n# Use-cases: Computer Vision, folder with images; train/test splits\nmlflow.log_artifacts(local_dir: str, artifact_path: Optional[str] = None) # multiple\n\n# Example\n# local dataset: original \u0026 train/test split\ndata = pd.read_csv(\"../data/red-wine-quality.csv\")\nPath(\"./data/\").mkdir(parents=True, exist_ok=True)\ndata.to_csv(\"data/red-wine-quality.csv\", index=False)\ntrain, test = train_test_split(data)\ntrain.to_csv(\"data/train.csv\")\ntest.to_csv(\"data/test.csv\")\nmlflow.log_artifacts(\"data/\")\n\n# Get the absolute URI of an artifact\n# If we input the artifact_path, the URI of the specific artifact is returned,\n# else, the URI of the current artifact directory is returned.\nartifacts_uri = mlflow.get_artifact_uri(artifact_path: Optional[str] = None)\n\nartifacts_uri = mlflow.get_artifact_uri() # file://.../exp_xxx/run_yyy/artifacts\nartifacts_uri = mlflow.get_artifact_uri(artifact_path=\"data/train.csv\") # file://.../exp_xxx/run_yyy/artifacts/data/train.csv\n\n## -- Tags\n# Tags are used to group runs; MLflow also creates some internal tags automatically\n# Tags are assigned to a run, so they can be set between start \u0026 end\nmlflow.set_tag(key: str, value: Any) # single -\u003e No return\nmlflow.set_tags(tags: Dict[str, Any]) # multiple -\u003e No return\nmlflow.set_tags(tags={\"version\": \"1.0\", \"environment\": \"dev\"})\n```\n\n## 5. 
Launch Multiple Experiments and Runs - 03_multiple_runs\n\nIn some cases we want to do several runs in the same training session:\n\n- When we perform *incremental training*, i.e., we train until a given point and then we decide to continue doing it.\n- If we are saving *model checkpoints*.\n- *Hyperparameter tuning*: one run for each parameter set.\n- *Cross-validation*: one run for each fold.\n- *Feature engineering*: one run for each set of transformations.\n- ...\n\nSimilarly, we can launch several experiments in one process; this makes sense when we are trying different models.\n\nIn order to run several experiments/runs one after the other, we just choose each name manually; nothing more needs to be done.\n\n```python\n# -- Experiment 1\nexp = mlflow.set_experiment(experiment_name=\"experiment_1\")\n# Run 1\nmlflow.start_run(run_name=\"run_1.1\")\n# ... do anything\nmlflow.end_run()\n# Run 2\nmlflow.start_run(run_name=\"run_1.2\")\n# ... do anything\nmlflow.end_run()\n\n# -- Experiment 2\nexp = mlflow.set_experiment(experiment_name=\"experiment_2\")\n# Run 1\nmlflow.start_run(run_name=\"run_1.1\")\n# ... do anything\nmlflow.end_run()\n```\n\nExamples in [`03_multiple_runs/multiple_runs.py`](./examples/03_multiple_runs/multiple_runs.py).\n\nNote that if we launch several runs and experiments, it makes sense to launch them in parallel!\n\n## 6. Autologging in MLflow - 04_autolog\n\nMLflow allows automatically logging parameters and metrics, without the need to specify them explicitly. 
We just need to place `mlflow.autolog()` before the model definition and training; then, all the model parameters and metrics are logged.\n\nIf we activate the autologging but would still like to log certain things manually (e.g., models), we need to deactivate autologging for those things in the `mlflow.autolog()` call.\n\n```python\n# Generic autolog: the model library is detected and its logs are carried out\nmlflow.autolog(log_models: bool = True, # log model or not\n               log_input_examples: bool = False, # log input examples or not\n               log_model_signatures: bool = True, # signatures: schema of inputs and outputs\n               log_datasets: bool = False,\n               disable: bool = False, # disable all automatic logging\n               exclusive: bool = False, # if True, autologged content not logged to user-created runs\n               disable_for_unsupported_versions: bool = False, # disable for untested library versions\n               silent: bool = False) # suppress all event logs and warnings\n\n# Library-specific, i.e., we explicitly specify the library:\n# sklearn, keras, xgboost, pytorch, spark, gluon, statsmodels, ...\n# Same parameters as mlflow.autolog() + 5 additional\nmlflow.\u003cframework\u003e.autolog(...,\n                           max_tuning_runs: int = 5, # max num of child MLflow runs for hyperparam search\n                           log_post_training_metrics: bool = True, # metrics depend on model type\n                           serialization_format: str = 'cloudpickle', # each library has its own set\n                           registered_model_name: Optional[str] = None, # to register the model under this name\n                           pos_label: Optional[str] = None) # positive class in binary classification\nmlflow.sklearn.autolog(...)\n\n# Now we define and train the model\n# ...\n```\n\n## 7. 
Tracking Server of MLflow\n\nInstead of storing everything locally on `./mlruns`, we can launch a **tracking server** hosted locally or remotely, as explained in the section [Extra; MLflow Tracking Quickstart with Server Model Registration and Loading](#extra-mlflow-tracking-quickstart-with-server-model-registration-and-loading). Then, the experiments are run on the *client*, which sends the information to the *server*.\n\nThe tracking server has 2 components:\n\n- Storage: We have two types, and both can be local/remote:\n  - **Backend store**: metadata, parameters, metrics, etc. We have two types:\n    - DB Stores: SQLite, MySQL, PostgreSQL, MsSql\n    - File Stores: local, Amazon S3, etc.\n  - **Artifact store**: artifacts, models, images, etc. It can also be local or remote!\n- Networking (communication): we establish communication between the client and the server. We have three types of communication:\n  - **REST API (HTTP)**\n  - RPC (gRPC)\n  - Proxy access: restricted access depending on user/role\n\nFor small projects, we can have everything locally; however, as the projects get larger, we should have remote/distributed architectures.\n\nWe can consider several scenarios:\n\n1. MLflow locally:\n  - client: local machine where experiments run\n  - localhost:5000, but no separate server, i.e., no `mlflow server` launched\n  - artifact store in `./mlruns` or a specified folder\n  - backend store in `./mlruns` or a specified folder\n2. MLflow locally with SQLite:\n  - client: local machine where experiments run\n  - localhost:5000, but no separate server, i.e., no `mlflow server` launched\n  - artifact store in `./mlruns` or a specified folder\n  - **backend store in SQLite or similar DB, hosted locally**\n3. 
MLflow locally with Tracking Server\n  - client: local machine where experiments run; **client connects via REST to server**\n  - **localhost:5000, with separate server, i.e., launched via `mlflow server`**\n  - artifact store in `./mlruns` or a specified folder\n  - backend store in `./mlruns` or a specified folder\n4. Remote and Distributed: MLflow with remote Tracking Server and cloud/remote storages\n  - client: local machine where experiments run; **client connects via REST to server**\n  - **remotehost:port, remote server launched via `mlflow server` with ports exposed**\n  - **artifact store in an Amazon S3 bucket**\n  - **backend store in a PostgreSQL DB hosted on another machine/node**\n\nSee all the parameters of the CLI command [`mlflow server`](https://mlflow.org/docs/latest/cli.html#mlflow-server). Here are some example calls:\n\n```bash\n# Scenario 3: MLflow locally with Tracking Server\n# --backend-store-uri: We specify our backend store, here a SQLite DB\n# --default-artifact-root: Directory where artifacts are stored, by default mlruns, here ./mlflow-artifacts \n# --host, --port: Where the server is running, and the port; here localhost:5000\nmlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./mlflow-artifacts --host 127.0.0.1 --port 5000\n# Then, we can browse http://127.0.0.1:5000\n# In the experiments, the tracking URI is http://127.0.0.1:5000\n\n# Scenario 4: Remote and Distributed: MLflow with remote Tracking Server and cloud/remote storages\nmlflow server --backend-store-uri postgresql://user:password@postgres:5432/mlflowdb --default-artifact-root s3://bucket_name --host remote_host --no-serve-artifacts\n```\n\n## 8. 
MLflow Model Component\n\nThe MLflow Model Component allows packaging models for deployment (similar to ONNX):\n\n- Standard formats are used, along with dependencies.\n- Reproducibility and reusability are enabled, by tracking lineage.\n- Flexibility is allowed, by enabling real-time/online and batch inference.\n\nAdditionally, we have \n\n- a central repository\n- and an API.\n\nThe Model Component consists of\n\n- a **storage format**:\n  - how the models are packaged and saved\n  - all the contents in the package: metadata, version, hyperparameters, etc.\n  - format itself: a directory, a single file, a Docker image, etc.\n- a **signature**:\n  - input and output types and shapes\n  - used by the API\n- the **API**:\n  - REST standardized interface\n  - sync / async\n  - online and batch inference\n  - usable in various environments\n- a [**flavor**](https://mlflow.org/docs/latest/models.html#built-in-model-flavors):\n  - the serialization and storing method\n  - each framework has its own methods\n\n### Storage Format: How the Models are Packaged and Saved\n\nIf we save a model locally using `mlflow.log_model()`, we'll get a local folder in the run `artifacts` with the following files:\n\n```bash\nconda.yaml            # conda environment\ninput_example.json    # few rows of the dataset that serve as input example\nMLmodel               # YAML, most important file: model packaging described and referenced here\nmodel.pkl             # serialized model binary\npython_env.yaml       # virtualenv \nrequirements.txt      # dependencies for virtualenv\n```\n\nThose files ensure that the model and its environment are saved in a reproducible manner; we could set up a new environment with the same characteristics and start using the PKL.\n\nThe file `input_example.json` contains 2 rows of the input dataset:\n\n```json\n{\n  \"columns\": [\"Unnamed: 0\", \"fixed acidity\", \"volatile acidity\", \"citric acid\", \"residual sugar\", \"chlorides\", \"free sulfur 
dioxide\", \"total sulfur dioxide\", \"density\", \"pH\", \"sulphates\", \"alcohol\"],\n  \"data\": [[1316, 5.4, 0.74, 0.0, 1.2, 0.041, 16.0, 46.0, 0.99258, 4.01, 0.59, 12.5], [1507, 7.5, 0.38, 0.57, 2.3, 0.106, 5.0, 12.0, 0.99605, 3.36, 0.55, 11.4]]\n}\n```\n\n`MLmodel` is the most important file: it describes the model for MLflow; everything is defined or referenced here, which enables reproducing the model inference anywhere:\n\n```yaml\nartifact_path: wine_model\nflavors:\n  python_function:\n    env:\n      conda: conda.yaml\n      virtualenv: python_env.yaml\n    loader_module: mlflow.sklearn\n    model_path: model.pkl\n    predict_fn: predict\n    python_version: 3.10.13\n  sklearn:\n    code: null\n    pickled_model: model.pkl\n    serialization_format: cloudpickle\n    sklearn_version: 1.4.1.post1\nmlflow_version: 2.10.2\nmodel_size_bytes: 1263\nmodel_uuid: 14a531b7b86a422bbcedf78e4c23821e\nrun_id: 22e80d6e88a94973893abf8c862ae6ca\nsaved_input_example_info:\n  artifact_path: input_example.json\n  pandas_orient: split\n  type: dataframe\nsignature:\n  inputs: '[{\"type\": \"long\", \"name\": \"Unnamed: 0\", \"required\": true}, {\"type\": \"double\",\n    \"name\": \"fixed acidity\", \"required\": true}, {\"type\": \"double\", \"name\": \"volatile\n    acidity\", \"required\": true}, {\"type\": \"double\", \"name\": \"citric acid\", \"required\":\n    true}, {\"type\": \"double\", \"name\": \"residual sugar\", \"required\": true}, {\"type\":\n    \"double\", \"name\": \"chlorides\", \"required\": true}, {\"type\": \"double\", \"name\": \"free\n    sulfur dioxide\", \"required\": true}, {\"type\": \"double\", \"name\": \"total sulfur dioxide\",\n    \"required\": true}, {\"type\": \"double\", \"name\": \"density\", \"required\": true}, {\"type\":\n    \"double\", \"name\": \"pH\", \"required\": true}, {\"type\": \"double\", \"name\": \"sulphates\",\n    \"required\": true}, {\"type\": \"double\", \"name\": \"alcohol\", \"required\": true}]'\n  
outputs: '[{\"type\": \"tensor\", \"tensor-spec\": {\"dtype\": \"float64\", \"shape\": [-1]}}]'\n  params: null\nutc_time_created: '2024-02-27 17:14:24.719815'\n```\n\n### Model Signatures - 05_signatures\n\nThe model signature describes the data input and output types, i.e., the schema.\n\nThe types can be many, as described in [`mlflow.types.DataType`](https://mlflow.org/docs/latest/python_api/mlflow.types.html#mlflow.types.DataType). Among them, we also have `tensors`; these often\n\n- appear when deep learning models are used,\n- have one shape dimension set to `-1`, representing the batch size, which can have arbitrary values.\n\nIf the signature is saved, we can **enforce the signature**, which consists of validating the schema of the input data against the signature. This is somewhat similar to using Pydantic. There are several levels of signature enforcement:\n\n- Signature enforcement: type and name\n- Name-ordering: only name order checked and fixed if necessary\n- Input-type: types are checked and cast if necessary\n\nAs shown in the files [`05_signatures/manual_signature.py`](./examples/05_signatures/manual_signature.py) and [`05_signatures/infer_signature.py`](./examples/05_signatures/infer_signature.py), signatures can be defined manually or inferred automatically (preferred, recommended):\n\n```python\nfrom mlflow.models.signature import ModelSignature, infer_signature\nfrom mlflow.types.schema import Schema, ColSpec\n\n# ...\n\n## -- Manually defined signatures (usually not recommended)\ninput_data = [\n    {\"name\": \"fixed acidity\", \"type\": \"double\"},\n    {\"name\": \"volatile acidity\", \"type\": \"double\"},\n    {\"name\": \"citric acid\", \"type\": \"double\"},\n    {\"name\": \"residual sugar\", \"type\": \"double\"},\n    {\"name\": \"chlorides\", \"type\": \"double\"},\n    {\"name\": \"free sulfur dioxide\", \"type\": \"double\"},\n    {\"name\": \"total sulfur dioxide\", \"type\": \"double\"},\n    {\"name\": \"density\", \"type\": 
\"double\"},\n    {\"name\": \"pH\", \"type\": \"double\"},\n    {\"name\": \"sulphates\", \"type\": \"double\"},\n    {\"name\": \"alcohol\", \"type\": \"double\"},\n    {\"name\": \"quality\", \"type\": \"double\"}\n]\n\noutput_data = [{'type': 'long'}]\n\ninput_schema = Schema([ColSpec(col[\"type\"], col['name']) for col in input_data])\noutput_schema = Schema([ColSpec(col['type']) for col in output_data])\nsignature = ModelSignature(inputs=input_schema, outputs=output_schema)\n\ninput_example = {\n    \"fixed acidity\": np.array([7.2, 7.5, 7.0, 6.8, 6.9]),\n    \"volatile acidity\": np.array([0.35, 0.3, 0.28, 0.38, 0.25]),\n    \"citric acid\": np.array([0.45, 0.5, 0.55, 0.4, 0.42]),\n    \"residual sugar\": np.array([8.5, 9.0, 8.2, 7.8, 8.1]),\n    \"chlorides\": np.array([0.045, 0.04, 0.035, 0.05, 0.042]),\n    \"free sulfur dioxide\": np.array([30, 35, 40, 28, 32]),\n    \"total sulfur dioxide\": np.array([120, 125, 130, 115, 110]),\n    \"density\": np.array([0.997, 0.996, 0.995, 0.998, 0.994]),\n    \"pH\": np.array([3.2, 3.1, 3.0, 3.3, 3.2]),\n    \"sulphates\": np.array([0.65, 0.7, 0.68, 0.72, 0.62]),\n    \"alcohol\": np.array([9.2, 9.5, 9.0, 9.8, 9.4]),\n    \"quality\": np.array([6, 7, 6, 8, 7])\n}\n\nmlflow.sklearn.log_model(lr, \"model\", signature=signature, input_example=input_example)\n\n## -- Automatically inferred signatures (preferred, recommended)\nsignature = infer_signature(X_test, predicted_qualities)\n\ninput_example = {\n    \"columns\": np.array(X_test.columns),\n    \"data\": np.array(X_test.values)\n}\n\nmlflow.sklearn.log_model(lr, \"model\", signature=signature, input_example=input_example)\n```\n\n### Model API\n\nThese are the library calls to store standardized models or interact with them:\n\n```python\n# Model saved to a passed directory: only two flavors: sklearn and pyfunc\nmlflow.save_model(\n  sk_model, # model\n  path, \n  conda_env, # path to a YAML or a dictionary\n  code_paths, # list of local filesystem paths, i.e. 
code files used to create the model,\n  mlflow_model, # flavor\n  serialization_format,\n  signature,\n  input_example,\n  pip_requirements, # path or list of requirements as strings; not necessary, these are inferred\n  extra_pip_requirements, # we can leave MLflow to infer and then add some explicitly\n  pyfunc_predict_fn, # name of the prediction function, e.g., 'predict_proba'\n  metadata\n)\n\n# Model logged to a local/remote server, which stores it as configured\n# The main difference is that the server handles it in the model artifacts (locally or remotely)\n# whereas save_model always stores the model locally.\n# Same parameters as save_model, but some new/different ones\nmlflow.log_model(\n  artifact_path, # path for the artifact\n  registered_model_name, # register the model\n  await_registration_for # wait seconds until saving ready version\n)\n\n# Load the logged/saved model\n# If the model is registered (see model registry section), we can use the URI models:/\u003cname\u003e/\u003cversion\u003e\nmlflow.load_model(\n  model_uri, # the model URI: /path/to/model, s3://bucket/path/to/model, models:/\u003cname\u003e/\u003cversion\u003e, etc.\n  dst_path # path to download the model to\n)\n```\n\n## 9. 
Handling Customized Models in MLflow\n\nCustom models and custom flavors address the use cases in which:\n\n- The ML library/framework is not supported by MLflow.\n- We need more than the library to use our model, i.e., we have a custom Python model (with our own algorithms and libraries).\n\nNote that:\n\n- Custom models refer to our own model libraries.\n- Custom flavors refer to our own model serialization methods.\n\n### Example: Custom Python Model - 06_custom_libraries\n\nThis section works on the example files [`06_custom_libraries/model_customization.py`](./examples/06_custom_libraries/model_customization.py) and [`06_custom_libraries/load_custom_model.py`](./examples/06_custom_libraries/load_custom_model.py).\n\nWe assume that MLflow does not support Scikit-Learn, so we are going to create a Python model with it. Notes:\n\n- We cannot use the `mlflow.sklearn.log_param/metric()` functions, but instead the generic `mlflow.log_param/metric()`.\n- We cannot use `mlflow.sklearn.log_model()`, but instead `mlflow.pyfunc.log_model()`.\n\nThe way we create a custom Python model is as follows:\n\n- We dump/store all artifacts locally: dataset splits, models, etc.\n- We save their paths in a dictionary called `artifacts`.\n- We derive and create our own model class, based on `mlflow.pyfunc.PythonModel`.\n- We create a dictionary which contains our conda environment.\n- We log the model passing the last 3 objects to `mlflow.pyfunc.log_model`.\n- Then, (usually in another file/run), we can load the saved model using `mlflow.pyfunc.load_model`.\n\nThese are the key parts in [`06_custom_libraries/model_customization.py`](./examples/06_custom_libraries/model_customization.py) and [`06_custom_libraries/load_custom_model.py`](./examples/06_custom_libraries/load_custom_model.py):\n\n```python\n# ...\n\n# Data artifacts\ndata = pd.read_csv(\"../data/red-wine-quality.csv\")\ntrain, test = train_test_split(data)\ndata_dir = 'data'\nPath(data_dir).mkdir(parents=True, exist_ok=True)\ndata.to_csv(data_dir + 
'/dataset.csv')\ntrain.to_csv(data_dir + '/dataset_train.csv')\ntest.to_csv(data_dir + '/dataset_test.csv')\n\n# Model artifact: we serialize the model with joblib\nmodel_dir = 'models'\nPath(model_dir).mkdir(parents=True, exist_ok=True)\nmodel_path = model_dir + \"/model.pkl\"\njoblib.dump(lr, model_path)\n\n# Artifacts' paths: model and data\n# This dictionary is fetched later by the mlflow context\nartifacts = {\n    \"model\": model_path,\n    \"data\": data_dir\n}\n\n# We create a wrapper class, i.e.,\n# a custom mlflow.pyfunc.PythonModel\n#   https://mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#mlflow.pyfunc.PythonModel\n# We need to define at least two functions:\n# - load_context\n# - predict\n# We can also define further custom functions if we want\nclass ModelWrapper(mlflow.pyfunc.PythonModel):\n    def load_context(self, context):\n        self.model = joblib.load(context.artifacts[\"model\"])\n\n    def predict(self, context, model_input):\n        return self.model.predict(model_input.values)\n\n# Conda environment\nconda_env = {\n    \"channels\": [\"conda-forge\"],\n    \"dependencies\": [\n        f\"python={sys.version.split()[0]}\", # Python version number only\n        \"pip\",\n        {\n            \"pip\": [\n                f\"mlflow=={mlflow.__version__}\",\n                f\"scikit-learn=={sklearn.__version__}\",\n                f\"cloudpickle=={cloudpickle.__version__}\",\n            ],\n        },\n    ],\n    \"name\": \"my_env\",\n}\n\n# Log model with all the structures defined above\n# We'll see all the artifacts in the UI: data, models, code, etc.\nmlflow.pyfunc.log_model(\n    artifact_path=\"custom_mlflow_pyfunc\", # the path directory which will contain the model\n    python_model=ModelWrapper(), # a mlflow.pyfunc.PythonModel, defined above\n    artifacts=artifacts, # dictionary defined above\n    code_path=[str(__file__)], # Code file(s), must be in local dir: \"model_customization.py\"\n    conda_env=conda_env\n)\n\n# Usually, we would 
load the model in another file/session, not in the same run,\n# however, here we do it in the same run.\n# To load the model, we need to pass the model_uri, which can have many forms\n#   https://mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#mlflow.pyfunc.load_model\n# One option:\n#   runs:/\u003cmlflow_run_id\u003e/run-relative/path/to/model, e.g., runs:/98dgxxx/custom_mlflow_pyfunc\n# Usually, we'll get the run_id we want from the UI/DB, etc.; if it's the active run, we can fetch it\nactive_run = mlflow.active_run()\nrun_id = active_run.info.run_id\nloaded_model = mlflow.pyfunc.load_model(model_uri=f\"runs:/{run_id}/{model_artifact_path}\")\n\n# Predict\npredicted_qualities = loaded_model.predict(test_x)\n\n# Evaluate\n(rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)\n```\n\nWhen we run the script and visualize the run in the UI, we can see the following artifacts:\n\n![Artifacts of the Custom Model](./assets/mlflow_custom_model.jpg)\n\nMore information: [MLflow: Creating custom Pyfunc models](https://mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#creating-custom-pyfunc-models).\n\n### Custom Flavors\n\nCustom flavors address the situation in which we want custom serialization methods.\n\nHowever, that is usually an advanced topic which requires extending MLflow, and we are not going to need it very often.\n\nOfficial documentation with an example implementation: [Custom Flavors](https://mlflow.org/docs/latest/models.html#custom-flavors).\n\nNecessary steps:\n\n- Serialization and deserialization logic need to be defined.\n- Create a flavor directory structure.\n- Register the custom flavor.\n- Define flavor-specific tools.\n\nIn practice, custom `save_model` and `load_model` functions are implemented (among others) following some standardized specifications.\n\n## 10. MLflow Model Evaluation\n\nMLflow provides evaluation functionalities for MLflow packaged models, i.e., we don't need to evaluate the models using other tools. 
The advantage is that we get\n\n- performance metrics\n- plots\n- explanations (feature importance, SHAP)\n\nwith a few lines of code, and all this evaluation data is logged. Note: **it works with the python_function (pyfunc) flavor**.\n\nOfficial documentation:\n\n- [Model Evaluation](https://mlflow.org/docs/latest/models.html#model-evaluation).\n- [`mlflow.evaluate()`](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.evaluate).\n\nThe `mlflow.evaluate()` function has the following parameters:\n\n```python\nmlflow.evaluate(\n  model=None, # mlflow.pyfunc.PythonModel or model URI\n  data=None, # evaluation data: np.ndarray, pd.DataFrame, PySpark DF, mlflow.data.dataset.Dataset\n  model_type=None, # 'regressor', 'classifier', 'question-answering', 'text', 'text-summarization'\n  targets=None, # list of evaluation labels\n  dataset_path=None, # path where data is stored\n  feature_names=None, # np.ndarray, pd.DataFrame, PySpark DF\n  evaluators=None, # list of evaluator names, e.g., 'default'; all used by default - get all with mlflow.models.list_evaluators()\n  evaluator_config=None, # config dict for evaluators: log_model_explainability, explainability_nsamples, etc.\n  custom_metrics=None, # list of custom-defined EvaluationMetric objects\n  custom_artifacts=None, # list of custom artifact functions: dict-\u003eJSON, pd.DataFrame-\u003eCSV\n  validation_thresholds=None, # dictionary with custom thresholds for metrics\n  baseline_model=None, # baseline model to compare against\n  env_manager='local', # env manager to load models in isolated Python envs: 'local' (current env), 'virtualenv' (recommended), 'conda'\n  # More:\n  predictions=None, extra_metrics=None,  model_config=None, baseline_config=None, inference_params=None\n)\n```\n\nThe `'default'` evaluator uses `shap`, so we need to manually `pip install shap`. 
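To make concrete the kind of numbers the `'default'` evaluator reports for `model_type="regressor"`, here is a plain-Python sketch of the core regression metrics (this is an illustration only, not MLflow's implementation; the sample values are invented):

```python
import math

def regressor_metrics(targets, predictions):
    """Core regression metrics, as reported by a default regressor evaluation."""
    n = len(targets)
    errors = [t - p for t, p in zip(targets, predictions)]
    mse = sum(e ** 2 for e in errors) / n          # mean squared error
    mae = sum(abs(e) for e in errors) / n          # mean absolute error
    mean_t = sum(targets) / n
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    r2 = 1.0 - (mse * n) / ss_tot                  # R^2 = 1 - SS_res / SS_tot
    return {
        "mean_squared_error": mse,
        "root_mean_squared_error": math.sqrt(mse),
        "mean_absolute_error": mae,
        "r2_score": r2,
    }

metrics = regressor_metrics([3.0, 5.0, 7.0], [2.5, 5.5, 6.5])
print(metrics["mean_squared_error"])  # 0.25
```

In a real `mlflow.evaluate()` call, these metrics are computed and logged automatically; we never compute them by hand like this.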
\n\n### Example: Evaluation of a Python Model - 07_evaluation\n\nAn evaluation example is given in [`07_evaluation/evaluate.py`](./examples/07_evaluation/evaluate.py); the `mlflow.evaluate()` call is summarized here:\n\n```python\n# ...\n\n# Log model with all the structures defined above\n# We'll see all the artifacts in the UI: data, models, code, etc.\nmodel_artifact_path = \"custom_mlflow_pyfunc\"\nmlflow.pyfunc.log_model(\n    artifact_path=model_artifact_path, # the path directory which will contain the model\n    python_model=ModelWrapper(), # a mlflow.pyfunc.PythonModel, defined above\n    artifacts=artifacts, # dictionary defined above\n    code_path=[str(__file__)], # Code file(s), must be in local dir: \"model_customization.py\"\n    conda_env=conda_env\n)\n\n# Get model URI and evaluate\n# NOTE: the default evaluator uses shap -\u003e we need to manually pip install shap\nmodel_artifact_uri = mlflow.get_artifact_uri(model_artifact_path)\nmlflow.evaluate(\n    model_artifact_uri, # model URI\n    test, # test split\n    targets=\"quality\",\n    model_type=\"regressor\",\n    evaluators=[\"default\"] # if default, shap is used -\u003e pip install shap\n)\n```\n\nAfter running the evaluation, in the artifacts, we get the SHAP plots as well as the explainer model used.\n\n![SHAP Summary Plot](./assets/shap_summary_plot.png)\n\n\n### Example: Custom Evaluation Metrics and Artifacts - 07_evaluation\n\nWe can create custom evaluation metrics and evaluation artifacts.\nTo that end:\n\n- We create metric computation functions passed to `make_metric`, which creates metric measurement objects.\n- Similarly, we define metric artifact computation functions (e.g., plots).\n- We pass all of that to `mlflow.evaluate()` in its parameters.\n\nAs a result, we will get additional metrics in the DB or extra plots in the artifacts.\n\nIn the following, the most important lines from the example in 
[`07_evaluation/custom_metrics.py`](./examples/07_evaluation/custom_metrics.py):\n\n```python\nfrom mlflow.models import make_metric\n\n# ...\n\n# Log model with all the structures defined above\n# We'll see all the artifacts in the UI: data, models, code, etc.\nmodel_artifact_path = \"custom_mlflow_pyfunc\"\nmlflow.pyfunc.log_model(\n    artifact_path=model_artifact_path, # the path directory which will contain the model\n    python_model=ModelWrapper(), # a mlflow.pyfunc.PythonModel, defined above\n    artifacts=artifacts, # dictionary defined above\n    code_path=[str(__file__)], # Code file(s), must be in local dir: \"model_customization.py\"\n    conda_env=conda_env\n)\n\n# Custom metrics are created by passing a custom defined function\n# to make_metric. Each custom defined function takes these parameters (fixed names):\n# - eval_df: the data\n# - builtin_metrics: a dictionary with the built-in metrics from mlflow\n# Note: If an arg is not used, precede it with _: _builtin_metrics, _eval_df\n# else: builtin_metrics, eval_df\ndef squared_diff_plus_one(eval_df, _builtin_metrics):\n    return np.sum(np.abs(eval_df[\"prediction\"] - eval_df[\"target\"] + 1) ** 2)\n\ndef sum_on_target_divided_by_two(_eval_df, builtin_metrics):\n    return builtin_metrics[\"sum_on_target\"] / 2\n\n# In the following we create the metric objects via make_metric,\n# passing the functions defined above\nsquared_diff_plus_one_metric = make_metric(\n    eval_fn=squared_diff_plus_one,\n    greater_is_better=False, # low metric value is better\n    name=\"squared diff plus one\"\n)\n\nsum_on_target_divided_by_two_metric = make_metric(\n    eval_fn=sum_on_target_divided_by_two,\n    greater_is_better=True,\n    name=\"sum on target divided by two\"\n)\n\n# We can also create custom artifacts.\n# To that end, we simply define the function\n# which creates the artifact.\n# Parameters:\n# - eval_df, _eval_df\n# - builtin_metrics, _builtin_metrics\n# - artifacts_dir, _artifacts_dir\ndef 
prediction_target_scatter(eval_df, _builtin_metrics, artifacts_dir):\n    plt.scatter(eval_df[\"prediction\"], eval_df[\"target\"])\n    plt.xlabel(\"Targets\")\n    plt.ylabel(\"Predictions\")\n    plt.title(\"Targets vs. Predictions\")\n    plot_path = os.path.join(artifacts_dir, \"example_scatter_plot.png\")\n    plt.savefig(plot_path)\n    return {\"example_scatter_plot_artifact\": plot_path}\n\n# Now, we run the evaluation with custom metrics and artifacts\nartifacts_uri = mlflow.get_artifact_uri(model_artifact_path)\nmlflow.evaluate(\n    artifacts_uri,\n    test,\n    targets=\"quality\",\n    model_type=\"regressor\",\n    evaluators=[\"default\"],\n    # Custom metric objects\n    custom_metrics=[\n        squared_diff_plus_one_metric,\n        sum_on_target_divided_by_two_metric\n    ],\n    # Custom artifact computation functions\n    custom_artifacts=[prediction_target_scatter]\n)\n```\n\n### Example: Evaluation against Baseline - 07_evaluation\n\nWe can define a baseline model and compare against it in the `mlflow.evaluate()` call by checking some thresholds. 
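To picture how such a threshold check behaves, here is a plain-Python sketch of `MetricThreshold`-style validation; this is my own illustrative approximation of the semantics, not MLflow's actual logic, and the numbers are invented:

```python
def passes_threshold(candidate, baseline, threshold,
                     min_absolute_change, min_relative_change,
                     greater_is_better=False):
    """Illustrative approximation of a MetricThreshold check:
    the candidate metric must beat an absolute threshold AND improve
    on the baseline by both an absolute and a relative margin."""
    if greater_is_better:
        # Flip signs so that "lower is better" holds below
        candidate, baseline, threshold = -candidate, -baseline, -threshold
    ok_threshold = candidate <= threshold
    ok_absolute = (baseline - candidate) >= min_absolute_change
    ok_relative = (baseline - candidate) >= min_relative_change * abs(baseline)
    return ok_threshold and ok_absolute and ok_relative

# Candidate MSE 0.45 vs. baseline MSE 0.70 (invented values):
print(passes_threshold(0.45, 0.70, threshold=0.6,
                       min_absolute_change=0.1,
                       min_relative_change=0.05))  # True
```

With `greater_is_better=False` (e.g., MSE), all three conditions must hold for the candidate model to pass validation.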
The baseline model needs to be passed to `mlflow.evaluate()` along with its related artifacts, as well as a `thresholds` dictionary.\n\nAs a result, we will get additional models and artifacts (baseline).\n\nIn the following, the most important lines from the example in [`07_evaluation/validation_threshold.py`](./examples/07_evaluation/validation_threshold.py):\n\n```python\nfrom mlflow.models import make_metric\nfrom sklearn.dummy import DummyRegressor\nfrom mlflow.models import MetricThreshold\n\n# ...\n\n# Model artifact: we serialize the model with joblib\nmodel_dir = 'models'\nPath(model_dir).mkdir(parents=True, exist_ok=True)\nmodel_path = model_dir + \"/model.pkl\"\njoblib.dump(lr, model_path)\n\n# Artifacts' paths: model and data\n# This dictionary is fetched later by the mlflow context\nartifacts = {\n    \"model\": model_path,\n    \"data\": data_dir\n}\n\n# Save baseline model and an artifacts dictionary\nbaseline_model_path = model_dir + \"/baseline_model.pkl\"\njoblib.dump(baseline_model, baseline_model_path)\nbaseline_artifacts = {\n    \"baseline_model\": baseline_model_path\n}\n\n# We create a wrapper class, i.e.,\n# a custom mlflow.pyfunc.PythonModel\n#   https://mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#mlflow.pyfunc.PythonModel\n# We need to define at least two functions:\n# - load_context\n# - predict\n# We can also define further custom functions if we want\nclass ModelWrapper(mlflow.pyfunc.PythonModel):\n    def __init__(self, artifacts_name):\n        # We use the artifacts_name in order to handle both the baseline \u0026 the custom model\n        self.artifacts_name = artifacts_name\n\n    def load_context(self, context):\n        self.model = joblib.load(context.artifacts[self.artifacts_name])\n\n    def predict(self, context, model_input):\n        return self.model.predict(model_input.values)\n\n# Conda environment\nconda_env = {\n    \"channels\": [\"conda-forge\"],\n    \"dependencies\": [\n        f\"python={sys.version.split()[0]}\", 
# Python version\n        \"pip\",\n        {\n            \"pip\": [\n                f\"mlflow=={mlflow.__version__}\",\n                f\"scikit-learn=={sklearn.__version__}\",\n                f\"cloudpickle=={cloudpickle.__version__}\",\n            ],\n        },\n    ],\n    \"name\": \"my_env\",\n}\n\n# Log model with all the structures defined above\n# We'll see all the artifacts in the UI: data, models, code, etc.\nmlflow.pyfunc.log_model(\n    artifact_path=\"custom_mlflow_pyfunc\", # the path directory which will contain the model\n    python_model=ModelWrapper(\"model\"), # a mlflow.pyfunc.PythonModel, defined above\n    artifacts=artifacts, # dictionary defined above\n    code_path=[str(__file__)], # Code file(s), must be in local dir: \"model_customization.py\"\n    conda_env=conda_env\n)\n\n# Baseline model\nmlflow.pyfunc.log_model(\n    artifact_path=\"baseline_mlflow_pyfunc\", # the path directory which will contain the model\n    python_model=ModelWrapper(\"baseline_model\"), # a mlflow.pyfunc.PythonModel, defined above\n    artifacts=baseline_artifacts, # dictionary defined above\n    code_path=[str(__file__)], # Code file(s), must be in local dir: \"model_customization.py\"\n    conda_env=conda_env\n)\n\ndef squared_diff_plus_one(eval_df, _builtin_metrics):\n    return np.sum(np.abs(eval_df[\"prediction\"] - eval_df[\"target\"] + 1) ** 2)\n\ndef sum_on_target_divided_by_two(_eval_df, builtin_metrics):\n    return builtin_metrics[\"sum_on_target\"] / 2\n\nsquared_diff_plus_one_metric = make_metric(\n    eval_fn=squared_diff_plus_one,\n    greater_is_better=False,\n    name=\"squared diff plus one\"\n)\n\nsum_on_target_divided_by_two_metric = make_metric(\n    eval_fn=sum_on_target_divided_by_two,\n    greater_is_better=True,\n    name=\"sum on target divided by two\"\n)\n\ndef prediction_target_scatter(eval_df, _builtin_metrics, artifacts_dir):\n    plt.scatter(eval_df[\"prediction\"], eval_df[\"target\"])\n    plt.xlabel(\"Targets\")\n    
plt.ylabel(\"Predictions\")\n    plt.title(\"Targets vs. Predictions\")\n    plot_path = os.path.join(artifacts_dir, \"example_scatter_plot.png\")\n    plt.savefig(plot_path)\n    return {\"example_scatter_plot_artifact\": plot_path}\n\nmodel_artifact_uri = mlflow.get_artifact_uri(\"custom_mlflow_pyfunc\")\n\n# After training and logging both the baseline and the (custom) model,\n# we define a thresholds dictionary which specifies the validation criteria\nthresholds = {\n    \"mean_squared_error\": MetricThreshold(\n        threshold=0.6,  # Maximum MSE threshold to accept the model, so we require MSE \u003c 0.6\n        min_absolute_change=0.1,  # Minimum absolute improvement compared to baseline\n        min_relative_change=0.05,  # Minimum relative improvement compared to baseline\n        greater_is_better=False  # Lower MSE is better\n    )\n}\n\nbaseline_model_artifact_uri = mlflow.get_artifact_uri(\"baseline_mlflow_pyfunc\")\n\nmlflow.evaluate(\n    model_artifact_uri,\n    test,\n    targets=\"quality\",\n    model_type=\"regressor\",\n    evaluators=[\"default\"],\n    custom_metrics=[\n        squared_diff_plus_one_metric,\n        sum_on_target_divided_by_two_metric\n    ],\n    custom_artifacts=[prediction_target_scatter],\n    validation_thresholds=thresholds,\n    baseline_model=baseline_model_artifact_uri\n)\n```\n\n## 11. MLflow Registry Component\n\nA model registry is a central database where model versions are stored along with their metadata; additionally, we have a UI and APIs to the registry.\n\nThe model artifacts stay where they are after being logged; only references to them are stored, along with the metadata. The registered models can be seen in the **Models** menu (horizontal menu).\n\nPre-requisites:\n\n- Start a server `mlflow server ...` and then `mlflow.set_tracking_uri()` in the code. 
In my tests, if I start the UI with `mlflow ui` it also works by using `uri=\"http://127.0.0.1:5000/\"`; however, note that depending on where/how we start the server, the `mlruns` folder is placed in different locations...\n- Log the model.\n\n### Registering via UI\n\nWe have several options to register a model:\n\n- After we have logged the model, in the UI: Select experiment, Artifacts, Select model, Click on **Register** (right): New model, write name; the first time we need to write a model name. The next times, if it is the same model, we choose its name; otherwise we insert a new name. If we register a new model with the same name, its version will be increased.\n- In `log_model()`, if we pass the parameter `registered_model_name`.\n- By calling `register_model()`.\n\nIn the **Models** menu (horizontal menu), we see all the registered model versions:\n\n- We can/should add descriptions.\n- We should add tags: which are production ready? was a specific data slice used?\n- We can add tags and descriptions at model and version levels: that's important!\n\nIn older MLflow versions, a model could be in 4 stages or phases:\n\n- None\n- Staging: candidate for production, we might want to compare it with other candidates.\n- Production: model ready for production, we can deploy it.\n- Archive: model taken out of production, not usable anymore; however, it still remains in the registry.\n\nNow, those stage values are deprecated; instead, we can use:\n\n- Tags: we can manually tag model versions as `staging`, `production`, `archive`.\n- Aliases: named references for particular model versions; for example, setting a **champion** alias on a model version enables you to fetch the model version by that alias via the `get_model_version_by_alias()` client API or the model URI `models:/\u003cregistered model name\u003e@champion`.\n\n### Registering via API - 08_registry\n\nAs mentioned, we can register a model in the code with two functions:\n\n- In `log_model()`, if we pass the 
parameter `registered_model_name`.\n- By calling `register_model()`.\n\nIn the example [`08_registry/registry_log_model.py`](./examples/08_registry/registry_log_model.py) the first approach is used. Here are the most important lines:\n\n```python\n# ...\n\n# Connect to the tracking server\n# Make sure a server is started with the given URI: mlflow server ...\nmlflow.set_tracking_uri(uri=\"http://127.0.0.1:5000/\")\n\n# ...\n\n# The first time, a version 1 is going to be created, then 2, etc.\n# We could in theory log a model which has been trained outside from a run\nmlflow.sklearn.log_model(\n  model,\n  \"model\",\n  registered_model_name='elastcinet-api'\n)\n```\n\nThe function [`resgister_model()`](https://www.mlflow.org/docs/latest/python_api/mlflow.html?highlight=register_model#mlflow.register_model) has the following parameters:\n\n```python\nmlflow.register_model(\n  model_uri, # URI or path\n  name,\n  await_registration_for, # Number of seconds to wait to create ready version\n  tags # dictionary of key-value pairs\n)\n```\n\nIn the example [`08_registry/registry_register_model.py`](./examples/08_registry/registry_register_model.py) we can see how `register_model()` is used; here are the most important lines:\n\n```python\n# ...\n\n# Connect to the tracking server\n# Make sure a server is started with the given URI: mlflow server ...\nmlflow.set_tracking_uri(uri=\"http://127.0.0.1:5000/\")\n\n# ...\n\n# We still need to log themodel, but wince we use register_model\n# here no registered_model_name is passed!\nmlflow.sklearn.log_model(lr, \"model\")\n\n# The model_uri can be a path or a URI, e.g., runs:/...\n# but no models:/ URIs are accepted currently\n# As with registered_model_name, the version is automatically increased\nmlflow.register_model(\n    model_uri=f'runs:/{run.info.run_id}/model',\n    name='elastic-api-2',\n    tags={'stage': 'staging', 'preprocessing': 'v3'}\n)\n\n# We can load the registered model\n# by using an URI like 
models:/\u003cmodel_name\u003e/\u003cversion\u003e\nld = mlflow.pyfunc.load_model(model_uri=\"models:/elastic-api-2/1\")\npredicted_qualities=ld.predict(test_x)\n```\n\n## 12. MLflow Project Component\n\nMLflow Projects is a component which allows to organize and share our code easily.\n\nInteresting links:\n\n- I have another guide in [MLops Udacity - Reproducible Pipelines](https://github.com/mxagar/mlops_udacity/blob/main/02_Reproducible_Pipelines/MLOpsND_ReproduciblePipelines.md).\n- [The official guide](https://www.mlflow.org/docs/latest/projects.html).\n\nMLflow Projects works with a `MLproject` YAML file placed in the project folder; this configuration file contains information about\n\n- the name of the package/project module,\n- the environment with the dependencies,\n- and the entry points, with their parameters.\n\nHere is an example:\n\n```yaml\nname: \"Elastice Regression project\"\nconda_env: conda.yaml\n\nentry_points:\n  main:\n    command: \"python main.py --alpha={alpha} --l1_ratio={l1_ratio}\"\n    parameters:\n      alpha:\n        type: float\n        default: 0.4\n\n      l1_ratio:\n        type: float\n        default: 0.4\n\n```\n\nWe can **run the project/package in the CLI** as follows:\n\n```bash\ncd ... 
# we go to the folder where MLproject is\nmlflow run -P alpha=0.3 -P l1_ratio=0.3 .\n```\n\nThe [**environment can be specified**](https://www.mlflow.org/docs/latest/projects.html?highlight=mlproject#mlproject-specify-environment) in several ways:\n\n```yaml\n# Virtualenv (preferred by MLflow)\npython_env: files/config/python_env.yaml\n\n# Conda: conda env export --name \u003cenv_name\u003e \u003e conda.yaml\n# HOWEVER: note that the conda file should be generic for all platforms\n# so sometimes we need to remove some build numbers...\nconda_env: files/config/conda.yaml\n\n# Docker image: we can use a prebuilt image\n# or build one with --build-image\n# Environment variables like MLFLOW_TRACKING_URI are propagated (-e)\n# and the host tracking folder is mounted as a volume (-v)\n# We can also set volumes and env variables (copied or defined)\ndocker_env:\n  image: mlflow-docker-example-environment:1.0 # pre-built\n  # image: python:3.10 # to build with `mlflow run ... --build-image`\n  # image: 012345678910.dkr.ecr.us-west-2.amazonaws.com/mlflow-docker-example-environment:7.0 # fetch from registry\n  volumes: [\"/local/path:/container/mount/path\"]\n  environment: [[\"NEW_ENV_VAR\", \"new_var_value\"], \"VAR_TO_COPY_FROM_HOST_ENVIRONMENT\"]\n```\n\nAn `MLproject` can contain several **entry points**, which are basically points from which the execution of the package starts. Entry points have:\n\n- A name; if one, entry point, usually it's called main.\n- One command, which contains placeholders replaced by the parameter values. 
The command is the execution call of a script, which should usually parse arguments, i.e., parameter values.\n- The parameters replaced in the commad; they consist of a name, a default value and a type (4 possible: string, float, path, URI); types are validated.\n- An optional environment (e.g., conda), specific for the entry point command execution.\n\nWe can run the project entry points in two ways:\n\n- Via CLI: `mlflow run [OPTIONS] URI`\n  - `URI` can be a local path (e.g., `.`) or a URI to a remote machine, a git repository, etc.\n  - The `OPTIONS` depend on how we run the project (locally, remotely, etc.); see list below.\n- Or within a python module/code: `mlflow.projects.run()`\n\n### CLI Options and Environment Variables\n\nSome `OPTIONS`:\n\n```bash\n# mlflow run -e \u003cmy_entry_point\u003e \u003curi\u003e\n-e, --entry-point \u003cNAME\u003e\n\n# mlflow run -v \u003cabsc123\u003e \u003curi\u003e\n-v, --version \u003cVERSION\u003e\n\n# mlflow run -P \u003cparam1=value1\u003e -P \u003cparam2=value2\u003e \u003curi\u003e\n-P, --param-list \u003cNAME=VALUE\u003e\n\n# mlflow run -A \u003cparam1=value1\u003e \u003curi\u003e\n-A, --docker-args \u003cNAME=VALUE\u003e\n\n# Specific experiement name\n--experiment-name \u003cNAME\u003e\n\n# Specific experiment ID\n--experiment-id \u003cID\u003e\n\n# Specify backend: local, databricks, kubernetes\n-b, --backend \u003cNAME\u003e\n\n# Backend config file\n-c, --backend-config \u003cFILE\u003e\n\n# Specific environment manager: local, virtualenv, conda\n--env-manager \u003cNAME\u003e\n\n# Only valid for local backend\n--storage-dir \u003cDIR\u003e\n\n# Specify run ID\n--run-id \u003cID\u003e\n\n# Specify run name\n--run-name \u003cNAME\u003e\n\n# Build a new docker image\n--build-image\n```\n\nIn addition to the options, we also have important environment variables which can be set; if set, their values are used acordingly by 
default:\n\n```bash\nMLFLOW_TRACKING_URI\nMLFLOW_EXPERIMENT_NAME\nMLFLOW_EXPERIMENT_ID\nMLFLOW_TMP_DIR # --storage-dir = storage folder for local backend\n```\n\n### Example: Running a Project with the CLI - 09_projects\n\nThe file [`09_projects/main.py`](./examples/09_projects/main.py) can be run without the `MLproject` tool as follows:\n\n```bash\nconda activate mlflow\ncd ...\npython ./main.py --alpha 0.3 --l1_ratio 0.3\n```\n\nHowever, we can use the co-located [`09_projects/MLproject`](./examples/09_projects/MLproject) and run it using `mlflow`:\n\n```bash\nconda activate mlflow\ncd ...\n# Since the only entry point is main, we don't need to specify it (because main is the default)\n# We could try further options, e.g., --experiment-name\nmlflow run -P alpha=0.3 -P l1_ratio=0.3 .\n# The environment will be installed\n# The script from the main entrypoint will be run\n# Advantage wrt. simply running the script: we can run remote scripts\n```\n\nThe main difference is that now we create a specific environment only for running the project/package/module. Additionally, we could run remote modules.\n\n### Example: Running a Project with the Python API - 09_projects\n\nThe file [`09_projects/run.py`](./examples/09_projects/run.py) shows how to execute the `mlflow run` froma Python script:\n\n```python\nimport mlflow\n\nparameters={\n    \"alpha\":0.3,\n    \"l1_ratio\":0.3\n}\n\nexperiment_name = \"Project exp 1\"\nentry_point = \"main\"\n\nmlflow.projects.run(\n    uri=\".\",\n    entry_point=entry_point,\n    parameters=parameters,\n    experiment_name=experiment_name\n)\n```\n\nTo use it:\n\n```bash\nconda activate mlflow\ncd ...\npython run.py\n```\n\n### More Advanced Project Setups\n\nVisit my notes: [MLops Udacity - Reproducible Pipelines](https://github.com/mxagar/mlops_udacity/blob/main/02_Reproducible_Pipelines/MLOpsND_ReproduciblePipelines.md). 
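Before moving on, a small recap of the entry-point mechanics covered above: MLflow merges the `-P` values with the defaults declared in `MLproject` and substitutes them into the command template. The sketch below is purely illustrative (it is not MLflow's actual implementation; `build_command` is a made-up helper):

```python
# Illustrative only: NOT MLflow's real implementation.
# Shows how -P parameters override the MLproject defaults and are
# substituted into the entry-point command template.

# Command template and defaults as declared in the MLproject example above
command_template = "python main.py --alpha={alpha} --l1_ratio={l1_ratio}"
defaults = {"alpha": 0.4, "l1_ratio": 0.4}

def build_command(template: str, defaults: dict, overrides: dict) -> str:
    """Substitute parameter values into the command template."""
    params = {**defaults, **overrides}  # -P values take precedence over defaults
    return template.format(**params)

# Rough equivalent of: mlflow run -P alpha=0.3 .
print(build_command(command_template, defaults, {"alpha": 0.3}))
# python main.py --alpha=0.3 --l1_ratio=0.4
```

In the real tool, the substituted command is then executed inside the environment declared in `MLproject` (conda, virtualenv, or docker).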
While the current guide focuses on tracking and model handling, the Udacity notes focus more on how project pipelines can be built using MLflow. Among others, sophisticated pipelines can be defined so that several components/modules are run one after the other, each storing artifacts used by the ones that come later.

## 13. MLflow Client

The MLflow client is basically the counterpart of the tracking server, i.e., it's the application where we use the API library which communicates with the server. Additionally, MLflow provides a class [`MlflowClient`](https://www.mlflow.org/docs/latest/python_api/mlflow.client.html?highlight=client#module-mlflow.client), which facilitates

- Experiment management
- Run management and tracking
- Model versioning and management

However, `MlflowClient` does not replace the MLflow library; rather, it provides extra functionalities to handle the tracking server objects. **It provides some of the functionalities of the UI, but via code**.

The file [`10_client/client_management.py`](./examples/10_client/client_management.py) shows the most important calls to manage MLflow objects via the Python API using `mlflow.client.MlflowClient`:

- Experiments: creating, adding tags, renaming, getting and searching experiments, deleting, restoring.
- Runs: creating, renaming, setting status, getting and searching runs, deleting, restoring.
- Logging/extracting parameters, metrics and artifacts via the client.
- Creating and registering model versions, setting tags, searching and getting models, deleting.

The content of the script:

```python
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

import mlflow
from mlflow import MlflowClient
from mlflow.entities import ViewType

# We need to start the server in another terminal first: mlflow server
mlflow.set_tracking_uri("http://127.0.0.1:5000")

# Our client object
client = MlflowClient()

# Create experiment: this can be executed only once
# The id is a string
NUM = 21
exp_name = "Experiment with client " + str(NUM)
experiment_id = client.create_experiment(
    name = exp_name,
    tags = {"version": "v1", "priority": "P1"}
)
# Delete and restore an experiment
client.delete_experiment(experiment_id=experiment_id)
client.restore_experiment(experiment_id=experiment_id)

print(f'Experiment id: {experiment_id}')

# Set a tag
client.set_experiment_tag(experiment_id=experiment_id, key="framework", value="sklearn")

# Get experiment by name or id
experiment = client.get_experiment_by_name(name="Test") # None returned if it doesn't exist
if experiment is None:
    experiment = client.get_experiment(experiment_id=experiment_id)

# Rename experiment
client.rename_experiment(experiment_id=experiment_id, new_name="NEW "+exp_name)

# Get properties from experiment
print(f'Experiment name: {experiment.name}')
print(f'Experiment id: {experiment.experiment_id}')
print(f'Experiment artifact location: {experiment.artifact_location}')
print(f'Experiment tags: {experiment.tags}')
print(f'Experiment lifecycle stage: {experiment.lifecycle_stage}')

# Search experiments
# https://www.mlflow.org/docs/latest/python_api/mlflow.client.html#mlflow.client.MlflowClient.search_experiments
experiments = client.search_experiments(
    view_type = ViewType.ALL, # ACTIVE_ONLY, DELETED_ONLY, ALL
    filter_string = "tags.`version` = 'v1' AND tags.`framework` = 'sklearn'",
    order_by = ["experiment_id ASC"],
    #max_results, # maximum number of experiments we want
)

# Loop and print experiments
for exp in experiments:
    print(f"Experiment Name: {exp.name}, Experiment ID: {exp.experiment_id}")

# This function simply creates a run object
# and doesn't run any code;
# so it's different from mlflow.start_run() or mlflow.projects.run()
# Use-case: we want to prepare the run object, but not run any ML code yet
# The status of the run is UNFINISHED;
# we need to change it when finished with set_terminated or update_run
run = client.create_run(
    experiment_id = experiment_id,
    tags = {
        "Version": "v1",
        "Priority": "P1"
    },
    run_name = "run from client",
    #start_time # when we'd like to start the run; if not provided, now
)

# The created run is not active,
# so we cannot log data without explicitly activating it or passing its id to the log function
# We can log via the client object
client.log_param(
    run_id=run.info.run_id,
    key="alpha",
    value=0.3
)
client.log_metric(
    run_id=run.info.run_id,
    key="r2",
    value=0.9
)

# Similarly, we have:
# - client.log_artifact()
# - client.log_image()
# ...
client.log_artifact(
    run_id=run.info.run_id,
    local_path="../data/red-wine-quality.csv"
)

# Get a run object by its id
run = client.get_run(run_id=run.info.run_id)

# Run properties
print(f"Run tags: {run.data.tags}")
print(f"Experiment id: {run.info.experiment_id}")
print(f"Run id: {run.info.run_id}")
print(f"Run name: {run.info.run_name}")
print(f"Lifecycle stage: {run.info.lifecycle_stage}")
print(f"Status: {run.info.status}")

# Extract metrics
# NOTE: metrics always have a step and a timestamp, because they are designed with DL apps in mind,
# in which we'd like to save a metric for every batch or epoch
# Thus, even though here we have saved only one element for r2, we still get a list
# with one element, and the element is an mlflow.entities.Metric object
# https://www.mlflow.org/docs/latest/python_api/mlflow.entities.html#mlflow.entities.Metric
metric_list = client.get_metric_history(
    run_id=run.info.run_id,
    key="r2"
)

for metric in metric_list:
    print(f"Step: {metric.step}, Timestamp: {metric.timestamp}, Value: {metric.value}")

# Extract info of artifacts
artifacts = client.list_artifacts(run_id=run.info.run_id)

for artifact in artifacts:
    print(f"Artifact: {artifact.path}")
    print(f"Size: {artifact.file_size}")

# Change run status to FINISHED
# We can also use client.update_run(), but that call can also change the name of the run
client.set_terminated(
    run_id=run.info.run_id,
    status="FINISHED" # 'RUNNING', 'SCHEDULED', 'FINISHED', 'FAILED', 'KILLED'
)

run = client.get_run(run_id=run.info.run_id)
print(f"Lifecycle stage: {run.info.lifecycle_stage}")
print(f"Status: {run.info.status}")

# We can delete a run: active -> deleted
# and we can also restore it
client.delete_run(run_id=run.info.run_id)

run = client.get_run(run_id=run.info.run_id)
print(f"Lifecycle stage: {run.info.lifecycle_stage}")
print(f"Status: {run.info.status}")

client.restore_run(run_id=run.info.run_id)

run = client.get_run(run_id=run.info.run_id)
print(f"Lifecycle stage: {run.info.lifecycle_stage}")
print(f"Status: {run.info.status}")

# Freeze run_id
run_id = run.info.run_id

# We can search for runs
runs = client.search_runs(
    experiment_ids=["6", "10", "12", "14", "22"],
    run_view_type=ViewType.ACTIVE_ONLY,
    order_by=["run_id ASC"],
    filter_string="run_name = 'Mlflow Client Run'"
)

for run in runs:
    print(f"Run name: {run.info.run_name}, Run ID: {run.info.run_id}")

# Get previous run again
run = client.get_run(run_id=run_id)
print(f"Lifecycle stage: {run.info.lifecycle_stage}")
print(f"Status: {run.info.status}")

# Create a registered model
# BUT this only creates an entry in the Models tab, there is no model yet!
client.create_registered_model(
    name="lr-model"+"-"+run_id[:3], # We wouldn't normally append the run_id prefix; this is only for my tests...
    # These tags are set at model level, not version level
    tags={
        "framework": "sklearn",
        "model": "ElasticNet"
    },
    description="Elastic Net model trained on red wine quality dataset"
)

# To work with a model, we need to create and log one first
# Then we add it to the model registry using create_model_version()
data = pd.read_csv("../data/red-wine-quality.csv")
train, test = train_test_split(data)
train_x = train.drop(["quality"], axis=1)
train_y = train[["quality"]]
lr = ElasticNet(alpha=0.35, l1_ratio=0.3, random_state=42)
lr.fit(train_x, train_y)
artifact_path = "model"
mlflow.sklearn.log_model(sk_model=lr, artifact_path=artifact_path)

# To add a model, we do it with create_model_version()
client.create_model_version(
    name="lr-model"+"-"+run_id[:3], # We wouldn't normally append the run_id prefix; this is only for my tests...
    source=f"runs:/{run_id}/{artifact_path}",
    # These tags are set at version level
    tags={
        "framework": "sklearn",
        "hyperparameters": "alpha and l1_ratio"
    },
    description="A second linear regression model trained with alpha and l1_ratio parameters."
)

# Set tags at version level
# An alternative: update_model_version()
client.set_model_version_tag(
    name="lr-model"+"-"+run_id[:3],
    version="1",
    key="framework",
    value="sklearn"
)

# We can get model versions with
# - get_latest_version()
# - get_model_version()
# - get_model_version_by_alias()
mv = client.get_model_version(
    name="lr-model"+"-"+run_id[:3],
    version="1"
)

print("Name:", mv.name)
print("Version:", mv.version)
print("Tags:", mv.tags)
print("Description:", mv.description)
print("Stage:", mv.current_stage)

# We can also SEARCH for model versions
mvs = client.search_model_versions(
    filter_string="tags.framework = 'sklearn'",
    max_results=10,
    order_by=["name ASC"]
)

for mv in mvs:
    print(f"Name {mv.name}, tags {mv.tags}")

# Delete a model version
client.delete_model_version(
    name="lr-model"+"-"+run_id[:3],
    version="1"
)
```

## 14. MLflow CLI Commands

MLflow has an extensive set of [CLI tools](https://mlflow.org/docs/latest/cli.html).

First, we need to start a server:

```bash
conda activate mlflow
cd ...
mlflow server
```

Then, in another terminal, we can use the [MLflow CLI](https://mlflow.org/docs/latest/cli.html):

```bash
conda activate mlflow
cd ...
# We might need to set the environment variable
# MLFLOW_TRACKING_URI="http://127.0.0.1:5000"

# Check the configuration to see everything is correctly set up
# We get: Python & MLflow version, URIs, etc.
mlflow doctor

# Use --mask-envs to avoid showing env variable values, in case we have secrets
mlflow doctor --mask-envs

# List artifacts
# We can also use the artifact path instead of the run_id
mlflow artifacts list --run-id <run_id>
mlflow artifacts list --run-id 58be5cac691f44f98638f03550ac2743

# Download artifacts
# --dst-path: local path to which the artifacts are downloaded (created if inexistent)
mlflow artifacts download --run-id <run_id> --dst-path cli_artifact
mlflow artifacts download --run-id 58be5cac691f44f98638f03550ac2743 --dst-path cli_artifact

# Upload/log artifacts
# --local-dir: local path where the artifact is
# --artifact-path: the path of the artifact in the mlflow data system
mlflow artifacts log-artifacts --local-dir cli_artifact --run-id <run_id> --artifact-path cli_artifact

# Upgrade the schema of an MLflow tracking database to the latest supported version
mlflow db upgrade sqlite:///mlflow.db

# Download to a local CSV all the runs (+ info) of an experiment
mlflow experiments csv --experiment-id <experiment_id> --filename experiments.csv
mlflow experiments csv --experiment-id 932303397918318318 --filename experiments.csv

# Create experiment; id is returned
mlflow experiments create --experiment-name cli_experiment # experiment_id: 794776876267367931

mlflow experiments rename --experiment-id <experiment_id> --new-name test1

mlflow experiments delete --experiment-id <experiment_id>

mlflow experiments restore --experiment-id <experiment_id>

mlflow experiments search --view "all"

mlflow experiments csv --experiment-id <experiment_id> --filename test.csv

# List the runs of an experiment
mlflow runs list --experiment-id <experiment_id> --view "all"
mlflow runs list --experiment-id 932303397918318318 --view "all"

# Detailed information of a run: a JSON with all the information is returned
mlflow runs describe --run-id <run_id>
mlflow runs describe --run-id 58be5cac691f44f98638f03550ac2743

mlflow runs delete --run-id <run_id>

mlflow runs restore --run-id <run_id>
```

## 15. AWS Integration with MLflow

In this section, an example project is built entirely on AWS:

- Used AWS services: CodeCommit, SageMaker (ML), EC2 (MLflow) and S3 (storage).
- Problem/dataset: [House price prediction (regression)](https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques).

The code of the project is on a different repository connected to CodeCommit.
However, I have added the final version to [`examples/housing-price-aws/`](./examples/housing-price-aws/).\n\nArchitecture of the implementation:\n\n![AWS Example Architecture](./assets/aws_example_architecture.jpg)\n\n- We can use Github or AWS CodeCommit to host the code.\n- Code development and buld tests are local.\n- We push the code to the remote repository.\n- MLflow server is on an EC2 instance (parameters, metrics, metadata stored in the tracking server VM).\n- All the model artifacts stored in an S3 bucket.\n- We will compare with the UI different model versions and select one.\n- Then, deployment comes: we build a docker image and set a SageMaker endpoint.\n- Once deployed, we'll test the inference.\n\nNotes:\n\n- Create a **non-committed** folder, e.g., `credentials`, where files with secrets and specific URIs will be saved.\n- We can stop the compute services while not used to save costs (stopped compute services don't iincur costs).\n  - The EC2 instance requires to manually launch the MLflow server if restarted; the local data persists.\n  - The SageMaker notebook requisres to re-install any packages we have installed (e.g., mlflow); the local data persists.\n- When we finish, remove all resources! See section [Clean Up](#clean-up).\n\n### AWS Account Setup\n\nSteps:\n\n- First, we need to create an account (set MFA).\n- Then, we log in with the root user.\n- After that we create a specific IAM user. 
Note IAM users have 3-4 elements:\n  - IAM ID\n  - Username\n  - Password\n  - Option: MFA code, specific for the IAM user\n\n    Username (up right) \u003e Security credentials\n    Users (left menu) \u003e Create a new user\n      Step 1: User details\n        User name: mlflow-user\n        Provide access to AWS Management Cosole\n        I want to create an IAM user\n        Choose: Custom/autogenerated password\n        Uncheck: Create new PW after check-in\n      Step 2: Set perimissions\n        Usually we grant only the necessary permissions\n        In this case, we \"Attach policies directly\"\n          Check: AdministratorAccess\n          Alternative:\n            Check: SageMaker, CodeCommit, S3, etc.\n        Next\n      Step 3: Review and create\n        Next\n      Step 4: Retrieve password\n        Save password\n        Activate MFA, specific for the IAM user!\n\n    We get an IAM user ID and a password\n    We can now use the IAM sign-in adress to log in\n      https://\u003cIAM-ID\u003e.signin.aws.amazon.com/console\n    or select IAM an introduce the IAM ID\n      IAM ID: xxx\n      Username: mlflow-user\n      Password: yyy\n    \nNow, we sign out and sign in again with the IAM user credentials.\nNext, we need to create **Access Keys**:\n\n    Sign in as IAM user\n    User (up left) \u003e Security credentials\n    Select: Application running on AWS compute service\n      We download the access keys for the services\n      we are going to use: EC2, S3, etc.\n      Confirm: I understand the recommendation\n    Create access key\n    Download and save securely: mlflow-user_accessKeys.csv\n    IMPORTANT: redentials are shown only now!\n\nWe should create a `.env` file which contains\n\n```\nAWS_ACCESS_KEY_ID=\"...\"\nAWS_SECRET_ACCESS_KEY=\"...\"\nAWS_DEFAULT_REGION=\"eu-central-1\"\n```\n\nThen, using `python-dotenv`, we can load these variables to the environment from any Python file, when needed:\n\n```python\nfrom dotenv import 
load_dotenv\n\n# Load environment variables in .env:\n# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION\nload_dotenv()\n```\n\nHowever, note that these variables are needed only locally, since all AWS environment in which we log in using the IAM role have already the credentials!\n\n### Setup AWS CodeCommit, S3, and EC2\n\nIn this section, the basic AWS components are started manually and using the web UI. I think the overall setup is not really secure, because everything is public and can be accessed from anywhere, but I guess the focus of the course is not AWS nor security...\n\nAWS CodeCommit Repository:\n\n    Search: CodeCommit\n    Select Region, e.g., EU-central-1 (Frankfurt)\n    Create repository\n      Name (unique): mlflow-housing-price-example\n      Description: Example repository in which house prices are regressed while using MLflow for tracking operations.\n      Create\n\n    Get credentials\n      https://docs.aws.amazon.com/codecommit/latest/userguide/setting-up-gc.html?icmpid=docs_acc_console_connect_np#setting-up-gc-iam\n      Download credentials and save them securely\n\n      Username (up right) \u003e Security credentials\n      AWS CodeCommit credentials (2nd tab, horizontal menu)\n        HTTPS Git credentials for AWS CodeCommit: Generate credentials\n          Download credentials: mlflow-user_codecommit_credentials.csv\n          IMPORTANT: only shown once!\n          Save them in secure place!\n\n    Clone repository in local folder:\n      git clone https://git-codecommit.eu-central-1.amazonaws.com/v1/repos/mlflow-housing-price-example\n      The first time we need to introduce our credentials downloaded before: username, pw\n      If we commit \u0026 push anything, we should see the changes in the CodeCommit repository (web UI)\n\nAWS S3 Bucket:\n\n    Search: S3\n    Create bucket\n      Region: EU-central-1 (Frankfurt)\n      Bucket name (unique): houing-price-mlflow-artifacts\n      Uncheck: Block all public access\n  
      Check: Acknowledge objects public\n    Leave rest default\n    Create bucket\n\nAWS EC2:\n\n    Search: EC2\n    Launch instance\n      We use a minimum configuration\n      Number or instances: 1\n      Image: Ubuntu\n      Instance Type: t2.micro\n      Name: mlflow-server\n      Key pair (login): needed if we want to connect to EC2 from local machine\n        Create a new key pair\n        Name: mlflow-server-kp\n        Dafault: RSA, .pem\n        Create key pair\n          File downloaded: mlflow-server-kp.pem\n      Network settings\n        Create a security group\n        Allow SSH traffic from anywhere\n        Allow HTTPS traffic from interent\n        Allow HTTP traffic from internet\n      Rest: default\n      Launch instance\n    EC2 \u003e Instances (left panel): Select instance\n      When instance is running:\n      Connect\n      Shell opens on Web UI\n\nIn the shell, we install the dependencies and start the MLflow server.\nThe tool [pipenv](https://pipenv.pypa.io/en/latest/) is used to create the environment; `pipenv` is similar to `pip-tools` or `poetry`. We could connect locally using the `mlflow-server-kp.pem` we have downloaded, too.\n\nThe following commands need to be run in the EC2 instance:\n\n```bash\n# Update packages list\nsudo apt update\n\n# Install pip\nsudo apt install python3-pip\n\n# Install dependencies\nsudo pip3 install pipenv virtualenv\n\n# Create project folder\nmkdir mlflow\ncd mlflow\n\n# Install dependencies\npipenv install mlflow awscli boto3 setuptools\n\n# Start a shell with the virtualenv activated. 
Quit with 'exit'.\npipenv shell\n\n# Configure AWS: Now, we need the Access Key Credentials crated in the account setup\naws configure\n# AWS Access Key ID: xxx\n# AWS Secret Access Key: yyy\n# Default region: press Enter\n# Default output: press Enter\n\n# Start MLflow server\n# Host to all: 0.0.0.0\n# Set \n# - the backend store: SQLite DB\n# - and the artifact store: the bucket name we have specified\nmlflow server -h 0.0.0.0 --backend-store-uri sqlite:///mlflow.db --default-artifact-root s3://houing-price-mlflow-artifacts\n```\n\nAfter the last command, the server is launched:\n    \n    ...\n    Listening at: http://0.0.0.0:5000\n    ...\n\nTo use it, we need to expose the port `5000` of the EC2 instance:\n\n    EC2 \u003e Instances (left panel): Select instance\n      Scroll down \u003e Security tab (horizontal menu)\n      We click on our segurity group: sg-xxx\n      Edit inbound rules\n        Add rule\n          Type: Custom TCP\n          Port: 5000\n          Source: Anywhere-IPv4 (0.0.0.0)\n        Save rules\n\nAdditionally, we copy to a safe place the public DNS of the EC2 instance:\n\n    EC2 \u003e Instances (left panel): Select instance (we go back to our instance)\n      Copy the Public IPv4 DNS (instance summary dashboard), e.g.:\n        ec2-\u003cIP-number\u003e.\u003cregion\u003e.amazonaws.com\n\nNow, we can open the browser and paste the DNS followed by the port number 5000; the Mlflow UI will open!\n\n    ec2-\u003cIP-number\u003e.\u003cregion\u003e.amazonaws.com:5000\n\nWe can close the window of the EC2 instance shell on the web UI; the server will continue running.\n\n**If we stop the EC2 server to safe costs, we need to re-start it, open a terminal and restart the server again by executing the commands above. 
However, note that the the public DNS might change!** The artifacts will persist (they are in S3) and the backend store which is local to the EC2 instance (mlflow.db), too.\n\nIf we want to connect locally to the EC2 instance, first we make sure that the port 22 is exposed to inbound connections from anywhere (follow the same steps as for opening port 5000). Then, we can `ssh` as follows:\n\n```bash\n# Go to the folder where the PEM credentials are\ncd .../examples/housing-price-aws/credentials\n\n# On Unix, we need to make the PEM file readable only for the owner\n# chmod 600 mlflow-server-kp.pem\n\n# Connect via SSH\n# Replace:\n# - username: ubuntu if Ubuntu, ec2-user is Amazon Linux\n# - Public-DNS: ec2-\u003cIP-number\u003e.eu-central-1.compute.amazonaws.com\n# - key-pair.pem: mlflow-server-kp.pem\nssh -i \u003ckey-pair.pem\u003e \u003cusername\u003e@\u003cPublic-DNS\u003e\n```\n\n### Code Respository and Development\n\nFrom now on, we work ok the repository cloned locally.\n\nRespository structure:\n\n```\nC:.\n│   conda.yaml        # Environment\n│   data.py           # Data pre-processing\n│   deploy.py         # Deploy model to Sagemaker endopoint image\n│   MLproject\n│   params.py         # Hyperparameter search space\n│   run.py            # Run training using mlflow \u0026 MLproject\n│   predict.py        # Inference\n│   train.py          # Entrypoint for MLproject\n│   eval.py           # EValuation metrics\n│\n├───credentials/\n│\n└───data/\n        test.csv\n        train.csv\n```\n\nDataset schema:\n\n```\n #   Column         Non-Null Count  Dtype  \n---  ------         --------------  -----  \n 0   Id             1460 non-null   int64  \n 1   MSSubClass     1460 non-null   int64  \n 2   MSZoning       1460 non-null   object \n 3   LotFrontage    1201 non-null   float64\n 4   LotArea        1460 non-null   int64  \n 5   Street         1460 non-null   object \n 6   Alley          91 non-null     object \n 7   LotShape       1460 non-null   object 
 8   LandContour    1460 non-null   object 
 9   Utilities      1460 non-null   object 
 10  LotConfig      1460 non-null   object 
 11  LandSlope      1460 non-null   object 
 12  Neighborhood   1460 non-null   object 
 13  Condition1     1460 non-null   object 
 14  Condition2     1460 non-null   object 
 15  BldgType       1460 non-null   object 
 16  HouseStyle     1460 non-null   object 
 17  OverallQual    1460 non-null   int64  
 18  OverallCond    1460 non-null   int64  
 19  YearBuilt      1460 non-null   int64  
...
 79  SaleCondition  1460 non-null   object 
 80  SalePrice      1460 non-null   int64  <- TARGET!
```

#### Data Preprocessing

All the data preprocessing happens in `data.py`:

- Datasets loaded (train & test).
- Train/validation split.
- Target/independent variable selection.
- Missing value imputation with KNN.
- Categorical feature one-hot encoding.
- **The transformed `X_train`, `X_val` and `test` are the product.**

**IMPORTANT NOTE**: Even though the `data.py` script works with the environment, it has some issues with newer scikit-learn versions and the code needs to be updated accordingly: `OneHotEncoder` returns a sparse matrix and, on top of that, we should apply it to the categorical columns only, so we probably need a `ColumnTransformer`.

#### Training

The training happens in:

- `params.py`: hyperparameter search grids for each model.
- `eval.py`: evaluation metrics.
- `train.py`: training entrypoint used by `MLproject`.

Here, we start using `mlflow`; however, the `mlflow` dependencies and calls are only in `train.py`. Additionally, those dependencies/calls are as generic as possible, i.e., we don't define any experiment/run IDs/names, tracking URIs, etc.
The idea is to have the code as reusable as possible, and we leave any configuration to `MLproject` and higher-level files, like `run.py`:

```python
import argparse
import mlflow
from sklearn.linear_model import Ridge, ElasticNet
from xgboost import XGBRegressor
from sklearn.model_selection import ParameterGrid
from data import X_train, X_val, y_train, y_val
from params import ridge_param_grid, elasticnet_param_grid, xgb_param_grid
from eval import eval_metrics


def train(model_name):

    # Select the model and parameter grid based on the input argument
    # Default: XGBRegressor
    model_cls = XGBRegressor
    param_grid = xgb_param_grid
    log_model = mlflow.xgboost.log_model
    if model_name == 'ElasticNet':
        model_cls = ElasticNet
        param_grid = elasticnet_param_grid
        log_model = mlflow.sklearn.log_model
    elif model_name == 'Ridge':
        model_cls = Ridge
        param_grid = ridge_param_grid
        log_model = mlflow.sklearn.log_model
    else:
        # Defaults to XGBRegressor if --model is not provided or is incorrect
        pass

    # NOTE: We usually don't hard-code any experiment/run names/IDs;
    # these are set dynamically,
    # and here MLproject contains the important configuration,
    # so that the script/code stays generic!
    # Also, it is common to have a run.py file which uses hard-coded values (i.e., URIs).
    # Loop through the hyperparameter combinations and log results in separate runs
    for params in ParameterGrid(param_grid):
        with mlflow.start_run():
            # Fit the model
            model = model_cls(**params)
            model.fit(X_train, y_train)

            # Evaluate the trained model
            y_pred = model.predict(X_val)
            metrics = eval_metrics(y_val, y_pred)

            # Log the inputs and parameters
            mlflow.log_params(params)
            mlflow.log_metrics(metrics)

            # Log the trained model
            log_model(
                model,
                model_name,
                input_example=X_train[:5],
                code_paths=['train.py', 'data.py', 'params.py', 'eval.py']
            )

if __name__ == "__main__":
    # Parse arguments with a default model
    parser = argparse.ArgumentParser(description='Train a model.')
    parser.add_argument('--model', type=str, choices=['ElasticNet', 'Ridge', 'XGBRegressor'], default='XGBRegressor', help='The model to train. Defaults to XGBRegressor.')
    args = parser.parse_args()

    train(args.model)

```

#### MLproject file and Running Locally

The `MLproject` file contains only one entry point:

```yaml
name: "Housing Price Prediction"

conda_env: conda.yaml

entry_points:
  main:
    parameters:
      model: {type: string, default: "ElasticNet", choices: ["ElasticNet", "Ridge", "XGBRegressor"]}
    command: "python train.py --model {model}"
```

Even though we could run the training via the CLI, it is common to use the Python MLflow API, as done here with `run.py`:

```python
import mlflow

models = ["ElasticNet", "Ridge", "XGBRegressor"]
entry_point = "main"

# We will change this depending on local tests / AWS runs
#mlflow.set_tracking_uri("http://ec2-<IP-number>.<region>.amazonaws.com:5000")
mlflow.set_tracking_uri("http://127.0.0.1:5000")

for model in models:
    experiment_name = model
    mlflow.set_experiment(experiment_name)

    mlflow.projects.run(
        uri=".",
        entry_point=entry_point,
        parameters={"model": model},
        env_manager="conda"
    )
```

To use `run.py`:

```bash
# Terminal 1: Start the MLflow Tracking Server
conda activate mlflow
cd .../mlflow-housing-price-example
mlflow server

# Terminal 2: Run the pipeline
# Since we are running a hyperparameter tuning,
# the execution might take some time.
# WARNING: the XGBRegressor model has many parameter combinations!
conda activate mlflow
cd .../mlflow-housing-price-example
python run.py

# Browser: Open URI:5000 == http://127.0.0.1:5000
# We should see 3 experiments (Ridge, ElasticNet, XGBRegressor) with several runs each,
# containing metrics, parameters, artifacts (dataset & model), etc.
```

![MLflow UI: Local Runs](./assets/mlflow_local_runs_example.jpg)

After we have finished, we commit to the AWS CodeCommit repo.

### Setup AWS SageMaker

We log in with the IAM user credentials.

AWS SageMaker is the AWS service for anything related to ML: from notebooks to deployment.

To set SageMaker up, we need to:

- Add our repository.
- Create a new IAM role with permissions for: CodeCommit (repo), S3 (store), ECR (build the container image for deployment).
  - Note: we have created an IAM user, but now we need an IAM role; these are different things.
- Create a notebook instance: **even though it's called a notebook, it's really a JupyterLab server where we have Python scripts, a terminal and also notebooks; in fact, we're going to use the terminal along with scripts, not the notebooks**.

To add our repository to SageMaker:

    Search: SageMaker
    Left panel: Notebook > Git repositories
      Add repository
        We can choose AWS CodeCommit / GitHub / Other
        Choose AWS CodeCommit
        Select
          Our repo: mlflow-housing-price-example
          Branch: master
          Name: mlflow-housing-price-example
      Add repository

To create a new IAM role with the necessary permissions:

    Username (up left) > Security credentials
    Access management (left panel): (IAM) Roles > Create role
      AWS Service
      Use case: SageMaker
        The role will have SageMaker execution abilities
      Next
      Name
        Role name: house-price-role
      Create role

    Now, we need to give it more permissions: CodeCommit, S3, ECR (to build the container image)
    IAM Roles:
      Open role 'house-price-role'
      Permission policies: we should see AmazonSageMakerFullAccess
      Add permissions > Attach policies:
        AWSCodeCommitFullAccess
        AWSCodeCommitPowerUser
        AmazonS3FullAccess
        EC2InstanceProfileForImageBuilderECRContainerBuilds

To create a notebook instance on SageMaker:

    Left panel: Notebook > Notebook instances
      Create notebook instance
        Notebook instance settings
          Name: house-price-nb
          Instance type: ml.t3.large (smaller ones might fail)
        Permissions and encryption
          IAM role: house-price-role (just created)
          Enable root access to the notebook
        Git repositories: we add our repo
          Default repository: mlflow-housing-price-example - AWS CodeCommit
        Create notebook instance

Then, the notebook instance is created; we wait for it to be in service, and then we `Open JupyterLab`. We will see that the CodeCommit repository has been cloned to the JupyterLab file system: `/home/ec2-user/SageMaker/mlflow-housing-price-example`.

In the JupyterLab server instance, we can open

- Code files: however, we usually develop on our local machine.
- Terminals: however, the conda.yaml environment is not really installed.
- Notebooks: we don't really need to use them; we can do everything with our Python files and the terminal.
- etc.

If we locally/remotely modify anything in the files, we can synchronize with the repository as usual; in a terminal:

```bash
cd .../mlflow-housing-price-example
git pull
git push
```

Therefore, we can actually develop on our local machine and

- `git push` on our machine
- `git pull` on a SageMaker notebook terminal

**If we stop the SageMaker notebook instance to save costs, we need to restart it.** The notebook's local data will persist, but if we have installed anything in the environment (e.g., mlflow), we need to re-install it.

### Training on AWS SageMaker

We open the notebook instance on SageMaker:

    Left panel: Notebook > Notebook instances
      house-price-nb > Open JupyterLab

Then, in the JupyterLab instance:

- `git pull` in a terminal, just in case.
- Change the tracking server URI in `run.py` to the AWS EC2 public DNS (we could use `dotenv` instead...):

    ```
    PREVIOUS: http://127.0.0.1:5000
    NEW: http://ec2-<IP-number>.<region>.amazonaws.com:5000 (Public IPv4 DNS from the EC2 instance)
    ```
- Run the `run.py` script in the terminal:

    ```bash
    # Go to the repo folder, if not already in there
    cd .../mlflow-housing-price-example

    # We install mlflow just to be able to load MLproject and execute run.py,
    # which will then set up the correct conda environment from conda.yaml
    pip install mlflow

    # Run all experiments
    python run.py
    ```
- Open the MLflow UI hosted on AWS with the browser: `http://ec2-<IP-number>.<region>.amazonaws.com:5000`

Now, we can see that the entries that appear in the server/UI hosted on AWS are similar to the local ones; however:

- On AWS, if we pick a run and check its artifact URIs, we'll see they are of the form `s3://<bucket-name>/...`.
- If we go to the S3 UI on the AWS web interface and open the created bucket, we will see the artifacts.

### Model Comparison and Evaluation

Model comparison and evaluation is done as briefly introduced in [MLflow UI - 01_tracking](#mlflow-ui---01_tracking):

- We pick an experiment.
- We select the runs we want, e.g., all.
- We click on `Compare`.
- We can use either the plots or the metrics.
- We select the best one (e.g., the one with the best desired metric), open the Run ID and click on `Register model`.
  - Create a model, e.g., `best`.
    - Note: if we create one model registry entry and assign the different experiment models as versions to it, we can compare the selected ones in the
      registry.
    - Alternative: we create a registry entry for each model: `elasticnet`, `ridge`, `xgboost`.
  - Go to the `Models` tab (main horizontal menu), click on a model version, add tags, e.g., `staging`.

![Comparing Runs: Plots](./assets/mlflow_comparing_runs_plot.jpg)
![Comparing Runs: Metrics](./assets/mlflow_comparing_runs_metrics.jpg)
![Comparing Runs: Registering](./assets/mlflow_register_ui.jpg)

Usually, we add the alias `production` to the final selected model.

To get the **model URI**, select the model version we want, go to the run link and find the artifact path, e.g.:

    s3://<bucket-name>/<experiment-id>/<run-id>/artifacts/XGBRegressor


### Deployment on AWS SageMaker

In order to deploy a model, first we need to build a parametrized Docker image with the command [`mlflow sagemaker build-and-push-container`](https://mlflow.org/docs/latest/cli.html#mlflow-sagemaker-build-and-push-container). This command doesn't directly deploy a model; instead, it prepares the necessary environment for model deployment, i.e., it sets up the necessary dependencies in an image.

```bash
# Open a terminal in the SageMaker JupyterLab instance
cd .../mlflow-housing-price-example
# We should see the conda.yaml in here

# Build the parametrized image and push it to ECR
# --container: image name
# --env-manager: local, virtualenv, conda
mlflow sagemaker build-and-push-container --container xgb --env-manager conda
```

We can check that the image is pushed to AWS ECR:

    Search: ECR
    Private registry (left panel): Repositories - the image with the name in --container should be there
      We can copy the image URI, e.g.
      <ID>.dkr.ecr.<region>.amazonaws.com/<image-name>:<tag>

After the image is pushed, we deploy a container of it by running `deploy.py`, where many image parameters are defined:

- Bucket name
- Image URL
- Instance type
- Endpoint name
- Model URI
- ...

These parameters are passed to the function [create_deployment](https://mlflow.org/docs/latest/python_api/mlflow.sagemaker.html?highlight=create_deployment#mlflow.sagemaker.SageMakerDeploymentClient.create_deployment):

```python
import mlflow.sagemaker
from mlflow.deployments import get_deploy_client

# Specify a unique endpoint name (lowercase letters)
# FIXME: some of the following should be (environment/repo) variables
endpoint_name = "house-price-prod"
# Get the model URI from the MLflow model registry (hosted on AWS)
model_uri = "s3://<bucket-name>/..."
# Get from IAM roles
execution_role_arn = "arn:aws:iam:..."
# Get from S3
bucket_name = "..."
# Get from ECR: complete image name with tag
image_url = "<ID>.dkr.ecr.<region>.amazonaws.com/<image-name>:<tag>"
flavor = "python_function"

# Define the missing configuration parameters as a dictionary:
# region, instance type, instance count, etc.
config = {
    "execution_role_arn": execution_role_arn,
    "bucket_name": bucket_name,
    "image_url": image_url,
    "region_name": "eu-central-1", # usually, same as the rest of the services
    "archive": False,
    "instance_type": "ml.m5.xlarge", # https://aws.amazon.com/sagemaker/pricing/instance-types/
    "instance_count": 1,
    "synchronous": True
}

# Initialize a deployment client for SageMaker
client = get_deploy_client("sagemaker")

# Create the deployment
client.create_deployment(
    name=endpoint_name,
    model_uri=model_uri,
    flavor=flavor,
    config=config,
)
```

After we have manually modified `deploy.py` in the AWS JupyterLab instance, we can run it in a terminal:

```bash
# Go to the folder
cd .../mlflow-housing-price-example

# Install mlflow, if not present (e.g., if the notebook was re-started)
pip install mlflow

# Deploy: this takes about 5 minutes...
python deploy.py
```

When the deployment finishes, we should
see an endpoint entry in AWS:

    Search: SageMaker
    Inference (left menu/panel) > Endpoints
      house-price-prod should be there

Now, the endpoint is up and running, ready to be used for inference.

### Model Inference

Finally, we can run the inference, either locally (if the `.env` file is properly set) or remotely (in the SageMaker notebook instance); `predict.py` is the script for that:

```python
from data import test
import boto3
import json
from dotenv import load_dotenv

# Load the environment variables in .env:
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION
load_dotenv()

# Defined in deploy.py
# FIXME: These values should be (environment/repo) variables
endpoint_name = "house-price-prod"
region = 'eu-central-1'

sm = boto3.client('sagemaker', region_name=region)
smrt = boto3.client('runtime.sagemaker', region_name=region)

test_data_json = json.dumps({'instances': test[:20].toarray()[:, :-1].tolist()})

prediction = smrt.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=test_data_json,
    ContentType='application/json'
)

prediction = prediction['Body'].read().decode("ascii")

print(prediction)
```

### Clean Up

We need to remove these AWS services (in the region we have worked in):

- [x] SageMaker Endpoint (Inference)
- [x] SageMaker Notebook
- [x] S3 bucket
- [x] ECR image (Repository)
- [x] EC2 instance (Terminate)

We can keep the role and the keys.

## Authorship

Mikel Sagardia, 2024.  
You are free to use this guide as you wish, but please link back to the source and don't forget the original, referenced creators, who did the hardest work of compiling all the information.  
No guarantees.

## Interesting Links

- My notes: [mlops_udacity](https://github.com/mxagar/mlops_udacity)
- [From Experiments 🧪 to Deployment 🚀: MLflow 101 | Part 01](https://medium.com/towards-artificial-intelligence/from-experiments-to-deployment-mlflow-101-40638d0e7f26)
- [From Experiments 🧪 to Deployment 🚀: MLflow 101 | Part 02](https://medium.com/@afaqueumer/from-experiments-to-deployment-mlflow-101-part-02-f386022afdc6)
- [Comprehensive Guide to MlFlow](https://towardsdatascience.com/comprehensive-guide-to-mlflow-b84086b002ae)
- [Streamline Your Machine Learning Workflow with MLFlow](https://www.datacamp.com/tutorial/mlflow-streamline-machine-learning-workflow)
- [MLOps-Mastering MLflow: Unlocking Efficient Model Management and Experiment Tracking](https://medium.com/gitconnected/mlops-mastering-mlflow-unlocking-efficient-model-management-and-experiment-tracking-d9d0e71cc697)