https://github.com/lazauk/azureml-monitoring-datacollector
Real-time data collection from the model behind Azure Machine Learning's managed online endpoint.
- Host: GitHub
- URL: https://github.com/lazauk/azureml-monitoring-datacollector
- Owner: LazaUK
- License: MIT
- Created: 2024-05-08T21:13:42.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-19T14:39:46.000Z (6 months ago)
- Last Synced: 2025-01-12T05:11:08.871Z (4 months ago)
- Topics: azure, data-collector, inference, machine-learning, monitoring
- Language: Jupyter Notebook
- Homepage:
- Size: 399 KB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Inference data collection from Azure ML managed online endpoints
As described in the [Azure ML documentation](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-collect-production-data), you can use the _azureml-ai-monitoring_ Python package to collect real-time inference data received and produced by your machine learning model deployed to an Azure ML managed online endpoint.
This repo provides all the required resources to deploy and test a Data Collector solution end-to-end.
## 1 - Dependency files
Successful deployment depends on the following 3 files, borrowed from the original [Azure ML examples](https://github.com/Azure/azureml-examples/tree/main/sdk/python/endpoints/online/model-1) repo: the _inference model_, the _environment configuration_ and the _scoring script_.

### 1.1 - Inference model
**_sklearn_regression_model.pkl_** is a sample scikit-learn regression model in pickle format. We'll re-use it "as is".

### 1.2 - Environment configuration
**_conda.yaml_** is our Conda file, which defines the runtime environment for our machine learning model. It has been modified to include the following AzureML monitoring Python package:
```
azureml-ai-monitoring
```

### 1.3 - Scoring script
**_score_datacollector.py_** is the Python script used by the managed online endpoint to feed data to, and retrieve predictions from, our inference model. The script was updated to enable data collection: the _Collector_ and _BasicCorrelationContext_ classes are referenced, along with the _pandas_ package. The inclusion of pandas is crucial, as at the time of writing the Data Collector could only log pandas DataFrames directly.
``` Python
from azureml.ai.monitoring import Collector
from azureml.ai.monitoring.context import BasicCorrelationContext
import pandas as pd
```
The _init_ function initialises the global Data Collector variables:
``` Python
def init():
    global inputs_collector, outputs_collector, artificial_context
    inputs_collector = Collector(name='model_inputs')
    outputs_collector = Collector(name='model_outputs')
    # a fixed context, used to correlate each request's inputs with its outputs
    artificial_context = BasicCorrelationContext(id='Laziz_Demo')
```
"_model_inputs_" and "_model_outputs_" are reserved Data Collector names, used to auto-register the relevant Azure ML data assets.
The _run_ function contains two data processing blocks. First, we convert the input inference data into a pandas DataFrame, to log it along with our correlation context:
``` Python
input_df = pd.DataFrame(data)
context = inputs_collector.collect(input_df, artificial_context)
```
The same operation is then performed on the model's prediction, to log it as the Data Collector's output:
``` Python
output_df = pd.DataFrame(result)
outputs_collector.collect(output_df, context)
```
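For orientation, below is a minimal sketch of how the complete _run_ function could tie these two blocks together. The payload shape, the _json_ parsing and the _model_ global (assumed to be loaded in _init_) are illustrative assumptions, not code taken from this repo:
``` Python
import json

def run(raw_data):
    # parse the incoming JSON payload (shape assumed: {"data": [...]})
    data = json.loads(raw_data)["data"]

    # log the model inputs together with the artificial correlation context
    input_df = pd.DataFrame(data)
    context = inputs_collector.collect(input_df, artificial_context)

    # score the request with the model (assumed to be loaded in init())
    result = model.predict(data)

    # log the model outputs, correlated with their inputs via the returned context
    output_df = pd.DataFrame(result)
    outputs_collector.collect(output_df, context)

    return result.tolist()
```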
## 2 - Solution deployment and testing
To deploy and test the Data Collector, execute the cells in the provided Jupyter notebook.

### 2.1 - System configuration
You will need to set the values of your Azure subscription ID, resource group name and Azure ML workspace name.
``` Python
subscription_id = ""
resource_group_name = ""
workspace_name = ""
```
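For reference, here is a minimal sketch of how these values are typically turned into an Azure ML workspace client with the SDK v2; the use of _DefaultAzureCredential_ is an assumption and requires the _azure-ai-ml_ and _azure-identity_ packages:
``` Python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# authenticate and connect to the target Azure ML workspace
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id=subscription_id,
    resource_group_name=resource_group_name,
    workspace_name=workspace_name,
)
```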
### 2.2 - Model deployment options
You may upload a local model for initial testing (Option 1).
``` Python
model = Model(path = "./model/sklearn_regression_model.pkl")
```
However, the recommended and more robust option is to register the model in your Azure ML workspace (Option 2), as it provides better management control, eliminates repeated model uploads and makes the testing results more reproducible.
``` Python
file_model = Model(
path="./model/",
type=AssetTypes.CUSTOM_MODEL,
name="scikit-model",
description="SciKit model created from local file",
)
ml_client.models.create_or_update(file_model)
```

### 2.3 - Model and environment references
Ensure that you reference the right versions of your registered inference model and environment.
``` Python
model = "scikit-model:1"
env = "azureml:scikit-env:2"
```
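The _scikit-env_ reference above assumes the environment was registered beforehand. A minimal sketch of such a registration is shown below; the conda file path and base image are illustrative assumptions:
``` Python
from azure.ai.ml.entities import Environment

# register a Conda-based environment (path and base image are illustrative)
env_asset = Environment(
    name="scikit-env",
    conda_file="./environment/conda.yaml",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
)
ml_client.environments.create_or_update(env_asset)
```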
### 2.4 - Activation of Data Collector objects
You need to explicitly enable both the input and output collectors referenced in the scoring script.
``` Python
collections = {
'model_inputs': DeploymentCollection(
enabled="true",
),
'model_outputs': DeploymentCollection(
enabled="true",
)
}

data_collector = DataCollector(collections=collections)
```
This object can then be passed to the _data_collector_ parameter of **_ManagedOnlineDeployment_**:
``` Python
deployment = ManagedOnlineDeployment(
...
data_collector=data_collector
)
```
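For orientation, here is a fuller sketch of what the deployment object could look like; the code path, the scoring script location, the instance size and the _endpoint_name_ variable are illustrative assumptions:
``` Python
from azure.ai.ml.entities import CodeConfiguration, ManagedOnlineDeployment

# illustrative "blue" deployment, wiring in the Data Collector from Step 2.4
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=endpoint_name,
    model=model,
    environment=env,
    code_configuration=CodeConfiguration(
        code="./code", scoring_script="score_datacollector.py"
    ),
    instance_type="Standard_DS3_v2",
    instance_count=1,
    data_collector=data_collector,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```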
### 2.5 - Testing data collection process with sample request data
Once the inference model is deployed to the managed online endpoint, you can test data logging with the provided _sample-request.json_ file.
``` Python
ml_client.online_endpoints.invoke(
endpoint_name=endpoint_name,
deployment_name="blue",
request_file="./sample-request.json",
)
```
Logged inference data can be found in **_workspaceblobstore (Default)_**, unless you define custom paths for the input and output data collectors in Step 2.4 above.
