https://github.com/alan-turing-institute/learning-machines-drift

A Python package for monitoring dataset drift in secure environments

hut23


Host: GitHub
URL: https://github.com/alan-turing-institute/learning-machines-drift
Owner: alan-turing-institute
Created: 2022-03-02T11:34:12.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2023-03-22T12:09:56.000Z (over 2 years ago)
Last Synced: 2025-04-25T23:03:48.599Z (3 months ago)
Topics: hut23
Language: Python
Homepage:
Size: 3.82 MB
Stars: 4
Watchers: 5
Forks: 2
Open Issues: 6
Metadata Files:
- Readme: README.md


# Learning Machines

A Python package for monitoring dataset drift in production ML pipelines.

Built to run in any environment without uploading your data to external services.

## Background

See [background](background.md) for more on learning machines.

## Getting started

### Requirements

- Python 3.9

### Install

To install the latest version, run the following:

```shell
pip install -U learning-machines-drift
```

### Example usage

See the [simple example](examples/simple_example/main.py). The snippet below is also available as [readme_example.py](examples/simple_example/readme_example.py):

```python
from learning_machines_drift import Dataset, Display, FileBackend, Monitor, Registry
from learning_machines_drift.datasets import example_dataset

# Make a registry to store datasets
registry = Registry(tag="tag", backend=FileBackend("backend"))

# Save example reference dataset of 100 samples
registry.save_reference_dataset(Dataset(*example_dataset(100, seed=0)))

# Log example dataset with 80 samples
with registry:
    registry.log_dataset(Dataset(*example_dataset(80, seed=1)))

# Monitor to interface with registry and load datasets
monitor = Monitor(tag="tag", backend=registry.backend).load_data()

# Measure drift and display results as a table
Display().table(monitor.metrics.scipy_kolmogorov_smirnov())
```
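
For intuition about what a Kolmogorov-Smirnov drift metric measures, here is a minimal standalone sketch of the two-sample KS statistic using only the standard library. It is not this package's implementation (the method name above suggests the package delegates to SciPy). The helper `ks_statistic` and the synthetic Gaussian samples are hypothetical, chosen only for illustration.

```python
import bisect
import random


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs.

    Hypothetical helper for illustration; not part of learning-machines-drift.
    """
    ref_sorted = sorted(reference)
    cur_sorted = sorted(current)
    largest_gap = 0.0
    # The gap between two step functions can only change at a sample point,
    # so checking every observed value is enough to find the maximum.
    for value in ref_sorted + cur_sorted:
        # Fraction of each sample that is <= value (empirical CDF).
        ref_cdf = bisect.bisect_right(ref_sorted, value) / len(ref_sorted)
        cur_cdf = bisect.bisect_right(cur_sorted, value) / len(cur_sorted)
        largest_gap = max(largest_gap, abs(ref_cdf - cur_cdf))
    return largest_gap


# Synthetic data (an assumption, not the package's example_dataset).
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(100)]
same_dist = [rng.gauss(0.0, 1.0) for _ in range(80)]
shifted = [rng.gauss(2.0, 1.0) for _ in range(80)]

stat_same = ks_statistic(reference, same_dist)
stat_shifted = ks_statistic(reference, shifted)
print(f"same distribution: {stat_same:.3f}, shifted mean: {stat_shifted:.3f}")
```

The statistic lies between 0 and 1. Values near 0 mean the logged data resembles the reference data, and larger values point to drift in that feature.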

## Development

### Install

For a local copy:

```shell
git clone git@github.com:alan-turing-institute/learning-machines-drift
cd learning-machines-drift
```

To install:

```shell
poetry install
```

To install with `dev` and `docs` dependencies:

```shell
poetry install --with dev,docs
```

### Tests

Run:

```shell
poetry run pytest
```

### pre-commit checks

Run:

```shell
poetry run pre-commit run --all-files
```

To run checks before every commit, install as a pre-commit hook:

```shell
poetry run pre-commit install
```

## Other tools

An overview of existing tools and why we built something different:

- Cloud-based

    - [Azure dataset monitor](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-monitor-datasets?tabs=python)

- Python

    - [Evidently](https://github.com/evidentlyai/evidently)

    - [whylogs](https://github.com/whylabs/whylogs)

- ML pipelines: end-to-end machine learning lifecycle

    - [MLflow](https://mlflow.org/)

### What LM does differently

- No vendor lock-in

- Runs on any platform, in any environment (your local machine, cloud, on-premises)

- Works with existing Python frameworks (e.g. scikit-learn)

- Open source
