https://github.com/alan-turing-institute/learning-machines-drift
A Python package for monitoring dataset drift in secure environments
- Host: GitHub
- URL: https://github.com/alan-turing-institute/learning-machines-drift
- Owner: alan-turing-institute
- Created: 2022-03-02T11:34:12.000Z
- Default Branch: main
- Last Pushed: 2023-03-22T12:09:56.000Z
- Last Synced: 2024-03-20T20:58:16.115Z
- Topics: hut23
- Language: Python
- Size: 3.82 MB
- Stars: 4
- Watchers: 5
- Forks: 2
- Open Issues: 6
Metadata Files:
- Readme: README.md
# Learning Machines
A Python package for monitoring dataset drift in production ML pipelines.
Built to run in any environment without uploading your data to external services.
## Background
See [background](background.md) for more context on Learning Machines.
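Monitoring drift comes down to comparing the distribution of newly logged data against a reference dataset. The example usage below reports a `scipy_kolmogorov_smirnov` metric; as a hedged sketch of that kind of check, here is a per-feature two-sample Kolmogorov-Smirnov test run directly with SciPy's `scipy.stats.ks_2samp`. This is not the package's own implementation: the `ks_drift` helper, the `alpha=0.05` threshold and the synthetic data are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_drift(reference: np.ndarray, current: np.ndarray, alpha: float = 0.05) -> dict:
    """Hypothetical helper: run a two-sample KS test on each column.

    Returns {column_index: (ks_statistic, p_value, drifted)}, where `drifted`
    is True when the p-value falls below `alpha` (an assumed threshold).
    """
    results = {}
    for col in range(reference.shape[1]):
        res = ks_2samp(reference[:, col], current[:, col])
        results[col] = (float(res.statistic), float(res.pvalue), bool(res.pvalue < alpha))
    return results


# Synthetic data: 100 reference samples and 80 "logged" samples with 2 features.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(100, 2))
current = np.column_stack([
    rng.normal(0.0, 1.0, size=80),  # feature 0: same distribution as the reference
    rng.normal(1.5, 1.0, size=80),  # feature 1: mean shifted by 1.5, i.e. drifted
])

for col, (stat, p, drifted) in ks_drift(reference, current).items():
    print(f"feature {col}: KS={stat:.3f} p={p:.3g} drift={drifted}")
```

The KS statistic is the largest gap between the two empirical CDFs, so it needs no assumption about the shape of either distribution, which makes it a reasonable default for continuous features.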
## Getting started
### Requirements
- Python 3.9

### Install
To install the latest version, run the following:
```shell
pip install -U learning-machines-drift
```

### Example usage
A [simple example](examples/simple_example/main.py) is available, along with the snippet below (also in [readme_example.py](examples/simple_example/readme_example.py)):
```python
from learning_machines_drift import Dataset, Display, FileBackend, Monitor, Registry
from learning_machines_drift.datasets import example_dataset

# Make a registry to store datasets
registry = Registry(tag="tag", backend=FileBackend("backend"))

# Save example reference dataset of 100 samples
registry.save_reference_dataset(Dataset(*example_dataset(100, seed=0)))

# Log example dataset with 80 samples
with registry:
    registry.log_dataset(Dataset(*example_dataset(80, seed=1)))

# Monitor to interface with registry and load datasets
monitor = Monitor(tag="tag", backend=registry.backend).load_data()

# Measure drift and display results as a table
Display().table(monitor.metrics.scipy_kolmogorov_smirnov())
```

## Development
### Install
For a local copy:
```shell
git clone git@github.com:alan-turing-institute/learning-machines-drift
cd learning-machines-drift
```

To install:
```shell
poetry install
```

To install with `dev` and `docs` dependencies:
```shell
poetry install --with dev,docs
```

### Tests

Run the test suite with:
```shell
poetry run pytest
```

### pre-commit checks

Run all checks with:
```shell
poetry run pre-commit run --all-files
```

To run the checks before every commit, install them as a pre-commit hook:
```shell
poetry run pre-commit install
```

## Other tools
An overview of what else exists and why we have made something different:
- Cloud based
  - [Azure dataset monitor](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-monitor-datasets?tabs=python)
- Python
  - [Evidently](https://github.com/evidentlyai/evidently)
  - [whylogs](https://github.com/whylabs/whylogs)
- ML pipelines: end-to-end machine learning lifecycle
  - [MLFlow](https://mlflow.org/)

### What Learning Machines does differently
- No vendor lock-in
- Run on any platform, in any environment (your local machine, cloud, on-premises)
- Work with existing Python frameworks (e.g. scikit-learn)
- Open source