Lightweight ML Experiment Logging 📖
https://github.com/mle-infrastructure/mle-logging

Host: GitHub
URL: https://github.com/mle-infrastructure/mle-logging
Owner: mle-infrastructure
License: mit
Created: 2021-08-03T13:27:02.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2023-03-07T16:37:05.000Z (over 1 year ago)
Last Synced: 2024-01-25T21:37:34.983Z (5 months ago)
Language: Python
Homepage: https://mle-infrastructure.github.io/mle_logging
Size: 9.4 MB
Stars: 77
Watchers: 1
Forks: 6
Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

Lists

awesome-stars - mle-infrastructure/mle-logging - A Lightweight ML Experiment Logging Tool 📖 (Python)
awesome-stars - mle-infrastructure/mle-logging - Lightweight ML Experiment Logging 📖 (Python)

README

# A Lightweight Logger for ML Experiments 📖

[![Pyversions](https://img.shields.io/pypi/pyversions/mle-logging.svg?style=flat-square)](https://pypi.python.org/pypi/mle-logging)

[![PyPI version](https://badge.fury.io/py/mle-logging.svg)](https://badge.fury.io/py/mle-logging)

[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

[![codecov](https://codecov.io/gh/mle-infrastructure/mle-logging/branch/main/graph/badge.svg)](https://codecov.io/gh/mle-infrastructure/mle-logging)

[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mle-infrastructure/mle-logging/blob/main/examples/getting_started.ipynb)



`mle-logging` provides simple logging of statistics, model checkpoints, plots, and other objects for your Machine Learning Experiments (MLE). The `MLELogger` also aggregates results across multiple random seeds and combines runs from multiple configurations. For a quickstart, check out the [notebook blog](https://github.com/mle-infrastructure/mle-logging/blob/main/examples/getting_started.ipynb) 🚀

## The API 🎮

```python
from mle_logging import MLELogger

# Instantiate logging to experiment_dir
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="experiment_dir/",
                model_type='torch')

time_tic = {'num_updates': 10, 'num_epochs': 1}
stats_tic = {'train_loss': 0.1234, 'test_loss': 0.1235}

# Update the log with collected data & save it to .hdf5
log.update(time_tic, stats_tic)
log.save()
```

You can also log model checkpoints, matplotlib figures, and other `.pkl`-compatible objects.

```python
import matplotlib.pyplot as plt
import torchvision.models as models

# Save a model (torch, tensorflow, sklearn, jax, numpy)
model = models.resnet18()
log.save_model(model)

# Save a matplotlib figure as .png
fig, ax = plt.subplots()
log.save_plot(fig)

# You can also save (somewhat) arbitrary objects as .pkl
some_dict = {"hi": "there"}
log.save_extra(some_dict)
```

Or do everything in a single line...

```python
# some_dict is the extra object defined in the previous snippet
log.update(time_tic, stats_tic, model, fig, some_dict, save=True)
```

### File Structure & Re-Loading 📚

![](https://github.com/mle-infrastructure/mle-logging/blob/main/docs/mle_logger_structure.png?raw=true)

The `MLELogger` will create a nested directory, which looks as follows:

```
experiment_dir
├── extra: Stores saved .pkl object files
├── figures: Stores saved .png figures
├── logs: Stores .hdf5 log files (meta, stats, time)
├── models: Stores different model checkpoints
│   ├── init: Stores initial checkpoint
│   ├── final: Stores most recent checkpoint
│   ├── every_k: Stores every k-th checkpoint provided in update
│   └── top_k: Stores portfolio of top-k checkpoints based on performance
├── tboards: Stores tensorboards for model checkpointing
└── .json: Copy of configuration file (if provided)
```
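
To check which parts of this layout exist for a given run, a small standard-library helper can be enough. The following is a minimal sketch and not part of `mle-logging`: the function `missing_subdirs` and the constant `EXPECTED_SUBDIRS` are hypothetical names, and the folder list is copied from the tree above under the assumption that it reflects a fully populated experiment. The logger may well create only the folders it actually needs.

```python
import tempfile
from pathlib import Path

# Folder names copied from the tree above. Treat this list as an assumption,
# not as a guarantee of what the logger creates for every run.
EXPECTED_SUBDIRS = [
    "extra",
    "figures",
    "logs",
    "models/init",
    "models/final",
    "models/every_k",
    "models/top_k",
    "tboards",
]


def missing_subdirs(experiment_dir):
    """Return the expected sub-directories that do not exist (hypothetical helper)."""
    root = Path(experiment_dir)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


# Demo on a throwaway directory that mimics a partially written experiment.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["logs", "models/final", "figures"]:
        (Path(tmp) / name).mkdir(parents=True)
    demo_missing = missing_subdirs(tmp)

print(demo_missing)
```

Calling `missing_subdirs("experiment_dir/")` on a real run lists the folders that were never written, for example `figures` if no plot was saved.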

For visualization and post-processing, load the results via `load_log`:

```python
from mle_logging import load_log

log_out = load_log("experiment_dir/")

# The results can be accessed via meta, stats and time keys
# >>> log_out.meta.keys()
# odict_keys(['experiment_dir', 'extra_storage_paths', 'fig_storage_paths', 'log_paths', 'model_ckpt', 'model_type'])
# >>> log_out.stats.keys()
# odict_keys(['test_loss', 'train_loss'])
# >>> log_out.time.keys()
# odict_keys(['time', 'num_epochs', 'num_updates', 'time_elapsed'])
```

If an experiment was aborted, you can reload and continue the previous run via the `reload=True` option:

```python
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="experiment_dir/",
                model_type='torch',
                reload=True)
```

## Installation ⏳

A PyPI installation is available via:

```
pip install mle-logging
```

If you want to get the most recent commit, please install directly from the repository:

```
pip install git+https://github.com/mle-infrastructure/mle-logging.git@main
```

## Advanced Options 🚴

### Merging Multiple Logs 👫

**Merging Multiple Random Seeds** 🌱 + 🌱

```python
from mle_logging import load_log, merge_seed_logs

merge_seed_logs("multi_seed.hdf", "experiment_dir/")
log_out = load_log("experiment_dir/")
# >>> log_out.eval_ids
# ['seed_1', 'seed_2']
```

**Merging Multiple Configurations** 🔖 + 🔖

```python
from mle_logging import merge_config_logs, load_meta_log

merge_config_logs(experiment_dir="experiment_dir/",
                  all_run_ids=["config_1", "config_2"])
meta_log = load_meta_log("multi_config_dir/meta_log.hdf5")
# >>> meta_log.eval_ids
# ['config_2', 'config_1']
# >>> meta_log.config_1.stats.test_loss.keys()
# odict_keys(['mean', 'std', 'p50', 'p10', 'p25', 'p75', 'p90'])
```
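
The keys `mean`, `std` and `p10` to `p90` suggest that each tracked statistic is summarized across seeds at every logged step. The sketch below illustrates that idea with the standard library only. It is not the `mle-logging` implementation: `aggregate_seeds` and `percentile` are hypothetical helpers, the loss values are made up, and the choice of population standard deviation and linear percentile interpolation is an assumption.

```python
import math
import statistics


def percentile(values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a sequence."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def aggregate_seeds(curves):
    """Summarize per-seed curves (dict: seed id -> list of values) step by step."""
    steps = list(zip(*curves.values()))  # one tuple of seed values per step
    summary = {
        "mean": [statistics.fmean(s) for s in steps],
        "std": [statistics.pstdev(s) for s in steps],  # population std (assumption)
    }
    for q in (10, 25, 50, 75, 90):
        summary[f"p{q}"] = [percentile(s, q) for s in steps]
    return summary


# Made-up test_loss curves for three seeds over two logging steps.
curves = {
    "seed_1": [1.0, 0.5],
    "seed_2": [2.0, 0.5],
    "seed_3": [3.0, 0.5],
}
summary = aggregate_seeds(curves)
print(summary["mean"])  # [2.0, 0.5]
print(summary["p50"])   # [2.0, 0.5]
```

Each entry of `summary` has one value per logging step, so it can be plotted against the tracked time variable in the same way as a single-seed curve.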

### Plotting of Logs 🧑‍🎨

```python
from mle_logging import load_meta_log

meta_log = load_meta_log("multi_config_dir/meta_log.hdf5")
meta_log.plot("train_loss", "num_updates")
```

### Storing Checkpoint Portfolios 📂

**Logging every k-th checkpoint update** ❗ ⏩ ... ⏩ ❗

```python
# Save every second checkpoint provided in log.update (stored in models/every_k)
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir='every_k_dir/',
                model_type='torch',
                ckpt_time_to_track='num_updates',
                save_every_k_ckpt=2)
```
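
One plausible reading of `save_every_k_ckpt=2` is a simple counter over the checkpoints passed to `log.update`. The following is a minimal sketch of that selection rule, not the library's code: `EveryKSelector` is a hypothetical class, and whether `mle-logging` starts counting at the first or at the k-th checkpoint is an assumption here.

```python
class EveryKSelector:
    """Decide which checkpoints to keep when storing only every k-th one."""

    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be a positive integer")
        self.k = k
        self.num_seen = 0  # number of checkpoints offered so far

    def should_store(self):
        """Register one offered checkpoint and return True if it should be kept."""
        self.num_seen += 1
        return self.num_seen % self.k == 0


selector = EveryKSelector(k=2)
# Made-up values of the 'num_updates' time variable at each checkpoint.
update_steps = [10, 20, 30, 40, 50]
stored_steps = [step for step in update_steps if selector.should_store()]
print(stored_steps)  # [20, 40]
```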

**Logging top-k checkpoints based on metric** 🔱

```python
# Save top-3 checkpoints provided in log.update (stored in models/top_k)
# Based on minimizing the test_loss metric
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="top_k_dir/",
                model_type='torch',
                ckpt_time_to_track='num_updates',
                save_top_k_ckpt=3,
                top_k_metric_name="test_loss",
                top_k_minimize_metric=True)
```
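
The bookkeeping behind a top-k portfolio can be sketched in a few lines: keep the k best metric values seen so far and drop the worst entry whenever a better checkpoint arrives. The class below is a hypothetical illustration with made-up loss values, not the `mle-logging` implementation. In particular, dropping the newer checkpoint on a tie is an assumption.

```python
class TopKPortfolio:
    """Track the k best (metric value, step) pairs seen so far."""

    def __init__(self, k, minimize=True):
        self.k = k
        self.minimize = minimize
        self.entries = []  # (metric value, step), best first

    def offer(self, metric_value, step):
        """Offer a checkpoint and return True if it enters the portfolio."""
        self.entries.append((metric_value, step))
        # Best first: ascending when minimizing, descending otherwise.
        # The sort is stable, so on ties the older checkpoint stays ahead.
        self.entries.sort(key=lambda entry: entry[0], reverse=not self.minimize)
        self.entries = self.entries[: self.k]
        return (metric_value, step) in self.entries


portfolio = TopKPortfolio(k=3, minimize=True)
# Made-up (num_updates, test_loss) pairs from five calls to an update step.
history = [(10, 0.9), (20, 0.5), (30, 0.7), (40, 0.4), (50, 0.8)]
accepted = [portfolio.offer(loss, step) for step, loss in history]

print(accepted)                                  # [True, True, True, True, False]
print([step for _, step in portfolio.entries])   # [40, 20, 30]
```

With `minimize=False` the same class keeps the k largest values, which mirrors the role of `top_k_minimize_metric` for metrics such as accuracy.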

### Weights & Biases Backend Integration 🧑‍🎨

You can also use W&B as a backend for logging. All results are stored as before and are additionally reported to the W&B server:

```python
# Provide all configuration details as option
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                use_wandb=True,
                wandb_config={
                  "key": "sadfasd",  # Only needed if not logged in
                  "entity": "roberttlange",  # Only needed if not logged in
                  "project": "some-project-name",
                  "group": "some-group-name"
                })
```

### Citing the MLE-Infrastructure ✏️

If you use `mle-logging` in your research, please cite it as follows:

```
@software{mle_infrastructure2021github,
  author = {Robert Tjarko Lange},
  title = {{MLE-Infrastructure}: A Set of Lightweight Tools for Distributed Machine Learning Experimentation},
  url = {http://github.com/mle-infrastructure},
  year = {2021},
}
```

## Development 👷

You can run the test suite via `python -m pytest -vv tests/`. If you find a bug or are missing your favourite feature, feel free to create an issue and/or start [contributing](CONTRIBUTING.md) 🤗.