https://github.com/saleepeppe/logpose

Log library for python (suitable for ML prototyping and jupyter projects in general)
https://github.com/saleepeppe/logpose

data-science jupyter-notebook logging machine-learning python

Last synced: 3 months ago
JSON representation

Log library for python (suitable for ML prototyping and jupyter projects in general)

Host: GitHub
URL: https://github.com/saleepeppe/logpose
Owner: saleepeppe
License: apache-2.0
Created: 2018-03-25T18:42:45.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2018-06-28T21:28:02.000Z (about 8 years ago)
Last Synced: 2024-11-29T15:51:49.416Z (over 1 year ago)
Topics: data-science, jupyter-notebook, logging, machine-learning, python
Language: Python
Size: 34.2 KB
Stars: 2
Watchers: 2
Forks: 0
Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          # logpose

Log library for Data Science written in Python (suitable for ML prototyping and jupyter projects in general).

- Do you prototype in jupyter before deploy your code?

- Do you run several python simulations and after a while you forget to keep track of your trials?

- Are you looking for a python library to store all the hyper-parameters tested tuning a ML algorithm?

**Logpose** is here for you! Logpose generates YAML files to track each simulation run. 

At the moment the library is under development. Use it but be aware that the API might change.

I daily use this library for work, so it might not have reached a stable point but it is usable.

Any suggestion, feature request or support is more than welcome.

# Install

Logpose is also installable via pip:

```

pip install logpose

```

It might land on conda in the future.

# Usage

Let us assume you have a python script that resembles to:

```python

from random import random

def messy_code(a):

    print('Crazy python stuff here {}'.format(a))

# Sorry for trolling

my_params = random() 

messy_code(my_params)

my_params_2 = random()

messy_code(my_params_2)

```

You run this script several times per day and you need to keep track of what is happening.

Here it comes logpose.

```python

import logpose as lp

# Create your logpose

my_logpose = lp.Logpose('My Simulation', 'Try to rule the world')

# Add a trace

my_logpose.add_route('Preprocessing', 'Kill all the enemies')

my_params = random()

# Keep track of your input parameters

my_logpose.add_parameter('Preprocessing', ('my_params', my_params))

messy_code(my_params)

# Get some benchmarks

my_logpose.bench_it()

my_logpose.add_trace('Prototype', 'Conquer the Russia')

my_params_2 = random()

my_logpose.add_parameter('Prototype', ('my_params', my_params_2))

messy_code(my_params_2)

my_logpose.bench_it()

```

You only need the patience of changing the `'description'` parameter in `Logpose('name', 'description')` at each run, to make the logpose meaningful (for you) in the future.

Infact, at each run logpose saves a YAML file in the folder **.lp/{name}** of your project.

# More details

A jupyter notebook containing a script for data science can be summarised in:

1. **ETL**: retrive data from the source, loading data, feature selection, feature construction, etc.

2. **ML Model**: construction of the train, test set, adoption of serveral ML algorithms, cross validation 

3. **Results**: analysis over the metrics obtained via the previous step

This script might change several times, even during the same day. The dataset as well might change.

Whenever you run the script you have to keep track of many variables such as: the number of records in the dataset, the feature selected, the features constructed, the hyper-parameters of the ML algorithm, the metrics and so on.

Logpose creates a folder where all these information will be stored into YAML files and grouped by simulation name, which can be viewed and compared later on.

### The code

The logpose module contains mainly two relevant classes:

- `Logpose` generates logs with the help of its methods,

- `History` access the logs created via a `Logpose` object.

Let us assume we have a jupyter representing a simulation with a xgboost algorithm.

#### Logpose object

The `Logpose` object is a container which creates a folder named with the same name given to the object.

```python

import logpose as lp

# Create your logpose

my_logpose = lp.Logpose('xgboost', 'Attempt number 1 with xgboost, max_depth = 15')

```

The code above creates the path `/lp/xgboost` in the directory where the code file is located.

Since a `Logpose` instance is just an empty box, it is necessary to add task to log. A task is a unit piece of code to log, and in the framework of **logpose** is called a `Route`. A `Route` be created by calling the `add_route()` method.

```python

import logpose as lp

# Create your logpose

my_logpose = lp.Logpose('xgboost', 'Attempt number 1 with xgboost, max_depth = 15')

''' CODE MISSING '''

my_logpose.add_route('Model', 'No cross validation')

import xgboost as xgb

param = {'max_depth': 2, 'eta': 1, 'silent': 1, 'objective': 'binary:logistic' }

num_round = 2

bst = xgb.train(param, dtrain, num_round)

my_logpose.bench_it('Model')

my_logpose.save()

''' CODE MISSING '''

```

The code above shows how to create a `Route` and how to close it via the method `bench_it()`.

To summarise, the methods `add_route()` and `bench_it()` are respectively the opening and closing tag, wrapping the code we aim to log.

Now, the `Logpose` instance is aware of a task to log and it is also timing it. The code written above, would create a log file in the folder **./.lp/xgboost** in which are stored the description of the `Logpose`, the characteristics (name and description) of the `Route` and its execution time. The file is saved once the meth `save()` is called.

In order to track also the xgboost parameters, we have to use the method `add_parameters()`.

```python

import logpose as lp

# Create your logpose

my_logpose = lp.Logpose('xgboost', 'Attempt number 1 with xgboost, max_depth = 15')

''' CODE MISSING '''

my_logpose.add_route('Model', 'No cross validation')

import xgboost as xgb

param = {'max_depth': 2, 'eta': 1, 'silent': 1, 'objective': 'binary:logistic' }

my_logpose.add_parameters('Model', param)

num_round = 2

my_logpose.add_parameters('Model', ('num_round', 2))

bst = xgb.train(param, dtrain, num_round)

my_logpose.bench_it('Model')

my_logpose.save()

''' CODE MISSING '''

```

In this way also the parameters (`param` and `num_round`) would get saved inside the logfile.

Linked to the same logpose instance you can add multiple routes, structuring the log in the way you like. For example, following the scheme described above:

```python

import logpose as lp

# Create your logpose

my_logpose = lp.Logpose('xgboost', 'Attempt number 1 with xgboost, max_depth = 15')

my_logpose.add_route('Preprocessing', 'Remove missing values')

''' CODE MISSING '''

my_logpose.bench_it('Preprocessing')

my_logpose.add_route('Model', 'No cross validation')

import xgboost as xgb

param = {'max_depth': 2, 'eta': 1, 'silent': 1, 'objective': 'binary:logistic' }

my_logpose.add_parameters('Model', param)

num_round = 2

my_logpose.add_parameters('Model', ('num_round', 2))

bst = xgb.train(param, dtrain, num_round)

my_logpose.bench_it('Model')

my_logpose.add_route('Results', 'Metrics over train and validation sets')

''' CODE MISSING '''

my_logpose.bench_it('Results')

my_logpose.save()

```

#### History object

If you want to check what happened in a simulation where you have used a `Logpose`, you just need to check in the **./.lp** folder. 

Alternatively, it is possible to create a `History` object.

```python

history_log = lp.History('xgboost')

my_simulations = history_log.events

```

The `History` attribute `events` provides the list of all the YAML files located in the logpose folder **./.lp/xgboost/**. To load an event:

```python

import pandas

logpose_info, logpose_df = history_log.load_event(my_simulations[0], pandas = True)

```

When the `pandas` parameter is `True`, the method `load_event()` returns 2d-tuple where the second element is a pandas dataframe.

To compare all the events, the log files stored in the folder, you can use the method `compare()`.

```python

history_df = history_log.compare(pandas = True)

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/saleepeppe/logpose

Awesome Lists containing this project

README