https://github.com/vsoch/watchme-sklearn

use watchme psutils monitor wrapper to record metrics for a set of sklearn functions
decorators monitoring psutils python watchme watchme-psutils


# Watchme Sklearn

This is an example of using [watchme](https://vsoch.github.io/watchme),
specifically the [psutils](https://vsoch.github.io/watchme/watchers/psutils/#1-the-monitor-pid-task) decorator,
to monitor resource usage for various functions run within Python. Since we build the dependencies
into a Singularity container, and since Singularity has access to
our home, the watcher and data are saved on the host with no extra work needed.

**Note** I created the watcher repository with watchme first, and
then added the README.md and the [container](Singularity) recipe.
If you use a decorator, you don't technically need to do this: the
decorated Python files can live separately from the watchme base that holds the
results. I wanted to keep them together, so I added these files
afterward.

## 1. Build the Container

First, build the Singularity container with Python dependencies installed:

```bash
sudo singularity build watchme-sklearn.sif Singularity
```

## 2. Run

Next, running the container creates a watcher called "watchme-sklearn",
which by default goes into your `$HOME/.watchme` folder. You'll see
the watcher generated, followed by the function runs:

```bash
$ singularity run watchme-sklearn.sif

Adding watcher /home/vanessa/.watchme/watchme-sklearn...
Generating watcher config /home/vanessa/.watchme/watchme-sklearn/watchme.cfg

=============================================================================
Manifold learning on handwritten digits: Locally Linear Embedding, Isomap...
=============================================================================

An illustration of various embeddings on the digits dataset.

The RandomTreesEmbedding, from the :mod:`sklearn.ensemble` module, is not
technically a manifold embedding method, as it learn a high-dimensional
representation on which we apply a dimensionality reduction method.
However, it is often useful to cast a dataset into a representation in
which the classes are linearly-separable.

t-SNE will be initialized with the embedding that is generated by PCA in
this example, which is not the default setting. It ensures global stability
of the embedding, i.e., the embedding does not depend on random
initialization.

Linear Discriminant Analysis, from the :mod:`sklearn.discriminant_analysis`
module, and Neighborhood Components Analysis, from the :mod:`sklearn.neighbors`
module, are supervised dimensionality reduction method, i.e. they make use of
the provided labels, contrary to other methods.

Computing random projection
Computing PCA projection
Computing Linear Discriminant Analysis projection
Computing Isomap projection
Done.
Computing LLE embedding
Done. Reconstruction error: 1.63546e-06
Computing modified LLE embedding
Done. Reconstruction error: 0.360659
Computing Hessian LLE embedding
Done. Reconstruction error: 0.212804
Computing LTSA embedding
Done. Reconstruction error: 0.212804
Computing MDS embedding
Done. Stress: 157308701.864713
Computing Spectral embedding
Computing t-SNE embedding
```

The functions run fairly quickly, so we sample every quarter of a second.
Watchme creates the git repository and commits data to it: each time a decorated function is run,
a `decorator-psutils-<function>` folder is created with a result.json. Every
commit coincides with the list of timepoints recorded for a single function run. Here is
what the repository looks like after the run (before adding the README.md and
container files):

```bash
$ tree
.
├── decorator-psutils-hessian_lle_embedding
│   ├── result.json
│   └── TIMESTAMP
├── decorator-psutils-isomap_projection
│   ├── result.json
│   └── TIMESTAMP
├── decorator-psutils-lda_projection
│   ├── result.json
│   └── TIMESTAMP
├── decorator-psutils-lle_embedding
│   ├── result.json
│   └── TIMESTAMP
├── decorator-psutils-ltsa_embedding
│   ├── result.json
│   └── TIMESTAMP
├── decorator-psutils-mds_embedding
│   ├── result.json
│   └── TIMESTAMP
├── decorator-psutils-modified_lle_embedding
│   ├── result.json
│   └── TIMESTAMP
├── decorator-psutils-pca_projection
│   ├── result.json
│   └── TIMESTAMP
├── decorator-psutils-plot_digits
│   ├── result.json
│   └── TIMESTAMP
├── decorator-psutils-plot_embedding
│   ├── result.json
│   └── TIMESTAMP
├── decorator-psutils-random_2d_projection
│   ├── result.json
│   └── TIMESTAMP
├── decorator-psutils-spectral_embedding
│   ├── result.json
│   └── TIMESTAMP
├── decorator-psutils-tsne_embedding
│   ├── result.json
│   └── TIMESTAMP
└── watchme.cfg

13 directories, 27 files
```
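To make the sampling idea concrete, here is a minimal standard-library sketch of a decorator that records a list of timepoints while a function runs. It is an illustration only and not watchme's implementation: `sample_every`, the `samples` attribute, and `busy_work` are hypothetical names, and it records only elapsed and CPU time from a background thread rather than the process metrics that the psutils decorator collects.

```python
import functools
import threading
import time


def sample_every(seconds=0.25):
    """Hypothetical decorator: record a timepoint every `seconds` while func runs."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            samples = []
            done = threading.Event()
            start = time.monotonic()

            def sampler():
                # Take one sample immediately, then one per interval until done.
                while True:
                    samples.append(
                        {
                            "elapsed": time.monotonic() - start,
                            "cpu_seconds": time.process_time(),
                        }
                    )
                    if done.wait(seconds):
                        break

            thread = threading.Thread(target=sampler, daemon=True)
            thread.start()
            try:
                result = func(*args, **kwargs)
            finally:
                done.set()
                thread.join()
            # Keep the timepoints of the most recent call on the wrapper.
            wrapper.samples = samples
            return result

        wrapper.samples = []
        return wrapper

    return decorator


@sample_every(seconds=0.05)
def busy_work(n):
    # Stand-in for a longer-running function such as an embedding.
    time.sleep(0.2)
    return sum(range(n))
```

After calling `busy_work(10)`, the list in `busy_work.samples` plays the role of the timepoints that end up in a single result.json.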

You can then push directly to a new GitHub repository:

```bash
cd $HOME/.watchme/watchme-sklearn
git remote add origin https://github.com/vsoch/watchme-sklearn.git
git push -u origin master
```

Consider adding a README to document what you've done.
Or you can export the full data for any particular decorator to analyze:

```bash
watchme export watchme-sklearn decorator-psutils-plot_digits result.json --json
```
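The folder argument follows the `decorator-psutils-<function>` naming shown in the tree listing. As a small sketch (the helper name `decorated_functions` is hypothetical and not part of watchme), the following lists which function names have a result.json available for export:

```python
from pathlib import Path

PREFIX = "decorator-psutils-"


def decorated_functions(repo_dir):
    """Return sorted function names that have a result.json in the watcher repo."""
    names = []
    for folder in Path(repo_dir).iterdir():
        if folder.is_dir() and folder.name.startswith(PREFIX):
            # Skip folders that never received a result file.
            if (folder / "result.json").is_file():
                names.append(folder.name[len(PREFIX):])
    return sorted(names)
```

Assuming the default location, you could call it as `decorated_functions(Path.home() / ".watchme" / "watchme-sklearn")`; prepending `PREFIX` to a returned name gives the folder to pass to `watchme export`.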

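What does the export do? Each commit coincides with a list of timepoints for a single function run, and the export collects the committed versions of result.json into a single file.
Here is a programmatic way to export all results to a "data" folder in the repository: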

```bash
mkdir -p data
for folder in $(find . -maxdepth 1 -type d -name 'decorator*' -print); do
    # Strip the leading "./" that find prepends
    folder="${folder#./}"
    watchme export watchme-sklearn "$folder" --out "data/$folder.json" result.json --json --force
done
```
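If you prefer to drive the export from Python, one possible approach is to build the same command lines first and inspect them before running anything. This is a sketch under assumptions: `export_commands` is a hypothetical helper, and it only mirrors the arguments used in the shell loop above without calling watchme itself.

```python
from pathlib import Path


def export_commands(repo_dir, watcher="watchme-sklearn", out_dir="data"):
    """Build one `watchme export` argv per decorator folder (nothing is executed)."""
    commands = []
    for folder in sorted(Path(repo_dir).glob("decorator*")):
        if not folder.is_dir():
            continue
        out_file = f"{out_dir}/{folder.name}.json"
        # Same arguments, in the same order, as the shell loop above.
        commands.append(
            ["watchme", "export", watcher, folder.name,
             "--out", out_file, "result.json", "--json", "--force"]
        )
    return commands
```

Each returned list can then be passed to `subprocess.run(cmd, check=True)` from inside the watcher repository, after creating the output folder.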

## Advanced

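If you already have a watchme repository, and it's located somewhere non-traditional,
you can have watchme generate results in the folder where you happen to be by
exporting `WATCHME_BASE_DIR` first. The base is the folder that contains the watcher,
so from inside the watcher repository it is the parent directory: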

```bash
export WATCHME_BASE_DIR=$(dirname $PWD)
```

For a run from within a Singularity container, you would need to export the variable with the `SINGULARITYENV_` prefix:

```bash
export SINGULARITYENV_WATCHME_BASE_DIR=$(dirname $PWD)
```
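The effect of the prefix can be sketched in a few lines of Python. Singularity strips `SINGULARITYENV_` and sets the remaining name inside the container; the resolver below is an assumption based on the behavior described in this README (`WATCHME_BASE_DIR` if set, otherwise `$HOME/.watchme`), and both function names are hypothetical rather than watchme's own code.

```python
import os

PREFIX = "SINGULARITYENV_"


def container_env(host_env):
    """Return the variables a SINGULARITYENV_ prefix would inject into the container."""
    return {
        name[len(PREFIX):]: value
        for name, value in host_env.items()
        if name.startswith(PREFIX) and len(name) > len(PREFIX)
    }


def watchme_base(env):
    """Hypothetical resolver: WATCHME_BASE_DIR if set, else $HOME/.watchme."""
    default = os.path.join(env.get("HOME", os.path.expanduser("~")), ".watchme")
    return env.get("WATCHME_BASE_DIR", default)
```

For example, with `SINGULARITYENV_WATCHME_BASE_DIR` set on the host, `watchme_base` evaluated on the host environment still returns the default, while the environment seen inside the container resolves to the exported directory.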