Osprey
======
[![Build Status](https://travis-ci.org/pandegroup/osprey.svg?branch=master)](https://travis-ci.org/pandegroup/osprey) [![PyPi version](https://pypip.in/v/osprey/badge.svg)](https://pypi.python.org/pypi/osprey/) [![Supported Python versions](https://pypip.in/py_versions/osprey/badge.svg)](https://pypi.python.org/pypi/osprey/) [![License](https://pypip.in/license/osprey/badge.svg)](https://pypi.python.org/pypi/osprey/)
[![Documentation Status](https://readthedocs.org/projects/osprey/badge/?version=latest)](http://osprey.rtfd.org)

osprey is an easy-to-use tool for hyperparameter optimization of machine
learning algorithms in Python, using scikit-learn (or any estimator that
follows a scikit-learn-compatible API).

Each osprey experiment combines a dataset, an estimator, a search space
(and search engine), cross-validation, and asynchronous serialization of
trials for distributed parallel optimization of model hyperparameters.


[Full documentation](http://osprey.rtfd.org)

Example (with [MSMBuilder](https://github.com/msmbuilder/msmbuilder) models/datasets)
-------------------------------------------------------------
```
$ cat config.yaml
estimator:
  eval_scope: msmbuilder
  eval: |
    Pipeline([
        ('featurizer', DihedralFeaturizer(types=['phi', 'psi'])),
        ('cluster', MiniBatchKMeans()),
        ('msm', MarkovStateModel(n_timescales=5, verbose=False)),
    ])

search_space:
  cluster__n_clusters:
    min: 10
    max: 100
    type: int
  featurizer__types:
    choices:
      - ['phi', 'psi']
      - ['phi', 'psi', 'chi1']
    type: enum

cv: 5

dataset_loader:
  name: mdtraj
  params:
    trajectories: ~/local/msmbuilder/Tutorial/XTC/*/*.xtc
    topology: ~/local/msmbuilder/Tutorial/native.pdb
    stride: 1

trials:
  uri: sqlite:///osprey-trials.db
```

Then run `osprey worker`. You can also run multiple instances of
`osprey worker` in parallel, for example distributed across a cluster.

```
$ osprey worker config.yaml
======================================================================
= osprey is a tool for machine learning hyperparameter optimization. =
======================================================================

osprey version: 0.2_10_g18392d9_dirty-py2.7.egg
time: October 27, 2014 10:44 PM
hostname: dn0a230538.sunet
cwd: /private/var/folders/yb/vpt17lxs67vf02qpvgvjrc5m0000gn/T/tmpDgBwlU
pid: 99407

Loading config file: config.yaml...
Loading trials database: sqlite:///osprey-trials.db (table = "trials")...

Loading dataset...
100 elements without labels
Instantiated estimator:
Pipeline(steps=[('featurizer', DihedralFeaturizer(sincos=True, types=['phi', 'psi'])), ('tica', tICA(gamma=0.05, lag_time=1, n_components=4, weighted_transform=False)), ('cluster', MiniBatchKMeans(batch_size=100, compute_labels=True, init='k-means++',
init_size=None, max_iter=100, max_no_improvement=...toff=1, lag_time=1, n_timescales=5, prior_counts=0,
reversible_type='mle', verbose=False))])
Hyperparameter search space:
featurizer__types (enum) choices = (['phi', 'psi'], ['phi', 'psi', 'chi1'])
cluster__n_clusters (int) 10 <= x <= 100

----------------------------------------------------------------------
Beginning iteration 1 / 1
----------------------------------------------------------------------
History contains: 0 trials
Choosing next hyperparameters with random...
{'cluster__n_clusters': 20, 'featurizer__types': ['phi', 'psi']}

Fitting 5 folds for each of 1 candidates, totalling 5 fits
[Parallel(n_jobs=1)]: Done 1 jobs | elapsed: 0.3s
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 1.8s finished
---------------------------------
Success! Model score = 4.080646
(best score so far = 4.080646)
---------------------------------

1/1 models fit successfully.
time: October 27, 2014 10:44 PM
elapsed: 4 seconds.
osprey worker exiting.
```
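
Because every worker coordinates through the shared trials database, scaling out is simply a matter of starting more `osprey worker` processes against the same config. Below is a minimal sketch for launching a few workers on one machine, assuming a bash shell; the loop and log-file names are illustrative, not part of osprey.

```
# Launch four osprey workers in the background against the same config.
# They coordinate through the trials database declared in config.yaml.
for i in 1 2 3 4; do
    osprey worker config.yaml > "worker-${i}.log" 2>&1 &
done
wait    # block until all background workers have exited
```

On a cluster, you would typically submit the same `osprey worker config.yaml` command as an array of jobs through your scheduler, pointing every worker at the same `trials` URI.
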
You can dump the database to JSON or CSV with `osprey dump`.
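
For a quick look at the raw results without dumping them, the trials live in the SQLite file named in the config (the worker log above shows the table is called "trials"). A minimal sketch using the `sqlite3` command-line client, assuming it is installed:

```
# Count recorded trials and inspect the table layout.
$ sqlite3 osprey-trials.db 'SELECT COUNT(*) FROM trials;'
$ sqlite3 osprey-trials.db '.schema trials'
```
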

Installation
------------
```
# grab the latest version from github
$ pip install git+git://github.com/pandegroup/osprey.git
```

```
# or clone the repo yourself and run `setup.py`
$ git clone https://github.com/pandegroup/osprey.git
$ cd osprey && python setup.py install
```

Dependencies
------------
- `six`
- `pyyaml`
- `numpy`
- `scikit-learn`
- `sqlalchemy`
- `hyperopt` (recommended, required for `engine=hyperopt_tpe`)
- `scipy` (optional, for testing)
- `nose` (optional, for testing)

On Python 2.6, the `argparse` and `importlib` backports are also required.
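
If any of these are missing from your environment, they can be installed with pip. A minimal sketch using the package names listed above (versions unpinned; the second line covers the recommended and optional extras):

```
$ pip install six pyyaml numpy scikit-learn sqlalchemy
$ pip install hyperopt scipy nose    # recommended / optional extras
```
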