Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/sinhrks/daskperiment

Reproducibility for Humans: A lightweight tool for running reproducible machine learning experiments.




Host: GitHub
URL: https://github.com/sinhrks/daskperiment
Owner: sinhrks
License: bsd-3-clause
Created: 2019-01-24T23:34:59.000Z (almost 6 years ago)
Default Branch: master
Last Pushed: 2019-04-24T05:36:47.000Z (over 5 years ago)
Last Synced: 2024-06-23T16:31:59.775Z (5 months ago)
Topics: dask, machine-learning, reproducibility
Language: Python
Homepage:
Size: 2.22 MB
Stars: 24
Watchers: 3
Forks: 5
Open Issues: 17
Metadata Files:
- Readme: README.rst
- License: LICENSE


README

daskperiment
============

.. image:: https://img.shields.io/pypi/v/daskperiment.svg
  :target: https://pypi.python.org/pypi/daskperiment/
.. image:: https://readthedocs.org/projects/daskperiment/badge/?version=latest
  :target: http://daskperiment.readthedocs.org/en/latest/
  :alt: Latest Docs
.. image:: https://travis-ci.org/sinhrks/daskperiment.svg?branch=master
  :target: https://travis-ci.org/sinhrks/daskperiment
.. image:: https://codecov.io/gh/sinhrks/daskperiment/branch/master/graph/badge.svg
  :target: https://codecov.io/gh/sinhrks/daskperiment

Overview
~~~~~~~~

`daskperiment` is a tool for running reproducible machine learning experiments.
It lets users define trials and manage their history
(given parameters, results, and execution environment).

The package is built on `Dask`, a package for parallel computing with task
scheduling. Each experiment trial is internally expressed as a `Dask` computation
graph and can be executed in parallel.
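
As a rough illustration of what expressing a trial as a computation graph means, the sketch below builds a toy graph in plain Python, using the dict-of-tasks layout that `Dask` graphs also follow. The step functions (`prepare_data`, `calculate_score`), the parameter values, and the `evaluate` helper are hypothetical and are not part of `daskperiment` or `Dask`. A real scheduler would also run independent tasks in parallel, which this sequential sketch does not attempt.

```python
def prepare_data(a, b):
    # Hypothetical experiment step: combine two parameters.
    return a + b


def calculate_score(s):
    # Hypothetical experiment step: derive a score from the prepared data.
    return 10 / s


# Each key maps either to a literal value (a parameter) or to a task
# written as (function, *argument_keys).
graph = {
    "a": 2,
    "b": 3,
    "data": (prepare_data, "a", "b"),
    "score": (calculate_score, "data"),
}


def evaluate(graph, key):
    """Recursively resolve a key by evaluating the tasks it depends on."""
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        resolved = [
            evaluate(graph, arg) if isinstance(arg, str) and arg in graph else arg
            for arg in args
        ]
        return func(*resolved)
    return task


print(evaluate(graph, "score"))  # 2.0
```

Changing `"a"` or `"b"` and re-evaluating `"score"` corresponds to running a new trial with different parameters.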

Benefits
~~~~~~~~

- Compatible with standard Python/Jupyter environments (and optionally with a standard key-value store).

  - No need to set up server applications

  - No need to register with any cloud services

  - Runs on standard or customized Python shells

- Intuitive user interface

  - Few modifications to existing code are needed

  - Trial histories are logged automatically (no need to write additional logging code)

  - `Dask` compatible API

  - Easily accessible experiment history (with basic `pandas` operations)

  - Less management work in Git (no need to create a branch per trial)

  - (Experimental) Web dashboard to manage trial history

- Traceability of experiment related information

  - Trial results and their (hyper)parameters

  - Code contexts

  - Environment information

    - Device information

    - OS information

    - Python version

    - Installed Python packages and their versions

    - Git information

- Reproducibility

  - Check function purity (each step should return the same output for the same inputs)

  - Automatic random seeding

- Auto saving and loading of previous experiment history

- Parallel execution of experiment steps

- Experiment sharing

  - Redis backend

  - MongoDB backend
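
The two reproducibility items above (function purity checks and automatic random seeding) can be sketched with the standard library alone. This is only an assumption about the general idea, not `daskperiment`'s actual implementation. The helpers `run_seeded` and `is_reproducible` and the two example steps are hypothetical names introduced for illustration.

```python
import random


def run_seeded(func, seed, *args):
    """Seed Python's global RNG, then call func, so that runs are repeatable."""
    random.seed(seed)
    return func(*args)


def is_reproducible(func, seed, *args):
    """Run func twice with the same seed and inputs and compare the outputs."""
    return run_seeded(func, seed, *args) == run_seeded(func, seed, *args)


def noisy_score(x):
    # Uses the global RNG, so it is repeatable only when the RNG is seeded.
    return x + random.random()


_calls = []


def impure_score(x):
    # Depends on hidden mutable state, so repeated calls return different values.
    _calls.append(x)
    return x + len(_calls)


print(is_reproducible(noisy_score, 42, 1.0))   # True
print(is_reproducible(impure_score, 42, 1.0))  # False
```

Seeding makes the random step repeatable, while the step that mutates module-level state still fails the check. That second case is the kind of impurity a purity check is meant to flag.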

Future Scope
~~~~~~~~~~~~

- More efficient execution

  - Skip execution when the parameters a step depends on are unchanged

  - Distributed execution
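
Skipping execution when parameters are unchanged is listed here as future work, so the following is only a hypothetical sketch of how it could look, not a description of existing behavior. It caches each step's result under a hash of the step name and its parameters. The names `cache_key`, `run_step`, and `prepare_data` are assumptions made for illustration.

```python
import hashlib
import json

_cache = {}
executions = []  # Records which steps actually ran, for demonstration.


def cache_key(step_name, params):
    """Build a stable key from the step name and its JSON-serializable parameters."""
    payload = json.dumps({"step": step_name, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_step(step_name, func, **params):
    """Run func only if this step has not been run with the same parameters."""
    key = cache_key(step_name, params)
    if key not in _cache:
        executions.append(step_name)
        _cache[key] = func(**params)
    return _cache[key]


def prepare_data(a, b):
    # Hypothetical experiment step.
    return a + b


first = run_step("prepare_data", prepare_data, a=1, b=2)   # executed
second = run_step("prepare_data", prepare_data, a=1, b=2)  # served from the cache
third = run_step("prepare_data", prepare_data, a=1, b=3)   # parameters changed, executed

print(first, second, third)  # 3 3 4
print(len(executions))       # 2
```

This approach is only safe for pure steps, since a cached result is reused without re-running the function.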