Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/evizero/supervisedlearning.jl

Front-end interface for supervised machine learning
https://github.com/evizero/supervisedlearning.jl

Last synced: about 2 months ago
JSON representation

Front-end interface for supervised machine learning

Host: GitHub
URL: https://github.com/evizero/supervisedlearning.jl
Owner: Evizero
License: other
Created: 2015-08-20T14:29:39.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2016-05-13T16:00:16.000Z (over 8 years ago)
Last Synced: 2024-10-12T14:51:05.599Z (3 months ago)
Language: Julia
Size: 49.8 KB
Stars: 5
Watchers: 6
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md

Awesome Lists containing this project

README

        # SupervisedLearning

[![Project Status: Suspended - Initial development has started, but there has not yet been a stable, usable release; work has been stopped for the time being but the author(s) intend on resuming work.](http://www.repostatus.org/badges/latest/suspended.svg)](http://www.repostatus.org/#suspended)

[![License](http://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat)](LICENSE.md)

Work in progress for a front-end supervised learning framework. Currently the focus is on creating a pure Julia package for SVMs in [KSVM.jl](https://github.com/Evizero/KSVM.jl)

[![Build Status](https://travis-ci.org/Evizero/SupervisedLearning.jl.svg?branch=master)](https://travis-ci.org/Evizero/SupervisedLearning.jl)

The goal of this library is manyfold:

- **Education:** allow the user to play around with the models, solvers, etc. for educational purposes. Provide a good base for course exercises. For example visualizing the learning curve of neural networks using different optimization algorithms.

- **Research:** Swap out parts of the machine learning pipeline with custom implementations without losing the ability to utilize the rest of the framework. For example to prototype new prediction models.

- **Application:** Porcelain interface to apply machine learning to given datasets in a convenient way. There might be multiple high-level interface for different usergroups (e.g. one that mimics R's caret package)

## Planned High-level API

The following code should already work

```Julia

using SupervisedLearning

using RDatasets

using UnicodePlots

data = dataset("datasets", "mtcars")

# In this case the dataset will be in-memory and encoded to -1, 1

# There will also be support for datastreaming from HDF5

problemSet = dataSource(AM ~ DRat + WT + DRat&WT, data, SignedClassEncoding)

# Convenient to use with UnicodePlots

print(barplot(classDistribution(problemSet)...))

# Methods for splitting the abstract data sets

trainSet, testSet = splitTrainTest!(problemSet, p_train = .75)

# Specifies the model and modelspecific parameter

model = Classifier.LogisticRegression(l2_coef=0.1)

# Backend for neural networks will be Mocha.jl or OnlineAI.jl

# model = Classifier.FeedForwardNeuralNetwork([4,5,7],[ReLu,ReLu,ReLu])

# train! mutates the model state

#  * the do-block is the callback function which also allows for early stopping

#  * In the regression case Solver.GradientDescent() will result in using Regression.jl, 

#    otherwise (in most deterministic cases) Optim.jl

#  * There will also be stochastic gradient descent with minibatches

train!(model, trainSet, Solver.GradientDescent(), max_iter = 10000, break_every = 100) do

  # You can also use the callback to execute any code

  # For example to print informative messages

  println("Testset accuracy: ", accuracy(model, testSet))

  

  # You can easily store custom learning curves or other arbitrary values

  # They will be linked to the correct iteration automatically

  remember!(model, :testsetCost, cost(model, testSet))

end

# The loss of the training set is stored by default and can be accessed with trainingCurve

# x is a Vector{Int} of iterations with stepsize break_every,

# y is a Vector{Float64} where y[i] is the cost of the trainSet at x[i]

x, y = trainingCurve(model)

print(lineplot(x, y, title = "Learning curve for trainSet"))

# Customly stored curves can be accessed with "history"

# x is a Vector{Int} of iterations (exact values depend on when you called remember!),

# y is a Vector{Float64} where y[i] is the cost of the testSet at x[i]

x, y = history(model, :testsetCost)

print(lineplot(x, y, title = "Learning curve for testSet"))

ŷ = predict(model, testSet) # what the model says

t = groundtruth(testSet) # what it should be

```

## Planned Mid-level API

This is just a rough draft and still object to change

```Julia

using SupervisedLearning

using RDatasets

data = dataset("datasets", "mtcars")

# In this case the dataset will be in-memory.

# Specifying the encoding is not necessary.

# The model will select the encoding it needs automatically

# Trees for example don't need an encoding at all.

problemSet = dataSource(AM ~ DRat + WT, data)

# Methods for splitting the abstract data sets

trainSet, testSet = splitTrainTest!(problemSet, p_train = .75)

# Perform a gridsearch over an arbitrary modelspace

gsResult = gridsearch([.001, .01, .1], [.0001, .0003]) do lr, lambda

  # Perform cross validation to get a good estimate for the hyperparameter performance

  cvResult = crossvalidate(trainSet, k = 5) do trainFold, valFold

    # Specify the model and model-specific parameters

    model = Classifier.LogisticRegression(l2_coef = lambda)

    # Specify the solver and solver-specific parameters

    solver = Solver.NaiveGradientDescent(learning_rate = lr, normalize_gradient = false)

    # train! mutates the model state

    train!(model, trainFold, solver, max_iter = 1000)

    # make sure to return the trained model

    model

  end

  # You can return a model or a cvResult to gridsearch

  cvResult

end

# Plot the final accuracy of all trained models using UnicodePlots

print(barplot(accuracy(gsResult, testSet)...))

# Get the best model

bestModel = gsResult.bestModel

ŷ = predict(bestModel, testSet)

```