Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Front-end interface for supervised machine learning
- Host: GitHub
- URL: https://github.com/evizero/supervisedlearning.jl
- Owner: Evizero
- License: other
- Created: 2015-08-20T14:29:39.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2016-05-13T16:00:16.000Z (over 8 years ago)
- Last Synced: 2024-10-12T14:51:05.599Z (3 months ago)
- Language: Julia
- Size: 49.8 KB
- Stars: 5
- Watchers: 6
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# SupervisedLearning
[![Project Status: Suspended - Initial development has started, but there has not yet been a stable, usable release; work has been stopped for the time being but the author(s) intend on resuming work.](http://www.repostatus.org/badges/latest/suspended.svg)](http://www.repostatus.org/#suspended)
[![License](http://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat)](LICENSE.md)
[![Build Status](https://travis-ci.org/Evizero/SupervisedLearning.jl.svg?branch=master)](https://travis-ci.org/Evizero/SupervisedLearning.jl)

Work in progress for a front-end supervised learning framework. Currently the focus is on creating a pure Julia package for SVMs in [KSVM.jl](https://github.com/Evizero/KSVM.jl).
The goal of this library is threefold:

- **Education:** Allow the user to play around with the models, solvers, etc. for educational purposes, and provide a good base for course exercises; for example, visualizing the learning curves of neural networks under different optimization algorithms.
- **Research:** Swap out parts of the machine learning pipeline with custom implementations without losing the ability to utilize the rest of the framework, for example to prototype new prediction models (a toy sketch follows this list).
- **Application:** Porcelain interface to apply machine learning to given datasets in a convenient way. There might be multiple high-level interfaces for different user groups (e.g. one that mimics R's caret package).
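As a rough illustration of the research goal, here is a minimal, self-contained sketch of how a custom model could slot into a generic `train!`/`predict` pipeline through Julia's multiple dispatch. The type and function bodies below are illustrative assumptions, not part of this package's API:

```Julia
# Hypothetical sketch, not package API: a custom model participates in a
# pipeline by implementing the same generic verbs (train!, predict).
mutable struct MajorityClassifier
    label::Float64
    MajorityClassifier() = new(0.0)
end

# "Training" memorizes the more frequent signed label (-1.0 or 1.0)
function train!(m::MajorityClassifier, targets::Vector{Float64})
    m.label = sum(targets) >= 0 ? 1.0 : -1.0
    return m
end

# Prediction ignores the features and always returns the stored label
predict(m::MajorityClassifier, X) = fill(m.label, size(X, 1))

m = train!(MajorityClassifier(), [1.0, 1.0, -1.0])
ŷ = predict(m, randn(5, 2))   # five predictions, all 1.0
```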
## Planned High-level API

The following code should already work:
```Julia
using SupervisedLearning
using RDatasets
using UnicodePlots

data = dataset("datasets", "mtcars")
# In this case the dataset will be in-memory and encoded to -1, 1
# There will also be support for datastreaming from HDF5
problemSet = dataSource(AM ~ DRat + WT + DRat&WT, data, SignedClassEncoding)

# Convenient to use with UnicodePlots
print(barplot(classDistribution(problemSet)...))

# Methods for splitting the abstract data sets
trainSet, testSet = splitTrainTest!(problemSet, p_train = .75)

# Specifies the model and model-specific parameters
model = Classifier.LogisticRegression(l2_coef = 0.1)

# Backend for neural networks will be Mocha.jl or OnlineAI.jl
# model = Classifier.FeedForwardNeuralNetwork([4,5,7], [ReLu,ReLu,ReLu])

# train! mutates the model state
# * the do-block is the callback function which also allows for early stopping
# * In the regression case Solver.GradientDescent() will result in using Regression.jl,
# otherwise (in most deterministic cases) Optim.jl
# * There will also be stochastic gradient descent with minibatches
train!(model, trainSet, Solver.GradientDescent(), max_iter = 10000, break_every = 100) do
    # You can also use the callback to execute any code,
    # for example to print informative messages
    println("Testset accuracy: ", accuracy(model, testSet))

    # You can easily store custom learning curves or other arbitrary values.
    # They will be linked to the correct iteration automatically
    remember!(model, :testsetCost, cost(model, testSet))
end

# The loss of the training set is stored by default and can be accessed with trainingCurve.
# x is a Vector{Int} of iterations with stepsize break_every,
# y is a Vector{Float64} where y[i] is the cost of the trainSet at x[i]
x, y = trainingCurve(model)
print(lineplot(x, y, title = "Learning curve for trainSet"))# Customly stored curves can be accessed with "history"
# x is a Vector{Int} of iterations (exact values depend on when you called remember!),
# y is a Vector{Float64} where y[i] is the cost of the testSet at x[i]
x, y = history(model, :testsetCost)
print(lineplot(x, y, title = "Learning curve for testSet"))ŷ = predict(model, testSet) # what the model says
t = groundtruth(testSet) # what it should be
```
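The last two lines above yield predictions and ground truth in the same encoding. As a minimal sketch of what that comparison looks like with SignedClassEncoding, where both classes are represented as -1/1 (the values below are made-up stand-ins, not output of the package):

```Julia
using Statistics  # for mean

# Hypothetical stand-in values: with SignedClassEncoding the predictions
# and the ground truth are both vectors of -1.0 / 1.0 labels
ŷ = [1.0, -1.0, 1.0, 1.0]
t = [1.0, -1.0, -1.0, 1.0]

# accuracy is simply the fraction of element-wise matches
acc = mean(ŷ .== t)   # 0.75
```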
## Planned Mid-level API

This is just a rough draft and still subject to change:
```Julia
using SupervisedLearning
using RDatasets
using UnicodePlots  # needed for barplot below

data = dataset("datasets", "mtcars")
# In this case the dataset will be in-memory.
# Specifying the encoding is not necessary.
# The model will select the encoding it needs automatically
# Trees for example don't need an encoding at all.
problemSet = dataSource(AM ~ DRat + WT, data)

# Methods for splitting the abstract data sets
trainSet, testSet = splitTrainTest!(problemSet, p_train = .75)

# Perform a grid search over an arbitrary model space
gsResult = gridsearch([.001, .01, .1], [.0001, .0003]) do lr, lambda

    # Perform cross-validation to get a good estimate of the hyperparameter performance
    cvResult = crossvalidate(trainSet, k = 5) do trainFold, valFold

        # Specify the model and model-specific parameters
        model = Classifier.LogisticRegression(l2_coef = lambda)

        # Specify the solver and solver-specific parameters
        solver = Solver.NaiveGradientDescent(learning_rate = lr, normalize_gradient = false)

        # train! mutates the model state
        train!(model, trainFold, solver, max_iter = 1000)

        # make sure to return the trained model
        model
    end

    # You can return a model or a cvResult to gridsearch
    cvResult
end

# Plot the final accuracy of all trained models using UnicodePlots
print(barplot(accuracy(gsResult, testSet)...))

# Get the best model
bestModel = gsResult.bestModel
ŷ = predict(bestModel, testSet)
```
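For readers unfamiliar with the pattern, here is a rough sketch, not the package's implementation, of what a grid search over two parameter ranges boils down to: evaluate the do-block once per parameter combination and keep the best-scoring result. `simple_gridsearch` and the toy score are hypothetical names:

```Julia
# Hypothetical sketch of the gridsearch semantics used above: call the
# supplied function once per combination and return the best-scoring one.
function simple_gridsearch(f, lrs, lambdas)
    grid = [(f(lr, lambda), lr, lambda) for lr in lrs, lambda in lambdas]
    scores = map(first, grid)
    return grid[argmax(scores)]   # the triple with the highest score
end

# Usage with a toy score standing in for a cross-validation estimate
score, lr, lambda = simple_gridsearch([.001, .01, .1], [.0001, .0003]) do lr, lambda
    -abs(lr - 0.01) - lambda
end
```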