Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/picnicml/doddle-model
:cake: doddle-model: machine learning in Scala.
- Host: GitHub
- URL: https://github.com/picnicml/doddle-model
- Owner: picnicml
- License: apache-2.0
- Created: 2018-02-09T13:54:54.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2024-07-29T21:13:26.000Z (6 months ago)
- Last Synced: 2024-08-04T00:06:28.770Z (5 months ago)
- Topics: breeze, data-science, doddle-model, machine-learning, scala
- Language: Scala
- Homepage: https://picnicml.github.io
- Size: 580 KB
- Stars: 137
- Watchers: 14
- Forks: 23
- Open Issues: 34
Metadata Files:
- Readme: README.md
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- awesome-scala - doddle-model: machine learning in Scala. | ![GitHub stars](https://img.shields.io/github/stars/picnicml/doddle-model) ![GitHub commit activity](https://img.shields.io/github/commit-activity/y/picnicml/doddle-model) (Table of Contents / Science and Data Analysis)
README
---
Latest Release
Build Status
Coverage
Code Quality
License
Chat
---
`doddle-model` is an in-memory machine learning library that can be summed up with three main characteristics:
* it is built on top of [Breeze](https://github.com/scalanlp/breeze)
* it provides [immutable estimators](https://en.wikipedia.org/wiki/Immutable_object) that are a _doddle_ to use in parallel code
* it exposes its functionality through a [scikit-learn](https://github.com/scikit-learn/scikit-learn)-like API [2] in idiomatic Scala using [typeclasses](https://en.wikipedia.org/wiki/Type_class)

#### How does it compare to existing solutions?
`doddle-model` takes the position of scikit-learn in Scala and, as a consequence, is much more lightweight than, e.g., Spark ML. Fitted models can be deployed anywhere, from simple applications to concurrent, distributed systems built with Akka, Apache Beam or a framework of your choice. Training of estimators happens in memory, which is advantageous unless you are dealing with enormous datasets that absolutely cannot fit into RAM.

### Installation
The project is published for Scala versions 2.11, 2.12 and 2.13. Add the dependency to your SBT project definition:
```scala
libraryDependencies ++= Seq(
  // replace x.y.z with the version from the Latest Release badge (without the "v" prefix)
  "io.github.picnicml" %% "doddle-model" % "x.y.z",
  // add optionally to utilize native libraries for a significant performance boost
  "org.scalanlp" %% "breeze-natives" % "1.0"
)
```
Note that the latest version is displayed in the _Latest Release_ badge above and that the _v_ prefix should be removed from the SBT definition.

### Getting Started
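To make the characteristics listed at the top more concrete, here is a minimal, self-contained sketch of an immutable estimator exposed through a typeclass. It does not use `doddle-model` itself: `Regressor`, `MeanRegressor`, `RegressorSyntax` and `Demo` are hypothetical names invented for illustration, and plain `Vector[Double]` stands in for Breeze types. The library's actual API may differ.

```scala
// Hypothetical typeclass describing what a regressor can do.
// This is an illustration, not doddle-model's real API.
trait Regressor[A] {
  def fit(model: A, x: Vector[Double], y: Vector[Double]): A
  def predict(model: A, x: Vector[Double]): Vector[Double]
}

// Immutable estimator: the learned state lives in an immutable case class.
final case class MeanRegressor(mean: Option[Double] = None)

object MeanRegressor {
  // Typeclass instance, found automatically through the companion object.
  implicit val regressor: Regressor[MeanRegressor] = new Regressor[MeanRegressor] {
    // Fitting returns a new value instead of mutating the given model.
    def fit(model: MeanRegressor, x: Vector[Double], y: Vector[Double]): MeanRegressor = {
      require(y.nonEmpty, "cannot fit on an empty target vector")
      model.copy(mean = Some(y.sum / y.size))
    }

    // Predicts the mean of the training targets for every input.
    def predict(model: MeanRegressor, x: Vector[Double]): Vector[Double] = {
      val m = model.mean.getOrElse(sys.error("model is not fitted"))
      x.map(_ => m)
    }
  }
}

// Extension methods that give any type with a Regressor instance
// a scikit-learn-like `fit` / `predict` syntax.
object RegressorSyntax {
  implicit class RegressorOps[A](model: A) {
    def fit(x: Vector[Double], y: Vector[Double])(implicit ev: Regressor[A]): A =
      ev.fit(model, x, y)
    def predict(x: Vector[Double])(implicit ev: Regressor[A]): Vector[Double] =
      ev.predict(model, x)
  }
}

object Demo {
  import RegressorSyntax._

  val unfitted: MeanRegressor = MeanRegressor()
  val fitted: MeanRegressor = unfitted.fit(Vector(1.0, 2.0, 3.0), Vector(2.0, 4.0, 6.0))
  val predictions: Vector[Double] = fitted.predict(Vector(10.0, 20.0))

  def main(args: Array[String]): Unit =
    println(predictions) // prints Vector(4.0, 4.0)
}
```

Because `fit` returns a new value and leaves `unfitted` untouched, the same estimator can be shared between threads without locking, which is the property the README refers to when it calls immutable estimators a doddle to use in parallel code.
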
For a complete list of code examples see [doddle-model-examples](https://github.com/picnicml/doddle-model-examples).

### Contributing
Want to help us? :raised_hands: We have a [document](https://github.com/picnicml/doddle-model/blob/master/.github/CONTRIBUTING.md) that will make deciding how to do that much easier.

### Performance
Performance of implementations is described [here](https://github.com/picnicml/doddle-model/wiki/Performance). Also, take a peek at what's written in that document if you encounter `java.lang.OutOfMemoryError: Java heap space`.

### Core Maintainers
This is a collaborative project which wouldn't be possible without all the [awesome contributors](https://github.com/picnicml/doddle-model/graphs/contributors). The core team currently consists of the following developers:
- [@inejc](https://github.com/inejc)
- [@matejklemen](https://github.com/matejklemen)

### Resources
* [1] [Pattern Recognition and Machine Learning, Christopher Bishop](http://www.springer.com/gp/book/9780387310732)
* [2] [API design for machine learning software: experiences from the scikit-learn project, L. Buitinck et al.](https://arxiv.org/abs/1309.0238)
* [3] [UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science, Dua, D. and Karra Taniskidou, E.](http://archive.ics.uci.edu/ml)