Ramp - Rapid Machine Learning Prototyping
=========================================

Ramp is a Python library for rapid prototyping of machine learning
solutions. It's a lightweight, [pandas](http://pandas.pydata.org)-based
machine learning framework that plugs into existing
Python machine learning and statistics tools
([scikit-learn](http://scikit-learn.org), [rpy2](http://rpy.sourceforge.net/rpy2.html), etc.).
Ramp provides a simple, declarative syntax for
exploring features, algorithms, and transformations quickly and
efficiently.

Documentation: http://ramp.readthedocs.org

**Why Ramp?**

* **Clean, declarative syntax**

* **Complex feature transformations**

Chain and combine features:
```python
Normalize(Log('x'))
Interactions([Log('x1'), (F('x2') + F('x3')) / 2])
```
Reduce feature dimension:
```python
DimensionReduction([F('x%d'%i) for i in range(100)], decomposer=PCA(n_components=3))
```
Incorporate residuals or predictions to blend with other models:
```python
Residuals(simple_model_def) + Predictions(complex_model_def)
```

* **Data context awareness**

Any feature that uses the target ("y") variable automatically respects the
current training and test sets. Similarly, preparation data (a feature's mean and
standard deviation, for example) is stored and tracked between data contexts
(see the sketch after this list).

* **Composability**

All features, estimators, and their fits are composable, pluggable and storable.

* **Easy extensibility**

Ramp has a simple API, allowing you to plug in estimators from
scikit-learn, rpy2 and elsewhere, or easily build your own feature
transformations, metrics, feature selectors, reporters, or estimators.
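
To make the "data context awareness" point above concrete, here is a minimal sketch of the idea in plain pandas rather than Ramp's API (the column name and train/test split are illustrative): preparation data such as a feature's mean and standard deviation is computed on the training rows only, stored, and reused when the same feature is built for the test rows.

```python
import numpy as np
import pandas as pd

# Illustrative data and split; in Ramp the training/test index sets
# come from the active data context (e.g. a cross-validation fold).
data = pd.DataFrame({'x': np.random.randn(100) * 5 + 10})
train_idx, test_idx = data.index[:80], data.index[80:]

# "Preparation data" is computed on the training rows only...
prep = {'mean': data.loc[train_idx, 'x'].mean(),
        'std': data.loc[train_idx, 'x'].std()}

# ...then stored and reused in other data contexts, so test rows never
# leak into the statistics a Normalize-style feature depends on.
train_x = (data.loc[train_idx, 'x'] - prep['mean']) / prep['std']
test_x = (data.loc[test_idx, 'x'] - prep['mean']) / prep['std']
```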

## Quick start
[Getting started with Ramp: Classifying insults](http://www.kenvanharen.com/2012/11/getting-started-with-ramp-detecting.html)

Or, the quintessential Iris example:

```python
import pandas
from ramp import *
import urllib2
import sklearn
# import the sklearn submodules used below so that
# sklearn.ensemble / sklearn.linear_model resolve
import sklearn.ensemble
import sklearn.linear_model
from sklearn import decomposition

# fetch and clean iris data from UCI
data = pandas.read_csv(urllib2.urlopen(
    "http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"))
data = data.drop([149])  # bad line
columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
data.columns = columns

# all features
features = [FillMissing(f, 0) for f in columns[:-1]]

# features, log transformed features, and interaction terms
expanded_features = (
    features +
    [Log(F(f) + 1) for f in features] +
    [
        F('sepal_width') ** 2,
        combo.Interactions(features),
    ]
)

# Define several models and feature sets to explore,
# run 5 fold cross-validation on each and print the results.
# We define 2 models and 4 feature sets, so this will be
# 4 * 2 = 8 models tested.
shortcuts.cv_factory(
    data=data,

    target=[AsFactor('class')],
    metrics=[
        [metrics.GeneralizedMCC()],
    ],
    # report feature importance scores from Random Forest
    reporters=[
        [reporters.RFImportance()],
    ],

    # Try out two algorithms
    model=[
        sklearn.ensemble.RandomForestClassifier(n_estimators=20),
        sklearn.linear_model.LogisticRegression(),
    ],

    # and 4 feature sets
    features=[
        expanded_features,

        # Feature selection
        [trained.FeatureSelector(
            expanded_features,
            # use random forest's importance to trim
            selectors.RandomForestSelector(classifier=True),
            target=AsFactor('class'),  # target to use
            n_keep=5,  # keep top 5 features
        )],

        # Reduce feature dimension (pointless on this dataset)
        [combo.DimensionReduction(expanded_features,
                                  decomposer=decomposition.PCA(n_components=4))],

        # Normalized features
        [Normalize(f) for f in expanded_features],
    ]
)
```
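
The example above is written for Python 2 (`urllib2`). For reference, a minimal Python 3-compatible way to load the same file (an illustrative alternative, not part of the original example) is to let pandas read the URL directly; the rest of the example (feature definitions and `shortcuts.cv_factory`) is unchanged.

```python
import pandas

# pandas.read_csv can read straight from a URL; the file has no header row,
# so supply the column names explicitly.
url = "http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
data = pandas.read_csv(url, header=None, names=columns)
```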

## Status
Ramp is currently alpha software, so expect bugs, bug fixes, and API changes.

## Requirements
* NumPy
* SciPy
* pandas
* PyTables
* scikit-learn
* gensim

## Author
Ken Van Haren. Email with feedback/questions: [email protected] [@squaredloss](http://twitter.com/squaredloss)

## Contributors
* [John McDonnell](https://github.com/johnmcdonnell)
* [Rob Story](https://github.com/wrobstory)