https://github.com/mmourafiq/pandas2sklearn

An integration of pandas dataframes with scikit learn.

Host: GitHub
URL: https://github.com/mmourafiq/pandas2sklearn
Owner: mmourafiq
License: mit
Created: 2015-05-06T22:38:29.000Z (about 10 years ago)
Default Branch: master
Last Pushed: 2023-07-06T21:05:00.000Z (about 2 years ago)
Last Synced: 2025-03-26T23:42:45.921Z (4 months ago)
Language: Python
Homepage:
Size: 22.5 KB
Stars: 6
Watchers: 1
Forks: 2
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE


[![Build Status](https://travis-ci.org/mouradmourafiq/pandas2sklearn.svg?branch=master)](https://travis-ci.org/mouradmourafiq/pandas2sklearn)

# pandas2sklearn

An integration of [pandas](http://pandas.pydata.org/) dataframes with [scikit-learn](http://scikit-learn.org/stable/).

The module contains:

 * `DataSet`, a wrapper for handling dataframes in the style of a scikit-learn dataset.

 * `DataSetTransformer`, a transformation mechanism that can be easily integrated into scikit-learn pipelines.

 

# Installation

The module can be easily installed with pip:

```console
> pip install pandas2sklearn
```

# Tests

The module contains some basic tests of the provided functionality. Run them with:

```console
> py.test
```

# Usage

The module contains two classes:

## DataSet

`DataSet` is a wrapper around a pandas `DataFrame` that you can use to select:

 * the id column

 * the feature columns

 * the target column

 

For example, suppose we have a `DataFrame` with the following columns:

`df.columns` = `id, FN1, FN2, FN3, FN4, FN5, FC1, FC2, FC3, FC4, FC5, FC6, target`

```python
from pandas_sklearn import DataSet

# `df` is the DataFrame described above. Each `==` line below documents
# the resulting value; it is not meant as a literal check.
dataset = DataSet(df, target_column='target', id_column='id')

dataset.has_target() == True
dataset.has_id() == True
dataset.target == df['target']
dataset.id == df['id']
dataset.feature_names == ['FN1', 'FN2', 'FN3', 'FN4', 'FN5', 'FC1', 'FC2', 'FC3', 'FC4', 'FC5', 'FC6']
dataset.data == df[['FN1', 'FN2', 'FN3', 'FN4', 'FN5', 'FC1', 'FC2', 'FC3', 'FC4', 'FC5', 'FC6']]

# removing features that are not needed: FN4, FN5, FC1, FC5, FC6
dataset.set_feature_names(usage=DataSet.EXCLUDE, columns=['FN4', 'FN5', 'FC1', 'FC5', 'FC6'])
dataset.feature_names == ['FN1', 'FN2', 'FN3', 'FC2', 'FC3', 'FC4']

# converting the dataset to a list of dictionaries, one per row
dataset.to_dict() == [
    {'FN1': 12, 'FN2': 23, 'FC2': 'coffee', 'FC3': 'xbox one', 'FC4': 'inch'},
    ...
]
```
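To make the id / target / feature split concrete, the selection `DataSet` performs can be approximated with plain pandas. This is a minimal sketch, not the library's implementation: `split_dataset` is a hypothetical helper, its `exclude` argument only mimics `set_feature_names(usage=DataSet.EXCLUDE, ...)`, and the data frame holds made-up values.

```python
import pandas as pd


# Hypothetical helper (not part of pandas2sklearn): mimics the id / target /
# feature selection that DataSet provides, using plain pandas only.
def split_dataset(df, target_column=None, id_column=None, exclude=()):
    reserved = {c for c in (target_column, id_column) if c is not None}
    # Features are all columns except id, target and explicitly excluded ones.
    feature_names = [c for c in df.columns if c not in reserved and c not in exclude]
    ids = df[id_column] if id_column is not None else None
    target = df[target_column] if target_column is not None else None
    data = df[feature_names]
    return ids, target, feature_names, data


# Small synthetic frame (made-up values, for illustration only).
df = pd.DataFrame({
    "id": [1, 2],
    "FN1": [12, 7],
    "FN2": [23, 5],
    "FC1": ["coffee", "tea"],
    "target": [0, 1],
})

ids, target, feature_names, data = split_dataset(
    df, target_column="target", id_column="id", exclude=["FC1"]
)
print(feature_names)  # ['FN1', 'FN2']

# One dictionary per row, similar in shape to DataSet.to_dict() above.
records = data.to_dict(orient="records")
```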

## DataSetTransformer

`DataSetTransformer` is a feature-wise transformer: it applies a scikit-learn transformer to one or more features, for example:

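```python
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from pandas_sklearn import DataSetTransformer

DataSetTransformer([
    (['petal length (cm)', 'petal width (cm)'], StandardScaler()),
    ('sepal length (cm)', MinMaxScaler()),
    ('sepal width (cm)', None),
])
```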
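For comparison, recent scikit-learn releases ship `ColumnTransformer`, which applies a transformer per column group in a similar way. The sketch below uses it as a stand-in for `DataSetTransformer` (it is a different class with a different constructor, not this module's API), assumes scikit-learn 0.23 or later for `load_iris(as_frame=True)`, and uses `"passthrough"` where the example above uses `None`.

```python
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# scikit-learn's ColumnTransformer as a stand-in for DataSetTransformer:
# each (name, transformer, columns) triple applies one transformer to the
# listed DataFrame columns.
transformer = ColumnTransformer([
    ("petal", StandardScaler(), ["petal length (cm)", "petal width (cm)"]),
    ("sepal_length", MinMaxScaler(), ["sepal length (cm)"]),
    # "passthrough" keeps the column unchanged.
    ("sepal_width", "passthrough", ["sepal width (cm)"]),
])

X = load_iris(as_frame=True).data  # DataFrame with the four iris feature columns
Xt = transformer.fit_transform(X)  # columns ordered as the triples above
print(Xt.shape)  # (150, 4)
```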

It can be used inside scikit-learn pipelines, for example:

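```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC
from pandas_sklearn import DataSetTransformer

pipeline = Pipeline([
    ('preprocess', DataSetTransformer([
        (['petal length (cm)', 'petal width (cm)'], StandardScaler()),
        ('sepal length (cm)', MinMaxScaler()),
        ('sepal width (cm)', None),
    ])),
    ('classify', SVC(kernel='linear')),
])
```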
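An end-to-end run of the same pipeline shape on the iris data bundled with scikit-learn might look like the sketch below. It again substitutes scikit-learn's `ColumnTransformer` for `DataSetTransformer`, so it illustrates the pattern rather than this module's API; the 70/30 split and `random_state=0` are arbitrary choices.

```python
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC

iris = load_iris(as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

pipeline = Pipeline([
    # ColumnTransformer used as a stand-in for DataSetTransformer.
    ("preprocess", ColumnTransformer([
        ("petal", StandardScaler(), ["petal length (cm)", "petal width (cm)"]),
        ("sepal_length", MinMaxScaler(), ["sepal length (cm)"]),
        ("sepal_width", "passthrough", ["sepal width (cm)"]),
    ])),
    ("classify", SVC(kernel="linear")),
])

# Fitting the pipeline fits the preprocessing step, then the classifier.
pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Because preprocessing lives inside the pipeline, the scalers are fitted on the training split only and then reused unchanged on the test split.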

# Credit

The `DataSetTransformer` is based on the work of Ben Hamner and Paul Butler.
