https://github.com/pr38/dask_backward_feature_selection
Backward step-wise feature selection using Dask, scikit-learn compatible
- Host: GitHub
- URL: https://github.com/pr38/dask_backward_feature_selection
- Owner: pr38
- License: apache-2.0
- Created: 2020-03-23T21:26:48.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2020-12-21T20:03:02.000Z (over 4 years ago)
- Last Synced: 2025-01-04T15:44:08.272Z (5 months ago)
- Topics: dask, feature-selection, machine-learning, python, scikit-learn
- Language: Python
- Size: 44.9 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## Dask Backward Feature Selection
Backward step-wise feature selection using Dask, scikit-learn compatible. Scale out feature selection using distributed computing with Dask!
I created this because mlxtend's SequentialFeatureSelector did not use joblib in a Dask-compatible way.
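For context, backward step-wise selection starts from the full feature set and repeatedly removes the feature whose removal hurts the cross-validated score least, until a minimum size is reached. A minimal single-machine sketch of that idea (not this library's implementation, which distributes the scoring over a Dask cluster) might look like:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def backward_select(X, y, estimator, k_features, cv=3):
    """Greedy backward elimination down to k_features columns."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > k_features:
        # Score every candidate subset with one feature removed.
        trials = []
        for f in remaining:
            subset = [c for c in remaining if c != f]
            s = cross_val_score(estimator, X[:, subset], y, cv=cv).mean()
            trials.append((s, subset))
        # Keep the best-scoring subset, i.e. drop the least useful feature.
        _, remaining = max(trials)
    return remaining

# Synthetic data: only columns 0 and 1 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

best = backward_select(X, y, LinearRegression(), k_features=2)
print(sorted(best))
```

The inner loop (scoring one candidate subset per surviving feature) is exactly the part that parallelizes well across a cluster, since each subset's cross-validation is independent.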
Install
-------
```
pip install git+https://github.com/pr38/dask_backward_feature_selection
```
Example Usage
-------
```python
import numpy as np
import pandas as pd

from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import load_boston
from dask.distributed import Client, LocalCluster

from dask_backward_feature_selection import DaskBackwardFeatureSelector

# You should be using Dask's YARN or Kubernetes cluster deployments.
# If you are going to run this locally, you are better off using mlxtend's SequentialFeatureSelector.
cluster = LocalCluster(3)
client = Client(cluster)

# Note: load_boston was removed in scikit-learn 1.2; this example predates that.
boston = load_boston()
X = boston['data']
y = boston['target']

dfs = DaskBackwardFeatureSelector(DecisionTreeRegressor(), client)
# kwargs for DaskBackwardFeatureSelector:
# k_features: the smallest combination of features DaskBackwardFeatureSelector will examine.
# cv: if an int, the number of cross-validation folds for each feature combination tested;
#     can also be a scikit-learn CV splitter.
# scoring: a string (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.get_scorer.html#sklearn.metrics.get_scorer)
#     or a scikit-learn scorer.
# scatter: if True, each worker in the cluster keeps a copy of the training data and estimator.

dfs.fit(X, y)

# Positions of the top-performing combination of features in the X matrix.
dfs.k_feature_idx_

# We can treat DaskBackwardFeatureSelector as an estimator after training.
dfs.predict(X)

# DaskBackwardFeatureSelector can also act as a transformer.
dfs.transform(X, y)

# Finally, we can examine the best-performing feature combination at each step,
# for other use cases (e.g. the one-standard-error rule).
pd.DataFrame(dfs.metric_dict_)
```
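The one-standard-error rule mentioned above picks the smallest feature set whose mean cross-validated score is within one standard error of the best-scoring set. The exact layout of `metric_dict_` is not documented here, so the sketch below uses a hypothetical stand-in mapping feature count to per-fold CV scores; adapt the keys to whatever `pd.DataFrame(dfs.metric_dict_)` actually shows.

```python
import numpy as np

# Hypothetical stand-in for dfs.metric_dict_: feature count -> per-fold CV scores.
metric_dict = {
    4: [0.80, 0.82, 0.81],
    3: [0.79, 0.83, 0.80],
    2: [0.78, 0.77, 0.79],
}

means = {k: np.mean(v) for k, v in metric_dict.items()}
sems = {k: np.std(v, ddof=1) / np.sqrt(len(v)) for k, v in metric_dict.items()}

# Best mean score, and a threshold one standard error below it.
best_k = max(means, key=means.get)
threshold = means[best_k] - sems[best_k]

# Smallest feature set whose mean score still clears the threshold.
one_se_k = min(k for k, m in means.items() if m >= threshold)
print(one_se_k)
```

This trades a little mean score for a simpler model, which is often worthwhile when the score differences are within the noise of cross-validation.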