https://github.com/rakibhhridoy/machinelearning-featureselection

Before training a model, the first priority is the data, not the model. The better the data is preprocessed and engineered, the more the model can learn from it. Feature selection is one of the methods for processing data before feeding it to the model. Various feature selection techniques are shown here.

extratreesclassifier feature-selection gridsearchcv lasso-regression logistic-regression machine-learning numpy pandas pca rfe rfecv scikit-learn selectkbest

Last synced: 2 months ago

Host: GitHub
URL: https://github.com/rakibhhridoy/machinelearning-featureselection
Owner: rakibhhridoy
Created: 2020-07-21T09:44:04.000Z (almost 5 years ago)
Default Branch: master
Last Pushed: 2020-08-18T10:10:42.000Z (almost 5 years ago)
Last Synced: 2025-02-17T02:41:55.390Z (5 months ago)
Topics: extratreesclassifier, feature-selection, gridsearchcv, lasso-regression, logistic-regression, machine-learning, numpy, pandas, pca, rfe, rfecv, scikit-learn, selectkbest
Language: Python
Homepage: https://rakibhhridoy.github.io
Size: 1000 Bytes
Stars: 1
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: readme.md


README

# *Machine Learning Feature Selection*

> Step by step

1. *Importing libraries & functions*

```python
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.model_selection import GridSearchCV
```

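2. *Loading the dataset*

```python
# Build the path to the CSV in the current working directory.
file = os.path.join(os.getcwd(), "datasets_228_482_diabetes.csv")
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']

# Note: `names` assumes the file has no header row. If it does have one,
# also pass header=0 so the header is not read as a data row.
df = pd.read_csv(file, names=names)

array = df.values
X = array[:, 0:8]  # the eight feature columns
y = array[:, 8]    # the target column ('class')
```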

3. Different feature selection techniques:

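> SelectKBest and RFE

```python
# Univariate selection: keep the 4 features with the highest ANOVA F-scores.
test = SelectKBest(score_func=f_classif, k=4)
fit = test.fit(X, y)
features = fit.transform(X)

# Pearson correlation between one feature ('skin') and the target, for comparison.
corr_p = df['skin'].corr(df['class'])
print(corr_p)
print(features[0:5, :])

# Recursive feature elimination (RFE): repeatedly drop the weakest feature
# until 3 remain. n_features_to_select must be passed by keyword in
# recent scikit-learn releases.
model = LogisticRegression(solver='lbfgs')
rfe = RFE(model, n_features_to_select=3)
fit = rfe.fit(X, y)
print('Num features: %d' % fit.n_features_)
print('Selected features: %s' % fit.support_)
print('Feature ranking: %s' % fit.ranking_)
```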
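
The boolean mask in `support_` is easier to read when it is mapped back to column names. The following is a minimal sketch of that idea, not part of the original repository: it assumes a synthetic dataset from `make_classification` in place of the diabetes CSV, and the column names `f0`..`f7` and the helper `selected_names` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the diabetes data (assumption): 8 features, binary target.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
feature_names = np.array([f"f{i}" for i in range(X_demo.shape[1])])  # hypothetical names


def selected_names(selector, names):
    """Return the names of the features that a fitted selector keeps."""
    return list(names[selector.get_support()])


# Fit both selectors on the same data so their choices can be compared.
kbest = SelectKBest(score_func=f_classif, k=4).fit(X_demo, y_demo)
rfe_demo = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3).fit(X_demo, y_demo)

kbest_names = selected_names(kbest, feature_names)
rfe_names = selected_names(rfe_demo, feature_names)
print("SelectKBest keeps:", kbest_names)
print("RFE keeps:", rfe_names)
```

Both selectors expose `get_support()`, so the same helper works for either one. With the real data, the `names` list (minus `'class'`) could play the role of `feature_names`.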

> ExtraTreesClassifier

```python
# Fit an ensemble of randomized trees and read the impurity-based importances.
model = ExtraTreesClassifier(n_estimators=10)
model.fit(X, y)
print(model.feature_importances_)
```
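
Because extra-trees are randomized, the importances can change from run to run, especially with few trees. As a hedged sketch, the example below fixes `random_state`, uses more trees, and ranks the features by importance. The synthetic data and the names `f0`..`f7` are assumptions for illustration, not the diabetes dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in data (assumption): 8 features, binary target.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
feature_names = [f"f{i}" for i in range(X_demo.shape[1])]  # hypothetical names

# A fixed random_state makes the importances reproducible between runs.
forest = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X_demo, y_demo)
importances = forest.feature_importances_

# Sort feature indices from most to least important.
order = np.argsort(importances)[::-1]
ranked = [(feature_names[i], float(importances[i])) for i in order]
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

The importances are normalized to sum to 1, so each value can be read as a share of the total.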

> Dimensionality Reduction - PCA

```python
# Project the data onto 3 principal components. PCA is unsupervised, so y is not needed.
pca = PCA(n_components=3)
fit = pca.fit(X)
print('Explained Variance: %s' % fit.explained_variance_ratio_)
print(fit.components_)
```
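
PCA is sensitive to feature scale, and it builds new components instead of picking original columns. A minimal sketch of one common variant follows: standardize first, then look at the cumulative explained variance. The synthetic data is an assumption and stands in for the diabetes features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 300 rows, 8 features.
X_demo, _ = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)

# Scale each feature to zero mean and unit variance, then project onto 3 components.
pipeline = make_pipeline(StandardScaler(), PCA(n_components=3))
X_reduced = pipeline.fit_transform(X_demo)

# Share of total variance captured by each component, and the running total.
ratios = pipeline.named_steps["pca"].explained_variance_ratio_
cumulative = np.cumsum(ratios)
print("Reduced shape:", X_reduced.shape)
print("Cumulative explained variance:", cumulative)
```

The last entry of `cumulative` shows how much of the variance the three components retain together, which can help when choosing `n_components`.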

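> Finding the best parameters and score

```python
lasso = Lasso()
# Candidate regularization strengths to search over.
parameters = {'alpha': [1e-15, 1e-10, 1e-8, 1e-4, 1e-3, 1e-2, 1, 5, 10, 20]}
# 5-fold cross-validated grid search; the score is the negative mean squared error.
lasso_regressor = GridSearchCV(lasso, parameters, scoring='neg_mean_squared_error', cv=5)
lasso_regressor.fit(X, y)
print(lasso_regressor.best_params_)
print(lasso_regressor.best_score_)  # closer to 0 is better
```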
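
Lasso is relevant to feature selection because its L1 penalty can shrink some coefficients to exactly zero. The sketch below shows one way to read that from the grid search result. It assumes synthetic regression data from `make_regression` and a shorter, hypothetical `alpha` grid, so it illustrates the idea and does not reproduce the run above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption): 8 features, 3 of them informative.
X_demo, y_demo = make_regression(
    n_samples=300, n_features=8, n_informative=3, noise=10.0, random_state=0
)

search = GridSearchCV(
    Lasso(max_iter=10000),
    {"alpha": [1e-3, 1e-2, 1e-1, 1, 5, 10]},  # hypothetical grid
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X_demo, y_demo)

# best_estimator_ is the Lasso refit on all data with the best alpha.
coefs = search.best_estimator_.coef_
kept = np.flatnonzero(coefs != 0)  # features the model did not zero out
print("Best alpha:", search.best_params_["alpha"])
print("Mean squared error:", -search.best_score_)
print("Indices of features with nonzero coefficients:", kept)
```

Negating `best_score_` converts scikit-learn's "higher is better" score back into a plain mean squared error.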

#### *Get in Touch With Me*

Connect - [LinkedIn](https://linkedin.com/in/rakibhhridoy)

Website - [RakibHHridoy](https://rakibhhridoy.github.io)
