https://github.com/rakibhhridoy/machinelearning-featureselection
Before training or feeding a model, the first priority is the data, not the model. The better the data is preprocessed and engineered, the more the model can learn. Feature selection is one of the methods for processing data before feeding it to the model. Various feature selection techniques are shown here.
- Host: GitHub
- URL: https://github.com/rakibhhridoy/machinelearning-featureselection
- Owner: rakibhhridoy
- Created: 2020-07-21T09:44:04.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-08-18T10:10:42.000Z (about 4 years ago)
- Last Synced: 2023-10-20T22:45:06.664Z (about 1 year ago)
- Topics: extratreesclassifier, feature-selection, gridsearchcv, lasso-regression, logistic-regression, machine-learning, numpy, pandas, pca, rfe, rfecv, scikit-learn, selectkbest
- Language: Python
- Homepage: https://rakibhhridoy.github.io
- Size: 1000 Bytes
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: readme.md
# *Machine Learning Feature Selection*
> Step by step
1. *importing libraries & functions*
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.decomposition import PCA
import os
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
```
2. *loading datasets*
```python
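# Build the path to the CSV, which is expected in the current working directory.
file = os.path.join(os.getcwd(), "datasets_228_482_diabetes.csv")
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
# names= assigns column labels; if the CSV already has a header row, also pass header=0.
df = pd.read_csv(file, names=names)
array = df.values
X = array[:, 0:8]  # first eight columns are the features
y = array[:, 8]    # last column is the class label
```
3. Different feature selection techniques: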
> SelectKBest and RFE
```python
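# Univariate selection: keep the 4 features with the highest ANOVA F-scores.
test = SelectKBest(score_func=f_classif, k=4)
fit = test.fit(X, y)
features = fit.transform(X)
# Pearson correlation between one feature ('skin') and the target, for comparison.
corr_p = df['skin'].corr(df['class'])
print(corr_p)
print(features[0:5, :])
# Recursive feature elimination: repeatedly drop the weakest feature until 3 remain.
model = LogisticRegression(solver='lbfgs')
rfe = RFE(model, n_features_to_select=3)  # keyword argument required by recent scikit-learn
fit = rfe.fit(X, y)
print('Num features: %d' % fit.n_features_)
print('Selected features: %s' % fit.support_)
print('Feature ranking: %s' % fit.ranking_)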
```
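The boolean masks printed above are easier to read once they are mapped back to column names. The following is a minimal sketch of that mapping, not part of the original walkthrough. It assumes a synthetic stand-in dataset built with `make_classification` (so it runs without the CSV), and the `X_demo`, `y_demo`, `kbest_names`, and `rfe_names` variables are hypothetical names used only for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Assumption: synthetic data with 8 numeric features stands in for the diabetes CSV.
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age']
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, n_informative=4, n_redundant=0, random_state=0
)
X_demo = pd.DataFrame(X_demo, columns=names)

# SelectKBest: get_support() returns a boolean mask over the input columns.
skb = SelectKBest(score_func=f_classif, k=4).fit(X_demo, y_demo)
kbest_names = X_demo.columns[skb.get_support()].tolist()
print('SelectKBest kept:', kbest_names)

# RFE: support_ is the same kind of mask, and ranking_ is 1 for every kept feature.
rfe = RFE(LogisticRegression(solver='lbfgs', max_iter=1000), n_features_to_select=3)
rfe.fit(X_demo, y_demo)
rfe_names = X_demo.columns[rfe.support_].tolist()
print('RFE kept:', rfe_names)
```

With the real data, the same two indexing lines should work once `X` is kept as a DataFrame slice (for example `df[names[:8]]`) instead of a bare array.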
> ExtraTreesClassifier
```python
model = ExtraTreesClassifier(n_estimators=10)
model.fit(X, y)
print(model.feature_importances_)
```
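The raw importance array is hard to interpret without labels, and with only 10 trees the values can shift noticeably between runs. A possible way to get a labeled, sorted ranking is sketched below. It is an illustration under assumptions: the data is a synthetic stand-in from `make_classification`, and `n_estimators=100` with a fixed `random_state` is a choice made here for stability, not a setting from the original.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Assumption: synthetic data with 8 numeric features stands in for the diabetes CSV.
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age']
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, n_informative=4, n_redundant=0, random_state=0
)

# More trees and a fixed seed make the importance estimates more repeatable.
forest = ExtraTreesClassifier(n_estimators=100, random_state=0)
forest.fit(X_demo, y_demo)

# Attach column names and sort so the most important features come first.
importances = pd.Series(forest.feature_importances_, index=names)
importances = importances.sort_values(ascending=False)
print(importances)
```

The importances are normalized to sum to 1, so each value can be read as a share of the total impurity reduction.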
> Dimensionality Reduction - PCA
```python
pca = PCA(n_components = 3)
fit = pca.fit(X)  # PCA is unsupervised, so y is not needed
print('Explained Variance: %s' % fit.explained_variance_ratio_)
print(fit.components_)
```
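PCA builds new components from combinations of the original columns rather than selecting a subset of them, and because it is variance-based, columns on larger scales tend to dominate the components. One common remedy is to standardize first. The sketch below shows that idea with a scikit-learn pipeline. It assumes synthetic stand-in data, and the `pipeline`, `X_reduced`, and `cumulative` names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data with 8 numeric features stands in for the diabetes CSV.
X_demo, _ = make_classification(
    n_samples=200, n_features=8, n_informative=4, n_redundant=0, random_state=0
)

# Standardize each column to zero mean and unit variance, then project to 3 components.
pipeline = make_pipeline(StandardScaler(), PCA(n_components=3))
X_reduced = pipeline.fit_transform(X_demo)

# make_pipeline names each step after its lowercased class name.
pca_step = pipeline.named_steps['pca']
cumulative = np.cumsum(pca_step.explained_variance_ratio_)
print('Reduced shape:', X_reduced.shape)
print('Cumulative explained variance:', cumulative)
```

The cumulative ratio gives a rough guide for choosing `n_components`: keep adding components until the retained variance is high enough for the task.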
> Lasso with GridSearchCV: best params and score
```python
lasso = Lasso()
# Candidate regularization strengths to search over.
parameters = {'alpha': [1e-15, 1e-10, 1e-8, 1e-4, 1e-3, 1e-2, 1, 5, 10, 20]}
lasso_regressor = GridSearchCV(lasso, parameters, scoring='neg_mean_squared_error', cv=5)
lasso_regressor.fit(X, y)
print(lasso_regressor.best_params_)
print(lasso_regressor.best_score_)
```
#### *Get in Touch With Me*
Connect - [LinkedIn](https://linkedin.com/in/rakibhhridoy)
Website - [RakibHHridoy](https://rakibhhridoy.github.io)