https://github.com/anujdutt9/feature-selection-for-machine-learning

Methods with examples for Feature Selection during Pre-processing in Machine Learning.

correlation feature-selection machine-learning python36


# Feature Selection for Machine Learning

***This repository contains code for the three main families of feature selection methods in machine learning: Filter Methods, Wrapper Methods, and Embedded Methods. All code is written in Python 3.***

**Status:** Ongoing

# Requirements

**1. Python 3.5+**

**2. Jupyter Notebook**

**3. Scikit-Learn**

**4. NumPy [+mkl for Windows]**

**5. Pandas**

**6. Matplotlib**

**7. Seaborn**

**8. mlxtend**

# Datasets

**1.** [Santander Customer Satisfaction Dataset](https://www.kaggle.com/c/santander-customer-satisfaction)

**2.** [BNP Paribas Cardif Claims Management Dataset](https://www.kaggle.com/c/bnp-paribas-cardif-claims-management)

**3.** [Titanic Disaster Dataset](https://www.kaggle.com/c/titanic/data)

**4.** [Housing Prices Dataset](https://www.kaggle.com/c/house-prices-advanced-regression-techniques)

# Filter Methods

| S.No. | Name | About | Status |
| ----- | ----------------- | ------------------------------------------------------------------ | ------------ |
| 1. | Constant Feature Elimination | This notebook explains how to remove constant features during the pre-processing step. | Completed |
| 2. | Quasi-Constant Feature Elimination | This notebook explains how to identify quasi-constant features and remove them during pre-processing. | Completed |
| 3. | Duplicate Features Elimination | This notebook explains how to find duplicate features in a dataset and remove them. | Completed |
| 4. | Correlation | This notebook explains how to measure the correlation among features and between each feature and the target, and how to use it to choose the best features. | Completed |
| 5. | Machine Learning Pipeline | This notebook explains how to combine all of the above methods in an ML pipeline, with a performance comparison. | Completed |
| 6. | Mutual Information | This notebook explains the concept of mutual information and uses it to find the best features in a dataset for both classification and regression. | Completed |
| 7. | Fisher Score Chi Square | This notebook explains the concept of the Fisher score (chi2) for feature selection. | Completed |
| 8. | Univariate Feature Selection | This notebook explains the concept of univariate feature selection for classification and regression. | Completed |
| 9. | Univariate ROC/AUC/MSE | This notebook explains the concept of univariate feature selection using ROC AUC scoring. | Completed |
| 10. | Combining all Methods | This notebook compares the combined performance of all the methods explained. | Completed |
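
As a rough illustration of the first four filter steps in the table above, here is a minimal sketch using only pandas and NumPy. It is not taken from the notebooks: the helper name `basic_filter`, the thresholds `quasi_threshold` and `corr_threshold`, and the toy DataFrame are all assumptions made for this example, and all columns are assumed to be numeric.

```python
import numpy as np
import pandas as pd


def basic_filter(X, quasi_threshold=0.99, corr_threshold=0.9):
    """Hypothetical helper: apply four basic filter steps in sequence.

    Returns a dict listing the columns flagged at each step and the
    columns that survive. Thresholds are illustrative defaults only.
    """
    report = {}

    # 1. Constant features: columns with a single distinct value.
    constant = [c for c in X.columns if X[c].nunique(dropna=False) <= 1]
    report["constant"] = constant
    X = X.drop(columns=constant)

    # 2. Quasi-constant features: the most frequent value covers at
    #    least `quasi_threshold` of the rows.
    quasi = [
        c for c in X.columns
        if X[c].value_counts(normalize=True, dropna=False).iloc[0] >= quasi_threshold
    ]
    report["quasi_constant"] = quasi
    X = X.drop(columns=quasi)

    # 3. Duplicate features: columns identical to an earlier column.
    #    Transposing lets us reuse the row-wise duplicated() check.
    duplicated = X.columns[X.T.duplicated()].tolist()
    report["duplicated"] = duplicated
    X = X.drop(columns=duplicated)

    # 4. Correlated features: look only at the upper triangle of the
    #    absolute correlation matrix so that one column of each highly
    #    correlated pair is kept.
    corr = X.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    correlated = [c for c in upper.columns if (upper[c] > corr_threshold).any()]
    report["correlated"] = correlated
    X = X.drop(columns=correlated)

    report["kept"] = X.columns.tolist()
    return report


# Toy data built so that each step has exactly one column to catch.
n = 200
a = np.arange(n, dtype=float)
quasi = np.zeros(n)
quasi[0] = 1.0  # 199 of 200 values are identical
df = pd.DataFrame({
    "const": np.ones(n),
    "quasi": quasi,
    "a": a,
    "a_copy": a.copy(),
    "a_scaled": 2.0 * a + 1.0,             # perfectly correlated with "a"
    "b": np.tile([0.0, 1.0], n // 2),      # nearly uncorrelated with "a"
})

result = basic_filter(df)
print(result)
```

On real data such as the Kaggle datasets listed above, the filters would normally be fitted on the training split only and then applied to the test split; transposing a very wide DataFrame to find duplicates can also be memory-hungry, so this sketch is best read as a starting point.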

# Wrapper Methods

| S.No. | Name | About | Status |
| ----- | ----------------- | ------------------------------------------------------------------ | ------------ |
| 1. | Step Forward Feature Selection | This notebook explains the concept of Step Forward Feature Selection. | Completed |
| 2. | Step Backward Feature Selection | This notebook explains the concept of Step Backward Feature Selection. | Completed |
| 3. | Exhaustive Search Feature Selection | This notebook explains the concept of Exhaustive Search Feature Selection. | Completed |
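
The three wrapper strategies in the table above can be sketched as follows. The requirements list mlxtend, but this example deliberately swaps in scikit-learn's `SequentialFeatureSelector` (available from scikit-learn 0.24) for the step forward and step backward searches, and a hand-written loop over `itertools.combinations` for the exhaustive search. The synthetic dataset, the logistic regression model, the subset size `k=3`, and the helper names `sequential_select` and `exhaustive_select` are all assumptions made for illustration, not the notebooks' actual setup.

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data: 6 features, of which 3 carry signal.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
model = LogisticRegression(max_iter=1000)


def sequential_select(direction, k=3):
    """Greedy search: add ("forward") or remove ("backward") one feature
    at a time, keeping the change that gives the best cross-validated AUC."""
    sfs = SequentialFeatureSelector(
        model, n_features_to_select=k, direction=direction,
        scoring="roc_auc", cv=3,
    )
    sfs.fit(X, y)
    return np.flatnonzero(sfs.get_support()).tolist()


def exhaustive_select(k=3):
    """Brute force: score every subset of exactly k features.
    (A simplification: only one subset size is searched here.)"""
    best_score, best_subset = -np.inf, None
    for subset in combinations(range(X.shape[1]), k):
        score = cross_val_score(
            model, X[:, list(subset)], y, scoring="roc_auc", cv=3
        ).mean()
        if score > best_score:
            best_score, best_subset = score, subset
    return list(best_subset), best_score


forward = sequential_select("forward")
backward = sequential_select("backward")
exhaustive, exhaustive_auc = exhaustive_select()

print("forward:   ", forward)
print("backward:  ", backward)
print("exhaustive:", exhaustive, "mean AUC: %.3f" % exhaustive_auc)
```

The greedy searches fit far fewer models than the exhaustive one, whose cost grows combinatorially with the number of features, which is why exhaustive search is usually practical only on small feature sets.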

# Embedded Methods

| S.No. | Name | About | Status |
| ----- | ----------------- | ------------------------------------------------------------------ | ------------ |