https://github.com/solegalli/feature-engineering-for-machine-learning

Code repository for the online course Feature Engineering for Machine Learning

data-science feature-engineering feature-extraction machine-learning python

![PythonVersion](https://img.shields.io/badge/python-3.7%20|3.8%20|%203.9%20|%203.10-success)
[![License https://github.com/solegalli/feature-engineering-for-machine-learning/blob/master/LICENSE](https://img.shields.io/badge/license-BSD-success.svg)](https://github.com/solegalli/feature-engineering-for-machine-learning/blob/master/LICENSE)
[![Sponsorship https://www.trainindata.com/](https://img.shields.io/badge/Powered%20By-TrainInData-orange.svg)](https://www.trainindata.com/)

## Feature Engineering for Machine Learning - Code Repository

Code repository for the online course [Feature Engineering for Machine Learning](https://www.trainindata.com/p/feature-engineering-for-machine-learning)

**Launched**: November, 2017

**Actively maintained**.

## Table of Contents

1. **Introduction: Variable Types**
    1. Numerical variables: discrete and continuous
    2. Categorical variables: nominal and ordinal
    3. Datetime variables
    4. Mixed variables: strings and numbers

2. **Variable Characteristics**
    1. Missing data
    2. Cardinality
    3. Category frequency
    4. Distributions
    5. Outliers
    6. Magnitude

3. **Missing Data Imputation**
    1. Mean and median imputation
    2. Arbitrary value imputation
    3. End of tail imputation
    4. Frequent category imputation
    5. Adding string missing
    6. Random sample imputation
    7. Adding a missing indicator
    8. Imputation with Scikit-learn
    9. Imputation with Feature-engine

4. **Multivariate Imputation**
    1. MICE
    2. KNN imputation

5. **Categorical Variable Encoding**
    1. One hot encoding: simple and of frequent categories
    2. Ordinal encoding: arbitrary and ordered
    3. Target mean encoding
    4. Weight of evidence
    5. Rare label encoding
    6. Encoding with Scikit-learn
    7. Encoding with Feature-engine
    8. Encoding with category encoders

6. **Variable Transformation**
    1. Log, power and reciprocal
    2. Box-Cox
    3. Yeo-Johnson
    4. Transformation with Scikit-learn
    5. Transformation with Feature-engine

7. **Discretisation**
    1. Arbitrary discretisation
    2. Equal-frequency discretisation
    3. Equal-width discretisation
    4. K-means discretisation
    5. Discretisation with trees
    6. Discretisation with Scikit-learn
    7. Discretisation with Feature-engine

8. **Outliers**
    1. Capping
    2. Trimming

9. **Datetime**
    1. Extracting day, month, week, etc.
    2. Extracting hour, minute, second, etc.
    3. Capturing elapsed time
    4. Working with timezones

10. **Mixed Variables**
    1. Creating new variables from strings and numbers

11. **Feature Creation**
    1. Sum, prod, count, mean, std, etc.
    2. Div, sub
    3. Polynomial expansion
    4. Splines

12. **Feature Scaling**
    1. Standardisation
    2. MinMaxScaling
    3. MaxAbsoluteScaling
    4. RobustScaling

13. **Pipelines**
    1. Classification pipeline
    2. Regression pipeline
    3. Pipeline with cross-validation
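
To give a rough idea of how several of the topics listed above fit together, here is a minimal sketch that chains median imputation with a missing indicator, frequent category imputation, one hot encoding and standardisation into a classification pipeline evaluated with cross-validation. It uses Scikit-learn only. The toy data, the column names `age` and `city`, and the choice of logistic regression are assumptions made for illustration; they are not taken from the course notebooks.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one numerical and one categorical variable,
# each with some missing values, plus a random binary target.
rng = np.random.default_rng(0)
n = 100
age = rng.normal(40, 10, n)
city = rng.choice(["London", "Paris", "Rome"], n).astype(object)
age[::10] = np.nan    # every 10th age is missing
city[5::10] = np.nan  # every 10th city (offset by 5) is missing
X = pd.DataFrame({"age": age, "city": city})
y = rng.integers(0, 2, n)

# Numerical variables: median imputation + missing indicator, then standardisation.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median", add_indicator=True)),
    ("scale", StandardScaler()),
])

# Categorical variables: frequent category imputation, then one hot encoding.
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# Route each column to the matching preprocessing steps.
preprocess = ColumnTransformer([
    ("num", numeric, ["age"]),
    ("cat", categorical, ["city"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression()),
])

# All steps are re-fitted on the training part of each fold,
# so imputation and scaling parameters never see the test part.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```

Because the target here is random noise, the score itself is meaningless. The point of the sketch is the structure: wrapping the feature engineering steps in a single pipeline lets cross-validation learn the imputation values, categories and scaling parameters from the training folds only.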

## Links

- [Online Course](https://www.trainindata.com/p/feature-engineering-for-machine-learning)