Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/solegalli/feature-engineering-for-machine-learning
Code repository for the online course Feature Engineering for Machine Learning
data-science feature-engineering feature-extraction machine-learning python
Last synced: 4 days ago
- Host: GitHub
- URL: https://github.com/solegalli/feature-engineering-for-machine-learning
- Owner: solegalli
- License: other
- Created: 2019-05-01T08:05:42.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2023-12-05T12:07:10.000Z (about 1 year ago)
- Last Synced: 2025-01-10T19:16:52.019Z (11 days ago)
- Topics: data-science, feature-engineering, feature-extraction, machine-learning, python
- Language: Jupyter Notebook
- Homepage: https://www.trainindata.com/p/feature-engineering-for-machine-learning
- Size: 24.7 MB
- Stars: 379
- Watchers: 6
- Forks: 402
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
![PythonVersion](https://img.shields.io/badge/python-3.7%20|3.8%20|%203.9%20|%203.10-success)
[![License https://github.com/solegalli/feature-engineering-for-machine-learning/blob/master/LICENSE](https://img.shields.io/badge/license-BSD-success.svg)](https://github.com/solegalli/feature-engineering-for-machine-learning/blob/master/LICENSE)
[![Sponsorship https://www.trainindata.com/](https://img.shields.io/badge/Powered%20By-TrainInData-orange.svg)](https://www.trainindata.com/)

## Feature Engineering for Machine Learning - Code Repository
Code repository for the online course [Feature Engineering for Machine Learning](https://www.trainindata.com/p/feature-engineering-for-machine-learning)
**Launched**: November, 2017
**Actively maintained**.
## Table of Contents
1. **Introduction: Variable Types**
    1. Numerical Variables: Discrete and continuous
    2. Categorical Variables: Nominal and Ordinal
    3. Datetime variables
    4. Mixed variables: strings and numbers

2. **Variable Characteristics**
    1. Missing Data
    2. Cardinality
    3. Category Frequency
    4. Distributions
    5. Outliers
    6. Magnitude

3. **Missing Data Imputation**
    1. Mean and Median Imputation
    2. Arbitrary value imputation
    3. End of Tail Imputation
    4. Frequent category imputation
    5. Adding string missing
    6. Random Sample Imputation
    7. Adding a missing indicator
    8. Imputation with Scikit-learn
    9. Imputation with Feature-engine

4. **Multivariate Imputation**
    1. MICE
    2. KNN imputation

5. **Categorical Variable Encoding**
    1. One hot encoding: simple and of frequent categories
    2. Ordinal encoding: arbitrary and ordered
    3. Target mean encoding
    4. Weight of evidence
    6. Rare Label encoding
    7. Encoding with Scikit-learn
    8. Encoding with Feature-engine
    9. Encoding with category encoders

6. **Variable Transformation**
    1. Log, power and reciprocal
    2. Box-Cox
    3. Yeo-Johnson
    4. Transformation with Scikit-learn
    5. Transformation with Feature-engine

7. **Discretisation**
    1. Arbitrary
    2. Equal-frequency discretisation
    3. Equal-width discretisation
    4. K-means discretisation
    5. Discretisation with trees
    6. Discretisation with Scikit-learn
    7. Discretisation with Feature-engine

8. **Outliers**
    1. Capping
    2. Trimming

9. **Datetime**
    1. Extracting day, month, week, etc
    2. Extracting hr, min, sec, etc
    3. Capturing elapsed time
    4. Working with timezones

10. **Mixed variables**
    1. Creating new variables from strings and numbers

11. **Feature creation**
    1. Sum, prod, count, mean, std, etc
    2. Div, sub
    3. Polynomial expansion
    4. Splines

12. **Feature Scaling**
    1. Standardisation
    2. MinMaxScaling
    3. MaxAbsoluteScaling
    4. RobustScaling

13. **Pipelines**
    1. Classification Pipeline
    2. Regression Pipeline
    3. Pipeline with cross-validation

## Links
- [Online Course](https://www.trainindata.com/p/feature-engineering-for-machine-learning)