{"id":15627639,"url":"https://github.com/solegalli/feature-engineering-for-machine-learning","last_synced_at":"2025-04-04T11:10:28.907Z","repository":{"id":38833113,"uuid":"184386548","full_name":"solegalli/feature-engineering-for-machine-learning","owner":"solegalli","description":"Code repository for the online course Feature Engineering for Machine Learning","archived":false,"fork":false,"pushed_at":"2023-12-05T12:07:10.000Z","size":25913,"stargazers_count":389,"open_issues_count":0,"forks_count":406,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-03-28T10:04:27.369Z","etag":null,"topics":["data-science","feature-engineering","feature-extraction","machine-learning","python"],"latest_commit_sha":null,"homepage":"https://www.trainindata.com/p/feature-engineering-for-machine-learning","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/solegalli.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["solegalli"]}},"created_at":"2019-05-01T08:05:42.000Z","updated_at":"2025-03-27T20:56:58.000Z","dependencies_parsed_at":"2024-10-23T00:08:47.532Z","dependency_job_id":null,"html_url":"https://github.com/solegalli/feature-engineering-for-machine-learning","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/solegalli%2Ffeature-engineering-for-machine-learning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/solegalli%2Ffeature-engi
neering-for-machine-learning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/solegalli%2Ffeature-engineering-for-machine-learning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/solegalli%2Ffeature-engineering-for-machine-learning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/solegalli","download_url":"https://codeload.github.com/solegalli/feature-engineering-for-machine-learning/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247166156,"owners_count":20894652,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","feature-engineering","feature-extraction","machine-learning","python"],"created_at":"2024-10-03T10:18:05.334Z","updated_at":"2025-04-04T11:10:28.890Z","avatar_url":"https://github.com/solegalli.png","language":"Jupyter Notebook","funding_links":["https://github.com/sponsors/solegalli"],"categories":[],"sub_categories":[],"readme":"﻿![PythonVersion](https://img.shields.io/badge/python-3.7%20|3.8%20|%203.9%20|%203.10-success)\n[![License https://github.com/solegalli/feature-engineering-for-machine-learning/blob/master/LICENSE](https://img.shields.io/badge/license-BSD-success.svg)](https://github.com/solegalli/feature-engineering-for-machine-learning/blob/master/LICENSE)\n[![Sponsorship https://www.trainindata.com/](https://img.shields.io/badge/Powered%20By-TrainInData-orange.svg)](https://www.trainindata.com/)\n\n## Feature Engineering for Machine Learning - Code Repository\n\n[\u003cimg 
src=\"./course-banner.png\"\u003e](https://www.trainindata.com/p/feature-engineering-for-machine-learning)\n\nCode repository for the online course [Feature Engineering for Machine Learning](https://www.trainindata.com/p/feature-engineering-for-machine-learning)\n\n**Launched**: November, 2017\n\n**Actively maintained**.\n\n[\u003cimg src=\"./feml_logo.png\" width=\"248\"\u003e](https://www.trainindata.com/p/feature-engineering-for-machine-learning)\n\n## Table of Contents\n\n1. **Introduction: Variable Types**\n\t1. Numerical Variables: Discrete and continuous\n\t2. Categorical Variables: Nominal and Ordinal\n\t3. Datetime variables\n\t4. Mixed variables: strings and numbers\n\n2. **Variable Characteristics**\n\t1. Missing Data\n\t2. Cardinality\n\t3. Category Frequency\n\t4. Distributions\n\t5. Outliers\n\t6. Magnitude\n\n3. **Missing Data Imputation**\n\t1. Mean and Median Imputation\n\t2. Arbitrary value imputation\n\t3. End of Tail Imputation\n\t4. Frequent category imputation\n\t5. Adding string missing\n\t6. Random Sample Imputation\n\t7. Adding a missing indicator\n\t8. Imputation with Scikit-learn\n\t9. Imputation with Feature-engine\n\n4. **Multivariate Imputation**\n\t1. MICE\n\t2. KNN imputation\n\n5. **Categorical Variable Encoding**\n\t1. One hot encoding: simple and of frequent categories\n\t2. Ordinal encoding: arbitrary and ordered\n\t3. Target mean encoding\n\t4. Weight of evidence\n\t5. Rare Label encoding\n\t6. Encoding with Scikit-learn\n\t7. Encoding with Feature-engine\n\t8. Encoding with category encoders\n\n6. **Variable Transformation**\n\t1. Log, power and reciprocal\n\t2. Box-Cox\n\t3. Yeo-Johnson\n\t4. Transformation with Scikit-learn\n\t5. Transformation with Feature-engine\n\n7. **Discretisation**\n\t1. Arbitrary\n\t2. Equal-frequency discretisation\n\t3. Equal-width discretisation\n\t4. K-means discretisation\n\t5. Discretisation with trees\n\t6. Discretisation with Scikit-learn\n\t7. Discretisation with Feature-engine\n\n8. 
**Outliers**\n\t1. Capping\n\t2. Trimming\n\n9. **Datetime**\n\t1. Extracting day, month, week, etc\n\t2. Extracting hr, min, sec, etc\n\t3. Capturing elapsed time\n\t4. Working with timezones\n\t\n10. **Mixed variables**\n\t1. Creating new variables from strings and numbers\n\n11. **Feature creation**\n\t1. Sum, prod, count, mean, std, etc\n\t2. Div, sub\n\t3. Polynomial expansion\n\t4. Splines\n\t\n12. **Feature Scaling**\n\t1. Standardisation\n\t2. MinMaxScaling\n\t3. MaxAbsoluteScaling\n\t4. RobustScaling\n\n13. **Pipelines**\n\t1. Classification Pipeline\n\t2. Regression Pipeline\n\t3. Pipeline with cross-validation\n\n\n## Links\n\n- [Online Course](https://www.trainindata.com/p/feature-engineering-for-machine-learning)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsolegalli%2Ffeature-engineering-for-machine-learning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsolegalli%2Ffeature-engineering-for-machine-learning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsolegalli%2Ffeature-engineering-for-machine-learning/lists"}