https://github.com/maryamteimouri/dataanalysis-and-knowledgediscovery


agglomerative-clustering classification clustering crisp-dm cross-validation data-preprocessing data-visualization dendogram k-means-clustering one-hot-encode pca principal-component-analysis z-score


# Data Analysis and Knowledge Discovery
This project aims to practice the steps of the Cross-Industry Standard Process for Data Mining (CRISP-DM).
The repository includes three phases: data understanding, supervised learning, and unsupervised learning.

- In P1, data understanding, I practice looking at the data and **checking data quality** by plotting numeric and categorical features. Also, I apply some **preprocessing** methods like **min-max scaling to [0,1]**, **standardizing the features to 0 mean and unit variance**, and **one-hot encoding**.

- In P2, supervised learning, three **classification and regression** methods are implemented: **K-nearest neighbors (KNN), ridge regression, and KNN regression**. For **hyperparameter optimization**, I used **leave-one-out cross-validation**.

- In P3, unsupervised learning, some preprocessing and data visualization methods are implemented: **z-score standardization**, **principal component analysis (PCA)**, and **dendrograms**. Moreover, two clustering methods are applied: **agglomerative hierarchical clustering** and **K-means clustering**.
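
The P1 preprocessing steps can be illustrated with a minimal sketch in plain Python. This is not the repository's code: the function names (`min_max_scale`, `standardize`, `one_hot`) and the toy `ages` and `colors` features are assumptions made only for illustration, and a real analysis would more likely rely on a library such as pandas or scikit-learn.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant feature: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(labels):
    """Return the sorted categories and one 0/1 indicator row per label."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows


# Hypothetical toy features, not taken from the project's dataset.
ages = [20, 30, 40]
colors = ["red", "blue", "red"]

scaled = min_max_scale(ages)            # [0.0, 0.5, 1.0]
zscores = standardize(ages)             # zero mean, unit variance
categories, encoded = one_hot(colors)   # ['blue', 'red'], [[0, 1], [1, 0], [0, 1]]
```

Min-max scaling keeps the original shape of the distribution inside a fixed range, while standardization centers the feature and is the usual choice before distance-based methods such as KNN or K-means.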
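
For P2, the idea of choosing the KNN hyperparameter `k` with leave-one-out cross-validation can be sketched as follows. This is a simplified, standard-library-only illustration under assumed toy data. The helper names `knn_predict` and `loocv_accuracy` are hypothetical and do not come from the repository.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points closest to `query`."""
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loocv_accuracy(X, y, k):
    """Leave-one-out CV: hold out each sample once and train on the rest."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        correct += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return correct / len(X)


# Hypothetical toy data: two well-separated groups of three points each.
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.0), (3.2, 2.9), (2.8, 3.1)]
y = ["a", "a", "a", "b", "b", "b"]

# Evaluate a few candidate values of k and keep the best-scoring one.
scores = {k: loocv_accuracy(X, y, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
```

With only six samples, `k = 5` always outvotes the held-out point's own class, which shows why `k` needs to be tuned rather than fixed in advance.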
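
For P3, the K-means step can be sketched with Lloyd's algorithm in plain Python. This is an illustrative assumption, not the repository's implementation: the `kmeans` function is hypothetical, it seeds the centroids with the first `k` points (a simplification that keeps the example deterministic), and the toy points are made up.

```python
from math import dist


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm, seeded with the first k points (a simplification)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return labels, centroids


# Hypothetical toy data: two groups, around (1, 1) and around (5, 5).
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.9), (5.1, 4.8), (0.8, 1.1), (4.9, 5.2)]
labels, centroids = kmeans(points, k=2)
```

Because K-means relies on Euclidean distance, features on very different scales would normally be z-score standardized first, which is consistent with the preprocessing listed for this phase.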