https://github.com/maryamteimouri/dataanalysis-and-knowledgediscovery
This project practices the steps of the Cross-Industry Standard Process for Data Mining (CRISP-DM). The repository is organized into three phases: data understanding, supervised learning, and unsupervised learning.
- Host: GitHub
- URL: https://github.com/maryamteimouri/dataanalysis-and-knowledgediscovery
- Owner: maryamteimouri
- License: gpl-3.0
- Created: 2024-02-10T16:26:44.000Z
- Default Branch: main
- Last Pushed: 2024-02-11T10:39:07.000Z
- Last Synced: 2024-02-14T11:35:20.867Z
- Topics: agglomerative-clustering, classification, clustering, crisp-dm, cross-validation, data-preprocessing, data-visualization, dendogram, k-means-clustering, one-hot-encode, pca, principal-component-analysis, z-score
- Language: Jupyter Notebook
- Homepage:
- Size: 481 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Data Analysis and Knowledge Discovery
This project practices the steps of the Cross-Industry Standard Process for Data Mining (CRISP-DM).
The repository is organized into three phases: data understanding, supervised learning, and unsupervised learning.
- In P1 (data understanding), I explore the data and **check data quality** by plotting the numeric and categorical features. I also apply several **preprocessing** methods: **min-max scaling to [0,1]**, **standardizing the features to zero mean and unit variance**, and **one-hot encoding**.
- In P2 (supervised learning), three methods are applied to a **classification** task: **k-nearest neighbors (KNN)**, **ridge regression**, and **KNN regression**. For **hyperparameter optimization**, I use **leave-one-out cross-validation**.
- In P3 (unsupervised learning), preprocessing and data visualization methods are implemented: **z-score standardization**, **principal component analysis (PCA)**, and **dendrograms**. Two clustering methods are then applied: **agglomerative hierarchical clustering** and **K-means clustering**.
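The P1 preprocessing steps can be sketched in a few lines of NumPy. This is a minimal illustration, not the notebook's code: the helper names (`min_max_scale`, `standardize`, `one_hot`) and the toy data are hypothetical, standardization uses the population standard deviation, and constant columns are mapped to 0 rather than divided by zero.

```python
import numpy as np


def min_max_scale(X):
    """Rescale each column of X linearly to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return (X - col_min) / col_range


def standardize(X):
    """Shift each column to zero mean and scale it to unit (population) variance."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return (X - X.mean(axis=0)) / std


def one_hot(values):
    """Encode categorical labels as a 0/1 indicator matrix, one column per category."""
    categories = sorted(set(values))
    column = {c: j for j, c in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)), dtype=int)
    for i, v in enumerate(values):
        encoded[i, column[v]] = 1
    return encoded, categories


# Hypothetical toy data: two numeric features and one categorical feature.
X_num = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
colors = ["red", "blue", "red"]

print(min_max_scale(X_num))  # rows: [0, 0], [0.5, 0.5], [1, 1]
print(standardize(X_num))
encoded, categories = one_hot(colors)
print(categories)  # ['blue', 'red']
print(encoded)     # rows: [0, 1], [1, 0], [0, 1]
```

For real datasets, scikit-learn's `MinMaxScaler`, `StandardScaler`, and `OneHotEncoder` (or pandas' `get_dummies`) provide the same transformations and make it easy to reuse training-set statistics on new data.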
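The P2 combination of a KNN classifier with leave-one-out cross-validation for choosing k can be sketched as follows. This assumes Euclidean distance, unweighted majority voting, and accuracy as the selection criterion; the function names and the two-group toy data are hypothetical, and the notebooks may use a different distance, metric, or library.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    # Squared distances between every query and every training point: (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1, kind="stable")[:, :k]
    preds = []
    for idx in nearest:
        labels, counts = np.unique(y_train[idx], return_counts=True)
        preds.append(labels[np.argmax(counts)])  # vote ties go to the smallest label
    return np.array(preds)


def loo_accuracy(X, y, k):
    """Leave-one-out CV: predict each sample from a model fitted on all the others."""
    n = len(X)
    correct = 0
    for i in range(n):
        keep = np.arange(n) != i
        pred = knn_predict(X[keep], y[keep], X[i:i + 1], k)[0]
        correct += int(pred == y[i])
    return correct / n


def select_k(X, y, candidate_ks):
    """Return the k with the best leave-one-out accuracy (the first one on ties)."""
    scores = {k: loo_accuracy(X, y, k) for k in candidate_ks}
    best_k = max(candidate_ks, key=lambda k: scores[k])
    return best_k, scores


# Hypothetical toy data: two well-separated groups of three points.
X_demo = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_demo = np.array([0, 0, 0, 1, 1, 1])

best_k, scores = select_k(X_demo, y_demo, [1, 3, 5])
print(scores)  # {1: 1.0, 3: 1.0, 5: 0.0}
print(best_k)  # 1
```

With k = 5, each held-out point is outvoted: of the five remaining points, only two belong to its own group. Leave-one-out fits one model per sample, which is cheap on small data but grows expensive as the dataset grows.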
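Ridge regression and KNN regression, both listed for P2, produce continuous predictions. A minimal sketch of each, assuming the closed-form ridge solution with an unpenalized intercept and an unweighted mean over the k nearest neighbors; the function names and data are hypothetical. How the notebooks turn these predictions into class labels is not stated here. One common choice, which is an assumption, is to encode the classes as -1/+1 and take the sign of the prediction.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression with an unpenalized intercept.

    Centering X and y removes the intercept from the penalized problem, so
    w = (Xc^T Xc + lam * I)^(-1) Xc^T yc and b = mean(y) - mean(X) . w.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def ridge_predict(X, w, b):
    """Linear prediction X w + b."""
    return X @ w + b


def knn_regress(X_train, y_train, X_query, k):
    """Predict each query's target as the mean target of its k nearest training points."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1, kind="stable")[:, :k]
    return y_train[nearest].mean(axis=1)


# Hypothetical toy data on the line y = 2x + 1.
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.array([1.0, 3.0, 5.0, 7.0])

w, b = ridge_fit(X_demo, y_demo, lam=5.0)
print(w, b)  # slope shrunk from 2 to 1, intercept 2.5
print(ridge_predict(np.array([[4.0]]), w, b))  # [6.5]
print(knn_regress(X_demo, y_demo, np.array([[1.4]]), k=2))  # mean of 3 and 5: [4.]
```

With `lam=0.0` the fit recovers the exact line (slope 2, intercept 1); increasing `lam` shrinks the slope toward zero, which is the regularization strength the cross-validation would tune.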
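In P3, z-score standardization comes before PCA, so that features measured on different scales contribute equally. A minimal sketch, assuming PCA is computed from the SVD of the z-scored data matrix (equivalent to an eigendecomposition of the correlation matrix); the function names and toy data are hypothetical.

```python
import numpy as np


def zscore(X):
    """Standardize each column to zero mean and unit (population) standard deviation."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return (X - X.mean(axis=0)) / std


def pca(X, n_components):
    """Project z-scored data onto its leading principal components via the SVD."""
    Z = zscore(X)
    # Rows of Vt are the principal directions, sorted by decreasing singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    scores = Z @ components.T  # coordinates of each sample in the new basis
    explained = S ** 2 / np.sum(S ** 2)  # share of total variance per component
    return scores, components, explained[:n_components]


# Hypothetical toy data: the second column is exactly twice the first.
X_demo = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
scores, components, explained = pca(X_demo, n_components=1)
print(explained)   # ~[1.0]: one direction carries all the variance
print(components)  # +/-[0.707, 0.707]; the sign of an SVD direction is arbitrary
```

For a 2-D visualization, `n_components=2` gives two score columns that can be plotted as a scatter plot.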
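Agglomerative hierarchical clustering can be illustrated with a naive bottom-up loop: every point starts as its own cluster, and the two closest clusters are merged until the requested number remains. The recorded merge history (which clusters joined, and at what linkage distance) is exactly what a dendrogram draws. This sketch assumes Euclidean distance and average linkage by default; the names and data are hypothetical, and it rescans all cluster pairs at every step, so it only suits small datasets. SciPy's `scipy.cluster.hierarchy.linkage` and `dendrogram` are the usual tools for real data.

```python
import numpy as np


def agglomerative(X, n_clusters, linkage="average"):
    """Merge the two closest clusters until n_clusters remain.

    Returns one label per point and the merge history as
    (members_a, members_b, linkage_distance) tuples.
    """
    X = np.asarray(X, dtype=float)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    link_fn = {"single": np.min, "complete": np.max, "average": np.mean}[linkage]
    clusters = [[i] for i in range(len(X))]
    merges = []
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = link_fn(dist[np.ix_(clusters[a], clusters[b])])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((list(clusters[a]), list(clusters[b]), float(d)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels, merges


# Hypothetical toy data: two tight pairs of points far apart.
X_demo = np.array([[0, 0], [0, 1], [5, 5], [5, 6]], dtype=float)
labels, merges = agglomerative(X_demo, n_clusters=2)
print(labels)  # [0 0 1 1]
print(merges)  # [([0], [1], 1.0), ([2], [3], 1.0)]
```

Switching `linkage` to `"single"` or `"complete"` changes how the distance between two clusters is measured (closest pair or farthest pair instead of the average), which can change the shape of the resulting dendrogram.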
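K-means clustering is commonly computed with Lloyd's algorithm, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. A minimal sketch, assuming Euclidean distance, initialization from k different rows of the data chosen at random, and stopping once the centroids stop moving; the function name, seed, and data are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)

    def assign(centroids):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    # Initialize with k different rows of X chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(centroids)
        # Move each centroid to the mean of its points; keep it in place if its cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign(centroids), centroids


# Hypothetical toy data: two well-separated groups of three points.
X_demo = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(X_demo, k=2)
print(labels)
print(centroids)
```

The result depends on the initial centroids, so in practice the algorithm is run from several starts and the solution with the lowest within-cluster sum of squares is kept; scikit-learn's `KMeans` does this and uses k-means++ initialization by default.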