https://github.com/maryamteimouri/dataanalysis-and-knowledgediscovery

This project practices the steps of the Cross-Industry Standard Process for Data Mining (CRISP-DM). The repository includes three phases: data understanding, supervised learning, and unsupervised learning.

Host: GitHub
URL: https://github.com/maryamteimouri/dataanalysis-and-knowledgediscovery
Owner: maryamteimouri
License: gpl-3.0
Created: 2024-02-10T16:26:44.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-02-11T10:39:07.000Z (over 2 years ago)
Last Synced: 2024-02-14T11:35:20.867Z (over 2 years ago)
Topics: agglomerative-clustering, classification, clustering, crisp-dm, cross-validation, data-preprocessing, data-visualization, dendogram, k-means-clustering, one-hot-encode, pca, principal-component-analysis, z-score
Language: Jupyter Notebook
Homepage:
Size: 481 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Data Analysis and Knowledge Discovery
This project practices the steps of the Cross-Industry Standard Process for Data Mining (CRISP-DM).
The repository includes three phases: data understanding, supervised learning, and unsupervised learning.

- In P1, data understanding, I practice exploring the data and **checking data quality** by plotting numeric and categorical features. I also apply preprocessing methods such as **min-max scaling to [0,1]**, **standardizing the features to zero mean and unit variance**, and **one-hot encoding**.

- In P2, supervised learning, three methods are implemented: **K-nearest neighbors (KNN) classification**, **ridge regression**, and **KNN regression**. For **hyperparameter optimization**, I used **leave-one-out cross-validation**.

- In P3, unsupervised learning, several preprocessing and data visualization methods are implemented: **z-score standardization**, **principal component analysis (PCA)**, and **dendrograms**. In addition, two clustering methods are applied: **agglomerative hierarchical clustering** and **K-means clustering**.
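
The P1 preprocessing steps can be sketched in a few lines of dependency-free Python. This is a minimal illustration, not the notebook's code: the function names (`min_max_scale`, `standardize`, `one_hot`) and the toy `ages` and `colors` data are assumptions made for this example, and the standardization uses the population standard deviation.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: nothing to rescale
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(labels):
    """Return the sorted category list and one 0/1 indicator row per label."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows


ages = [20, 30, 40, 50]          # hypothetical numeric feature
colors = ["red", "blue", "red"]  # hypothetical categorical feature

scaled = min_max_scale(ages)
zscores = standardize(ages)
categories, encoded = one_hot(colors)
```

In practice these transformations are usually fitted on the training split only and then applied unchanged to held-out data, so that no information leaks from the evaluation set.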
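
For P2, the sketch below shows how leave-one-out cross-validation can select the number of neighbors for a KNN classifier. It is a plain-Python illustration under stated assumptions: the helper names (`knn_predict`, `loo_accuracy`), the candidate values of `k`, and the toy data are all hypothetical and do not come from the notebooks. The same loop could be used to tune ridge regression or KNN regression by replacing accuracy with a squared-error score.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points closest to the query."""
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k):
    """Leave-one-out CV: hold out each sample once and train on the rest."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)


# Hypothetical toy data: two well-separated groups of three points each.
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
y = ["a", "a", "a", "b", "b", "b"]

# Score each candidate k; on a tie, max() keeps the first (smallest) k.
scores = {k: loo_accuracy(X, y, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
```

With only six samples, `k = 5` makes every held-out point vote against its own class, which is why its score collapses. Leave-one-out makes this kind of failure visible even on very small datasets.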
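
For P3, K-means clustering can be sketched as Lloyd's algorithm in plain Python. This is an illustrative sketch, not the repository's implementation: the `k_means` function and the toy `points` are assumptions, and seeding the centroids with the first `k` points is a simplification chosen to keep the example deterministic (library implementations typically use smarter seeding such as k-means++).

```python
from math import dist


def k_means(points, k, iterations=100):
    """Lloyd's algorithm, using the first k points as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


# Hypothetical toy data: two visually obvious groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (0.9, 1.1), (5.1, 4.9), (4.8, 5.2)]
labels, centroids = k_means(points, k=2)
```

Because K-means relies on Euclidean distance, features on very different scales are usually z-score standardized first so that no single feature dominates the distance computation.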
