Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sirinemaaroufi/ml_clustering_explorations
This repository contains a series of notebooks exploring various clustering techniques in machine learning.
- Host: GitHub
- URL: https://github.com/sirinemaaroufi/ml_clustering_explorations
- Owner: SirineMaaroufi
- Created: 2024-11-03T11:07:05.000Z (3 days ago)
- Default Branch: main
- Last Pushed: 2024-11-03T11:19:23.000Z (3 days ago)
- Last Synced: 2024-11-03T12:18:03.314Z (3 days ago)
- Topics: clustering-methods, comparison, dbscan-clustering, dimentionality-reduction, gmm-clustering, hierarchical-clustering, kmeans-clustering, ml, pca, python
- Language: Jupyter Notebook
- Homepage:
- Size: 1.11 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# ML Clustering Explorations
This repository contains a series of notebooks exploring various clustering techniques in machine learning. Each notebook demonstrates a different method, compares its performance, and discusses its strengths and weaknesses. Together, these explorations help with choosing the right clustering method for a given data structure and with understanding how each method handles common clustering challenges.

## Notebooks Overview
1. **Comparing K-Means and DBSCAN Algorithms**
   This notebook explores two popular clustering algorithms, **K-Means** and **DBSCAN**. We compare their behavior on different datasets, examining how they respond to clusters of varying shapes, densities, and distributions.

2. **Gaussian Mixture Models (GMM)**
   This notebook dives into **Gaussian Mixture Models (GMMs)** for clustering. We cover the probabilistic foundations of GMMs, model fitting, and visualization. The notebook highlights how GMMs, by assuming the data comes from a mixture of Gaussian distributions, can capture more flexible cluster shapes, such as elliptical clusters of different sizes and orientations.

3. **Principal Component Analysis (PCA)**
   Here, we explore **Principal Component Analysis (PCA)** as a tool for dimensionality reduction and feature extraction. Although PCA isn't a clustering method itself, this notebook demonstrates how it can aid clustering by reducing the dimensionality of the data and making the results easier to interpret.

4. **Hierarchical Clustering**
   This notebook examines **Hierarchical Clustering**, an algorithm that builds a nested hierarchy of clusters represented as a tree (dendrogram). We explore different linkage criteria, visualize the dendrogram, and discuss how to select an appropriate number of clusters.

## Getting Started
### Prerequisites
- Python 3.12.7
- Jupyter Notebook
- Common Python libraries such as:
  - ![NumPy](https://img.shields.io/badge/NumPy-013243?style=for-the-badge&logo=numpy&logoColor=white)
  - ![Pandas](https://img.shields.io/badge/Pandas-150458?style=for-the-badge&logo=pandas&logoColor=white)
  - ![Matplotlib](https://img.shields.io/badge/Matplotlib-35495E?style=for-the-badge&logo=python&logoColor=white)
  - ![Seaborn](https://img.shields.io/badge/Seaborn-3776AB?style=for-the-badge&logo=python&logoColor=white)
  - ![scikit-learn](https://img.shields.io/badge/scikit--learn-F7931E?style=for-the-badge&logo=scikit-learn&logoColor=white)
Install the required libraries using:
```bash
pip install numpy pandas matplotlib seaborn scikit-learn
```

### Repo Structure

```text
ML_Clustering_Explorations/
│
├── KMeans_vs_DBSCAN.ipynb
├── GMM.ipynb
├── PCA.ipynb
└── Hierarchical_Clustering.ipynb
```
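As a rough illustration of the kind of comparison `KMeans_vs_DBSCAN.ipynb` makes, here is a minimal sketch (not code taken from the notebook). It clusters scikit-learn's two-moons toy dataset with both algorithms and scores each result against the known labels with the adjusted Rand index (ARI). The dataset, `eps=0.3`, `min_samples=5`, and the use of ARI are illustrative assumptions.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Assumed toy data: two interleaving half-circles, a non-convex shape.
X, y_true = make_moons(n_samples=500, noise=0.05, random_state=0)
X = StandardScaler().fit_transform(X)

# K-Means splits the space into convex regions around k centroids.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN grows clusters from dense neighborhoods (eps/min_samples are illustrative choices).
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

# ARI compares each labeling with the ground truth (1.0 = perfect agreement).
kmeans_ari = adjusted_rand_score(y_true, kmeans_labels)
dbscan_ari = adjusted_rand_score(y_true, dbscan_labels)
print(f"K-Means ARI: {kmeans_ari:.2f}")
print(f"DBSCAN  ARI: {dbscan_ari:.2f}")
```

K-Means tends to cut each moon in two because it favors convex, roughly spherical clusters, while DBSCAN follows the dense curved regions. On real data, `eps` usually needs tuning, for example from a k-distance plot.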
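For the GMM topic, a minimal sketch of fitting scikit-learn's `GaussianMixture`, assuming a synthetic dataset of stretched (elliptical) blobs and using the Bayesian information criterion (BIC) to choose the number of components. The data, the 1 to 6 component range, and BIC-based selection are assumptions for illustration; `GMM.ipynb` may select models differently.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Assumed toy data: three blobs stretched by a linear map so the clusters become elliptical.
X, _ = make_blobs(n_samples=600, centers=3, random_state=42)
X = X @ np.array([[0.6, -0.6], [-0.4, 0.8]])

# Fit GMMs with 1 to 6 full-covariance components; keep the one with the lowest BIC.
candidates = [
    GaussianMixture(n_components=k, covariance_type="full", random_state=0).fit(X)
    for k in range(1, 7)
]
bics = [gmm.bic(X) for gmm in candidates]
best = candidates[int(np.argmin(bics))]

hard_labels = best.predict(X)       # most probable component for each point
soft_probs = best.predict_proba(X)  # posterior probability of each component per point
print("components selected by BIC:", best.n_components)
print("membership probabilities of first point:", soft_probs[0].round(3))
```

`predict_proba` exposes per-component membership probabilities, which is what distinguishes a GMM's soft assignments from the hard assignments of K-Means.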
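To show PCA used as a preprocessing step for clustering, here is a minimal sketch that is not taken from `PCA.ipynb`. It assumes the Iris dataset bundled with scikit-learn, an arbitrary choice of two components, and K-Means with three clusters as the downstream algorithm.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed example data: 150 samples with 4 numeric features.
X, _ = load_iris(return_X_y=True)

# Standardize the features, then project onto the first two principal components.
pca_pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
X_2d = pca_pipeline.fit_transform(X)
pca = pca_pipeline.named_steps["pca"]
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))

# Cluster in the reduced 2-D space, which can also be plotted directly.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)
```

Standardizing first matters because PCA is driven by variance, so features measured on larger scales would otherwise dominate the components.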
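For hierarchical clustering, a minimal sketch comparing several linkage criteria. It assumes SciPy's `scipy.cluster.hierarchy` module (not listed in the prerequisites, but installed as a scikit-learn dependency), a small synthetic blob dataset, and the "largest gap between successive merge heights" rule for choosing the number of clusters, which is one common heuristic rather than necessarily the approach used in `Hierarchical_Clustering.ipynb`.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs

# Assumed toy data: 60 points drawn around 3 centers.
X, _ = make_blobs(n_samples=60, centers=3, cluster_std=0.8, random_state=7)

# Build the merge tree under different linkage criteria, then cut it into 3 flat clusters.
cluster_sizes = {}
for method in ("single", "complete", "average", "ward"):
    Z = linkage(X, method=method)  # (n-1) x 4: merged ids, merge distance, new cluster size
    labels = fcluster(Z, t=3, criterion="maxclust")  # flat labels start at 1
    cluster_sizes[method] = np.bincount(labels)[1:]
    print(f"{method:>8}: cluster sizes {cluster_sizes[method]}")

# Heuristic (assumption): cut the Ward tree where the jump between successive merge heights is largest.
Z_ward = linkage(X, method="ward")
gaps = np.diff(Z_ward[:, 2])
k_suggested = len(X) - (int(np.argmax(gaps)) + 1)
print("suggested number of clusters:", k_suggested)
```

To draw the tree itself, `scipy.cluster.hierarchy.dendrogram(Z_ward)` plots it with Matplotlib; a long vertical stretch with no merges corresponds to the large gap the heuristic looks for.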