https://github.com/subhashpolisetti/clustering-techniques-and-embeddings
This repository includes Colab notebooks demonstrating various clustering algorithms, from scratch-based methods to advanced deep learning models and embeddings. Each notebook features explanations, visualizations, and quality evaluation metrics for clustering performance.
- Host: GitHub
- URL: https://github.com/subhashpolisetti/clustering-techniques-and-embeddings
- Owner: subhashpolisetti
- Created: 2024-12-07T03:13:35.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-12-10T23:40:17.000Z (7 months ago)
- Last Synced: 2025-02-08T09:26:42.176Z (5 months ago)
- Topics: anomaly-detection, clustering-algorithm, hierarchical-clustering, kmeans-clustering, multimodal, time-series
- Language: Jupyter Notebook
- Homepage:
- Size: 14.4 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Clustering Assignment
This repository contains Colab notebooks that demonstrate a variety of clustering algorithms and techniques. The implementations range from algorithms written from scratch to clustering on embeddings produced by deep learning models. Each notebook includes detailed documentation, visualizations, and quality measures for evaluating clustering results.
## Table of Contents
1. K-Means Clustering from Scratch
2. Hierarchical Clustering
3. Gaussian Mixture Models Clustering
4. DBSCAN Clustering Using PyCaret
5. Anomaly Detection Using PyOD
6. Clustering Time Series Data
7. Document Clustering with LLM Embeddings
8. Image Clustering Using ImageBind
9. Audio Clustering Using ImageBind

---
## 1. K-Means Clustering from Scratch
This notebook implements the K-Means clustering algorithm from scratch, covering centroid initialization, iterative cluster assignment, and centroid updates. It includes visualizations and evaluation metrics such as inertia to assess clustering performance.

Colab link: https://github.com/subhashpolisetti/Clustering-Techniques-and-Embeddings/blob/main/1_KMeans_Clustering_on_Income_Data.ipynb
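The three steps named above (initialization, assignment, update) can be sketched with NumPy alone. This is a minimal illustration, not the notebook's code: the function name `kmeans`, the random-point initialization, and the synthetic two-blob data are all assumptions made for the example.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Illustrative K-Means: random init from the data, then alternate assign/update."""
    rng = np.random.default_rng(seed)
    # Initialization: pick k distinct data points as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment: each point goes to its nearest centroid (squared Euclidean).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update: move each centroid to the mean of its assigned points;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    # Final assignment and inertia: sum of squared distances
    # from each point to its assigned centroid.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    inertia = dists[np.arange(len(X)), labels].sum()
    return labels, centroids, inertia

# Synthetic data (an assumption for this sketch): two well-separated blobs.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centroids, inertia = kmeans(X, k=2)
print(centroids, inertia)
```

Lower inertia means tighter clusters, but it always decreases as `k` grows, so it is usually compared across several values of `k` rather than read in isolation.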
---
## 2. Hierarchical Clustering
This notebook demonstrates hierarchical clustering using Python libraries. It visualizes cluster merging with dendrograms and explains how to interpret hierarchical trees for clustering.

Colab link: https://github.com/subhashpolisetti/Clustering-Techniques-and-Embeddings/blob/main/1_KMeans_Clustering_on_Income_Data.ipynb
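As a rough sketch of how a merge tree is built and read, the example below uses SciPy's `scipy.cluster.hierarchy` module. The README does not say which library the notebook uses, so SciPy, Ward linkage, and the synthetic data are assumptions here.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Synthetic data (an assumption for this sketch): two small blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (10, 2)), rng.normal(4, 0.3, (10, 2))])

# Ward linkage merges, at each step, the pair of clusters whose union
# gives the smallest increase in within-cluster variance.
Z = linkage(X, method="ward")

# Each row of Z records one merge:
# [cluster_a, cluster_b, merge_distance, size_of_new_cluster].
print(Z[-3:])

# Cut the tree into two flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")

# no_plot=True returns the dendrogram layout without needing matplotlib;
# drop it (with matplotlib installed) to draw the tree.
tree = dendrogram(Z, no_plot=True)
print(tree["leaves"])
```

In the dendrogram, the height of each join is the merge distance in the third column of `Z`. A large jump between consecutive merge heights suggests a natural place to cut the tree.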
---
## 3. Gaussian Mixture Models Clustering
This notebook showcases Gaussian Mixture Models (GMMs) for clustering, explaining the probabilistic approach to cluster assignments. It also evaluates the clustering results using AIC, BIC, and other relevant metrics.

Colab link: https://github.com/subhashpolisetti/Clustering-Techniques-and-Embeddings/blob/main/3_Clustering_with_K_Means_and_Gaussian_Mixture_Models_(GMM).ipynb
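One common way to use AIC and BIC is to fit GMMs with different numbers of components and compare the scores. The sketch below assumes scikit-learn's `GaussianMixture` and synthetic data; neither is confirmed as what the notebook uses.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic data (an assumption for this sketch): two Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 1.0, (150, 2)), rng.normal(3, 1.0, (150, 2))])

# Fit one model per candidate component count and record both criteria.
aic, bic = {}, {}
for k in range(1, 5):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    aic[k] = gmm.aic(X)
    bic[k] = gmm.bic(X)

# Lower is better for both; BIC penalizes extra parameters more heavily.
best_k = min(bic, key=bic.get)
best = GaussianMixture(n_components=best_k, random_state=0).fit(X)

# Soft assignments: one row per point, one probability per component.
probs = best.predict_proba(X)
print(best_k, probs[:3].round(3))
```

Unlike K-Means, each point gets a probability for every component, so points near a cluster boundary can be identified by their split membership.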
---
## 4. DBSCAN Clustering Using PyCaret
This notebook implements the DBSCAN clustering algorithm using PyCaret. This density-based approach identifies clusters of arbitrary shape and labels low-density points as noise. The notebook evaluates clustering quality using silhouette scores.

Colab link: https://github.com/subhashpolisetti/Clustering-Techniques-and-Embeddings/blob/main/4_Clustering_Analysis_with_PyCaret_K_Means_and_DBSCAN.ipynb
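To illustrate the density-based idea without a PyCaret dependency, the sketch below calls scikit-learn's `DBSCAN` directly. This is a swapped-in library, not the notebook's PyCaret workflow, and the `eps` and `min_samples` values and the data are assumptions chosen for this toy example.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

# Synthetic data (an assumption for this sketch): two dense blobs plus two far-away points.
rng = np.random.default_rng(0)
blob_a = rng.normal(0, 0.2, (60, 2))
blob_b = rng.normal(3, 0.2, (60, 2))
outliers = np.array([[10.0, 10.0], [-8.0, 7.0]])
X = np.vstack([blob_a, blob_b, outliers])

# A point is a core point if at least min_samples points (itself included)
# lie within distance eps; clusters grow outward from core points.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# DBSCAN marks points that belong to no cluster with the label -1.
noise = labels == -1
n_clusters = len(set(labels[~noise].tolist()))

# Silhouette is computed on clustered points only, since noise is not a cluster.
score = silhouette_score(X[~noise], labels[~noise])
print(n_clusters, int(noise.sum()), round(score, 3))
```

Excluding noise points before scoring is a design choice: treating `-1` as a regular cluster would penalize the score for points the algorithm deliberately left unassigned.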
---
## 5. Anomaly Detection Using PyOD
This notebook applies anomaly detection with the PyOD library to both univariate and multivariate datasets. Anomalies in time series data are detected and visualized to aid interpretation.

Colab link: https://github.com/subhashpolisetti/Clustering-Techniques-and-Embeddings/blob/main/5_Clustering_Multimodal_Inference_with_ImageBind.ipynb
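The general pattern (fit a detector, score every time step, flag the lowest-scoring ones) can be sketched as follows. To keep the example light, it uses scikit-learn's `IsolationForest` as a stand-in rather than a PyOD detector; the two-channel series, the injected spikes, and the `contamination` value are all assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic multivariate series (an assumption): two noisy periodic channels.
rng = np.random.default_rng(0)
t = np.linspace(0, 6 * np.pi, 300)
X = np.column_stack([np.sin(t), np.cos(t)]) + rng.normal(0, 0.1, (300, 2))

# Inject three spikes so there is something to detect.
spike_idx = np.array([50, 150, 250])
X[50, 0] += 6.0
X[150, 1] -= 6.0
X[250] += 6.0

# contamination is the assumed fraction of anomalies; it sets the flagging threshold.
detector = IsolationForest(contamination=0.02, random_state=0).fit(X)

scores = detector.score_samples(X)               # lower = more anomalous
flagged = np.where(detector.predict(X) == -1)[0]  # -1 marks predicted anomalies
print(flagged)
```

Each time step is treated as an independent point here. Detecting anomalies that depend on temporal context would need windowed or lagged features, which this sketch does not include.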
---
## 6. Clustering Time Series Data Using Pretrained Models
This notebook focuses on clustering time series data using pretrained models, with applications such as stock market analysis. It compares methods like TS-PTMs and LLM embeddings for clustering temporal patterns.

Colab link: https://github.com/subhashpolisetti/Clustering-Techniques-and-Embeddings/blob/main/6_Clustering_Time_Series_Using_Deep_Features_and_tslearn.ipynb
---
## 7. Document Clustering with LLM Embeddings
This notebook demonstrates clustering documents using advanced embeddings, such as those from Sentence Transformers. It employs semantic similarity for clustering and evaluates the results using silhouette scores.

Colab link: https://github.com/subhashpolisetti/Clustering-Techniques-and-Embeddings/blob/main/7_Document_Clustering_with_Sentence_Transformers_and_KMeans.ipynb
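The pipeline shape (embed documents, cluster the vectors, score with silhouette) can be sketched without downloading a model. In the example below, TF-IDF vectors stand in for Sentence Transformers embeddings, which is an assumption made only so the snippet runs offline; TF-IDF captures word overlap, not semantic similarity. The sample documents are also made up.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# Made-up documents on two topics (an assumption for this sketch).
docs = [
    "the team won the football match",
    "the football team lost the match",
    "fans cheered at the football match",
    "the stock market fell sharply today",
    "investors sold stock as the market fell",
    "the market rallied and stock prices rose",
]

# Stand-in embedding step: one L2-normalized vector per document.
# With a sentence-embedding model, this line is the only part that would change.
embeddings = TfidfVectorizer().fit_transform(docs).toarray()

# On unit-length vectors, Euclidean K-Means approximates clustering by cosine similarity.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(embeddings)

# Silhouette with cosine distance, matching the similarity notion used for text.
score = silhouette_score(embeddings, labels, metric="cosine")
print(labels, round(score, 3))
```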
---
## 8. Image Clustering Using ImageBind LLM Embeddings
In this notebook, image clustering is performed using embeddings from Meta's ImageBind model. It also explores the use of cross-modality embeddings, clustering images alongside other modalities, with visualizations such as t-SNE.

Colab link: https://github.com/subhashpolisetti/Clustering-Techniques-and-Embeddings/blob/main/8_Audio_Feature_Extraction_and_Clustering.ipynb
---
## 9. Audio Clustering Using ImageBind LLM Embeddings
This notebook focuses on clustering audio files using embeddings from ImageBind. It applies K-Means clustering and visualizes the results with t-SNE and waveform plots.

Colab link: https://github.com/subhashpolisetti/Clustering-Techniques-and-Embeddings/blob/main/9_Anomaly_Detection_in_Time_Series_Univariate_%26_Multivariate_with_PyOD.ipynb
---
## Common Evaluation Metrics
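- **Inertia and Silhouette Scores**: Used to evaluate clustering performance. Inertia measures within-cluster compactness (lower is tighter); silhouette scores range from -1 to 1, with higher values indicating better-separated clusters.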
- **Visualization Techniques**: t-SNE, PCA, dendrograms, and heatmaps help in interpreting the clustering results.
- **Probabilistic Measures**: These are used in GMMs and anomaly detection to evaluate the likelihood of clusters and anomalies.

---
## How to Run
1. Open the desired notebook on Google Colab.
2. Follow the instructions provided in each notebook to run the cells sequentially.
3. Feel free to modify or extend the code to explore different datasets or configurations.

---
**Playlist**: [Walkthrough Videos](https://drive.google.com/drive/folders/1HEKD-kQMDNER44NV44KOJvkdVADAk9hQ?usp=sharing)