https://github.com/msikorski93/wine-data-clustering

The goal of this notebook was to introduce and apply clustering algorithms to the white wine dataset.

agglomerative-algorithm clustering k-means-clustering spectral-clustering unsupervised-machine-learning


Host: GitHub
URL: https://github.com/msikorski93/wine-data-clustering
Owner: msikorski93
Created: 2022-09-03T18:22:33.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2022-09-03T18:41:24.000Z (about 3 years ago)
Last Synced: 2025-02-26T15:17:33.766Z (7 months ago)
Topics: agglomerative-algorithm, clustering, k-means-clustering, spectral-clustering, unsupervised-machine-learning
Language: Jupyter Notebook
Homepage:
Size: 1.75 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# Wine-Data-Clustering

The goal of this notebook was to introduce and apply clustering algorithms to the white wine dataset. Clustering (or grouping) lets us identify homogeneous groups and recognize patterns in the data without any ground truth labels.

We built three clustering models for unsupervised learning:

* k-means,

* agglomerative,

* spectral.
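As an illustration of the three models listed above, here is a minimal sketch using scikit-learn. It is not the notebook's actual code: the data is a synthetic stand-in generated with `make_blobs`, and the choice of three clusters, the scaling step, and all hyperparameters are assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans, SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wine features (assumption: the real notebook
# would load the UCI white wine data here instead).
X, _ = make_blobs(n_samples=300, n_features=11, centers=3,
                  cluster_std=4.0, random_state=0)

# Put all features on the same scale before computing distances.
X_scaled = StandardScaler().fit_transform(X)

# Hyperparameters below are illustrative, not taken from the notebook.
models = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "agglomerative": AgglomerativeClustering(n_clusters=3),
    "spectral": SpectralClustering(n_clusters=3, affinity="nearest_neighbors",
                                   random_state=0),
}

# Fit each model and collect the cluster label assigned to every sample.
labels = {name: model.fit_predict(X_scaled) for name, model in models.items()}

for name, y in labels.items():
    # Number of samples per cluster, in cluster-index order.
    print(name, np.bincount(y))
```

All three estimators share the `fit_predict` interface, so the same loop can be reused when swapping in the real feature matrix.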

We also showed that dimensionality reduction is an essential tool for making sense of the data in the absence of supervision information: applying PCA improved the clustering process. The basic scores achieved by each algorithm are listed below:
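One way to check the effect of PCA is to cluster the same data with and without the projection and compare a score. The sketch below does this for k-means on synthetic data; the number of components (2), the number of clusters (3), and the helper name `kmeans_silhouette` are assumptions for illustration, not values taken from the notebook.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wine features (assumption).
X, _ = make_blobs(n_samples=300, n_features=11, centers=3,
                  cluster_std=4.0, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# Project onto the first two principal components (arbitrary choice here).
X_pca = PCA(n_components=2, random_state=0).fit_transform(X_scaled)


def kmeans_silhouette(features):
    """Hypothetical helper: cluster with k-means and return the silhouette score."""
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
    return silhouette_score(features, labels)


score_raw = kmeans_silhouette(X_scaled)
score_pca = kmeans_silhouette(X_pca)
print(f"silhouette without PCA: {score_raw:.4f}")
print(f"silhouette with PCA:    {score_pca:.4f}")
```

Note that each score is computed in its own feature space, so the comparison is indicative rather than strict.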

| Method            | Silhouette | Caliński-Harabasz | Davies-Bouldin | Cluster 0 | Cluster 1 | Cluster 2 |
|-------------------|------------|-------------------|----------------|-----------|-----------|-----------|
| **k-Means**       | 0.2116     | 1261.7120         | 1.6024         | 1075      | 1308      | 1578      |
| **Agglomerative** | 0.1812     | 1033.0347         | 1.6782         | 1886      | 1382      | 693       |
| **Spectral**      | 0.2004     | 1204.9508         | 1.6378         | 1229      | 1403      | 1329      |

Based on the evaluation metrics in the table, the k-means algorithm performed best on this dataset: it has the highest silhouette and Caliński-Harabasz scores and the lowest Davies-Bouldin score.
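The ranking can be checked directly from the reported numbers. For the silhouette and Caliński-Harabasz scores higher is better, while for Davies-Bouldin lower is better. The short sketch below applies those rules to the values copied from the table; the dictionary layout and variable names are only illustrative.

```python
# Scores copied from the table above.
scores = {
    "k-Means": {"silhouette": 0.2116, "calinski_harabasz": 1261.7120, "davies_bouldin": 1.6024},
    "Agglomerative": {"silhouette": 0.1812, "calinski_harabasz": 1033.0347, "davies_bouldin": 1.6782},
    "Spectral": {"silhouette": 0.2004, "calinski_harabasz": 1204.9508, "davies_bouldin": 1.6378},
}

# Direction of each metric: True means a higher value is better.
higher_is_better = {
    "silhouette": True,
    "calinski_harabasz": True,
    "davies_bouldin": False,
}

best = {}
for metric, higher in higher_is_better.items():
    pick = max if higher else min
    # Select the method with the best value for this metric.
    best[metric] = pick(scores, key=lambda name: scores[name][metric])

for metric, method in best.items():
    print(f"{metric}: {method}")
```

With the table's values, k-Means is selected for all three metrics.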

Reference: https://archive.ics.uci.edu/ml/datasets/wine+quality
