Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/xuyxu/clustering
Clustering / Subspace Clustering Algorithms on MATLAB
- Host: GitHub
- URL: https://github.com/xuyxu/clustering
- Owner: xuyxu
- Created: 2017-01-07T08:13:44.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2020-10-28T11:21:08.000Z (about 4 years ago)
- Last Synced: 2024-11-07T19:17:49.128Z (7 days ago)
- Topics: clustering, clustering-algorithm, subspace-clustering, subspace-kmeans
- Language: MATLAB
- Homepage:
- Size: 36.1 KB
- Stars: 222
- Watchers: 12
- Forks: 88
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Clustering/Subspace Clustering Algorithms on MATLAB
**This repo is no longer in active development. However, reports of problems with the implementations of the existing algorithms are still welcome. [Oct, 2020]**
## 1. Clustering Algorithms
- **K-means**
- **K-means++**
- Generally speaking, this algorithm is similar to **K-means**;
- Unlike classic K-means randomly choosing initial centroids, a better initialization procedure is integrated into **K-means++**, where observations far from existing centroids have higher probabilities of being chosen as the next centroid.
  - This initialization procedure can be implemented with fitness proportionate (roulette-wheel) selection, as in the sketch below.
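
A minimal MATLAB sketch of this seeding step. The function name and the use of squared Euclidean distance as the fitness are illustrative assumptions, not necessarily how this repo implements it:

```matlab
function centroids = kmeanspp_init(X, k)
    % K-means++ seeding: each new centroid is drawn with probability
    % proportional to the squared distance to its nearest existing centroid
    % (fitness proportionate / roulette-wheel selection).
    [n, p] = size(X);
    centroids = zeros(k, p);
    centroids(1, :) = X(randi(n), :);   % first centroid: uniform at random
    for j = 2:k
        % Squared distance from every point to its closest chosen centroid.
        d2 = inf(n, 1);
        for c = 1:j-1
            d2 = min(d2, sum((X - centroids(c, :)).^2, 2));
        end
        % Roulette-wheel selection: far-away points are more likely picked.
        prob = d2 / sum(d2);
        idx = find(rand <= cumsum(prob), 1, 'first');
        centroids(j, :) = X(idx, :);
    end
end
```
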
- **ISODATA (Iterative Self-Organizing Data Analysis)**
- To be brief, **ISODATA** introduces two additional operations: Splitting and Merging;
  - When the number of observations within a class falls below a pre-defined threshold, **ISODATA** merges the two classes with the minimum between-class distance;
  - When the within-class variance of a class exceeds a pre-defined threshold, **ISODATA** splits that class into two sub-classes.
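
A simplified sketch of one merge/split pass driven by the two thresholds described above. The names `minSize` and `maxVar`, and the split-at-the-median rule, are assumptions of this sketch rather than the repo's exact logic:

```matlab
function labels = isodata_adjust(X, labels, minSize, maxVar)
    % One ISODATA-style pass: merge when a class is too small,
    % otherwise split a class whose within-class variance is too large.
    ids = unique(labels);
    k = numel(ids);
    centroids = zeros(k, size(X, 2));
    counts = zeros(k, 1);
    for i = 1:k
        members = (labels == ids(i));
        counts(i) = nnz(members);
        centroids(i, :) = mean(X(members, :), 1);
    end
    if any(counts < minSize) && k > 1
        % Merge: join the two classes with minimum between-class distance.
        best = inf; a = 1; b = 1;
        for s = 1:k
            for t = s+1:k
                d = norm(centroids(s, :) - centroids(t, :));
                if d < best, best = d; a = s; b = t; end
            end
        end
        labels(labels == ids(b)) = ids(a);
    else
        % Split: pick the class with the largest per-dimension variance and,
        % if it exceeds maxVar, split it along that dimension at the median.
        v = zeros(k, 1);
        for i = 1:k
            v(i) = max(var(X(labels == ids(i), :), 1, 1));
        end
        [vmax, i] = max(v);
        if vmax > maxVar
            members = find(labels == ids(i));
            [~, d] = max(var(X(members, :), 1, 1));
            upper = X(members, d) > median(X(members, d));
            labels(members(upper)) = max(labels) + 1;
        end
    end
end
```
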
- **Mean Shift**
  - For each point *x*: find its neighbors, compute their mean vector *m*, set *x = m*, and repeat until *x == m* (convergence);
  - Non-parametric: no need to specify the number of classes;
  - No prior assumption about cluster structure.
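
A minimal sketch of the update for a single point, assuming a flat (uniform) kernel of radius `r`; the kernel choice, tolerance, and function name are illustrative rather than taken from this repo:

```matlab
function x = mean_shift_point(X, x, r)
    % Shift a single point x (1-by-p row vector) towards the mean of its
    % neighbors in X until it stops moving.
    while true
        nbrs = X(sum((X - x).^2, 2) <= r^2, :);   % neighbors within radius r
        m = mean(nbrs, 1);                        % mean vector of the neighbors
        if norm(m - x) < 1e-6                     % converged: x == m (up to tolerance)
            break;
        end
        x = m;                                    % update x = m and repeat
    end
end
```
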
- **DBSCAN (Density-Based Spatial Clustering of Applications with Noise)**
- Starting with pre-selected core objects, DBSCAN extends each cluster based on the connectivity between data points;
- DBSCAN takes noisy data into consideration, hence robust to outliers;
  - Choosing good parameters can be hard without prior knowledge.
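
A compact sketch of the procedure, where `epsRad` (neighborhood radius) and `minPts` are the two parameters that are hard to choose without prior knowledge, and label 0 marks noise. This is an illustrative baseline, not the repo's implementation:

```matlab
function labels = dbscan_sketch(X, epsRad, minPts)
    n = size(X, 1);
    labels = zeros(n, 1);      % 0 = noise / unassigned
    visited = false(n, 1);
    c = 0;
    for i = 1:n
        if visited(i), continue; end
        visited(i) = true;
        nbrs = find(sum((X - X(i, :)).^2, 2) <= epsRad^2);
        if numel(nbrs) < minPts
            continue;                        % not a core point; stays noise for now
        end
        c = c + 1;                           % start a new cluster from this core point
        labels(i) = c;
        queue = nbrs(:)';
        while ~isempty(queue)                % grow cluster c by density-connectivity
            j = queue(1);  queue(1) = [];
            if ~visited(j)
                visited(j) = true;
                nbrsJ = find(sum((X - X(j, :)).^2, 2) <= epsRad^2);
                if numel(nbrsJ) >= minPts    % j is also a core point: expand further
                    queue = [queue, nbrsJ(:)'];
                end
            end
            if labels(j) == 0
                labels(j) = c;               % core or border point joins cluster c
            end
        end
    end
end
```
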
- **Gaussian Mixture Model (GMM)**
- **LVQ (Learning Vector Quantization)**

## 2. Subspace Clustering Algorithms
- **Subspace K-means**
  - This algorithm directly extends **K-means** to subspace clustering by multiplying each dimension *dj* by a weight *mj* (s.t. sum(*mj*) = 1, *j* = 1, 2, ..., *p*);
  - It can be solved efficiently in an Expectation-Maximization (EM) fashion: in each iteration, the weights and centroids are updated in closed form using a Lagrange multiplier;
  - This basic algorithm tends to concentrate the weight on just a few dimensions when clustering sparse data (see the sketch below).
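
A hedged sketch of one common closed-form weight update of this kind. The weight exponent `beta > 1` (as in W-k-means) is an assumption of this sketch and may differ from the repo's exact formulation:

```matlab
function w = update_weights(X, centroids, labels, beta)
    % Closed-form dimension-weight update obtained from the Lagrangian:
    % dimensions with small within-cluster dispersion receive larger weights.
    % X: n-by-p data, centroids: k-by-p, labels: n-by-1 (values 1..k).
    p = size(X, 2);
    D = zeros(1, p);                      % per-dimension within-cluster dispersion
    for j = 1:p
        D(j) = sum((X(:, j) - centroids(labels, j)).^2);
    end
    w = (1 ./ D) .^ (1 / (beta - 1));     % closed form from the Lagrange multiplier
    w = w / sum(w);                       % enforce sum(w) = 1
end
```
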
- **Entropy-Weighting Subspace K-means**
- Generally speaking, this algorithm is similar to **Subspace K-means**;
  - In addition, it introduces a regularization term on the weight entropy into the objective function, in order to mitigate the aforementioned problem in **Subspace K-means**;
- Apart from its succinctness and efficiency, it works well on a broad range of real-world datasets.
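
A sketch of the corresponding entropy-weighted update, assuming per-cluster weight vectors and a softmax-style closed form with regularization coefficient `gamma` (both are assumptions of this sketch, in the spirit of EWKM-style algorithms):

```matlab
function W = update_entropy_weights(X, centroids, labels, gamma)
    % Entropy-regularized weight update: the weight entropy term turns the
    % per-cluster weight update into a softmax over per-dimension dispersions,
    % which spreads the weight over more dimensions than plain Subspace K-means.
    [k, p] = size(centroids);
    W = zeros(k, p);
    for l = 1:k
        members = X(labels == l, :);
        D = sum((members - centroids(l, :)).^2, 1);   % 1-by-p dispersion of cluster l
        e = exp(-D / gamma);
        W(l, :) = e / sum(e);                         % weights sum to 1 within the cluster
    end
end
```
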