https://github.com/finite-sample/lookahead-kmeans
Look Ahead Initialization of K-Means
- Host: GitHub
- URL: https://github.com/finite-sample/lookahead-kmeans
- Owner: finite-sample
- Created: 2025-06-18T23:24:14.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-06-18T23:35:49.000Z (12 months ago)
- Last Synced: 2025-06-19T19:04:13.830Z (12 months ago)
- Language: Jupyter Notebook
- Size: 8.79 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Lookahead K-Means: Smarter Cluster Initialization
This repo implements a **lookahead-based initialization** strategy for K-Means and compares it against the standard `k-means++` initialization. The lookahead approach generates multiple candidate initializations and, for each one, runs only a few K-Means iterations rather than the full algorithm. It then keeps the initialization whose short rollout yields the highest silhouette score.
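For reference, the selection criterion under the standard silhouette definition (the one scikit-learn's `silhouette_score` implements; assumed here to match the notebook) is the mean of $s(i)$ over all $n$ points, where $C_I$ is the cluster containing point $i$ and $d$ is the distance metric:

```math
\begin{aligned}
a(i) &= \frac{1}{|C_I| - 1} \sum_{j \in C_I,\, j \neq i} d(i, j), \qquad
b(i) = \min_{J \neq I} \frac{1}{|C_J|} \sum_{j \in C_J} d(i, j), \\
s(i) &= \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad
S = \frac{1}{n} \sum_{i=1}^{n} s(i).
\end{aligned}
```

$S$ lies in $[-1, 1]$. Higher values mean points sit closer to their own cluster than to the nearest other cluster. By convention $s(i) = 0$ for a point in a singleton cluster.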
## What's Inside
* Evaluates both `k-means++` and **lookahead init**
* Tracks **silhouette scores** over iterations
* Measures **runtime** and **peak memory**
* Tested on real (Iris, Wine) and synthetic (Overlapping, Noisy) datasets
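The measurements listed above could be collected roughly as follows. This is a minimal sketch assuming scikit-learn's `KMeans` and `silhouette_score` plus Python's `tracemalloc` for peak memory. `profile_kmeans` and `silhouette_trace` are hypothetical helpers, not the notebook's code, and `tracemalloc` only sees allocations reported to it (Python objects and NumPy buffers), so absolute memory figures may differ from the table below.

```python
import time
import tracemalloc

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score


def profile_kmeans(X, k, init="k-means++", seed=0):
    """Hypothetical helper: fit K-Means once, report silhouette, wall time (s), peak traced memory (MB)."""
    tracemalloc.start()
    start = time.perf_counter()
    model = KMeans(n_clusters=k, init=init, n_init=1, random_state=seed).fit(X)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    score = silhouette_score(X, model.labels_)
    return {"silhouette": score, "time_s": elapsed, "peak_mb": peak / 1e6}


def silhouette_trace(X, centers, n_iter):
    """Hypothetical helper: run plain Lloyd iterations from `centers`, recording the silhouette after each assignment."""
    centers = np.asarray(centers, dtype=float)
    scores = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center (Euclidean).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Silhouette is undefined when fewer than 2 clusters are non-empty.
        if len(np.unique(labels)) < 2:
            scores.append(float("nan"))
        else:
            scores.append(silhouette_score(X, labels))
        # Update step: move each center to its cluster mean; leave empty clusters in place.
        centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
    return scores


if __name__ == "__main__":
    X = load_iris().data
    print(profile_kmeans(X, k=3))
    print(silhouette_trace(X, X[[0, 50, 100]], n_iter=5))
```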
## Notebook
[Notebook](lookahead-kmeans.ipynb)
## Lookahead Strategy
* Randomly generate multiple candidate sets of initial centroids
* For each candidate, run only `rollout_depth` K-Means iterations
* Keep the candidate whose rollout gives the highest silhouette score
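The strategy above can be sketched as follows. This is a minimal illustration, not the notebook's implementation: it assumes scikit-learn's `KMeans` for the rollout, draws each candidate as `k` distinct data points chosen uniformly at random, and uses hypothetical names `lookahead_init` and `n_candidates` alongside the README's `rollout_depth`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score


def lookahead_init(X, k, n_candidates=10, rollout_depth=3, seed=0):
    """Hypothetical sketch: return the candidate initial centroids whose short rollout scores best."""
    rng = np.random.default_rng(seed)
    best_centers, best_score = None, -np.inf
    for _ in range(n_candidates):
        # Candidate: k distinct data points chosen uniformly at random (an assumption).
        candidate = X[rng.choice(len(X), size=k, replace=False)]
        # Rollout: at most `rollout_depth` Lloyd iterations, not a full fit.
        rollout = KMeans(n_clusters=k, init=candidate, n_init=1,
                         max_iter=rollout_depth, random_state=seed).fit(X)
        if len(np.unique(rollout.labels_)) < 2:
            continue  # silhouette is undefined for a single cluster
        score = silhouette_score(X, rollout.labels_)
        if score > best_score:
            # Keep the *initialization*, not the rolled-out centers.
            best_centers, best_score = candidate, score
    return best_centers, best_score


if __name__ == "__main__":
    X = load_iris().data
    centers, score = lookahead_init(X, k=3)
    # Full K-Means run starting from the selected initialization.
    final = KMeans(n_clusters=3, init=centers, n_init=1).fit(X)
    print(score, silhouette_score(X, final.labels_))
```

Returning the original candidate rather than the partially converged centers matches the README's description of selecting an *initialization*; the full K-Means fit then starts from it.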
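## Results
| Dataset | Silhouette (k-means++) | Silhouette (lookahead) | Time, k-means++ (s) | Time, lookahead (s) | Peak memory, k-means++ (MB) | Peak memory, lookahead (MB) |
| ------- | ---------------------- | ---------------------- | ------------------- | ------------------- | --------------------------- | --------------------------- |
| Iris    | 0.55                   | 0.55                   | 0.05                | 0.12                | 0.36                        | 0.36                        |
| Noisy   | 0.18                   | 0.23                   | 0.13                | 0.31                | 2.01                        | 2.00                        |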
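## When to Use
* Useful for **noisy** data: on the Noisy dataset the silhouette rose from 0.18 to 0.23, while Iris was unchanged at 0.55
* May help with **high-dimensional** data, though the results above do not include a high-dimensional benchmark
* Helps when **initialization quality matters**
* Can give better clustering at roughly 2.4× the runtime of `k-means++` in the results above, with similar peak memory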