https://github.com/alexpof/poincarekmeans

K-Means algorithm in the Poincare Disk Model

Host: GitHub
URL: https://github.com/alexpof/poincarekmeans
Owner: AlexPof
Created: 2018-11-12T19:12:52.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2018-11-12T19:36:27.000Z (over 7 years ago)
Last Synced: 2025-09-09T03:51:23.575Z (11 months ago)
Topics: clustering, hyperbolic-geometry, k-means, kmeans, poincare-embeddings
Language: Python
Size: 129 KB
Stars: 14
Watchers: 1
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# PoincareKMeans: K-Means algorithm in the Poincare Disk Model

This is a simple K-Means algorithm for clustering points in the Poincare Disk Model, a model of hyperbolic space.
This package was developed to exploit the results of Nickel & Kiela's *Poincare Embeddings* (as described in the paper
[Poincaré Embeddings for Learning Hierarchical Representations](https://papers.nips.cc/paper/7213-poincare-embeddings-for-learning-hierarchical-representations)).
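
For readers unfamiliar with the Poincare Disk Model, the sketch below computes the standard hyperbolic distance between two points inside the unit disk. It is illustrative only: the function name `poincare_distance` is hypothetical and is not taken from this package, whose internals may differ.

```python
import math

def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit disk.

    Hypothetical helper for illustration; not part of the PoincareKMeans API.
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 |u - v|^2 / ((1 - |u|^2) (1 - |v|^2)))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return math.acosh(arg)

# Two points near the boundary are much farther apart than they look:
# their Euclidean distance is 0.05, their hyperbolic distance is larger.
d_near_edge = poincare_distance((0.90, 0.0), (0.95, 0.0))
```

Because distances blow up near the boundary of the disk, Euclidean K-Means applied directly to the embedding coordinates would group points differently from a K-Means that uses this metric.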

This code has not been optimized. Contributions, for example optimizations for large sample sizes, are welcome.

## Usage

The API follows closely that of [scikit-learn K-Means](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html).
A model is obtained by importing and instantiating `PoincareKMeans`:

    >>> model = PoincareKMeans()

The options are as follows.

* *n_clusters (default 8)*: number of clusters to be determined
* *n_init (default 20)*: number of times the k-means algorithm will be run with different centroid seeds.
* *max_iter (default 300)*: maximum number of iterations of the k-means algorithm for a single run.
* *tol (default 1e-8)*: tolerance criterion used to declare convergence for each run.
* *verbose (default True)*: verbosity mode. If True, will display the best inertia obtained for each run.
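
To make the roles of *n_init*, *max_iter* and *tol* concrete, here is a minimal pure-Python sketch of a restarted K-Means loop in the Poincare disk. It is not this package's implementation. In particular, the centroid update (an Einstein midpoint computed through the Klein model), the convergence test on centroid movement, and every function name below are assumptions made for illustration.

```python
import math
import random

def poincare_distance(u, v):
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))

def einstein_midpoint(points):
    """Assumed centroid rule: average in the Klein model with Lorentz weights."""
    weighted = [0.0] * len(points[0])
    total = 0.0
    for p in points:
        sq = sum(x * x for x in p)
        k = [2.0 * x / (1.0 + sq) for x in p]            # Poincare -> Klein
        gamma = 1.0 / math.sqrt(1.0 - sum(x * x for x in k))
        weighted = [w + gamma * x for w, x in zip(weighted, k)]
        total += gamma
    k_mean = [w / total for w in weighted]
    scale = 1.0 + math.sqrt(1.0 - sum(x * x for x in k_mean))
    return tuple(x / scale for x in k_mean)              # Klein -> Poincare

def assign(X, centroids):
    """Index of the nearest centroid (hyperbolic distance) for each sample."""
    return [min(range(len(centroids)),
                key=lambda j: poincare_distance(x, centroids[j])) for x in X]

def poincare_kmeans_sketch(X, n_clusters=8, n_init=20, max_iter=300,
                           tol=1e-8, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):                  # n_init: independent restarts
        centroids = rng.sample(X, n_clusters)
        for _ in range(max_iter):            # max_iter: cap on one run
            labels = assign(X, centroids)
            updated = []
            for j in range(n_clusters):
                members = [x for x, l in zip(X, labels) if l == j]
                # Keep the old centroid if a cluster ends up empty.
                updated.append(einstein_midpoint(members) if members
                               else centroids[j])
            shift = max(poincare_distance(c, n)
                        for c, n in zip(centroids, updated))
            centroids = updated
            if shift < tol:                  # tol: assumed to bound movement
                break
        labels = assign(X, centroids)
        inertia = sum(poincare_distance(x, centroids[l]) ** 2
                      for x, l in zip(X, labels))
        if best is None or inertia < best[0]:
            best = (inertia, centroids, labels)
    return best

X = [(0.80, 0.00), (0.82, 0.02), (0.78, -0.02),
     (-0.80, 0.00), (-0.82, 0.02), (-0.78, -0.02)]
inertia, centroids, labels = poincare_kmeans_sketch(X, n_clusters=2, n_init=5)
```

The run with the lowest inertia is kept, which matches the description of *verbose* reporting the best inertia per run; how the package itself measures convergence against *tol* is not stated here.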

The model is trained on the dataset using *fit*:

    >>> model.fit(X)

Additional methods are provided:

* *fit_predict*: compute the centroids and predict the cluster index of each sample.
* *fit_transform*: compute the clustering and transform X to cluster-distance space.
* *predict*: predict the cluster index of each given sample.
* *transform*: compute the distance from each given sample to every cluster centroid.
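
The relationship between *transform* and *predict* can be sketched as follows, assuming the centroids are already known. The free functions `transform` and `predict` below are hypothetical stand-ins that only mirror the method names; they are not the package's code, and the real methods are called on a fitted model.

```python
import math

def poincare_distance(u, v):
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))

def transform(X, centroids):
    """Cluster-distance space: one row per sample, one column per centroid."""
    return [[poincare_distance(x, c) for c in centroids] for x in X]

def predict(X, centroids):
    """Cluster index of each sample: the column with the smallest distance."""
    return [row.index(min(row)) for row in transform(X, centroids)]

# Hypothetical centroids and samples, chosen only for illustration.
centroids = [(0.0, 0.0), (0.5, 0.0)]
samples = [(0.1, 0.0), (0.6, 0.0)]
distances = transform(samples, centroids)
indices = predict(samples, centroids)
```

Under this reading, *fit_predict* and *fit_transform* are the same operations applied to the training data right after fitting.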

## Example

An example, using some coordinates obtained by Nickel & Kiela's embedding algorithm, is provided.
The output should be similar to the following figure.

![poincare_clustering](poincare_clustering.png)
