https://github.com/dayyass/graph-based-clustering
Graph-Based Clustering using connected components and spanning trees.
clustering data-science graph graph-algorithms hacktoberfest machine-learning python sklearn
- Host: GitHub
- URL: https://github.com/dayyass/graph-based-clustering
- Owner: dayyass
- License: mit
- Created: 2021-09-16T17:13:29.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2021-11-01T19:27:42.000Z (over 4 years ago)
- Last Synced: 2025-12-26T02:50:59.509Z (6 months ago)
- Topics: clustering, data-science, graph, graph-algorithms, hacktoberfest, machine-learning, python, sklearn
- Language: Jupyter Notebook
- Homepage: https://pypi.org/project/graph-based-clustering/
- Size: 393 KB
- Stars: 28
- Watchers: 1
- Forks: 2
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
### Graph-Based Clustering
Graph-Based Clustering using connected components and minimum spanning trees.
Both clustering methods supported by this library are **transductive**, meaning they are not designed to be applied to new, unseen data.
### Installation
To install **graph-based-clustering** run:
```
pip install graph-based-clustering
```
### Usage
The library has an sklearn-like `fit`/`fit_predict` interface.
#### ConnectedComponentsClustering
This method computes the pairwise distance matrix of the input data, binarizes it using a user-provided *threshold* to build an undirected graph, and then finds the connected components of that graph, which become the clusters.
Required arguments:
- **threshold** - parameter used to binarize the pairwise distance matrix and build the undirected graph
Optional arguments:
- **metric** - sklearn.metrics.[pairwise_distances](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html) parameter (default: *"euclidean"*)
- **n_jobs** - sklearn.metrics.[pairwise_distances](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html) parameter (default: *None*)
Example:
```python3
import numpy as np
from graph_based_clustering import ConnectedComponentsClustering

X = np.array([[0, 1], [1, 0], [1, 1]])

clustering = ConnectedComponentsClustering(
    threshold=0.275,
    metric="euclidean",
    n_jobs=-1,
)

clustering.fit(X)
labels_pred = clustering.labels_

# alternative
labels_pred = clustering.fit_predict(X)
```
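To make the thresholding idea concrete, here is a minimal, dependency-free sketch of the technique described above. It is not the library's implementation: the function name `threshold_components` is hypothetical, Euclidean distance is hard-coded, and the sketch assumes that two points are connected when their distance is less than or equal to the threshold. Note that `math.dist` requires Python 3.8 or newer.

```python3
from itertools import combinations
from math import dist


def threshold_components(points, threshold):
    """Label points by connected components of the thresholded distance graph."""
    parent = list(range(len(points)))  # union-find forest, one root per component

    def find(i):
        # Follow parent links to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Assumption: an edge joins two points when their distance is <= threshold.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)

    # Renumber roots as consecutive cluster labels in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]


points = [(0, 0), (0, 1), (5, 5), (5, 6)]
labels = threshold_components(points, threshold=1.5)
print(labels)  # [0, 0, 1, 1]
```

With a threshold smaller than every pairwise distance, each point ends up in its own cluster, which is why the choice of *threshold* determines the number of clusters for this method.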
#### SpanTreeConnectedComponentsClustering
This method computes the pairwise distance matrix of the input data, builds a graph from it, finds the minimum spanning tree, and finally splits the tree into *n_clusters* (a user-provided parameter) connected components by removing the *n_clusters - 1* edges with the highest weights.
Required arguments:
- **n_clusters** - the number of clusters to find
Optional arguments:
- **metric** - sklearn.metrics.[pairwise_distances](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html) parameter (default: *"euclidean"*)
- **n_jobs** - sklearn.metrics.[pairwise_distances](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html) parameter (default: *None*)
Example:
```python3
import numpy as np
from graph_based_clustering import SpanTreeConnectedComponentsClustering

X = np.array([[0, 1], [1, 0], [1, 1]])

clustering = SpanTreeConnectedComponentsClustering(
    n_clusters=3,
    metric="euclidean",
    n_jobs=-1,
)

clustering.fit(X)
labels_pred = clustering.labels_

# alternative
labels_pred = clustering.fit_predict(X)
```
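The spanning-tree procedure described above can be sketched without any dependencies as follows. This is an illustration under stated assumptions, not the library's code: the names `DisjointSet` and `span_tree_clusters` are hypothetical, Euclidean distance is hard-coded, and the minimum spanning tree is built with Kruskal's algorithm, which the library may or may not use. `math.dist` requires Python 3.8 or newer.

```python3
from itertools import combinations
from math import dist


class DisjointSet:
    """Minimal union-find structure used to track connected components."""

    def __init__(self, size):
        self.parent = list(range(size))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def union(self, i, j):
        """Merge the sets of i and j; return False if they were already merged."""
        root_i, root_j = self.find(i), self.find(j)
        if root_i == root_j:
            return False
        self.parent[root_i] = root_j
        return True


def span_tree_clusters(points, n_clusters):
    """Cluster points by cutting the heaviest edges of a minimum spanning tree."""
    n = len(points)
    # Complete graph: every pair of points is an edge weighted by distance.
    edges = sorted(
        (dist(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)
    )

    # Kruskal's algorithm: keep each edge that joins two separate components.
    forest = DisjointSet(n)
    mst = [edge for edge in edges if forest.union(edge[1], edge[2])]

    # `mst` is sorted by weight, so dropping the tail removes the heaviest edges.
    kept = mst[: len(mst) - (n_clusters - 1)]

    # The remaining edges form n_clusters connected components.
    clusters = DisjointSet(n)
    for _, i, j in kept:
        clusters.union(i, j)
    roots = {}
    return [roots.setdefault(clusters.find(i), len(roots)) for i in range(n)]


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
labels = span_tree_clusters(points, n_clusters=3)
print(labels)  # [0, 0, 1, 1, 2]
```

Because a spanning tree on *n* points has *n - 1* edges and removing any edge splits one component in two, removing *n_clusters - 1* edges always yields exactly *n_clusters* clusters.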
### Comparing on sklearn toy datasets
#### ConnectedComponentsClustering

#### SpanTreeConnectedComponentsClustering

### Requirements
Python >= 3.7
### Citation
If you use **graph-based-clustering** in a scientific publication, we would appreciate a reference to the following BibTeX entry:
```bibtex
@misc{dayyass2021graphbasedclustering,
author = {El-Ayyass, Dani},
title = {Graph-Based Clustering using connected components and spanning trees},
howpublished = {\url{https://github.com/dayyass/graph-based-clustering}},
year = {2021}
}
```