Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/dayyass/graph-based-clustering

Graph-Based Clustering using connected components and spanning trees.
https://github.com/dayyass/graph-based-clustering

clustering data-science graph graph-algorithms hacktoberfest machine-learning python sklearn

Last synced: 9 days ago
JSON representation

Graph-Based Clustering using connected components and spanning trees.

Host: GitHub
URL: https://github.com/dayyass/graph-based-clustering
Owner: dayyass
License: mit
Created: 2021-09-16T17:13:29.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2021-11-01T19:27:42.000Z (about 3 years ago)
Last Synced: 2024-10-01T15:44:32.248Z (about 2 months ago)
Topics: clustering, data-science, graph, graph-algorithms, hacktoberfest, machine-learning, python, sklearn
Language: Jupyter Notebook
Homepage: https://pypi.org/project/graph-based-clustering/
Size: 393 KB
Stars: 26
Watchers: 1
Forks: 2
Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        [![tests](https://github.com/dayyass/graph-based-clustering/actions/workflows/tests.yml/badge.svg)](https://github.com/dayyass/graph-based-clustering/actions/workflows/tests.yml)

[![linter](https://github.com/dayyass/graph-based-clustering/actions/workflows/linter.yml/badge.svg)](https://github.com/dayyass/graph-based-clustering/actions/workflows/linter.yml)

[![codecov](https://codecov.io/gh/dayyass/graph-based-clustering/branch/main/graph/badge.svg?token=ZVR4C5SRON)](https://codecov.io/gh/dayyass/graph-based-clustering)

[![python 3.7](https://img.shields.io/badge/python-3.7-blue.svg)](https://github.com/dayyass/graph-based-clustering#requirements)

[![release (latest by date)](https://img.shields.io/github/v/release/dayyass/graph-based-clustering)](https://github.com/dayyass/graph-based-clustering/releases/latest)

[![license](https://img.shields.io/github/license/dayyass/graph-based-clustering?color=blue)](https://github.com/dayyass/graph-based-clustering/blob/main/LICENSE)

[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-black)](https://github.com/dayyass/graph-based-clustering/blob/main/.pre-commit-config.yaml)

[![code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

[![pypi version](https://img.shields.io/pypi/v/graph-based-clustering)](https://pypi.org/project/graph-based-clustering)

[![pypi downloads](https://img.shields.io/pypi/dm/graph-based-clustering)](https://pypi.org/project/graph-based-clustering)

### Graph-Based Clustering

Graph-Based Clustering using connected components and minimum spanning trees.

Both clustering methods, supported by this library, are **transductive** - meaning they are not designed to be applied to new, unseen data.

### Installation

To install **graph-based-clustering** run:

```

pip install graph-based-clustering

```

### Usage

The library has sklearn-like `fit/fit_predict` interface.

#### ConnectedComponentsClustering

This method computes pairwise distances matrix on the input data, and using *threshold* (parameter provided by the user) to binarize pairwise distances matrix makes an undirected graph in order to find connected components to perform the clustering.

Required arguments:

- **threshold** - paremeter to binarize pairwise distances matrix and make undirected graph

Optional arguments:

- **metric** - sklearn.metrics.[pairwise_distances](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html) parameter (default: *"euclidean"*)

- **n_jobs** - sklearn.metrics.[pairwise_distances](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html) parameter (default: *None*)

Example:

```python3

import numpy as np

from graph_based_clustering import ConnectedComponentsClustering

X = np.array([[0, 1], [1, 0], [1, 1]])

clustering = ConnectedComponentsClustering(

    threshold=0.275,

    metric="euclidean",

    n_jobs=-1,

)

clustering.fit(X)

labels_pred = clustering.labels_

# alternative

labels_pred = clustering.fit_predict(X)

```

#### SpanTreeConnectedComponentsClustering

This method computes pairwise distances matrix on the input data, builds a graph on the obtained matrix, finds minimum spanning tree, and finaly, performs the clustering through dividing the graph into *n_clusters* (parameter given by the user) by removing *n-1* edges with the highest weights.

Required arguments:

- **n_clusters** - the number of clusters to find

Optional arguments:

- **metric** - sklearn.metrics.[pairwise_distances](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html) parameter (default: *"euclidean"*)

- **n_jobs** - sklearn.metrics.[pairwise_distances](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html) parameter (default: *None*)

Example:

```python3

import numpy as np

from graph_based_clustering import SpanTreeConnectedComponentsClustering

X = np.array([[0, 1], [1, 0], [1, 1]])

clustering = SpanTreeConnectedComponentsClustering(

    n_clusters=3,

    metric="euclidean",

    n_jobs=-1,

)

clustering.fit(X)

labels_pred = clustering.labels_

# alternative

labels_pred = clustering.fit_predict(X)

```

### Comparing on sklearn toy datasets

#### ConnectedComponentsClustering

![ConnectedComponentsClustering](notebooks/images/ConnectedComponentsClustering.png "ConnectedComponentsClustering")


#### SpanTreeConnectedComponentsClustering

![SpanTreeConnectedComponentsClustering](notebooks/images/SpanTreeConnectedComponentsClustering.png "SpanTreeConnectedComponentsClustering")


### Requirements

Python >= 3.7

### Citation

If you use **graph-based-clustering** in a scientific publication, we would appreciate references to the following BibTex entry:

```bibtex

@misc{dayyass2021graphbasedclustering,

    author       = {El-Ayyass, Dani},

    title        = {Graph-Based Clustering using connected components and spanning trees},

    howpublished = {\url{https://github.com/dayyass/graph-based-clustering}},

    year         = {2021}

}

```