https://github.com/lachhebo/pyclustertend

A python package to assess cluster tendency

Host: GitHub
URL: https://github.com/lachhebo/pyclustertend
Owner: lachhebo
License: bsd-3-clause
Created: 2019-05-19T12:58:23.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2022-12-27T17:39:21.000Z (over 2 years ago)
Last Synced: 2024-10-29T23:33:37.437Z (9 months ago)
Topics: cluster-analysis, cluster-tendency, clustering, clustertendency, data-science, hopkins, ivat, machine-learning, python, scikit-learn, statistics, vat, visual-assessment-cluster-tendency
Language: Python
Homepage: https://pyclustertend.readthedocs.io/en/master/index.html
Size: 6.47 MB
Stars: 46
Watchers: 3
Forks: 8
Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE


# pyclustertend

[![Build Status](https://travis-ci.com/lachhebo/pyclustertend.svg?branch=master)](https://travis-ci.com/lachhebo/pyclustertend)  [![PyPi Status](https://img.shields.io/pypi/v/pyclustertend.svg?color=brightgreen)](https://pypi.org/project/pyclustertend/) [![Documentation Status](https://readthedocs.org/projects/pyclustertend/badge/?version=master)](https://pyclustertend.readthedocs.io/en/master/) [![Downloads](https://pepy.tech/badge/pyclustertend)](https://pepy.tech/project/pyclustertend) [![codecov](https://codecov.io/gh/lachhebo/pyclustertend/branch/master/graph/badge.svg)](https://codecov.io/gh/lachhebo/pyclustertend)

[![DOI](https://zenodo.org/badge/187477036.svg)](https://zenodo.org/badge/latestdoi/187477036)

pyclustertend is a Python package specialized in cluster tendency. Assessing cluster tendency means determining whether a dataset has a meaningful cluster structure, and therefore whether applying a clustering algorithm to it is relevant.

Three methods for assessing cluster tendency are currently implemented, plus an additional method based on metrics computed from a KMeans estimator:

- [x] Hopkins statistic

- [x] VAT

- [x] iVAT

- [x] Metric-based method (silhouette, Calinski-Harabasz, Davies-Bouldin)
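The metric-based method can be approximated with scikit-learn alone. The sketch below is not pyclustertend's own function: the helper name `kmeans_metric_scores` and the default range of `k` are assumptions. It fits a KMeans model for each candidate number of clusters and records the three indices listed above.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)


def kmeans_metric_scores(X, k_values=range(2, 9), seed=0):
    """Hypothetical helper: fit KMeans for each k and collect three validity indices."""
    scores = {}
    for k in k_values:  # every k must be >= 2 for these indices to be defined
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = {
            # Higher is better; bounded in [-1, 1].
            "silhouette": silhouette_score(X, labels),
            # Higher is better; ratio of between- to within-cluster dispersion.
            "calinski_harabasz": calinski_harabasz_score(X, labels),
            # Lower is better; 0 is the best possible value.
            "davies_bouldin": davies_bouldin_score(X, labels),
        }
    return scores
```

For example, `max(scores, key=lambda k: scores[k]["silhouette"])` picks the `k` with the best silhouette. Uniformly low silhouette values across every `k` may hint that the data lack a clear cluster structure.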

## Installation

```shell
pip install pyclustertend
```

## Usage

### Example Hopkins

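```python
>>> from sklearn import datasets
>>> from pyclustertend import hopkins
>>> from sklearn.preprocessing import scale
>>> X = scale(datasets.load_iris().data)
>>> hopkins(X, 150)
0.18950453452838564
```

The second argument is the sampling size (here, all 150 iris observations). Because the statistic relies on randomly drawn points, the value will differ slightly between runs. A score close to 0 indicates a strong cluster tendency, while a score around 0.5 suggests the data are close to uniformly distributed.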
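The sketch below illustrates the idea behind the Hopkins statistic using NumPy and scikit-learn's `NearestNeighbors`. It is not pyclustertend's implementation; the function name `hopkins_sketch` and its `seed` parameter are hypothetical. It compares two sums of nearest-neighbour distances: `x`, from randomly sampled real observations to their nearest *other* observation, and `y`, from points drawn uniformly in the data's bounding box to their nearest observation. It returns `x / (x + y)`, so under this convention clustered data tends toward 0 and uniformly spread data toward 0.5.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def hopkins_sketch(X, sampling_size, seed=None):
    """Hypothetical helper: Hopkins-style score, ~0 for clustered data, ~0.5 for uniform data."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if not 0 < sampling_size <= n:
        raise ValueError("sampling_size must be in (0, n]")
    rng = np.random.default_rng(seed)

    # Index the real observations for nearest-neighbour queries.
    nn = NearestNeighbors(n_neighbors=2).fit(X)

    # x: sampled real points -> nearest other real point.
    sample = X[rng.choice(n, size=sampling_size, replace=False)]
    dist_real, _ = nn.kneighbors(sample, n_neighbors=2)
    x = dist_real[:, 1].sum()  # column 0 is the point itself (distance 0)

    # y: uniform points in the bounding box -> nearest real point.
    uniform = rng.uniform(X.min(axis=0), X.max(axis=0), size=(sampling_size, d))
    dist_unif, _ = nn.kneighbors(uniform, n_neighbors=1)
    y = dist_unif[:, 0].sum()

    return x / (x + y)
```

Running `hopkins_sketch(X, 150, seed=0)` on the scaled iris data gives a reproducible value. It need not match the number printed in the example above, since the random draws differ.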

### Example VAT

```python
>>> from sklearn import datasets
>>> from pyclustertend import vat
>>> from sklearn.preprocessing import scale
>>> X = scale(datasets.load_iris().data)
>>> vat(X)
```
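VAT (Visual Assessment of cluster Tendency) reorders the pairwise dissimilarity matrix so that similar observations end up next to each other. Clusters then appear as dark square blocks along the diagonal. The sketch below reproduces only the ordering step, using a Prim-style minimum-spanning-tree traversal that starts from one endpoint of the largest pairwise distance, as in the original VAT algorithm. It uses SciPy for the distance matrix; `vat_order` is a hypothetical name, and this is not pyclustertend's code.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def vat_order(X):
    """Hypothetical helper: return the VAT ordering and the reordered distance matrix."""
    R = squareform(pdist(np.asarray(X, dtype=float)))  # n x n Euclidean distances
    n = R.shape[0]

    # Start from one endpoint of the largest pairwise distance.
    i = int(np.unravel_index(np.argmax(R), R.shape)[0])
    order = [i]
    remaining = np.ones(n, dtype=bool)
    remaining[i] = False

    # best[j] = distance from unvisited point j to its closest visited point.
    best = R[i].copy()
    for _ in range(n - 1):
        # Visit the unvisited point closest to the visited set (Prim's algorithm).
        j = int(np.argmin(np.where(remaining, best, np.inf)))
        order.append(j)
        remaining[j] = False
        best = np.minimum(best, R[j])

    order = np.array(order)
    return order, R[np.ix_(order, order)]
```

To inspect the result, `plt.imshow(ordered, cmap="gray")` from matplotlib renders small distances as dark pixels, so each cluster shows up as a dark block on the diagonal.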



### Example iVAT

```python
>>> from sklearn import datasets
>>> from pyclustertend import ivat
>>> from sklearn.preprocessing import scale
>>> X = scale(datasets.load_iris().data)
>>> ivat(X)
```
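iVAT replaces each dissimilarity with the minimax path distance (the smallest achievable "largest hop" along any path between two points) and then displays it in VAT order, which tends to sharpen the dark diagonal blocks. The sketch below computes that distance with a Floyd-Warshall-style (min, max) loop. This costs O(n³) time, which is simpler but slower than the O(n²) recursive update described in the iVAT literature. It is not pyclustertend's implementation, and the helper names `vat_order_from_matrix`, `minimax_distances`, and `ivat_image` are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def vat_order_from_matrix(R):
    """Prim-style VAT ordering of a square dissimilarity matrix R."""
    n = R.shape[0]
    i = int(np.unravel_index(np.argmax(R), R.shape)[0])  # endpoint of the largest distance
    order, remaining, best = [i], np.ones(n, dtype=bool), R[i].copy()
    remaining[i] = False
    for _ in range(n - 1):
        j = int(np.argmin(np.where(remaining, best, np.inf)))
        order.append(j)
        remaining[j] = False
        best = np.minimum(best, R[j])
    return np.array(order)


def minimax_distances(R):
    """Minimax (bottleneck) path distance between every pair of points."""
    D = R.copy()
    for k in range(D.shape[0]):
        # A path routed through k costs the larger of its two legs; keep it if cheaper.
        D = np.minimum(D, np.maximum(D[:, [k]], D[[k], :]))
    return D


def ivat_image(X):
    """Hypothetical helper: minimax distances shown in VAT order."""
    R = squareform(pdist(np.asarray(X, dtype=float)))
    order = vat_order_from_matrix(R)
    D = minimax_distances(R)
    return order, D[np.ix_(order, order)]
```

For three points on a line at 0, 1 and 3, the direct distance between the outer points is 3, but the path through the middle point never hops more than 2, so their minimax distance is 2.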



## Notes

It is preferable to scale the data before using the Hopkins, VAT, or iVAT algorithms, because they all rely on distances between observations. Moreover, VAT and iVAT do not scale well to large datasets, since they work on the full matrix of pairwise distances between observations. A simple workaround is to run them on a random sample of the data.
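As a rough size check, a dense float64 n × n distance matrix needs 8n² bytes, which is about 20 GB for 50,000 observations. The sketch below standardises the features and then draws a random subsample before calling `vat` or `ivat`. The helper name `prepare_for_vat` and the default cap of 500 rows are assumptions, not part of pyclustertend. Scaling is done on the full dataset so that the subsample uses the full data's feature means and standard deviations.

```python
import numpy as np
from sklearn.preprocessing import scale


def prepare_for_vat(X, max_samples=500, seed=0):
    """Hypothetical helper: standardise features, then subsample rows if there are too many."""
    X = scale(np.asarray(X, dtype=float))  # zero mean, unit variance per column
    n = X.shape[0]
    if n <= max_samples:
        return X
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=max_samples, replace=False)  # sample without replacement
    return X[idx]
```

Usage: `vat(prepare_for_vat(X))`.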
