https://github.com/erdogant/clusteval
Clusteval provides methods for unsupervised cluster validation
- Host: GitHub
- URL: https://github.com/erdogant/clusteval
- Owner: erdogant
- License: other
- Created: 2020-01-09T22:12:06.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2025-03-02T22:21:23.000Z (10 months ago)
- Last Synced: 2025-03-29T11:09:54.630Z (9 months ago)
- Topics: clustering, dbindex, density-based-clustering, machine-learning, python, silhouette-method, unsupervised-clustering, validation
- Language: Jupyter Notebook
- Homepage: https://erdogant.github.io/clusteval
- Size: 24.9 MB
- Stars: 58
- Watchers: 2
- Forks: 8
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Citation: CITATION.cff
# clusteval
``clusteval`` is a Python package for evaluating detected clusters and returning the cluster labels with the best **clustering tendency**, **number of clusters**, and **clustering quality**. Three evaluation strategies are implemented: **silhouette**, **dbindex**, and **derivative**. Four clustering methods can be used: **agglomerative**, **kmeans**, **dbscan**, and **hdbscan**.
#
**⭐️ Star this repo if you like it ⭐️**
#
### Blogs
#### [1. A step-by-step guide for clustering images](https://towardsdatascience.com/a-step-by-step-guide-for-clustering-images-4b45f9906128)
#### [2. Detection of Duplicate Images Using Image Hash Functions](https://towardsdatascience.com/detection-of-duplicate-images-using-image-hash-functions-4d9c53f04a75)
#### [3. From Data to Clusters: When is Your Clustering Good Enough?](https://towardsdatascience.com/from-data-to-clusters-when-is-your-clustering-good-enough-5895440a978a)
#### [4. From Clusters To Insights; The Next Step](https://towardsdatascience.com/from-clusters-to-insights-the-next-step-1c166814e0c6)
#
### [Documentation pages](https://erdogant.github.io/clusteval/)
The [documentation pages](https://erdogant.github.io/clusteval/) describe in detail how ``clusteval`` works, with many examples.
#
### Installation
##### It is advisable to create a new environment (e.g. with Conda).
```bash
conda create -n env_clusteval python=3.8
conda activate env_clusteval
```
##### Install from PyPI
```bash
pip install clusteval
```
##### Import library
```python
from clusteval import clusteval
```
### Examples
A structured overview of all examples is available on the [documentation pages](https://erdogant.github.io/clusteval/).
* [Example: Cluster validation using Silhouette score](https://erdogant.github.io/clusteval/pages/html/Examples.html#cluster-evaluation)
* [Example: Determine the optimal number of clusters](https://erdogant.github.io/clusteval/pages/html/Plots.html#plot)
* [Example: Plot the dendrogram](https://erdogant.github.io/clusteval/pages/html/Plots.html#dendrogram)
* [Example: Cluster validation using Davies-Bouldin index](https://erdogant.github.io/clusteval/pages/html/Examples.html#dbindex-method)
* [Example: Cluster validation using derivative evaluation method](https://erdogant.github.io/clusteval/pages/html/Examples.html#derivative-method)
* [Example: Cluster validation using dbscan](https://erdogant.github.io/clusteval/pages/html/Examples.html#dbscan)
* [Example: Cluster validation using hdbscan](https://erdogant.github.io/clusteval/pages/html/Examples.html#hdbscan)
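
To give a feel for what the silhouette and dbindex evaluations in the examples above are doing, here is a minimal sketch that sweeps the number of clusters and scores each result. It uses scikit-learn directly, not ``clusteval``, so it is not the package's own implementation; the synthetic data, the helper name `sweep_k`, and the range of `k` are assumptions chosen for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score

# Illustrative data: four tight, well-separated blobs (an assumption, not real data).
centers = [[0, 0], [10, 0], [0, 10], [10, 10]]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.5, random_state=0)


def sweep_k(X, k_values):
    """Fit k-means for every k and return {k: (silhouette, davies_bouldin)}."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = (silhouette_score(X, labels), davies_bouldin_score(X, labels))
    return scores


scores = sweep_k(X, range(2, 9))

# Silhouette: higher is better. Davies-Bouldin index: lower is better.
best_by_silhouette = max(scores, key=lambda k: scores[k][0])
best_by_dbindex = min(scores, key=lambda k: scores[k][1])

for k, (sil, dbi) in scores.items():
    print(f"k={k}  silhouette={sil:.3f}  dbindex={dbi:.3f}")
print("best k (silhouette):", best_by_silhouette)
print("best k (dbindex):", best_by_dbindex)
```

On data this cleanly separated both scores should agree; on real data they can disagree, which is one reason to compare several evaluation strategies.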
## Citation
Please cite ``clusteval`` in your publications if it has been useful for your research (see the citation link at the top right of the repository page).
## Other interesting techniques/blogs
* Use ARI (Adjusted Rand Index) when the ground-truth clustering has large, equal-sized clusters
* Use AMI (Adjusted Mutual Information) when the ground-truth clustering is unbalanced and contains small clusters
* https://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_rand_score.html
* https://scikit-learn.org/stable/auto_examples/cluster/plot_adjusted_for_chance_measures.html#sphx-glr-auto-examples-cluster-plot-adjusted-for-chance-measures-py
* https://github.com/idealo/imagededup
* https://towardsdatascience.com/how-to-cluster-images-based-on-visual-similarity-cd6e7209fe34
* https://github.com/facebookresearch/deepcluster
* https://towardsdatascience.com/pca-on-hyperspectral-data-99c9c5178385
* https://machinelearningmastery.com/face-recognition-using-principal-component-analysis/
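
The ARI and AMI scores mentioned at the top of this list are both available in scikit-learn. The sketch below only shows how to compute them and how they behave on two toy cases; the label vectors are made up for illustration, and it does not by itself demonstrate when one score is preferable to the other.

```python
from sklearn.metrics import adjusted_mutual_info_score, adjusted_rand_score

# Toy ground truth with two equal-sized clusters (hypothetical labels).
truth = [0, 0, 0, 0, 1, 1, 1, 1]

# Same grouping as the ground truth, only the label ids are swapped.
same_partition = [1, 1, 1, 1, 0, 0, 0, 0]

# Degenerate result: every sample placed in one cluster.
one_cluster = [0] * 8

# Both scores ignore the label ids and compare only the grouping.
ari_same = adjusted_rand_score(truth, same_partition)
ami_same = adjusted_mutual_info_score(truth, same_partition)

# Both scores are adjusted for chance, so an uninformative result scores about 0.
ari_single = adjusted_rand_score(truth, one_cluster)
ami_single = adjusted_mutual_info_score(truth, one_cluster)

print(f"identical grouping: ARI={ari_same:.2f}  AMI={ami_same:.2f}")
print(f"single cluster:     ARI={ari_single:.2f}  AMI={ami_single:.2f}")
```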
### Maintainer
* Erdogan Taskesen, github: [erdogant](https://github.com/erdogant)
* Contributions are welcome.
* If you would like to buy me a coffee for this work, it is much appreciated :)