{"id":17465025,"url":"https://github.com/erdogant/clusteval","last_synced_at":"2025-04-05T12:09:21.706Z","repository":{"id":62562968,"uuid":"232915924","full_name":"erdogant/clusteval","owner":"erdogant","description":"Clusteval provides methods for unsupervised cluster validation","archived":false,"fork":false,"pushed_at":"2025-03-02T22:21:23.000Z","size":26136,"stargazers_count":58,"open_issues_count":2,"forks_count":8,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-29T11:09:54.630Z","etag":null,"topics":["clustering","dbindex","density-based-clustering","machine-learning","python","silhouette-method","unsupervised-clustering","validation"],"latest_commit_sha":null,"homepage":"https://erdogant.github.io/clusteval","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/erdogant.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["erdogant"],"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"lfx_crowdfunding":null,"custom":null}},"created_at":"2020-01-09T22:12:06.000Z","updated_at":"2025-03-02T22:21:27.000Z","dependencies_parsed_at":"2023-12-26T09:06:57.647Z","dependency_job_id":"f274e15b-6c69-451f-8a7a-dcd5448fe498","html_url":"https://github.com/erdogant/clusteval","commit_stats":{"total_commits":232,"total_committers":3,"mean_commits":77.33333333333333,"dds":0.2112068965517241,"last_synced_commit":"27783a12bc8870e8d06c50ec3f5f3cd68dbf27ee"},"previous_names":[],"
tags_count":23,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erdogant%2Fclusteval","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erdogant%2Fclusteval/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erdogant%2Fclusteval/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erdogant%2Fclusteval/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/erdogant","download_url":"https://codeload.github.com/erdogant/clusteval/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247332612,"owners_count":20921853,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clustering","dbindex","density-based-clustering","machine-learning","python","silhouette-method","unsupervised-clustering","validation"],"created_at":"2024-10-18T11:08:30.028Z","updated_at":"2025-04-05T12:09:21.660Z","avatar_url":"https://github.com/erdogant.png","language":"Jupyter Notebook","funding_links":["https://github.com/sponsors/erdogant","https://www.buymeacoffee.com/erdogant"],"categories":[],"sub_categories":[],"readme":"# clusteval\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://erdogant.github.io/clusteval\"\u003e\n  \u003cimg src=\"https://github.com/erdogant/clusteval/blob/master/docs/figs/logo_large_2.png\" width=\"300\" /\u003e\n  
\u003c/a\u003e\n\u003c/p\u003e\n\n[![Python](https://img.shields.io/pypi/pyversions/clusteval)](https://img.shields.io/pypi/pyversions/clusteval)\n[![PyPI Version](https://img.shields.io/pypi/v/clusteval)](https://pypi.org/project/clusteval/)\n[![License](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/erdogant/clusteval/blob/master/LICENSE)\n[![BuyMeCoffee](https://img.shields.io/badge/buymea-coffee-yellow.svg)](https://www.buymeacoffee.com/erdogant)\n[![Github Forks](https://img.shields.io/github/forks/erdogant/clusteval.svg)](https://github.com/erdogant/clusteval/network)\n[![GitHub Open Issues](https://img.shields.io/github/issues/erdogant/clusteval.svg)](https://github.com/erdogant/clusteval/issues)\n[![Project Status](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active)\n[![Downloads](https://pepy.tech/badge/clusteval/month)](https://pepy.tech/project/clusteval)\n[![Downloads](https://pepy.tech/badge/clusteval)](https://pepy.tech/project/clusteval)\n[![DOI](https://zenodo.org/badge/232915924.svg)](https://zenodo.org/badge/latestdoi/232915924)\n[![Sphinx](https://img.shields.io/badge/Sphinx-Docs-Green)](https://erdogant.github.io/clusteval/)\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://erdogant.github.io/clusteval/pages/html/Documentation.html#colab-notebook)\n\u003c!---[![Coffee](https://img.shields.io/badge/coffee-black-grey.svg)](https://erdogant.github.io/donate/?currency=USD\u0026amount=5)--\u003e\n\n``clusteval`` is a Python package developed to evaluate detected clusters and return the cluster labels with the optimal **clustering tendency**, **number of clusters**, and **clustering quality**. 
Three evaluation strategies are implemented: **silhouette**, **dbindex**, and **derivative**. Four clustering methods can be used: **agglomerative**, **kmeans**, **dbscan**, and **hdbscan**.\n\n\n# \n**⭐️ Star this repo if you like it ⭐️**\n# \n\n### Blogs\n#### [1. A step-by-step guide for clustering images](https://towardsdatascience.com/a-step-by-step-guide-for-clustering-images-4b45f9906128)\n\n#### [2. Detection of Duplicate Images Using Image Hash Functions](https://towardsdatascience.com/detection-of-duplicate-images-using-image-hash-functions-4d9c53f04a75)\n\n#### [3. From Data to Clusters: When is Your Clustering Good Enough?](https://towardsdatascience.com/from-data-to-clusters-when-is-your-clustering-good-enough-5895440a978a)\n\n#### [4. From Clusters To Insights; The Next Step](https://towardsdatascience.com/from-clusters-to-insights-the-next-step-1c166814e0c6)\n\n\n# \n\n### [Documentation pages](https://erdogant.github.io/clusteval/)\n\nOn the [documentation pages](https://erdogant.github.io/clusteval/) you can find detailed information about how ``clusteval`` works, with many examples. \n\n# \n\n### Installation\n\n##### It is advisable to create a new environment (e.g. with Conda). 
\n```bash\nconda create -n env_clusteval python=3.8\nconda activate env_clusteval\n```\n\n##### Install from PyPI\n```bash\npip install clusteval\n```\n\n##### Import library\n```python\nfrom clusteval import clusteval\n```\n\n\u003chr\u003e\n\n### Examples\nA structured overview of all examples is available on the [documentation pages](https://erdogant.github.io/clusteval/).\n\n\u003chr\u003e\n\n\n* [Example: Cluster validation using Silhouette score](https://erdogant.github.io/clusteval/pages/html/Examples.html#cluster-evaluation)\n\n\u003cp align=\"left\"\u003e\n  \u003ca href=\"https://erdogant.github.io/clusteval/pages/html/Examples.html#cluster-evaluation\"\u003e\n  \u003cimg src=\"https://github.com/erdogant/clusteval/blob/master/docs/figs/fig1b_sil.png\" width=\"600\" /\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\n#\n\n* [Example: Determine the optimal number of clusters](https://erdogant.github.io/clusteval/pages/html/Plots.html#plot)\n\n\u003cp align=\"left\"\u003e\n  \u003ca href=\"https://erdogant.github.io/clusteval/pages/html/Plots.html#plot\"\u003e\n  \u003cimg src=\"https://github.com/erdogant/clusteval/blob/master/docs/figs/fig1a_sil.png\" width=\"600\" /\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n#\n\n* [Example: Plot the dendrogram](https://erdogant.github.io/clusteval/pages/html/Plots.html#dendrogram)\n\n\u003cp align=\"left\"\u003e\n  \u003ca href=\"https://erdogant.github.io/clusteval/pages/html/Plots.html#dendrogram\"\u003e\n  \u003cimg src=\"https://github.com/erdogant/clusteval/blob/master/docs/figs/dendrogram.png\" width=\"600\" /\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n#\n\n* [Example: Cluster validation using Davies-Bouldin index](https://erdogant.github.io/clusteval/pages/html/Examples.html#dbindex-method)\n\n\u003cp align=\"left\"\u003e\n  \u003ca href=\"https://erdogant.github.io/clusteval/pages/html/Examples.html#dbindex-method\"\u003e\n  \u003cimg src=\"https://github.com/erdogant/clusteval/blob/master/docs/figs/fig2_dbindex.png\" width=\"600\" /\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n#\n\n* [Example: Cluster validation using derivative evaluation method](https://erdogant.github.io/clusteval/pages/html/Examples.html#derivative-method)\n\n\u003cp align=\"left\"\u003e\n  \u003ca href=\"https://erdogant.github.io/clusteval/pages/html/Examples.html#derivative-method\"\u003e\n  \u003cimg src=\"https://github.com/erdogant/clusteval/blob/master/docs/figs/fig3_der.png\" width=\"600\" /\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n#\n\n\n* [Example: Cluster validation using dbscan](https://erdogant.github.io/clusteval/pages/html/Examples.html#dbscan)\n\n\u003cp align=\"left\"\u003e\n  \u003ca href=\"https://erdogant.github.io/clusteval/pages/html/Examples.html#dbscan\"\u003e\n  \u003cimg src=\"https://github.com/erdogant/clusteval/blob/master/docs/figs/fig5_dbscan.png\" width=\"600\" /\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n#\n\n* [Example: Cluster validation using hdbscan](https://erdogant.github.io/clusteval/pages/html/Examples.html#hdbscan)\n\n\u003cp align=\"left\"\u003e\n  \u003ca href=\"https://erdogant.github.io/clusteval/pages/html/Examples.html#hdbscan\"\u003e\n  \u003cimg src=\"https://github.com/erdogant/clusteval/blob/master/docs/figs/fig4a_hdbscan.png\" width=\"600\" /\u003e\n  \u003cimg src=\"https://github.com/erdogant/clusteval/blob/master/docs/figs/fig4b_hdbscan.png\" width=\"600\" /\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\n\n\n\n\n## Citation\nPlease cite clusteval in your publications if it is useful for your research (see top right for 
citation).\n\n## Other interesting techniques/blogs\n* Use ARI when the ground truth clustering has large, equal-sized clusters\n* Use AMI when the ground truth clustering is unbalanced and small clusters exist\n* https://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_rand_score.html\n* https://scikit-learn.org/stable/auto_examples/cluster/plot_adjusted_for_chance_measures.html#sphx-glr-auto-examples-cluster-plot-adjusted-for-chance-measures-py\n* https://github.com/idealo/imagededup\n* https://towardsdatascience.com/how-to-cluster-images-based-on-visual-similarity-cd6e7209fe34\n* https://github.com/facebookresearch/deepcluster\n* https://towardsdatascience.com/pca-on-hyperspectral-data-99c9c5178385\n* https://machinelearningmastery.com/face-recognition-using-principal-component-analysis/\n\n### Maintainer\n* Erdogan Taskesen, github: [erdogant](https://github.com/erdogant)\n* Contributions are welcome.\n* If you wish to buy me a \u003ca href=\"https://erdogant.github.io/donate/?currency=USD\u0026amount=5\"\u003eCoffee\u003c/a\u003e for this work, it is much appreciated :)\n\tStar it if you like it!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ferdogant%2Fclusteval","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ferdogant%2Fclusteval","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ferdogant%2Fclusteval/lists"}