{"id":24975333,"url":"https://github.com/wilsonjr/clustershapley","last_synced_at":"2026-04-02T02:02:18.634Z","repository":{"id":44888450,"uuid":"366832818","full_name":"wilsonjr/ClusterShapley","owner":"wilsonjr","description":"Explaining dimensionality results using SHAP values","archived":false,"fork":false,"pushed_at":"2022-11-19T21:04:01.000Z","size":671,"stargazers_count":48,"open_issues_count":1,"forks_count":3,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-02-15T09:37:26.928Z","etag":null,"topics":["cluster-formation","dimensionality-reduction","explainable-ai","explainable-ml","explanations","machine-learning","visualizations"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wilsonjr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-05-12T19:34:40.000Z","updated_at":"2023-10-03T23:28:30.000Z","dependencies_parsed_at":"2022-08-31T08:50:30.399Z","dependency_job_id":null,"html_url":"https://github.com/wilsonjr/ClusterShapley","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wilsonjr%2FClusterShapley","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wilsonjr%2FClusterShapley/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wilsonjr%2FClusterShapley/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wilsonjr%2FClusterShapley/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wils
onjr","download_url":"https://codeload.github.com/wilsonjr/ClusterShapley/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248396312,"owners_count":21096915,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cluster-formation","dimensionality-reduction","explainable-ai","explainable-ml","explanations","machine-learning","visualizations"],"created_at":"2025-02-03T20:55:25.550Z","updated_at":"2026-04-02T02:02:18.571Z","avatar_url":"https://github.com/wilsonjr.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":".. -*- mode: rst -*-\n\n|pypi_version|_ |pypi_downloads|_\n\n.. |pypi_version| image:: https://img.shields.io/pypi/v/cluster-shapley.svg\n.. _pypi_version: https://pypi.python.org/pypi/cluster-shapley/\n\n.. |pypi_downloads| image:: https://pepy.tech/badge/cluster-shapley/month\n.. _pypi_downloads: https://pepy.tech/project/cluster-shapley\n\n==============\nClusterShapley\n==============\n\nClusterShapley is a technique to explain non-linear dimensionality reduction results. You can explain the cluster formation after reducing the dimensionality to 2D. Read the `preprint \u003chttps://arxiv.org/abs/2103.05678\u003e`_ or `publisher \u003chttps://doi.org/10.1016/j.eswa.2021.115020\u003e`_ versions for further details.\n\n------------\nInstallation\n------------\n\nClusterShapley depends upon common machine learning libraries, such as ``scikit-learn`` and ``NumPy``. 
It also depends on SHAP.\n\nRequirements:\n\n* shap\n* numpy\n* scipy\n* scikit-learn\n* pybind11\n\nIf you have these requirements installed, install the package from PyPI:\n\n.. code:: bash\n\n    pip install cluster-shapley\n\n--------------\nUsage examples\n--------------\n\nThe ClusterShapley package follows the same convention as scikit-learn classes: you fit the model and then transform data.\n\n**Explaining cluster formation**\n\nSuppose you want to investigate why a dimensionality reduction (DR) technique produced a particular 2D projection. The first thing to do is to project the dataset.\n\n.. code:: python\n\t\n\timport umap\n\t\n\timport matplotlib.pyplot as plt\n\n\tfrom sklearn.datasets import load_iris\n\n\n\tdata = load_iris()\n\tX, y = data.data, data.target\n\n\treducer = umap.UMAP(verbose=0, random_state=0)\n\tembedding = reducer.fit_transform(X)\n\tplt.scatter(embedding[:, 0], embedding[:, 1], c=y)\n\n.. image:: docs/artwork/iris.png\n\t:alt: UMAP embedding of the Iris dataset\n\n**Compute explanations**\n\nNow, you can generate explanations to understand why UMAP (or any other DR technique) imposed that cluster formation.\n\n.. code:: python\n\n\timport random\n\timport numpy as np\n\t# our library\n\timport dr_explainer as dre\n\n\n\t# fit the dataset\n\tclusterShapley = dre.ClusterShapley()\n\tclusterShapley.fit(X, y)\n\n\t# compute explanations for a random 20% subset of the data\n\n\tto_explain = np.array(random.sample(X.tolist(), int(X.shape[0] * 0.2)))\n\n\tshap_values = clusterShapley.transform(to_explain)\n\nThe matrix ``shap_values`` of shape (3, 30, 4) contains:\n\t* the features' contributions for each class (3);\n\t* for each sample used to generate explanations (30);\n\t* for each feature (4).\n\n**Visualize the contributions using a SHAP plot**\n\nFor now, you can rely on the SHAP library to visualize the contributions.\n\n.. 
code:: python\n\n\timport shap\n\n\tklass = 0  # class whose explanations are plotted\n\tc_exp = shap.Explanation(shap_values[klass], data=to_explain, feature_names=data.feature_names)\n\tshap.plots.beeswarm(c_exp)\n\n\n.. image:: docs/artwork/explanation_iris0.png\n\t:alt: Contributions for the embedding of class 0\n\nThe plot shows the contribution of each feature to the cohesion of the selected class. Example for 'petal length (cm)':\n\n\t* Low feature values (blue) contribute to the cohesion of the selected class.\n\t* Higher feature values (red) *do not* contribute to the cohesion.\n\n\n**Defining your own clusters**\n\nSuppose you want to investigate why UMAP placed two classes together while projecting the third one far from them in 2D.\n\nTo understand that, we can use ClusterShapley to explain how the features contribute to these two major clusters.\n\n\n.. code:: python\n\n\t# fit KMeans with two clusters (see notebooks/ for the complete code)\n\n\n.. image:: docs/artwork/kmeans_clusters.png\n\t:alt: Two clusters returned by KMeans on the embedding\n\nLet's generate explanations, knowing that cluster 0 is on the right and cluster 1 is on the left.\n\n.. code:: python\n\n\tclusterShapley = dre.ClusterShapley()\n\tclusterShapley.fit(X, kmeans.labels_)\n\n\tshap_values = clusterShapley.transform(to_explain)\n\n\n**For the right cluster**\n\n.. code:: python\n\n\tc_exp = shap.Explanation(shap_values[0], data=to_explain, feature_names=data.feature_names)\n\tshap.plots.beeswarm(c_exp)\n\n.. image:: docs/artwork/explanation0.png\n\t:alt: Features' contributions for cluster 0\n\nThe right cluster is characterized by low values of petal length (cm), petal width (cm), and sepal length (cm).\n\n\n**For the left cluster**\n\n.. code:: python\n\n\tc_exp = shap.Explanation(shap_values[1], data=to_explain, feature_names=data.feature_names)\n\tshap.plots.beeswarm(c_exp)\n\n.. 
image:: docs/artwork/explanation1.png\n\t:alt: Features' contributions for cluster 1\n\nOn the other hand, the left cluster (composed of two classes) is characterized by high values of petal length (cm), petal width (cm), and sepal length (cm).\n\n\n--------\nCitation\n--------\n\nPlease use the following reference for further details and to cite ClusterShapley in your work:\n\n.. code:: bibtex\n\n    @article{MarcilioJr2021_ClusterShapley,\n\ttitle = {Explaining dimensionality reduction results using Shapley values},\n\tjournal = {Expert Systems with Applications},\n\tvolume = {178},\n\tpages = {115020},\n\tyear = {2021},\n\tissn = {0957-4174},\n\tdoi = {https://doi.org/10.1016/j.eswa.2021.115020},\n\turl = {https://www.sciencedirect.com/science/article/pii/S0957417421004619},\n\tauthor = {Wilson E. Marcílio-Jr and Danilo M. Eler}\n\t}\n\n\n-------\nLicense\n-------\n\nClusterShapley follows the 3-clause BSD license.\n\nClusterShapley uses the open-source SHAP implementation from `SHAP \u003chttps://github.com/slundberg/shap\u003e`_.\n\n\n......\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwilsonjr%2Fclustershapley","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwilsonjr%2Fclustershapley","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwilsonjr%2Fclustershapley/lists"}