.. -*- mode: rst -*-

|pypi_version|_ |pypi_downloads|_

.. |pypi_version| image:: https://img.shields.io/pypi/v/cluster-shapley.svg
.. _pypi_version: https://pypi.python.org/pypi/cluster-shapley/

.. |pypi_downloads| image:: https://pepy.tech/badge/cluster-shapley/month
.. _pypi_downloads: https://pepy.tech/project/cluster-shapley

==============
ClusterShapley
==============

ClusterShapley is a technique to explain non-linear dimensionality reduction results: after reducing the data to 2D, you can explain the resulting cluster formation. Read the `preprint `_ or `publisher `_ versions for further details.

------------
Installation
------------

ClusterShapley depends upon common machine learning libraries, such as ``scikit-learn`` and ``NumPy``. It also depends on SHAP.

Requirements:

* shap
* numpy
* scipy
* scikit-learn
* pybind11
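
If needed, the dependencies themselves can be installed from PyPI first (a minimal sketch, assuming the package names listed above):

.. code:: bash

    pip install shap numpy scipy scikit-learn pybind11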

If you have these requirements installed, use PyPI:

.. code:: bash

    pip install cluster-shapley
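
To verify the installation, note that the package imports under the name ``dr_explainer``, as used in the examples below:

.. code:: bash

    python -c "import dr_explainer"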

--------------
Usage examples
--------------

The ClusterShapley package follows the scikit-learn convention: you fit an explainer on the data and then transform the samples you want explained.
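
In outline, usage looks like this (a minimal sketch; ``labels`` stands for any per-sample class or cluster assignment, and ``X_subset`` for the samples you want explained):

.. code:: python

    import dr_explainer as dre

    explainer = dre.ClusterShapley()
    explainer.fit(X, labels)                      # X: the original high-dimensional data
    shap_values = explainer.transform(X_subset)   # contributions per cluster, sample, and feature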

**Explaining cluster formation**

Suppose you want to investigate the decisions a dimensionality reduction (DR) technique made when projecting a dataset onto 2D. The first step is to project the dataset.

.. code:: python

    import umap
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris

    # load the Iris dataset (150 samples, 4 features, 3 classes)
    data = load_iris()
    X, y = data.data, data.target

    # project to 2D with UMAP and plot the embedding colored by class
    reducer = umap.UMAP(verbose=0, random_state=0)
    embedding = reducer.fit_transform(X)
    plt.scatter(embedding[:, 0], embedding[:, 1], c=y)

.. image:: docs/artwork/iris.png
    :alt: UMAP embedding of the Iris dataset

**Compute explanations**

Now, you can generate explanations to understand why UMAP (or any other DR technique) imposed that cluster formation.

.. code:: python

    import random
    import numpy as np
    # our library
    import dr_explainer as dre

    # fit the explainer on the data and its labels
    clusterShapley = dre.ClusterShapley()
    clusterShapley.fit(X, y)

    # compute explanations for a random 20% subset of the data
    to_explain = np.array(random.sample(X.tolist(), int(X.shape[0] * 0.2)))
    shap_values = clusterShapley.transform(to_explain)

The returned array ``shap_values`` has shape (3, 30, 4):

* one slice per class (3);
* one row per sample used to generate the explanations (30, i.e., the 20% subset drawn above);
* one column per feature (4).
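
As a quick sanity check (a sketch; the exact sample count depends on the random draw above):

.. code:: python

    print(shap_values.shape)    # (3, 30, 4): classes x samples x features
    class0 = shap_values[0]     # (30, 4): contributions toward class 0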

**Visualize the contributions using SHAP plot**

For now, you can rely on the SHAP library to visualize the contributions:

.. code:: python

    import shap

    # explanations for class 0
    klass = 0
    c_exp = shap.Explanation(shap_values[klass], data=to_explain, feature_names=data.feature_names)
    shap.plots.beeswarm(c_exp)

.. image:: docs/artwork/explanation_iris0.png
    :alt: Contributions for the embedding of class 0

The plot shows the contribution of each feature to the cohesion of the selected class. For example, for 'petal length (cm)':

* low feature values (blue) contribute to the cohesion of the selected class;
* higher feature values (red) *do not* contribute to the cohesion.
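
If you prefer an aggregate view, the same ``Explanation`` object also works with SHAP's bar plot, which summarizes the mean absolute contribution of each feature:

.. code:: python

    shap.plots.bar(c_exp)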

**Defining your own clusters**

Suppose you want to investigate why UMAP placed two classes together while projecting the third one far away in 2D.

To understand that, we can use ClusterShapley to explain how the features contribute to these two major clusters.

.. code:: python

    from sklearn.cluster import KMeans

    # fit KMeans with two clusters on the 2D embedding
    # (see notebooks/ for the complete code; random_state is an arbitrary choice)
    kmeans = KMeans(n_clusters=2, random_state=0).fit(embedding)

.. image:: docs/artwork/kmeans_clusters.png
    :alt: Two clusters returned by KMeans on the embedding

Let's generate explanations, knowing that cluster 0 is on the right and cluster 1 is on the left.

.. code:: python

    # refit the explainer using the KMeans labels instead of the class labels
    clusterShapley = dre.ClusterShapley()
    clusterShapley.fit(X, kmeans.labels_)

    shap_values = clusterShapley.transform(to_explain)


**For the right cluster**

.. code:: python

    c_exp = shap.Explanation(shap_values[0], data=to_explain, feature_names=data.feature_names)
    shap.plots.beeswarm(c_exp)

.. image:: docs/artwork/explanation0.png
    :alt: Features' contributions for cluster 0

The right cluster is characterized by low values of petal length (cm), petal width (cm), and sepal length (cm).

**For the left cluster**

.. code:: python

    c_exp = shap.Explanation(shap_values[1], data=to_explain, feature_names=data.feature_names)
    shap.plots.beeswarm(c_exp)

.. image:: docs/artwork/explanation1.png
    :alt: Features' contributions for cluster 1

On the other hand, the left cluster (composed of two classes) is characterized by high values of petal length (cm), petal width (cm), and sepal length (cm).
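
To back this reading with numbers, you can aggregate the mean absolute contribution per feature yourself (a sketch using only NumPy, assuming the ``shap_values`` array computed above):

.. code:: python

    # mean |SHAP value| per feature, one row per cluster
    importance = np.abs(shap_values).mean(axis=1)   # shape: (n_clusters, n_features)
    for k, row in enumerate(importance):
        print(f"cluster {k}:", dict(zip(data.feature_names, row.round(3))))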

--------
Citation
--------

Please use the following reference for further details and to cite ClusterShapley in your work:

.. code:: bibtex

    @article{MarcilioJr2021_ClusterShapley,
        title   = {Explaining dimensionality reduction results using Shapley values},
        journal = {Expert Systems with Applications},
        volume  = {178},
        pages   = {115020},
        year    = {2021},
        issn    = {0957-4174},
        doi     = {10.1016/j.eswa.2021.115020},
        url     = {https://www.sciencedirect.com/science/article/pii/S0957417421004619},
        author  = {Wilson E. Marcílio-Jr and Danilo M. Eler}
    }

-------
License
-------

ClusterShapley is released under the 3-clause BSD license.

ClusterShapley uses the open-source `SHAP `_ implementation.
