https://github.com/viadee/xai_examples

Things that call for explanations...
Host: GitHub
URL: https://github.com/viadee/xai_examples
Owner: viadee
License: bsd-3-clause
Created: 2019-03-04T09:55:38.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2022-01-04T17:56:54.000Z (over 4 years ago)
Last Synced: 2025-01-30T07:18:04.813Z (over 1 year ago)
Topics: explainable-ai, machine-learning
Language: Java
Size: 44.8 MB
Stars: 4
Watchers: 11
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
[![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)

# XAI Examples

This repository contains tutorials and readily compilable projects/source code concerning Explainable Artificial Intelligence (XAI).

The following algorithms and specific implementations are being used:

+ Anchors. [Implementation](https://github.com/viadee/javaAnchorExplainer) and its optional [Adapters](https://github.com/viadee/javaAnchorAdapters)

## Anchors Titanic Examples

One of the main features of the Anchors implementation is that it facilitates tabular explanations by providing default solutions to common scenarios in conjunction with the anchorj library.

The following use case exemplifies its usage by creating both local and global explanations of the [Titanic tabular dataset](https://www.kaggle.com/c/titanic/data).

### 1. Referencing Dependencies

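Using Apache Maven, the required anchorj dependencies are referenced and added as follows:

    <dependency>
        <groupId>de.viadee.xai.anchor</groupId>
        <artifactId>algorithm</artifactId>
        <version>1.0.2</version>
    </dependency>

    <dependency>
        <groupId>de.viadee.xai.anchor</groupId>
        <artifactId>DefaultConfigsAdapter</artifactId>
        <version>1.0.3</version>
    </dependency>

    <dependency>
        <groupId>de.viadee.xai.anchor</groupId>
        <artifactId>DefaultMLMethods</artifactId>
        <version>1.0.3</version>
    </dependency>

    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-csv</artifactId>
        <version>1.6</version>
    </dependency>

    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-slf4j-impl</artifactId>
        <version>2.8.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.8.1</version>
    </dependency>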

### 1.5 Optional: Logging settings

If you want to see the logging output, create a file named 'log4j2.xml' in the resources folder and configure an appender that uses the following pattern layout:

    %-5p [%d{yyyy-MM-dd HH:mm:ss.SSS}][%t][%c] %m%n

### 2. Loading and Describing the Dataset

AnchorTabular is used from here on to set up the Anchors algorithm for tabular data. It provides a builder for registering arbitrary columns that describe the dataset. A column consists of a name for identification, a number of transformations and a discretization. Whereas transformations are meant to clean data, discretization may be used to achieve better results with Anchors by grouping various feature values.

The Titanic dataset is loaded and configured as follows:

    // Columns are declared in the order in which they appear in the CSV file
    AnchorTabular anchorTabular = new AnchorTabularBuilderSequential()
            .setDoBalance(false)
            .addIgnoredColumn("PassengerId")
            .addTargetColumn(IntegerColumn.fromStringInput("Survived"))
            .addColumn(IntegerColumn.fromStringInput("Pclass"))
            .addColumn(new StringColumn("Name"))
            .addColumn(new StringColumn("Sex"))
            .addColumn(DoubleColumn.fromStringInput("Age", -1, 5))
            .addColumn(IntegerColumn.fromStringInput("SibSp"))
            .addColumn(IntegerColumn.fromStringInput("Parch"))
            .addColumn(IntegerColumn.fromStringInput("Ticket", -1,
                    Collections.singletonList(new TicketNumberTransformer()), null))
            .addColumn(DoubleColumn.fromStringInput("Fare", -1, 6))
            // Cabin is reduced to a flag: true if a cabin is given, false otherwise
            .addColumn(new StringColumn("Cabin", Arrays.asList(
                    new ReplaceNonEmptyTransformer(true),
                    new ReplaceEmptyTransformer(false)),
                    null))
            .addColumn(new StringColumn("Embarked"))
            .build(ClassLoader.getSystemResourceAsStream("titanic/train.csv"), true, false);

Using the sequential tabular builder, definitions for columns are stated in ascending order - just as they appear in the data. Please note that attributes are described in greater depth in the code.

All of the configured attributes, such as columns, transformations and discretizations, can be implemented and extended as required. It is even possible to refrain from using this extension altogether and implement a custom solution based directly on the base library.

The anchorAdapters conclude this phase by logging all discretization changes made to the dataset. This helps to monitor the discretization and to find mistakes. For this example, the output looks as follows:

    DEBUG [2019-07-26 15:48:28.112][main][de.viadee.xai.anchor.adapter.tabular.builder.TabularPreprocessor] Discretization for column [Pclass] is configured as follows:
        [1] --> 2
        [2] --> 1
        [3] --> 0
    DEBUG [2019-07-26 15:48:28.113][main][de.viadee.xai.anchor.adapter.tabular.builder.TabularPreprocessor] Discretization for column [Name] is configured as follows:
        [Ware, Mrs. John James (Florence Louise Long)] --> 362
        [Cotterill, Mr. Henry Harry""] --> 377
        [Olsson, Mr. Oscar Wilhelm] --> 300
        [Buckley, Mr. Daniel] --> 106
        [Buckley, Miss. Katherine] --> 113
        [Riordan, Miss. Johanna Hannah""] --> 408
        [Pallas y Castello, Mr. Emilio] --> 369
        [Karnes, Mrs. J Frank (Claire Bennett)] --> 246
        [Payne, Mr. Vivian Ponsonby] --> 390
        [Palsson, Master. Paul Folke] --> 389
        [Denbury, Mr. Herbert] --> 338
        [Makinen, Mr. Kalle Edvard] --> 97
        [Davies, Mr. John Samuel] --> 9
        [Chronopoulos, Mr. Demetrios] --> 115
        [Dodge, Mrs. Washington (Ruth Vidaver)] --> 374
        ... (and 403 more elements)
    DEBUG [2019-07-26 15:48:28.114][main][de.viadee.xai.anchor.adapter.tabular.builder.TabularPreprocessor] Discretization for column [Sex] is configured as follows:
        [female] --> 1
        [male] --> 0
    DEBUG [2019-07-26 15:48:28.116][main][de.viadee.xai.anchor.adapter.tabular.builder.TabularPreprocessor] Discretization for column [Age] is configured as follows:
        ]-1, -1) --> -1
        [0.17, 21) --> 17
        [22, 27) --> 24
        [28, 39) --> 32
        [40, 76[ --> 48
    DEBUG [2019-07-26 15:48:28.116][main][de.viadee.xai.anchor.adapter.tabular.builder.TabularPreprocessor] Discretization for column [SibSp] is configured as follows:
        [0] --> 0
        [1] --> 1
        [2] --> 2
        [3] --> 3
        [4] --> 4
        [5] --> 5
        [8] --> 6
    DEBUG [2019-07-26 15:48:28.116][main][de.viadee.xai.anchor.adapter.tabular.builder.TabularPreprocessor] Discretization for column [Parch] is configured as follows:
        [0] --> 0
        [1] --> 1
        [2] --> 3
        [3] --> 2
        [4] --> 4
        [5] --> 6
        [6] --> 5
        [9] --> 7
    DEBUG [2019-07-26 15:48:28.118][main][de.viadee.xai.anchor.adapter.tabular.builder.TabularPreprocessor] Discretization for column [Ticket] is configured as follows:
        [24065] --> 13
        [11778] --> 96
        [2] --> 173
        [23101284] --> 230
        [233478] --> 168
        [23101291] --> 45
        [3085] --> 30
        [9232] --> 193
        [349202] --> 267
        [349211] --> 68
        [14879] --> 198
        [2079] --> 320
        [349220] --> 10
        [11813] --> 48
        [4133] --> 126
        ... (and 347 more elements)
    DEBUG [2019-07-26 15:48:28.120][main][de.viadee.xai.anchor.adapter.tabular.builder.TabularPreprocessor] Discretization for column [Fare] is configured as follows:
        ]-1, 7.75) --> 7.27
        [7.78, 8.67) --> 7.9
        [8.72, 14.46) --> 12.55
        [14.46, 26) --> 21
        [26.55, 56.5) --> 31.5
        [57.75, 512.33[ --> 83.16
    DEBUG [2019-07-26 15:48:28.120][main][de.viadee.xai.anchor.adapter.tabular.builder.TabularPreprocessor] Discretization for column [Cabin] is configured as follows:
        [false] --> 0
        [true] --> 1
    DEBUG [2019-07-26 15:48:28.121][main][de.viadee.xai.anchor.adapter.tabular.builder.TabularPreprocessor] Discretization for column [Embarked] is configured as follows:
        [Q] --> 0
        [S] --> 1
        [C] --> 2
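
The Age and Fare entries in the log map ranges of values to a single representative value. The following is a minimal sketch of one way such a grouping could work: it splits the sorted values into equal-frequency bins and represents each bin by its (lower) median. The class and method names are hypothetical, the input ages are made up, and this is not the adapters' actual implementation.

```java
import java.util.Arrays;

final class PercentileDiscretizerSketch {

    /** Returns, for each value, the lower median of the equal-frequency bin it falls into. */
    static double[] discretize(double[] values, int binCount) {
        if (binCount < 1 || values.length < binCount) {
            throw new IllegalArgumentException("need at least one value per bin");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double[] upper = new double[binCount];  // largest value of each bin
        double[] median = new double[binCount]; // representative of each bin
        for (int b = 0; b < binCount; b++) {
            int from = b * n / binCount;
            int to = (b + 1) * n / binCount;    // exclusive
            upper[b] = sorted[to - 1];
            median[b] = sorted[from + (to - from - 1) / 2];
        }
        double[] result = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            int b = 0;
            while (b < binCount - 1 && values[i] > upper[b]) {
                b++;
            }
            result[i] = median[b];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] ages = {2, 4, 18, 21, 22, 25, 30, 38, 45, 60}; // made-up sample values
        System.out.println(Arrays.toString(discretize(ages, 5)));
    }
}
```

Grouping values this way reduces the number of distinct feature values a rule has to distinguish, which is the reason the text gives for discretizing before running Anchors.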

### 3. Obtaining the Model

Anchors is a model-agnostic explanation algorithm and can describe any classification model. Hence, the presence of such a model is implicitly assumed when creating explanations. For this example, however, a default solution, i.e. a random forest model, is used so that no externally trained model is required.

    TabularRandomForestClassifier randomForestModel = new TabularRandomForestClassifier(100, true);
    randomForestModel.fit(anchorTabular.getTabularInstances());

Nonetheless, an arbitrary custom model can be included by implementing the ClassificationFunction interface and its predict method.

Further options are provided by the ModelImportExtension project. Among other things, it enables exported H2O models to be explained.
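
To illustrate the idea of plugging in a custom model, here is a minimal sketch. The nested interface is only a stand-in that mirrors the predict method mentioned above; the real ClassificationFunction signature and instance type may differ, and the rule-based model is purely hypothetical.

```java
import java.util.Map;

final class CustomModelSketch {

    /** Stand-in for the library's ClassificationFunction (assumed shape, not the real API). */
    interface ClassificationFunctionSketch<T> {
        int predict(T instance);
    }

    /** Hypothetical model: predicts survival (1) for female passengers, otherwise 0. */
    static final class SexRuleClassifier implements ClassificationFunctionSketch<Map<String, String>> {
        @Override
        public int predict(Map<String, String> passenger) {
            return "female".equals(passenger.get("Sex")) ? 1 : 0;
        }
    }

    public static void main(String[] args) {
        SexRuleClassifier model = new SexRuleClassifier();
        System.out.println(model.predict(Map.of("Sex", "female", "Pclass", "3"))); // prints 1
        System.out.println(model.predict(Map.of("Sex", "male", "Pclass", "1")));   // prints 0
    }
}
```

Because the explainer only needs predictions, any model that can be wrapped behind such a method can be explained, regardless of how it was trained.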

                    

### 4. Obtaining the Explanation

Since both a classifier and a perturbation function are now provided, an AnchorConstructionBuilder can be obtained. (The perturbation function is created by AnchorTabular. Implementing a custom solution is - of course - possible.)

The AnchorConstructionBuilder allows various parameters of the algorithm to be configured and can be obtained from the previously configured AnchorTabular as follows:

    // randomForestModel is the classifier trained in step 3
    AnchorConstructionBuilder<TabularInstance> defaultBuilder = anchorTabular
            .createDefaultBuilder(randomForestModel, anchorTabular.getTabularInstances()[0]);

This builder instance can now be used to create explanations (in this case for the first instance contained in the anchorTabular instance):

    AnchorResult<TabularInstance> anchor = defaultBuilder.build().constructAnchor();

    

The anchor variable now provides information about why the model predicted the instance the way it did. To make the explanation human-readable, the TabularInstanceVisualizer provided by the anchorTabular can be used as follows:

    System.out.println("====Explained instance====" + System.lineSeparator() +
            anchorTabular.getVisualizer().visualizeInstance(anchor.getInstance()));

    System.out.println("====Result====" + System.lineSeparator() +
            anchorTabular.getVisualizer().visualizeResult(anchor));
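
For intuition about the perturbation function mentioned in this section, the following sketch shows one common approach for tabular data: copy a random row of the dataset and overwrite the anchored columns with the explained instance's values. The share of perturbed samples that keep the original prediction then serves as a precision estimate. All names, the toy data and the toy model are assumptions made for illustration; this is not the library's implementation.

```java
import java.util.Random;
import java.util.Set;
import java.util.function.ToIntFunction;

final class PerturbationSketch {

    /** Copies a random data row and fixes the anchored columns to the explained instance's values. */
    static int[] perturb(int[] instance, int[][] data, Set<Integer> anchored, Random rnd) {
        int[] sample = data[rnd.nextInt(data.length)].clone();
        for (int col : anchored) {
            sample[col] = instance[col];
        }
        return sample;
    }

    /** Estimates precision: the share of perturbed samples labeled like the explained instance. */
    static double estimatePrecision(int[] instance, int[][] data, Set<Integer> anchored,
                                    ToIntFunction<int[]> model, int samples, Random rnd) {
        int label = model.applyAsInt(instance);
        int hits = 0;
        for (int i = 0; i < samples; i++) {
            if (model.applyAsInt(perturb(instance, data, anchored, rnd)) == label) {
                hits++;
            }
        }
        return (double) hits / samples;
    }

    public static void main(String[] args) {
        // Columns: {Sex (1 = female), Pclass}; toy rows, not the Titanic dataset
        int[][] data = {{1, 1}, {0, 3}, {0, 2}, {1, 3}, {0, 1}};
        ToIntFunction<int[]> model = row -> row[0]; // toy model: predicts 1 iff Sex == 1
        int[] instance = {1, 3};
        Random rnd = new Random(42);
        // Anchoring Sex fixes the only column the toy model looks at
        System.out.println(estimatePrecision(instance, data, Set.of(0), model, 1000, rnd));
        // Anchoring Pclass leaves Sex free to vary, so precision drops
        System.out.println(estimatePrecision(instance, data, Set.of(1), model, 1000, rnd));
    }
}
```

In this toy setup a rule that fixes Sex always reproduces the prediction, whereas a rule that fixes only Pclass does so for roughly two out of five samples.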

            

### 5. Global Explanations

The main project contains various algorithms that are able to aggregate multiple single explanations. Of these, CoveragePick is expected to work best. It can be used as follows:

    List<AnchorResult<TabularInstance>> globalExplanations = new CoveragePick<>(defaultBuilder, 10,
            Executors.newCachedThreadPool(),
            null)
            .run(anchorTabular.shuffleSplitInstances(1, 0)[0], 20);

Similarly, its results may be visualized:

    System.out.println(anchorTabular.getVisualizer().visualizeGlobalResults(globalExplanations));
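
Conceptually, a coverage-based aggregation selects a small set of rules that together cover as many instances as possible. The sketch below shows a plain greedy selection of this kind. It is an illustration under that assumption only; the names are hypothetical and CoveragePick itself may work differently.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class GreedyCoveragePickSketch {

    /** Greedily picks up to maxPicks rules, always taking the one that covers the most uncovered instances. */
    static List<Integer> pick(List<BitSet> coveredByRule, int maxPicks) {
        List<Integer> picked = new ArrayList<>();
        BitSet covered = new BitSet();
        while (picked.size() < maxPicks) {
            int best = -1;
            int bestGain = 0;
            for (int r = 0; r < coveredByRule.size(); r++) {
                BitSet gain = (BitSet) coveredByRule.get(r).clone();
                gain.andNot(covered); // instances this rule would newly cover
                if (gain.cardinality() > bestGain) {
                    bestGain = gain.cardinality();
                    best = r;
                }
            }
            if (best < 0) {
                break; // no remaining rule adds coverage
            }
            picked.add(best);
            covered.or(coveredByRule.get(best));
        }
        return picked;
    }

    /** Helper that builds the set of instance indices a rule applies to. */
    static BitSet bits(int... indices) {
        BitSet set = new BitSet();
        for (int i : indices) {
            set.set(i);
        }
        return set;
    }

    public static void main(String[] args) {
        List<BitSet> rules = List.of(bits(0, 1, 2, 3), bits(2, 3, 4), bits(4, 5), bits(0, 1));
        System.out.println(pick(rules, 2)); // prints [0, 2]
    }
}
```

The second pick is rule 2 rather than rule 1 because, once rule 0 is selected, rule 2 adds two uncovered instances while rule 1 adds only one.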

       

    

### 6. Exemplary Outputs

The examples above produce output similar to the following samples.

    ====Explained instance====
        Pclass='3'
        Name='Gilnagh, Miss. Katherine 'Katie''
        Sex='female'
        Age='16'
        SibSp='0'
        Parch='0'
        Ticket='35851'
        Fare='7'
        Embarked='Q'
        WITH LABEL Survived='1'
    ====Result====
        IF Sex='female' {0.85,-0.58} AND 
        Embarked='Q' {0.11,-0.37} AND 
        SibSp='0' {0.04,-0} AND 
        Parch='0' {0.02,-0}
        THEN PREDICT 1
        WITH PRECISION 1.0 AND COVERAGE 0.033

    ===Global Result #1===
        IF Fare IN RANGE [0,8] {0.63,-0.64} AND 
        Sex='male' {0.37,-0.07} AND 
        Parch='0' {0.01,-0}
        THEN PREDICT 0
        WITH PRECISION 1.0 AND COVERAGE 0.283
    ===Global Result #2===
        IF Fare IN RANGE [53,512] {0.68,-0.8} AND 
        Sex='female' {0.29,-0.07} AND 
        Pclass='1' {0.04,-0}
        THEN PREDICT 1
        WITH PRECISION 1.0 AND COVERAGE 0.117
    ===Global Result #3===
        IF SibSp='8' {0.64,-0.99} AND 
        Pclass='3' {0.06,0} AND 
        Ticket='CA. 2343' {0.19,0} AND 
        Embarked='S' {0.09,0} AND 
        Age IN RANGE [-1,0] {0.04,0}
        THEN PREDICT 0
        WITH PRECISION 1.0 AND COVERAGE 0.007
    ===Global Result #4===
        IF Fare IN RANGE [27,52] {0.61,-0.87} AND 
        Name='Barkworth, Mr. Algernon Henry Wilson' {0.24,-0.12} AND 
        Ticket='27042' {0.16,0}
        THEN PREDICT 1
        WITH PRECISION 1.0 AND COVERAGE 0.002
    ===Global Result #5===
        IF Fare IN RANGE [15,26] {0.55,-0.81} AND 
        Parch='2' {0.19,-0.16} AND 
        SibSp='1' {0.12,-0.02} AND 
        Name='Dean, Master. Bertram Vere' {0.16,0}
        THEN PREDICT 1
        WITH PRECISION 1.0 AND COVERAGE 0.002

        

Note that the bracketed values describe the precision and coverage added by including the respective feature. These values can be used to quickly infer a less precise anchor with a higher coverage.
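
As a worked example, the bracketed values of Global Result #1 can be accumulated feature by feature to see what a shortened rule would roughly achieve. The sketch assumes that precision accumulates from 0 and coverage from 1 (an empty rule applies to every instance); this reading is an assumption, and because the printed values are rounded, the sums only approximate the reported totals.

```java
final class AnchorPrefixSketch {

    /** Running sum of the per-feature deltas, starting from the given base value. */
    static double[] cumulative(double base, double[] deltas) {
        double[] out = new double[deltas.length];
        double running = base;
        for (int i = 0; i < deltas.length; i++) {
            running += deltas[i];
            out[i] = running;
        }
        return out;
    }

    public static void main(String[] args) {
        // Values taken from Global Result #1 in the sample output
        String[] features = {"Fare IN RANGE [0,8]", "Sex='male'", "Parch='0'"};
        double[] addedPrecision = {0.63, 0.37, 0.01};
        double[] addedCoverage = {-0.64, -0.07, -0.0};

        double[] precision = cumulative(0.0, addedPrecision); // assumption: starts at 0
        double[] coverage = cumulative(1.0, addedCoverage);   // assumption: empty rule covers everything
        for (int i = 0; i < features.length; i++) {
            System.out.println(String.format("rule up to %-20s precision ~%.2f, coverage ~%.2f",
                    features[i], precision[i], coverage[i]));
        }
    }
}
```

Under these assumptions, keeping only the Fare condition yields a rule with a precision of roughly 0.63 and a coverage of roughly 0.36, compared with a coverage of 0.283 for the full rule.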

### Optimizations 

The time required to obtain explanations depends almost exclusively on the model and its latency. Depending on the explained instances and the chosen parameters, the runtime can range from a few seconds to multiple hours. The above examples should terminate in a few seconds, due to the random forest's high performance.

Nevertheless, the previous examples can be sped up significantly by configuring Anchors to utilize different forms of parallelization:

#### Threading

Threading is enabled by configuring the AnchorConstructionBuilder:

    defaultBuilder.enableThreading(10 /*ThreadCount*/, Executors.newFixedThreadPool(10), null);

    

As a result, single explanations are created significantly faster (depending on your machine's performance and the model's latency).
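
The reason threading helps is that many model calls are independent of each other and can run concurrently. The sketch below demonstrates this with a fixed thread pool and an artificially slow toy model. It only illustrates the general idea with standard Java concurrency utilities; the names are hypothetical and it says nothing about how enableThreading works internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntUnaryOperator;

final class ParallelPredictionSketch {

    /** Evaluates the model on all samples via the executor and returns the labels in input order. */
    static int[] predictAll(int[] samples, IntUnaryOperator model, ExecutorService executor) {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int sample : samples) {
            tasks.add(() -> model.applyAsInt(sample));
        }
        try {
            List<Future<Integer>> futures = executor.invokeAll(tasks); // waits for all tasks
            int[] labels = new int[samples.length];
            for (int i = 0; i < labels.length; i++) {
                labels[i] = futures.get(i).get();
            }
            return labels;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        // Toy model with artificial latency; returns 1 for odd inputs
        IntUnaryOperator slowModel = x -> {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return x % 2;
        };
        ExecutorService executor = Executors.newFixedThreadPool(10);
        try {
            int[] labels = predictAll(new int[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, slowModel, executor);
            System.out.println(Arrays.toString(labels));
        } finally {
            executor.shutdown();
        }
    }
}
```

With ten threads, the ten slow calls overlap instead of running one after another, which is why the achievable speed-up depends on the machine and on the model's latency.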

#### FastMPJ and Apache Spark

Furthermore, this project includes multiple approaches for load balancing the creation of multiple explanations across a cluster of computers. This is especially useful for global explanations.

* A message passing interface (MPI) implementation is included in the FastMPJExtension package.
* An Apache Spark adapter can be found in the SparkExtension.
* A default threading approach is included in the anchorj core.

These methods provide implementations of the BatchExplainer interface, which can be plugged into the global explainers as follows:

    new CoveragePick<>(false, new SparkBatchExplainer(sparkContext), defaultBuilder);

However, these methods require advanced setups and configurations. For further information, please refer to the documentation of the respective projects.

# Collaboration

The project is operated and further developed by the viadee Consulting AG in Münster, Westphalia. Results from theses at the WWU Münster and the FH Münster have been incorporated.

* Further theses are planned: Contact person is Dr. Frank Köhne from viadee.
* Community contributions to the project are welcome: Please open GitHub issues with suggestions (or PRs), which we can then work on in the team. For general discussions please refer to the [main repository](https://github.com/viadee/javaAnchorExplainer).
* We are looking for further partners who have interesting process data to refine our tooling as well as partners that are simply interested in a discussion about AI in the context of business process automation and explainability.