https://github.com/viadee/xai_examples
Things that call for explanations...
https://github.com/viadee/xai_examples
explainable-ai machine-learning
Last synced: 8 months ago
JSON representation
Things that call for explanations...
- Host: GitHub
- URL: https://github.com/viadee/xai_examples
- Owner: viadee
- License: bsd-3-clause
- Created: 2019-03-04T09:55:38.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2022-01-04T17:56:54.000Z (over 4 years ago)
- Last Synced: 2025-01-30T07:18:04.813Z (over 1 year ago)
- Topics: explainable-ai, machine-learning
- Language: Java
- Size: 44.8 MB
- Stars: 4
- Watchers: 11
- Forks: 0
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
[](https://opensource.org/licenses/BSD-3-Clause)
# XAI Examples
This repository contains tutorials and readily compilable projects/source code concerning Explainable Artificial Intelligence (XAI).
The following algorithms and specific implementations are being used:
+ Anchors. [Implementation](https://github.com/viadee/javaAnchorExplainer) and its optional [Adapters](https://github.com/viadee/javaAnchorAdapters)
## Anchors Titanic Examples
One of the Anchors implementation's main feature consists in facilitating the usage of tabular explanations by providing default
solutions to common scenarios in conjunction with the anchorj library.
The following use-case exemplifies its usage by creating both local and global explanations of the
[Titanic tabular dataset](https://www.kaggle.com/c/titanic/data).
### 1. Referencing Dependencies
Using Apache Maven, the required anchorj dependencies are easily referenced and added as follows:
de.viadee.xai.anchor
algorithm
1.0.2
de.viadee.xai.anchor
DefaultConfigsAdapter
1.0.3
de.viadee.xai.anchor
DefaultMLMethods
1.0.3
org.apache.commons
commons-csv
1.6
org.apache.logging.log4j
log4j-slf4j-impl
2.8.1
org.apache.logging.log4j
log4j-core
2.8.1
### 1.5 Optional: Logging settings
If you want to see the logging output create a file named 'log4j2.xml' in the resources folder
and add the following lines:
%-5p [%d{yyyy-MM-dd HH:mm:ss.SSS}][%t][%c] %m%n
### 2. Loading and Describing the Dataset
AnchorTabular is henceforth used to easily set up the Anchors algorithm to handle tabular data.
Therefore, it possesses a builder that enables registering arbitrary columns that describe the dataset.
A column contains a name for identification, a number of transformations and a discretization.
Whereas transformations are meant to clean data, discretization may be used to achieve better results with Anchors by
grouping various feature values.
The Titanic dataset gets loaded and configured as follows.
AnchorTabular anchorTabular = new AnchorTabularBuilderSequential()
.setDoBalance(false)
.addIgnoredColumn("PassengerId")
.addTargetColumn(IntegerColumn.fromStringInput("Survived"))
.addColumn(IntegerColumn.fromStringInput("Pclass"))
.addColumn(new StringColumn("Name"))
.addColumn(new StringColumn("Sex"))
.addColumn(DoubleColumn.fromStringInput("Age", -1, 5))
.addColumn(IntegerColumn.fromStringInput("SibSp"))
.addColumn(IntegerColumn.fromStringInput("Parch"))
.addColumn(IntegerColumn.fromStringInput("Ticket", -1,
Collections.singletonList(new TicketNumberTransformer()), null))
.addColumn(DoubleColumn.fromStringInput("Fare", -1, 6))
.addColumn(new StringColumn("Cabin", Arrays.asList(
new ReplaceNonEmptyTransformer(true),
new ReplaceEmptyTransformer(false)),
null))
.addColumn(new StringColumn("Embarked"))
.build(ClassLoader.getSystemResourceAsStream("titanic/train.csv", true, false);
Using the sequential tabular builder, definitions for columns are stated in ascending order - just as they appear in the data.
Please note that attributes are described in greater depth in the code.
All of the configured attributes, such as columns, transformations and discretizations can be implemented and
extended as required.
It even is possibly to refrain from using this extension altogether and implement a custom solution based directly on
the base library.
The anchorAdapters will conclude this phase by logging all discretization changes it made to the dataset. This helps monitoring the discretization, finding mistakes and looks like follows in this example's case:
DEBUG [2019-07-26 15:48:28.112][main][de.viadee.xai.anchor.adapter.tabular.builder.TabularPreprocessor] Discretization for column [Pclass] is configured as follows:
[1] --> 2
[2] --> 1
[3] --> 0
DEBUG [2019-07-26 15:48:28.113][main][de.viadee.xai.anchor.adapter.tabular.builder.TabularPreprocessor] Discretization for column [Name] is configured as follows:
[Ware, Mrs. John James (Florence Louise Long)] --> 362
[Cotterill, Mr. Henry Harry""] --> 377
[Olsson, Mr. Oscar Wilhelm] --> 300
[Buckley, Mr. Daniel] --> 106
[Buckley, Miss. Katherine] --> 113
[Riordan, Miss. Johanna Hannah""] --> 408
[Pallas y Castello, Mr. Emilio] --> 369
[Karnes, Mrs. J Frank (Claire Bennett)] --> 246
[Payne, Mr. Vivian Ponsonby] --> 390
[Palsson, Master. Paul Folke] --> 389
[Denbury, Mr. Herbert] --> 338
[Makinen, Mr. Kalle Edvard] --> 97
[Davies, Mr. John Samuel] --> 9
[Chronopoulos, Mr. Demetrios] --> 115
[Dodge, Mrs. Washington (Ruth Vidaver)] --> 374
... (and 403 more elements)
DEBUG [2019-07-26 15:48:28.114][main][de.viadee.xai.anchor.adapter.tabular.builder.TabularPreprocessor] Discretization for column [Sex] is configured as follows:
[female] --> 1
[male] --> 0
DEBUG [2019-07-26 15:48:28.116][main][de.viadee.xai.anchor.adapter.tabular.builder.TabularPreprocessor] Discretization for column [Age] is configured as follows:
]-1, -1) --> -1
[0.17, 21) --> 17
[22, 27) --> 24
[28, 39) --> 32
[40, 76[ --> 48
DEBUG [2019-07-26 15:48:28.116][main][de.viadee.xai.anchor.adapter.tabular.builder.TabularPreprocessor] Discretization for column [SibSp] is configured as follows:
[0] --> 0
[1] --> 1
[2] --> 2
[3] --> 3
[4] --> 4
[5] --> 5
[8] --> 6
DEBUG [2019-07-26 15:48:28.116][main][de.viadee.xai.anchor.adapter.tabular.builder.TabularPreprocessor] Discretization for column [Parch] is configured as follows:
[0] --> 0
[1] --> 1
[2] --> 3
[3] --> 2
[4] --> 4
[5] --> 6
[6] --> 5
[9] --> 7
DEBUG [2019-07-26 15:48:28.118][main][de.viadee.xai.anchor.adapter.tabular.builder.TabularPreprocessor] Discretization for column [Ticket] is configured as follows:
[24065] --> 13
[11778] --> 96
[2] --> 173
[23101284] --> 230
[233478] --> 168
[23101291] --> 45
[3085] --> 30
[9232] --> 193
[349202] --> 267
[349211] --> 68
[14879] --> 198
[2079] --> 320
[349220] --> 10
[11813] --> 48
[4133] --> 126
... (and 347 more elements)
DEBUG [2019-07-26 15:48:28.120][main][de.viadee.xai.anchor.adapter.tabular.builder.TabularPreprocessor] Discretization for column [Fare] is configured as follows:
]-1, 7.75) --> 7.27
[7.78, 8.67) --> 7.9
[8.72, 14.46) --> 12.55
[14.46, 26) --> 21
[26.55, 56.5) --> 31.5
[57.75, 512.33[ --> 83.16
DEBUG [2019-07-26 15:48:28.120][main][de.viadee.xai.anchor.adapter.tabular.builder.TabularPreprocessor] Discretization for column [Cabin] is configured as follows:
[false] --> 0
[true] --> 1
DEBUG [2019-07-26 15:48:28.121][main][de.viadee.xai.anchor.adapter.tabular.builder.TabularPreprocessor] Discretization for column [Embarked] is configured as follows:
[Q] --> 0
[S] --> 1
[C] --> 2
### 3. Obtaining the Model
Anchors is a Model-Agnostic explanation algorithm and can describe
any classification model. Hence, its presence is implicitly assumed when creating explanations.
However, for this example a default solution, i.e. a random forest model is used to remove the need for requirements.
TabularRandomForestClassifier randomForestModel = new TabularRandomForestClassifier(100, true);
randomForestModel.fit(anchorTabular.getTabularInstances());
Nonetheless, an arbitrary and custom model can easily be included by implementing the
ClassificationFunction interface and its predict method.
More options are further provided by the ModelImportExtension project. This enables, among others, exported
H2O models to be effortlessly explained.
### 4. Obtaining the Explanation
Since both a classifier and perturbation function are now provided, an AnchorConstructionBuilder can be
obtained.
(The perturbation function is created by AnchorTabular. Implementing a custom solution is - of course - possible)
The AnchorConstructionBuilder offers configuring various parameters of the algorithm and
can be received by the previously configured AnchorTabular as follows:
AnchorConstructionBuilder defaultBuilder = anchorTabular
.createDefaultBuilder(classifier, anchorTabular.getTabularInstances()[0]);
This builder instance can henceforth be used to create explanations (in this case for the first instance contained
by the anchorTabular instance):
AnchorResult anchor = defaultBuilder.build().constructAnchor();
anchor now provides information about why the model predicted the instance the way it did.
In order to make the explanation human readable, the TabularInstanceVisualizer provided by the
anchorTabular can be used as follows:
System.out.println("====Explained instance====" + System.lineSeparator() +
anchorTabular.getVisualizer().visualizeInstance(anchor.getInstance()));
System.out.println("====Result====" + System.lineSeparator() +
anchorTabular.getVisualizer().visualizeResult(anchor));
### 5. Global Explanations
The main project contains various algorithms that are able to aggregate multiple single explanations.
Thereof, CoveragePick is expected to work best. It can be used as follows:
List> globalExplanations = new CoveragePick<>(defaultBuilder, 10,
Executors.newCachedThreadPool(),
null)
.run(anchorTabular.shuffleSplitInstances(1, 0)[0], 20);
Similarly, its results may be visualized
System.out.println(anchorTabular.getVisualizer().visualizeGlobalResults(globalExplanations));
### 6. Exemplary Outputs
The above stated examples produce output similar to the following samples.
====Explained instance====
Pclass='3'
Name='Gilnagh, Miss. Katherine 'Katie''
Sex='female'
Age='16'
SibSp='0'
Parch='0'
Ticket='35851'
Fare='7'
Embarked='Q'
WITH LABEL Survived='1'
====Result====
IF Sex='female' {0.85,-0.58} AND
Embarked='Q' {0.11,-0.37} AND
SibSp='0' {0.04,-0} AND
Parch='0' {0.02,-0}
THEN PREDICT 1
WITH PRECISION 1.0 AND COVERAGE 0.033
===Global Result #1===
IF Fare IN RANGE [0,8] {0.63,-0.64} AND
Sex='male' {0.37,-0.07} AND
Parch='0' {0.01,-0}
THEN PREDICT 0
WITH PRECISION 1.0 AND COVERAGE 0.283
===Global Result #2===
IF Fare IN RANGE [53,512] {0.68,-0.8} AND
Sex='female' {0.29,-0.07} AND
Pclass='1' {0.04,-0}
THEN PREDICT 1
WITH PRECISION 1.0 AND COVERAGE 0.117
===Global Result #3===
IF SibSp='8' {0.64,-0.99} AND
Pclass='3' {0.06,0} AND
Ticket='CA. 2343' {0.19,0} AND
Embarked='S' {0.09,0} AND
Age IN RANGE [-1,0] {0.04,0}
THEN PREDICT 0
WITH PRECISION 1.0 AND COVERAGE 0.007
===Global Result #4===
IF Fare IN RANGE [27,52] {0.61,-0.87} AND
Name='Barkworth, Mr. Algernon Henry Wilson' {0.24,-0.12} AND
Ticket='27042' {0.16,0}
THEN PREDICT 1
WITH PRECISION 1.0 AND COVERAGE 0.002
===Global Result #5===
IF Fare IN RANGE [15,26] {0.55,-0.81} AND
Parch='2' {0.19,-0.16} AND
SibSp='1' {0.12,-0.02} AND
Name='Dean, Master. Bertram Vere' {0.16,0}
THEN PREDICT 1
WITH PRECISION 1.0 AND COVERAGE 0.002
Note that the bracketed values describe the added precision and coverage the inclusion of the respective feature
effected. These values can be used to quickly infer a less precise anchor having a superior coverage.
### Optimizations
The required time to obtain explanations depends almost exclusively on the model and its latencies.
Depending on the explained instances and set parameters, this runtime can range from a few seconds to multiple hours.
The above examples should terminate in a few seconds, due to the random forest's high performance.
So, it shall not go unnoticed that the previous examples can be sped up significantly by configuring Anchors to
utilize different forms of parallelization:
#### Threading
Enabling threading is easily achieved by configuring the
AnchorConstructionBuilder:
defaultBuilder.enableThreading(10 /*ThreadCount*/, Executors.newFixedThreadPool(10), null);
This leads to single explanations being explained significantly fast (depending on your machine's performance and the
model's latency).
#### FastMPJ and Apache Spark
Furthermore, multiple approaches are included in this project to load balance the creation of multiple explanations
among a cluster of computers.
This is especially useful for global explanations.
* A message passing interface (MPI) implementation is included in the FastMPJExtension package.
* An Apache Spark adapter can be found in the SparkExtension.
* A default Threading approach is included in anchorj core.
These methods provide implementations of the BatchExplainer interface which can be plugged in to the global
explainers as follows:
new CoveragePick<>(false, new SparkBatchExplainer(sparkContext), defaultBuilder);
However, these methods require advanced setups and configurations.
For further information, please refer to respective project documentations.
# Collaboration
The project is operated and further developed by the viadee Consulting AG in Münster, Westphalia. Results from theses at the WWU Münster and the FH Münster have been incorporated.
* Further theses are planned: Contact person is Dr. Frank Köhne from viadee.
Community contributions to the project are welcome: Please open Github-Issues with suggestions (or PR), which we can then edit in the team. For general discussions please refer to the [main repository](https://github.com/viadee/javaAnchorExplainer).
* We are looking for further partners who have interesting process data to refine our tooling as well as partners that are simply interested in a discussion about AI in the context of business process automation and explainability.