[![DOI](https://zenodo.org/badge/488505724.svg)](https://zenodo.org/badge/latestdoi/488505724)
[![Docker Image](https://img.shields.io/badge/Docker%20Image-sdmtib/interpretme-blue?logo=Docker)](https://hub.docker.com/r/sdmtib/interpretme)
[![Latest Release](http://img.shields.io/github/release/SDM-TIB/InterpretME.svg?logo=github)](https://github.com/SDM-TIB/InterpretME/releases)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

[![Python Versions](https://img.shields.io/pypi/pyversions/InterpretME)](https://pypi.org/project/InterpretME)
[![Package Format](https://img.shields.io/pypi/format/InterpretME)](https://pypi.org/project/InterpretME)
[![Package Status](https://img.shields.io/pypi/status/InterpretME)](https://pypi.org/project/InterpretME)
[![Package Version](https://img.shields.io/pypi/v/InterpretME)](https://pypi.org/project/InterpretME)


**Demo:**

[![GitHub](https://img.shields.io/badge/GitHub-SDM--TIB%2FInterpretME__Demo-blue?logo=GitHub)](https://github.com/SDM-TIB/InterpretME_Demo)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/SDM-TIB/InterpretME_Demo/main?labpath=InterpretME_Demo.ipynb)

# InterpretME

![InterpretME Architecture](https://raw.githubusercontent.com/SDM-TIB/InterpretME/main/images/architecture.png "InterpretME Architecture")

InterpretME integrates knowledge graphs (KGs) with machine learning methods to generate meaningful insights.
It helps generate human- and machine-readable decisions that assist users and enhance efficiency.
InterpretME is a tool for fine-grained representations, in a KG, of the main characteristics of trained machine learning models.
It receives as input the feature definitions, the classes, and the SHACL constraints from multiple KGs.
InterpretME takes a JSON configuration from the user as shown below. The feature definitions are classified into independent and dependent variables that are later used in the predictive models.
A feature definition has the format _"x": "?x a . \n ", "gender": "Optional { ?x ?gender.}_, where the key names the attribute from the KG and the value is the SPARQL graph pattern defining that attribute in the KG.
This definition of features allows InterpretME to trace a feature back to its origin in the KG.
Given the feature definitions and the target definition, a _SELECT_ SPARQL query is built to retrieve the application domain data.
InterpretME also takes constraints as input from the user to check whether each entity validates or invalidates them.
InterpretME is divided into two main quadrants.
The first one is "Training interpretable predictive model" and the second is "Documenting interpretable predictive model".
In brief, the first quadrant runs all components of the predictive modeling pipeline: data preparation, applying a sampling strategy to the data, building the predictive model, and finally generating visualizations of the predictive models together with the SHACL constraints.
The second quadrant, "Documenting interpretable predictive model", assists the user by generating the InterpretME KG and executing federated queries over the InterpretME KG and the original KG.
This, in turn, helps the user explore the data and trace a predicted entity, with all its relevant features, back to the original KG.
Additionally, metrics such as precision, recall, and accuracy, along with LIME interpretations, are provided to the user.
InterpretME accepts input in the form of knowledge graphs (`SPARQL endpoint` and `query`) or datasets (`.csv` or `.json`).
```json
{
  "Endpoint": "http://frenchroyalty:8890/sparql",
  "Type": "Person",
  "Index_var": "x",
  "Independent_variable": {
    "x": "?x a . \n ",
    "gender": "Optional { ?x ?gender } .\n ",
    "childs": "?x ?childs . \n ",
    "predecessors": "?x ?predecessors . \n",
    "preds": "?x ?preds .\n",
    "objects": "?x ?objects . \n",
    "subjects": "?x ?subjects . \n"
  },
  "Dependent_variable": {
    "HasSpouse": "{ SELECT ?x, ((?partners > 0) AS ?HasSpouse) WHERE { ?x ?partners . }} \n"
  },
  "Constraints": [
    {
      "name": "C3",
      "inverted": false,
      "shape_schema_dir": "example/shapes/french_royalty/spouse/rule3",
      "target_shape": ""
    },
    {
      "name": "C2",
      "inverted": false,
      "shape_schema_dir": "example/shapes/french_royalty/spouse/rule2",
      "target_shape": ""
    },
    {
      "name": "C1",
      "inverted": false,
      "shape_schema_dir": "example/shapes/french_royalty/spouse/rule1",
      "target_shape": ""
    }
  ],
  "classes": {
    "NoSpouse": "0",
    "HasSpouse": "1"
  },
  "3_valued_logic": true,
  "sampling_strategy": "undersampling",
  "number_important_features": 5,
  "cross_validation_folds": 5,
  "test_split": 0.3,
  "model": "Random Forest",
  "min_max_depth": 4,
  "max_max_depth": 6
}
```
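As an illustration of how such a configuration is turned into a _SELECT_ query, the sketch below concatenates the independent- and dependent-variable patterns and executes the resulting query against the configured endpoint with `SPARQLWrapper`. It is a minimal sketch, not InterpretME's internal implementation; the configuration file path is a placeholder, and the predicate URIs inside the pattern strings are not shown in the example above.

```python
import json

from SPARQLWrapper import JSON, SPARQLWrapper  # pip install sparqlwrapper

# Minimal sketch (not InterpretME's internal code): assemble the SELECT query
# that retrieves the application-domain data described by the configuration.
with open("input_config.json") as f:  # placeholder path for a config like the one above
    config = json.load(f)

variables = list(config["Independent_variable"]) + list(config["Dependent_variable"])
patterns = "".join(config["Independent_variable"].values()) \
         + "".join(config["Dependent_variable"].values())

query = (
    "SELECT DISTINCT "
    + " ".join(f"?{v}" for v in variables)
    + " WHERE { " + patterns + " }"
)

# Execute the query against the endpoint given in the configuration.
sparql = SPARQLWrapper(config["Endpoint"])
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
print(len(results["results"]["bindings"]), "rows retrieved")
```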

## The InterpretME Ontology
The ontology used to describe the metadata traced by InterpretME can be explored in [VoCoL](http://ontology.tib.eu/InterpretME) and [WebProtégé](https://webprotege.stanford.edu/#projects/4dfe5ddb-752e-4dc9-b360-943785f0b0af/edit/Classes) (WebProtégé account required).

![InterpretME Ontology Visualization](https://raw.githubusercontent.com/SDM-TIB/InterpretME/main/images/ontology_vis.png "InterpretME Ontology Visualization")

The table below reports the number of mapping rules per class. You can find the mappings in `InterpretME/mappings` or query them online via a [public SPARQL endpoint](https://labs.tib.eu/sdm/InterpretME-mappings/sparql).

| Class | Mapping Rules |
|---------------------------------------------------------|--------------|
| http://interpretme.org/vocab/SHACLValidation | 8 |
| http://interpretme.org/vocab/PrecisionRecall | 7 |
| http://interpretme.org/vocab/PredictionInterpretability | 6 |
| http://interpretme.org/vocab/FeaturesWeights | 5 |
| http://interpretme.org/vocab/TargetEntity | 5 |
| http://interpretme.org/vocab/PredictionFeatures | 5 |
| http://interpretme.org/vocab/FeatureDefinition | 4 |
| http://www.w3.org/ns/mls#Implementation | 4 |
| http://www.w3.org/ns/mls#Run | 3 |
| http://www.w3.org/ns/mls#ModelEvaluation | 3 |
| http://www.w3.org/ns/mls#HyperParameterSetting | 3 |
| http://interpretme.org/vocab/TestedTargetEntity | 3 |
| http://interpretme.org/vocab/PredictionClasses | 2 |
| http://interpretme.org/vocab/SamplingStrategy | 2 |
| http://interpretme.org/vocab/CrossValidation | 2 |
| http://interpretme.org/vocab/Endpoint | 2 |
| http://interpretme.org/vocab/ImportantFeature | 2 |
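
Counts like those above can also be retrieved from the mappings endpoint itself. The sketch below is an assumption-laden example: it presumes the mappings are published using the (R2)RML vocabulary and counts predicate-object maps per subject class, which may not match the exact notion of "mapping rule" used for the table.

```python
from SPARQLWrapper import JSON, SPARQLWrapper  # pip install sparqlwrapper

# Sketch: approximate the per-class counts by counting predicate-object maps
# per subject class, assuming the mappings use the (R2)RML vocabulary.
MAPPINGS_ENDPOINT = "https://labs.tib.eu/sdm/InterpretME-mappings/sparql"

query = """
PREFIX rr: <http://www.w3.org/ns/r2rml#>
SELECT ?class (COUNT(?pom) AS ?rules)
WHERE {
  ?tm rr:subjectMap/rr:class ?class ;
      rr:predicateObjectMap ?pom .
}
GROUP BY ?class
ORDER BY DESC(?rules)
"""

sparql = SPARQLWrapper(MAPPINGS_ENDPOINT)
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["class"]["value"], row["rules"]["value"])
```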

## Experiment Results
We ran experiments with InterpretME over an extended version of the French Royalty KG [1] (see `example/data`).
The task was to predict whether a person in the dataset has a spouse.
We performed under-sampling in this experiment to balance the two classes.

![DT Result](https://raw.githubusercontent.com/SDM-TIB/InterpretME/main/images/DT_final_results.png "DT Result")

The above figure shows the decision tree for the predictive task over the data.

![DT with Constraint Validation](https://raw.githubusercontent.com/SDM-TIB/InterpretME/main/images/constraints_validation_dtree.png "DT with Constraint Validation")

Since InterpretME uses SHACL constraints to validate the model, the validation results can also be included in the visualization.
In this case, the target entities either fulfilled all the constraints or the constraints did not apply to the classification.

![Random Forest Feature Importance](https://raw.githubusercontent.com/SDM-TIB/InterpretME/main/images/Random_Forest_Feature_Importance.png "Random Forest Feature Importance")

The above figure shows the relevant features of the random forest, with the most important feature on top and the remaining features listed in decreasing order of importance.

![Target Entity Degree Distribution](https://raw.githubusercontent.com/SDM-TIB/InterpretME/main/images/DegreeDistribution.png "Target Entity Degree Distribution")

The average number of neighbours in the original KG was 11.39 (std 5.06).
With the metadata traced by InterpretME, the number increased to 27.19 (std 6.13).
The increase in the average number of neighbours shows that InterpretME enhances the interpretability of the target entities.
The original KG is available as a [public SPARQL endpoint](https://labs.tib.eu/sdm/InterpretME-og/sparql).
The original data enhanced with the metadata traced by InterpretME is also publicly available as a [SPARQL endpoint](https://labs.tib.eu/sdm/InterpretME-wog/sparql).
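
Both endpoints can be inspected directly. The sketch below computes an average out-degree over all subjects on each public endpoint; this is an illustrative metric only and may differ from how the figures above were obtained (e.g., restricting the computation to the target entities and counting distinct neighbours rather than triples).

```python
from SPARQLWrapper import JSON, SPARQLWrapper  # pip install sparqlwrapper

# Sketch: compare an average out-degree on the original KG and on the KG
# enhanced with the metadata traced by InterpretME.
ENDPOINTS = {
    "original KG": "https://labs.tib.eu/sdm/InterpretME-og/sparql",
    "original KG + InterpretME metadata": "https://labs.tib.eu/sdm/InterpretME-wog/sparql",
}

query = """
SELECT (AVG(?degree) AS ?avgDegree)
WHERE {
  { SELECT ?s (COUNT(?o) AS ?degree) WHERE { ?s ?p ?o } GROUP BY ?s }
}
"""

for name, url in ENDPOINTS.items():
    sparql = SPARQLWrapper(url)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    result = sparql.query().convert()["results"]["bindings"][0]
    print(f"{name}: average degree = {result['avgDegree']['value']}")
```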

**References**

[1] Marco Ribeiro, Sameer Singh, and Carlos Guestrin. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In: *Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16)*. ACM. 2016. DOI: [10.1145/2939672.2939778](https://doi.org/10.1145/2939672.2939778)

## Running InterpretME
### Building InterpretME from Source
Clone the repository and change into the repository directory; you can then build the Docker image:
```bash
docker build . -t sdmtib/interpretme:latest
```

Follow the instructions in the `example` directory for further information on how to proceed.

### Using Existing Resources
If you are not interested in building InterpretME from source, you can simply follow the instructions in the `example` directory.
All steps necessary to run the pipeline, upload the data to a SPARQL endpoint, and query the InterpretME KG are described there.
Additionally, there is a Jupyter notebook in the `example` folder that demonstrates the use of the InterpretME library.
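
For orientation, a call to the library from Python might look like the hypothetical sketch below; the entry point name and parameters are assumptions, so please consult the notebook for the actual API.

```python
# Hypothetical usage sketch -- the import and the parameter names below are
# assumptions; the notebook in `example/` documents the actual library API.
from InterpretME import pipeline  # assumed entry point

results = pipeline(
    path_config="input_config.json",  # a JSON configuration as shown above (placeholder path)
    lime_results="output/LIME",       # assumed output directory for LIME interpretations
)
```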

## License
This work is licensed under the MIT license.

## Authors
InterpretME has been developed by members of the Scientific Data Management Group at TIB, as an ongoing research effort.
The development is co-ordinated and supervised by Maria-Esther Vidal.
We strongly encourage you to report any issues you have with InterpretME.
Please, use the GitHub issue tracker to do so.
InterpretME has been implemented in joint work by Yashrajsinh Chudasama, Disha Purohit, Julian Gercke, and Philipp D. Rohde.