Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/avoss84/bhad
A Python library for Bayesian Anomaly Detection
- Host: GitHub
- URL: https://github.com/avoss84/bhad
- Owner: AVoss84
- License: mit
- Created: 2022-10-08T11:49:39.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-07-28T13:56:16.000Z (3 months ago)
- Last Synced: 2024-09-29T10:48:12.040Z (about 1 month ago)
- Topics: anomaly-detection, bayesian-inference, explainability, machine-learning, scikit-learn, unsupervised-machine-learning
- Language: Jupyter Notebook
- Homepage:
- Size: 3.93 MB
- Stars: 10
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
# Bayesian Histogram-based Anomaly Detection (BHAD)
Python implementation of the BHAD algorithm as presented in [Vosseler, A. (2022): Unsupervised Insurance Fraud Prediction Based on Anomaly Detector Ensembles](https://www.researchgate.net/publication/361463552_Unsupervised_Insurance_Fraud_Prediction_Based_on_Anomaly_Detector_Ensembles) and [Vosseler, A. (2023): BHAD: Explainable anomaly detection using Bayesian histograms](https://www.researchgate.net/publication/364265660_BHAD_Explainable_anomaly_detection_using_Bayesian_histograms). The package was presented at *PyCon DE & PyData Berlin 2023* ([watch the talk here](https://www.youtube.com/watch?v=_8zfgPTD-d8&list=PLGVZCDnMOq0peDguAzds7kVmBr8avp46K&index=8)) and at the *42nd International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering* ([MaxEnt 2023](https://www.mdpi.com/2673-9984/9/1/1)). The ***bhad* package** follows scikit-learn's standard API for [outlier detection](https://scikit-learn.org/stable/modules/outlier_detection.html).
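To illustrate what scikit-learn's outlier-detection conventions typically look like, here is a hypothetical toy estimator, `ToyHistogramDetector`, which is not part of `bhad` and does not reproduce its model. It assumes integer-coded (already discretized) features, scores each row by the sum of per-feature log bin probabilities under a symmetric Dirichlet prior (an assumption made for illustration), and uses `contamination` to place the decision threshold. It returns -1/+1 labels as scikit-learn's built-in detectors do; the exact label encoding returned by `bhad` is not stated here.

```python
import numpy as np
from sklearn.base import BaseEstimator, OutlierMixin


class ToyHistogramDetector(OutlierMixin, BaseEstimator):
    """Hypothetical detector (not part of bhad) illustrating scikit-learn's
    outlier-detection conventions on integer-coded (discretized) features."""

    def __init__(self, contamination=0.01, alpha=1.0):
        self.contamination = contamination  # expected share of outliers
        self.alpha = alpha  # symmetric Dirichlet pseudo-count per bin (assumption)

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=int)
        self.log_probs_ = []
        for j in range(X.shape[1]):
            k = X[:, j].max() + 1  # assumes bin codes 0..k-1
            counts = np.bincount(X[:, j], minlength=k)
            # Posterior-mean bin probabilities under the Dirichlet prior
            probs = (counts + self.alpha) / (counts.sum() + self.alpha * k)
            self.log_probs_.append(np.log(probs))
        # Shift scores so the contamination quantile of the training data sits at 0
        self.offset_ = np.quantile(self.score_samples(X), self.contamination)
        return self

    def score_samples(self, X):
        # Higher = more normal: sum of per-feature log bin probabilities
        # (assumes the same bin codes as seen during fit)
        X = np.asarray(X, dtype=int)
        return sum(lp[X[:, j]] for j, lp in enumerate(self.log_probs_))

    def decision_function(self, X):
        # Negative values flag outliers, as in IsolationForest
        return self.score_samples(X) - self.offset_

    def predict(self, X):
        # scikit-learn convention: -1 for outliers, +1 for inliers
        return np.where(self.decision_function(X) < 0, -1, 1)


# 99 identical rows plus one row in rarely used bins
X = np.vstack([np.zeros((99, 2), dtype=int), [[1, 1]]])
labels = ToyHistogramDetector(contamination=0.01).fit_predict(X)  # fit_predict comes from OutlierMixin
```

Inheriting from `OutlierMixin` supplies `fit_predict` (fit followed by predict), and keeping all hyperparameters as plain `__init__` attributes lets `BaseEstimator` handle `get_params`/`set_params`, which is what allows such an estimator to sit inside a `Pipeline`.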
## Installation
```bash
pip install bhad
```

## Usage
1. Preprocess the input data: discretize continuous features and conduct Bayesian model selection (*optional*).
2. Train the model on the discretized data.
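This README does not say which criterion `Discretize` uses to pick the number of bins. As a hedged illustration of Bayesian model selection for discretization, the sketch below swaps in Knuth's rule for optimal data-based binning, which scores each candidate number of equal-width bins `m` by its log posterior and keeps the best one. The helpers `knuth_log_posterior`, `discretize` and the `max_bins` parameter are hypothetical and not part of `bhad`.

```python
import numpy as np
from scipy.special import gammaln


def knuth_log_posterior(x, m):
    """Log posterior (up to a constant) of an m-bin equal-width histogram, per Knuth's rule."""
    n = x.size
    counts, _ = np.histogram(x, bins=m)
    return (n * np.log(m) + gammaln(m / 2) - m * gammaln(0.5)
            - gammaln(n + m / 2) + gammaln(counts + 0.5).sum())


def discretize(x, max_bins=50):
    """Pick the bin count with the highest posterior, then map x to codes 0..m-1."""
    x = np.asarray(x, dtype=float)
    candidates = np.arange(1, max_bins + 1)
    scores = [knuth_log_posterior(x, m) for m in candidates]
    best_m = int(candidates[np.argmax(scores)])
    edges = np.histogram_bin_edges(x, bins=best_m)
    # Use interior edges only, so the maximum value lands in bin best_m - 1
    codes = np.digitize(x, edges[1:-1])
    return codes, best_m


rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-5, 1, 500), rng.normal(5, 1, 500)])  # bimodal toy data
codes, best_m = discretize(x)
```

The criterion balances goodness of fit against the number of bins: adding bins only pays off when the counts become noticeably less uniform, so a bimodal sample like the one above gets more than one bin, while uniform data would not.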
For convenience, both steps can be combined in a scikit-learn pipeline (*optional*):
```python
from bhad.model import BHAD
from bhad.utils import Discretize
from sklearn.pipeline import Pipeline

num_cols = [...]  # names of numeric features
cat_cols = [...]  # names of categorical features

pipe = Pipeline(steps=[
    ('discrete', Discretize(nbins=None)),
    ('model', BHAD(contamination=0.01, num_features=num_cols, cat_features=cat_cols)),
])
```

For a given dataset, get binary model decisions:
```python
y_pred = pipe.fit_predict(X = dataset)
```

Get a *global* model explanation as well as explanations for *individual* observations:
```python
from bhad.explainer import Explainer

local_expl = Explainer(pipe.named_steps['model'], pipe.named_steps['discrete']).fit()
local_expl.get_explanation(nof_feat_expl = 5, append = False) # individual explanations
local_expl.global_feat_imp # global explanation
```

A detailed *toy example* using synthetic data can be found [here](https://github.com/AVoss84/bhad/blob/main/src/notebooks/Toy_Example.ipynb). An example using the Titanic dataset that illustrates *model explainability* with BHAD can be found [here](https://github.com/AVoss84/bhad/blob/main/src/notebooks/Titanic_Example.ipynb).
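The `Explainer` calls shown in the usage section are `bhad`'s own API; the sketch below neither uses nor reproduces it. It is a hypothetical illustration of one simple way a histogram-based score can be explained, assuming independent features and a symmetric Dirichlet prior: each feature contributes the surprisal `-log p(bin)` of the observation's bin, the features with the largest surprisal form a local explanation, and averaging surprisal over rows gives a crude global ranking. The function names `bin_surprisal`, `explain_row` and `global_importance` are hypothetical.

```python
import numpy as np


def bin_surprisal(X, alpha=1.0):
    """Per-feature surprisal, -log p(bin), under a symmetric Dirichlet prior (assumption)."""
    X = np.asarray(X, dtype=int)
    tables = []
    for j in range(X.shape[1]):
        k = X[:, j].max() + 1  # assumes bin codes 0..k-1
        counts = np.bincount(X[:, j], minlength=k)
        tables.append(-np.log((counts + alpha) / (counts.sum() + alpha * k)))
    return tables


def explain_row(row, tables, feature_names, top_k=2):
    """Local explanation: the top_k features whose observed bin is least probable."""
    contrib = np.array([tables[j][code] for j, code in enumerate(row)])
    order = np.argsort(-contrib)[:top_k]
    return [(feature_names[j], float(contrib[j])) for j in order]


def global_importance(X, tables, feature_names):
    """Global view: mean surprisal per feature across all rows."""
    X = np.asarray(X, dtype=int)
    contrib = np.column_stack([tables[j][X[:, j]] for j in range(X.shape[1])])
    return dict(zip(feature_names, contrib.mean(axis=0)))


names = ["a", "b", "c"]
X = np.vstack([np.zeros((99, 3), dtype=int), [[0, 1, 0]]])  # last row is unusual in "b"
tables = bin_surprisal(X)
local = explain_row(X[-1], tables, names, top_k=1)
glob = global_importance(X, tables, names)
```

Because the toy score is a sum over features, each feature's surprisal is exactly its share of the row's score, which is what makes this kind of explanation straightforward; models with feature interactions would need a different attribution scheme.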