https://github.com/shobrook/outgraph
Outlier detection tool for graph datasets
- Host: GitHub
- URL: https://github.com/shobrook/outgraph
- Owner: shobrook
- License: mit
- Created: 2022-07-17T21:28:19.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2022-07-18T16:48:25.000Z (about 3 years ago)
- Last Synced: 2025-05-10T12:08:46.264Z (5 months ago)
- Topics: chi-squared, graph, graph-algorithms, mahalanobis-distance, outlier-detection
- Language: Python
- Homepage:
- Size: 14.6 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# outgraph
`outgraph` is a simple outlier detection tool for graph datasets. Given a list of graphs, it uses [Mahalanobis distance](https://en.wikipedia.org/wiki/Mahalanobis_distance) to detect which graphs are outliers based on either their topology or node attributes.
> Note: `outgraph` only works for datasets where each graph has an equal number of nodes.
## Installation
You can install `outgraph` with `pip`:
```bash
$ pip install outgraph
```
## How it Works
Unlike most approaches to graph outlier detection, `outgraph` does not use machine learning. Instead, each graph is converted into a vector representation using one of three available methods:
1. Averaging the node feature/attribute vectors
2. Flattening the adjacency matrix
3. A concatenation of 1 and 2

Then, the [Mahalanobis distance](https://en.wikipedia.org/wiki/Mahalanobis_distance) between each vector and the distribution of vectors is calculated. Lastly, a [chi-squared distribution](https://en.wikipedia.org/wiki/Chi-squared_distribution) is used to model the distances and identify those outside a cutoff threshold (e.g. p < 0.05).

This approach is based on [this article](https://towardsdatascience.com/multivariate-outlier-detection-in-python-e946cfc843b3).
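
The distance and cutoff steps can be sketched with NumPy and SciPy. This is an illustration of the general technique, not `outgraph`'s internal code: the function name `mahalanobis_outliers`, the use of a pseudo-inverse for the covariance matrix, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from scipy.stats import chi2


def mahalanobis_outliers(vectors, p_value=0.05):
    """Flag rows whose squared Mahalanobis distance is improbably large.

    Hypothetical helper for illustration; not part of the outgraph API.
    """
    X = np.asarray(vectors, dtype=float)
    mean = X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    # Assumption: a pseudo-inverse is used so that a singular covariance
    # matrix (e.g. a feature that never varies) does not raise an error.
    inv_cov = np.linalg.pinv(cov)
    diff = X - mean
    # Squared Mahalanobis distance of every row from the sample mean.
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    # For roughly Gaussian data, d2 is approximately chi-squared distributed
    # with one degree of freedom per feature.
    cutoff = chi2.ppf(1 - p_value, df=X.shape[1])
    return np.flatnonzero(d2 > cutoff), d2


rng = np.random.default_rng(0)
# 50 made-up graphs, each already reduced to a 2-dimensional vector.
vectors = rng.normal(size=(50, 2))
vectors[7] = [10.0, -10.0]  # plant one obvious outlier
indices, distances = mahalanobis_outliers(vectors)
print(indices)
```

Comparing the squared distance against the `1 - p_value` quantile is equivalent to flagging vectors whose upper-tail probability falls below `p_value`. Ordinary vectors can also exceed the cutoff by chance, so the planted outlier is not necessarily the only index reported.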
## Usage
Each graph in your dataset needs to be an instance of `outgraph.Graph`. This object has two parameters, `node_attrs` and `adjacency_matrix`, both NumPy arrays whose indices correspond to nodes. Example:
```python
import numpy as np
from outgraph import Graph

node_attrs = np.array([[-1], [0], [1]])
adj_matrix = np.array([[1, 1, 0],
                       [1, 1, 1],
                       [0, 1, 1]])
graph = Graph(node_attrs, adj_matrix)
```
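
To make the three vector representations from [How it Works](#how-it-works) concrete, here is a NumPy-only sketch applied to the same arrays. It only illustrates the descriptions given there; how `outgraph` computes these vectors internally (for example, the flattening order) is an assumption.

```python
import numpy as np

node_attrs = np.array([[-1], [0], [1]])
adj_matrix = np.array([[1, 1, 0],
                       [1, 1, 1],
                       [0, 1, 1]])

# Method 1: average the node attribute vectors (one row per node).
mean_attrs = node_attrs.mean(axis=0)
# Method 2: flatten the adjacency matrix row by row.
flat_adj = adj_matrix.flatten()
# Method 3: concatenate the two vectors above.
combined = np.concatenate([mean_attrs, flat_adj])

print(mean_attrs)      # [0.]
print(flat_adj)        # [1 1 0 1 1 1 0 1 1]
print(combined.shape)  # (10,)
```

The flattened adjacency matrix has one entry per pair of nodes, so graphs with different node counts would produce vectors of different lengths. That is presumably why every graph in a dataset must have the same number of nodes.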
Once you have a list of `Graph` objects, simply submit them to `outgraph.detect_outliers`:
```python
from outgraph import Graph, detect_outliers

graphs = [Graph(), ...]
outliers, indices = detect_outliers(graphs, method=1, p_value=0.05)
```
Notice the `method` and `p_value` parameters. The `method` parameter is an integer between 1 and 3 that corresponds to one of the three graph vectorization methods described in the [How it Works](#how-it-works) section. `p_value` is the outlier cutoff threshold.