https://github.com/safe-graph/DGFraud

A Deep Graph-based Toolbox for Fraud Detection

Host: GitHub
URL: https://github.com/safe-graph/DGFraud
Owner: safe-graph
License: apache-2.0
Created: 2019-11-22T14:02:36.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2022-04-20T21:39:08.000Z (about 3 years ago)
Last Synced: 2025-03-28T16:08:08.958Z (3 months ago)
Topics: anomaly-detection, datamining, datascience, dblp-dataset, financial-engineering, fraud-detection, fraud-prevention, graph, graph-algorithms, graph-convolutional-networks, graph-neural-networks, graphneuralnetwork, machine-learning, opensource, outlier-detection, security, security-tools, spamdetection, toolkit, yelp-dataset
Language: Python
Homepage:
Size: 80.4 MB
Stars: 718
Watchers: 14
Forks: 160
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

Awesome-Graph-Neural-Networks - GNN-based Fraud Detection Toolbox
awesome-fraud-detection-papers - [Code
StarryDivineSky - safe-graph/DGFraud

README

A Deep Graph-based Toolbox for Fraud Detection

**Introduction** 

**May 2021 Update:** DGFraud has been upgraded to TensorFlow 2.0! Please check out [DGFraud-TF2](https://github.com/safe-graph/DGFraud-TF2).

**DGFraud** is a Graph Neural Network (GNN) based toolbox for fraud detection. It integrates implementations and comparisons of state-of-the-art GNN-based fraud detection models. An overview of the implemented models can be found [here](#implemented-models).

We welcome contributions that add new fraud detectors or extend the features of the toolbox. Some of the planned features are listed in the [TODO list](#todo-list).

If you use the toolbox in your project, please cite one of the two papers below and the [algorithms](#implemented-models) you used:

CIKM'20 ([PDF](https://arxiv.org/pdf/2008.08692.pdf))

```bibtex
@inproceedings{dou2020enhancing,
  title={Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters},
  author={Dou, Yingtong and Liu, Zhiwei and Sun, Li and Deng, Yutong and Peng, Hao and Yu, Philip S},
  booktitle={Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM'20)},
  year={2020}
}
```

SIGIR'20 ([PDF](https://arxiv.org/pdf/2005.00625.pdf))

```bibtex
@inproceedings{liu2020alleviating,
  title={Alleviating the Inconsistency Problem of Applying Graph Neural Network to Fraud Detection},
  author={Liu, Zhiwei and Dou, Yingtong and Yu, Philip S. and Deng, Yutong and Peng, Hao},
  booktitle={Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2020}
}
```

**Useful Resources**

- [PyGOD: A Python Library for Graph Outlier Detection (Anomaly Detection)](https://github.com/pygod-team/pygod)

- [UGFraud: An Unsupervised Graph-based Toolbox for Fraud Detection](https://github.com/safe-graph/UGFraud)

- [Graph-based Fraud Detection Paper List](https://github.com/safe-graph/graph-fraud-detection-papers) 

- [Awesome Fraud Detection Papers](https://github.com/benedekrozemberczki/awesome-fraud-detection-papers)

- [Attack and Defense Papers on Graph Data](https://github.com/safe-graph/graph-adversarial-learning-literature)

- [PyOD: A Python Toolbox for Scalable Outlier Detection (Anomaly Detection)](https://github.com/yzhao062/pyod)

- [PyODDS: An End-to-end Outlier Detection System](https://github.com/datamllab/pyodds)

- [DGL: Deep Graph Library](https://github.com/dmlc/dgl)

- [Outlier Detection DataSets (ODDS)](http://odds.cs.stonybrook.edu/)

**Table of Contents**

- [Installation](#installation)

- [Datasets](#datasets)

- [User Guide](#user-guide)

- [Implemented Models](#implemented-models)

- [Model Comparison](#model-comparison)

- [TODO List](#todo-list)

- [How to Contribute](#how-to-contribute)

## Installation

```bash
git clone https://github.com/safe-graph/DGFraud.git
cd DGFraud
python setup.py install
```

### Requirements

```text
python 3.6, 3.7
tensorflow>=1.14.0,<2.0
numpy>=1.16.4
scipy>=1.2.0
networkx<=1.11
```

## Datasets

### DBLP

We use the pre-processed DBLP dataset from [Jhy1993/HAN](https://github.com/Jhy1993/HAN).

You can run FdGars, Player2Vec, GeniePath, and GEM on the DBLP dataset.

Unzip the archive before using the dataset:

```bash
cd dataset
unzip DBLP4057_GAT_with_idx_tra200_val_800.zip
```

### Example dataset

We implement example graphs for SemiGNN, GAS, and GEM in `data_loader.py`, because those models require unique graph structures or node types that cannot be found in open-source datasets.

### Yelp dataset

For [GraphConsis](https://arxiv.org/abs/2005.00625), we preprocessed the [Yelp Spam Review Dataset](http://odds.cs.stonybrook.edu/yelpchi-dataset/) with reviews as nodes and three relations as edges.

The dataset, in `.mat` format, is located at `/dataset/YelpChi.zip`. The `.mat` file includes:

- `net_rur, net_rtr, net_rsr`: three sparse matrices representing the three homo-graphs defined in the [GraphConsis](https://arxiv.org/abs/2005.00625) paper;

- `features`: a sparse matrix of 32-dimensional handcrafted features;

- `label`: a numpy array with the ground truth of nodes. `1` represents spam and `0` represents benign.
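The fields listed above can be read with SciPy. The following is a minimal sketch, not the toolbox's own loader: the helper name `load_yelpchi` is hypothetical, and the path in the usage note assumes the archive extracts to `dataset/YelpChi.mat`.

```python
import numpy as np
import scipy.io as sio
import scipy.sparse as sp


def load_yelpchi(mat_path):
    """Read relation graphs, features, and labels from a YelpChi-style .mat file.

    Hypothetical helper for illustration; it only relies on the key names
    documented above (net_rur, net_rtr, net_rsr, features, label).
    """
    data = sio.loadmat(mat_path)
    # One sparse adjacency matrix per relation, converted to CSR for row slicing.
    relations = {name: sp.csr_matrix(data[name])
                 for name in ("net_rur", "net_rtr", "net_rsr")}
    features = sp.csr_matrix(data["features"])
    # loadmat returns 2-D arrays, so flatten the labels to shape (num_nodes,).
    labels = np.asarray(data["label"]).ravel().astype(int)
    return relations, features, labels
```

After unzipping the archive, a call such as `relations, features, labels = load_yelpchi("dataset/YelpChi.mat")` would return the three relation graphs keyed by name, assuming the extracted file has that name.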

The YelpChi data preprocessing details can be found in our [CIKM'20](https://arxiv.org/pdf/2008.08692.pdf) paper.

To get the complete metadata of the Yelp dataset, please email [[email protected]](mailto:[email protected]).

## User Guide

### Running the example code

You can find the implemented models in the `algorithms` directory. For example, you can run Player2Vec using:

```bash
python Player2Vec_main.py
```

You can specify parameters for models when running the code.

### Running on your datasets

Have a look at the `load_data_dblp()` function in `utils/utils.py` for an example.

To use your own data, you have to provide:

* adjacency matrices or adjlists (for GAS);

* a feature matrix;

* a label matrix.

Then split the feature matrix and label matrix into training data and testing data.
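As a rough illustration of the split step, the sketch below shuffles node indices and divides a feature matrix and label array into training and testing parts. It is not the logic of `load_data_dblp()`; the function name `split_data`, the 80/20 ratio, and the toy data are all assumptions.

```python
import numpy as np


def split_data(features, labels, train_ratio=0.8, seed=0):
    """Shuffle node indices and split features/labels into train and test parts.

    Hypothetical helper; the ratio and seed are illustrative defaults.
    """
    rng = np.random.RandomState(seed)
    perm = rng.permutation(len(labels))
    n_train = int(len(labels) * train_ratio)
    train_idx, test_idx = perm[:n_train], perm[n_train:]
    return (features[train_idx], labels[train_idx],
            features[test_idx], labels[test_idx])


# Toy data: 10 nodes with 4 features each and binary labels.
features = np.arange(40, dtype=float).reshape(10, 4)
labels = np.array([0, 1] * 5)
X_train, y_train, X_test, y_test = split_data(features, labels)
print(X_train.shape, X_test.shape)
```

For fraud data, where positive labels are usually rare, a stratified split may be preferable so that both parts contain fraud examples.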

You can specify a dataset as follows:

```bash
python xx_main.py --dataset your_dataset
```

or by editing `xx_main.py`.
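For readers unfamiliar with how such a flag is usually wired up, here is a minimal `argparse` sketch. It is hypothetical and not copied from any `*_main.py` script: only `--dataset` comes from the command above, `--epochs` is an invented example parameter, and the real scripts define their own flag names and defaults.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical model entry point."""
    parser = argparse.ArgumentParser(description="Hypothetical model runner")
    # --dataset mirrors the flag shown in the command above.
    parser.add_argument("--dataset", type=str, default="dblp",
                        help="name of the dataset to load")
    # --epochs is an invented example of a model parameter.
    parser.add_argument("--epochs", type=int, default=10,
                        help="number of training epochs (illustrative)")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(["--dataset", "your_dataset"])
print(args.dataset, args.epochs)
```

Check the argument definitions at the top of the script you are running for the flags it actually accepts.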

### The structure of code

The repository is organized as follows:

- `algorithms/` contains the implemented models and the corresponding example code;

- `base_models/` contains the basic models (GCN);

- `dataset/` contains the necessary dataset files;

- `utils/` contains:

    * code for loading and splitting the data (`data_loader.py`);

    * various utilities (`utils.py`).

## Implemented Models

| Model | Paper | Venue | Reference |
|-------|-------|-------|-----------|
| **SemiGNN** | [A Semi-supervised Graph Attentive Network for Financial Fraud Detection](https://arxiv.org/pdf/2003.01171) | ICDM 2019 | [BibTex](https://github.com/safe-graph/DGFraud/blob/master/reference/semignn.txt) |
| **Player2Vec** | [Key Player Identification in Underground Forums over Attributed Heterogeneous Information Network Embedding Framework](http://mason.gmu.edu/~lzhao9/materials/papers/lp0110-zhangA.pdf) | CIKM 2019 | [BibTex](https://github.com/safe-graph/DGFraud/blob/master/reference/player2vec.txt) |
| **GAS** | [Spam Review Detection with Graph Convolutional Networks](https://arxiv.org/abs/1908.10679) | CIKM 2019 | [BibTex](https://github.com/safe-graph/DGFraud/blob/master/reference/gas.txt) |
| **FdGars** | [FdGars: Fraudster Detection via Graph Convolutional Networks in Online App Review System](https://dl.acm.org/citation.cfm?id=3316586) | WWW 2019 | [BibTex](https://github.com/safe-graph/DGFraud/blob/master/reference/fdgars.txt) |
| **GeniePath** | [GeniePath: Graph Neural Networks with Adaptive Receptive Paths](https://arxiv.org/abs/1802.00910) | AAAI 2019 | [BibTex](https://github.com/safe-graph/DGFraud/blob/master/reference/geniepath.txt) |
| **GEM** | [Heterogeneous Graph Neural Networks for Malicious Account Detection](https://arxiv.org/pdf/2002.12307.pdf) | CIKM 2018 | [BibTex](https://github.com/safe-graph/DGFraud/blob/master/reference/gem.txt) |
| **GraphSAGE** | [Inductive Representation Learning on Large Graphs](https://arxiv.org/pdf/1706.02216.pdf) | NIPS 2017 | [BibTex](https://github.com/safe-graph/DGFraud/blob/master/reference/graphsage.txt) |
| **GraphConsis** | [Alleviating the Inconsistency Problem of Applying Graph Neural Network to Fraud Detection](https://arxiv.org/pdf/2005.00625.pdf) | SIGIR 2020 | [BibTex](https://github.com/safe-graph/DGFraud/blob/master/reference/graphconsis.txt) |
| **HACUD** | [Cash-Out User Detection Based on Attributed Heterogeneous Information Network with a Hierarchical Attention Mechanism](https://aaai.org/ojs/index.php/AAAI/article/view/3884) | AAAI 2019 | [BibTex](https://github.com/safe-graph/DGFraud/blob/master/reference/hacud.txt) |

## Model Comparison

| Model | Application | Graph Type | Base Model |
|-------|-------------|------------|------------|
| **SemiGNN** | Financial Fraud | Heterogeneous | GAT, LINE, DeepWalk |
| **Player2Vec** | Cyber Criminal | Heterogeneous | GAT, GCN |
| **GAS** | Opinion Fraud | Heterogeneous | GCN, GAT |
| **FdGars** | Opinion Fraud | Homogeneous | GCN |
| **GeniePath** | Financial Fraud | Homogeneous | GAT |
| **GEM** | Financial Fraud | Heterogeneous | GCN |
| **GraphSAGE** | Opinion Fraud | Homogeneous | GraphSAGE |
| **GraphConsis** | Opinion Fraud | Heterogeneous | GraphSAGE |
| **HACUD** | Financial Fraud | Heterogeneous | GAT |

## TODO List

- Implementing mini-batch training

- The log loss for GEM model

- Time-based sampling for GEM

- Add sampling methods

- Benchmarking SOTA models

- Scalable implementation

- PyTorch implementation

## How to Contribute

You are welcome to contribute to this open-source toolbox. Detailed instructions will be released soon. For now, you can create issues or email [[email protected]](mailto:[email protected]) with inquiries.
