https://github.com/YingtongDou/CARE-GNN

Code for CIKM 2020 paper Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters
https://github.com/YingtongDou/CARE-GNN

datamining deep-learning fraud-detection fraud-prevention graphneuralnetwork machine-learning reinforcement-learning security

Last synced: 6 months ago
JSON representation

Code for CIKM 2020 paper Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters

Host: GitHub
URL: https://github.com/YingtongDou/CARE-GNN
Owner: YingtongDou
License: apache-2.0
Created: 2020-08-05T20:05:39.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2022-10-11T11:15:56.000Z (about 3 years ago)
Last Synced: 2025-04-09T14:15:48.620Z (7 months ago)
Topics: datamining, deep-learning, fraud-detection, fraud-prevention, graphneuralnetwork, machine-learning, reinforcement-learning, security
Language: Python
Homepage: https://arxiv.org/abs/2008.08692
Size: 38.8 MB
Stars: 264
Watchers: 4
Forks: 55
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-fraud-detection-papers - [Code

README

          # CARE-GNN

A PyTorch implementation for the [CIKM 2020](https://www.cikm2020.org/) paper below:  

**Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters**.  

[Yingtong Dou](http://ytongdou.com/), [Zhiwei Liu](https://sites.google.com/view/zhiwei-jim), [Li Sun](https://www.researchgate.net/profile/Li_Sun118), Yutong Deng, [Hao Peng](https://penghao-buaa.github.io/), [Philip S. Yu](https://www.cs.uic.edu/PSYu/).  

\[[Paper](https://arxiv.org/pdf/2008.08692.pdf)\]\[[Toolbox](https://github.com/safe-graph/DGFraud)\]\[[DGL Example](https://github.com/dmlc/dgl/tree/master/examples/pytorch/caregnn)\]\[[Benchmark](https://paperswithcode.com/paper/enhancing-graph-neural-network-based-fraud)\]

## Bug Fixes and Update (06/2021)

### Similarity score

The feature and label similarity scores presented in Table 2 of the paper are incorrect. The updated equations for calculating two similarity scores are shown below:



    


    

        

    

    




The code for calculating the similarity scores is in [simi_comp.py](https://github.com/YingtongDou/CARE-GNN/blob/master/simi_comp.py).

The updated similarity scores for the two datasets are shown below. Note that we only compute the similarity scores for positive nodes to demonstrate the camouflage of fraudsters (positive nodes).

| YelpChi  | rur  | rtr  | rsr  | homo  |

|-------|--------|--------|--------|--------|

| Avg. Feature Similarity | 0.991   |   0.988    |  0.988  | 0.988  |

| Avg. Label Similarity |  0.909  |   0.176   |  0.186  | 0.184  |

| Amazon  | upu  | usu  | uvu  | homo  |

|-------|--------|--------|--------|--------|

| Avg. Feature Similarity | 0.711   |   0.687    |  0.697  | 0.687  |

| Avg. Label Similarity |  0.167  |   0.056   |  0.053  | 0.072  |

### Relation weight in Figure 3

According to this [issue](https://github.com/YingtongDou/CARE-GNN/issues/5), the weighted aggregation of CARE-Weight (a variant of CARE-GNN) has an error. After fixing it, the relation weight will not converge to the same value. Thus, the relation weight subfigure in Figure 3 and its associated conclusion are wrong.

### Extended version CARE-GNN

Please check out [RioGNN](https://github.com/safe-graph/RioGNN), a GNN model extended based on CARE-GNN with more reinforcement learning modules integrated. We are actively developing an efficient multi-layer version of CARE-GNN. Stay tuned.

## Overview



    


    

        

    

    




**CA**mouflage-**RE**sistant **G**raph **N**eural **N**etwork **(CARE-GNN)** is a GNN-based fraud detector based on a multi-relation graph equipped with three modules that enhance its performance against camouflaged fraudsters.

Three enhancement modules are:

- **A label-aware similarity measure** which measures the similarity scores between a center node and its neighboring nodes;

- **A similarity-aware neighbor selector** which leverages top-p sampling and reinforcement learning to select the optimal amount of neighbors under each relation;

- **A relation-aware neighbor aggregator** which directly aggregates information from different relations using the optimal neighbor selection thresholds as weights.

CARE-GNN has following advantages:

- **Adaptability.** CARE-GNN adaptively selects best neighbors

for aggregation given arbitrary multi-relation graph;

- **High-efficiency.** CARE-GNN has a high computational efficiency without attention and deep reinforcement learning;

- **Flexibility.** Many other neural modules and external knowledge can be plugged into the CARE-GNN;

We have integrated more than **eight** GNN-based fraud detectors as a TensorFlow [toolbox](https://github.com/safe-graph/DGFraud). 

## Setup

You can download the project and install the required packages using the following commands:

```bash

git clone https://github.com/YingtongDou/CARE-GNN.git

cd CARE-GNN

pip3 install -r requirements.txt

```

To run the code, you need to have at least **Python 3.6** or later versions. 

## Running

1. In CARE-GNN directory, run `unzip /data/Amazon.zip` and `unzip /data/YelpChi.zip` to unzip the datasets; 

2. Run `python data_process.py` to generate adjacency lists used by CARE-GNN;

3. Run `python train.py` to run CARE-GNN with default settings.

For other dataset and parameter settings, please refer to the arg parser in `train.py`. Our model supports both CPU and GPU mode.

## Running on your datasets

To run CARE-GNN on your datasets, you need to prepare the following data:

- Multiple-single relation graphs with the same nodes where each graph is stored in `scipy.sparse` matrix format, you can use `sparse_to_adjlist()` in `utils.py` to transfer the sparse matrix into adjacency lists used by CARE-GNN;

- A numpy array with node labels. Currently, CARE-GNN only supports binary classification;

- A node feature matrix stored in `scipy.sparse` matrix format. 

### Repo Structure

The repository is organized as follows:

- `data/`: dataset files;

- `data_process.py`: transfer sparse matrix to adjacency lists;

- `graphsage.py`: model code for vanilla [GraphSAGE](https://github.com/williamleif/graphsage-simple/) model;

- `layers.py`: CARE-GNN layers implementations;

- `model.py`: CARE-GNN model implementations;

- `train.py`: training and testing all models;

- `utils.py`: utility functions for data i/o and model evaluation.

## Citation

If you use our code, please cite the paper below:

```bibtex

@inproceedings{dou2020enhancing,

  title={Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters},

  author={Dou, Yingtong and Liu, Zhiwei and Sun, Li and Deng, Yutong and Peng, Hao and Yu, Philip S},

  booktitle={Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM'20)},

  year={2020}

}

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/YingtongDou/CARE-GNN

Awesome Lists containing this project

README