{"id":19585516,"url":"https://github.com/safe-graph/dgfraud","last_synced_at":"2025-04-04T17:08:00.159Z","repository":{"id":39533312,"uuid":"223415751","full_name":"safe-graph/DGFraud","owner":"safe-graph","description":"A Deep Graph-based Toolbox for Fraud Detection","archived":false,"fork":false,"pushed_at":"2022-04-20T21:39:08.000Z","size":84333,"stargazers_count":718,"open_issues_count":1,"forks_count":160,"subscribers_count":14,"default_branch":"master","last_synced_at":"2025-03-28T16:08:08.958Z","etag":null,"topics":["anomaly-detection","datamining","datascience","dblp-dataset","financial-engineering","fraud-detection","fraud-prevention","graph","graph-algorithms","graph-convolutional-networks","graph-neural-networks","graphneuralnetwork","machine-learning","opensource","outlier-detection","security","security-tools","spamdetection","toolkit","yelp-dataset"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/safe-graph.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-11-22T14:02:36.000Z","updated_at":"2025-03-26T10:12:47.000Z","dependencies_parsed_at":"2022-08-30T04:00:45.442Z","dependency_job_id":null,"html_url":"https://github.com/safe-graph/DGFraud","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/safe-graph%2FDGFraud","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/safe-graph%2FDGFraud/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/safe-graph%2FDGFraud/releases","mani
fests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/safe-graph%2FDGFraud/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/safe-graph","download_url":"https://codeload.github.com/safe-graph/DGFraud/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247217183,"owners_count":20903009,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anomaly-detection","datamining","datascience","dblp-dataset","financial-engineering","fraud-detection","fraud-prevention","graph","graph-algorithms","graph-convolutional-networks","graph-neural-networks","graphneuralnetwork","machine-learning","opensource","outlier-detection","security","security-tools","spamdetection","toolkit","yelp-dataset"],"created_at":"2024-11-11T07:54:58.635Z","updated_at":"2025-04-04T17:08:00.141Z","avatar_url":"https://github.com/safe-graph.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n    \u003cbr\u003e\n    \u003ca href=\"https://image.flaticon.com/icons/svg/1671/1671517.svg\"\u003e\n        \u003cimg src=\"https://github.com/safe-graph/DGFraud/blob/master/DGFraud_logo.png\" width=\"400\"/\u003e\n    \u003c/a\u003e\n    \u003cbr\u003e\n\u003cp\u003e\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://travis-ci.org/github/safe-graph/DGFraud\"\u003e\n        \u003cimg alt=\"PRs Welcome\" src=\"https://travis-ci.org/safe-graph/DGFraud.svg?branch=master\"\u003e\n    \u003c/a\u003e\n    \u003ca 
href=\"https://github.com/safe-graph/DGFraud/blob/master/LICENSE\"\u003e\n        \u003cimg alt=\"GitHub\" src=\"https://img.shields.io/github/license/safe-graph/DGFraud\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/safe-graph/DGFraud/pulls\"\u003e\n        \u003cimg alt=\"GitHub release\" src=\"https://img.shields.io/github/v/release/safe-graph/DGFraud?include_prereleases\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/safe-graph/DGFraud/archive/master.zip\"\u003e\n        \u003cimg alt=\"PRs\" src=\"https://img.shields.io/badge/PRs-welcome-brightgreen.svg\"\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\n\u003ch3 align=\"center\"\u003e\n\u003cp\u003eA Deep Graph-based Toolbox for Fraud Detection\u003c/p\u003e\n\u003c/h3\u003e\n\n**Introduction** \n\n**May 2021 Update:** DGFraud has been upgraded to TensorFlow 2.0! Please check out [DGFraud-TF2](https://github.com/safe-graph/DGFraud-TF2).\n\n**DGFraud** is a Graph Neural Network (GNN) based toolbox for fraud detection. It integrates implementations and comparisons of state-of-the-art GNN-based fraud detection models. An introduction to the implemented models can be found [here](#implemented-models). \u003c!-- (Add introduction blogs links). --\u003e\n\nWe welcome contributions that add new fraud detectors or extend the features of the toolbox. Some of the planned features are listed in the [TODO list](#todo-list). 
\n\nIf you use the toolbox in your project, please cite one of the two papers below and the [algorithms](#implemented-models) you used:\n\nCIKM'20 ([PDF](https://arxiv.org/pdf/2008.08692.pdf))\n```bibtex\n@inproceedings{dou2020enhancing,\n  title={Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters},\n  author={Dou, Yingtong and Liu, Zhiwei and Sun, Li and Deng, Yutong and Peng, Hao and Yu, Philip S},\n  booktitle={Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM'20)},\n  year={2020}\n}\n```\nSIGIR'20 ([PDF](https://arxiv.org/pdf/2005.00625.pdf))\n```bibtex\n@inproceedings{liu2020alleviating,\n  title={Alleviating the Inconsistency Problem of Applying Graph Neural Network to Fraud Detection},\n  author={Liu, Zhiwei and Dou, Yingtong and Yu, Philip S. and Deng, Yutong and Peng, Hao},\n  booktitle={Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval},\n  year={2020}\n}\n```\n\n**Useful Resources**\n- [PyGOD: A Python Library for Graph Outlier Detection (Anomaly Detection)](https://github.com/pygod-team/pygod)\n- [UGFraud: An Unsupervised Graph-based Toolbox for Fraud Detection](https://github.com/safe-graph/UGFraud)\n- [Graph-based Fraud Detection Paper List](https://github.com/safe-graph/graph-fraud-detection-papers) \n- [Awesome Fraud Detection Papers](https://github.com/benedekrozemberczki/awesome-fraud-detection-papers)\n- [Attack and Defense Papers on Graph Data](https://github.com/safe-graph/graph-adversarial-learning-literature)\n- [PyOD: A Python Toolbox for Scalable Outlier Detection (Anomaly Detection)](https://github.com/yzhao062/pyod)\n- [PyODDS: An End-to-end Outlier Detection System](https://github.com/datamllab/pyodds)\n- [DGL: Deep Graph Library](https://github.com/dmlc/dgl)\n- [Outlier Detection DataSets (ODDS)](http://odds.cs.stonybrook.edu/)\n\n**Table of Contents**\n- [Installation](#installation)\n- 
[Datasets](#datasets)\n- [User Guide](#user-guide)\n- [Implemented Models](#implemented-models)\n- [Model Comparison](#model-comparison)\n- [TODO List](#todo-list)\n- [How to Contribute](#how-to-contribute)\n\n\n## Installation\n```bash\ngit clone https://github.com/safe-graph/DGFraud.git\ncd DGFraud\npython setup.py install\n```\n### Requirements\n```bash\n* python 3.6, 3.7\n* tensorflow\u003e=1.14.0,\u003c2.0\n* numpy\u003e=1.16.4\n* scipy\u003e=1.2.0\n* networkx\u003c=1.11\n```\n## Datasets\n\n### DBLP\nWe use the pre-processed DBLP dataset from [Jhy1993/HAN](https://github.com/Jhy1993/HAN).\nYou can run FdGars, Player2Vec, GeniePath and GEM on the DBLP dataset.\nUnzip the archive before using the dataset:\n```bash\ncd dataset\nunzip DBLP4057_GAT_with_idx_tra200_val_800.zip\n```\n\n### Example dataset\nWe implement example graphs for SemiGNN, GAS and GEM in `data_loader.py`, because those models require unique graph structures or node types that cannot be found in open-source datasets.\n\n\n### Yelp dataset\nFor [GraphConsis](https://arxiv.org/abs/2005.00625), we preprocessed the [Yelp Spam Review Dataset](http://odds.cs.stonybrook.edu/yelpchi-dataset/) with reviews as nodes and three relations as edges.\n\nThe dataset, in `.mat` format, is located at `/dataset/YelpChi.zip`. The `.mat` file includes:\n- `net_rur, net_rtr, net_rsr`: three sparse matrices representing the three homogeneous graphs defined in the [GraphConsis](https://arxiv.org/abs/2005.00625) paper;\n- `features`: a sparse matrix of 32-dimensional handcrafted features;\n- `label`: a numpy array with the ground truth of nodes. 
`1` represents spam and `0` represents benign.\n\nThe YelpChi data preprocessing details can be found in our [CIKM'20](https://arxiv.org/pdf/2008.08692.pdf) paper.\nTo get the complete metadata of the Yelp dataset, please email [ytongdou@gmail.com](mailto:ytongdou@gmail.com).\n\n\n## User Guide\n\n### Running the example code\nYou can find the implemented models in the `algorithms` directory. For example, you can run Player2Vec using:\n```bash\npython Player2Vec_main.py \n```\nYou can specify parameters for models when running the code.\n\n### Running on your datasets\nHave a look at the `load_data_dblp()` function in `utils/utils.py` for an example.\n\nTo use your own data, you have to provide:\n* adjacency matrices or adjacency lists (for GAS);\n* a feature matrix;\n* a label matrix.\n\nThen split the feature matrix and label matrix into training data and testing data.\n\nYou can specify a dataset as follows:\n```bash\npython xx_main.py --dataset your_dataset \n```\nor by editing `xx_main.py`.\n\n### The structure of code\nThe repository is organized as follows:\n- `algorithms/` contains the implemented models and the corresponding example code;\n- `base_models/` contains the basic models (GCN);\n- `dataset/` contains the necessary dataset files;\n- `utils/` contains:\n    * code for loading and splitting the data (`data_loader.py`);\n    * various utilities (`utils.py`).\n\n\n## Implemented Models\n\n| Model  | Paper  | Venue  | Reference  |\n|-------|--------|--------|--------|\n| **SemiGNN** | [A Semi-supervised Graph Attentive Network for Financial Fraud Detection](https://arxiv.org/pdf/2003.01171)  | ICDM 2019  | [BibTex](https://github.com/safe-graph/DGFraud/blob/master/reference/semignn.txt) |\n| **Player2Vec** | [Key Player Identification in Underground Forums over Attributed Heterogeneous Information Network Embedding Framework](http://mason.gmu.edu/~lzhao9/materials/papers/lp0110-zhangA.pdf)  | CIKM 2019  | 
[BibTex](https://github.com/safe-graph/DGFraud/blob/master/reference/player2vec.txt)|\n| **GAS** | [Spam Review Detection with Graph Convolutional Networks](https://arxiv.org/abs/1908.10679)  | CIKM 2019 | [BibTex](https://github.com/safe-graph/DGFraud/blob/master/reference/gas.txt) |\n| **FdGars** | [FdGars: Fraudster Detection via Graph Convolutional Networks in Online App Review System](https://dl.acm.org/citation.cfm?id=3316586)  | WWW 2019 | [BibTex](https://github.com/safe-graph/DGFraud/blob/master/reference/fdgars.txt) |\n| **GeniePath** | [GeniePath: Graph Neural Networks with Adaptive Receptive Paths](https://arxiv.org/abs/1802.00910)  | AAAI 2019 | [BibTex](https://github.com/safe-graph/DGFraud/blob/master/reference/geniepath.txt)  |\n| **GEM** | [Heterogeneous Graph Neural Networks for Malicious Account Detection](https://arxiv.org/pdf/2002.12307.pdf)  | CIKM 2018 |[BibTex](https://github.com/safe-graph/DGFraud/blob/master/reference/gem.txt) |\n| **GraphSAGE** | [Inductive Representation Learning on Large Graphs](https://arxiv.org/pdf/1706.02216.pdf)  | NIPS 2017  | [BibTex](https://github.com/safe-graph/DGFraud/blob/master/reference/graphsage.txt) |\n| **GraphConsis** | [Alleviating the Inconsistency Problem of Applying Graph Neural Network to Fraud Detection](https://arxiv.org/pdf/2005.00625.pdf)  | SIGIR 2020  | [BibTex](https://github.com/safe-graph/DGFraud/blob/master/reference/graphconsis.txt) |\n| **HACUD** | [Cash-Out User Detection Based on Attributed Heterogeneous Information Network with a Hierarchical Attention Mechanism](https://aaai.org/ojs/index.php/AAAI/article/view/3884)  | AAAI 2019 |  [BibTex](https://github.com/safe-graph/DGFraud/blob/master/reference/hacud.txt) |\n\n\n## Model Comparison\n| Model  | Application  | Graph Type  | Base Model  |\n|-------|--------|--------|--------|\n| **SemiGNN** | Financial Fraud  | Heterogeneous   | GAT, LINE, DeepWalk |\n| **Player2Vec** | Cyber Criminal  | Heterogeneous | GAT, GCN|\n| **GAS** | 
Opinion Fraud  | Heterogeneous | GCN, GAT |\n| **FdGars** |  Opinion Fraud | Homogeneous | GCN |\n| **GeniePath** | Financial Fraud | Homogeneous | GAT  |\n| **GEM** | Financial Fraud  | Heterogeneous |GCN |\n| **GraphSAGE** | Opinion Fraud  | Homogeneous   | GraphSAGE |\n| **GraphConsis** | Opinion Fraud  | Heterogeneous   | GraphSAGE |\n| **HACUD** | Financial Fraud | Heterogeneous | GAT |\n\n\n## TODO List\n- Implement mini-batch training\n- Add the log loss for the GEM model\n- Add time-based sampling for GEM\n- Add sampling methods\n- Benchmark SOTA models\n- Scalable implementation\n- PyTorch implementation\n\n## How to Contribute\nYou are welcome to contribute to this open-source toolbox. Detailed instructions will be released soon. Currently, you can create issues or email [bdscsafegraph@gmail.com](mailto:bdscsafegraph@gmail.com) with inquiries.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsafe-graph%2Fdgfraud","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsafe-graph%2Fdgfraud","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsafe-graph%2Fdgfraud/lists"}