https://github.com/tristanbilot/phishgnn

Phishing detection using GNNs (SECRYPT'22)
gnn graphs phishing-detection representation-learning

Host: GitHub
URL: https://github.com/tristanbilot/phishgnn
Owner: TristanBilot
License: mit
Created: 2022-03-23T10:54:27.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2025-06-06T07:46:06.000Z (about 1 year ago)
Last Synced: 2025-09-09T14:12:14.249Z (10 months ago)
Topics: gnn, graphs, phishing-detection, representation-learning
Language: Python
Homepage:
Size: 33.6 MB
Stars: 14
Watchers: 3
Forks: 6
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

# PhishGNN

Code for the paper: [PhishGNN: A Phishing Website Detection Framework using Graph Neural Networks](https://hal.science/hal-04401167v1/file/PhishGNN_A_Phishing_Website_Detection_Framework_using_Graph_Neural_Networks.pdf).

## Installation

### Clone the repo

```
git clone https://github.com/TristanBilot/phishGNN.git
cd phishGNN
```

### Install dependencies

```shell
python3 -m venv venv
. venv/bin/activate
pip install wheel
pip install -r requirements.txt
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.12.0+cpu.html # for cpu
```
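
The `-f` URL in the last command pins the wheels to PyTorch 1.12.0 on CPU. As a hedged sketch, assuming the PyG wheel index follows the `torch-<version>+<platform>.html` pattern seen in that command, a small helper can build the URL for another setup. The function name `pyg_wheel_index` is hypothetical and not part of this repository.

```python
def pyg_wheel_index(torch_version: str, platform: str = "cpu") -> str:
    """Build a PyG wheel index URL to pass to `pip install ... -f <url>`.

    Assumes the URL pattern used in the install command above; `platform`
    is "cpu" or a CUDA tag such as "cu116".
    """
    # Drop any local build suffix (e.g. "+cpu") that torch.__version__ may carry.
    base_version = torch_version.split("+")[0]
    return f"https://data.pyg.org/whl/torch-{base_version}+{platform}.html"


print(pyg_wheel_index("1.12.0"))
```

Check that the resulting page exists for your PyTorch version before installing, since not every version/platform pair is published.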

### Unzip the dataset

```shell
./install_dataset.sh
```

## Dataset & crawler

The dataset can be downloaded in PyG format, and new features can be extracted from URLs using the crawler.
A full guide covering both tasks is available in the repository.

## Training

During training, the files in `data/training/processed` are used by default. The raw dataset consists of URLs, each mapped to around 30 features, including a list of references (`href`, `form`, `iframe`) to other pages. Those pages in turn have their own features and their own reference lists.

```
python phishGNN/training.py
```
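
To make the raw structure described above more concrete, here is a minimal sketch that turns URL records into node ids and an edge list in the two-row (source, target) layout that PyG uses for `edge_index`. The record fields (`url`, `refs`, `is_phishing`) and the helper `build_graph` are illustrative assumptions, not the repository's actual schema or code.

```python
from typing import Dict, List, Tuple

# Hypothetical raw records: each URL carries a label plus its outgoing references.
records = [
    {"url": "http://a.example", "is_phishing": 1,
     "refs": ["http://b.example", "http://c.example"]},
    {"url": "http://b.example", "is_phishing": 0, "refs": ["http://c.example"]},
    {"url": "http://c.example", "is_phishing": 0, "refs": []},
]


def build_graph(rows: List[dict]) -> Tuple[Dict[str, int], List[List[int]]]:
    """Map URLs to node ids and references to directed edges."""
    node_id: Dict[str, int] = {}

    def get_id(url: str) -> int:
        # Ids follow order of first appearance, including pages only seen as targets.
        return node_id.setdefault(url, len(node_id))

    sources: List[int] = []
    targets: List[int] = []
    for row in rows:
        src = get_id(row["url"])
        for ref in row["refs"]:
            sources.append(src)
            targets.append(get_id(ref))
    return node_id, [sources, targets]


node_id, edge_index = build_graph(records)
print(node_id)     # URL -> node id
print(edge_index)  # [source ids, target ids]
```

In a real pipeline the per-URL features would be stacked into a node feature matrix in the same node order.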

## Visualize node embeddings

During training, the node embeddings produced by the graph convolutional layers can be plotted. Run the training script with the following option:

```
python phishGNN/training.py --plot-embeddings
```
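
As a rough illustration of what an embedding "after the graph convolutional layers" is, the sketch below applies one round of mean neighbourhood aggregation in plain Python. It is a simplified stand-in with no learned weights, not the model used in `phishGNN/training.py`; the toy features and the `mean_aggregate` helper are assumptions.

```python
from typing import List


def mean_aggregate(features: List[List[float]],
                   edge_index: List[List[int]]) -> List[List[float]]:
    """Average each node's features with those of the nodes it links to."""
    n, dim = len(features), len(features[0])
    # Start from a self-loop so nodes without outgoing links keep their own features.
    sums = [list(f) for f in features]
    counts = [1] * n
    for src, dst in zip(*edge_index):
        for k in range(dim):
            sums[src][k] += features[dst][k]
        counts[src] += 1
    return [[s / counts[i] for s in sums[i]] for i in range(n)]


features = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
edge_index = [[0, 0, 1], [1, 2, 2]]  # node 0 links to 1 and 2, node 1 links to 2
embeddings = mean_aggregate(features, edge_index)
print(embeddings)
```

A trained graph convolution would additionally multiply by a weight matrix and apply a non-linearity; the resulting vectors are what a 2D projection plot would display.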

## Visualize the graphs

A tool is provided to visualize the internal structure of the web pages in the dataset, along with their characteristics, such as the number of nodes and edges and whether the page is phishing or benign.

To view this data, first follow the installation instructions above, then run the `visualization` script and open the file `visualization/visualization.html`.

```bash
python visualization.py
```
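
The characteristics shown by the tool (node count, edge count, label) can be computed along these lines. This is a hedged sketch over a hypothetical in-memory graph, not the code in `visualization.py`; the `summarize` helper and its input layout are assumptions.

```python
import json
from typing import List


def summarize(edge_index: List[List[int]], is_phishing: bool) -> dict:
    """Return node/edge counts and the label for one page graph."""
    sources, targets = edge_index
    # Only nodes that appear in at least one edge are counted here.
    nodes = set(sources) | set(targets)
    return {
        "nodes": len(nodes),
        "edges": len(sources),
        "label": "phishing" if is_phishing else "benign",
    }


summary = summarize([[0, 0, 1], [1, 2, 2]], is_phishing=True)
print(json.dumps(summary))
```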

## Citation

If you use this code, please cite the following paper.

```
@inproceedings{bilot2022phishgnn,
title={{PhishGNN}: A Phishing Website Detection Framework using Graph Neural Networks},
author={Bilot, Tristan and Geis, Gr{\'e}goire and Hammi, Badis},
booktitle={19th International Conference on Security and Cryptography},
pages={428--435},
year={2022},
organization={SCITEPRESS-Science and Technology Publications}
}
```

## License

MIT
