https://github.com/tharunkumarreddypolu/phishdetectpro
Detect phishing Attacks with the power of Artificial Intelligence and Deep Learning
aritificalintelligence deeplearning fastapi heroku neural-network python
- Host: GitHub
- URL: https://github.com/tharunkumarreddypolu/phishdetectpro
- Owner: TharunKumarReddyPolu
- Created: 2023-11-25T08:04:36.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-12-15T04:52:49.000Z (over 2 years ago)
- Last Synced: 2025-04-04T08:45:33.655Z (about 1 year ago)
- Topics: aritificalintelligence, deeplearning, fastapi, heroku, neural-network, python
- Language: Jupyter Notebook
- Homepage:
- Size: 19.2 MB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: readme.md
README
# Phish Detect Pro
## Objective
This project investigates the detection of phishing websites using deep learning techniques. Phishing websites are a significant cybersecurity threat, and detecting them accurately is crucial to protect users from online fraud and identity theft. Phishing attacks continue to evolve, which makes them hard to detect with traditional methods. Deep learning offers a promising approach because it can leverage patterns and features in website data to identify fraudulent websites more effectively. The project is interesting because it combines cybersecurity, machine learning, and real-world applicability to enhance online security.
## Data Collection
To train our deep learning model, we need a collection of legitimate and phishing URLs.
Phishing URLs Data Collection: We use a popular open-source site called PhishTank, which provides a large, periodically updated collection of phishing URLs in multiple formats such as CSV, XML, JSON, and PHP. Download the data file using the link: https://www.phishtank.com/developer_info.php
Legitimate URLs Data Collection: We use another popular site, Kaggle, from which we take a balanced dataset with 50% phishing and 50% legitimate URLs. Download the data file using the link: https://www.kaggle.com/datasets/shashwatwork/web-page-phishing-detection-dataset
We also used the dataset from the reference paper P. Mowar and M. Jain, "Fishing out the Phishing Websites," 2021 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 2021, pp. 1-6, doi: 10.1109/CyberSA52016.2021.9478237. Download the data file using the link: https://zenodo.org/records/5807622#.Ycsbzy0RpQJ
## Feature Engineering
We extracted a set of domain-based features, address bar features, and HTML & JavaScript based features for the URLs in the datasets. This data is further split for training and testing.
Based on the document 'Phishing Website Features.docx' in this repository, the value of each feature was converted to -1 for a legitimate site and 1 for a phishing site. The feature extraction code is in the 'Feature_Extraction.py' file of this repository.
The resulting dataset is available in 'url_features.csv' of this repository.
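The -1/1 encoding described above can be illustrated with a minimal sketch. The feature names, the length threshold, and the shortener list below are hypothetical choices made for illustration only; the rules actually used by this project are the ones in 'Feature_Extraction.py' and 'Phishing Website Features.docx'.

```python
import re
from urllib.parse import urlparse

# Hypothetical threshold and shortener list, chosen for illustration only.
MAX_LEGIT_URL_LENGTH = 54
SHORTENERS = {"bit.ly", "tinyurl.com", "goo.gl", "t.co"}

# Matches a host that is written as a dotted IPv4 address.
IPV4_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def encode(is_suspicious: bool) -> int:
    """Map a boolean check to the convention above: 1 = phishing, -1 = legitimate."""
    return 1 if is_suspicious else -1


def extract_address_bar_features(url: str) -> dict:
    """Return a few address bar features for one URL, each encoded as -1 or 1."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "has_ip_address": encode(bool(IPV4_PATTERN.match(host))),
        "long_url": encode(len(url) > MAX_LEGIT_URL_LENGTH),
        "has_at_symbol": encode("@" in url),
        "uses_shortener": encode(host in SHORTENERS),
        "hyphen_in_domain": encode("-" in host),
    }


if __name__ == "__main__":
    print(extract_address_bar_features("http://192.168.0.1/login@secure-update"))
```

Each URL becomes one row of -1/1 values, which is the shape a classifier expects. Domain-based and HTML features would need network lookups or page downloads and are left out of this sketch.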
To understand the relationships and correlations in the data, we created visualisations using the sklearn and matplotlib libraries in Python. These visualisations are available in the 'Phishing Website Detection Deep Learning.ipynb' file of this repository.
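As a rough sketch of the kind of correlation analysis described here, the snippet below computes a correlation matrix over a tiny made-up feature table with numpy. The feature names and values are hypothetical and do not come from 'url_features.csv'; the notebook itself may compute and plot correlations differently.

```python
import numpy as np

# Hypothetical feature matrix: rows are URLs, columns are features encoded as -1 or 1.
feature_names = ["has_ip_address", "long_url", "has_at_symbol"]
X = np.array([
    [1, 1, 1],
    [1, 1, -1],
    [-1, -1, -1],
    [-1, 1, -1],
    [-1, -1, 1],
    [1, -1, 1],
])
# Hypothetical labels: 1 = phishing, -1 = legitimate.
y = np.array([1, 1, -1, -1, -1, 1])

# Correlation matrix over all features plus the label (columns are variables).
corr = np.corrcoef(np.column_stack([X, y]), rowvar=False)

# The last column holds each feature's correlation with the label.
label_corr = corr[:-1, -1]
for name, value in sorted(zip(feature_names, label_corr), key=lambda p: -abs(p[1])):
    print(f"{name}: {value:+.2f}")
```

The resulting `corr` array can be passed to matplotlib's `imshow` to draw a heatmap. A feature that has the same value in every row has zero variance and yields `nan` here, so such columns are best dropped first.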
## Model Development
The machine learning and deep learning algorithms used for this analysis are:
- Support Vector Machines
- Multilayer Perceptrons
- Autoencoder Neural Network
- Convolutional Neural Network (CNN)
- Recurrent Neural Network (RNN)
These models were trained and tested on the feature-extracted dataset and evaluated to identify the best-performing model. Of the above methods, the Multilayer Perceptron (MLP) had good accuracy and a fast testing time compared to the other algorithms.
The entire model training and analysis code for this project is available in the 'Phishing Website Detection Deep Learning.ipynb' file of this repository.
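The comparison of accuracy and testing time can be sketched as follows. This is an illustration under assumptions, not the notebook's code: it uses scikit-learn's `SVC` and `MLPClassifier`, synthetic -1/1 data in place of 'url_features.csv', a made-up labelling rule, and arbitrary hyperparameters.

```python
import time

import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the real dataset: 10 features encoded as -1 or 1.
rng = np.random.default_rng(0)
X = rng.choice([-1, 1], size=(1000, 10))
# Hypothetical labelling rule: phishing (1) when most of the first three features fire.
y = np.where(X[:, :3].sum(axis=1) > 0, 1, -1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Arbitrary hyperparameters, chosen only to keep the sketch small.
models = {
    "SVM": SVC(kernel="rbf"),
    "MLP": MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Time only the prediction step, since testing time is one of the criteria.
    start = time.perf_counter()
    predictions = model.predict(X_test)
    elapsed = time.perf_counter() - start
    results[name] = (accuracy_score(y_test, predictions), elapsed)
    print(f"{name}: accuracy={results[name][0]:.3f}, test time={elapsed * 1000:.1f} ms")
```

Numbers from this synthetic data say nothing about real phishing URLs. The point is the shape of the comparison: the same split, the same metric, and a timed `predict` call for every candidate model.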
## Results
After tuning, we settled on a Multilayer Perceptron classifier as the final model, which achieved an accuracy of 99.2%. This final model has been saved using Python's pickle module and is available as 'phish_classifier.pkl' in this repository.
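Saving and reloading a classifier with pickle can be sketched as below. The sketch assumes a scikit-learn `MLPClassifier` trained on synthetic data and writes to a temporary directory; only the file name 'phish_classifier.pkl' is taken from this repository, and the model stored in the repository may have been built differently.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic -1/1 data with a trivial, made-up labelling rule.
rng = np.random.default_rng(0)
X = rng.choice([-1, 1], size=(200, 5))
y = np.where(X[:, 0] > 0, 1, -1)

model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500, random_state=0).fit(X, y)

# Write to a temporary directory so the sketch does not overwrite a real model file.
model_path = os.path.join(tempfile.mkdtemp(), "phish_classifier.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Reload the model and classify one feature vector (1 = phishing, -1 = legitimate).
with open(model_path, "rb") as f:
    loaded = pickle.load(f)

sample = X[:1]
print("prediction:", loaded.predict(sample)[0])
```

Pickle files can execute arbitrary code when loaded, so only unpickle files from a trusted source. A pickled scikit-learn model should also be loaded with a scikit-learn version compatible with the one that saved it.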
## Future Work
Looking ahead, there are exciting possibilities for the saved model. It could be further developed into a browser extension or integrated as a plugin with internet security providers. This enhancement would enable the model to proactively alert users, helping them steer clear of potentially harmful phishing sites by accurately identifying them.
## Required Installations
### Software:
Python 3 or later, Docker, Heroku CLI
### Python packages:
scikit-learn (imported as sklearn), numpy, pandas, pickle (included in the Python standard library), matplotlib, python-whois, tensorflow, BeautifulSoup4, googlesearch