https://github.com/narayan954/phishing-url-detection
Predict the safety of your URL from Phishing attacks
flask machine-learning python
- Host: GitHub
- URL: https://github.com/narayan954/phishing-url-detection
- Owner: narayan954
- Created: 2023-12-11T13:42:36.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-12-12T06:05:52.000Z (over 2 years ago)
- Last Synced: 2025-01-18T10:17:28.562Z (over 1 year ago)
- Topics: flask, machine-learning, python
- Language: Jupyter Notebook
- Homepage: https://phishing-url-detection-ugxg.onrender.com/
- Size: 2.43 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Phishing URL Detection


## Table of Contents
- [Introduction](#introduction)
- [Installation](#installation)
- [Run the app](#run-the-app)
- [Directory Tree](#directory-tree)
- [Technologies Used](#technologies-used)
- [Result](#result)
- [Conclusion](#conclusion)
## Introduction
The Internet has become an indispensable part of our lives. However, it has also provided opportunities to perform malicious activities, such as phishing, anonymously. Phishers try to deceive their victims through social engineering or by creating mock-up websites to steal information such as account IDs, usernames, and passwords from individuals and organizations. Although many methods have been proposed to detect phishing websites, phishers have evolved their techniques to evade them. One of the most successful approaches for detecting these malicious activities is machine learning, because most phishing attacks share common characteristics that machine learning methods can identify. To see the project, click [here](app.py).
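To make the idea of "common characteristics" concrete, here is a minimal sketch of turning a URL into a few lexical features. The function name `extract_url_features` and the feature set are hypothetical and chosen only for illustration; they are not taken from `feature.py`, which may compute different features.

```python
import re
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Return a few simple lexical features of a URL (hypothetical feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        # Scheme check only; HTTPS by itself does not prove a site is safe.
        "uses_https": parsed.scheme == "https",
        # Very long URLs are sometimes used to hide the real destination.
        "url_length": len(url),
        # An "@" anywhere in the URL can be used to disguise the real host.
        "has_at_symbol": "@" in url,
        # A raw IPv4 address in place of a domain name.
        "host_is_ipv4": re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host) is not None,
        # Number of dots in the host, a rough proxy for subdomain depth.
        "host_dot_count": host.count("."),
    }


suspicious = extract_url_features("http://192.168.0.1/login@secure")
ordinary = extract_url_features("https://github.com/narayan954/phishing-url-detection")
```

A classifier would receive such values as a numeric vector; which features actually matter has to be learned from labelled data.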
## Installation
The code is written in Python 3.6.10. If you don't have Python installed, you can find it [here](https://www.python.org/downloads/). If you are using an older version of Python, upgrade it, and make sure you have the latest version of pip. To install the required packages and libraries, run the commands below in the project directory after [cloning](https://www.howtogeek.com/451360/how-to-clone-a-github-repository/) the repository:
### First create and activate the virtual environment
```sh
python -m venv env
source env/bin/activate # Linux
.\env\Scripts\activate # Windows
```
### Install the required modules
```sh
pip install -r requirements.txt
```
## Run the app
```sh
waitress-serve --listen=127.0.0.1:5000 app:app
hupper -m waitress --listen=127.0.0.1:5000 app:app # With hot reloading
```
## Directory Tree
```sh
├── pickle
│ ├── model.pkl
├── static
│ ├── styles.css
├── templates
│ ├── index.html
├── Phishing URL Detection.ipynb
├── Procfile
├── README.md
├── app.py
├── feature.py
├── phishing.csv
├── requirements.txt
```
## Technologies Used

- [NumPy](https://numpy.org/doc/)
- [pandas](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html)
- [Matplotlib](https://matplotlib.org/)
- [scikit-learn](https://scikit-learn.org/stable/)
- [Flask](https://flask.palletsprojects.com/en/2.0.x/)
## Result
Accuracy of the various models used for URL detection:
| | ML Model | Accuracy | f1_score | Recall | Precision |
| --- | ---------------------------- | -------- | -------- | ------ | --------- |
| 0 | Gradient Boosting Classifier | 0.974 | 0.977 | 0.994 | 0.986 |
| 1 | CatBoost Classifier | 0.972 | 0.975 | 0.994 | 0.989 |
| 2 | XGBoost Classifier | 0.969 | 0.972 | 0.995 | 0.988 |
| 3 | Multi-layer Perceptron | 0.969 | 0.973 | 0.995 | 0.981 |
| 4 | Random Forest | 0.967 | 0.971 | 0.993 | 0.990 |
| 5 | Support Vector Machine | 0.964 | 0.968 | 0.980 | 0.965 |
| 6 | Decision Tree | 0.960 | 0.964 | 0.991 | 0.993 |
| 7 | K-Nearest Neighbors | 0.956 | 0.961 | 0.991 | 0.989 |
| 8 | Logistic Regression | 0.934 | 0.941 | 0.943 | 0.927 |
| 9 | Naive Bayes Classifier | 0.605 | 0.454 | 0.292 | 0.997 |
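
As a reminder of how the four columns relate, the sketch below computes them from confusion-matrix counts. The counts are hypothetical and are not taken from the notebook; they were picked to mimic the Naive Bayes pattern of very high precision with low recall.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1_score": f1,
    }


# Hypothetical counts: the model almost never raises a false alarm (fp=0)
# but misses most positives (fn=70), so precision is high and recall is low.
metrics = classification_metrics(tp=30, fp=0, fn=70, tn=100)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because F1 is a harmonic mean, a low recall pulls it down even when precision is near 1, which is why a model can look strong on precision alone and still rank last overall.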
Feature importance for Phishing URL Detection

## Conclusion
1. The main takeaway from this project is the exploration of various machine learning models, along with Exploratory Data Analysis of the phishing dataset to understand its features.
2. Creating this notebook helped me learn a lot about the features that affect a model's ability to detect whether a URL is safe. I also learned how to tune models and how tuning affects their performance.
3. The final conclusion on the phishing dataset is that some features, such as "HTTPS", "AnchorURL", and "WebsiteTraffic", are more important than others for classifying whether a URL is a phishing URL.
4. The Gradient Boosting Classifier correctly classifies URLs into their respective classes with up to 97.4% accuracy, and hence reduces the chance of malicious attachments.
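
The points above about feature importance and the Gradient Boosting Classifier can be illustrated with a short scikit-learn sketch. It is not the notebook's code: it trains on synthetic data from `make_classification` as a stand-in for `phishing.csv`, and the feature names are hypothetical placeholders, so the numbers it prints say nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the phishing dataset (assumption: the real features
# would be loaded from phishing.csv, whose columns are not shown here).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

# Accuracy on the held-out split.
test_accuracy = model.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")

# Rank features by the importance the fitted model assigns to them.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

On the real data, the same ranking step is what would surface features such as "HTTPS" or "AnchorURL" near the top of the list.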