https://github.com/vimal007vimal/malicious-url-detection
Our project employs machine learning to pinpoint phishing URLs with 97.4% accuracy, leveraging HTTPS and website traffic as critical indicators. Insights into features like AnchorURL enhance cybersecurity strategies, showcasing the power of AI in combating online threats.
cybersecurity https malicious-url-detection phishing python python3 xgboost-algorithm xgboost-classifier
- Host: GitHub
- URL: https://github.com/vimal007vimal/malicious-url-detection
- Owner: Vimal007Vimal
- Created: 2024-05-11T12:03:15.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-07-09T09:45:38.000Z (over 1 year ago)
- Last Synced: 2024-07-09T11:45:40.285Z (over 1 year ago)
- Topics: cybersecurity, https, malicious-url-detection, phishing, python, python3, xgboost-algorithm, xgboost-classifier
- Language: Python
- Homepage:
- Size: 387 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Malicious URL Detection


## Installation
The code is written in Python 3.9. If you don't have Python installed, you can find it [here](https://www.python.org/downloads/). If you are using an older version of Python, upgrade it first, and make sure you have the latest version of pip. To install the required packages and libraries, run this command in the project directory after [cloning](https://www.howtogeek.com/451360/how-to-clone-a-github-repository/) the repository:
```bash
pip install -r requirements.txt
```
## Directory Tree
```
├── static
│ ├── styles.css
├── templates
│ ├── index.html
├── README.md
├── app.py
├── feature.py
├── phishing.csv
├── requirements.txt
```
## Technologies Used
- [NumPy](https://numpy.org/doc/)
- [pandas](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html)
- [Matplotlib](https://matplotlib.org/)
- [scikit-learn](https://scikit-learn.org/stable/)
- [Flask](https://flask.palletsprojects.com/en/2.0.x/)
## Result
Accuracy of the various models used for URL detection:

| | ML Model | Accuracy | f1_score | Recall | Precision |
|---|---|---|---|---|---|
| 0 | Gradient Boosting Classifier | 0.974 | 0.977 | 0.994 | 0.986 |
| 1 | CatBoost Classifier | 0.972 | 0.975 | 0.994 | 0.989 |
| 2 | XGBoost Classifier | 0.969 | 0.973 | 0.993 | 0.984 |
| 3 | Multi-layer Perceptron | 0.969 | 0.973 | 0.995 | 0.981 |
| 4 | Random Forest | 0.967 | 0.971 | 0.993 | 0.990 |
| 5 | Support Vector Machine | 0.964 | 0.968 | 0.980 | 0.965 |
| 6 | Decision Tree | 0.960 | 0.964 | 0.991 | 0.993 |
| 7 | K-Nearest Neighbors | 0.956 | 0.961 | 0.991 | 0.989 |
| 8 | Logistic Regression | 0.934 | 0.941 | 0.943 | 0.927 |
| 9 | Naive Bayes Classifier | 0.605 | 0.454 | 0.292 | 0.997 |
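
For readers unfamiliar with the four columns in the table above, here is a minimal sketch of how accuracy, precision, recall and F1 are derived from a model's predictions. It uses only the standard library. The function name `classification_metrics`, the toy labels, and the choice of `1` as the positive class are assumptions for illustration, not taken from the project code.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels.

    `positive` marks which label counts as the positive class
    (an assumption here; the project may encode classes differently).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1_score": f1,
    }


# Toy labels, invented for illustration only
y_true = [1, 1, 1, -1, -1, 1, -1, 1]
y_pred = [1, 1, -1, -1, 1, 1, -1, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In practice the same numbers are usually obtained from scikit-learn's `accuracy_score`, `precision_score`, `recall_score` and `f1_score` in `sklearn.metrics`.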
Feature importance for Malicious URL Detection

The Gradient Boosting Classifier correctly classifies URLs into their respective classes with up to 97.4% accuracy, and hence reduces the chance of malicious attachments.

The final conclusion on the malicious URL dataset is that some features, such as "HTTPS", "AnchorURL" and "WebsiteTraffic", are more important than others for classifying whether a URL is malicious.

The final takeaway from this project is exploring various machine learning models, performing Exploratory Data Analysis on the malicious URL dataset, and understanding its features.
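
To make the idea of a URL feature concrete, here is a rough sketch of how a simplified "HTTPS" indicator could be derived from a URL string. This is not the project's `feature.py`. The function name `https_feature` and the `1` / `-1` encoding are assumptions for illustration, and the check only looks at the URL scheme rather than anything like certificate details.

```python
from urllib.parse import urlparse


def https_feature(url):
    """Return 1 if the URL uses the https scheme, otherwise -1.

    The 1 / -1 encoding is an assumed convention for this sketch;
    the real dataset may define the feature differently.
    """
    scheme = urlparse(url).scheme.lower()
    return 1 if scheme == "https" else -1


# Hypothetical example URLs
print(https_feature("https://example.com/login"))
print(https_feature("http://example.com/login"))
```

A classifier would receive this value alongside the other features (for example "AnchorURL" and "WebsiteTraffic") as one column of its input row.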