https://github.com/otuemre/emailphishingdetection
A real-time phishing email detection system using Machine Learning (SVM, Logistic Regression, Naive Bayes) with FastAPI backend and custom domain deployment.
cybersecurity fastapi huggingface machine-learning nlp real-time scikit-learn spam-detection svm-classifier tfidf-vectorizer
- Host: GitHub
- URL: https://github.com/otuemre/emailphishingdetection
- Owner: otuemre
- License: MIT
- Created: 2025-05-26T23:51:38.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-06T22:26:54.000Z (12 months ago)
- Last Synced: 2025-08-13T18:43:03.145Z (10 months ago)
- Topics: cybersecurity, fastapi, huggingface, machine-learning, nlp, real-time, scikit-learn, spam-detection, svm-classifier, tfidf-vectorizer
- Language: Jupyter Notebook
- Homepage: https://phishingdetection.net/
- Size: 13.8 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# 📧 Email Phishing Detection
[License](./LICENSE.md) | [Live Demo](https://phishingdetection.net) | [Hugging Face](https://huggingface.co/emreotu)
Detect phishing emails in real time using machine learning models trained on six merged datasets and deployed via a full-stack FastAPI app.
🌐 **Live Demo**: [https://phishingdetection.net](https://phishingdetection.net)
## 📚 Table of Contents
- [What It Does](#-what-it-does)
- [Tech Stack](#️-tech-stack)
- [ML Models](#-ml-models)
- [Sample Input](#-sample-input)
- [Future Improvements](#-future-improvements)
- [Project Structure](#-project-structure)
- [Acknowledgements](#-acknowledgements)
- [Author](#-author)
---
## 🧠 What It Does
This project lets users paste real email content (sender, subject, body, etc.) and choose one of three machine learning models to classify the email as **Phishing** or **Legitimate**.
It combines:
- 🔍 Natural Language Processing (TF-IDF + NLTK)
- 🤖 ML models (Naive Bayes, Logistic Regression, SVM)
- 🌐 Live API deployment + responsive UI
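
As a rough illustration of how these pieces can fit together, the sketch below fits one shared TF-IDF vectorizer and three scikit-learn classifiers on a tiny made-up corpus. The corpus, the `classify` helper, the default hyperparameters, and the use of `LinearSVC` for the SVM are all assumptions made for illustration; the project's actual training code and its NLTK preprocessing may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Tiny made-up corpus, for illustration only (not the project's dataset).
texts = [
    "Claim your free prize now, click this link",
    "Verify your account password immediately to avoid suspension",
    "You won a free iPhone, click here to claim",
    "Meeting agenda attached for tomorrow's project review",
    "Lunch on Friday? Let me know what time works",
    "Here are the quarterly report figures you asked for",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = phishing, 0 = legitimate

# One vectorizer shared by all models, mirroring the single
# tfidf_vectorizer.joblib file listed in the project structure.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(texts)

# Assumption: default hyperparameters; LinearSVC stands in for "SVM".
candidates = {
    "svm": LinearSVC(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": MultinomialNB(),
}
models = {name: clf.fit(X, labels) for name, clf in candidates.items()}


def classify(text: str, model_name: str = "svm") -> str:
    """Vectorize one email text and return the label from the chosen model."""
    features = vectorizer.transform([text])
    prediction = models[model_name].predict(features)[0]
    return "Phishing" if prediction == 1 else "Legitimate"


if __name__ == "__main__":
    for name in models:
        print(name, classify("Click the link to claim your free iPhone", name))
```

Sharing a single fitted vectorizer keeps the feature space identical across models, so switching models at request time only changes which classifier receives the same feature vector.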
---
## 🛠️ Tech Stack
| Layer | Tech |
|----------------|----------------------------|
| Frontend | HTML, CSS, JavaScript |
| Backend | FastAPI, Uvicorn |
| ML/NLP | Scikit-learn, NLTK, joblib |
| Deployment | Render, Namecheap |
| Hosting Models | Hugging Face 🤗            |
---
## 🤖 ML Models
Choose between:
- ✅ Support Vector Machine (Best Accuracy)
- ✅ Logistic Regression
- ✅ Multinomial Naive Bayes
### 📦 Hugging Face Models
- 🔗 [SVM Model](https://huggingface.co/otuemre/email-phishing-svm)
- 🔗 [TF-IDF Vectorizer](https://huggingface.co/otuemre/email-phishing-vectorizer)
---
## 🧪 Sample Input
```
Sender: freeiphone@gmail.com
Subject: Don't miss this chance!
Body: Click the link to claim your free iPhone 14 Pro Max.
Date: May 5, 2025
Model: SVM
```
✔️ Output: **Phishing**
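
The sketch below shows one way input in this `Key: value` format could be parsed and flattened into a single string before vectorization. The `EmailInput` dataclass, the `parse_sample` and `to_model_text` helpers, and the choice to join sender, subject, and body are hypothetical; the README does not say how the app combines the fields.

```python
from dataclasses import dataclass, fields


@dataclass
class EmailInput:
    """Hypothetical container for the form fields shown in the sample."""
    sender: str = ""
    subject: str = ""
    body: str = ""
    date: str = ""
    model: str = "SVM"


def parse_sample(block: str) -> EmailInput:
    """Parse 'Key: value' lines into an EmailInput, ignoring unknown keys."""
    known = {f.name for f in fields(EmailInput)}
    values = {}
    for line in block.strip().splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in known:
            values[key.strip().lower()] = value.strip()
    return EmailInput(**values)


def to_model_text(email: EmailInput) -> str:
    # Assumption: sender, subject, and body are joined into one text field.
    parts = (email.sender, email.subject, email.body)
    return " ".join(part for part in parts if part)


sample = """
Sender: freeiphone@gmail.com
Subject: Don't miss this chance!
Body: Click the link to claim your free iPhone 14 Pro Max.
Date: May 5, 2025
Model: SVM
"""

email = parse_sample(sample)
print(to_model_text(email))
```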
---
## 🚀 Future Improvements
- **📡 Public API & Documentation**
  Provide a proper REST API endpoint with OpenAPI/Swagger documentation so developers can integrate the phishing detection system into their own applications.
- **🎨 Improve the UI**
  Rebuild the frontend using a modern framework like React (possibly with Tailwind or Material UI) to create a more interactive and responsive user experience.
- **🔗 URL-Based Model**
  Train and integrate a secondary model focused specifically on analyzing URLs for phishing characteristics such as domain structure, length, obfuscation, and suspicious keywords.
- **📊 Expand the Dataset**
  Enhance the model's performance by collecting a larger and more diverse dataset of phishing and legitimate emails, improving generalization and reducing bias.
- **🧠 Improve Model Explainability**
  Integrate explainable AI tools like SHAP or LIME to provide transparency into why the model classified an email as phishing or legitimate.
- **📬 Real-Time Email API Integration (Optional)**
  Integrate with email providers like Gmail or Microsoft Outlook via API to allow live scanning of user inboxes (with permission) and flag suspicious messages in real time.
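
To make the URL-based idea more concrete, here is a minimal sketch of hand-crafted URL features covering domain structure, length, obfuscation, and suspicious keywords. The `url_features` function, its feature names, and the keyword list are hypothetical and are not part of the current project; a real model would need a labeled URL dataset and a more careful feature set.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword list, chosen only for illustration.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "account", "update", "free")

IPV4_PATTERN = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def url_features(url: str) -> dict:
    """Extract simple structural features from a URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    has_ip_host = bool(IPV4_PATTERN.fullmatch(host))
    lowered = url.lower()
    return {
        "length": len(url),
        # Dots beyond the one separating the registered name from its suffix.
        "num_subdomains": 0 if has_ip_host else max(host.count(".") - 1, 0),
        "has_ip_host": has_ip_host,            # raw IP instead of a domain name
        "has_at_symbol": "@" in url,           # can hide the real host
        "num_hyphens_in_host": host.count("-"),
        "uses_https": parsed.scheme == "https",
        "keyword_hits": sum(kw in lowered for kw in SUSPICIOUS_KEYWORDS),
    }


if __name__ == "__main__":
    print(url_features("http://secure-login.example.com.verify-account.xyz/update?id=1"))
```

A feature dictionary like this could be passed to a tabular classifier, separate from the TF-IDF text models, and the two scores combined afterwards.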
---
## 🗂️ Project Structure
```
EmailPhishingDetection/
├── api/
│   ├── main.py
│   └── pipeline.py
├── data/
│   └── phishing_email.csv
├── frontend/
│   ├── static/
│   └── index.html
├── models/
│   ├── logistic_regression_model.joblib
│   ├── naive_bayes_model.joblib
│   ├── svm_model.joblib
│   └── tfidf_vectorizer.joblib
├── notebooks/
│   └── 01_training.ipynb
├── images/
├── .gitignore
├── LICENSE.md
├── README.md
├── render.yml
└── requirements.txt
```
---
## 🙏 Acknowledgements
This project uses the **Phishing Email Dataset** by [Naser Abdullah Alam on Kaggle](https://www.kaggle.com/datasets/naserabdullahalam/phishing-email-dataset).
Please cite the following article if using this dataset:
> **Al-Subaiey, A., Al-Thani, M., Alam, N. A., Antora, K. F., Khandakar, A., & Zaman, S. A. U. (2024, May 19).**
> *Novel Interpretable and Robust Web-based AI Platform for Phishing Email Detection*.
> ArXiv: [https://arxiv.org/abs/2405.11619](https://arxiv.org/abs/2405.11619)
---
## 👨‍💻 Author
**Emre OTU**
🔗 [GitHub](https://github.com/otuemre) | [LinkedIn](https://linkedin.com/in/emreotu)