https://github.com/otuemre/emailphishingdetection

A real-time phishing email detection system using Machine Learning (SVM, Logistic Regression, Naive Bayes) with FastAPI backend and custom domain deployment.
cybersecurity fastapi huggingface machine-learning nlp real-time scikit-learn spam-detection svm-classifier tfidf-vectorizer

# 📧 Email Phishing Detection

[![License: MIT](https://img.shields.io/github/license/otuemre/EmailPhishingDetection?style=flat-square)](./LICENSE.md)
[![Deploy on Render](https://img.shields.io/badge/Deploy-Render-5e60ce?logo=render&style=flat-square)](https://phishingdetection.net)
[![Hugging Face Models](https://img.shields.io/badge/HuggingFace-SVM%20%7C%20TFIDF-orange?logo=huggingface&style=flat-square)](https://huggingface.co/emreotu)

Detect phishing emails in real time using machine learning. The models are trained on six merged datasets and served through a full-stack FastAPI app.

🔗 **Live Demo**: [https://phishingdetection.net](https://phishingdetection.net)

## 📚 Table of Contents

- [What It Does](#-what-it-does)
- [Tech Stack](#️-tech-stack)
- [ML Models](#-ml-models)
- [Sample Input](#-sample-input)
- [Future Improvements](#-future-improvements)
- [Project Structure](#-project-structure)
- [Acknowledgements](#-acknowledgements)
- [Author](#-author)

---

## 🧠 What It Does

This project allows users to paste real email content (sender, subject, body, etc.) and choose between three machine learning models to detect whether it's **Phishing** or **Legitimate**.

It combines:
- 📊 Natural Language Processing (TF-IDF + NLTK)
- 🤖 ML models (Naive Bayes, Logistic Regression, SVM)
- 🌐 Live API deployment + responsive UI

---

## ๐Ÿ› ๏ธ Tech Stack

| Layer | Tech |
|----------------|----------------------------|
| Frontend | HTML, CSS, JavaScript |
| Backend | FastAPI, Uvicorn |
| ML/NLP | Scikit-learn, NLTK, joblib |
| Deployment | Render, Namecheap |
| Hosting Models | Hugging Face 🤗 |

---

## 🤖 ML Models

Choose between:
- ✅ Support Vector Machine (Best Accuracy)
- ✅ Logistic Regression
- ✅ Multinomial Naive Bayes
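
One way to support a per-request model choice is a small registry keyed by model name, with all models sharing a single fitted vectorizer. The sketch below is a minimal version under assumptions: the registry keys, the `classify` helper, the toy data, and the use of `LinearSVC` for the SVM are hypothetical and not taken from the repository.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Toy training data, invented for this sketch (not the project's dataset).
emails = [
    "click the link to claim your free prize",
    "verify your account password immediately or it will be suspended",
    "urgent wire transfer required to claim your reward",
    "meeting agenda attached for the project review tomorrow",
    "lunch on friday with the team",
    "quarterly report draft attached, please review the comments",
]
labels = ["Phishing"] * 3 + ["Legitimate"] * 3

# One shared vectorizer so every model sees the same feature space.
vectorizer = TfidfVectorizer(stop_words="english")
features = vectorizer.fit_transform(emails)

# Hypothetical registry: display name -> estimator.
MODELS = {
    "SVM": LinearSVC(),
    "Logistic Regression": LogisticRegression(),
    "Naive Bayes": MultinomialNB(),
}
for model in MODELS.values():
    model.fit(features, labels)


def classify(text: str, model_name: str) -> str:
    """Vectorize the text and classify it with the model chosen by name."""
    model = MODELS[model_name]  # raises KeyError for an unknown name
    return str(model.predict(vectorizer.transform([text]))[0])


results = {name: classify("claim your free prize", name) for name in MODELS}
print(results)
```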

### 📦 Hugging Face Models

- 🔗 [SVM Model](https://huggingface.co/otuemre/email-phishing-svm)
- 🔗 [TF-IDF Vectorizer](https://huggingface.co/otuemre/email-phishing-vectorizer)

---

## 🧪 Sample Input

```
Sender: freeiphone@gmail.com
Subject: Don't miss this chance!
Body: Click the link to claim your free iPhone 14 Pro Max.
Date: May 5, 2025
Model: SVM
```

โœ”๏ธ Output: **Phishing**

---

## 🚀 Future Improvements

- **📡 Public API & Documentation**
  Provide a proper REST API endpoint with OpenAPI/Swagger documentation so developers can integrate the phishing detection system into their own applications.

- **🎨 Improve the UI**
  Rebuild the frontend using a modern framework like React (possibly with Tailwind or Material UI) to create a more interactive and responsive user experience.

- **🔗 URL-Based Model**
  Train and integrate a secondary model focused specifically on analyzing URLs for phishing characteristics such as domain structure, length, obfuscation, and suspicious keywords.

- **📈 Expand the Dataset**
  Enhance the model's performance by collecting a larger and more diverse dataset of phishing and legitimate emails, improving generalization and reducing bias.

- **🧠 Improve Model Explainability**
  Integrate explainable AI tools like SHAP or LIME to provide transparency into why the model classified an email as phishing or legitimate.

- **📬 Real-Time Email API Integration (Optional)**
  Integrate with email providers like Gmail or Microsoft Outlook via API to allow live scanning of user inboxes (with permission) and flag suspicious messages in real time.
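
For the URL-based model idea, hand-crafted features along the lines listed above (domain structure, length, obfuscation, suspicious keywords) could be a starting point. The sketch below uses only the standard library; the feature names and the keyword list are assumptions chosen for illustration, not a design from this project.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword list; a real model would derive this from data.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "update", "account")

IPV4_PATTERN = re.compile(r"(\d{1,3}\.){3}\d{1,3}")


def url_features(url: str) -> dict:
    """Extract simple numeric/boolean features from a URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    is_ip = bool(IPV4_PATTERN.fullmatch(host))
    lowered = url.lower()
    return {
        "length": len(url),
        # Labels beyond "name.tld"; not meaningful for raw IP hosts.
        "num_subdomains": 0 if is_ip else max(host.count(".") - 1, 0),
        "has_ip_host": is_ip,
        "has_at_symbol": "@" in url,
        "num_hyphens_in_host": host.count("-"),
        "uses_https": parsed.scheme == "https",
        "keyword_hits": sum(kw in lowered for kw in SUSPICIOUS_KEYWORDS),
    }


features = url_features("http://192.168.0.1/login/verify")
print(features)
```

A dictionary like this could be fed to a scikit-learn model through `DictVectorizer`, though which features are actually predictive would need to be checked against labeled URLs.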

---

## ๐Ÿ—‚๏ธ Project Structure

```
EmailPhishingDetection/
├── api/
│   ├── main.py
│   └── pipeline.py
├── data/
│   └── phishing_email.csv
├── frontend/
│   └── static/
│       └── index.html
├── models/
│   ├── logistic_regression_model.joblib
│   ├── naive_bayes_model.joblib
│   ├── svm_model.joblib
│   └── tfidf_vectorizer.joblib
├── notebooks/
│   └── 01_training.ipynb
├── images/
├── .gitignore
├── LICENSE.md
├── README.md
├── render.yml
└── requirements.txt
```
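
The `.joblib` files under `models/` suggest that the fitted vectorizer and classifiers are serialized with joblib. The sketch below shows a save and load round trip using the same file names in a temporary directory. The toy data and the use of `LinearSVC` are assumptions; the training notebook itself is not reproduced here.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy training data, invented for this sketch (not the project's dataset).
emails = [
    "click the link to claim your free prize",
    "verify your account password immediately",
    "meeting agenda attached for the project review",
    "lunch on friday with the team",
]
labels = ["Phishing", "Phishing", "Legitimate", "Legitimate"]

vectorizer = TfidfVectorizer()
model = LinearSVC().fit(vectorizer.fit_transform(emails), labels)

with tempfile.TemporaryDirectory() as tmp:
    models_dir = Path(tmp) / "models"
    models_dir.mkdir()

    # Persist both artifacts: the model is useless without its vocabulary.
    joblib.dump(vectorizer, models_dir / "tfidf_vectorizer.joblib")
    joblib.dump(model, models_dir / "svm_model.joblib")

    loaded_vectorizer = joblib.load(models_dir / "tfidf_vectorizer.joblib")
    loaded_model = joblib.load(models_dir / "svm_model.joblib")

sample = ["claim your free prize"]
before = model.predict(vectorizer.transform(sample))[0]
after = loaded_model.predict(loaded_vectorizer.transform(sample))[0]
print(before == after)
```

joblib files are pickle-based, so they should only be loaded from sources you trust and with a scikit-learn version compatible with the one used for training.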

---

## ๐Ÿ™ Acknowledgements

This project uses the **Phishing Email Dataset** by [Naser Abdullah Alam on Kaggle](https://www.kaggle.com/datasets/naserabdullahalam/phishing-email-dataset).

Please cite the following article if using this dataset:

> **Al-Subaiey, A., Al-Thani, M., Alam, N. A., Antora, K. F., Khandakar, A., & Zaman, S. A. U. (2024, May 19).**
> *Novel Interpretable and Robust Web-based AI Platform for Phishing Email Detection*.
> ArXiv: [https://arxiv.org/abs/2405.11619](https://arxiv.org/abs/2405.11619)

---

## ๐Ÿ‘จโ€๐Ÿ’ป Author

**Emre OTU**
🔗 [GitHub](https://github.com/otuemre) | [LinkedIn](https://linkedin.com/in/emreotu)