Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/iiakshat/spam-mail-detection
A simple text classifier in Python that uses the Naive Bayes model to classify e-mails as spam or ham.
- Host: GitHub
- URL: https://github.com/iiakshat/spam-mail-detection
- Owner: iiakshat
- Created: 2023-02-02T04:59:49.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-06-03T09:02:54.000Z (7 months ago)
- Last Synced: 2024-06-03T17:16:43.860Z (7 months ago)
- Topics: email-spam-filter, machine-learning, naive-bayes-classifier, spam-classification, spam-detection, spam-filtering
- Language: Jupyter Notebook
- Homepage:
- Size: 678 KB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
---
title: Spam Email Detection
emoji: 💌
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 3.17.0
app_file: app.py
---
# Email Spam and Phishing URL Detection
This project utilizes Naive Bayes classification to detect whether an email is spam or not, and XGBoost classification to determine if a URL within an email is phishing or legitimate.
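To illustrate the idea behind the spam classifier, here is a minimal pure-Python sketch of multinomial Naive Bayes with add-one (Laplace) smoothing. It is not the code from this repository, which lists scikit-learn in its requirements and trains its models in a notebook. The class name `NaiveBayesSketch`, the `tokenize` helper, and the toy training sentences are all made up for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSketch:
    """Illustrative multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        # Number of training documents per class (used for the class prior).
        self.class_counts = Counter(labels)
        # Per-class word frequencies.
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary across all classes, needed for the smoothing denominator.
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Start from the log prior, then add one log likelihood per word.
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy, made-up training data for illustration only (not taken from spam.csv).
train_texts = [
    "win a free prize now",
    "free cash offer click now",
    "are we still meeting for lunch",
    "please review the attached report",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayesSketch().fit(train_texts, train_labels)
print(model.predict("claim your free prize"))
print(model.predict("see you at the meeting"))
```

On this toy data the first message is labelled `spam` and the second `ham`. A real model would be trained on a much larger labelled dataset.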
# Getting Started
## Project Overview
The project consists of two main components:
1. **Email Spam Detection**: This component employs Naive Bayes classification to classify emails as either spam or not spam based on their content features.
2. **Phishing URL Detection**: This component uses XGBoost classification to identify whether URLs within emails are associated with phishing attempts or legitimate websites.
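For the phishing component, a classifier such as XGBoost needs numeric features for each URL. The sketch below shows one way such features might be derived using only the Python standard library. The `url_features` helper and every feature name in it are assumptions made for illustration; the features this project actually uses may differ.

```python
import ipaddress
from urllib.parse import urlparse


def url_features(url):
    """Return a few hypothetical lexical features for a single URL."""
    # Add a scheme if it is missing so urlparse can locate the host.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""

    # A raw IP address in place of a domain name is a common phishing signal.
    try:
        ipaddress.ip_address(host)
        host_is_ip = 1
    except ValueError:
        host_is_ip = 0

    return {
        "url_length": len(url),
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ip": host_is_ip,
        "has_at_symbol": int("@" in url),
        "host_dots": host.count("."),
        "path_depth": len([part for part in parsed.path.split("/") if part]),
    }


# Example URLs invented for this sketch.
print(url_features("http://192.168.0.1/login/verify"))
print(url_features("https://www.example.com/"))
```

Each dictionary can be turned into one row of a feature table, which is the kind of numeric input a gradient-boosted classifier expects.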
## Prerequisites
Make sure you have Python 3.10 installed on your system. You can download it from [python.org](https://www.python.org/).
## Requirements
Ensure you have the following dependencies installed. You can install them using `pip install -r requirements.txt`.
- gunicorn==22.0.0
- python-dateutil==2.8.2
- gradio==4.32.1
- gradio_client==0.17.0
- requests==2.31.0
- beautifulsoup4==4.12.3
- googlesearch_python==1.2.4
- urlextract==1.9.0
- numpy==1.26.3
- pandas==2.2.0
- scikit-learn==1.5.0
- urllib3==2.1.0
- python-whois==0.9.4
- xgboost==2.0.3
- lxml==5.2.2
## Setup and Installation
1. Clone the repository:
```bash
git clone https://github.com/your-username/email-spam-phishing-detection.git
cd email-spam-phishing-detection
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
## Usage
1. **Data Preparation:**
- Ensure the datasets `spam.csv` and `urldata.csv` are available in the `data/` directory.
2. **Model Training:**
- If necessary, modify and run the `notebook.ipynb` Jupyter notebook to train or fine-tune the machine learning models.
- Trained models will be saved in the `models/` directory.
3. **Run the Application:**
- Execute `app.py` to start the application.
- Access the application at [Hugging Face Space](https://huggingface.co/spaces/akshatsanghvi/spam-email-detection)
## Acknowledgements
- The email spam classification model is trained using the `spam.csv` dataset, sourced from [Dataset: Spam/ham mail](https://www.kaggle.com/datasets/mfaisalqureshi/spam-email?resource=download).
- The URL phishing detection model is trained using the `urldata.csv` dataset, sourced from [Phishing Websites Dataset](https://www.kaggle.com/datasets).
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.