https://github.com/devrihan/email-spam-detector
https://github.com/devrihan/email-spam-detector
Last synced: about 1 year ago
JSON representation
- Host: GitHub
- URL: https://github.com/devrihan/email-spam-detector
- Owner: devrihan
- Created: 2024-09-10T11:40:17.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-09-10T11:43:46.000Z (over 1 year ago)
- Last Synced: 2025-02-03T04:28:55.752Z (over 1 year ago)
- Language: Jupyter Notebook
- Size: 204 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Spam Detection Model
## Overview
This project demonstrates a basic spam detection system using **Naive Bayes** classification. The system is trained to distinguish between spam and non-spam messages using a dataset of SMS messages.
## Dataset
The dataset used is `spam.csv`, containing two columns:
- **Category**: Indicates whether the message is "spam" or "ham" (non-spam).
- **Message**: The content of the SMS message.
## Steps
1. **Data Preprocessing**:
- The `Category` column is converted into a binary format where `1` represents spam and `0` represents ham.
2. **Feature Extraction**:
- Used `CountVectorizer` to transform the text data into a matrix of token counts.
3. **Modeling**:
- Applied **Multinomial Naive Bayes** classifier to the tokenized data.
4. **Evaluation**:
- The model achieved an accuracy score of **98.64%** on the test data.
5. **Prediction**:
- Tested the model with two sample emails, where it correctly predicted one as spam and the other as non-spam.
## Libraries Used
- `pandas`: For data manipulation
- `scikit-learn`: For machine learning models and text vectorization
## How to Use
1. Ensure the required libraries are installed:
```bash
pip install pandas scikit-learn
```
2. Load the dataset and run the script.
3. The trained model can predict whether a given message is spam or not.