Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/samp1012/email_sms_spam_detector
An Email/SMS spam classifier that aims to identify and distinguish between spam and non-spam messages.
https://github.com/samp1012/email_sms_spam_detector
multinomial-naive-bayes naive-bayes-classifier natural-language-processing numpy pandas python scikit-learn spam-detection text-vectorization tokenization
Last synced: 14 days ago
JSON representation
An Email/SMS spam classifier that aims to identify and distinguish between spam and non-spam messages.
- Host: GitHub
- URL: https://github.com/samp1012/email_sms_spam_detector
- Owner: samp1012
- Created: 2024-12-14T06:55:50.000Z (20 days ago)
- Default Branch: main
- Last Pushed: 2024-12-17T02:55:33.000Z (17 days ago)
- Last Synced: 2024-12-17T03:48:05.902Z (17 days ago)
- Topics: multinomial-naive-bayes, naive-bayes-classifier, natural-language-processing, numpy, pandas, python, scikit-learn, spam-detection, text-vectorization, tokenization
- Language: Jupyter Notebook
- Homepage:
- Size: 981 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Email and SMS Spam Detector
This repository contains a machine learning-based **Email and SMS Spam Detector** that identifies whether a given text message (email or SMS) is spam or not. The project leverages natural language processing (NLP) techniques and classification models to make predictions. It also includes a simple web application for user interaction.
---
## Features
- **Spam Detection**: Accurately classify messages as spam or not.
- **Text Preprocessing**: Includes text cleaning, tokenization, and vectorization.
- **Machine Learning Models**: Utilizes classification algorithms such as Naive Bayes, Logistic Regression, or other customizable models.
- **Web Application**: Flask-based web app to interact with the model and predict messages.
- **Pre-trained Models**: Includes saved model and preprocessor for quick predictions.---
## Project Structure
```plaintext
Email_SMS_Spam_Detector/
├── spam.csv # Dataset containing labeled SMS data
├── main.ipynb # Jupyter Notebook for model training and testing
├── app.py # Flask web application for spam detection
├── preprocessor.pkl # Saved preprocessor for text data
├── model.pkl # Saved machine learning model
├── templates/ # HTML templates for the web application
├── requirements.txt # Python dependencies
└── README.md # Project documentation
```---
## Prerequisites
Before running this project, ensure you have the following installed:
- **Python 3.8+**
- **pip** (Python package manager)Install dependencies using:
```bash
pip install -r requirements.txt
```---
## How to Run
### 1. Clone the Repository
```bash
git clone https://github.com/samp1012/Email_SMS_Spam_Detector.git
cd Email_SMS_Spam_Detector
```### 2. Dataset
- The dataset file `spam.csv` is already included in the repository.
- It contains labeled SMS data with the following structure:| v1 | v2 |
|----------|----------------------------------|
| ham | "Go until jurong point..." |
| spam | "Free entry in 2 a weekly comp" |- `v1`: The label column (ham or spam).
- `v2`: The text message content.### 3. Run the Web Application
The project includes a Flask web application for spam detection.
Run the application with:
```bash
python app.py
```Access the web app in your browser at:
```
http://127.0.0.1:5000
```You can input a message, and the model will predict whether it is spam or not.
---
## Web App Templates
The web app uses basic HTML templates stored in the `templates/` folder:
- `index.html`: Main page to input text and view predictions.---
## Saved Models
The project includes pre-trained files for quick predictions:
- **preprocessor.pkl**: Text preprocessing pipeline.
- **model.pkl**: Machine learning model for spam detection.These files are used by the `app.py` script to process input data and generate predictions.
---
## Results
Evaluation metrics such as accuracy, precision, recall, and F1-score can be evaluated in the notebook. Visualizations are also generated for better insights.Example Output:
```
Accuracy: 98.5%
Precision: 95%
Recall: 92%
F1-Score: 93%
```---
## Technologies Used
- **Python**
- **Flask**: Web framework for deployment
- **Pandas**: Data manipulation
- **Scikit-learn**: Machine learning models
- **Numpy**: Numerical operations
- **Matplotlib/Seaborn**: Data visualization---
## Contact
For queries or suggestions:
- **GitHub**: [samp1012](https://github.com/samp1012)
- **Email**: [email protected]
- **Linked In**: [Samparka Das](https://www.linkedin.com/in/samparka-das-b4317726b/)