https://github.com/sabin74/spam_mail_detection
A machine learning project to classify SMS messages as Spam or Ham (Not Spam) using Natural Language Processing (NLP) techniques and Scikit-learn. This binary classification task uses the UCI SMS Spam Collection Dataset and implements various models including Naive Bayes, SVM, and Logistic Regression with performance tuning.
- Host: GitHub
- URL: https://github.com/sabin74/spam_mail_detection
- Owner: sabin74
- Created: 2025-06-12T15:39:22.000Z
- Default Branch: main
- Last Pushed: 2025-06-12T15:50:04.000Z
- Last Synced: 2025-06-22T10:02:42.927Z
- Topics: gridsearchcv, nltk, python, scikit-learn, smote, sms-spam-detection, uci-machine-learning
- Language: Jupyter Notebook
- Homepage: https://archive.ics.uci.edu/dataset/228/sms+spam+collection
- Size: 394 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# 📧 Spam Email Detection
A machine learning project that classifies SMS messages as **Spam** or **Ham** (Not Spam) using **Natural Language Processing (NLP)** techniques and **Scikit-learn**. This binary classification task uses the **UCI SMS Spam Collection Dataset** and compares three models (Naive Bayes, SVM, and Logistic Regression), each with hyperparameter tuning.
---
## 🚀 Features
- Text preprocessing and cleaning
- Feature extraction using TF-IDF with n-grams
- Handling imbalanced classes using **SMOTE**
- Hyperparameter tuning with **GridSearchCV**
- Model comparison: Naive Bayes, SVM, Logistic Regression
- Save & load trained model and vectorizer
- Predict new SMS messages
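
As a rough illustration of how several of these features can fit together, here is a minimal sketch, not the notebook's actual code. It assumes a scikit-learn `Pipeline` that chains TF-IDF (unigrams and bigrams) with Multinomial Naive Bayes, tunes it with `GridSearchCV`, and persists it with Joblib. The toy messages, the parameter grid, and the `spam_model.joblib` file name are all hypothetical stand-ins; SMOTE is left out here to keep the example dependent on scikit-learn only.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy stand-in data (hypothetical); the project trains on the
# UCI SMS Spam Collection Dataset instead.
messages = [
    "WINNER! Claim your free prize now",
    "Free entry in a weekly cash prize draw",
    "Urgent! You have won a free holiday, call now",
    "Claim your free ringtone, text WIN now",
    "Are we still meeting for lunch today?",
    "I'll call you when I get home",
    "Can you pick up some milk on the way back?",
    "See you at the office tomorrow morning",
]
labels = ["spam"] * 4 + ["ham"] * 4

# Vectorizer and classifier live in one pipeline, so they are
# tuned, saved, and loaded together.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
    ("clf", MultinomialNB()),
])

# Illustrative grid (assumption): n-gram range and NB smoothing.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__alpha": [0.1, 1.0],
}
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="f1_macro")
search.fit(messages, labels)

# Persist the best pipeline, then reload it for prediction.
model_path = os.path.join(tempfile.mkdtemp(), "spam_model.joblib")
joblib.dump(search.best_estimator_, model_path)
loaded_model = joblib.load(model_path)

predictions = loaded_model.predict(["Free prize, call now", "See you at lunch"])
print(predictions)
```

Bundling the vectorizer and the classifier in a single pipeline is one design choice among several; saving them as two separate Joblib files works as well, as long as the same fitted vectorizer is used at prediction time.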
---
## 🛠️ Tools & Libraries
- Python
- Pandas, NumPy
- Scikit-learn
- NLTK (for stopword removal)
- Imbalanced-learn (for SMOTE)
- Matplotlib, Seaborn (for visualization)
- Joblib (for model persistence)
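
The text cleaning and stopword removal step can be sketched roughly as follows. This is an assumption about what such a step might look like, not the project's own function: `clean_message` is a hypothetical helper, and the small hard-coded `STOPWORDS` set stands in for NLTK's English stopword list so that the example runs without downloading any corpus.

```python
import re
import string

# Small illustrative stopword set (assumption); the project lists
# NLTK for stopword removal, whose English list is much longer.
STOPWORDS = {
    "a", "an", "the", "is", "are", "to", "you", "your",
    "for", "and", "of", "in", "on", "at",
}

# Translation table that maps every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_message(text: str) -> str:
    """Lowercase, drop URLs, digits, and punctuation, then remove stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # strip URLs
    text = re.sub(r"\d+", " ", text)                     # strip digit runs
    text = text.translate(PUNCT_TO_SPACE)                # strip punctuation
    tokens = [token for token in text.split() if token not in STOPWORDS]
    return " ".join(tokens)


cleaned = clean_message(
    "WINNER!! Claim your FREE prize at http://example.com now, call 0800123"
)
print(cleaned)  # winner claim free prize now call
```

Whether digits and URLs should be dropped is a judgment call: in spam detection they can carry signal, so some pipelines replace them with placeholder tokens instead.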