
https://github.com/suvroneel/spam-email-classifier

An end-to-end ML project that filters spam messages using a Naive Bayes classifier ✨💖

google-sheets-api machine-learning multinomial-naive-bayes naive-bayes-classifier natural-language-processing pandas python3


🌟 Spam Email Classifier | End-to-End ML Pipeline

![Image](https://github.com/user-attachments/assets/439129bc-b593-4926-9faa-0d5fe90c7e57)

==================================================

🚀 Tech Stack

Streamlit/Google Sheets API

Key Features:

✅ Text Preprocessing:

📌 **Lowercasing, tokenization, special-character removal, stemming**

![Image](https://github.com/user-attachments/assets/ec0fa2e2-74ae-4002-9b42-b36d17f02930)
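A minimal sketch of these four preprocessing steps, assuming a regex-based cleanup and whitespace tokenization. The README doesn't say which stemmer the project uses, so `simple_stem` below is a deliberately crude, hypothetical suffix stripper standing in for a real algorithm such as NLTK's Porter stemmer. Special characters are replaced with spaces before splitting so punctuation doesn't stay glued to tokens.

```python
import re
from typing import List

# Hypothetical suffix list for a toy stemmer; a real project would typically
# use a proper algorithm such as the Porter stemmer instead.
SUFFIXES = ("ing", "ed", "s")


def simple_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> List[str]:
    text = text.lower()                        # lowercasing
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # special-character removal
    tokens = text.split()                      # whitespace tokenization
    return [simple_stem(t) for t in tokens]    # stemming
```

Because `preprocess` takes a raw string and returns a list of tokens, it could also be plugged into scikit-learn as `TfidfVectorizer(analyzer=preprocess)`.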

✅ Advanced NLP: TF-IDF vectorization + Multinomial Naïve Bayes
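A sketch of how TF-IDF vectorization and Multinomial Naive Bayes are commonly chained with scikit-learn, assuming default vectorizer settings and a tiny hypothetical training set (the project's own data and hyperparameters aren't given in this README). The vectorizer turns each message into an L2-normalized TF-IDF vector, and `MultinomialNB` scores each class by its log prior plus the TF-IDF-weighted log probabilities of the message's terms.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def build_spam_pipeline(alpha: float = 1.0) -> Pipeline:
    """TF-IDF features feeding a Multinomial Naive Bayes classifier."""
    return Pipeline([
        # Lowercases and tokenizes, then weights terms by TF-IDF (rows L2-normalized).
        ("tfidf", TfidfVectorizer()),
        # alpha is the additive (Laplace/Lidstone) smoothing parameter.
        ("nb", MultinomialNB(alpha=alpha)),
    ])


# Hypothetical toy data for illustration only; not the project's dataset.
TRAIN_TEXTS = [
    "win a free prize now",
    "free cash offer click now",
    "claim your free prize today",
    "urgent cash prize winner",
    "are we meeting for lunch today",
    "please review the attached report",
    "see you at the meeting tomorrow",
    "can you send the report",
]
TRAIN_LABELS = ["spam"] * 4 + ["ham"] * 4

model = build_spam_pipeline()
model.fit(TRAIN_TEXTS, TRAIN_LABELS)
print(model.predict(["claim your free cash prize"]))
```

Keeping both steps in one `Pipeline` means the same fitted vocabulary is always applied at prediction time.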

✅ Visual EDA:

📌**Spam Word Cloud**

![Image](https://github.com/user-attachments/assets/197f20ed-fe33-4267-b060-551b21fdacef)

📌**Ham Word Cloud**

![Image](https://github.com/user-attachments/assets/96ca117e-85f1-4c41-ac03-2eb909f5e688)
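Word clouds like the two above are usually driven by per-class token counts. A minimal sketch using `collections.Counter`, assuming a small hypothetical stop-word list; the resulting counts could then be passed (as a dict) to the `wordcloud` package's `WordCloud().generate_from_frequencies(...)` to render an image.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical minimal stop-word list; real EDA would use a fuller list.
STOPWORDS = {"the", "a", "at", "you", "your", "to", "and"}


def top_words_by_label(
    messages: Iterable[str], labels: Iterable[str], n: int = 5
) -> Dict[str, List[Tuple[str, int]]]:
    """Return the n most frequent non-stop-word tokens for each label."""
    counters: Dict[str, Counter] = {}
    for text, label in zip(messages, labels):
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        counters.setdefault(label, Counter()).update(
            t for t in tokens if t not in STOPWORDS
        )
    return {label: c.most_common(n) for label, c in counters.items()}
```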

✅ MLOps Ready: Pickle model serialization and scikit-learn pipelines
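A minimal sketch of pickle serialization, assuming the fitted object is a scikit-learn pipeline. Pickling the whole pipeline keeps the TF-IDF vocabulary and the classifier together, so a web app only has to load a single file. The helper names and the `spam_pipeline.pkl` file name are hypothetical.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def save_model(model, path: str) -> None:
    """Serialize the whole fitted pipeline (vectorizer + classifier) to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path: str):
    """Load a pipeline saved by save_model. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    pipe = make_pipeline(TfidfVectorizer(), MultinomialNB())
    pipe.fit(["free prize now", "lunch at noon"], ["spam", "ham"])
    path = os.path.join(tempfile.gettempdir(), "spam_pipeline.pkl")
    save_model(pipe, path)
    restored = load_model(path)
    print(restored.predict(["free prize"]))
```

Unpickling can execute arbitrary code, so a deployed app should load only model files it produced itself.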

✅ Live Deployment: Streamlit web app + Google Sheets integration for user input tracking
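The README doesn't show how predictions reach the sheet. A minimal sketch, assuming the `gspread` client library: the helper below only needs an object with an `append_row(values)` method, which gspread's `Worksheet` provides, so it can be exercised without credentials. The column layout (UTC timestamp, message, predicted label) is a hypothetical choice.

```python
from datetime import datetime, timezone
from typing import List, Optional, Protocol


class RowAppender(Protocol):
    """Anything with an append_row(values) method, e.g. a gspread Worksheet."""

    def append_row(self, values: List[str]) -> object: ...


def log_prediction(
    worksheet: RowAppender,
    message: str,
    label: str,
    when: Optional[datetime] = None,
) -> List[str]:
    """Append one (timestamp, message, label) row and return it."""
    when = when or datetime.now(timezone.utc)
    row = [when.isoformat(timespec="seconds"), message, label]
    worksheet.append_row(row)
    return row
```

With gspread, a worksheet could be obtained via `gspread.service_account(filename="service_account.json").open("spam-logs").sheet1`; the credentials file name and sheet title there are placeholders.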

Business Value:
🛡️ Spam Filtering: Blocks 99% of unwanted messages

📈 Data Collection: Logs predictions for model improvement

![Image](https://github.com/user-attachments/assets/4807df2a-7687-42a1-b4a0-aa972b17a490)

🔮 Scalable: the pipeline can be retrained on newly logged messages to adapt to new spam patterns

🚧 Future Improvements
1. Deep Learning Upgrade
🚧 CNN Integration:

Implement character-level CNN models (e.g., Char-CNN) for context-aware spam detection

Compare performance against current TF-IDF + Naïve Bayes pipeline

Updates
=======================================================
📍 Update 1 - User inputs and their predicted outputs are now recorded and will be used for further training and testing ✔✔

📍 Update 2 - Minor fixes. Implemented a spelling check ✔✔

📍 Update 3 - The model has been trained further, and a navbar along with a breathtaking wallpaper has been added
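The spelling check from Update 2 isn't described in detail. One possible approach, sketched here with the standard library's `difflib.get_close_matches` (an assumption, not necessarily what the app uses), snaps each unknown token to its closest word in a known vocabulary and leaves tokens with no close match unchanged. The `cutoff` value is a hypothetical default.

```python
import difflib
from typing import Iterable, List


def spell_correct(
    tokens: Iterable[str], vocabulary: Iterable[str], cutoff: float = 0.8
) -> List[str]:
    """Replace out-of-vocabulary tokens with their closest vocabulary match."""
    vocab = list(vocabulary)
    vocab_set = set(vocab)
    corrected = []
    for tok in tokens:
        if tok in vocab_set:
            corrected.append(tok)  # already a known word
            continue
        # Best single match whose similarity ratio is at least `cutoff`.
        matches = difflib.get_close_matches(tok, vocab, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else tok)
    return corrected
```

In practice the vocabulary could be the fitted TF-IDF vocabulary, so corrected tokens map onto features the model has actually seen.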