https://github.com/rishieeee/spam-email-classifier

A simple machine learning project that classifies emails as spam or ham using TF-IDF and a Multinomial Naive Bayes model. The project covers data cleaning, text preprocessing, feature extraction, model training, and evaluation. It is a beginner-friendly introduction to NLP and ML workflows.

Host: GitHub
URL: https://github.com/rishieeee/spam-email-classifier
Owner: rishieeee
Created: 2025-11-19T04:57:57.000Z (about 2 months ago)
Default Branch: main
Last Pushed: 2025-11-19T05:12:13.000Z (about 2 months ago)
Last Synced: 2025-11-19T07:08:54.991Z (about 2 months ago)
Topics: multinomial-naive-bayes, numpy, pandas, python, sckit-learn, tf-idf
Language: Python
Homepage:
Size: 673 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

Spam Email Classifier
A complete project to classify emails as spam or ham using Python, Flask, and Machine Learning (TF-IDF + Multinomial Naive Bayes). This project includes a backend API and a frontend interface.

1️⃣ Clone the repository

git clone https://github.com/rishieeee/spam-email-classifier.git
cd spam-email-classifier

2️⃣ Create and activate a virtual environment

python -m venv .venv

# Windows
.venv\Scripts\activate

# macOS/Linux
source .venv/bin/activate

3️⃣ Install required packages

pip install -r requirements.txt

Prepare the Dataset

Place a CSV file at data/dataset.csv.

Required columns:

text – Email content

label – Either spam or ham

Example dataset:
label,text
ham,"Hey! Are we still meeting for lunch today?"
spam,"Win a brand new car! Click here to claim your prize now."
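Before training, it can help to confirm that the file matches the format described above. The following is a minimal sketch using only the standard library; the function name validate_dataset and the two constants are hypothetical and are not part of the repository.

```python
import csv

# Assumption: these mirror the column names and label values listed above.
REQUIRED_COLUMNS = {"label", "text"}
ALLOWED_LABELS = {"spam", "ham"}


def validate_dataset(csv_file):
    """Check an open CSV file and return the number of rows per label.

    Raises ValueError if a required column is missing, a label is not
    'spam' or 'ham', or a row has empty text.
    """
    reader = csv.DictReader(csv_file)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    counts = {label: 0 for label in sorted(ALLOWED_LABELS)}
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        label = (row["label"] or "").strip()
        if label not in ALLOWED_LABELS:
            raise ValueError(f"line {line_no}: unexpected label {label!r}")
        if not (row["text"] or "").strip():
            raise ValueError(f"line {line_no}: empty text")
        counts[label] += 1
    return counts
```

A typical call would open data/dataset.csv with newline="" and encoding="utf-8" and pass the file object to validate_dataset. The returned counts also give a quick view of how balanced the two classes are.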

Train the Model
From the project root, change into the backend folder and run:

cd backend
python train.py --data ../data/dataset.csv --model_dir models

This will:

Split the dataset into train/test sets

Train a TF-IDF + Multinomial Naive Bayes pipeline

Save the trained model to backend/models/spam_model.pkl
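The three steps above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the contents of train.py: the function names, the 25% test split, the random seed, and the toy data are all made up for the example.

```python
import pickle
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Illustrative toy data only; a real run would read data/dataset.csv.
TOY_TEXTS = [
    "Win a brand new car! Click here to claim your prize now.",
    "Free prize waiting, click now to win cash",
    "Claim your free reward today, limited offer",
    "You won a lottery, send your bank details",
    "Hey! Are we still meeting for lunch today?",
    "Can you send me the report before the meeting?",
    "See you at the gym tomorrow morning",
    "Thanks for the notes, talk to you later",
]
TOY_LABELS = ["spam"] * 4 + ["ham"] * 4


def train_spam_model(texts, labels, test_size=0.25, seed=42):
    """Split the data, fit a TF-IDF + Multinomial NB pipeline, return (model, accuracy)."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    model = Pipeline([
        ("tfidf", TfidfVectorizer()),  # text -> TF-IDF feature vectors
        ("nb", MultinomialNB()),       # classifier on the TF-IDF features
    ])
    model.fit(x_train, y_train)
    return model, model.score(x_test, y_test)


def save_model(model, path):
    """Pickle the fitted pipeline, creating the parent directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Load a pipeline previously written by save_model."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)
```

Keeping the vectorizer and the classifier in one Pipeline means a single pickle file is enough to reproduce predictions, since the same TF-IDF vocabulary is applied at training and prediction time.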

Run the Backend API

cd backend
python app.py

The Flask API will run at: http://127.0.0.1:5000/

Endpoints:

/ → Serves the frontend page (index.html)

/api/predict → POST endpoint for spam/ham prediction

Example POST request (JSON):
{
  "text": "Win a FREE iPhone now! Click here"
}

Example response:
{
  "ok": true,
  "prediction": "Spam"
}
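A small client for the /api/predict endpoint might look like the sketch below. It uses only the standard library. The helper names are hypothetical, and the Content-Type header is an assumption based on the request being described as JSON; the server may or may not require it.

```python
import json
import urllib.request

# Address and path taken from the README; adjust if the server runs elsewhere.
API_URL = "http://127.0.0.1:5000/api/predict"


def build_predict_request(text, url=API_URL):
    """Build (but do not send) a JSON POST request for the given email text."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def predict(text, url=API_URL, timeout=5.0):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_predict_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, predict("Win a FREE iPhone now! Click here") should return a dict shaped like the example response, with the keys "ok" and "prediction".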

Project Structure

spam-email-classifier/
│
├── backend/
│   ├── app.py
│   ├── train.py
│   ├── predict.py
│   └── models/
│       ├── spam_model.pkl
│       └── vectorizer.pkl
│
├── frontend/
│   ├── index.html
│   ├── style.css
│   └── script.js
│
├── data/
│   └── dataset.csv
├── .venv/
├── requirements.txt
└── README.md

Notes & Tips

Ensure your Python and scikit-learn versions match those used during training. A pickled model may fail to load, or load with warnings, under a different scikit-learn version.

For larger or imbalanced datasets, consider using class weights or other classifiers such as Logistic Regression or Linear SVM.

If you encounter version mismatch warnings, run pip install --upgrade scikit-learn pandas.
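As an illustration of the note on class weights and alternative classifiers, the sketch below swaps Multinomial Naive Bayes for Logistic Regression or a linear SVM inside the same TF-IDF pipeline. The function name build_pipeline and its classifier argument are hypothetical; class_weight="balanced" and max_iter=1000 are example settings, not values taken from the repository.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def build_pipeline(classifier="logreg"):
    """Return an unfitted TF-IDF pipeline with a class-weighted linear classifier."""
    if classifier == "logreg":
        # "balanced" reweights classes inversely to their frequency.
        clf = LogisticRegression(class_weight="balanced", max_iter=1000)
    elif classifier == "svm":
        clf = LinearSVC(class_weight="balanced")
    else:
        raise ValueError(f"unknown classifier: {classifier!r}")
    return Pipeline([("tfidf", TfidfVectorizer()), ("clf", clf)])
```

The returned pipeline is fitted and used the same way as the Naive Bayes one: call fit(texts, labels), then predict(new_texts). Note that LinearSVC does not provide predict_proba, so choose Logistic Regression if probability scores are needed.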
