https://github.com/andrewsy1004/logistic-regression-spam-classifier

This project implements a spam email classifier using Logistic Regression.

numpy pandas scikit-learn

Host: GitHub
URL: https://github.com/andrewsy1004/logistic-regression-spam-classifier
Owner: Andrewsy1004
Created: 2024-12-15T16:15:43.000Z (7 months ago)
Default Branch: main
Last Pushed: 2024-12-15T16:23:00.000Z (7 months ago)
Last Synced: 2025-04-06T13:19:01.456Z (3 months ago)
Topics: numpy, pandas, scikit-learn
Language: Jupyter Notebook
Homepage:
Size: 205 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# 📬 Logistic Regression Spam Classifier

This project implements a **spam classifier** using **Logistic Regression**, trained on a dataset of SMS messages. The model labels each message as either **ham** (non-spam) or **spam**. The project demonstrates how to preprocess text data, train a machine learning model, and evaluate its performance.

## 🚀 Features:
- **Text Preprocessing**: The text data is cleaned and transformed using **TF-IDF Vectorization**, which converts the raw text into numerical feature vectors.
- **Model Training**: A **Logistic Regression** model is trained to classify SMS messages as either "ham" or "spam".
- **Model Evaluation**: Performance metrics such as **accuracy**, **precision**, **recall**, and **F1-score** are used to evaluate the model's effectiveness.
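As a rough illustration of what the four evaluation metrics measure, the sketch below computes them by hand for a binary ham/spam problem. The helper `binary_metrics` and the eight labels are hypothetical and are not taken from the notebook. scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` compute the same quantities.

```python
def binary_metrics(y_true, y_pred, positive="spam"):
    """Return accuracy, precision, recall and F1, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # spam caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # ham flagged as spam
    fn = sum(t == positive and p != positive for t, p in pairs)  # spam missed
    tn = len(pairs) - tp - fp - fn                               # ham left alone

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for eight messages (illustration only).
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "ham", "ham", "ham", "ham", "spam"]

scores = binary_metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Because ham messages usually outnumber spam, accuracy alone can look good even when much of the spam is missed, which is why precision, recall, and F1 are reported alongside it.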

## 📊 Steps:
1. **Data Preprocessing**:
   - The dataset is cleaned by removing stop words and converting all text to lowercase.
   - The text is transformed into numerical features using the **TF-IDF** vectorizer.

2. **Training**:
   - The **Logistic Regression** model is trained on the processed data.

3. **Evaluation**:
   - The model is evaluated on both training and test datasets using multiple performance metrics (accuracy, precision, recall, F1-score).
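The three steps above can be sketched end to end with scikit-learn. This is a minimal sketch, not the notebook's actual code. The messages are invented placeholders standing in for the SMS dataset. The split ratio, `random_state`, `max_iter`, and the choice to handle lowercasing and stop-word removal inside `TfidfVectorizer` are all assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Invented placeholder messages; the real project loads an SMS dataset instead.
messages = [
    "WIN a FREE prize now, click here", "Are we still meeting for lunch today?",
    "URGENT: claim your cash reward", "Can you send me the report tonight?",
    "Free entry in a weekly prize draw", "I will call you after the meeting",
    "You have won a free holiday, reply now", "Happy birthday, see you at dinner",
    "Cheap loans approved, claim cash today", "Running late, start without me",
    "Congratulations, you won a cash prize", "Thanks for the lift this morning",
    "Reply WIN to claim your free phone", "Did you finish the homework?",
    "Exclusive offer: free ringtone, text now", "Let me know when you get home",
]
labels = ["spam", "ham"] * 8  # messages alternate spam, ham

# Hold out a quarter of the messages, keeping the ham/spam ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    messages, labels, test_size=0.25, stratify=labels, random_state=42
)

# Step 1 (lowercase, drop English stop words, TF-IDF) and step 2 (logistic
# regression) are chained so the vectorizer is fitted on the training split only.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

# Step 3: report the same four metrics on the training and the test split.
results = {}
for split, X, y in [("train", X_train, y_train), ("test", X_test, y_test)]:
    pred = model.predict(X)
    results[split] = {
        "accuracy": accuracy_score(y, pred),
        "precision": precision_score(y, pred, pos_label="spam", zero_division=0),
        "recall": recall_score(y, pred, pos_label="spam", zero_division=0),
        "f1": f1_score(y, pred, pos_label="spam", zero_division=0),
    }
    print(split, results[split])
```

Comparing the training and test rows gives a quick check for overfitting: a large gap suggests the model has memorized the training messages rather than learned general spam cues.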

## 📋 Dependencies:
- `pandas`: For data manipulation and handling.
- `numpy`: For numerical operations.
- `scikit-learn`: For TF-IDF vectorization, the logistic regression model, and the evaluation metrics.
