Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/andrewsy1004/logistic-regression-spam-classifier
This project implements a spam email classifier using Logistic Regression.
numpy pandas scikit-learn
Last synced: 27 days ago
- Host: GitHub
- URL: https://github.com/andrewsy1004/logistic-regression-spam-classifier
- Owner: Andrewsy1004
- Created: 2024-12-15T16:15:43.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2024-12-15T16:23:00.000Z (about 1 month ago)
- Last Synced: 2024-12-15T17:26:52.667Z (about 1 month ago)
- Topics: numpy, pandas, scikit-learn
- Language: Jupyter Notebook
- Homepage:
- Size: 205 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# 📬 Logistic Regression Spam Classifier
This project implements a **Spam Email Classifier** using **Logistic Regression**, trained on a dataset of SMS messages. The model distinguishes between **ham** (non-spam) and **spam** messages. This project demonstrates how to process text data, apply machine learning, and evaluate model performance.
## 🚀 Features:
- **Text Preprocessing**: The text data is cleaned and transformed using **TF-IDF Vectorization**, which converts the raw text into numerical feature vectors.
- **Model Training**: A **Logistic Regression** model is trained to classify SMS messages as either "ham" or "spam".
- **Model Evaluation**: Performance metrics such as **accuracy**, **precision**, **recall**, and **F1-score** are used to evaluate the model's effectiveness.
## 📊 Steps:
1. **Data Preprocessing**:
   - The dataset is cleaned by removing stop words and converting all text to lowercase.
   - The text is transformed into numerical features using the **TF-IDF** vectorizer.
2. **Training**:
   - The **Logistic Regression** model is trained on the processed data.
3. **Evaluation**:
   - The model is evaluated on both training and test datasets using multiple performance metrics (accuracy, precision, recall, F1-score).
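The three steps above could look roughly like the sketch below. It is not the notebook's actual code. The tiny inline dataset, the column names `label` and `message`, the helper `train_and_evaluate`, and the parameter values (`test_size`, `random_state`, `max_iter`) are all assumptions made for illustration. Lowercasing and stop-word removal are delegated to `TfidfVectorizer` here, which is one possible way to implement the cleaning step.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the SMS dataset (column names are assumed).
data = pd.DataFrame({
    "label": ["spam", "ham"] * 6,
    "message": [
        "WIN a FREE prize now, click the link", "Are we still meeting for lunch today?",
        "Urgent! Claim your cash reward today", "Can you call me when you get home?",
        "Free entry to win tickets, text WIN", "I will be late, traffic is terrible",
        "You have been selected for a free phone", "Thanks for dinner, see you tomorrow",
        "Cheap loans approved, reply YES now", "Did you finish the report for class?",
        "Congratulations, you won a free holiday", "Happy birthday, hope you have a great day",
    ],
})


def train_and_evaluate(df, test_size=0.25, seed=42):
    """Vectorize with TF-IDF, fit logistic regression, score train and test splits."""
    y = (df["label"] == "spam").astype(int)  # spam -> 1, ham -> 0
    X_train, X_test, y_train, y_test = train_test_split(
        df["message"], y, test_size=test_size, random_state=seed, stratify=y
    )

    # Step 1: lowercase, drop English stop words, build TF-IDF features.
    # The vectorizer is fitted on the training split only to avoid leakage.
    vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
    X_train_vec = vectorizer.fit_transform(X_train)
    X_test_vec = vectorizer.transform(X_test)

    # Step 2: train the classifier.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train_vec, y_train)

    # Step 3: compute the four metrics on a given split.
    def score(X, y_true):
        y_pred = model.predict(X)
        return {
            "accuracy": accuracy_score(y_true, y_pred),
            "precision": precision_score(y_true, y_pred, zero_division=0),
            "recall": recall_score(y_true, y_pred, zero_division=0),
            "f1": f1_score(y_true, y_pred, zero_division=0),
        }

    metrics = {"train": score(X_train_vec, y_train), "test": score(X_test_vec, y_test)}
    return vectorizer, model, metrics


vectorizer, model, metrics = train_and_evaluate(data)
for split, values in metrics.items():
    print(split, {name: round(value, 3) for name, value in values.items()})
```

With only a dozen messages the scores carry no meaning. The point is the order of operations: split first, fit the vectorizer on the training split, then reuse the same fitted vectorizer for the test split and for any new message passed to `model.predict`.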
## 📋 Dependencies:
- `pandas`: For data manipulation and handling.
- `numpy`: For numerical operations.
- `scikit-learn`: For machine learning models, including logistic regression and vectorization.
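Before opening the notebook, it can help to confirm that these three packages are available in the active environment. The snippet below is a small sketch using only the standard library. The helper name `installed_versions` is hypothetical, and the README does not state any required versions, so none are checked.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as listed in the Dependencies section.
REQUIRED = ("pandas", "numpy", "scikit-learn")


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions()
for name, ver in versions.items():
    status = ver if ver is not None else "not installed"
    print(f"{name}: {status}")
```

Any package reported as not installed can typically be added with `pip install` followed by its name.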