https://github.com/beenish-ishtiaq/dep-task-2-spam-email-classifier

This project focuses on building a classifier to distinguish between spam and ham emails using Logistic Regression. Key steps include data preprocessing, feature extraction with TF-IDF vectorization, and model evaluation with accuracy metrics and a confusion matrix.

data-science email-filtering logistic-regression machine-learning natural-language-processing python spam-detection text-classification tfidf-vectorizer

# Spam Email Classifier
## Overview
This repository contains code for a spam email classifier developed as part of my Machine Learning internship at Digital Empowerment Network. The goal of this project is to classify emails as spam or ham (not spam) using machine learning techniques.

## Dataset
The dataset used for this project is the mail_data.csv file, which contains the following columns:
1. Category: The label of the email, either 'spam' or 'ham'.
2. Message: The content of the email.
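A minimal sketch of reading a file with these two columns, using only the Python standard library. The function name `load_mail_data` and the two sample rows are hypothetical stand-ins for illustration; they are not taken from mail_data.csv or from the notebook.

```python
import csv
import io


def load_mail_data(csv_file):
    """Read rows with 'Category' and 'Message' columns into (label, text) pairs."""
    reader = csv.DictReader(csv_file)
    return [(row["Category"], row["Message"]) for row in reader]


# Hypothetical stand-in for mail_data.csv with the two documented columns.
SAMPLE_CSV = io.StringIO(
    "Category,Message\n"
    "ham,Are we still meeting for lunch today?\n"
    'spam,"WINNER! Claim your free prize now, call today"\n'
)

rows = load_mail_data(SAMPLE_CSV)
for label, text in rows:
    print(label, "|", text)
```

With the real file, the same function could be called on `open("mail_data.csv", newline="", encoding="utf-8")`; the encoding is an assumption and may need adjusting.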

## Steps Involved
### 1. Data Preprocessing
Cleaning the data and preparing it for model training.
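The README does not list the exact cleaning steps, so the following is only a sketch of what this stage commonly involves: dropping empty messages, encoding the labels as integers, and holding out part of the data for testing. The encoding (spam = 1, ham = 0), the 80/20 split, the seed, and the name `preprocess` are all assumptions made for illustration.

```python
import random


def preprocess(rows, test_fraction=0.2, seed=3):
    """Clean (label, text) pairs and split them into train and test lists."""
    cleaned = []
    for label, text in rows:
        text = (text or "").strip()  # treat missing messages as empty
        if not text:
            continue  # skip rows with no usable content
        # Assumed encoding: spam -> 1, everything else (ham) -> 0
        cleaned.append((text, 1 if label.strip().lower() == "spam" else 0))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(cleaned)
    n_test = max(1, int(len(cleaned) * test_fraction))
    return cleaned[n_test:], cleaned[:n_test]


# Hypothetical rows, including one missing message that gets dropped.
rows = [
    ("ham", "See you at 5"),
    ("spam", "Free prize!!"),
    ("ham", None),
    ("ham", " ok "),
    ("spam", "Win cash now"),
    ("ham", "Call me later"),
]
train, test = preprocess(rows)
print(len(train), "training rows,", len(test), "test rows")
```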
### 2. Feature Extraction
Transforming the text data into numerical features using TF-IDF vectorization.
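To show what TF-IDF computes, here is a small standard-library sketch. It uses the smoothed IDF formula ln((1 + n) / (1 + df)) + 1 followed by L2 normalization, which scikit-learn documents as the defaults of its TfidfVectorizer. The tokenizer is simplified, the three documents are made up, and the notebook presumably uses a library vectorizer rather than code like this.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (simplified)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every term in docs."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def transform(doc, idf):
    """Return an L2-normalized sparse TF-IDF vector as a {term: weight} dict."""
    counts = Counter(t for t in tokenize(doc) if t in idf)  # ignore unseen terms
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


# Hypothetical mini-corpus for illustration only.
docs = ["free prize call now", "call me now", "lunch at noon"]
idf = fit_idf(docs)
vec = transform("free prize call now", idf)
print(vec)
```

Terms that appear in fewer documents ("free", "prize") end up with larger weights than terms shared across documents ("call", "now").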
### 3. Model Training
Training a Logistic Regression model to classify emails.
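The sketch below illustrates the model form behind Logistic Regression: a weighted sum of features passed through a sigmoid, fitted here with plain batch gradient descent. This is not the notebook's training code. A library implementation would typically use a different solver and add regularization, and the feature vectors, learning rate, and epoch count here are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, weights, bias):
    """Probability that the sparse feature vector x belongs to class 1 (spam)."""
    z = bias + sum(weights.get(t, 0.0) * v for t, v in x.items())
    return sigmoid(z)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights, bias = {}, 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w, grad_b = {}, 0.0
        for x, label in zip(X, y):
            err = predict_proba(x, weights, bias) - label  # gradient of loss w.r.t. z
            grad_b += err
            for t, v in x.items():
                grad_w[t] = grad_w.get(t, 0.0) + err * v
        bias -= lr * grad_b / n
        for t, g in grad_w.items():
            weights[t] = weights.get(t, 0.0) - lr * g / n
    return weights, bias


# Hypothetical TF-IDF-like vectors: two spam (1) and two ham (0) messages.
X = [
    {"free": 0.7, "prize": 0.7},
    {"win": 0.8, "free": 0.6},
    {"lunch": 0.8, "today": 0.6},
    {"meeting": 0.7, "today": 0.7},
]
y = [1, 1, 0, 0]
weights, bias = train_logreg(X, y)
print(round(predict_proba({"free": 1.0}, weights, bias), 3))
```

Terms seen only in spam receive positive weights and terms seen only in ham receive negative ones, which is what pushes the predicted probability above or below 0.5.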
### 4. Model Evaluation
Evaluating the model's performance using accuracy, classification report, and confusion matrix.
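As an illustration of these metrics, the following sketch computes a confusion matrix, accuracy, and the precision, recall, and F1 values that a classification report typically lists for the spam class. The labels and predictions are made up, spam is assumed to be encoded as 1, and the helper names are hypothetical.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]]: rows are actual classes, columns predicted."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


def report(y_true, y_pred):
    """Accuracy plus precision, recall, and F1 for the spam class (label 1)."""
    (tn, fp), (fn, tp) = confusion_matrix(y_true, y_pred)
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tn + tp) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels and predictions (1 = spam, 0 = ham).
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0]
print(confusion_matrix(y_true, y_pred))
print(report(y_true, y_pred))
```

If ham heavily outnumbers spam, accuracy alone can look good even when many spam messages are missed, which is why the confusion matrix and per-class metrics are worth reading alongside it.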

## Results
The Logistic Regression model achieved the following results:
- Accuracy on training data: 0.967
- Accuracy on test data: 0.965

## Usage
To run the code, follow these steps:
1. Clone this repository to your local machine.
2. Navigate to the directory containing the code.
3. Ensure that the mail_data.csv file is in the same directory as the code.
4. Open the notebook in Jupyter and run all cells: jupyter notebook spam_email_classifier.ipynb

## Conclusion
This project demonstrates the process of building a spam email classifier using Logistic Regression. The model classifies emails as spam or ham based on their content, reaching an accuracy of about 0.965 on the test data. Future improvements could include experimenting with different models and techniques to further improve accuracy.