https://github.com/shubhamgoyal575/spam_detective
This project uses machine learning to classify messages as spam or ham based on text analysis. It includes data preprocessing, feature extraction (TF-IDF), and classification models like Logistic Regression and Naive Bayes for accurate spam detection. Built with Python and Scikit-Learn. 🚀
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/shubhamgoyal575/spam_detective
- Owner: shubhamgoyal575
- Created: 2025-01-20T08:03:31.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-01-26T05:35:17.000Z (9 months ago)
- Last Synced: 2025-06-25T05:11:32.511Z (4 months ago)
- Topics: count-vectorizer, data-analysis, data-analytics, data-cleaning, data-preprocessing, data-science, data-visualization, data-wrangling, exploratory-data-analysis, logistic-regression, machine-learning, machine-learning-algorithms, naive-bayes, natural-language-processing, spam-detection, tfidf-vectorizer
- Language: Jupyter Notebook
- Homepage:
- Size: 1.69 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## 📧 Spam Classifier - NLP Project
Welcome to the Spam Classifier project! This repository contains an end-to-end implementation of a machine learning model that predicts whether a given message is spam or ham. The project uses Natural Language Processing (NLP) techniques to process and classify text data, with predictions performed using a trained machine learning model.

## 📖 Project Overview
The goal of this project is to classify messages into two categories:
- **Spam:** Unwanted or unsolicited messages, often promotional or fraudulent.
- **Ham:** Genuine, non-spam messages.
This project demonstrates the use of NLP techniques and machine learning to build a robust and accurate spam classifier.

## 🛠️ Tools and Technologies Used
**Programming Language:** Python

**Libraries:**
- numpy and pandas for data manipulation
- scikit-learn for building and evaluating the machine learning model
- nltk and re for text preprocessing
- matplotlib and seaborn for data visualization

**Environment:** Jupyter Notebook

## 🧑‍💻 Key Steps in the Project
### Data Collection:
The dataset contains labeled text messages, where each message is marked as either "spam" or "ham."

### Data Preprocessing:
- Removing unnecessary characters, punctuation, and stopwords.
- Tokenizing the text into individual words.
- Converting words into their base form using lemmatization or stemming.

### Feature Extraction:
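Feature extraction starts from cleaned text. As a rough illustration of the preprocessing steps listed above (character removal, tokenization, stopword removal, and reduction to a base form), the sketch below uses only the standard library. The `STOPWORDS` set and the `naive_stem` and `preprocess` helpers are hypothetical stand-ins rather than the notebook's code; the project lists nltk for this work, and a real stemmer or lemmatizer would behave differently.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only;
# nltk's English stopword corpus is much larger.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "you", "your", "and", "of", "for", "in", "on"}


def naive_stem(word: str) -> str:
    """Crude suffix stripping; a stand-in for a real stemmer or lemmatizer."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(message: str) -> list[str]:
    """Lowercase, drop non-letters, tokenize, remove stopwords, then stem."""
    text = message.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # digits and punctuation become spaces
    tokens = text.split()                   # whitespace tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [naive_stem(t) for t in tokens]


print(preprocess("Congratulations! You have WON 2 free tickets."))
```

Joining the returned tokens with spaces gives a cleaned string that a vectorizer can consume.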
Used Count Vectorizer and TF-IDF Vectorizer to convert text data into numerical form suitable for machine learning models.

### Model Selection and Training:
Trained various machine learning models, including:
- Naive Bayes Classifier
- Logistic Regression
- Support Vector Machine (SVM)

Selected the best-performing model based on accuracy, precision, recall, and F1-score.
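The comparison described above might look roughly like the following sketch. It assumes TF-IDF features, scikit-learn's `MultinomialNB`, `LogisticRegression`, and `LinearSVC` as the three candidates, and F1-score as the tie-breaking criterion. The toy messages, the split ratio, and the hyperparameters are all hypothetical, and the notebook may use different estimators or settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for the labeled dataset (1 = spam, 0 = ham).
spam = [
    "win a free prize now", "claim your free cash reward",
    "urgent offer call now to win", "free entry in a weekly prize draw",
    "you have won a cash prize", "cheap loans apply now",
    "exclusive offer claim your reward", "call now for a free gift",
]
ham = [
    "are we still meeting for lunch", "see you at the office tomorrow",
    "thanks for sending the notes", "can you call me after class",
    "happy birthday have a great day", "the meeting moved to noon",
    "i will be home late tonight", "let me know when you arrive",
]
messages = spam + ham
labels = [1] * len(spam) + [0] * len(ham)

X_train, X_test, y_train, y_test = train_test_split(
    messages, labels, test_size=0.25, stratify=labels, random_state=42
)

candidates = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM (linear)": LinearSVC(),  # assumption: a linear SVM variant
}

scores = {}
for name, clf in candidates.items():
    # Vectorizer and classifier are fitted together on the training split only.
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
    }

# Pick the candidate with the highest F1-score (an assumed selection rule).
best = max(scores, key=lambda n: scores[n]["f1"])
print(best, scores[best])
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary from being learned on test messages, which would otherwise inflate the scores.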
### Model Evaluation:
Evaluated the model on a test dataset.
Visualized performance metrics such as confusion matrix, ROC curve, and classification report.

### Prediction:
Built a function to predict whether a new message is spam or ham using the trained model.

## 📊 Results
- Accuracy: 98% on the test dataset (update with your result)
- Precision: 100%

The model demonstrated strong performance in distinguishing between spam and ham messages.
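
For readers who want to reproduce this kind of evaluation, the sketch below shows one way the confusion matrix, the classification report, and a prediction helper could be put together with scikit-learn. The training and test messages, the choice of `MultinomialNB` with TF-IDF, and the `predict_message` name are assumptions for illustration. It does not reproduce the figures reported above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data, not the project's dataset.
train_messages = [
    "win a free prize now",
    "free cash claim your prize",
    "urgent claim free reward now",
    "see you at lunch tomorrow",
    "can we move the meeting",
    "thanks for the notes today",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_messages, train_labels)


def predict_message(message: str) -> str:
    """Return 'spam' or 'ham' for a single new message."""
    return str(model.predict([message])[0])


# Hypothetical held-out messages for evaluation.
test_messages = [
    "claim your free prize",
    "free reward now",
    "see you at the meeting tomorrow",
    "thanks for lunch",
]
test_labels = ["spam", "spam", "ham", "ham"]

predictions = model.predict(test_messages)

# Rows are true labels, columns are predicted labels, in the order given.
cm = confusion_matrix(test_labels, predictions, labels=["ham", "spam"])
report = classification_report(test_labels, predictions, zero_division=0)

print(cm)
print(report)
print(predict_message("claim your free prize"))
```

In the confusion matrix, precision for spam is the bottom-right count divided by the sum of the right-hand column, so a precision of 100% means no ham message was flagged as spam.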