https://github.com/satyamtripathi8/email_spam-ham_classification
This GitHub repository contains a spam-ham classification project using Naive Bayes (MultinomialNB and BernoulliNB). It processes text using tokenization, stopword removal, and TF-IDF vectorization. The model is evaluated using accuracy, precision, recall, and F1-score. Easily install dependencies and run the script for spam detection. 🚀
classification flask ham machine-learning navie-bayes-algorithm spam
- Host: GitHub
- URL: https://github.com/satyamtripathi8/email_spam-ham_classification
- Owner: satyamtripathi8
- Created: 2024-06-12T06:29:03.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-18T04:47:19.000Z (4 months ago)
- Last Synced: 2025-02-18T05:29:39.576Z (4 months ago)
- Topics: classification, flask, ham, machine-learning, navie-bayes-algorithm, spam
- Language: Jupyter Notebook
- Homepage:
- Size: 238 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Spam-Ham Classification using Naive Bayes
This repository contains a Spam-Ham classification project using Naive Bayes classifiers (`MultinomialNB` and `BernoulliNB`) from `scikit-learn`.
## Table of Contents
- [Introduction](#introduction)
- [Dataset](#dataset)
- [Installation](#installation)
- [Usage](#usage)
- [Model](#model)
- [Evaluation](#evaluation)
- [Contributing](#contributing)
- [License](#license)

## Introduction
Spam detection is a common NLP problem where emails, messages, or text data are classified as either spam or ham (not spam). This project implements a simple classifier using Naive Bayes algorithms.

## Dataset
The dataset used for training and testing consists of labeled text messages indicating whether they are spam or ham. A common dataset used for this task is the [SMS Spam Collection](https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset).

## Installation
Clone the repository and install the dependencies:

```bash
git clone https://github.com/satyamtripathi8/email_spam-ham_classification.git
cd email_spam-ham_classification
pip install -r requirements.txt
```

## Usage
Run the classification script:

```bash
python spam_ham_classifier.py
```

## Model
This project uses two types of Naive Bayes classifiers:
1. **MultinomialNB** - Best suited for text classification problems with term frequency features.
2. **BernoulliNB** - Works well with binary term occurrence features.

The text data is processed using the following steps:
- Tokenization
- Stopword removal
- TF-IDF vectorization

## Evaluation
The model is evaluated using common metrics such as:
- Accuracy
- Precision
- Recall
- F1-score

Results are printed at the end of the script.
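
The preprocessing, model, and evaluation steps described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the toy messages and the `evaluate` helper are hypothetical, and it assumes that `TfidfVectorizer(stop_words="english")` is an acceptable stand-in for the tokenization, stopword removal, and TF-IDF steps. Note that `BernoulliNB` binarizes its input by default, so it only sees whether a term occurs, not its TF-IDF weight.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy messages standing in for the real dataset (1 = spam, 0 = ham).
train_texts = [
    "Win a free prize now, claim your cash reward",
    "Free entry in a weekly competition, text WIN to claim",
    "Urgent! Your account has won a cash bonus",
    "Are we still meeting for lunch tomorrow?",
    "Please send me the report before the meeting",
    "Thanks for dinner last night, see you soon",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = [
    "Claim your free cash prize today",
    "Can you send the lunch menu before tomorrow?",
]
test_labels = [1, 0]


def evaluate(classifier):
    """Fit a TF-IDF + Naive Bayes pipeline and return the four metrics."""
    # TfidfVectorizer tokenizes, drops English stopwords, and computes TF-IDF weights.
    model = make_pipeline(TfidfVectorizer(stop_words="english"), classifier)
    model.fit(train_texts, train_labels)
    predicted = model.predict(test_texts)
    precision, recall, f1, _ = precision_recall_fscore_support(
        test_labels, predicted, average="binary", pos_label=1, zero_division=0
    )
    return {
        "accuracy": accuracy_score(test_labels, predicted),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Evaluate both classifiers on the same split and print the scores.
results = {type(clf).__name__: evaluate(clf) for clf in (MultinomialNB(), BernoulliNB())}
for name, scores in results.items():
    print(name, {metric: round(float(value), 3) for metric, value in scores.items()})
```

With a real dataset, the toy lists would be replaced by a train/test split of the labeled messages. Scores on six training messages are not meaningful.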
## Contributing
Contributions are welcome! Feel free to fork the repository and submit pull requests.

## License
This project is licensed under the MIT License.