https://github.com/selcia25/text-classification

📝A text classification project at SSN College Of Engineering, using the Naive Bayes algorithm to detect spam emails and explore features and algorithms for efficient detection.
https://github.com/selcia25/text-classification

naive-bayes-classifier spam-detection text-classification

Last synced: about 1 month ago
JSON representation

📝A text classification project at SSN College Of Engineering, using the Naive Bayes algorithm to detect spam emails and explore features and algorithms for efficient detection.

Host: GitHub
URL: https://github.com/selcia25/text-classification
Owner: selcia25
Created: 2023-10-16T16:07:53.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2023-10-16T16:19:33.000Z (about 2 years ago)
Last Synced: 2025-02-21T23:18:12.707Z (10 months ago)
Topics: naive-bayes-classifier, spam-detection, text-classification
Language: Jupyter Notebook
Homepage:
Size: 3.29 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# Text Classification Using Naive Bayes Algorithm

This project focuses on building a text classification system to detect spam emails using the Naive Bayes algorithm. It includes steps to load a dataset, split it into training and testing sets, and use Count Vectorization to convert text data into numerical features for machine learning. The project also evaluates the model's performance using accuracy, precision, recall, and F1 score metrics.

## Prerequisites

Before running the code, make sure you have the following Python packages installed:

- [pandas](https://pypi.org/project/pandas/)
- [scikit-learn](https://scikit-learn.org/stable/)

You can install them using pip:

pip install pandas
pip install scikit-learn

## Usage

1. **Load the dataset**: The code loads the dataset from a CSV file named 'emails.csv'. Ensure that you have this file in the same directory as your code or provide the correct file path.

2. **Data Split**: The dataset is split into a training set and a test set using a default ratio of 3:1. You can adjust the split ratio as needed.

3. **Count Vectorization**: The text data is transformed into numerical features using the CountVectorizer method. The vocabulary is learned from the training data and applied to both the training and testing data for consistency.

4. **Training the Model**: A Multinomial Naive Bayes model is trained on the training data.

5. **Model Evaluation**: The model's performance is evaluated using the following metrics:
- Accuracy
- Precision
- Recall
- F1 score

6. **Testing the Model**: The code provides examples of email texts to classify as spam or not spam using the trained Naive Bayes model.

## Example Emails

Here are some example emails to test the classification model:

1. "Subject: Meeting Agenda for Tomorrow" - Not Spam
2. "Subject: Get a Free iPhone X" - Spam
3. "Subject: Invitation to a Birthday Party" - Not Spam
4. "Subject: Important Updates on Your Account" - Spam
5. "Subject: Dinner Plans for This Friday" - Not Spam
6. "Subject: Claim Your Prize Now!" - Spam

## Run the Code

You can run the code provided in your Python environment. Make sure to adapt the file paths and data as needed. After running the code, it will classify the example emails and print the results.

Feel free to modify the code for your specific use case or dataset.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/selcia25/text-classification

Awesome Lists containing this project

README