https://github.com/rajveersinghcse/email-classification

📧Email classification using a Machine Learning Models. It categorizes emails as either "ABUSIVE" or "NON ABUSIVE" based on their content, allowing users to quickly assess the nature of the email messages they input.
https://github.com/rajveersinghcse/email-classification

email email-classification internship project

Last synced: about 2 months ago
JSON representation

Host: GitHub
URL: https://github.com/rajveersinghcse/email-classification
Owner: rajveersinghcse
License: mit
Created: 2023-09-01T05:23:49.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2024-03-24T05:58:22.000Z (over 2 years ago)
Last Synced: 2025-03-01T18:25:35.474Z (over 1 year ago)
Topics: email, email-classification, internship, project
Language: Jupyter Notebook
Homepage:
Size: 36.2 MB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

---

# Email Classification for Abusive Content Detection

This project focuses on building a classification model to distinguish between abusive and non-abusive emails. With the increasing prevalence of online harassment and offensive communication, automated systems for identifying abusive content have become essential.

## Table of Contents

- [Introduction](#introduction)
- [Dataset](#dataset)
- [Approach](#approach)
- [Dependencies](#dependencies)
- [Results](#results)
- [Contributing](#contributing)
- [License](#license)

## Introduction

This project aims to develop a machine-learning model that accurately categorizes emails into abusive and non-abusive categories. This can be particularly useful for email providers, social media platforms, and other online communication platforms to filter out harmful content and ensure a safer environment for their users.

## Dataset

The dataset used for training and evaluation comprises a diverse collection of emails labeled as abusive or non-abusive. The dataset has been preprocessed to remove personally identifiable information and sensitive content.

## Approach

The classification model is built using natural language processing (NLP) techniques and machine learning algorithms. The process involves:

1. **Data Preprocessing**: Cleaning and tokenizing the text data, removing stopwords, and performing other necessary preprocessing steps.
2. **Feature Engineering**: Extracting relevant features from the text data, such as TF-IDF vectors or word embeddings.
3. **Model Selection**: Evaluating various classification algorithms such as Naive Bayes, SVM, and neural networks to determine the most effective approach.
4. **Training and Evaluation**: Training the selected model on the labeled dataset and evaluating its performance using metrics such as accuracy, precision, recall, and F1-score.
5. **Deployment**: Integrating the trained model into an application or service for real-time classification of incoming emails.

## Dependencies

- Python 3.x
- scikit-learn
- NLTK
- Pandas
- NumPy

## Results

The performance of the model on the test dataset is as follows:

- Passive Aggressive Classifier--------->99.56%
- Naive Bayes--------------------------->97.10%
- TFIDF---------------------------------->99.61%
- TFIDF: Bigrams------------------------>99.71%
- TFIDF: Trigrams------------------------>99.71%

## Contributing

Contributions to this project are welcome. If you have any suggestions for improvements or would like to report issues, please submit a pull request or open an issue on GitHub.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/rajveersinghcse/email-classification

Awesome Lists containing this project

README