https://github.com/rajveersinghcse/email-classification
📧Email classification using a Machine Learning Models. It categorizes emails as either "ABUSIVE" or "NON ABUSIVE" based on their content, allowing users to quickly assess the nature of the email messages they input.
https://github.com/rajveersinghcse/email-classification
email email-classification internship project
Last synced: over 1 year ago
JSON representation
📧Email classification using a Machine Learning Models. It categorizes emails as either "ABUSIVE" or "NON ABUSIVE" based on their content, allowing users to quickly assess the nature of the email messages they input.
- Host: GitHub
- URL: https://github.com/rajveersinghcse/email-classification
- Owner: rajveersinghcse
- License: mit
- Created: 2023-09-01T05:23:49.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-03-24T05:58:22.000Z (about 2 years ago)
- Last Synced: 2025-01-12T09:28:05.733Z (over 1 year ago)
- Topics: email, email-classification, internship, project
- Language: Jupyter Notebook
- Homepage:
- Size: 36.2 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
---
# Email Classification for Abusive Content Detection
This project focuses on building a classification model to distinguish between abusive and non-abusive emails. With the increasing prevalence of online harassment and offensive communication, automated systems for identifying abusive content have become essential.
## Table of Contents
- [Introduction](#introduction)
- [Dataset](#dataset)
- [Approach](#approach)
- [Dependencies](#dependencies)
- [Results](#results)
- [Contributing](#contributing)
- [License](#license)
## Introduction
This project aims to develop a machine-learning model that accurately categorizes emails into abusive and non-abusive categories. This can be particularly useful for email providers, social media platforms, and other online communication platforms to filter out harmful content and ensure a safer environment for their users.
## Dataset
The dataset used for training and evaluation comprises a diverse collection of emails labeled as abusive or non-abusive. The dataset has been preprocessed to remove personally identifiable information and sensitive content.
## Approach
The classification model is built using natural language processing (NLP) techniques and machine learning algorithms. The process involves:
1. **Data Preprocessing**: Cleaning and tokenizing the text data, removing stopwords, and performing other necessary preprocessing steps.
2. **Feature Engineering**: Extracting relevant features from the text data, such as TF-IDF vectors or word embeddings.
3. **Model Selection**: Evaluating various classification algorithms such as Naive Bayes, SVM, and neural networks to determine the most effective approach.
4. **Training and Evaluation**: Training the selected model on the labeled dataset and evaluating its performance using metrics such as accuracy, precision, recall, and F1-score.
5. **Deployment**: Integrating the trained model into an application or service for real-time classification of incoming emails.
## Dependencies
- Python 3.x
- scikit-learn
- NLTK
- Pandas
- NumPy
## Results
The performance of the model on the test dataset is as follows:
- Passive Aggressive Classifier--------->99.56%
- Naive Bayes--------------------------->97.10%
- TFIDF---------------------------------->99.61%
- TFIDF: Bigrams------------------------>99.71%
- TFIDF: Trigrams------------------------>99.71%
## Contributing
Contributions to this project are welcome. If you have any suggestions for improvements or would like to report issues, please submit a pull request or open an issue on GitHub.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
---