https://github.com/ahmed-ai-01/nlp
this repo will include all my work regarding NLP
- Host: GitHub
- URL: https://github.com/ahmed-ai-01/nlp
- Owner: Ahmed-AI-01
- License: mit
- Created: 2024-02-14T00:20:49.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-02-14T00:23:59.000Z (over 1 year ago)
- Last Synced: 2025-02-12T07:56:02.387Z (8 months ago)
- Topics: kaggle, kaggle-dataset, nlp, nlp-deep-learning, nlp-library, nlp-machine-learning, nltk, nltk-library, nltk-python, pandas, pandas-python, sklearn, sklearn-classify, sklearn-library, sklearn-metrics, sklearn-pipeline
- Language: Jupyter Notebook
- Size: 50.8 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Email Classification using NLP
This project focuses on classifying emails into spam and non-spam categories using Natural Language Processing (NLP) techniques. We'll preprocess the text data, visualize the label distribution, perform feature engineering, and train machine learning models for classification.
## Dataset
The dataset used for this project can be found on Kaggle: [Email Classification - NLP](https://www.kaggle.com/datasets/datatattle/email-classification-nlp)
### Columns
1. **Message Body**: Contains the email content.
2. **Label**: Indicates whether the email is spam or non-spam.
## Tasks
### Task 1: Text Cleaning and Preprocessing
- Load the training and testing datasets.
- Check for missing values and remove any rows that contain them.
- Convert all text to lowercase.
- Remove stop words.
- Remove punctuation.
- Perform stemming or lemmatization.
### Task 2: Data Visualization
- Visualize the distribution of the labels in the training dataset using a histogram, bar chart, or pie chart.
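
Tasks 1 and 2 can be sketched together using only the Python standard library. The sample rows, the label strings, the `STOP_WORDS` set, and the `naive_stem` helper are all illustrative assumptions rather than the notebook's actual code; a real run would load the Kaggle CSV files and would more likely rely on NLTK's stop-word list and a proper stemmer or lemmatizer, with a plotting library for the chart.

```python
import string
from collections import Counter

# Hypothetical sample rows (message body, label); real data comes from the Kaggle CSV files.
rows = [
    ("Congratulations! You have WON a free prize, click now.", "spam"),
    ("Are we still meeting for lunch tomorrow?", "non-spam"),
    (None, "non-spam"),  # missing message body, dropped below
    ("Claim your free tickets today!!!", "spam"),
    ("I am sending the reports this evening.", "non-spam"),
]

# Tiny illustrative stop-word list; NLTK ships a much fuller one.
STOP_WORDS = {"a", "am", "are", "for", "have", "i", "the", "this", "we", "you", "your"}


def naive_stem(word):
    """Crude suffix stripping, a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean(text):
    """Lowercase, strip punctuation, drop stop words, then stem each token."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [naive_stem(tok) for tok in text.split() if tok not in STOP_WORDS]


# Task 1: drop rows with missing values, then clean the remaining messages.
cleaned = [
    (clean(body), label)
    for body, label in rows
    if body is not None and label is not None
]

# Task 2: label distribution, printed as a text bar chart.
label_counts = Counter(label for _, label in cleaned)
for label, count in sorted(label_counts.items()):
    print(f"{label:<8} {'#' * count} ({count})")
```

The text bar chart stands in for the histogram, bar chart, or pie chart mentioned above; the same `label_counts` mapping could be passed to a plotting library instead.
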
### Task 3: Feature Engineering
- Apply text representation techniques:
  - Bag of Words
  - TF-IDF
### Task 4: Model Training
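
Both representations turn each cleaned message into a numeric vector that a classifier can consume. As a rough sketch of what they compute, here is a standard-library version on a hypothetical three-document corpus. The smoothed IDF formula is the one scikit-learn's `TfidfVectorizer` uses by default, but this sketch skips its row normalization; in practice `CountVectorizer` and `TfidfVectorizer` would be used rather than hand-rolled code.

```python
import math
from collections import Counter

# Hypothetical, already-cleaned messages (one token list per email).
docs = [
    ["free", "prize", "click", "free"],
    ["lunch", "meeting", "tomorrow"],
    ["free", "lunch", "tomorrow"],
]

# The vocabulary fixes the column order of both matrices.
vocab = sorted({tok for doc in docs for tok in doc})


def bag_of_words(doc):
    """Raw term counts, one entry per vocabulary word."""
    counts = Counter(doc)
    return [counts[word] for word in vocab]


bow = [bag_of_words(doc) for doc in docs]

# Document frequency: number of documents that contain each word.
n_docs = len(docs)
df = {word: sum(word in doc for doc in docs) for word in vocab}

# Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
idf = {word: math.log((1 + n_docs) / (1 + df[word])) + 1 for word in vocab}

# TF-IDF weights: term count times IDF (no row normalization in this sketch).
tfidf = [[count * idf[word] for count, word in zip(row, vocab)] for row in bow]

# Print one line per vocabulary word with its weight in each document.
for word, column in zip(vocab, zip(*tfidf)):
    print(f"{word:<9}", [round(x, 2) for x in column])
```

Words that appear in fewer documents (such as `click`) receive a higher IDF than words shared across documents (such as `free`), which is the main difference from the raw counts.
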
- Train an SVM model on the Bag of Words features.
- Train an SVM model on the TF-IDF features.
- Train a Random Forest model on the Bag of Words features.
- Train a Random Forest model on the TF-IDF features.
## Evaluation Metrics
Evaluate the performance of the trained models on the testing dataset using the following metrics:
- Accuracy
- Precision
- Recall
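
One possible way to train the four model and feature combinations and score them with these metrics is sketched below with scikit-learn. The toy texts and labels are hypothetical, and the choices of a linear kernel, 100 trees, and fixed random seeds are assumptions made for illustration, not settings taken from the notebook.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy data; 1 = spam, 0 = non-spam.
train_texts = [
    "win a free prize now",
    "free tickets claim today",
    "cheap loans click here",
    "lunch meeting tomorrow",
    "please review the attached report",
    "see you at the office",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["claim your free prize", "report for the meeting"]
test_labels = [1, 0]

# Factories so that every combination gets a fresh, unfitted estimator.
vectorizers = {"Bag of Words": CountVectorizer, "TF-IDF": TfidfVectorizer}
classifiers = {
    "SVM": lambda: SVC(kernel="linear", random_state=0),
    "Random Forest": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for vec_name, make_vec in vectorizers.items():
    for clf_name, make_clf in classifiers.items():
        # The pipeline fits the vectorizer on training text only, then the classifier.
        model = make_pipeline(make_vec(), make_clf())
        model.fit(train_texts, train_labels)
        pred = model.predict(test_texts)
        results[(clf_name, vec_name)] = {
            "accuracy": accuracy_score(test_labels, pred),
            "precision": precision_score(test_labels, pred, zero_division=0),
            "recall": recall_score(test_labels, pred, zero_division=0),
        }

for (clf_name, vec_name), scores in results.items():
    summary = ", ".join(f"{name}={value:.2f}" for name, value in scores.items())
    print(f"{clf_name} + {vec_name}: {summary}")
```

Wrapping the vectorizer and classifier in one pipeline keeps the test messages out of the vocabulary and IDF statistics, so the reported scores reflect unseen data. Precision and recall here treat spam (label 1) as the positive class.
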