https://github.com/ahmed-ai-01/nlp

This repo will include all my work regarding NLP.

Host: GitHub
URL: https://github.com/ahmed-ai-01/nlp
Owner: Ahmed-AI-01
License: mit
Created: 2024-02-14T00:20:49.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-02-14T00:23:59.000Z (over 1 year ago)
Last Synced: 2025-02-12T07:56:02.387Z (8 months ago)
Topics: kaggle, kaggle-dataset, nlp, nlp-deep-learning, nlp-library, nlp-machine-learning, nltk, nltk-library, nltk-python, pandas, pandas-python, sklearn, sklearn-classify, sklearn-library, sklearn-metrics, sklearn-pipeline
Language: Jupyter Notebook
Homepage:
Size: 50.8 KB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Email Classification using NLP

This project focuses on classifying emails into spam and non-spam categories using Natural Language Processing (NLP) techniques. We'll preprocess the text data, visualize the label distribution, perform feature engineering, and train machine learning models for classification.

## Dataset

The dataset used for this project can be found on Kaggle: [Email Classification - NLP](https://www.kaggle.com/datasets/datatattle/email-classification-nlp)

### Columns

1. **Message Body**: Contains the email content.
2. **Label**: Indicates whether the email is spam or non-spam.

## Tasks

### Task 1: Text Cleaning and Preprocessing

- Load the training and testing datasets.
- Check for missing values and drop any rows that contain them.
- Convert all text to lowercase.
- Remove stop words.
- Remove punctuation.
- Perform stemming or lemmatization.
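
The cleaning steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names `Message Body` and `Label` are taken from the Columns section and may differ in the real CSV, the tiny `STOP_WORDS` set is a hypothetical stand-in for a full list such as NLTK's `stopwords.words("english")`, and the helper names `clean_text` and `preprocess` are made up for this example.

```python
import string

import pandas as pd
from nltk.stem import PorterStemmer

# Illustrative stop-word set (an assumption); a real run would likely use a
# fuller list such as NLTK's stopwords corpus.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for"}

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)
_stemmer = PorterStemmer()


def clean_text(text: str) -> str:
    """Lowercase, strip punctuation, drop stop words, and stem each token."""
    text = text.lower()
    # Punctuation is stripped before stop-word filtering so that tokens
    # like "the," still match the stop-word set.
    text = text.translate(_PUNCT_TABLE)
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(_stemmer.stem(tok) for tok in tokens)


def preprocess(df: pd.DataFrame, text_col: str = "Message Body") -> pd.DataFrame:
    """Drop rows with missing values, then clean the text column."""
    df = df.dropna().copy()
    df[text_col] = df[text_col].map(clean_text)
    return df


# Tiny in-memory frame standing in for the training CSV.
raw = pd.DataFrame(
    {
        "Message Body": ["The emails are RUNNING!", None],
        "Label": ["Spam", "Non-Spam"],
    }
)
cleaned = preprocess(raw)
```

The same `preprocess` call would be applied to both the training and testing frames so that the two share one vocabulary of cleaned tokens.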

### Task 2: Data Visualization

- Visualize the distribution of the labels in the training dataset using a histogram, bar chart, or pie chart.
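
One way to draw the label distribution as a bar chart is sketched below. It assumes the labels are available as a pandas Series; the helper name `plot_label_distribution` and the sample label values are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_label_distribution(labels: pd.Series):
    """Draw one bar per label, with height equal to its count."""
    counts = labels.value_counts()
    fig, ax = plt.subplots()
    ax.bar([str(label) for label in counts.index], counts.to_numpy())
    ax.set_xlabel("Label")
    ax.set_ylabel("Number of emails")
    ax.set_title("Label distribution (training set)")
    return ax


# Sample labels standing in for the training data (assumed values).
train_labels = pd.Series(["Spam", "Non-Spam", "Spam"])
ax = plot_label_distribution(train_labels)
```

In a notebook the figure renders automatically; in a script, call `plt.show()` afterwards. A strongly skewed chart is a hint that accuracy alone may be misleading later on.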

### Task 3: Feature Engineering

- Apply text representation techniques:
  - Bag of words
  - TF-IDF
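
Both representations are available in scikit-learn as `CountVectorizer` (bag of words) and `TfidfVectorizer`. The sketch below uses made-up example sentences in place of the cleaned emails; the key point is that each vectorizer is fitted on the training text only and then reused to transform the test text.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Made-up cleaned messages standing in for the real data.
train_texts = [
    "win free prize now",
    "meeting schedule tomorrow",
    "free prize claim now",
]
test_texts = ["claim free prize", "schedule meeting"]

# Bag of words: each column is a vocabulary term, each value a raw count.
bow = CountVectorizer()
X_train_bow = bow.fit_transform(train_texts)
X_test_bow = bow.transform(test_texts)  # reuse the training vocabulary

# TF-IDF: same vocabulary, but counts are reweighted so that terms
# appearing in many documents contribute less.
tfidf = TfidfVectorizer()
X_train_tfidf = tfidf.fit_transform(train_texts)
X_test_tfidf = tfidf.transform(test_texts)
```

Both vectorizers return sparse matrices with one row per message and one column per vocabulary term, so the same downstream models can be trained on either.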

### Task 4: Model Training

- Train an SVM model on the Bag of Words features.
- Train an SVM model on the TF-IDF features.
- Train a Random Forest model on the Bag of Words features.
- Train a Random Forest model on the TF-IDF features.
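
The four vectorizer and model combinations could be set up as scikit-learn pipelines, roughly as below. This is a sketch under assumptions: the linear kernel for `SVC`, the `random_state` value, the toy training data, and the helper name `build_models` are all illustrative choices, not settings confirmed by the project.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy data standing in for the cleaned training set (assumed label values).
train_texts = [
    "win free prize now",
    "claim free cash prize",
    "meeting schedule tomorrow",
    "project report attached",
]
train_labels = ["Spam", "Spam", "Non-Spam", "Non-Spam"]


def build_models() -> dict:
    """Return one pipeline per (text representation, classifier) pair."""
    return {
        "svm_bow": make_pipeline(CountVectorizer(), SVC(kernel="linear")),
        "svm_tfidf": make_pipeline(TfidfVectorizer(), SVC(kernel="linear")),
        "rf_bow": make_pipeline(
            CountVectorizer(), RandomForestClassifier(random_state=0)
        ),
        "rf_tfidf": make_pipeline(
            TfidfVectorizer(), RandomForestClassifier(random_state=0)
        ),
    }


models = build_models()
for name, model in models.items():
    # Each pipeline fits its own vectorizer, then its classifier.
    model.fit(train_texts, train_labels)

predictions = {name: model.predict(["free prize"]) for name, model in models.items()}
```

Wrapping the vectorizer and classifier in one pipeline means raw text goes in at both fit and predict time, which avoids accidentally fitting a vectorizer on test data.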

## Evaluation Metrics

Evaluate the performance of the trained models on the testing dataset using the following metrics:
- Accuracy
- Precision
- Recall
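
The three metrics can be computed with `sklearn.metrics`, as in the sketch below. The label strings and the choice of `"Spam"` as the positive class are assumptions, as is the helper name `evaluate`; the predictions are hand-written so the arithmetic is easy to follow.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score


def evaluate(y_true, y_pred, pos_label="Spam") -> dict:
    """Compute accuracy, precision, and recall for a binary spam classifier."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # Precision: of the emails predicted as spam, how many really are spam.
        "precision": precision_score(y_true, y_pred, pos_label=pos_label),
        # Recall: of the real spam emails, how many were caught.
        "recall": recall_score(y_true, y_pred, pos_label=pos_label),
    }


# Hand-written example: 2 true positives, 1 false negative,
# 1 false positive, 1 true negative.
y_true = ["Spam", "Spam", "Non-Spam", "Non-Spam", "Spam"]
y_pred = ["Spam", "Non-Spam", "Non-Spam", "Spam", "Spam"]

scores = evaluate(y_true, y_pred)
# accuracy = 3/5, precision = 2/3, recall = 2/3
```

In practice `y_pred` would come from calling `predict` on the testing messages with each of the four trained models, giving one set of scores per model for comparison.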
