https://github.com/gehad-ahmed30/natural-language-processing
This repository showcases a collection of practical NLP projects, ranging from sentiment analysis to spam detection. The implementations leverage both Machine Learning (ML) and Deep Learning (DL) approaches to explore various natural language processing tasks and techniques.
deep-learning lstm machine-learning naive-bayes nlp nltk preprocessing stopwords tokenization
Last synced: 8 months ago
- Host: GitHub
- URL: https://github.com/gehad-ahmed30/natural-language-processing
- Owner: gehad-Ahmed30
- Created: 2025-06-28T16:18:13.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-07-04T15:30:52.000Z (11 months ago)
- Last Synced: 2025-09-19T08:07:40.449Z (9 months ago)
- Topics: deep-learning, lstm, machine-learning, naive-bayes, nlp, nltk, preprocessing, stopwords, tokenization
- Language: Jupyter Notebook
- Homepage:
- Size: 161 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Restaurant Reviews: Sentiment Analysis (NLP Case Study)
This project performs sentiment analysis on restaurant reviews using Natural Language Processing (NLP) and machine learning. The goal is to classify whether a review is **positive (Liked = 1)** or **negative (Liked = 0)**.
---
## Project Workflow
1. **Importing Data & Libraries**
2. **Text Preprocessing (NLTK):**
   - Lowercasing
   - Removing punctuation
   - Removing stopwords
   - Stemming (Porter Stemmer)
3. **Vectorization:**
   - Using `CountVectorizer`
4. **Model Building:**
   - Multinomial Naive Bayes Classifier
5. **Model Evaluation:**
   - Accuracy, Confusion Matrix, Classification Report
6. **Model Saving:**
   - Exported using `joblib`
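
The workflow above can be sketched end to end as follows. This is a minimal illustration, not the notebook's actual code: the toy reviews, the small `STOP_WORDS` set (a stand-in for NLTK's English stop-word list, which requires a one-time corpus download), the 75/25 split, and the file name `sentiment_model.joblib` are all assumptions made for the example.

```python
import re
import tempfile
from pathlib import Path

import joblib
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Assumption: tiny stand-in for nltk.corpus.stopwords.words("english").
STOP_WORDS = {"the", "a", "an", "is", "was", "it", "and", "this", "i", "we", "to", "of"}
stemmer = PorterStemmer()


def preprocess(text: str) -> str:
    """Lowercase, strip punctuation, drop stop words, and stem each token."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    tokens = [stemmer.stem(tok) for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


# Hypothetical toy data shaped like the dataset: review text plus a Liked label.
reviews = [
    "The food was amazing and the staff was friendly",
    "Terrible service, we waited an hour",
    "Loved the pasta, great place",
    "The soup was cold and bland",
    "Great atmosphere and tasty burgers",
    "Worst pizza I have ever had",
    "Friendly staff and delicious dessert",
    "Bland food and rude waiter",
]
liked = [1, 0, 1, 0, 1, 0, 1, 0]

X = [preprocess(review) for review in reviews]
X_train, X_test, y_train, y_test = train_test_split(
    X, liked, test_size=0.25, stratify=liked, random_state=0
)

# Bag-of-words counts feeding a Multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred, labels=[0, 1])
report = classification_report(y_test, y_pred, labels=[0, 1], zero_division=0)
print(f"accuracy={accuracy:.2f}")
print(matrix)
print(report)

# Save the fitted pipeline (vectorizer + classifier) and load it back.
model_path = Path(tempfile.mkdtemp()) / "sentiment_model.joblib"
joblib.dump(model, model_path)
restored = joblib.load(model_path)
```

Saving the whole pipeline rather than the classifier alone keeps the fitted vocabulary together with the model, so new reviews only need to pass through `preprocess` before calling `restored.predict`.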
---
## Dataset
- Source: [Kaggle: Restaurant Reviews (TSV)](https://www.kaggle.com/datasets/maher3id/restaurant-reviewstsv)
- Format: `.tsv` file
- Records: 1000 reviews
- Columns:
  - `Review` (text)
  - `Liked` (binary label)
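
A tab-separated file with these two columns could be loaded with pandas roughly as shown here. The two sample rows are invented for illustration and stand in for the real file, so the snippet runs without downloading anything.

```python
import csv
import io

import pandas as pd

# Hypothetical two-row sample standing in for the real .tsv file.
sample_tsv = "Review\tLiked\nThe pasta was great.\t1\nService was slow.\t0\n"

# quoting=csv.QUOTE_NONE keeps quotation marks inside reviews as literal text.
df = pd.read_csv(io.StringIO(sample_tsv), sep="\t", quoting=csv.QUOTE_NONE)

print(df.shape)
print(df["Liked"].value_counts())
```

To read the actual dataset, pass the path of the downloaded `.tsv` file to `pd.read_csv` in place of the `io.StringIO` object.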
---
# Spam Detection: Deep Learning (NLP Case Study)
This project detects whether a given SMS message is **Spam** or **Not Spam (Ham)** using Deep Learning and NLP.
It applies preprocessing, tokenization, and an LSTM-based model to classify messages.
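
An LSTM-based binary classifier of the kind described here might look like the sketch below. The layer sizes, `VOCAB_SIZE`, `MAX_LEN`, and the random demo data are assumptions chosen only to make the example runnable. They are not taken from the notebook.

```python
import numpy as np
from tensorflow import keras

VOCAB_SIZE = 5000  # assumed vocabulary size
MAX_LEN = 50       # assumed padded sequence length

model = keras.Sequential([
    keras.Input(shape=(MAX_LEN,)),
    # Turn each integer word id into a dense 32-dimensional vector.
    keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=32),
    # Read the sequence and summarize it in a single 32-dimensional state.
    keras.layers.LSTM(32),
    # One sigmoid unit: estimated probability that the message is spam.
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Random integer sequences stand in for tokenized, padded messages.
rng = np.random.default_rng(0)
X_demo = rng.integers(1, VOCAB_SIZE, size=(8, MAX_LEN))
y_demo = rng.integers(0, 2, size=(8,))

model.fit(X_demo, y_demo, epochs=1, batch_size=4, verbose=0)
probs = model.predict(X_demo, verbose=0)
print(probs.shape)
```

With real data, `X_demo` would be replaced by the padded integer sequences produced during preprocessing and `y_demo` by the 0/1 spam labels.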
---
## Project Workflow
### 1. Importing Data & Libraries
- Pandas, NumPy
- NLTK for text preprocessing
- TensorFlow / Keras for the deep learning model
- Matplotlib / Seaborn for visualization
---
### 2. Data Cleaning & Preprocessing
Steps applied to the text:
- Lowercasing
- Removing punctuation & numbers
- Removing stopwords
- Tokenization
- Padding sequences
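
To show what these steps do to a message, here is a plain-Python sketch. Keras ships its own tokenization and padding utilities, and the notebook may well use those. The helper names, the small `STOP_WORDS` set, `MAX_LEN`, and the two sample messages below are hypothetical.

```python
import re
from collections import Counter

# Assumption: tiny stand-in for NLTK's English stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "are", "you", "your", "to", "and", "of", "in"}
MAX_LEN = 6  # assumed fixed sequence length


def clean(text):
    """Lowercase, drop punctuation and digits, remove stop words; return tokens."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def build_vocab(token_lists):
    """Assign integer ids by descending frequency; 0 is padding, 1 is unknown."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    return {word: idx for idx, (word, _) in enumerate(counts.most_common(), start=2)}


def encode_and_pad(tokens, vocab, max_len=MAX_LEN):
    """Map tokens to ids, truncate to max_len, then right-pad with zeros."""
    ids = [vocab.get(tok, 1) for tok in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


messages = [
    "WINNER!! Claim your FREE prize now, call 0800 123",
    "Are you coming to the party?",
]
token_lists = [clean(m) for m in messages]
vocab = build_vocab(token_lists)
padded = [encode_and_pad(tokens, vocab) for tokens in token_lists]
print(padded)  # [[2, 3, 4, 5, 6, 7], [8, 9, 0, 0, 0, 0]]
```

Padding every message to the same length is what allows a batch of messages to be stacked into a single integer array for the model.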
---
### 3. Dataset Info
- **Source**: [Kaggle: SMS Spam Collection Dataset](https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset)
- **Size**: 5572 messages
- **Classes**:
  - `ham`: Not Spam
  - `spam`: Spam
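
Before training, the two class names need to become numeric targets. The sketch below assumes a two-column layout with the hypothetical headers `label` and `message` and uses invented sample rows. The downloaded file may use different column names or a different text encoding.

```python
import io

import pandas as pd

# Hypothetical sample in an assumed two-column layout (label, message).
sample_csv = (
    "label,message\n"
    "ham,See you at lunch tomorrow\n"
    "spam,You won a free prize! Call now\n"
    "ham,Running late\n"
)
df = pd.read_csv(io.StringIO(sample_csv))

# Encode the classes as integers: ham -> 0, spam -> 1.
df["target"] = df["label"].map({"ham": 0, "spam": 1})

print(df["label"].value_counts())
```

Checking `value_counts()` early is useful because spam datasets tend to be imbalanced, which affects how accuracy should be read.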
---