https://github.com/a-solo/sms_spam_detection_using_nlp
A supervised machine learning project that uses Natural Language Processing (NLP) to classify a given SMS as spam or ham.
- Host: GitHub
- URL: https://github.com/a-solo/sms_spam_detection_using_nlp
- Owner: A-SOLO
- License: apache-2.0
- Created: 2024-08-29T19:36:43.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-08-29T21:32:36.000Z (10 months ago)
- Last Synced: 2025-04-07T14:48:00.386Z (3 months ago)
- Topics: machine-learning, nlp, nltk-tokenizer, pandas, sklearn, spam-detection
- Language: Jupyter Notebook
- Homepage:
- Size: 608 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# SMS_Spam_Detection_using_NLP
A supervised machine learning project that uses Natural Language Processing (NLP) to classify a given SMS as spam or ham.
### Dependencies/Libraries used:
* Pandas
* NumPy
* Seaborn
* scikit-learn (sklearn)
* NLTK
* Pickle
* Streamlit

### Input data:
Dataset: [https://www.kaggle.com/datasets/bagavathypriya/spam-ham-dataset](https://www.kaggle.com/datasets/bagavathypriya/spam-ham-dataset)

### Building Model:
* Imbalanced data
* Visualizing the number of characters, words, and sentences in spam and ham messages
* Plotting these trends as a heatmap
* Most common words in the spam corpus
* Performance of various models
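The exploratory steps listed above (class balance, per-message length counts, and the most common spam words), together with the stop-word and punctuation removal described under Results, can be sketched in plain Python. This is a minimal sketch under stated assumptions: the messages are hypothetical stand-ins for the Kaggle data, the stop-word list is a small hand-written one rather than NLTK's corpus, regex tokenization replaces NLTK's tokenizers, lemmatization is omitted, and the helper names (`class_balance`, `length_features`, `transform_text`, `top_spam_words`) are illustrative, not taken from the notebook.

```python
import re
from collections import Counter

# Hypothetical toy messages standing in for the Kaggle spam/ham CSV (not real data).
MESSAGES = [
    ("ham", "Are we still meeting for lunch tomorrow?"),
    ("ham", "Sorry, running late. See you at 7."),
    ("ham", "Can you send me the notes from class?"),
    ("ham", "Happy birthday! Hope you have a great day."),
    ("spam", "WINNER! You have won a free prize. Call now to claim."),
    ("spam", "Free entry to win cash! Text WIN to claim your prize."),
]

# Tiny illustrative stop-word list (assumption); NLTK's English stop-word corpus is much larger.
STOP_WORDS = {"a", "an", "the", "to", "you", "your", "we", "are", "at", "for", "have", "now", "me", "from"}


def class_balance(messages):
    """Fraction of messages carrying each label, which exposes class imbalance."""
    counts = Counter(label for label, _ in messages)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def length_features(text):
    """Character, word and sentence counts, using regex in place of NLTK tokenizers."""
    words = re.findall(r"\w+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {"num_characters": len(text), "num_words": len(words), "num_sentences": len(sentences)}


def transform_text(text):
    """Lowercase, keep only word tokens (this drops punctuation), and remove stop words."""
    return [tok for tok in re.findall(r"\w+", text.lower()) if tok not in STOP_WORDS]


def top_spam_words(messages, n):
    """Most frequent preprocessed tokens across spam messages only."""
    counts = Counter()
    for label, text in messages:
        if label == "spam":
            counts.update(transform_text(text))
    return counts.most_common(n)


if __name__ == "__main__":
    print(class_balance(MESSAGES))
    for label, text in MESSAGES:
        print(label, length_features(text))
    print(top_spam_words(MESSAGES, 4))
```

The per-message counts are the kind of values the plots and heatmap above would visualize; plotting is left out to keep the sketch dependency-free.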
# Results:
* Applied NLP preprocessing techniques such as tokenization, lemmatization, and stop-word and punctuation removal using NLTK and regex.
* Performed feature engineering with handcrafted features such as digit count and message length.
* Trained and compared several classifiers, including Naive Bayes, a Support Vector Classifier (SVC), and an Extra Trees Classifier (ETC), to find the best performer.
* Built an ensemble model that improved accuracy from 97.87% to 98.26%.
* Designed and deployed a basic Streamlit UI that classifies new inputs as spam or ham.

# Top Performing Model
Combined a Support Vector Classifier, Multinomial Naive Bayes, and an Extra Trees Classifier into a Stacking Classifier (an ensemble model) that achieved:

* Accuracy: 0.9825918762088974
* Precision: 0.9736842105263158

# Testing Examples:
* Example 1
* Example 2
* Example 3
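The stacking ensemble described under Top Performing Model, and classification of new inputs as in these testing examples, can be sketched with scikit-learn's `StackingClassifier`. This is a minimal sketch, not the project's notebook: the toy messages, the TF-IDF features, the linear SVC kernel, `n_estimators=50`, and the `LogisticRegression` meta-learner are all assumptions, since the README names only the three base models. It is not expected to reproduce the reported scores.

```python
import pickle

from sklearn.ensemble import ExtraTreesClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy data (not the Kaggle dataset). Six messages per class so the
# default 5-fold stratified CV inside StackingClassifier has enough samples.
texts = [
    "Are we still meeting for lunch tomorrow?",
    "Sorry, running late. See you at 7.",
    "Can you send me the notes from class?",
    "Happy birthday! Hope you have a great day.",
    "Call me when you get home tonight.",
    "Did you finish the report for Monday?",
    "WINNER! You have won a free prize. Call now to claim.",
    "Free entry to win cash! Text WIN to claim your prize.",
    "URGENT: your account has a cash reward waiting. Reply now.",
    "Claim your free ringtone today. Text YES to get it.",
    "You have been selected for a free holiday. Call now!",
    "Congratulations, you won a cash prize. Claim it today.",
]
labels = ["ham"] * 6 + ["spam"] * 6


def build_model():
    """TF-IDF features feeding a stacking ensemble of SVC, MultinomialNB and ExtraTrees."""
    base_estimators = [
        ("svc", SVC(kernel="linear")),  # no predict_proba, so its decision_function is stacked
        ("mnb", MultinomialNB()),       # predict_proba is stacked
        ("etc", ExtraTreesClassifier(n_estimators=50, random_state=2)),
    ]
    stack = StackingClassifier(
        estimators=base_estimators,
        final_estimator=LogisticRegression(),  # assumed meta-learner; the README does not name one
    )
    return make_pipeline(TfidfVectorizer(), stack)


model = build_model()
model.fit(texts, labels)

# Round-trip through pickle, as a Streamlit app loading a saved model might.
restored = pickle.loads(pickle.dumps(model))

new_messages = ["Claim your free cash prize now", "See you at lunch tomorrow"]
predictions = restored.predict(new_messages)
print(list(predictions))
```

`StackingClassifier` fits the meta-learner on cross-validated predictions of the base models, so it can learn how much weight to give each one; the pickle round trip mirrors how a saved model could be reloaded by the UI without retraining.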