https://github.com/ksdkamesh99/spam-classifier

An NLP project on SMS data that predicts whether a message is spam or ham. It compares the accuracy of several ML algorithms (multinomial naive Bayes, logistic regression, SVM, decision trees) under various data cleaning and processing techniques (PorterStemmer, CountVectorizer, TFIDF Vectorizer, WordnetLemmatizer). An LSTM with word embeddings is also implemented, reaching an accuracy of 97.84%.


Host: GitHub
URL: https://github.com/ksdkamesh99/spam-classifier
Owner: ksdkamesh99
License: mit
Created: 2020-05-26T18:12:20.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2020-12-25T03:18:36.000Z (over 4 years ago)
Last Synced: 2025-04-20T13:36:56.660Z (2 months ago)
Topics: bag-of-words, count-vectorizer, decision-tree-classifier, embeddings, logistic-regression, lstm-neural-networks, multinomial-naive-bayes, naive-bayes-classifier, porter-stemmer, sms-spam-detection, support-vector-machines, tfidf-vectorizer, wordnetlemmatizer
Language: Jupyter Notebook
Homepage:
Size: 510 KB
Stars: 15
Watchers: 2
Forks: 11
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Spam-Classifier

[![forthebadge](https://forthebadge.com/images/badges/built-with-love.svg)](https://forthebadge.com)

[![forthebadge](https://forthebadge.com/images/badges/made-with-python.svg)](https://forthebadge.com)[![forthebadge](https://forthebadge.com/images/badges/its-not-a-lie-if-you-believe-it.svg)](https://forthebadge.com)

[![forthebadge](https://forthebadge.com/images/badges/built-by-developers.svg)](https://forthebadge.com)

## 📌 Introduction:-

This is a Natural Language Processing project on SMS data that predicts whether a message is spam or ham. It compares the accuracy of several ML algorithms (multinomial naive Bayes, logistic regression, SVM, and decision trees) under various data cleaning and processing techniques: PorterStemmer, CountVectorizer, TFIDF Vectorizer, and WordnetLemmatizer.

The classifier is also implemented with an LSTM and word embeddings, reaching an accuracy of 97.84%.
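
To make the bag-of-words plus multinomial naive Bayes idea concrete, here is a minimal from-scratch sketch in plain Python. It is not the code from the notebooks, which presumably rely on library implementations; the function names (`tokenize`, `train_nb`, `predict_nb`) and the toy messages are invented for illustration, and stemming or lemmatization is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs only. Stemming or lemmatization
    # would be applied at this point; this sketch omits it.
    return re.findall(r"[a-z]+", text.lower())


def train_nb(messages, labels):
    # Bag-of-words counts per class, plus the number of messages per class.
    word_counts = {label: Counter() for label in set(labels)}
    class_counts = Counter(labels)
    for text, label in zip(messages, labels):
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, class_counts, vocab


def predict_nb(model, text):
    word_counts, class_counts, vocab = model
    total_messages = sum(class_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        # Log prior of the class.
        score = math.log(class_counts[label] / total_messages)
        # Laplace (add-one) smoothed log likelihood of each known word.
        denominator = sum(counts.values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:
                score += math.log((counts[word] + 1) / denominator)
        scores[label] = score
    return max(scores, key=scores.get)


# Toy messages invented for illustration, not taken from the dataset.
train_texts = [
    "Win a free prize now",
    "Free entry to win cash",
    "Are we meeting for lunch today",
    "Call me when you get home",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = train_nb(train_texts, train_labels)
print(predict_nb(model, "free cash prize"))
print(predict_nb(model, "lunch at home today"))
```

Words that never appeared in training are skipped at prediction time, which is one common choice; a library implementation may handle them differently.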

## ✔❌Accuracy ❌✔:-

| Text Preprocessing Type              | Logistic Regression | Multinomial NB | Support Vector Machine  | Decision Tree |
|--------------------------------------|---------------------|----------------|-------------------------|---------------|
| TFIDF Vectorizer + PorterStemmer     | 96.68%              | 97.30%         | 98.47%                  | 96.68%        |
| CountVectorizer + PorterStemmer      | 98.65%              | 98.56%         | 98.74%                  | 97.84%        |
| CountVectorizer + WordnetLemmatizer  | 98.56%              | 98.29%         | 98.38%                  | 97.75%        |
| TFIDF Vectorizer + WordnetLemmatizer | 96.41%              | 97.48%         | 98.47%                  | 96.86%        |
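
The rows above differ in whether messages are turned into raw term counts or TF-IDF weights. The sketch below shows the difference on three invented messages, in plain Python. The smoothed IDF formula and L2 normalization used here are an assumption meant to mirror common library defaults; the helper names `count_vectors` and `tfidf_vectors` are hypothetical and do not come from the repository.

```python
import math
from collections import Counter


def count_vectors(docs):
    # One row per document, one column per vocabulary term (sorted).
    vocab = sorted({word for doc in docs for word in doc.split()})
    rows = []
    for doc in docs:
        counts = Counter(doc.split())
        rows.append([counts[word] for word in vocab])
    return vocab, rows


def tfidf_vectors(docs):
    vocab, rows = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = [sum(1 for row in rows if row[j] > 0) for j in range(len(vocab))]
    # Smoothed inverse document frequency (assumed formula).
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    result = []
    for row in rows:
        weighted = [tf * w for tf, w in zip(row, idf)]
        # Scale each document vector to unit L2 length.
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        result.append([x / norm for x in weighted])
    return vocab, result


# Toy, already-lowercased messages invented for illustration.
docs = ["free prize free", "free lunch", "lunch today"]

vocab, counts = count_vectors(docs)
_, tfidf = tfidf_vectors(docs)
print(vocab)
print(counts)
print([[round(x, 3) for x in row] for row in tfidf])
```

In the third message, "lunch" and "today" have the same raw count, but TF-IDF gives "today" the larger weight because it occurs in fewer messages.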

## WorkFlow:-

![Workflow of SMS spam Classifier](workflow.gif)

## 🏁 Datasets Used:-

* The dataset used is the SMS Spam Collection dataset published by UCI Machine Learning and hosted on Kaggle. You can download it [here](https://www.kaggle.com/uciml/sms-spam-collection-dataset/download).

* A reference for this dataset can be found [here](http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/).
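
A minimal sketch of reading the downloaded file into (label, message) pairs follows. It assumes the Kaggle export is a CSV whose first two columns are named `v1` (label, `ham` or `spam`) and `v2` (message text), followed by unnamed empty columns; check these names against your copy. The `load_sms` helper and the two sample rows are invented for illustration.

```python
import csv
import io


def load_sms(handle):
    # Assumed column names: "v1" holds the label, "v2" the message text.
    reader = csv.DictReader(handle)
    return [(row["v1"], row["v2"]) for row in reader]


# In-memory stand-in for the real file, with invented rows.
sample = io.StringIO(
    "v1,v2,,,\n"
    'ham,"Ok lar... see you",,,\n'
    'spam,"WINNER!! Claim your prize",,,\n'
)

messages = load_sms(sample)
# Encode labels as integers for a classifier: spam -> 1, ham -> 0.
targets = [1 if label == "spam" else 0 for label, _ in messages]
print(messages)
print(targets)
```

For the real file, something like `open("spam.csv", encoding="latin-1", newline="")` can be passed to `load_sms`; both the file name and the encoding are assumptions about the Kaggle download.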

## 📧Contact:-

For any suggestions or help with the models' code, please mail me at [email protected].

## 📜 LICENSE

[MIT](https://github.com/ksdkamesh99/Spam-Classifier/blob/master/LICENSE)
