Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/raghavendranhp/spamguard-intelligent-sms-filtering-system-
SpamGuard is an intelligent SMS filtering system designed to detect and filter spam messages using machine learning techniques.
https://github.com/raghavendranhp/spamguard-intelligent-sms-filtering-system-
bagofwords nlp textanalysis textpreprocessing tokenization
Last synced: about 2 months ago
JSON representation
SpamGuard is an intelligent SMS filtering system designed to detect and filter spam messages using machine learning techniques.
- Host: GitHub
- URL: https://github.com/raghavendranhp/spamguard-intelligent-sms-filtering-system-
- Owner: raghavendranhp
- License: mit
- Created: 2023-12-01T18:47:37.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-01-16T11:37:27.000Z (12 months ago)
- Last Synced: 2024-01-17T03:30:20.169Z (12 months ago)
- Topics: bagofwords, nlp, textanalysis, textpreprocessing, tokenization
- Language: Jupyter Notebook
- Homepage:
- Size: 271 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# SpamGuard: Intelligent SMS Filtering System
SpamGuard is an intelligent SMS filtering system designed to detect and filter spam messages using machine learning techniques. It utilizes the SMS Spam Collection dataset from Kaggle and implements a Bag of Words approach along with the Multinomial Naive Bayes algorithm for efficient spam detection.
## Table of Contents
- [Introduction](#introduction)
- [Dataset](#dataset)
- [Implementation](#implementation)
- [Data Cleaning and Exploration](#data-cleaning-and-exploration)
- [Feature Engineering](#feature-engineering)
- [Text Preprocessing](#text-preprocessing)
- [Bag of Words in scikit-learn](#bag-of-words-in-scikit-learn)
- [Naive Bayes Model Training](#naive-bayes-model-training)
- [Model Evaluation](#model-evaluation)
- [Results](#results)
- [Usage](#usage)
- [Dependencies](#dependencies)
- [License](#license)## Introduction
SpamGuard is a spam detection system for SMS messages, developed using machine learning algorithms. The system achieves high accuracy and precision in identifying spam, making it a reliable tool for filtering unwanted messages.
## Dataset
The system uses the SMS Spam Collection dataset from Kaggle, containing a collection of spam and non-spam (ham) SMS messages. The dataset is preprocessed and explored to create a robust model.
## Implementation
### Data Cleaning and Exploration
- Loaded the dataset and removed unnecessary columns.
- Renamed columns for clarity.
- Explored basic statistics and visualizations of the data.### Feature Engineering
- Added a new feature "length" to represent the length of each SMS.
- Visualized the distribution of SMS lengths for both spam and ham messages.### Text Preprocessing
- Tokenized and preprocessed the text data using a Bag of Words approach.
- Created a frequency list of words for each document.### Bag of Words in scikit-learn
- Used scikit-learn's `CountVectorizer` to convert the text data into a matrix of token counts.
- Split the dataset into training and testing sets.### Naive Bayes Model Training
- Applied the Multinomial Naive Bayes algorithm to the training data.
### Model Evaluation
- Evaluated the model's performance using metrics such as accuracy, precision, recall, and F1 score.
## Results
The SpamGuard system achieved impressive results on the test set:
- Accuracy: 98.48%
- Precision: 94.20%
- Recall: 93.53%
- F1 Score: 93.86%## Usage
To use SpamGuard, follow these steps:
1. Install the required dependencies.
2. Run the provided Python script.## Dependencies
- numpy
- pandas
- nltk
- matplotlib
- seaborn
- scikit-learnInstall dependencies using:
```bash
pip install numpy pandas nltk matplotlib seaborn scikit-learn
```## License
This project is licensed under the [MIT License](LICENSE).