Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/chandkund/sms-spam-detection
The goal is to develop a classification model that can accurately differentiate between spam and non-spam messages. This is crucial for applications like email filtering, SMS spam detection, and improving overall user experience by reducing the influx of unwanted or malicious content.
matplotlib nlp-machine-learning numpy pandas seaborn stemming tfidf-vectorizer tokenization
Last synced: 19 days ago
- Host: GitHub
- URL: https://github.com/chandkund/sms-spam-detection
- Owner: chandkund
- License: mit
- Created: 2024-09-17T14:37:13.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-09-17T14:49:32.000Z (2 months ago)
- Last Synced: 2024-10-31T13:04:23.135Z (19 days ago)
- Topics: matplotlib, nlp-machine-learning, numpy, pandas, seaborn, stemming, tfidf-vectorizer, tokenization
- Language: Jupyter Notebook
- Homepage:
- Size: 772 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# SMS-Spam-Detection
## Project Overview
This project focuses on detecting spam messages from a dataset that contains text data labeled as either "spam" or "ham" (non-spam). The goal is to develop a classification model that can accurately differentiate between spam and non-spam messages. This is crucial for applications like email filtering, SMS spam detection, and improving overall user experience by reducing the influx of unwanted or malicious content.
### Dataset
The dataset used for this project consists of five columns:
1. **v1**: Indicates whether the message is "ham" (non-spam) or "spam".
2. **v2**: Contains the text message.
3. **Unnamed: 2**, **Unnamed: 3**, **Unnamed: 4**: These columns are empty or contain irrelevant data and will be ignored in the analysis.

Sample data:
```
0 ham Go until jurong point, crazy.. Available only ...
1 ham Ok lar... Joking wif u oni...
2 spam Free entry in 2 a wkly comp to win FA Cup fina...
3 ham U dun say so early hor... U c already then say...
4 ham Nah I don't think he goes to usf, he lives aro...
```

# Spam Detection Project
## Overview
This project aims to build a machine learning model that detects whether a given message is spam or not. The dataset contains labeled messages as either "ham" (non-spam) or "spam". By leveraging natural language processing (NLP) techniques, this project strives to build a robust classifier that can automatically filter out spam messages.

## Dataset
The dataset consists of 5 columns:
- **v1**: Spam or Ham (Target)
- **v2**: Message text
- **Unnamed: 2**, **Unnamed: 3**, **Unnamed: 4**: Unused or irrelevant data.
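
The column layout above suggests a simple cleaning step. The following is a minimal sketch, assuming the data is a CSV that pandas can read. The in-memory sample, the `load_messages` helper, and the `label`/`text` column names are hypothetical and not taken from the project's notebook.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the real CSV file (an assumption for illustration).
# The trailing commas in each row produce the three empty "Unnamed" columns.
RAW_CSV = """v1,v2,,,
ham,"Go until jurong point, crazy.. Available only ...",,,
ham,Ok lar... Joking wif u oni...,,,
spam,Free entry in 2 a wkly comp to win FA Cup fina...,,,
"""


def load_messages(source):
    """Read the raw CSV and keep only the label and message columns."""
    df = pd.read_csv(source)
    # Drop the unused "Unnamed: N" columns.
    df = df.drop(columns=[c for c in df.columns if c.startswith("Unnamed")])
    # Rename to descriptive names (hypothetical; the notebook may use others).
    df = df.rename(columns={"v1": "label", "v2": "text"})
    # Remove rows with a missing label or message and renumber the rows.
    return df.dropna(subset=["label", "text"]).reset_index(drop=True)


messages = load_messages(io.StringIO(RAW_CSV))
print(messages)
```

With the real file, a path would be passed in place of the `io.StringIO` object. Depending on how the file was saved, an explicit `encoding` argument to `pd.read_csv` may be needed.
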
Sample messages:
- "ham": Go until jurong point, crazy.. Available only in bugis n great world...
- "spam": Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005...

## Installation
Clone the repository and install the required dependencies:
```bash
git clone https://github.com/chandkund/SMS-Spam-Detection.git
cd SMS-Spam-Detection
pip install -r requirements.txt
```

## Usage
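
As a rough orientation for the vectorization, training, and evaluation steps listed in this section, here is a minimal sketch using scikit-learn. It assumes scikit-learn is installed. The toy messages and variable names are invented for illustration, and the project's notebook may be organized differently.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy messages invented for illustration; the real project uses the full dataset.
train_texts = [
    "Free entry to win cash prizes now",
    "Win a free prize, claim your cash today",
    "Urgent! You have won free tickets",
    "Are we still meeting for lunch today",
    "Ok, I will call you later tonight",
    "See you at home after work",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# TfidfVectorizer tokenizes each message and builds TF-IDF weighted vectors;
# Multinomial Naive Bayes is then fit on those vectors.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# Evaluate on messages the model has not seen during training.
test_texts = ["Claim your free cash prize", "I will see you at lunch"]
test_labels = ["spam", "ham"]
predicted = model.predict(test_texts)

print("accuracy:", accuracy_score(test_labels, predicted))
print("F1 (spam):", f1_score(test_labels, predicted, pos_label="spam"))

# Classify a single new message.
print(model.predict(["Free entry to win cash prizes!"]))
```

Swapping `MultinomialNB()` for `LogisticRegression()` from `sklearn.linear_model` would give the other classifier mentioned in the steps. On the real dataset, a held-out split (for example via `train_test_split`) would replace the hand-written test messages.
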
1. Preprocess the data:
   - Remove irrelevant columns and missing values.
   - Tokenize and vectorize the text data using methods like TF-IDF.
2. Train the classifier:
   - Use machine learning algorithms such as Naive Bayes or Logistic Regression.
   - Evaluate the model's performance using metrics like accuracy and F1 score.
3. Predict spam messages:
```bash
python predict.py --message "Free entry to win cash prizes!"
```

## License
This project is licensed under the MIT License.