https://github.com/shanmukhsrisaivedullapalli/smsspamclassification

SMSSpamClassification is a machine learning project aimed at accurately classifying SMS messages as either spam or ham (non-spam). It employs natural language processing techniques to extract relevant features from the text data and utilizes various classification algorithms to build a robust spam detection model.
jupyter-notebook numpy pandas pickle python3 sklearn spam-classification spam-detection

## GitHub Repository: SMSSpamClassification

**SMSSpamClassification** is a machine learning project that classifies SMS messages as spam or ham (non-spam). Using the SMSSpamCollection dataset, it applies TF-IDF vectorization, a Natural Language Processing (NLP) technique, to turn message text into numeric features, then trains a Logistic Regression classifier on those features.

The repository covers the full workflow: data preprocessing, feature extraction with TF-IDF, model training with Logistic Regression, and model evaluation.
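To give a feel for what TF-IDF weighting does, here is a minimal pure-Python sketch. It uses the simple textbook formula `tf * ln(N / df)` and a whitespace tokenizer, both chosen for illustration; the example messages are made up. scikit-learn's `TfidfVectorizer` applies a smoothed IDF and L2 normalization by default, so its numbers will differ from these.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; real tokenizers are more careful.
    return text.lower().split()


def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * ln(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up example messages, not taken from the dataset.
messages = [
    "win a free prize now",
    "are you free for lunch",
    "claim your prize now",
]
scores = tf_idf(messages)
```

In the first message, "win" appears in only one document and so gets a higher weight than "free", which appears in two. That is the property that lets rare, telling words stand out from common ones.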

**Project Structure**

```
SMSSpamClassification/
├── data/
│ └── SMSSpamCollection.csv
├── models/
│ ├── feature_extraction.pkl
│ └── spam_detection_model.pkl
├── notebooks/
│ └── SMSSpamClassification.ipynb
├── requirements.txt
└── README.md
```

**Data**

* The project uses the publicly available SMS Spam Collection dataset.
* The raw data is stored in `data/SMSSpamCollection.csv`.
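The original SMS Spam Collection distribution stores one message per line as a label, a tab, and the message text, with no header row. Assuming the copy in `data/` keeps that layout (this is not confirmed here, and the `.csv` extension may mean it was converted), a reader could look roughly like the sketch below. The function name `read_sms_rows` and the sample messages are hypothetical.

```python
import csv
import io


def read_sms_rows(handle):
    """Yield (label, message) pairs from tab-separated 'label<TAB>text' lines."""
    # QUOTE_NONE keeps quote characters inside messages as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 2:
            continue  # skip blank or malformed lines
        label, message = row
        yield label.strip().lower(), message.strip()


# In-memory stand-in for the real file so the sketch runs on its own.
sample = io.StringIO(
    "ham\tAre we still on for lunch?\n"
    "spam\tWINNER! Claim your free prize now\n"
    "\n"
    "ham\tCall me when you get home\n"
)
rows = list(read_sms_rows(sample))

# Count messages per label to check the class balance.
label_counts = {}
for label, _ in rows:
    label_counts[label] = label_counts.get(label, 0) + 1
```

To read the real file, pass an open file handle instead of `sample`, and adjust the delimiter if the file turns out to be comma-separated.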

**Notebooks**

* **SMSSpamClassification.ipynb**: Contains the entire workflow, including data exploration, preprocessing, feature extraction using TF-IDF, model training with Logistic Regression, and model evaluation.
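The notebook itself is the reference for the actual steps. As a rough sketch of what a TF-IDF plus Logistic Regression workflow can look like in scikit-learn, the example below uses a handful of made-up messages. The split ratio, `random_state`, `max_iter`, and the string labels are assumptions for illustration, not values taken from the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy stand-in data; the notebook works on the full SMS Spam Collection.
messages = [
    "WINNER! Claim your free prize now",
    "Free entry in a weekly cash draw, text WIN to enter",
    "Urgent: your account has won a 1000 pound reward",
    "Congratulations, you have been selected for a free holiday",
    "Call now to claim your guaranteed cash prize",
    "Free ringtones, reply YES to subscribe today",
    "Are we still on for lunch today?",
    "Call me when you get home",
    "Can you pick up milk on the way back",
    "See you at the meeting tomorrow morning",
    "Thanks for the lift, talk later",
    "Running late, be there in ten minutes",
]
labels = ["spam"] * 6 + ["ham"] * 6

# Hold out part of the data, keeping the spam/ham ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    messages, labels, test_size=0.25, stratify=labels, random_state=42
)

# Fit the vectorizer on training text only, then reuse it on the test text.
vectorizer = TfidfVectorizer()
X_train_features = vectorizer.fit_transform(X_train)
X_test_features = vectorizer.transform(X_test)

model = LogisticRegression(max_iter=1000)
model.fit(X_train_features, y_train)

predictions = model.predict(X_test_features)
print("accuracy on toy test split:", accuracy_score(y_test, predictions))
```

With so few messages the accuracy figure means little; the point is the order of operations, in particular fitting the vectorizer on the training split only.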

**Models**

* **feature_extraction.pkl**: Saved TF-IDF vectorizer, so new messages can be transformed the same way as the training data.
* **spam_detection_model.pkl**: Trained Logistic Regression model for spam classification.
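A minimal sketch of reusing the two saved files, assuming both were written with Python's `pickle` module and that the objects expose the usual scikit-learn `transform` and `predict` methods. The helper names `load_artifacts` and `classify` are hypothetical and not part of the repository.

```python
import pickle
from pathlib import Path


def load_artifacts(model_dir="models"):
    """Unpickle the saved vectorizer and classifier from model_dir."""
    model_dir = Path(model_dir)
    with open(model_dir / "feature_extraction.pkl", "rb") as fh:
        vectorizer = pickle.load(fh)
    with open(model_dir / "spam_detection_model.pkl", "rb") as fh:
        model = pickle.load(fh)
    return vectorizer, model


def classify(messages, vectorizer, model):
    """Vectorize raw messages, then return the model's predictions."""
    features = vectorizer.transform(messages)
    return list(model.predict(features))
```

From the repository root, `vectorizer, model = load_artifacts()` followed by `classify(["Free prize, call now"], vectorizer, model)` would return one prediction per message. Whether the labels come back as strings or as 0/1 depends on how the notebook encoded them. Only unpickle files you trust, and prefer the same scikit-learn version that produced them.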

**Requirements**

* **requirements.txt**: Lists the Python libraries needed to run the project.

**Installation**

1. Clone the repository:
```bash
git clone https://github.com/shanmukhsrisaivedullapalli/SMSSpamClassification.git
```
2. Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```

**Usage**

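1. Run the `notebooks/SMSSpamClassification.ipynb` notebook to execute the entire project workflow.
2. The trained model and TF-IDF vectorizer are saved as `models/spam_detection_model.pkl` and `models/feature_extraction.pkl`, so they can be reused without retraining.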

**Contributing**

Contributions are welcome! You can enhance the project by:

* Implementing different NLP techniques or feature engineering methods.
* Experimenting with various classification algorithms.
* Improving model performance through hyperparameter tuning.
* Enhancing the project's documentation.
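As one possible starting point for the hyperparameter tuning idea, the sketch below runs a small grid search over a TF-IDF plus Logistic Regression pipeline with scikit-learn's `GridSearchCV`. The toy messages, the parameter grid, the two-fold split, and the `f1_macro` scoring choice are all assumptions for illustration; none of them come from the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy stand-in data; a real search would use the full dataset.
messages = [
    "WINNER! Claim your free prize now",
    "Free entry in a weekly cash draw, text WIN to enter",
    "Call now to claim your guaranteed cash prize",
    "Free ringtones, reply YES to subscribe today",
    "Are we still on for lunch today?",
    "Call me when you get home",
    "Can you pick up milk on the way back",
    "Running late, be there in ten minutes",
]
labels = ["spam"] * 4 + ["ham"] * 4

# Putting both steps in one pipeline refits the vectorizer inside each fold.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Step-name prefixes route each parameter to the right pipeline stage.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, cv=2, scoring="f1_macro")
search.fit(messages, labels)
print("best parameters:", search.best_params_)
```

On the real dataset, more folds (for example `cv=5`) and a wider grid would give more reliable results than this toy setup.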