https://github.com/adrijadastidar/spam-detection
decision-tree jupyter-notebook k-nearest-neighbours logistic-regression python random-forest support-vector-machine tf-idf
- Host: GitHub
- URL: https://github.com/adrijadastidar/spam-detection
- Owner: AdrijaDastidar
- Created: 2024-11-08T09:49:50.000Z (12 days ago)
- Default Branch: main
- Last Pushed: 2024-11-08T10:22:46.000Z (12 days ago)
- Last Synced: 2024-11-08T10:41:36.814Z (12 days ago)
- Topics: decision-tree, jupyter-notebook, k-nearest-neighbours, logistic-regression, python, random-forest, support-vector-machine, tf-idf
- Language: Jupyter Notebook
- Homepage:
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# 📧 Spam Detection using Machine Learning
![Spam Detection](https://img.shields.io/badge/Spam-Detection-brightgreen.svg) ![Jupyter Notebook](https://img.shields.io/badge/Jupyter-Notebook-orange.svg) ![Python](https://img.shields.io/badge/Python-3.8%2B-blue.svg)
## 📜 Table of Contents
- [📝 Introduction](#-introduction)
- [📊 Dataset](#-dataset)
- [🛠️ Installation](#️-installation)
- [📁 Project Structure](#-project-structure)
- [🚀 Usage](#-usage)
- [📈 Results](#-results)
- [🛠️ Technologies Used](#️-technologies-used)
- [📧 Contact](#-contact)
## 📝 Introduction
This project is a **Spam Detection** system that classifies emails as **Spam** or **Ham** (legitimate mail). It converts message text into TF-IDF features and compares five Machine Learning classifiers: Logistic Regression, Random Forest, K-Nearest Neighbors, Decision Tree, and Support Vector Machine.
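To make the idea concrete, here is a minimal sketch of text classification with TF-IDF features and logistic regression. It is not the notebook's code: the training messages are made up for illustration, and the choice of a scikit-learn `Pipeline` is an assumption about how the pieces could be wired together.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up training set, for illustration only (not the project dataset).
train_messages = [
    "Win a free prize now, claim your cash reward",
    "Congratulations, you have won a free gift card",
    "Urgent: claim your free cash prize today",
    "Are we still meeting for lunch tomorrow?",
    "Please send me the report when you get a chance",
    "How are you doing? Call me after work",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# TfidfVectorizer turns each message into a vector of weighted term counts;
# LogisticRegression then fits a linear classifier on those vectors.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_messages, train_labels)

# Classify new messages with the fitted pipeline.
input_mail = [
    "Hello, how are you doing today?",
    "Congratulations! You've won a $1000 gift card. Claim now!",
]
predictions = model.predict(input_mail)

for message, label in zip(input_mail, predictions):
    print(f"{label}: {message}")
```

With only six training messages the predictions are not meaningful; the point is the shape of the workflow, which stays the same when the real dataset is used.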
## 📊 Dataset
The dataset used in this project contains email messages labeled as either **Spam** or **Ham**. The dataset has two columns:
- **Category**: Indicates whether the email is 'spam' or 'ham'
- **Message**: The content of the email

The dataset can be downloaded using the following [Google Drive link](https://drive.google.com/uc?id=1PWL9JWCTa6a2N6TffUObhcl6TuLcwgI4).
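The snippet below sketches one way to fetch and load the data with `gdown` and pandas. Several details are assumptions rather than facts from the notebook: that the file is a CSV, the local filename `mail_data.csv`, the helper names, and the numeric encoding (spam = 1, ham = 0).

```python
import pandas as pd

# URL taken from this README; the local filename is a hypothetical choice.
DATASET_URL = "https://drive.google.com/uc?id=1PWL9JWCTa6a2N6TffUObhcl6TuLcwgI4"
LOCAL_PATH = "mail_data.csv"


def download_dataset(url=DATASET_URL, output=LOCAL_PATH):
    """Download the dataset file from Google Drive using gdown."""
    import gdown  # imported lazily so load_dataset works without gdown

    return gdown.download(url, output, quiet=False)


def load_dataset(path=LOCAL_PATH):
    """Read the CSV (assumed format) and add a numeric label column."""
    df = pd.read_csv(path)
    # Replace missing message text with empty strings so vectorizing does not fail.
    df["Message"] = df["Message"].fillna("")
    # Assumed encoding: spam -> 1, ham -> 0.
    df["label"] = df["Category"].map({"ham": 0, "spam": 1})
    return df
```

A typical call sequence would be `download_dataset()` followed by `df = load_dataset()`, after which `df["Category"].value_counts()` shows the class balance.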
## 🛠️ Installation
To run this project locally, you need Python 3.8+ and Jupyter Notebook installed.
### Step 1: Clone the repository
```bash
git clone https://github.com/adrijadastidar/spam-detection.git
cd spam-detection
```
### Step 2: Install the required libraries
Open your terminal and run:
```bash
pip install numpy pandas scikit-learn gdown
```
### Step 3: Launch Jupyter Notebook
```bash
jupyter notebook
```
Then, open the `spam_detection.ipynb` file.
## 📁 Project Structure
```
spam-detection/
├── spam_detection.ipynb # Jupyter Notebook with the complete code
└── README.md # Project documentation
```
## 🚀 Usage
1. **Open the Jupyter Notebook**: After cloning the repository, navigate to the folder and open the `spam_detection.ipynb` file.
2. **Run the cells sequentially**: The notebook is structured to guide you through data loading, preprocessing, model training, evaluation, and predictions.
3. **Predict on custom emails**: Modify the `input_mail` list in the last section of the notebook to classify your custom emails:
```python
input_mail = [
    "Hello, how are you doing today?",
    "Congratulations! You've won a $1000 gift card. Claim now!",
]
```
## 📈 Results
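The four metrics reported in this section can be computed with functions from `sklearn.metrics`. The sketch below is illustrative only: the labels are made up, the `summarize` helper is hypothetical, and treating spam as the positive class (encoded as 1) is an assumption, not something confirmed by the notebook.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Made-up true and predicted labels: 1 = spam, 0 = ham (assumed encoding).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]


def summarize(y_true, y_pred, positive_label=1):
    """Return accuracy, precision, recall, and F1 for the positive class."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, pos_label=positive_label),
        "recall": recall_score(y_true, y_pred, pos_label=positive_label),
        "f1": f1_score(y_true, y_pred, pos_label=positive_label),
    }


metrics = summarize(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Precision answers "of the emails flagged as spam, how many really were spam", while recall answers "of the real spam emails, how many were caught", which is why both are worth reporting alongside accuracy.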
The following models were tested, and their performance is summarized below:
| Model | Test Accuracy | Precision | Recall | F1-Score |
|------------------------|---------------|-----------|--------|----------|
| Logistic Regression | 98.7% | 97.5% | 98.0% | 97.7% |
| Random Forest | 97.8% | 96.8% | 97.0% | 96.9% |
| K-Nearest Neighbors | 95.3% | 94.1% | 93.5% | 93.8% |
| Decision Tree | 96.2% | 95.5% | 95.0% | 95.2% |
| Support Vector Machine | 98.0% | 96.7% | 97.3% | 97.0% |

## 🛠️ Technologies Used
- **Python 3.8+**
- **Jupyter Notebook**
- **Pandas** for data manipulation
- **NumPy** for numerical operations
- **Scikit-learn** for machine learning models
- **TfidfVectorizer** for text feature extraction
## 📧 Contact
Feel free to reach out for any questions or suggestions:
- Email: [email protected]
---