https://github.com/adrijadastidar/spam-detection
- Host: GitHub
- URL: https://github.com/adrijadastidar/spam-detection
- Owner: AdrijaDastidar
- Created: 2024-11-08T09:49:50.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-11-08T10:22:46.000Z (3 months ago)
- Last Synced: 2024-11-16T12:25:52.083Z (3 months ago)
- Topics: decision-tree, jupyter-notebook, k-nearest-neighbours, logistic-regression, python, random-forest, support-vector-machine, tf-idf
- Language: Jupyter Notebook
- Homepage:
- Size: 12.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# 📧 Spam Detection using Machine Learning
  
## 📜 Table of Contents
- [📝 Introduction](#-introduction)
- [📊 Dataset](#-dataset)
- [🛠️ Installation](#️-installation)
- [📁 Project Structure](#-project-structure)
- [🚀 Usage](#-usage)
- [📈 Results](#-results)
- [🛠️ Technologies Used](#️-technologies-used)
- [📧 Contact](#-contact)
## 📝 Introduction
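This project is a **Spam Detection** system that classifies emails as **Spam** or **Ham** using several machine learning algorithms: logistic regression, random forest, k-nearest neighbors, decision tree, and support vector machine. Messages are converted to TF-IDF features, a standard Natural Language Processing (NLP) technique, so that spam can be filtered out automatically.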
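As a rough illustration of the approach described above, the sketch below chains a TF-IDF vectorizer with a logistic regression classifier, both of which are listed among the project's techniques. How the notebook actually wires them together is not shown here, so the toy messages, the string labels, and the variable names are assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data (hypothetical stand-in for the real dataset).
train_messages = [
    "Win a free prize now, click here to claim",
    "Congratulations, you have won a cash reward",
    "Free entry: claim your gift card today",
    "Urgent: claim your free cash prize now",
    "Are we still meeting for lunch tomorrow?",
    "Please review the attached project report",
    "Can you call me when you get home?",
    "Thanks for the notes from the meeting",
]
train_labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

# The pipeline turns raw text into TF-IDF vectors, then fits the classifier.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LogisticRegression(),
)
model.fit(train_messages, train_labels)

# Classify previously unseen messages.
new_messages = ["Claim your free prize now", "See you at the meeting tomorrow"]
predictions = model.predict(new_messages)
print(list(predictions))
```

Wrapping both steps in one pipeline means the same vocabulary and weighting learned during training are reused at prediction time.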
## 📊 Dataset
The dataset used in this project contains email messages labeled as either **Spam** or **Ham**. The dataset has two columns:
- **Category**: Indicates whether the email is 'spam' or 'ham'
- **Message**: The content of the email

The dataset can be downloaded using the following [Google Drive link](https://drive.google.com/uc?id=1PWL9JWCTa6a2N6TffUObhcl6TuLcwgI4).
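A minimal sketch of fetching and loading the data might look like the following. It assumes the file is a CSV with the `Category` and `Message` columns described above; the local filename `mail_data.csv`, the helper names `download_dataset` and `load_dataset`, and the 0/1 `label` encoding are hypothetical choices for illustration, not taken from the notebook.

```python
import io

import pandas as pd

DATASET_URL = "https://drive.google.com/uc?id=1PWL9JWCTa6a2N6TffUObhcl6TuLcwgI4"


def download_dataset(path="mail_data.csv"):
    """Download the dataset from Google Drive (requires network access)."""
    import gdown  # imported lazily so the rest of the sketch works offline

    gdown.download(DATASET_URL, path, quiet=False)
    return path


def load_dataset(source):
    """Read the CSV and return a DataFrame with an added numeric label column."""
    df = pd.read_csv(source)
    missing = {"Category", "Message"} - set(df.columns)
    if missing:
        raise ValueError(f"missing expected columns: {sorted(missing)}")
    # Replace missing values with empty strings so text processing does not fail.
    df = df.fillna("")
    # Hypothetical encoding: spam -> 1, ham -> 0.
    df["label"] = (df["Category"].str.strip().str.lower() == "spam").astype(int)
    return df


# Small in-memory sample standing in for the downloaded file.
sample_csv = io.StringIO(
    "Category,Message\n"
    "ham,See you at lunch\n"
    "spam,You won a free prize\n"
)
sample = load_dataset(sample_csv)
print(sample[["Category", "label"]])
```

With network access, `load_dataset(download_dataset())` would load the full file instead of the in-memory sample.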
## 🛠️ Installation
To run this project locally, you need Python 3.8+ and Jupyter Notebook installed.
### Step 1: Clone the repository
```bash
git clone https://github.com/adrijadastidar/spam-detection.git
cd spam-detection
```
### Step 2: Install the required libraries
Open your terminal and run:
```bash
pip install numpy pandas scikit-learn gdown
```
### Step 3: Launch Jupyter Notebook
```bash
jupyter notebook
```
Then, open the `spam_detection.ipynb` file.
## 📁 Project Structure
```
spam-detection/
├── spam_detection.ipynb # Jupyter Notebook with the complete code
└── README.md # Project documentation
```
## 🚀 Usage
1. **Open the Jupyter Notebook**: After cloning the repository, navigate to the folder and open the `spam_detection.ipynb` file.
2. **Run the cells sequentially**: The notebook is structured to guide you through data loading, preprocessing, model training, evaluation, and predictions.
3. **Predict on custom emails**: Modify the `input_mail` list in the last section of the notebook to classify your custom emails:
```python
input_mail = [
    "Hello, how are you doing today?",
    "Congratulations! You've won a $1000 gift card. Claim now!",
]
```
## 📈 Results
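The accuracy, precision, recall, and F1 figures reported in this section are standard classification metrics. As a hedged sketch (not the notebook's code), they could be computed with scikit-learn roughly as follows. The made-up corpus, the 70/30 split, the choice of spam as the positive class, the hyperparameters, and the helper name `evaluate` are all assumptions, and the printed numbers are unrelated to the project's reported results.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Made-up corpus for illustration only (1 = spam, 0 = ham).
spam = [
    "Win a free prize now", "Claim your cash reward today",
    "Free gift card, click here", "You have won a lottery prize",
    "Urgent: claim your free cash", "Exclusive offer, win money now",
    "Get a free phone, claim today", "Cheap loans, apply now for cash",
    "Winner! Collect your prize now", "Free entry to win a reward",
]
ham = [
    "Are we meeting for lunch tomorrow", "Please review the project report",
    "Call me when you get home", "Thanks for the meeting notes",
    "See you at the office on Monday", "Can you send the report tonight",
    "Dinner at my place this weekend", "The meeting moved to Tuesday",
    "I will be home late tonight", "Let us discuss the project tomorrow",
]
messages = spam + ham
labels = [1] * len(spam) + [0] * len(ham)

X_train_text, X_test_text, y_train, y_test = train_test_split(
    messages, labels, test_size=0.3, stratify=labels, random_state=42
)

# Fit the vectorizer on training text only so no test vocabulary leaks in.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)

models = {
    "Logistic Regression": LogisticRegression(),
    "Random Forest": RandomForestClassifier(random_state=42),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=3),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Support Vector Machine": SVC(kernel="linear"),
}


def evaluate(model):
    """Fit one model and return its metrics on the held-out split."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="binary", pos_label=1, zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


results = {name: evaluate(model) for name, model in models.items()}
for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Reporting precision and recall alongside accuracy matters for spam filtering because the two classes are usually imbalanced, so accuracy alone can look high even when many spam messages are missed.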
The following models were tested, and their performance is summarized below:
| Model | Test Accuracy | Precision | Recall | F1-Score |
|------------------------|---------------|-----------|--------|----------|
| Logistic Regression | 98.7% | 97.5% | 98.0% | 97.7% |
| Random Forest | 97.8% | 96.8% | 97.0% | 96.9% |
| K-Nearest Neighbors | 95.3% | 94.1% | 93.5% | 93.8% |
| Decision Tree | 96.2% | 95.5% | 95.0% | 95.2% |
| Support Vector Machine | 98.0% | 96.7% | 97.3% | 97.0% |
## 🛠️ Technologies Used
- **Python 3.8+**
- **Jupyter Notebook**
- **Pandas** for data manipulation
- **NumPy** for numerical operations
- **Scikit-learn** for machine learning models
- **TfidfVectorizer** for text feature extraction
## 📧 Contact
Feel free to reach out with any questions or suggestions:
- Email: [email protected]