https://github.com/vaneeza-7/spam-detection-using-feedforwardnn-and-word2vec

Spam Detection using 2 layer deep feed forward neural network and word2vec embeddings

Host: GitHub
URL: https://github.com/vaneeza-7/spam-detection-using-feedforwardnn-and-word2vec
Owner: Vaneeza-7
License: MIT
Created: 2025-02-17T21:25:18.000Z (8 months ago)
Default Branch: main
Last Pushed: 2025-02-17T21:30:49.000Z (8 months ago)
Last Synced: 2025-02-17T22:27:54.332Z (8 months ago)
Language: Jupyter Notebook
Homepage:
Size: 1.28 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# **Spam Detection Using a Feed-Forward Neural Network with Word Embedding**

## **Overview**
This project implements a **two-layer feed-forward neural network** for **spam detection** using **custom Word2Vec embeddings**. The model classifies text messages as **spam or not spam** based on learned word representations.

## **Dataset**
- **Source**: [Spam or Not Spam Dataset](https://www.kaggle.com/datasets/ozlerhakan/spam-or-not-spam-dataset?resource=download)
- **Task**: Classify messages as **spam (1)** or **not spam (0)**

## **Tasks**

### 🛠 **1. Data Preprocessing**
- Balance the dataset and clean text data for model training.
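The README does not say how the data is balanced or cleaned, so the following is only a minimal sketch under assumptions: balancing by downsampling the majority class, and cleaning by lowercasing and keeping letters only. The helper names `clean_text` and `balance` and the sample rows are hypothetical and are not taken from the notebook.

```python
import random
import re

def clean_text(text):
    """Lowercase, replace anything that is not a-z or whitespace, collapse spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return " ".join(text.split())

def balance(rows, seed=0):
    """Downsample the larger class so both labels appear equally often.

    `rows` is a list of (text, label) pairs with label 1 = spam, 0 = not spam.
    """
    rng = random.Random(seed)
    spam = [r for r in rows if r[1] == 1]
    ham = [r for r in rows if r[1] == 0]
    n = min(len(spam), len(ham))
    balanced = rng.sample(spam, n) + rng.sample(ham, n)
    rng.shuffle(balanced)
    return balanced

# Hypothetical sample rows, for illustration only.
rows = [
    ("WIN a FREE prize!!! Click now", 1),
    ("Meeting moved to 3pm", 0),
    ("See you at lunch?", 0),
    ("Thanks, got the report.", 0),
]
cleaned = [(clean_text(text), label) for text, label in rows]
balanced = balance(cleaned)
```

Downsampling discards data; oversampling the minority class is an equally plausible reading of "balance".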

### 📖 **2. Model Training**

#### 🔡 **a. Word2Vec Embeddings**
- Implement **Word2Vec from scratch** using **logistic regression with negative sampling**.
- Each word is represented as a **10-dimensional embedding**.
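One way to read "logistic regression with negative sampling" is the skip-gram formulation: each (center word, candidate word) pair is a binary logistic-regression problem, with the true context word labelled 1 and a few randomly drawn words labelled 0. The sketch below follows that reading using only the standard library. The function name `train_sgns`, the window size, the number of negatives, the learning rate, and uniform negative sampling are all assumptions; only the 10-dimensional embedding size comes from the README.

```python
import math
import random

def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))

def train_sgns(sentences, dim=10, window=2, negatives=3, lr=0.05, epochs=50, seed=0):
    """Skip-gram with negative sampling; returns {word: embedding list}."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in range(V)]
    w_out = [[0.0] * dim for _ in range(V)]
    for _ in range(epochs):
        for sent in sentences:
            ids = [index[w] for w in sent]
            for pos, center in enumerate(ids):
                lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
                for ctx_pos in range(lo, hi):
                    if ctx_pos == pos:
                        continue
                    # One positive pair plus `negatives` uniformly drawn words
                    # labelled 0 (assumption: uniform sampling; a draw may
                    # occasionally coincide with the true context word).
                    targets = [(ids[ctx_pos], 1.0)]
                    targets += [(rng.randrange(V), 0.0) for _ in range(negatives)]
                    v = w_in[center]
                    grad_v = [0.0] * dim
                    for t, label in targets:
                        u = w_out[t]
                        # Logistic-regression gradient: prediction minus label.
                        g = sigmoid(sum(a * b for a, b in zip(u, v))) - label
                        for k in range(dim):
                            grad_v[k] += g * u[k]
                            u[k] -= lr * g * v[k]
                    for k in range(dim):
                        v[k] -= lr * grad_v[k]
    return {w: w_in[i] for w, i in index.items()}

# Hypothetical toy corpus, for illustration only.
sentences = [["free", "prize", "now"], ["lunch", "at", "noon"]]
embeddings = train_sgns(sentences, epochs=5)
```

The original word2vec draws negatives from a smoothed unigram distribution rather than uniformly; uniform sampling is used here only to keep the sketch short.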

#### 🧠 **b. Neural Network Architecture**
- **Input**: 12 words per message (each as a 10-dimensional vector).
- **Hidden Layers**: 2 layers, each with 8 nodes.
- **Output Layer**: 1 node for binary classification.
- **Training**: Implement the **forward and backward passes** and **stochastic gradient descent (SGD)** without external libraries.
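The architecture above can be sketched in plain Python as follows. Several details are assumptions rather than facts from the README: the hidden-layer description is read as two hidden layers of 8 units each, all activations are sigmoid, the loss is binary cross-entropy, short messages are zero-padded to 12 words, and unknown words map to zero vectors. The names `encode`, `init_layer`, `forward`, and `sgd_step` and the toy embeddings are hypothetical.

```python
import math
import random

SEQ_LEN, EMB_DIM, HIDDEN = 12, 10, 8

def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))

def encode(words, emb):
    """Truncate or zero-pad to 12 words and concatenate their vectors (120 values)."""
    zero = [0.0] * EMB_DIM
    words = (list(words) + [None] * SEQ_LEN)[:SEQ_LEN]
    return [x for w in words for x in emb.get(w, zero)]

def init_layer(n_in, n_out, rng):
    scale = 1.0 / math.sqrt(n_in)
    return {"W": [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)],
            "b": [0.0] * n_out}

def forward(layers, x):
    """Return the activations of every layer, with the input as element 0."""
    acts = [x]
    for layer in layers:
        acts.append([sigmoid(sum(w * a for w, a in zip(row, acts[-1])) + b)
                     for row, b in zip(layer["W"], layer["b"])])
    return acts

def sgd_step(layers, x, y, lr):
    """One forward and backward pass on a single example; returns its loss."""
    acts = forward(layers, x)
    p = acts[-1][0]
    loss = -(y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12))
    delta = [p - y]  # dL/dz at the sigmoid output under cross-entropy
    for i in reversed(range(len(layers))):
        layer, a_in = layers[i], acts[i]
        next_delta = None
        if i > 0:
            # Back-propagate through the weights before updating them;
            # sigmoid'(z) = a * (1 - a) for the layer below.
            next_delta = [
                sum(layer["W"][j][k] * delta[j] for j in range(len(delta))) * a * (1 - a)
                for k, a in enumerate(a_in)
            ]
        for j, d in enumerate(delta):
            layer["b"][j] -= lr * d
            for k, a in enumerate(a_in):
                layer["W"][j][k] -= lr * d * a
        delta = next_delta
    return loss

rng = random.Random(0)
layers = [init_layer(SEQ_LEN * EMB_DIM, HIDDEN, rng),
          init_layer(HIDDEN, HIDDEN, rng),
          init_layer(HIDDEN, 1, rng)]

# Hypothetical toy embeddings and messages, for illustration only.
emb = {"free": [1.0] * EMB_DIM, "lunch": [-1.0] * EMB_DIM}
data = [(encode(["free", "free"], emb), 1), (encode(["lunch"], emb), 0)]
history = []
for epoch in range(200):
    rng.shuffle(data)
    history.append(sum(sgd_step(layers, x, y, lr=0.5) for x, y in data) / len(data))
```

Here SGD updates the weights after every single example, which matches the "stochastic" in the name; a mini-batch variant would average the gradients over several examples before each update.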

### 📊 **3. Model Evaluation**
- Evaluate performance using **accuracy, precision, recall, and F1-score**.
- Visualize model performance using a **confusion matrix**.
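The four metrics all follow from the confusion-matrix counts, as the sketch below shows. The function names and the example labels are hypothetical, and the matrix is printed as text rather than plotted.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) with spam (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tn, fp, fn, tp

def scores(y_true, y_pred):
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def format_confusion(y_true, y_pred):
    """Render the confusion matrix as a small text table."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    return "\n".join([
        "              pred 0  pred 1",
        f"actual 0      {tn:6d}  {fp:6d}",
        f"actual 1      {fn:6d}  {tp:6d}",
    ])

# Hypothetical labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
result = scores(y_true, y_pred)
print(format_confusion(y_true, y_pred))
```

With these example labels there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so all four metrics equal 0.75.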

🚀 **This project demonstrates an end-to-end spam detection pipeline, from word embeddings to deep learning classification (from scratch)!**
