https://github.com/vaneeza-7/spam-detection-using-feedforwardnn-and-word2vec

Spam Detection using 2 layer deep feed forward neural network and word2vec embeddings

# **Spam Detection Using a Feed-Forward Neural Network with Word Embedding**

## **Overview**
This project implements a **two-layer feed-forward neural network** for **spam detection** using **custom Word2Vec embeddings**. The model classifies text messages as **spam or not spam** based on learned word representations.

## **Dataset**
- **Source**: [Spam or Not Spam Dataset](https://www.kaggle.com/datasets/ozlerhakan/spam-or-not-spam-dataset?resource=download)
- **Task**: Classify messages as **spam (1)** or **not spam (0)**

## **Tasks**

### 🛠 **1. Data Preprocessing**
- Balance the dataset and clean text data for model training.
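
A minimal sketch of the preprocessing step, using only the standard library. The README does not say how the data is balanced or cleaned, so the choices here are assumptions: undersampling the majority class, keeping lowercase letters only, and padding with a `<pad>` token. The helper names (`clean_text`, `balance`, `pad_or_truncate`) are hypothetical.

```python
import random
import re

def clean_text(text):
    """Lowercase, replace anything that is not a letter with a space, collapse whitespace."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return " ".join(text.split())

def balance(samples, seed=0):
    """Undersample the larger class so spam (1) and not spam (0) have equal counts.

    `samples` is a list of (text, label) pairs. Undersampling is an assumption;
    oversampling the minority class would be another valid choice.
    """
    rng = random.Random(seed)
    spam = [s for s in samples if s[1] == 1]
    ham = [s for s in samples if s[1] == 0]
    n = min(len(spam), len(ham))
    balanced = rng.sample(spam, n) + rng.sample(ham, n)
    rng.shuffle(balanced)
    return balanced

def pad_or_truncate(tokens, length=12, pad_token="<pad>"):
    """Force every message to exactly `length` tokens (12 matches the network input)."""
    return (tokens + [pad_token] * length)[:length]

raw = [("WIN $1000 now!!!", 1), ("Lunch at noon?", 0), ("See you tomorrow", 0)]
cleaned = [(clean_text(text), label) for text, label in raw]
dataset = [(pad_or_truncate(text.split()), label) for text, label in balance(cleaned)]
```

Fixing the length at 12 tokens here keeps the later network input size constant regardless of message length.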

### 📖 **2. Model Training**

#### 🔡 **a. Word2Vec Embeddings**
- Implement **Word2Vec from scratch** using **logistic regression with negative sampling**.
- Each word is represented as a **10-dimensional embedding**.
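
The idea behind skip-gram with negative sampling can be sketched as follows: each (center, context) pair is treated as a positive example for a logistic regression, and a few randomly drawn words serve as negative examples. This is an illustration under stated assumptions, not the repository's code: the window size, learning rate, number of negatives, and uniform negative sampling are all assumed (the original Word2Vec samples negatives from a smoothed unigram distribution), and `train_word2vec` is a hypothetical name.

```python
import math
import random

def sigmoid(x):
    # Clamp the input so math.exp cannot overflow.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))

def train_word2vec(sentences, dim=10, window=2, negatives=3, lr=0.05, epochs=50, seed=0):
    """Return {word: 10-dimensional vector} learned with skip-gram negative sampling."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    # Separate tables for center-word vectors and context-word vectors.
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]
    for _ in range(epochs):
        for sent in sentences:
            ids = [index[w] for w in sent]
            for pos, center in enumerate(ids):
                lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
                for ctx_pos in range(lo, hi):
                    if ctx_pos == pos:
                        continue
                    context = ids[ctx_pos]
                    # One positive pair (label 1) plus random negatives (label 0).
                    targets = [(context, 1.0)]
                    for _ in range(negatives):
                        neg = rng.randrange(len(vocab))
                        if neg != context:
                            targets.append((neg, 0.0))
                    v = w_in[center]
                    grad_v = [0.0] * dim
                    for t, label in targets:
                        u = w_out[t]
                        score = sigmoid(sum(a * b for a, b in zip(v, u)))
                        g = lr * (label - score)  # logistic-regression gradient step
                        for k in range(dim):
                            grad_v[k] += g * u[k]
                            u[k] += g * v[k]
                    # Apply the accumulated update to the center vector.
                    for k in range(dim):
                        v[k] += grad_v[k]
    return {w: w_in[i] for w, i in index.items()}

embeddings = train_word2vec([["free", "prize", "now"], ["meeting", "at", "noon"]])
```

Each entry of `embeddings` is a list of 10 floats, matching the embedding size stated above.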

#### 🧠 **b. Neural Network Architecture**
- **Input**: 12 words per message (each as a 10-dimensional vector).
- **Hidden Layer**: 2 nodes, each with a size of 8.
- **Output Layer**: 1 node for binary classification.
- **Training**: Implement the **forward and backward passes** and **stochastic gradient descent (SGD)** from scratch, without external libraries.
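
The forward pass, backward pass, and SGD update described above can be sketched in plain Python. Several details are assumptions rather than facts from the repository: the 12 word vectors are taken to be concatenated into a 120-value input, the hidden layer is read as 8 sigmoid units, and the loss is binary cross-entropy. The names `TinyNet` and `sgd_step` are hypothetical.

```python
import math
import random

def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-x))

class TinyNet:
    """Input -> sigmoid hidden layer -> single sigmoid output."""

    def __init__(self, n_in=120, n_hidden=8, seed=0):
        rng = random.Random(seed)
        s1, s2 = 1.0 / math.sqrt(n_in), 1.0 / math.sqrt(n_hidden)
        self.w1 = [[rng.uniform(-s1, s1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-s2, s2) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return (hidden activations, predicted spam probability)."""
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, y

    def sgd_step(self, x, target, lr=0.1):
        """One SGD update on a single example; returns the loss before the update."""
        h, y = self.forward(x)
        # Sigmoid output with cross-entropy loss: dL/dz_out = y - target.
        d_out = y - target
        # Backpropagate through w2 and the hidden sigmoid before w2 is changed.
        d_hidden = [d_out * self.w2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
        for j in range(len(h)):
            self.w2[j] -= lr * d_out * h[j]
            self.b1[j] -= lr * d_hidden[j]
            for i in range(len(x)):
                self.w1[j][i] -= lr * d_hidden[j] * x[i]
        self.b2 -= lr * d_out
        eps = 1e-12
        return -(target * math.log(y + eps) + (1 - target) * math.log(1 - y + eps))

# Usage: flatten 12 word vectors of 10 values each into one 120-value input.
message = [[0.01 * (i + k) for k in range(10)] for i in range(12)]
x = [value for vector in message for value in vector]
net = TinyNet()
loss = net.sgd_step(x, target=1)
```

Updating on one example at a time is what makes this stochastic gradient descent; a full training loop would shuffle the dataset and call `sgd_step` on each message every epoch.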

### 📊 **3. Model Evaluation**
- Evaluate performance using **accuracy, precision, recall, and F1-score**.
- Visualize model performance using a **confusion matrix**.
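
All four metrics follow directly from the confusion-matrix counts, so they can be computed without any library. The sketch below treats spam (1) as the positive class; the function names are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) with spam (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
counts = confusion_counts(y_true, y_pred)   # (2, 1, 1, 1)
metrics = classification_metrics(y_true, y_pred)
```

For this toy example, accuracy is 0.6 and precision, recall, and F1 are each 2/3.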

🚀 **This project demonstrates an end-to-end spam detection pipeline, from word embeddings to deep learning classification (from scratch)!**