# Exploring the Intuition of Neural Networks on a Classification Problem Using Only NumPy

## Overview
This project explores the intuition behind neural networks for multiclass classification using **only NumPy**, without high-level frameworks like TensorFlow or PyTorch. The goal is to classify the three **Iris species**—Setosa, Versicolor, and Virginica—based on petal and sepal measurements. We built a single-layer neural network using softmax activation, cross-entropy loss, and gradient descent to optimize model parameters.

![Image](https://github.com/user-attachments/assets/2fb3559c-e3c1-4e36-ba4c-fa77c3e3a221)

### Key Features:
- **Softmax activation** for multi-class classification.
- **Cross-entropy loss function** for model optimization.
- **Gradient descent with backpropagation** to update model parameters.
- **Vectorization and broadcasting** for computational efficiency.
- **Decision boundary visualization** to analyze model predictions.

## Dataset
The dataset consists of **150 samples**, each with **four numerical features**:
- **Sepal Length**
- **Sepal Width**
- **Petal Length**
- **Petal Width**

Each sample belongs to one of three classes:
- **Setosa (0)**
- **Versicolor (1)**
- **Virginica (2)**

These features are represented as $X$ in matrix form:

$$
X \in \mathbb{R}^{m \times n_x}
$$

where $m = 150$ (50 samples per species) and $n_x = 4$ (features per sample).

```python
# Load the dataset using sklearn
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
```

## One-Hot Encoding
Since we are dealing with a multi-class classification problem, we convert categorical labels into **one-hot encoded vectors**.

```python
import numpy as np

m, K = y.shape[0], 3  # number of samples and number of classes

# Start from all zeros, then set the entry for each sample's true class to 1
y_one_hot = np.zeros((m, K))
y_one_hot[np.arange(m), y] = 1
```
This transforms each label into a vector where only the corresponding class index is set to 1.
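
As a quick check, the first sample of each species (indices 0, 50, and 100 in the ordered Iris dataset) maps to a distinct unit vector:

```python
# The Iris targets are ordered: 50 Setosa, then 50 Versicolor, then 50 Virginica
print(y[[0, 50, 100]])          # [0 1 2]
print(y_one_hot[[0, 50, 100]])  # [[1. 0. 0.]
                                #  [0. 1. 0.]
                                #  [0. 0. 1.]]
```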

## Model Architecture
We use a **single-layer feed-forward neural network** with softmax activation.

### 1. Softmax Function
Since we have three distinct classes, we use the **softmax function** instead of the sigmoid function. The softmax function is given by:

$$
g_k(\boldsymbol{t}) = \frac{e^{t_k}}{\sum_{j=1}^{K} e^{t_j}}
$$

where $\boldsymbol{t} = (t_k)_{k=1}^K$ represents the unnormalized class scores.

This function converts raw scores into probabilities that sum to 1 across the $K$ classes.

```python
# Compute softmax activation
# X has shape (m, n_x), W has shape (n_x, K), and b has shape (1, K)
Z = np.dot(X, W) + b                                    # raw class scores, shape (m, K)
numerator = np.exp(Z)                                   # element-wise exponentials
denominator = np.sum(numerator, axis=1, keepdims=True)  # per-sample normaliser
y_hat = numerator / denominator                         # class probabilities, shape (m, K)
```
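
The training loop further down calls a helper `p_model` for this forward pass. The helper itself is not shown in the original snippets, so the following is only a minimal sketch under the same shape conventions (`X` with samples as rows, `W` of shape `(n_x, K)`, `b` of shape `(1, K)`); the max-subtraction is an optional numerical-stability guard that does not change the output:

```python
def p_model(X, W, b):
    """Forward pass: linear class scores followed by softmax.

    X : (m, n_x) data with samples as rows, W : (n_x, K), b : (1, K).
    Returns y_hat : (m, K) class probabilities.
    """
    Z = np.dot(X, W) + b                     # raw class scores, shape (m, K)
    Z -= np.max(Z, axis=1, keepdims=True)    # optional shift to avoid overflow in exp
    numerator = np.exp(Z)
    return numerator / np.sum(numerator, axis=1, keepdims=True)
```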

### 2. Cross-Entropy Loss Function
The loss function quantifies the difference between predicted and true labels:

$$
\mathcal{J}(\boldsymbol{W},\boldsymbol{b}) = -\frac{1}{m} \sum_{i=1}^m \sum_{j=1}^K \mathbf{y}_j^{(i)} \log(\widehat{y}^{(i)}_j)
$$

```python
# Compute loss
# Per-sample log-likelihood of the true class (y_one_hot and y_hat have shape (m, K))
loss = np.sum(y_one_hot * np.log(y_hat), axis=1)
# Average negative log-likelihood over all m samples
total_cost = -(1 / m) * np.sum(loss)
```
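
The training loop further down calls a helper `compute_cost(y, y_hat)` that is not shown in the original snippets. One possible sketch takes the integer labels directly and indexes out each sample's true-class probability, which is equivalent to the one-hot sum above:

```python
def compute_cost(y, y_hat):
    """Average cross-entropy between integer labels y (shape (m,)) and
    predicted class probabilities y_hat (shape (m, K))."""
    m = y.shape[0]
    true_class_probs = y_hat[np.arange(m), y]  # probability assigned to the correct class
    return -np.mean(np.log(true_class_probs))
```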

### 3. Gradient Descent with Backpropagation for Optimization
Using backpropagation, we compute the gradients needed to update the weights and bias:

$$
\nabla_{\boldsymbol{W}} \mathcal{J}(\boldsymbol{W},\boldsymbol{b}) = \frac{1}{m} (\widehat{\boldsymbol{Y}} - \mathbf{Y}) X^\top
$$

$$
\nabla_{\boldsymbol{b}} \mathcal{J}(\boldsymbol{W},\boldsymbol{b}) = \frac{1}{m} \sum_{i=1}^{m} (\widehat{\boldsymbol{y}}^{(i)} - \mathbf{y}^{(i)})
$$

We update parameters iteratively using:

$$
\boldsymbol{W} := \boldsymbol{W} - \alpha \nabla_{\boldsymbol{W}} \mathcal{J}(\boldsymbol{W},\boldsymbol{b})
$$

$$
\boldsymbol{b} := \boldsymbol{b} - \alpha \nabla_{\boldsymbol{b}} \mathcal{J}(\boldsymbol{W},\boldsymbol{b})
$$

![Image](https://github.com/user-attachments/assets/b30133d2-bc34-4c81-adf4-c40d6f2c35ea)

*Note: This image only illustrates how training updates the parameters through backpropagation. It is not representative of the single-layer feed-forward neural network built here.*

```python
# Compute gradients
# (y_hat - y_one_hot) has shape (m, K); X has shape (m, n_x)
W_grad = np.dot(X.T, (y_hat - y_one_hot)) / m                  # shape (n_x, K), matches W
b_grad = np.sum(y_hat - y_one_hot, axis=0, keepdims=True) / m  # shape (1, K), matches b
```
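
The training loop below obtains these gradients through a helper `compute_gradients(X, y, W, b)` that is not shown in the original snippets. A minimal sketch consistent with the shapes above, building the one-hot targets and the forward pass internally, could look like this:

```python
def compute_gradients(X, y, W, b):
    """Gradients of the cross-entropy cost with respect to W and b.

    X : (m, n_x) data, y : (m,) integer labels, W : (n_x, K), b : (1, K).
    """
    m, K = X.shape[0], W.shape[1]
    y_hat = p_model(X, W, b)                             # forward pass, shape (m, K)
    y_one_hot = np.zeros((m, K))
    y_one_hot[np.arange(m), y] = 1
    error = y_hat - y_one_hot                            # shape (m, K)
    W_grad = np.dot(X.T, error) / m                      # shape (n_x, K)
    b_grad = np.sum(error, axis=0, keepdims=True) / m    # shape (1, K)
    return W_grad, b_grad
```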

## Training the Model
We train the model using **gradient descent** over multiple iterations.

```python
# Training loop
# lr (learning rate), iters, and the initial W, b are set beforehand
# (see the initialization sketch after this block)
costs = []
for i in range(iters):
    y_hat = p_model(X, W, b)                          # forward pass
    cost = compute_cost(y, y_hat)                     # current cross-entropy cost
    W_grad, b_grad = compute_gradients(X, y, W, b)    # backpropagation
    W -= lr * W_grad
    b -= lr * b_grad

    if i % 100 == 0:
        costs.append(cost)
        print(f"Cost after iteration {i}: {cost:.4f}")
```
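
The loop assumes that the learning rate, iteration count, and initial parameters have already been set up. A minimal sketch of that setup is shown below; the values are illustrative, not the settings behind the reported accuracies:

```python
# Illustrative setup; values are examples only
n_x, K = X.shape[1], 3     # 4 input features, 3 classes
W = np.zeros((n_x, K))     # weight matrix
b = np.zeros((1, K))       # bias row vector
lr = 0.1                   # learning rate
iters = 1000               # number of gradient-descent iterations
```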

## Model Evaluation & Results
We tested three feature sets:

| Feature Set | Accuracy |
|------------------|----------|
| Petal Measurements | **96%** |
| Sepal Measurements | **75%** |
| Petal + Sepal Measurements | **98%** |

### Observations:
- **Petal measurements alone** perform better than **sepal measurements alone**.
- **Using both feature sets together gives the highest accuracy (98%)**.
- The **decision boundary** was influenced by the number of training iterations and learning rate.
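
Accuracy here is the fraction of samples whose highest-probability class matches the true label. A short sketch of that computation, assuming trained parameters `W` and `b`:

```python
# Share of samples whose most probable class equals the true label
y_hat = p_model(X, W, b)                  # predicted probabilities, shape (m, K)
predictions = np.argmax(y_hat, axis=1)    # predicted class index per sample
accuracy = np.mean(predictions == y)
print(f"Accuracy: {accuracy:.2%}")
```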

## Decision Boundary Visualization
To visualize how the model classifies new data, we plot the decision boundary.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# X here is the two-feature subset the model was trained on (see the note after
# this block), and y holds the integer class labels
x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300),
                     np.linspace(y_min, y_max, 300))

# Evaluate the trained model on every grid point and pick the most probable class
y_pred = p_model(np.c_[xx.ravel(), yy.ravel()], W_trained, b_trained)
y_pred = np.argmax(y_pred, axis=1).reshape(xx.shape)

plt.contourf(xx, yy, y_pred, alpha=0.3, cmap=ListedColormap(['lightgreen', 'pink', 'coral']))
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
plt.title("Decision Boundary")
plt.show()
```
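
The grid in the snippet above is two-dimensional, so `W_trained` and `b_trained` are assumed to come from a run of the training loop on just two columns of the data, for example the petal measurements:

```python
# Example: restrict the data to petal length and petal width (columns 2 and 3)
X_petal = iris.data[:, 2:4]    # shape (150, 2)
# Running the training loop above on X_petal instead of X yields the
# two-feature parameters W_trained (shape (2, 3)) and b_trained (shape (1, 3))
```
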
![Image](https://github.com/user-attachments/assets/8cba0635-a9e4-481f-bc33-4d3db460e1ee)

![Image](https://github.com/user-attachments/assets/ae4dfc0b-54d1-41f1-9c39-c326301eeede)

### Decision Boundary Analysis:

- **Petal-only features**: The model forms **well-defined decision regions** thanks to the strong separability of the classes.
- **Sepal-only features**: The model performs poorly and cannot form well-defined boundaries.

## Conclusion
- **Petal measurements provide a stronger predictive signal than sepal measurements**.
- **Gradient descent, softmax activation, and cross-entropy loss optimize the model effectively**.
- **Vectorization and broadcasting improve computational efficiency**.
- **For the petal measurements, the decision boundaries improve with more training iterations and proper hyperparameter tuning**.

This project serves as a **minimal yet powerful demonstration** of how a neural network can be implemented from scratch, reinforcing the mathematical intuition behind classification tasks.

---

## References

- [Iris Dataset - UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/iris)
- [Softmax Regression - Stanford CS229](https://cs229.stanford.edu/)
- [Medium Article by Srija Neogi - Exploring Multi-Class Classification using Deep Learning](https://medium.com/@srijaneogi31/exploring-multi-class-classification-using-deep-learning-cd3134290887)
- [Medium Article by LM Po - Backpropagation: The Backbone of Neural Network Training (Backpropagation Image)](https://medium.com/@lmpo/backpropagation-the-backbone-of-neural-network-training-64946d6c3ae5)

---

## Credits & Acknowledgments
This coursework was completed under the guidance of **Ms. Tatiana Bubba** (Mathematics Professor).