{"id":32747731,"url":"https://github.com/msmrexe/numpy-neural-network","last_synced_at":"2026-05-18T10:10:45.975Z","repository":{"id":321926465,"uuid":"1087658663","full_name":"msmrexe/numpy-neural-network","owner":"msmrexe","description":"A scratch-built NumPy implementation of a Fully Connected Neural Network, with a sequential model API, a variety of layers (Linear, ReLU, BatchNorm), loss functions (MSE, SoftmaxCrossEntropy), and a robust training `Solver` to create and train multi-layer perceptrons for both classification and regression.","archived":false,"fork":false,"pushed_at":"2025-11-01T12:13:57.000Z","size":53,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-01T13:20:06.850Z","etag":null,"topics":["backpropagation","batch-normalization","course-project","deep-learning","fully-connected-neural-network","neural-networks","nn-from-scratch","numpy","python","university-project"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/msmrexe.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-01T11:25:04.000Z","updated_at":"2025-11-01T12:15:51.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/msmrexe/numpy-neural-network","commit_stats":null,"previous_names":["msmrexe/numpy-neural-network"],"tags_count":null,"template":false,"template_full_name":null,"
purl":"pkg:github/msmrexe/numpy-neural-network","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msmrexe%2Fnumpy-neural-network","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msmrexe%2Fnumpy-neural-network/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msmrexe%2Fnumpy-neural-network/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msmrexe%2Fnumpy-neural-network/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/msmrexe","download_url":"https://codeload.github.com/msmrexe/numpy-neural-network/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msmrexe%2Fnumpy-neural-network/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33174091,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-18T09:27:30.708Z","status":"ssl_error","status_checked_at":"2026-05-18T09:27:28.300Z","response_time":71,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["backpropagation","batch-normalization","course-project","deep-learning","fully-connected-neural-network","neural-networks","nn-from-scratch","numpy","python","university-project"],"created_at":"2025-11-03T20:01:00.612Z","updated_at":"2026-05-18T10:10:45.969Z","avatar_url":"https://github.com/msmrexe.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Neural Networks from Scratch (NumPy)\n\nA modular deep learning library built from scratch using only NumPy. This project implements a sequential model API, a variety of layers (Linear, ReLU, BatchNorm), loss functions (MSE, SoftmaxCrossEntropy), and a robust training `Solver` to create and train multi-layer perceptrons for both classification and regression.\n\nThis project was developed for a Deep Learning course to demonstrate a foundational understanding of neural network mechanics, from forward propagation to backpropagation and optimization.\n\n## Features\n\n* **Object-Oriented Design:** A clean, \"PyTorch-like\" API with `Layer`, `Loss`, and `Sequential` base classes.\n* **Modular Layers:** Easily stack layers, including `Linear`, `ReLU`, `Sigmoid`, and `BatchNorm`.\n* **Robust Training:** A `Solver` class that handles all training, validation, and hyperparameter logic.\n* **Optimizers:** Includes `sgd` and `sgd_momentum` update rules.\n* **Versatile:** Capable of handling both `classification` (with `SoftmaxCrossEntropyLoss`) and `regression` (with `MSELoss`) tasks.\n* **Utilities:** Comes with 
data loaders for MNIST, Fashion-MNIST, and California Housing, plus a numerical gradient checker for debugging.\n\n## Core Concepts \u0026 Techniques\n\n* **Backpropagation:** All layer gradients are analytically derived and implemented from scratch.\n* **Batch Normalization:** Implemented as a layer with distinct `train` and `test` modes to stabilize training.\n* **Numerical Stability:** Uses a combined `SoftmaxCrossEntropyLoss` to prevent overflow/underflow issues.\n* **Modular Architecture:** The `Sequential` model is decoupled from the `Solver`, promoting clean code and reusability.\n* **Logging \u0026 CLI:** All training scripts use `argparse` for hyperparameter tuning and `logging` to save results to files.\n\n---\n\n## How It Works\n\nThis library is composed of several core modules that work together to train a network.\n\n### 1. Core Logic \u0026 Architecture\n\nThe project is built around two main components: the `Sequential` model and the `Solver`.\n\n* **`src/model.py` (`Sequential`):** This class acts as a container. You initialize it with a list of `Layer` objects and a `Loss` object. It is responsible for:\n    * Collecting all learnable parameters (weights, biases, gamma, beta) from its layers into a central `model.params` dictionary.\n    * Performing a full forward pass by calling `layer.forward()` sequentially.\n    * Performing a full backward pass by calling `layer.backward()` in reverse.\n    * Computing the total loss (data loss + regularization).\n\n* **`src/solver.py` (`Solver`):** This is the training engine. You give it the `model` and a `data` dictionary. It handles:\n    * The main training loop (epochs, iterations).\n    * Creating minibatches of data.\n    * Calling `model.compute_loss()` to get the loss and gradients.\n    * Calling the optimizer (e.g., `sgd_momentum`) to update every parameter in `model.params`.\n    * Tracking loss history, validation metrics, and saving the best model.\n\n### 2. 
Mathematical Foundations: Backpropagation\n\nOur network is built on **backpropagation**, which is a practical application of the chain rule from calculus. To update a weight `W`, we must find how the final `Loss` $L$ changes with respect to `W` (i.e., $\\frac{\\partial L}{\\partial W}$).\n\nFor a simple layer $y = f(x, W)$, the chain rule states:\n\n$$\\frac{\\partial L}{\\partial W} = \\frac{\\partial L}{\\partial y} \\cdot \\frac{\\partial y}{\\partial W}$$\n\nHere, $\\frac{\\partial L}{\\partial y}$ is the \"upstream gradient\" (coming from the *next* layer) and $\\frac{\\partial y}{\\partial W}$ is the \"local gradient\" (the derivative of the *current* layer). Each layer's `backward()` pass computes its local gradients, multiplies them by the upstream gradient, and passes the result $\\frac{\\partial L}{\\partial x}$ *downstream* to the previous layer.\n\n### 3. Core Implementations (The Math)\n\n#### Linear Layer\n* **Forward:** $y = xW + b$\n* **Backward:** The layer receives the upstream gradient $\\frac{\\partial L}{\\partial y}$ and computes three things:\n    * $\\frac{\\partial L}{\\partial W} = x^T \\cdot \\frac{\\partial L}{\\partial y}$ (Gradient for weights)\n    * $\\frac{\\partial L}{\\partial b} = \\sum \\frac{\\partial L}{\\partial y}$ (Gradient for biases, summed over the batch dimension)\n    * $\\frac{\\partial L}{\\partial x} = \\frac{\\partial L}{\\partial y} \\cdot W^T$ (Downstream gradient to pass to the previous layer)\n\n#### ReLU Activation\n* **Forward:** $f(x) = \\max(0, x)$\n* **Backward:** The local gradient is a simple gate: it is $1$ if $x \u003e 0$ and $0$ otherwise. This means gradients only flow through neurons that were \"active\" during the forward pass.\n    * $\\frac{\\partial L}{\\partial x} = \\frac{\\partial L}{\\partial y} \\cdot (x \u003e 0)$\n\n#### Batch Normalization\n* **Forward (Train):** Normalizes activations within a batch $B$:\n    1.  $\\mu_B = \\frac{1}{m} \\sum_{i \\in B} x_i$ (Find batch mean)\n    2.  
$\\sigma^2_B = \\frac{1}{m} \\sum_{i \\in B} (x_i - \\mu_B)^2$ (Find batch variance)\n    3.  $\\hat{x_i} = \\frac{x_i - \\mu_B}{\\sqrt{\\sigma^2_B + \\epsilon}}$ (Normalize)\n    4.  $y_i = \\gamma \\hat{x_i} + \\beta$ (Scale and shift)\n* **Backward:** This is the most complex backward pass, as the gradient $\\frac{\\partial L}{\\partial y}$ must be propagated back through $\\gamma$, $\\beta$, and the normalization statistics ($\\mu_B$, $\\sigma^2_B$) to the input $x$.\n\n#### Softmax Cross-Entropy Loss\nFor numerical stability, we combine the final activation and the loss function.\n* **Forward:**\n    1.  **Softmax:** $P_i = \\frac{e^{z_i}}{\\sum e^{z_j}}$ (Converts raw scores/logits $z$ to probabilities $P$).\n    2.  **Cross-Entropy:** $L = - \\frac{1}{N} \\sum y_i \\log(P_i)$ (Calculates loss, where $y_i$ is 1 for the true class).\n* **Backward:** When combined, the derivative $\\frac{\\partial L}{\\partial z}$ simplifies to a clean, stable expression that is perfect for starting backpropagation:\n    * $\\frac{\\partial L}{\\partial z} = \\frac{1}{N} (P - Y_{onehot})$ (where $Y_{onehot}$ is the one-hot encoded target vector).\n\n---\n\n## Project Structure\n\n```\nnumpy-neural-network/\n├── .gitignore                 # Standard Python .gitignore\n├── LICENSE                    # MIT License\n├── README.md                  # This readme file\n├── requirements.txt           # Project dependencies (numpy, sklearn)\n├── notebook.ipynb             # Jupyter Notebook for demonstration\n├── logs/                      # Directory for output log files\n│   └── .gitkeep\n├── src/                       # Main library source code\n│   ├── __init__.py\n│   ├── layers.py              # Layer implementations (Linear, ReLU, BN)\n│   ├── losses.py              # Loss functions (MSE, SoftmaxCrossEntropy)\n│   ├── model.py               # Sequential model class\n│   ├── optimizer.py           # Update rules (SGD, Momentum)\n│   ├── solver.py              # The Solver training 
class\n│   └── utils/                 # Helper modules\n│       ├── __init__.py\n│       ├── data_utils.py      # Data loading (MNIST, etc.)\n│       ├── gradient_check.py  # Numerical gradient checker\n│       └── logger.py          # Logging setup\n└── scripts/                   # Runnable training scripts\n    ├── __init__.py\n    ├── check_gradients.py     # Script to debug layer gradients\n    ├── train_mnist.py         # Script to train on MNIST\n    ├── train_fashion_mnist.py # Script to train on Fashion-MNIST\n    └── train_regression.py    # Script to train on California Housing\n```\n\n## How to Use\n\n1.  **Clone the Repository:**\n    ```bash\n    git clone https://github.com/msmrexe/numpy-neural-network.git\n    cd numpy-neural-network\n    ```\n\n2.  **Set up the Environment:**\n    (Recommended to use a virtual environment)\n    ```bash\n    python -m venv venv\n    source venv/bin/activate  # On Windows: venv\\Scripts\\activate\n    pip install -r requirements.txt\n    ```\n\n3.  **Run a Training Script:**\n    The `scripts/` folder contains ready-to-run training scripts. You can use `argparse` to change hyperparameters.\n\n    **Example: Train on MNIST**\n    ```bash\n    python scripts/train_mnist.py --epochs 10 --lr 0.01 --batch_size 128\n    ```\n    * Logs will be saved to `logs/train_mnist.log`.\n    * Progress will be printed to the console.\n\n    **Example: Train on California Housing (Regression)**\n    ```bash\n    python scripts/train_regression.py --epochs 30 --lr 0.005\n    ```\n    * Logs will be saved to `logs/train_regression.log`.\n\n4.  **Run the Demonstration Notebook:**\n    For a detailed breakdown and manual, step-by-step example of how to use the library, open the Jupyter Notebook:\n    ```bash\n    jupyter notebook notebook.ipynb\n    ```\n\n5.  
**Check Layer Gradients (for Debugging):**\n    You can verify that all `backward()` passes are implemented correctly by running the gradient checker.\n    ```bash\n    python scripts/check_gradients.py\n    ```\n    * You should see very small relative errors (e.g., `\u003c 1e-7`) for all parameters.\n\n---\n\n## Author\n\nFeel free to connect or reach out if you have any questions!\n\n* **Maryam Rezaee**\n* **GitHub:** [@msmrexe](https://github.com/msmrexe)\n* **Email:** [ms.maryamrezaee@gmail.com](mailto:ms.maryamrezaee@gmail.com)\n\n---\n\n## License\n\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for full details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmsmrexe%2Fnumpy-neural-network","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmsmrexe%2Fnumpy-neural-network","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmsmrexe%2Fnumpy-neural-network/lists"}