{"id":22363564,"url":"https://github.com/david-palma/neural-network-from-scratch","last_synced_at":"2025-03-26T15:18:44.866Z","repository":{"id":169979780,"uuid":"193400319","full_name":"david-palma/neural-network-from-scratch","owner":"david-palma","description":"Step-by-step guide to creating a neural network from scratch in Python.","archived":false,"fork":false,"pushed_at":"2023-12-14T20:13:36.000Z","size":1679,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-31T16:30:26.499Z","etag":null,"topics":["artificial-inteligence","artificial-neural-networks","classifiers","deep-learning","feed-forward","learning-by-doing","machine-learning","multi-layer-perceptrons","neural-networks","neural-networks-from-scratch","python","supervised-learing","three-layer-architecture"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/david-palma.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-06-23T22:13:21.000Z","updated_at":"2024-12-30T23:29:39.000Z","dependencies_parsed_at":"2023-12-14T20:49:57.746Z","dependency_job_id":"c22debe5-773e-458a-a1e7-3c8ec5f44023","html_url":"https://github.com/david-palma/neural-network-from-scratch","commit_stats":null,"previous_names":["david-palma/neural-network","david-palma/neural-network-from-scratch"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/david-palma%2Fneural-network-fro
m-scratch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/david-palma%2Fneural-network-from-scratch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/david-palma%2Fneural-network-from-scratch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/david-palma%2Fneural-network-from-scratch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/david-palma","download_url":"https://codeload.github.com/david-palma/neural-network-from-scratch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245678902,"owners_count":20654738,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-inteligence","artificial-neural-networks","classifiers","deep-learning","feed-forward","learning-by-doing","machine-learning","multi-layer-perceptrons","neural-networks","neural-networks-from-scratch","python","supervised-learing","three-layer-architecture"],"created_at":"2024-12-04T17:15:35.747Z","updated_at":"2025-03-26T15:18:44.844Z","avatar_url":"https://github.com/david-palma.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Implementation of a neural network from scratch in Python\n\nThis repository contains a Python implementation of a feed-forward artificial neural network (ANN) based on the multi-layer perceptron (MLP) model.\n\n**NOTE**: this tutorial is not a complete and comprehensive guide to neural networks; rather, it is intended to build some basic skills and help you get familiar 
with the concepts.\n\n## Model\n\nAn ANN is a collection of interconnected neurons that incrementally learn from their environment (data) to capture essential linear and nonlinear trends in complex data, so that it can provide reliable predictions for new situations, even when the information is partial and noisy.\n\nThe model adopted in this implementation is the multi-layer perceptron with a single hidden layer (a three-layer neural network), shown in the following figure.\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"./figures/mlp.png\" width=\"700px\"\u003e\u003c/img\u003e\u003c/p\u003e\n\nA three-layer neural network corresponds to a function $f: \\mathbb{R}^N \\to \\mathbb{R}^M$ with $x \\in \\mathbb{R}^N$ and $y \\in \\mathbb{R}^M$.\n\n### Feed-forward propagation\n\nInitially, each weight is set to a random number drawn from a normal distribution, i.e., $w \\sim N(0,1)$, whilst the biases are set to zero.\n\nThe input is then propagated forward through the network to generate the output value(s).\n\nThe hidden node values are given by:\n\n$$\\begin{aligned}\na_1 \u0026= s \\left(b_1^{(1)} + w_{11}^{(1)}x_1 + w_{12}^{(1)}x_2 + \\ldots + w_{1N}^{(1)}x_N\\right) \\\\\na_2 \u0026= s \\left(b_2^{(1)} + w_{21}^{(1)}x_1 + w_{22}^{(1)}x_2 + \\ldots + w_{2N}^{(1)}x_N\\right) \\\\\n\u0026 \\vdots \\\\\na_H \u0026= s \\left(b_H^{(1)} + w_{H1}^{(1)}x_1 + w_{H2}^{(1)}x_2 + \\ldots + w_{HN}^{(1)}x_N\\right)\n\\end{aligned}$$\n\nThese values are then used to compute the output node values:\n\n$$\\begin{aligned}\ny_1 \u0026= G \\left(b_1^{(2)} + w_{11}^{(2)}a_1 + w_{12}^{(2)}a_2 + \\ldots + w_{1H}^{(2)}a_H\\right) \\\\\ny_2 \u0026= G \\left(b_2^{(2)} + w_{21}^{(2)}a_1 + w_{22}^{(2)}a_2 + \\ldots + w_{2H}^{(2)}a_H\\right) \\\\\n\u0026 \\vdots \\\\\ny_M \u0026= G \\left(b_M^{(2)} + w_{M1}^{(2)}a_1 + w_{M2}^{(2)}a_2 + \\ldots + w_{MH}^{(2)}a_H\\right)\n\\end{aligned}$$\n\nThe same computation can be written more compactly in matrix 
notation:\n\n$$f(x) = G \\left(b^{(2)} + W^{(2)} \\left(s \\left(b^{(1)} + W^{(1)}x\\right)\\right)\\right)$$\n\nwhere:\n* $b^{(1)} \\in \\mathbb{R}^H, b^{(2)} \\in \\mathbb{R}^M$ are the bias vectors;\n* $W^{(1)} \\in \\mathbb{R}^{H\\times N}, W^{(2)} \\in \\mathbb{R}^{M\\times H}$ are the weight matrices;\n* $G, s$ are the activation functions.\n\n### Activation function\n\nThe hyperbolic tangent is used as the activation function, but you can choose the sigmoid function as well.\n\n$$s(x) = \\tanh(x) = \\dfrac{e^x-e^{-x}}{e^x+e^{-x}} \\qquad \\rightarrow \\qquad \\dfrac{ds(x)}{dx} = 1-s(x)^2$$\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"./figures/tanh.png\"\u003e\u003c/img\u003e\u003c/p\u003e\n\n### Classification\n\nClassification is done by projecting an input vector onto a set of hyperplanes, each of which corresponds to a class.\n\nThe distance from the input to a hyperplane reflects the probability that the input is a member of the corresponding class.\n\nThe probability that an input vector $x$ is a member of a class $i$ is:\n\n$$P\\left(y=i|x,W,b\\right)=\\text{softmax}_i\\left(Wx+b\\right)=\\dfrac{e^{W_i x + b_i}}{\\sum_j e^{W_j x + b_j}}$$\n\nThe model's prediction $\\hat{y}$ is the class whose probability is maximal:\n\n$$\\hat{y} = \\text{argmax}_i\\left(P\\left(y=i|x,W,b\\right)\\right)$$\n\n### Training and backpropagation\n\nThe network is trained with the backpropagation algorithm, used in conjunction with the stochastic gradient descent optimisation method.\n\nThe total error (loss function) is half the sum of squared errors of prediction, i.e., of the squared residuals (the deviations of the predicted values from the actual values of the data):\n\n$$E=\\dfrac{1}{2}\\left\\|y-\\hat{y}\\right\\|^2=\\dfrac{1}{2}\\sum_i\\left(y_i-\\hat{y}_i\\right)^2$$\n\nThe goal in this step is to find the gradient of the error with respect to each weight, which gives the weight update:\n\n$$\\Delta w_{ij} = -\\eta \\dfrac{\\partial E}{\\partial 
w_{ij}}$$\n\nwhere $\\eta$ is the learning rate, which should be tuned so that the weights converge quickly and without oscillations.\n\nThe chain rule can now be applied to backpropagate the error and update the weight matrices and bias vectors (the derivation is not reported here).\n\n## Usage\n\nTo import the module, use the following statement:\n\n```python\n# import the module\nimport NeuralNetwork\n```\n\nThen you can define your own neural network with a custom number of inputs, hidden neurons, and output neurons:\n\n```python\n# define the number of neurons in each layer\nno_inputs  = 2\nno_hiddens = 7\nno_outputs = 2\n\n# constructor\nann = NeuralNetwork.MLP([no_inputs, no_hiddens, no_outputs])\n```\n\nThe activation function defaults to the hyperbolic tangent.\n\nThen, given a training dataset `xt, yt` and a test dataset `x`:\n\n```python\n# train the neural network\nann.train(xt, yt)\n\n# output prediction\ny_pred = ann.predict(x)\n```\n\n### Test #1: logical exclusive OR (XOR)\n\nThis is the classical non-linearly separable XOR problem, here with noisy inputs.\n\nThe truth table of the logical exclusive OR (XOR) shows that it outputs true whenever the inputs differ:\n\n| x\u003csub\u003e1\u003c/sub\u003e | x\u003csub\u003e2\u003c/sub\u003e | y |\n|:-------------:|:-------------:|:-:|\n|       0       |       0       | 0 |\n|       0       |       1       | 1 |\n|       1       |       0       | 1 |\n|       1       |       1       | 0 |\n\n* input: $X = x \\pm\\epsilon,\\ X\\in \\mathbb{R}^{N\\times 2}$;\n* training set: $X_T \\subset X$;\n* the output consists of 2 classes.\n\nThe test dataset is independent of the training dataset but follows the same probability distribution.\n\nThe model is first fit on the training dataset, so we can take a look at the loss-per-epoch graph. 
The figure below shows that the loss decreases monotonically towards a minimum, which is consistent with the gradient descent optimisation algorithm.\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"./figures/xor_cost.png\" width=\"700px\"\u003e\u003c/img\u003e\u003c/p\u003e\n\nLet's look at the final prediction (output) using the implemented artificial neural network with 7 neurons in the hidden layer $\\theta$.\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"./figures/xor_output.png\" width=\"700px\"\u003e\u003c/img\u003e\u003c/p\u003e\n\nAs you can see, the neural network has been able to find a decision boundary that successfully separates the classes.\n\n### Test #2: multiple classes prediction\n\nThis is another non-linearly separable problem, where the dataset consists of four noisy spirals rotated relative to one another by a fixed angle $\\Phi$.\n\n* input: $X = x \\pm\\epsilon,\\ X\\in \\mathbb{R}^{N\\times 2}$;\n* training set: $X_T \\subset X$;\n* the output consists of 4 classes.\n\nThe figure below shows that the loss decreases monotonically towards a minimum, which is consistent with the gradient descent optimisation algorithm.\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"./figures/moon_cost.png\" width=\"700px\"\u003e\u003c/img\u003e\u003c/p\u003e\n\nLet's look at the final prediction (output) using the implemented artificial neural network with 15 neurons in the hidden layer $\\theta$.\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"./figures/moon_output.png\" width=\"700px\"\u003e\u003c/img\u003e\u003c/p\u003e\n\nAs you can see, the neural network has been able to find a decision boundary that successfully separates the classes.\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for 
details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavid-palma%2Fneural-network-from-scratch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdavid-palma%2Fneural-network-from-scratch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavid-palma%2Fneural-network-from-scratch/lists"}