{"id":16058718,"url":"https://github.com/juzershakir/linear_regression","last_synced_at":"2026-03-19T19:16:43.241Z","repository":{"id":157834024,"uuid":"140061581","full_name":"JuzerShakir/Linear_Regression","owner":"JuzerShakir","description":"A Mathematical Intuition behind Linear Regression Algorithm","archived":false,"fork":false,"pushed_at":"2021-11-04T16:43:10.000Z","size":303,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-21T14:16:04.067Z","etag":null,"topics":["algorithm","bias-variance","cost-function","feature-scaling","gradient-descent","house-price-prediction","hypothesis","linear-algebra","linear-equations","linear-regression","machine-learning","matrices","mean-normalization","mean-square-error","mse","multivariate-regression","partial-derivative","regularized-linear-regression","univariate-regressions","vector"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JuzerShakir.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-07-07T07:11:48.000Z","updated_at":"2023-11-19T23:32:02.000Z","dependencies_parsed_at":null,"dependency_job_id":"939cfa63-b780-40be-90ea-d6dd30512392","html_url":"https://github.com/JuzerShakir/Linear_Regression","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuzerShakir%2FLinear_Regression","tags_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/repositories/JuzerShakir%2FLinear_Regression/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuzerShakir%2FLinear_Regression/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuzerShakir%2FLinear_Regression/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JuzerShakir","download_url":"https://codeload.github.com/JuzerShakir/Linear_Regression/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243652711,"owners_count":20325607,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithm","bias-variance","cost-function","feature-scaling","gradient-descent","house-price-prediction","hypothesis","linear-algebra","linear-equations","linear-regression","machine-learning","matrices","mean-normalization","mean-square-error","mse","multivariate-regression","partial-derivative","regularized-linear-regression","univariate-regressions","vector"],"created_at":"2024-10-09T03:40:27.113Z","updated_at":"2026-01-02T10:09:01.377Z","avatar_url":"https://github.com/JuzerShakir.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Linear Regression\n###### Written by [Juzer Shakir](https://juzershakir.github.io/)\n\n## Table of Contents\n\n- [Description](#description)\n- [Notations](#notations)\n- [Definition](#definition)\n- [Flowchart](#flowchart)\n- [Univariate Linear Regression](#univariate-linear-regression)\n    - [Definition](#definition-for-univariate-linear-regression)\n    - 
[Formula](#formula-for-univariate-linear-regression)\n    - [Cost Function](#cost-function-for-univariate-linear-regression)\n    - [Gradient Descent](#gradient-descent-for-univariate-linear-regression)\n- [Multivariate Linear Regression](#multivariate-linear-regression)\n    - [Definition](#definition-for-multivariate-linear-regression)\n    - [Formula](#formula-for-multivariate-linear-regression)\n    - [Cost Function](#cost-function-for-multivariate-linear-regression)\n    - [Gradient Descent](#gradient-descent-for-multivariate-linear-regression)\n- [Feature Scaling and Mean Normalization](#feature-scaling-and-mean-normalization)\n- [Bias - Variance](#bias---variance)\n    - [High Bias](#high-bias)\n    - [Just Right](#just-right)\n    - [High Variance](#high-variance)\n- [Resolving High Variance](#resolving-high-variance)\n    - [Cost Function](#cost-function-for-regularization)\n    - [Gradient Descent](#gradient-descent-for-regularization)\n\n## Description\nA quick guide to the mathematical intuition behind how the Linear Regression algorithm works, with links to other study materials for understanding the concepts more concretely.\n\n## Notations\n- `m` 👉 Number of Training Examples.\n- `x` 👉 \"input\" variable / features.\n- `y` 👉 \"output\" variable / \"target\" variable.\n- `n` 👉 Number of feature variables `(x)`.\n- `(x, y)` 👉 One training example.\n- `x`\u003csub\u003ei\u003c/sub\u003e , `y`\u003csub\u003ei\u003c/sub\u003e  👉 i\u003csup\u003eth\u003c/sup\u003e training example.\n- `x`\u003csub\u003ei\u003csub\u003ej\u003c/sub\u003e\u003c/sub\u003e 👉 j\u003csup\u003eth\u003c/sup\u003e feature (column) of the i\u003csup\u003eth\u003c/sup\u003e training example.\n\n-----\n\n## Definition\nA linear equation that models a function such that, given any `x`, it predicts a value `y`, where `x` and `y` are the input and output variables respectively. 
These are numerical, continuous values.\n\nIt is one of the simplest and best-known algorithms used in machine learning.\n\n## Flowchart \n\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/Linear_Reg_Flowchart.png'\u003e\u003c/p\u003e\n\n\u003cbr\u003e\n\nThe flowchart above shows the process: we choose our training set and feed it to a learning algorithm, which learns the patterns and outputs a function called the `Hypothesis function 'H(x)'`. We can then give any `x` value to that function and it will output an estimated `y` value for it.\n\nFor historical reasons, this function `H(x)` is called the `hypothesis function`.\n\n-----\n\n## Univariate Linear Regression\n### Definition for Univariate Linear Regression\nWhen we have a single feature / variable `x` as the input to the function to predict `y`, we call this a `Univariate Linear Regression` problem.\n\n### Formula for Univariate Linear Regression\n\n\u003cp align='center'\u003eH(x) = θ\u003csub\u003e0\u003c/sub\u003e + θ\u003csub\u003e1\u003c/sub\u003ex\u003c/p\u003e\n\nAnother way of writing this formula, in the slope-intercept form we are familiar with:\n\n\u003cp align='center'\u003eH(x) = b + mx\u003c/p\u003e\n\n\u003e Where :\n\u003e- b = θ\u003csub\u003e0\u003c/sub\u003e 👉 y intercept\n\u003e- m = θ\u003csub\u003e1\u003c/sub\u003e 👉 slope (not to be confused with `m`, the number of training examples)\n\u003e- x = x 👉 feature / input variable\n\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/Linear_model_representation.jpg'\u003e\u003c/p\u003e\n\u003cp align = 'center'\u003e\u003ca href = 'https://archive.cnx.org/contents/20986bfa-2c2a-47f1-a48a-786122b0c606@3/graphical-analysis-of-one-dimensional-motion'\u003eSource\u003c/a\u003e\u003c/p\u003e\n\n\u003cbr\u003e\n\n\u003e **Help** ✍🏼 \n\u003e - \u003ca href = 'https://www.khanacademy.org/math/algebra/two-var-linear-equations/slope-intercept-form/v/slope-intercept-form'\u003eIntuition behind linear equations.\u003c/a\u003e\n\u003e - \u003ca href = 
'https://www.khanacademy.org/math/algebra/two-var-linear-equations/slope-intercept-form/e/slope-from-an-equation-in-slope-intercept-form'\u003eNeed to Practice?\u003c/a\u003e\n\n\n### Cost Function for Univariate Linear Regression\nWith that in place, how do we find the best possible straight line for the data that we feed in?\n\n**This is where the `Cost Function` helps us:**\n\nThe best-fit line for our data is the one with the least distance between the `predicted 'y' values` and the `actual 'y' values` of the training set.\n\n#### Formula for Cost Function\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/MSE.png'\u003e\u003c/p\u003e\n\n\u003e Where :\n\u003e- h(x\u003csub\u003ei\u003c/sub\u003e) 👉 hypothesis function\n\u003e- y\u003csub\u003ei\u003c/sub\u003e 👉 actual values of `y`\n\u003e- 1/m 👉 takes the mean of the squared errors\n\u003e- 1/2 👉 the mean is halved as a convenience for computing `Gradient Descent`, because the factor of 2 produced by differentiating the square cancels it.\n\n\nThe above formula takes the difference between the \u003ci\u003e`predicted values` and `actual values` of the training set, squares each difference, averages the squares, and multiplies the result by `1/2`.\u003c/i\u003e\n\u003cbr\u003e\n\u003cbr\u003e\nThis cost function is also called the `Squared Error Function` or `Mean Squared Error`.\n\u003cbr\u003e\n\u003cbr\u003e\n🙋 Why do we square the errors?\u003cbr\u003e\nSquaring makes every error positive, so errors above and below the line cannot cancel each other out, and it penalizes large errors more heavily than small ones. 
The `MSE` function is commonly used, is a reasonable choice, and works well for most regression problems.\n\u003cbr\u003e\n\u003cbr\u003e\nLet's write the `MSE` function as the cost function `J`:\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/MSE1.png'\u003e\u003c/p\u003e\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n\u003e **Help** ✍🏼 \n\u003e - \u003ca href='https://youtu.be/0kns1gXLYg4'\u003eIntuition behind Cost Function.\u003c/a\u003e\n\n\n### Gradient Descent for Univariate Linear Regression\nSo now we have our hypothesis function and a way of measuring how well it fits the data. Next, we need to estimate the parameters of the hypothesis function. 
That's where `Gradient Descent` comes in.\u003cbr\u003e\n`Gradient Descent` is used to minimize the cost function `J`; minimizing `J` is the same as minimizing the `MSE`, which gives us the best possible fit line for our data.\n\n#### Formula for Gradient Descent\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/Gradient_Descent.PNG'\u003e\u003c/p\u003e\n\n\u003e Where :\n\u003e- `:=` 👉 is the assignment operator.\n\u003e- `α` 👉 is `Alpha`, the learning rate. If it is too high, gradient descent may fail to converge; if it is too low, the descent will be slow.\n\u003e- θ\u003csub\u003ej\u003c/sub\u003e 👉 the parameter being updated (θ\u003csub\u003e0\u003c/sub\u003e for the y-intercept, θ\u003csub\u003e1\u003c/sub\u003e for the feature `x`).\n\u003e - ∂/(∂θ\u003csub\u003ej\u003c/sub\u003e) J(θ\u003csub\u003e0\u003c/sub\u003e,θ\u003csub\u003e1\u003c/sub\u003e) 👉 the partial derivative of the `MSE` cost function with respect to θ\u003csub\u003ej\u003c/sub\u003e.\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n\u003e **Additional Resources** ✍🏼 \n\u003e - \u003ca href='https://youtu.be/YovTqTY-PYY'\u003eIntuition behind Gradient Descent.\u003c/a\u003e\n\u003e - \u003ca href='https://www.khanacademy.org/math/multivariable-calculus/multivariable-derivatives/partial-derivatives/v/partial-derivatives-introduction'\u003ePartial Derivative.\u003c/a\u003e\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n**Now let's apply Gradient Descent to minimize our `MSE` function.**\n\u003cbr\u003e\nTo apply `Gradient Descent`, we need to work out the partial derivative term.\u003cbr\u003e\nSo let's solve the partial derivative of the cost function `J`.\n\n\u003cbr\u003e\n\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/Solving_Partial_Derivative.PNG'\u003e\u003c/p\u003e\n\n\u003cbr\u003e\n\nNow let's plug these 2 values into our `Gradient Descent`:\n\n\u003cbr\u003e\n\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/Final_Gradient_Descent.PNG'\u003e\u003c/p\u003e\n\n\u003cbr\u003e\n\n\u003e **Note :** 🚩\u003cbr\u003e\n\u003e - The cost function for linear regression is always a convex (bowl-shaped) function, so 
this function has no local minima other than its single global minimum, so gradient descent (with a suitable learning rate) always converges to the global minimum.\n\u003e - The above hypothesis function has 2 parameters, θ\u003csub\u003e0\u003c/sub\u003e \u0026 θ\u003csub\u003e1\u003c/sub\u003e, so each step of Gradient Descent updates both of them simultaneously: one for the feature and one for the bias `(y-intercept)`, until `J` reaches its minimum. So if we have `n` features, Gradient Descent updates all `n+1` parameters.\n\n-----\n\n## Multivariate Linear Regression\n\n\u003e **Linear Algebra** ✍🏼 \n\u003e - \u003ca href='https://www.khanacademy.org/math/precalculus/vectors-precalc/modal/v/introduction-to-vectors-and-scalars'\u003eIntro to Vectors \u0026 Scalars.\u003c/a\u003e\n\u003e - \u003ca href=  'https://www.khanacademy.org/math/precalculus/vectors-precalc/modal/a/vector-operations-review'\u003eCombined Vector Operations.\u003c/a\u003e\n\u003e - \u003ca href=  'https://www.khanacademy.org/math/precalculus/precalc-matrices/modal/v/introduction-to-the-matrix'\u003eIntro to Matrices.\u003c/a\u003e\n\u003e - \u003ca href=  'https://www.khanacademy.org/math/precalculus/precalc-matrices/modal/a/representing-systems-with-matrices'\u003eRepresenting linear systems with matrices.\u003c/a\u003e\n\u003e - \u003ca href=  'https://www.khanacademy.org/math/precalculus/precalc-matrices/modal/v/matrix-addition-and-subtraction-1'\u003eAdd \u0026 subtract matrices.\u003c/a\u003e\n\u003e - \u003ca href=  'https://www.khanacademy.org/math/precalculus/precalc-matrices/modal/v/matrix-multiplication-intro'\u003eMultiplying matrices.\u003c/a\u003e\n\u003e - \u003ca href=  'https://www.khanacademy.org/math/precalculus/precalc-matrices/modal/v/identity-matrix'\u003eIdentity Matrix.\u003c/a\u003e\n\u003e - \u003ca href=  'https://www.khanacademy.org/math/precalculus/precalc-matrices/modal/a/properties-of-matrix-multiplication'\u003eProperties of Matrix Multiplication.\u003c/a\u003e\n\u003e - \u003ca href=  
'https://www.khanacademy.org/math/precalculus/precalc-matrices/modal/v/inverse-matrix-part-1'\u003eMatrix Inverses.\u003c/a\u003e\n\u003e - \u003ca href=  'https://www.khanacademy.org/math/linear-algebra/matrix-transformations/matrix-transpose/v/linear-algebra-transpose-of-a-matrix'\u003eMatrix Transpose.\u003c/a\u003e\n\n\n### Definition for Multivariate Linear Regression\nIt's the same as `Univariate Linear Regression`, except that it uses more than one feature variable `(x)` to predict the target variable `(y)`.\n\n### Formula for Multivariate Linear Regression\nOur hypothesis function for `n` = 4:\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/Multi_Hypo_Func.PNG'\u003e\u003c/p\u003e\n\n\u003cbr\u003e\n\n\u003e Where :\n\u003e- θ\u003csub\u003e0\u003c/sub\u003e 👉 y intercept\n\u003e - The remaining terms pair each feature `x` with its own parameter θ to help predict the `y` value.\n\u003e\u003e **Intuition:**\u003cbr\u003e\n\u003e\u003eTo develop an intuition for this function, imagine that it represents the price of a house `(y)` based on the given features `(x)`. We can then think of it as:\n\u003e\u003e - θ\u003csub\u003e0\u003c/sub\u003e as the base price of a house.\n\u003e\u003e - θ\u003csub\u003e1\u003c/sub\u003e as price/m\u003csup\u003e2\u003c/sup\u003e.\n\u003e\u003e - x\u003csub\u003e1\u003c/sub\u003e as the area of a house (m\u003csup\u003e2\u003c/sup\u003e).\n\u003e\u003e - θ\u003csub\u003e2\u003c/sub\u003e as price/floor.\n\u003e\u003e - x\u003csub\u003e2\u003c/sub\u003e as the number of floors.\n\u003e\u003e - etc. _(You get the idea)_\n\n\u003cbr\u003e\n\u003cbr\u003e\n\nLet's collect all the parameters into a single vector:\n\u003cp align='center'\u003e\u003cb\u003e θ\u003csub\u003e0\u003c/sub\u003e, θ\u003csub\u003e1\u003c/sub\u003e, θ\u003csub\u003e2\u003c/sub\u003e, θ\u003csub\u003e3\u003c/sub\u003e.......θ\u003csub\u003en\u003c/sub\u003e = θ \u003c/b\u003e\u003c/p\u003e\n\u003cbr\u003e\nAnd collect all the features into a single vector:\n\u003cp align='center'\u003e\u003cb\u003e 
x\u003csub\u003e0\u003c/sub\u003e, x\u003csub\u003e1\u003c/sub\u003e, x\u003csub\u003e2\u003c/sub\u003e, x\u003csub\u003e3\u003c/sub\u003e.......x\u003csub\u003en\u003c/sub\u003e = x \u003c/b\u003e\u003c/p\u003e\n\u003cbr\u003e\n\n\u003e Where :\n\u003e- θ 👉 is an `n+1` dimensional vector, because it also includes θ\u003csub\u003e0\u003c/sub\u003e, which is not tied to any feature.\n\u003e - x\u003csub\u003e0\u003c/sub\u003e 👉 is added for convenience, so that the hypothesis can be written as the matrix product θ\u003csup\u003eT\u003c/sup\u003ex. We set x\u003csub\u003e0\u003c/sub\u003e = 1, so it does not change the result.\n\u003e - x 👉 is also an `n+1` dimensional vector.\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n### Cost Function for Multivariate Linear Regression\n\nSo, \n\u003cp align='center'\u003eJ(θ\u003csub\u003e0\u003c/sub\u003e, θ\u003csub\u003e1\u003c/sub\u003e, θ\u003csub\u003e2\u003c/sub\u003e, θ\u003csub\u003e3\u003c/sub\u003e.......θ\u003csub\u003en\u003c/sub\u003e) = J(θ)\u003c/p\u003e\n\u003cbr\u003e\nWhere,\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/Multi_Linear_Cost_Func.PNG'\u003e\u003c/p\u003e\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n### Gradient Descent for Multivariate Linear Regression\n\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/Multi_Linear_Gradient_Descent.PNG'\u003e\u003c/p\u003e\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n**Applying Gradient Descent to minimize our `MSE` function, after solving the partial derivative of J(θ), we get:**\n\u003cbr\u003e\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/Final_Gradient_Descent_Multi_Linear.PNG'\u003e\u003c/p\u003e\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n**For example, if we had 2 features, this is how gradient descent would update each parameter:**\n\n\u003cbr\u003e\n\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/Param_Final_Gradient_Descent_Multi_Linear.PNG'\u003e\u003c/p\u003e\n\n\u003cbr\u003e\n\n------\n\n## Feature Scaling and Mean Normalization\n\nWe can 
speed up `Gradient Descent` by having each of our input values in roughly the same range. This is because `θ` descends quickly on small ranges and slowly on large ranges, so it oscillates inefficiently down to the optimum (minimum) when the ranges of the variables are very uneven.\n\n\u003cbr\u003e\n\nThe way to prevent this is to modify the ranges of our input variables so that they are all roughly the same, ideally:\n\u003cp align = 'center'\u003e-1  ≤  x\u003csub\u003ei\u003c/sub\u003e  ≤  1\u003c/p\u003e\n\u003cp align = 'center'\u003eOR\u003c/p\u003e\n\u003cp align = 'center'\u003e-0.5  ≤  x\u003csub\u003ei\u003c/sub\u003e  ≤  0.5\u003c/p\u003e\n\n\u003cbr\u003e\n\nThese aren't exact requirements; we are only trying to speed things up. The goal is to get all input variables into roughly one of these ranges.\u003cbr\u003e\nThis can make `Gradient Descent` run much faster and converge in far fewer iterations.\n\n\u003cbr\u003e\n\n\u003cb\u003eTwo techniques to help with this are `Feature Scaling` and `Mean Normalization`.\u003c/b\u003e\n\n- `Feature Scaling` involves dividing the input values by the range (i.e. max value - min value) of the input variable, resulting in new values.\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/Feature_Scaling.PNG'\u003e\u003c/p\u003e\n\n- `Mean Normalization` involves subtracting the average of a feature variable from the values of that feature and then dividing by the _range_ of values or by the _standard deviation_, resulting in new values.\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/Mean_Normalization.PNG'\u003e\u003c/p\u003e\n\n\u003e Where:\n\u003e - μ\u003csub\u003ej\u003c/sub\u003e 👉 Average of feature variable `j`.\n\u003e - s\u003csub\u003ej\u003c/sub\u003e 👉 Either the `Range` or the `Standard Deviation` of feature `j`.\n\n-----\n\n## Bias - Variance\n\n### High Bias\n\nConsider a problem of predicting `y`. 
The figure below shows the result of fitting a `hypothesis function` θ\u003csub\u003e0\u003c/sub\u003e + θ\u003csub\u003e1\u003c/sub\u003ex to a dataset.\n\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/fig_1.PNG'\u003e\u003c/p\u003e\n\nWe see that the data doesn't really lie on a straight line, so the fit is not very good.\u003cbr\u003e\nThis figure is an instance of `Underfitting`, in which the data clearly shows structure not captured by the model `h(x)`.\u003cbr\u003e\n`Underfitting`, or `high bias`, is when the form of our `h(x)` function maps poorly to the trend of the data. It is usually caused by a function that is too simple or uses too few features.\n\n-----\n\n### Just Right\n\nIf we add an extra feature x\u003csup\u003e2\u003c/sup\u003e and fit the `hypothesis function` θ\u003csub\u003e0\u003c/sub\u003e + θ\u003csub\u003e1\u003c/sub\u003ex + θ\u003csub\u003e2\u003c/sub\u003ex\u003csup\u003e2\u003c/sup\u003e, then we obtain a slightly better fit to the data.\n\n\u003cbr\u003e\n\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/fig_2.PNG'\u003e\u003c/p\u003e\n\n\u003cbr\u003e\n\n------\n\n### High Variance\n\nIf we add more features, it might intuitively seem that the model would perform much better; however, there is also a danger in adding too many features. 
The figure below shows the result of fitting a 4th-order polynomial, where our `h(x)` is  θ\u003csub\u003e0\u003c/sub\u003e + θ\u003csub\u003e1\u003c/sub\u003ex + θ\u003csub\u003e2\u003c/sub\u003ex\u003csup\u003e2\u003c/sup\u003e + θ\u003csub\u003e3\u003c/sub\u003ex\u003csup\u003e3\u003c/sup\u003e + θ\u003csub\u003e4\u003c/sub\u003ex\u003csup\u003e4\u003c/sup\u003e.\n\n\u003cbr\u003e\n\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/fig_3.PNG'\u003e\u003c/p\u003e\n\n\u003cbr\u003e\n\nWe see that even though the fitted curve passes through the data points perfectly, we would not expect it to be a very good predictor of, for example, housing prices `y` for different areas `x`.\n\n\u003cbr\u003e\n\n`Overfitting`, or `High Variance`, occurs when the `h(x)` function fits the available data but does not generalize well, i.e. it cannot accurately predict new data. It is usually caused by using too many features or a complicated `h(x)` function that creates lots of unnecessary curves and angles unrelated to the data.\n\n\u003cbr\u003e\n\n## Resolving High Variance\n\nThere are 2 main options to address the issue of Overfitting:\n\n- **Regularization**\n    - Keep all features, but reduce the magnitude of parameters θ\u003csub\u003ej\u003c/sub\u003e.\n    - It works well when we have a lot of slightly useful features.\n- **Reduce the number of features**\n    - Manually select which features to keep.\n    - Use a `model selection` algorithm.\n\n\u003cbr\u003e\n\n### Cost Function for Regularization\n\nIf our `hypothesis function` overfits, as in _figure 3_, we can reduce the weight that some of the terms in our function carry by increasing their cost.\n\u003cbr\u003e\nFor example, let's say we wanted to make the following function more quadratic:\u003cbr\u003e\nθ\u003csub\u003e0\u003c/sub\u003e + θ\u003csub\u003e1\u003c/sub\u003ex + θ\u003csub\u003e2\u003c/sub\u003ex\u003csup\u003e2\u003c/sup\u003e + 
θ\u003csub\u003e3\u003c/sub\u003ex\u003csup\u003e3\u003c/sup\u003e + θ\u003csub\u003e4\u003c/sub\u003ex\u003csup\u003e4\u003c/sup\u003e\u003cbr\u003e\nWe want to eliminate the influence of θ\u003csub\u003e3\u003c/sub\u003ex\u003csup\u003e3\u003c/sup\u003e and θ\u003csub\u003e4\u003c/sub\u003ex\u003csup\u003e4\u003c/sup\u003e. Without actually getting rid of these features or changing the form of our hypothesis, we can instead modify our cost function.\u003cbr\u003e\n\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/regularized_cost_func_ex.PNG'\u003e\u003c/p\u003e\u003cbr\u003e\n\nWe've added two extra terms at the end to inflate the cost of θ\u003csub\u003e3\u003c/sub\u003e and θ\u003csub\u003e4\u003c/sub\u003e. Now, in order for the cost function to get close to `0`, we will have to reduce the values of θ\u003csub\u003e3\u003c/sub\u003e and θ\u003csub\u003e4\u003c/sub\u003e to near `0`. This will in turn greatly reduce the values of θ\u003csub\u003e3\u003c/sub\u003ex\u003csup\u003e3\u003c/sup\u003e and θ\u003csub\u003e4\u003c/sub\u003ex\u003csup\u003e4\u003c/sup\u003e in our `hypothesis function`.\u003cbr\u003e\n\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/fig_4.PNG'\u003e\u003c/p\u003e\u003cbr\u003e\n\nAs a result, the new `H(x)` looks like a quadratic function, because the extra terms θ\u003csub\u003e3\u003c/sub\u003ex\u003csup\u003e3\u003c/sup\u003e and θ\u003csub\u003e4\u003c/sub\u003ex\u003csup\u003e4\u003c/sup\u003e are now small, and it gives a more sensible fit to the data than the curve in _figure 3_.\n\n\u003cbr\u003e\n\u003cbr\u003e\n\nWe can also regularize all of our parameters in a single summation as:\n\u003cbr\u003e\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/regularized_cost_func.PNG'\u003e\u003c/p\u003e\u003cbr\u003e\n\nThe `λ`, or `lambda`, is the regularization parameter. 
It determines how much the costs of our `θ` parameters are inflated.\u003cbr\u003e\nUsing the above cost function with the extra summation, we can smooth the output of our hypothesis function to reduce overfitting. If `lambda` is chosen to be too large, it may smooth our function too much and cause `High bias` or `Underfitting`.\n\n\u003e **Note:**\u003cbr\u003e\n\u003e We penalize the parameters θ\u003csub\u003e1\u003c/sub\u003e ... θ\u003csub\u003en\u003c/sub\u003e, but we don't penalize θ\u003csub\u003e0\u003c/sub\u003e; it is treated differently.\n\n------\n\n\u003cbr\u003e\n\n### Gradient Descent for Regularization\n\nWe will modify our gradient descent function to separate out θ\u003csub\u003e0\u003c/sub\u003e from the rest of the parameters, because we do not want to penalize θ\u003csub\u003e0\u003c/sub\u003e.\n\n\u003cbr\u003e\n\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/regularized_gradient_descent.PNG'\u003e\u003c/p\u003e\u003cbr\u003e\n\nThe term λ/m times θ\u003csub\u003ej\u003c/sub\u003e performs our regularization.\u003cbr\u003e\n\nWith some manipulation, our updated Gradient Descent for θ\u003csub\u003ej\u003c/sub\u003e can also be represented as:\u003cbr\u003e\n\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/regularized_gradient_descent_simple.PNG'\u003e\u003c/p\u003e\u003cbr\u003e\n\nThe first term in the equation, \u003cimg src = 'Formulas/regularization_first_term.PNG'\u003e, will always be less than 1. Intuitively, you can see it as reducing the value of θ\u003csub\u003ej\u003c/sub\u003e by some amount on every update. 
And the second term is exactly the same as it was in `Gradient Descent` without applying regularization.\n\n\n-----\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuzershakir%2Flinear_regression","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjuzershakir%2Flinear_regression","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuzershakir%2Flinear_regression/lists"}