{"id":16058724,"url":"https://github.com/juzershakir/logistic_regression","last_synced_at":"2026-01-02T06:19:15.828Z","repository":{"id":157834025,"uuid":"140746940","full_name":"JuzerShakir/Logistic_Regression","owner":"JuzerShakir","description":"A Mathematical Intuition behind Logistic Regression Algorithm","archived":false,"fork":false,"pushed_at":"2021-11-04T16:45:42.000Z","size":235,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-21T14:16:01.618Z","etag":null,"topics":["algorithm","bias-variance","cost-function","gradient-descent","hypothesis","logarithmic-regression","logarithms","logistic-function","logistic-regression","machine-learning","multiclass-classification","overfitting","probability","regularized-logistic-regression","sigmoid-function"],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JuzerShakir.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-07-12T17:51:13.000Z","updated_at":"2022-12-18T19:39:54.000Z","dependencies_parsed_at":"2024-06-20T06:04:41.745Z","dependency_job_id":null,"html_url":"https://github.com/JuzerShakir/Logistic_Regression","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuzerShakir%2FLogistic_Regression","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuzerShakir%2FLogistic_Regression/tags","releases_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuzerShakir%2FLogistic_Regression/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuzerShakir%2FLogistic_Regression/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JuzerShakir","download_url":"https://codeload.github.com/JuzerShakir/Logistic_Regression/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243652713,"owners_count":20325607,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithm","bias-variance","cost-function","gradient-descent","hypothesis","logarithmic-regression","logarithms","logistic-function","logistic-regression","machine-learning","multiclass-classification","overfitting","probability","regularized-logistic-regression","sigmoid-function"],"created_at":"2024-10-09T03:40:27.850Z","updated_at":"2026-01-02T06:19:15.794Z","avatar_url":"https://github.com/JuzerShakir.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Logistic Regression\n\u003e Written by [Juzer Shakir](https://juzershakir.github.io/)\n\n\n\n## Table of Contents\n- [Description](#description)\n- [Prerequisite](#prerequisite)\n- [Notations](#notations)\n- [Definition](#definition)\n- [Hypothesis Function](#hypothesis-function)\n- [Decision Boundary](#decision-boundary)\n- [Cost Function](#cost-function)\n- [Gradient Descent](#gradient-descent)\n- [Multiclass Classification](#multiclass-classification)\n- [Regularization](#regularization)\n\n## Description\nA Mathematical 
intuition and quick guide to understanding how the Logistic Regression algorithm works.\n\n## Prerequisite\n- [Standard equation of a Circle](https://www.khanacademy.org/math/algebra2/intro-to-conics-alg2/modal/v/writing-standard-equation-of-circle)\n- [Dividing by Zero](https://youtu.be/J2z5uzqxJNU)\n- [Logarithm](https://www.khanacademy.org/math/algebra2/exponential-and-logarithmic-functions/introduction-to-logarithms/v/logarithms)\n- [Dependent Probability](https://www.khanacademy.org/math/statistics-probability/probability-library/modal/v/analyzing-dependent-probability)\n\n## Notations\n- `m` 👉 Number of Training Examples.\n- `x` 👉 \"input\" variable / features.\n- `y` 👉 \"output\" variable / \"target\" variable.\n- `n` 👉 Number of feature variables `(x)`.\n- `(x, y)` 👉 One training example.\n- `x`\u003csub\u003ei\u003c/sub\u003e , `y`\u003csub\u003ei\u003c/sub\u003e  👉 i\u003csup\u003eth\u003c/sup\u003e training example.\n- `x`\u003csub\u003ei\u003csub\u003ej\u003c/sub\u003e\u003c/sub\u003e 👉 Value of the j\u003csup\u003eth\u003c/sup\u003e column / feature in the i\u003csup\u003eth\u003c/sup\u003e training example.\n\n## Definition\n`Logistic Regression` is a classification algorithm where the dependent variable `'y'` that we want to predict takes on discrete values, for example `y ϵ {0,1}`. It is one of the most popular and widely used classification algorithms.\n\n\u003cbr\u003e\n\n### Example of Classification Problem\n\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/example.png'\u003e\u003c/p\u003e\n\n\u003cbr\u003e\n\nThese are some of the areas where `Logistic Regression` is used. We may want to know whether a received email is `'Spam'` or `'Not-Spam'` and then place it in its predicted category. 
Similarly, we may ask whether a transaction is fraudulent or not, or whether a tumor is `'Benign'` or `'Malignant'`.\n\u003cbr\u003e\nIn these types of classification problems the prediction variable `'y'` does not take on continuous values, so we set `'y'` to take on `'discrete values'`, for example:\u003cbr\u003e\n\u003cpre align = center\u003ey ϵ {0,1}          0 : 'Negative Class'\n                    1 :  'Positive Class' \u003c/pre\u003e\n\n\u003cbr\u003e\n\nWe can think of the predicted value `'y'` as taking one of two values, either `'0'` or `'1'`, either `'Not-Spam'` or `'Spam'`, either `'Benign'` or `'Malignant'` etc.\n\n\u003cbr\u003e\n\nAnother name for the class that we denote with `'0'` is the `'negative class'` and another name for the class that we denote with `'1'` is the `'positive class'`. So we denote `'Not-Spam'` as `'0'` and `'Spam'` as `'1'`. The assignment of these classes is arbitrary and it doesn't really matter, but often there is an intuition that the `'negative class' '0'` conveys the absence of something.\n\n\u003cbr\u003e\n\nClassification problems like these are also called `'Binary Classification'` problems, where we have only two outputs, either `'0'` or `'1'`.\n\n## Hypothesis Function\nWe could approach the classification problem ignoring the fact that `'y'` is discrete valued, and use the [Linear Regression](https://github.com/JuzerShakir/Linear_Regression#formula-for-univariate-linear-regression) algorithm to try to predict `'y'` given `'x'`. However, it is easy to construct examples where this method performs very poorly. It also doesn't make sense for our `'h(x)'` to take values larger than `1` or smaller than `0` when we know `'y ϵ {0,1}'`. 
To fix this, we need to change the form of our `'h(x)'` to satisfy 0 ≤ h(x) ≤ 1.\n\u003cbr\u003e\nThis is achieved by plugging θ\u003csup\u003eT\u003c/sup\u003ex into the `'Logistic Function'`, also known as the `'Sigmoid Function'`.\n\n\u003cbr\u003e\n\n**Sigmoid Function:**\n\n\u003cbr\u003e\n\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/Sigmoid_Func.PNG'\u003e\u003c/p\u003e\n\n\u003cbr\u003e\n\n**Graph of Sigmoid Function: g(z) with respect to z**\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/Sigmoid_Func_Graph.png'\u003e\u003c/p\u003e\n\n\u003cbr\u003e\n\nThis function has 2 horizontal asymptotes. As `'z'` approaches `-∞`, g(z) approaches `0`, and as `'z'` approaches `∞`, g(z) approaches `1`. The `y-intercept` is `0.5`, reached when `'z'` is `0`.\u003cbr\u003e\nThe function `g(z)` shown above maps any real number into the interval between `0` and `1`, making it useful for transforming an arbitrary valued function into a function better suited for classification.\n\n\u003cbr\u003e\n\nNow let's set `'z'` to θ\u003csup\u003eT\u003c/sup\u003ex and use `g(z)` as our `'h(x)'`:\n\n\u003cbr\u003e\n\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/hypothesis_fun.PNG'\u003e\u003c/p\u003e\n\n\u003cbr\u003e\n\n`h(x)` will give us the probability that our output is 1. For example, `h(x) = 0.7` means there is a `70%` probability that our output is `1`. 
Here's how we interpret it:\n\n\u003cbr\u003e\n\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/probability_1.PNG'\u003e\u003c/p\u003e\n\n\u003cbr\u003e\n\nSince the probability of `'y'` being `1` is `0.7` here, the probability of `'y'` being `0` is `0.3`, because both probabilities must add up to `1`.\u003cbr\u003e\nHere's how we can interpret it:\n\n\u003cbr\u003e\n\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/probability_2.PNG'\u003e\u003c/p\u003e\n\n\u003cbr\u003e\n\n### Setting discrete values\nWe can translate the output of the `h(x)` function as follows:\u003cbr\u003e\n\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/discrete_value_1.PNG'\u003e\u003c/p\u003e\u003cbr\u003e\n\nThis works because our `Logistic Function g(z)` outputs a value greater than or equal to `0.5` whenever its input is greater than or equal to `0`.\n\n\u003e **Note:**\u003cbr\u003e\n\u003e if z = 0, then e\u003csup\u003e0\u003c/sup\u003e = 1, ∴ g(z) = 0.5\u003cbr\u003e\n\u003e if z → ∞, then e\u003csup\u003e-z\u003c/sup\u003e → 0, ∴ g(z) approaches 1\u003cbr\u003e\n\u003e if z → -∞, then e\u003csup\u003e-z\u003c/sup\u003e → ∞, ∴ g(z) approaches 0\u003cbr\u003e\n\nSo if the input to the function `g` is θ\u003csup\u003eT\u003c/sup\u003ex, then whenever θ\u003csup\u003eT\u003c/sup\u003ex ≥ 0, we have `h(x)` ≥ 0.5.\u003cbr\u003e\nFrom all of these statements we can now say:\u003cbr\u003e\n\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/discrete_value_2.PNG'\u003e\u003c/p\u003e\u003cbr\u003e\n\n## Decision Boundary\nThe decision boundary is the line that separates the regions where `y=1` and `y=0`. 
It is determined by our hypothesis function.\n\n### Linear Decision Boundary\nFor example, if\u003cbr\u003e\n\u003cp align = center\u003eh(x) = g(θ\u003csub\u003e0\u003c/sub\u003e + θ\u003csub\u003e1\u003c/sub\u003ex\u003csub\u003e1\u003c/sub\u003e + θ\u003csub\u003e2\u003c/sub\u003ex\u003csub\u003e2\u003c/sub\u003e)\u003cbr\u003e\nand θ\u003csub\u003e0\u003c/sub\u003e = 5, θ\u003csub\u003e1\u003c/sub\u003e = -1, θ\u003csub\u003e2\u003c/sub\u003e = 0\u003c/p\u003e\n\nSo `y = 1` if:\n\u003cp align = center\u003e5 + (-1)x\u003csub\u003e1\u003c/sub\u003e + 0x\u003csub\u003e2\u003c/sub\u003e ≥ 0\u003cbr\u003e\n5 - x\u003csub\u003e1\u003c/sub\u003e ≥ 0 \u003cbr\u003e\n-x\u003csub\u003e1\u003c/sub\u003e ≥ -5 \u003cbr\u003e\nx\u003csub\u003e1\u003c/sub\u003e ≤ 5 \u003c/p\u003e \n\n\u003cbr\u003e\n\nIn this case our decision boundary is the straight vertical line x\u003csub\u003e1\u003c/sub\u003e = 5. Everything to the left of it denotes `y = 1`, while everything to the right of it denotes `y = 0`.\n\n\u003cbr\u003e\n\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/linear_decision_graph.PNG'\u003e\u003c/p\u003e\n\n\u003cbr\u003e\n\n### Non-Linear Decision Boundary\nThe above input to the `Logistic or Sigmoid Function` was linear, but \nθ\u003csup\u003eT\u003c/sup\u003ex can also describe a circle or any other shape.\u003cbr\u003e\n\nFor example:\u003cbr\u003e\n\u003cp align = center\u003eh(x) = g(θ\u003csub\u003e0\u003c/sub\u003e + θ\u003csub\u003e1\u003c/sub\u003ex\u003csub\u003e1\u003c/sub\u003e\u003csup\u003e2\u003c/sup\u003e + θ\u003csub\u003e2\u003c/sub\u003ex\u003csub\u003e2\u003c/sub\u003e\u003csup\u003e2\u003c/sup\u003e)\u003cbr\u003e\nand θ\u003csub\u003e0\u003c/sub\u003e = -1, θ\u003csub\u003e1\u003c/sub\u003e = 1, θ\u003csub\u003e2\u003c/sub\u003e = 1\u003c/p\u003e\n\nSo `y = 1` if:\n\u003cp align = center\u003e-1 + x\u003csub\u003e1\u003c/sub\u003e\u003csup\u003e2\u003c/sup\u003e + 
x\u003csub\u003e2\u003c/sub\u003e\u003csup\u003e2\u003c/sup\u003e ≥ 0\u003cbr\u003e\nor \u003cbr\u003e\nx\u003csub\u003e1\u003c/sub\u003e\u003csup\u003e2\u003c/sup\u003e + x\u003csub\u003e2\u003c/sub\u003e\u003csup\u003e2\u003c/sup\u003e ≥ 1\u003cbr\u003e\u003c/p\u003e \n\n\u003cbr\u003e\n\nSo if we were to plot this decision boundary, it would be a circle with radius 1 centered at the origin.\n\n\u003cbr\u003e\n\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/non-linear_decision_graph.PNG'\u003e\u003c/p\u003e\u003cbr\u003e\n\nEverything on or outside the circle is `y=1` and everything inside is `y=0`.\n\n\u003e **Note:**\u003cbr\u003e\n\u003e We do not need to define the decision boundary ourselves. The parameters θ are fit to the training set, and once we have them, they define the decision boundary.\n\n## Cost Function\nIf we choose the cost function of [Linear Regression](https://github.com/JuzerShakir/Linear_Regression#cost-function-for-univariate-linear-regression) (MSE), Gradient Descent is not guaranteed to converge to the global minimum. This is because our hypothesis function `h(x)` is no longer linear: it is a sigmoid function, and when we plot `J(θ)` with respect to `θ`, this is what it looks like:\u003cbr\u003e\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/non-convex.PNG'\u003e\u003c/p\u003e\u003cbr\u003e\nSince this has many local minima, Gradient Descent is not guaranteed to converge to the global minimum. Such a function is called non-convex. 
We need a convex function, which has a single global minimum and no other local minima.\n\n\u003cbr\u003e\n\nOur cost function for Logistic Regression:\u003cbr\u003e\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/cost-func_1.PNG'\u003e\u003c/p\u003e\u003cbr\u003e\n\nWhen `y=1`, we get the following plot for `J(θ)` vs `h(x)`:\u003cbr\u003e\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/cost-func_graph_1.PNG'\u003e\u003c/p\u003e\u003cbr\u003e\n\nA few interesting properties we can see here are:\n\u003cp align = center\u003eif y = 1 and h(x) = 1, then Cost J(θ) = 0\u003cbr\u003e\nBut as h(x) approaches 0, Cost approaches ∞\u003c/p\u003e\u003cbr\u003e\n\nSimilarly, when `y = 0`, we get the following plot for `J(θ)` vs `h(x)`:\u003cbr\u003e\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/cost-func_graph_2.PNG'\u003e\u003c/p\u003e\u003cbr\u003e\n\nA few interesting properties we can see here are:\n\u003cp align = center\u003eif y = 0 and h(x) = 0, then Cost J(θ) = 0\u003cbr\u003e\nBut as h(x) approaches 1, Cost approaches ∞\u003c/p\u003e\u003cbr\u003e\n\nWriting the cost function this way guarantees that J(θ) is convex for logistic regression.\u003cbr\u003e\n\nWe can compress our `Cost Function's` two conditional cases into one case:\u003cbr\u003e\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/cost-func_2.PNG'\u003e\u003c/p\u003e\u003cbr\u003e\n\nNotice that when `y=1`, the second term will be zero and will not affect the result. 
And if `y=0`, the first term will be zero and will not affect the result.\u003cbr\u003e\nTherefore, our `Cost function` is:\u003cbr\u003e\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/cost-func_3.PNG'\u003e\u003c/p\u003e\u003cbr\u003e\n\n## Gradient Descent\nGeneral form of Gradient Descent:\u003cbr\u003e\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/general_gradient_descent.PNG'\u003e\u003c/p\u003e\u003cbr\u003e\nWorking out the derivative part using Calculus, we get:\u003cbr\u003e\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/gradient_descent.PNG'\u003e\u003c/p\u003e\u003cbr\u003e\n\nThis algorithm looks identical to [Gradient Descent of Linear Regression](https://github.com/JuzerShakir/Linear_Regression#gradient-descent-for-multivariate-linear-regression), but it is not the same, since `h(x)` here is a `logistic/sigmoid function` whereas `h(x)` in `linear regression` is θ\u003csup\u003eT\u003c/sup\u003ex.\n\n## Multiclass Classification\nNow we will approach the classification of data when we have more than `2 categories`. 
Instead of y ϵ {0,1} we will expand our definition so that y ϵ {0, 1, 2, ..., n}.\u003cbr\u003e\nWe need to predict the probability that `y` is a member of each of our classes from {0, 1, 2, ..., n}.\u003cbr\u003e\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/Multiclass_Probability.PNG'\u003e\u003c/p\u003e\u003cbr\u003e\n\nWe train a `Logistic Regression` classifier for each class (one-vs-all), and then, to predict the class of a new `x`, we choose the class whose hypothesis returns the highest probability.\n\n## Regularization\nLet's say that we have a function:\u003cbr\u003e\nh(x) = g(θ\u003csub\u003e0\u003c/sub\u003e + θ\u003csub\u003e1\u003c/sub\u003ex\u003csub\u003e1\u003c/sub\u003e + θ\u003csub\u003e2\u003c/sub\u003ex\u003csub\u003e1\u003c/sub\u003e\u003csup\u003e2\u003c/sup\u003e + θ\u003csub\u003e3\u003c/sub\u003ex\u003csub\u003e1\u003c/sub\u003e\u003csup\u003e2\u003c/sup\u003ex\u003csub\u003e2\u003c/sub\u003e + \nθ\u003csub\u003e4\u003c/sub\u003ex\u003csub\u003e1\u003c/sub\u003e\u003csup\u003e2\u003c/sup\u003ex\u003csub\u003e2\u003c/sub\u003e\u003csup\u003e2\u003c/sup\u003e + \nθ\u003csub\u003e5\u003c/sub\u003ex\u003csub\u003e1\u003c/sub\u003e\u003csup\u003e2\u003c/sup\u003ex\u003csub\u003e2\u003c/sub\u003e\u003csup\u003e3\u003c/sup\u003e + ...)\u003cbr\u003e\nand it fits the data as follows:\u003cbr\u003e\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/overfit.PNG'\u003e\u003c/p\u003e\u003cbr\u003e\nAs we can clearly see, this function overfits the data and will not generalize well to new or unseen data.\n\n### Cost Function for Regularization\nWe want to reduce the influence of the parameters without actually getting rid of these features or changing the form of the hypothesis function. 
Instead, we modify our `Cost function`.\u003cbr\u003e\nOur Cost Function was:\u003cbr\u003e\n\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/cost-func_3.PNG'\u003e\u003c/p\u003e\u003cbr\u003e\n\nWe can regularize this equation by adding a term to the end:\u003cbr\u003e\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/regularized_cost-func.PNG'\u003e\u003c/p\u003e\u003cbr\u003e\n\nThe second sum, \u003cimg src = 'Formulas/term_1.PNG'\u003e, explicitly excludes the bias term, θ\u003csub\u003e0\u003c/sub\u003e, i.e. the θ vector is indexed from 0 to n (holding n+1 values, θ\u003csub\u003e0\u003c/sub\u003e through θ\u003csub\u003en\u003c/sub\u003e), and this sum skips θ\u003csub\u003e0\u003c/sub\u003e by running from 1 to n. Thus, when running gradient descent, we repeatedly update the following two equations:\n\n### Gradient Descent\n\u003cp align = 'center'\u003e\u003cimg src = 'Formulas/regularized_gradient_descent.PNG'\u003e\u003c/p\u003e\u003cbr\u003e\n\nThis may look identical to [Linear Regression's regularization Gradient Descent](https://github.com/JuzerShakir/Linear_Regression#gradient-descent-for-regularization), but the hypothesis function is different: here we have the `Sigmoid or Logistic Function`, while for Linear Regression we have θ\u003csup\u003eT\u003c/sup\u003ex.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuzershakir%2Flogistic_regression","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjuzershakir%2Flogistic_regression","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuzershakir%2Flogistic_regression/lists"}