# ft-linear-regression

Your first implementation of a machine learning algorithm.



## Description

This project is about implementing a simple linear regression algorithm. The goal is to predict the price of a car based on its mileage.

To do this, we use the following formula:

$$ y = \theta_0 + \theta_1 \cdot x$$

Where:

- $y$ is the price of the car (dependent variable)
- $x$ is the mileage of the car (independent variable)
- $\theta_0$ is the intercept
- $\theta_1$ is the slope

To figure out the values of $\theta_0$ and $\theta_1$, we will use the [gradient descent algorithm](https://en.wikipedia.org/wiki/Gradient_descent).
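
In code, the hypothesis is a single affine function. A minimal sketch (the function and parameter names below are illustrative, not necessarily those used in `src/main.py`):

```python
def estimate_price(mileage: float, theta0: float, theta1: float) -> float:
    """Hypothesis: y = theta0 + theta1 * x."""
    return theta0 + theta1 * mileage


# Example with made-up parameters: theta0 = 8000, theta1 = -0.02,
# so a car with 150,000 km is estimated at 5000.
print(estimate_price(150_000, 8000, -0.02))
```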

## Clone and run project

```bash
git clone https://github.com/magnitopic/ft-linear-regression.git

cd ft-linear-regression

pip install -r requirements.txt

python src/main.py
```

## Math

$$ y = \theta_0 + \theta_1 \cdot x$$

$$ERROR = \frac{1}{m} \sum_{i=1}^{m} (\widehat{y}-y)^2$$

Cost function to minimize:

$$J = \frac{1}{2m} \sum_{i=1}^{m} (\widehat{y}-y)^2$$

$$J = \frac{1}{2m} \sum_{i=1}^{m} (\widehat{y}-(\theta_0 + \theta_1 \cdot x))^2$$
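
As a rough sketch, this cost can be computed directly from the dataset (variable names are illustrative):

```python
def cost(theta0: float, theta1: float, mileages: list[float], prices: list[float]) -> float:
    """J = (1 / 2m) * sum((y_hat - y)^2), with y_hat = theta0 + theta1 * x."""
    m = len(mileages)
    return sum(
        ((theta0 + theta1 * x) - y) ** 2 for x, y in zip(mileages, prices)
    ) / (2 * m)
```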

To minimize the cost, we need the partial derivatives of the cost function with respect to each parameter.

> The gradient is: $\nabla J = \left( \frac{\partial J}{\partial \theta_0} , \frac{\partial J}{\partial \theta_1} \right) $

$$\frac{\partial J}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} (\widehat{y}-y)$$

$$\frac{\partial J}{\partial \theta_1} = \frac{1}{m} \sum_{i=1}^{m} (\widehat{y}-y) \cdot x$$

Each step of gradient descent moves the parameters against the gradient, scaled by the learning rate $\alpha$:

$$ \theta_0 := \theta_0 - \alpha \cdot \frac{\partial J}{\partial \theta_0} = \theta_0 - \alpha \left( \frac{1}{m} \sum(\widehat{y} - y) \right) $$

$$ \theta_1 := \theta_1 - \alpha \cdot \frac{\partial J}{\partial \theta_1} = \theta_1 - \alpha \left( \frac{1}{m} \sum x \cdot (\widehat{y} - y) \right) $$
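
Putting the update rules together gives a plain batch gradient-descent loop. This is only a sketch of the formulas above (the actual `src/main.py` may normalize the mileages first so that a fixed learning rate converges):

```python
def gradient_descent(
    mileages: list[float],
    prices: list[float],
    learning_rate: float = 0.01,
    iterations: int = 1000,
) -> tuple[float, float]:
    """Illustrative batch gradient descent for theta0 and theta1."""
    theta0, theta1 = 0.0, 0.0
    m = len(mileages)
    for _ in range(iterations):
        # Prediction errors (y_hat - y) for the current parameters
        errors = [(theta0 + theta1 * x) - y for x, y in zip(mileages, prices)]
        # Gradients of J with respect to theta0 and theta1
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, mileages)) / m
        # Simultaneous update: step against the gradient
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1
```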

## Evaluation of the Model: Coefficient of Determination ($R^2$)

$R^2$ measures how much of the variance in the data the model explains. For a least-squares fit it lies between 0 and 1, where:

- $R^2 = 0$: The model explains none of the variance (no linear relationship between the variables)
- $R^2 = 1$: The model explains all of the variance (all points lie exactly on the line)
- $R^2 \approx 0.8$: A good fit in many contexts
- $R^2 < 0.2$: A poor fit, suggesting a non-linear relationship or no relationship at all

```math
R^2 = 1 - \frac{\sum_{i=1}^{n} (Y_i - \widehat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}
```

- $\sum_{i=1}^{n} (Y_i - \widehat{Y}_i)^2$ calculates the sum of squared residuals (difference between observed and predicted values).

- $\sum_{i=1}^{n} (Y_i - \bar{Y})^2$ calculates the total sum of squares (total variation in the data).
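
A sketch of that computation in Python (names are illustrative):

```python
def r_squared(prices: list[float], predictions: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_price = sum(prices) / len(prices)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(prices, predictions))
    ss_tot = sum((y - mean_price) ** 2 for y in prices)
    return 1.0 - ss_res / ss_tot
```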

## Correlation vs Causation

When interpreting the results of this statistical model, it is important to remember that:

- Correlation does not imply causation
- Spurious (false) correlations can appear between unrelated variables
> [As shown by this website](https://www.tylervigen.com/spurious-correlations)
- The value of $R^2$ should be interpreted with the specific context in mind
- In social sciences, a value of $R^2$ of 0.7 is considered high
- In physics, a value of $R^2$ of 0.7 is considered low