Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/magnitopic/ft-linear-regression
Your first implementation of a machine learning algorithm. Predicting the price of a car by its mileage
- Host: GitHub
- URL: https://github.com/magnitopic/ft-linear-regression
- Owner: magnitopic
- Created: 2024-10-28T06:33:17.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-11-09T23:01:03.000Z (about 2 months ago)
- Last Synced: 2024-11-10T00:16:48.274Z (about 2 months ago)
- Topics: 42school, ai, ai-algorithm, ft-linear-regression, linear-regression, machine-learning, prediction-algorithm
- Language: Python
- Homepage:
- Size: 15.6 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Metadata Files:
    - Readme: README.md
Awesome Lists containing this project
README
# ft-linear-regression
Your first implementation of a machine learning algorithm.
## Description
This project is about implementing a simple linear regression algorithm. The goal is to predict the price of a car based on its mileage.
To do this, we will use the following formula:
$$ y = \theta_0 + \theta_1 \cdot x$$
Where:
- $y$ is the price of the car (dependent variable)
- $x$ is the mileage of the car (independent variable)
- $\theta_0$ is the intercept
- $\theta_1$ is the slope

To figure out the values of $\theta_0$ and $\theta_1$, we will use the [gradient descent algorithm](https://en.wikipedia.org/wiki/Gradient_descent).
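As a minimal sketch of this hypothesis in Python (the function name `estimate_price`, the parameter values, and the units are illustrative assumptions, not the project's actual code):

```python
def estimate_price(mileage: float, theta0: float, theta1: float) -> float:
    """Hypothesis: predicted price = theta0 + theta1 * mileage."""
    return theta0 + theta1 * mileage

# Illustrative values (not learned from real data): a base price of 8000
# that drops by 0.02 per unit of mileage.
print(estimate_price(50000, 8000, -0.02))  # 8000 - 0.02 * 50000 = 7000.0
```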
## Clone and run project
```bash
git clone https://github.com/magnitopic/ft-linear-regression.git
cd ft-linear-regression
pip install -r requirements.txt
python src/main.py
```

## Math
$$ y = \theta_0 + \theta_1 \cdot x$$
The mean squared error of the predictions is:
$$\mathrm{MSE} = \frac{1}{m} \sum_{i=1}^{m} (\widehat{y}_i - y_i)^2$$
Cost function to minimize:
$$J = \frac{1}{2m} \sum_{i=1}^{m} (\widehat{y}_i - y_i)^2$$
$$J = \frac{1}{2m} \sum_{i=1}^{m} \left((\theta_0 + \theta_1 \cdot x_i) - y_i\right)^2$$
To minimize the cost, we need the partial derivative of the cost function.
> The gradient is: $\nabla J = \left( \frac{\partial J}{\partial \theta_0} , \frac{\partial J}{\partial \theta_1} \right) $
$$\frac{\partial J}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} (\widehat{y}_i - y_i)$$
$$\frac{\partial J}{\partial \theta_1} = \frac{1}{m} \sum_{i=1}^{m} (\widehat{y}_i - y_i) \cdot x_i$$
Gradient descent then repeatedly steps both parameters against the gradient, scaled by the learning rate $\alpha$:
$$ \theta_0 := \theta_0 - \alpha \cdot \frac{\partial J}{\partial \theta_0} = \theta_0 - \alpha \left( \frac{1}{m} \sum_{i=1}^{m} (\widehat{y}_i - y_i) \right) $$
$$ \theta_1 := \theta_1 - \alpha \cdot \frac{\partial J}{\partial \theta_1} = \theta_1 - \alpha \left( \frac{1}{m} \sum_{i=1}^{m} x_i \cdot (\widehat{y}_i - y_i) \right) $$
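Putting the update rules together, a training loop might look like the following sketch (the function name, defaults, and the scaling note are assumptions, not the repository's code):

```python
def fit(xs, ys, alpha=0.1, iterations=1000):
    """Fit theta0, theta1 with the gradient-descent update rules above.

    Assumes xs has been scaled (e.g. to [0, 1]); raw mileage values can
    make the updates diverge for a fixed learning rate alpha.
    """
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # errors[i] = y_hat_i - y_i with the current parameters
        errors = [(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                             # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/dtheta1
        # Simultaneous update: step both parameters against the gradient.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1
```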
## Evaluation of the Model: Coefficient of Determination ($R^2$)
$R^2$ is a metric that varies between 0 and 1, where:
- $R^2 = 0$: There is no linear relationship between the variables
- $R^2 = 1$: There is a perfect linear relationship between the variables (all points are on the line)
- $R^2 \approx 0.8$: Good fit in many contexts
- $R^2 < 0.2$: Poor fit, suggesting a non-linear relationship or no relationship at all

```math
R^2 = 1 - \frac{\sum_{i=1}^{n} (Y_i - \widehat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}
```

- $\sum_{i=1}^{n} (Y_i - \widehat{Y}_i)^2$ calculates the sum of squared residuals (the differences between observed and predicted values).
- $\sum_{i=1}^{n} (Y_i - \bar{Y})^2$ calculates the total sum of squares (total variation in the data).
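In code, $R^2$ falls out directly from those two sums (a minimal sketch; `r_squared` is a hypothetical helper, not part of the repository):

```python
def r_squared(ys, ys_pred):
    """Coefficient of determination from observed and predicted values."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - yp) ** 2 for y, yp in zip(ys, ys_pred))  # squared residuals
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                # total variation
    return 1 - ss_res / ss_tot
```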
## Correlation vs Causation
It's important in a statistics exercise like this one to remember that:
- Correlation does not imply causation
- Spurious (false) correlations exist
> [As shown by this website](https://www.tylervigen.com/spurious-correlations)
- The value of $R^2$ should be interpreted with the specific context in mind
    - In social sciences, an $R^2$ of 0.7 is considered high
    - In physics, an $R^2$ of 0.7 is considered low