Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/shimazadeh/ft_linear_regression
Implementing a modular linear regression from scratch to predict the price of cars using a gradient descent algorithm.
data-analysis data-science hyperparameter-tuning linear-regression predictive-modeling
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/shimazadeh/ft_linear_regression
- Owner: shimazadeh
- Created: 2023-07-07T21:06:54.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-12-26T19:18:37.000Z (about 1 year ago)
- Last Synced: 2023-12-26T21:02:58.780Z (about 1 year ago)
- Topics: data-analysis, data-science, hyperparameter-tuning, linear-regression, predictive-modeling
- Language: Python
- Homepage:
- Size: 232 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# DataScience | Linear regression from Scratch | 42Paris
## Objective
To create a modular linear regression model from scratch, train it on the given dataset, save the learned parameters (thetas), and use them to predict a car's price from its mileage.
## Requirements
Before running the program, make sure you have the following dependencies installed:
- numpy: A library for numerical computations.
- pandas: A library for data manipulation and analysis.
- matplotlib: A library for data visualization.
- scikit-learn: A machine learning library for data preprocessing and evaluation.
You can install these packages using pip if you don't have them already: `pip install numpy pandas matplotlib scikit-learn`
## Usage
- git clone https://github.com/shimazadeh/Ft_linear_regression.git Linear-regression
- cd Linear-regression
- python3 main.py [path/to/dataset.csv] [#iteration] [learning_rate] [mode]
- The program has two modes:
  - mode 1: uses hyperparameter tuning to find the best initial thetas
  - mode 2: loads the parameters from the best_params.json file created by mode 1, then trains and tests the model with them.
## Approach
- Data Preprocessing: The program reads the dataset from the CSV file and normalizes the data to the range 0-1.
- Train-Test Splitting: The program splits the dataset into training and test sets.
- Hyperparameter Tuning: The program uses hyperparameter tuning to find the best initial parameters (thetas) for the linear regression model.
- Model Training: The linear regression model is trained using gradient descent. The program visualizes the training process and prints the model parameters, loss, MSE, and MAE at each iteration.
- Model Evaluation: After training, the program predicts prices for the test set and calculates the Mean Squared Error (MSE) and Mean Absolute Error (MAE) to evaluate the model's performance.
- Visualization: The program visualizes the normalized dataset, the regression model, and the cost function as shown below:
![Alt text](output.png)
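
The steps listed under Approach can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the function names (`min_max_normalize`, `train`, `evaluate`), the synthetic mileage/price data, the 80/20 split, and the learning rate and iteration count are all assumptions chosen for the example. It uses only numpy.

```python
import numpy as np


def min_max_normalize(values):
    """Scale a 1-D array to the range [0, 1] (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo)


def train(x, y, theta0=0.0, theta1=0.0, learning_rate=0.5, iterations=2000):
    """Fit y ~ theta0 + theta1 * x with batch gradient descent."""
    m = len(x)
    for _ in range(iterations):
        error = (theta0 + theta1 * x) - y
        # Gradients of the cost (1 / (2m)) * sum(error ** 2).
        grad0 = error.sum() / m
        grad1 = (error * x).sum() / m
        # Update both thetas simultaneously from the same error vector.
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1


def evaluate(x, y, theta0, theta1):
    """Return (MSE, MAE) of the linear model on the given data."""
    error = (theta0 + theta1 * x) - y
    return np.mean(error ** 2), np.mean(np.abs(error))


# Synthetic stand-in for the dataset (assumed values, not the real CSV).
rng = np.random.default_rng(0)
mileage = rng.uniform(20_000, 250_000, size=50)
price = 8_500 - 0.02 * mileage + rng.normal(0, 200, size=50)

# Data preprocessing: normalize both columns to [0, 1].
x = min_max_normalize(mileage)
y = min_max_normalize(price)

# Train-test splitting: shuffle indices, keep 80% for training (assumed ratio).
idx = rng.permutation(len(x))
split = int(0.8 * len(x))
train_idx, test_idx = idx[:split], idx[split:]

# Model training and evaluation on the held-out test set.
theta0, theta1 = train(x[train_idx], y[train_idx])
mse, mae = evaluate(x[test_idx], y[test_idx], theta0, theta1)
print(f"theta0={theta0:.4f} theta1={theta1:.4f} MSE={mse:.5f} MAE={mae:.5f}")
```

Because the inputs are normalized to 0-1, a fairly large learning rate stays stable here; on unnormalized mileage values the same settings would likely diverge, which is one reason the normalization step comes first.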