https://github.com/jahanostg/linear-regression_ml-algorithm

Linear Regression Algorithm
colab-notebook matplotlib numpy pandas scikit-learn seaborn

Host: GitHub
URL: https://github.com/jahanostg/linear-regression_ml-algorithm
Owner: jahanOSTG
Created: 2025-06-18T07:56:13.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-06-18T14:53:45.000Z (about 1 year ago)
Last Synced: 2025-06-18T15:19:34.779Z (about 1 year ago)
Topics: colab-notebook, matplotlib, numpy, pandas, scikit-learn, seaborn
Language: Jupyter Notebook
Homepage:
Size: 144 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Linear Regression on Custom Dataset

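This repository demonstrates a complete linear regression workflow on a custom dataset (`friend.csv`): loading and preprocessing the data, training a model, evaluating it, and visualizing the results.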

---

## Dataset Overview

The dataset (`friend.csv`) contains the following columns:

| Column | Description |
|--------|-------------------------------------------|
| ID | Unique identifier (ignored) |
| size | Custom feature (e.g., test score or item size) |
| Age | Age of the individual |
| prize | Prize or cost-related value |
| Number | 🎯 Target variable |

---

## ✅ Steps Performed

### 1. Mount Google Drive
Mounted Google Drive in Colab to access the dataset stored there.
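
A minimal sketch of this step, assuming the notebook runs in Google Colab and that `friend.csv` sits at the top level of the Drive (the actual folder is not stated in this README, so `base_dir` is a hypothetical location). Outside Colab the import fails and the sketch falls back to the current directory.

```python
import os

try:
    # google.colab is only importable inside a Colab runtime.
    from google.colab import drive
    drive.mount("/content/drive")
    base_dir = "/content/drive/MyDrive"  # assumed location of the dataset
except ImportError:
    # Not running in Colab: look for the file next to the notebook instead.
    base_dir = "."

csv_path = os.path.join(base_dir, "friend.csv")
print(csv_path)
```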

### 2. Import Libraries
Used libraries:
- `pandas`
- `numpy`
- `matplotlib`
- `seaborn`
- `scikit-learn`

### 3. Load and Preprocess Data
- Removed columns that are not useful as features, such as `ID`.
- Stripped stray whitespace from column names.
- Skipped one-hot encoding because the dataset has no categorical columns.
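
The preprocessing above might look roughly like the following. The rows are a small made-up stand-in for `friend.csv` (the real values are not shown in this README), and the padded column names are an assumption about what "whitespace in column names" looked like.

```python
import pandas as pd

# Hypothetical stand-in for friend.csv, with padded column names.
df = pd.DataFrame({
    " ID": [1, 2, 3, 4],
    "size ": [10.0, 12.5, 9.0, 14.0],
    "Age": [21, 35, 28, 42],
    " prize": [100.0, 150.0, 90.0, 200.0],
    "Number ": [3.0, 5.0, 2.5, 6.5],
})

# Strip leading/trailing whitespace from every column name.
df.columns = df.columns.str.strip()

# Drop the identifier, which carries no predictive information.
df = df.drop(columns=["ID"])

# Confirm there is nothing to one-hot encode: list the non-numeric columns.
categorical = df.select_dtypes(exclude="number").columns.tolist()

# Separate the features from the target column.
X = df.drop(columns=["Number"])
y = df["Number"]
print(list(X.columns), categorical)
```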

### 4. Train-Test Split
Split the data into training and testing sets using `train_test_split()`.
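
As an illustration, the split could be done as below. The 80/20 ratio and `random_state=42` are assumptions, not values taken from the notebook, and the data is randomly generated to keep the example self-contained.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic placeholder data with the same column names as the dataset.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "size": rng.normal(size=50),
    "Age": rng.integers(18, 60, size=50),
    "prize": rng.normal(size=50),
})
y = pd.Series(rng.normal(size=50), name="Number")

# Hold out 20% of the rows for testing; fix the seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```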

### 5. Train Model
Trained a `LinearRegression()` model on the training data.
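
A short sketch of the training step on synthetic, noise-free data. The coefficients in `true_coef` and the intercept are made up so that the fitted values can be checked against a known answer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data generated from a known linear relationship (no noise).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y = X @ true_coef + 3.0

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Ordinary least squares fit on the training portion only.
model = LinearRegression()
model.fit(X_train, y_train)

# Predictions for the held-out rows.
y_pred = model.predict(X_test)
print(model.coef_, model.intercept_)
```

Because the synthetic target is exactly linear in the features, the fitted `coef_` and `intercept_` match the generating values up to floating-point error; real data will not be this clean.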

### 6. Prediction & Evaluation
- Generated predictions on the test set.
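- Calculated mean absolute error (**MAE**), mean squared error (**MSE**), root mean squared error (**RMSE**), and the coefficient of determination (**R²**) to analyze performance.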
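
The four metrics can be computed with `sklearn.metrics`, as sketched below on a tiny made-up pair of arrays. RMSE is taken as the square root of MSE, which works regardless of the scikit-learn version.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Illustrative values only; in the workflow these come from the test set.
y_test = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mae = mean_absolute_error(y_test, y_pred)   # average absolute error
mse = mean_squared_error(y_test, y_pred)    # average squared error
rmse = np.sqrt(mse)                         # same units as the target
r2 = r2_score(y_test, y_pred)               # share of variance explained

print(f"MAE={mae:.3f} MSE={mse:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
```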

### 7. Visualization
- **Regression Line Plot** to compare actual vs. predicted values.
- **Residual Plot** to check whether the errors are centered on zero and free of obvious patterns.
- **Heatmap** to show correlation between all numeric features.
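
One possible way to draw the three plots with `matplotlib` and `seaborn` is sketched here. The data and the "predictions" are synthetic placeholders, and the layout (three panels side by side) is an assumption rather than the notebook's actual figure.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic placeholder data with the dataset's column names.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "size": rng.normal(10, 2, size=40),
    "Age": rng.integers(18, 60, size=40),
    "prize": rng.normal(100, 20, size=40),
})
df["Number"] = 0.5 * df["size"] + 0.02 * df["prize"] + rng.normal(0, 0.3, size=40)

# Stand-ins for the test targets and the model's predictions.
y_test = df["Number"].to_numpy()
y_pred = y_test + rng.normal(0, 0.5, size=len(df))
residuals = y_test - y_pred

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Actual vs. predicted, with the ideal y = x line for reference.
axes[0].scatter(y_test, y_pred)
lims = [y_test.min(), y_test.max()]
axes[0].plot(lims, lims, color="red")
axes[0].set(xlabel="Actual", ylabel="Predicted", title="Actual vs. predicted")

# Residuals against predictions; points should scatter around zero.
axes[1].scatter(y_pred, residuals)
axes[1].axhline(0, color="red", linestyle="--")
axes[1].set(xlabel="Predicted", ylabel="Residual", title="Residuals")

# Correlation heatmap across all numeric columns.
corr = df.corr()
sns.heatmap(corr, annot=True, cmap="coolwarm", ax=axes[2])
axes[2].set_title("Correlation heatmap")

fig.tight_layout()
```

In a notebook the figure renders at the end of the cell; in a script, call `plt.show()` or `fig.savefig(...)` afterwards.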

---

## About **Linear Regression**
### Advantages
- Simple to implement and efficient to train
- Overfitting can be reduced with regularization (e.g., Ridge or Lasso).
- Performs well when the relationship between the features and the target is approximately linear.
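
To illustrate the regularization point, the sketch below compares plain least squares with scikit-learn's `Ridge` on synthetic data that has few samples relative to the number of features. The value `alpha=10.0` is arbitrary; in practice it would be tuned, for example by cross-validation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Few samples, many features: only the first feature actually matters.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
y = 2.0 * X[:, 0] + rng.normal(scale=1.0, size=30)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # L2 penalty shrinks the coefficients

ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)
print(f"OLS coefficient norm:   {ols_norm:.3f}")
print(f"Ridge coefficient norm: {ridge_norm:.3f}")
```

The Ridge coefficient vector is smaller in norm than the unregularized one, which is the mechanism by which the penalty limits fitting to noise.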

### Disadvantages
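- Assumes independent observations and features that are not strongly correlated with each other, which rarely holds exactly in real data.
- Can fit noise and overfit when there are many features relative to the number of samples.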
- Sensitive to outliers.
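
The sensitivity to outliers is easy to demonstrate with a toy example. Below, ten points lie exactly on the line y = 2x; corrupting a single target value is enough to pull the fitted slope far from 2. The numbers are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Ten points exactly on the line y = 2x.
x = np.arange(10, dtype=float).reshape(-1, 1)
y_clean = 2.0 * x.ravel()

# Same data with one corrupted target value at the right edge.
y_outlier = y_clean.copy()
y_outlier[-1] = 100.0

clean_slope = LinearRegression().fit(x, y_clean).coef_[0]
outlier_slope = LinearRegression().fit(x, y_outlier).coef_[0]

print(f"slope without outlier: {clean_slope:.2f}")
print(f"slope with one outlier: {outlier_slope:.2f}")
```

Squared-error loss weights large residuals heavily, so a single extreme point at the edge of the feature range has a large effect on the fit.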
