https://github.com/jahanostg/linear-regression_ml-algorithm
Linear Regression Algorithm
colab-notebook matplotlib numpy pandas scikit-learn seaborn
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/jahanostg/linear-regression_ml-algorithm
- Owner: jahanOSTG
- Created: 2025-06-18T07:56:13.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-18T14:53:45.000Z (about 1 year ago)
- Last Synced: 2025-06-18T15:19:34.779Z (about 1 year ago)
- Topics: colab-notebook, matplotlib, numpy, pandas, scikit-learn, seaborn
- Language: Jupyter Notebook
- Homepage:
- Size: 144 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Linear Regression on Custom Dataset
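This repository demonstrates a complete linear regression workflow on a custom dataset (`friend.csv`) in a Google Colab notebook: loading and cleaning the data, training a model, evaluating it, and visualizing the results.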
---
## Dataset Overview
The dataset (`friend.csv`) contains the following columns:
| Column | Description |
|--------|-------------------------------------------|
| ID | Unique identifier (ignored) |
| size | Custom feature (e.g., test score or item size) |
| Age | Age of the individual |
| prize | Prize or cost-related value |
| Number | 🎯 Target variable |
---
## ✅ Steps Performed
### 1. Mount Google Drive
Google Drive is mounted in the Colab notebook so that the dataset stored there can be read.
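In Colab, mounting usually looks like the sketch below. The mount point `/content/drive` is Colab's conventional default and is an assumption here, since the README does not state the path used. The `try`/`except` only keeps the snippet harmless when it runs outside Colab.

```python
# Mount Google Drive when running inside Google Colab.
try:
    from google.colab import drive  # only importable in the Colab runtime
    drive.mount("/content/drive")   # assumed mount point (Colab's usual default)
    in_colab = True
except ImportError:
    in_colab = False                # running locally: read the CSV from disk instead

print("Running in Colab:", in_colab)
```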
### 2. Import Libraries
Used libraries:
- `pandas`
- `numpy`
- `matplotlib`
- `seaborn`
- `scikit-learn`
### 3. Load and Preprocess Data
- Dropped columns that are not used as features, such as `ID`.
- Stripped stray whitespace from column names.
- Skipped one-hot encoding because the dataset has no categorical columns.
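The preprocessing steps above can be sketched as follows. The actual `friend.csv` is not reproduced here, so the snippet reads a small in-memory CSV with made-up values. Only the column names come from the dataset table, and the stray spaces in the header are added deliberately to show the clean-up.

```python
import io

import pandas as pd

# Synthetic stand-in for friend.csv: the values are invented for illustration,
# and the header deliberately contains stray spaces.
raw_csv = io.StringIO(
    "ID, size ,Age,prize , Number\n"
    "1,10,21,100,31\n"
    "2,12,25,150,40\n"
    "3,9,30,120,35\n"
    "4,15,28,200,52\n"
)

df = pd.read_csv(raw_csv)

# Strip leading/trailing whitespace from the column names.
df.columns = df.columns.str.strip()

# Drop the identifier column, which carries no predictive information.
df = df.drop(columns=["ID"])

# Confirm there are no non-numeric columns, so one-hot encoding is not needed.
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(df.columns.tolist())
print("Categorical columns:", categorical_cols)
```

If `categorical_cols` were non-empty, `pd.get_dummies` would be one common way to encode those columns before training.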
### 4. Train-Test Split
Split the data into training and testing sets using `train_test_split()`.
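A minimal sketch of the split, using synthetic data with the README's column names. The 80/20 ratio and `random_state=42` are assumptions, as the README does not state the values used in the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic data with the README's column names (values are invented).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "size": rng.uniform(1, 10, 50),
    "Age": rng.integers(18, 60, 50),
    "prize": rng.uniform(100, 1000, 50),
})
df["Number"] = (
    2.0 * df["size"] + 0.5 * df["Age"] + 0.01 * df["prize"] + rng.normal(0, 1, 50)
)

X = df[["size", "Age", "prize"]]  # features
y = df["Number"]                  # target

# Hold out 20% of the rows for testing; fix the seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```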
### 5. Train Model
Trained a `LinearRegression()` model on the training data.
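As a rough illustration of this step, the sketch below fits `LinearRegression()` to noiseless synthetic data, so the learned coefficients can be compared with the known ones. The data, the coefficients, and the split parameters are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Noiseless synthetic data: y is an exact linear function of three features.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(60, 3))
true_coef = np.array([2.0, 0.5, -1.0])
true_intercept = 3.0
y = X @ true_coef + true_intercept

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Ordinary least squares fit on the training portion only.
model = LinearRegression()
model.fit(X_train, y_train)

print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
```

Because the data contain no noise, the fitted coefficients and intercept should match the true values up to floating-point error. On real data they will only approximate the underlying relationship.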
### 6. Prediction & Evaluation
- Generated predictions on the test set.
- Calculated **MAE**, **MSE**, **RMSE**, and **R²** for performance analysis.
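The four metrics can be computed with `sklearn.metrics` as in the sketch below. The `y_true` and `y_pred` values are small hand-picked numbers for illustration, not results from the notebook.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hand-picked illustrative values, not output from the actual model.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 8.0, 9.5])

mae = mean_absolute_error(y_true, y_pred)  # mean of |error|
mse = mean_squared_error(y_true, y_pred)   # mean of error^2
rmse = np.sqrt(mse)                        # same units as the target
r2 = r2_score(y_true, y_pred)              # 1 - SS_res / SS_tot

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")
```

For these four points the values work out to MAE = 0.5, MSE = 0.375, RMSE ≈ 0.612, and R² = 0.925.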
### 7. Visualization
- **Regression Line Plot** to compare actual vs predicted values.
- **Residual Plot** to check whether the prediction errors are centered on zero and show no obvious pattern.
- **Heatmap** to show correlation between all numeric features.
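One possible way to draw the three plots is sketched below on synthetic data. The plot styling, the axis labels, and the choice of a scatter plot with a `y = x` reference line for the actual-vs-predicted comparison are assumptions, since the notebook's plotting code is not shown in the README.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data with the README's column names (values are invented).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "size": rng.uniform(1, 10, 80),
    "Age": rng.integers(18, 60, 80),
    "prize": rng.uniform(100, 1000, 80),
})
df["Number"] = (
    2.0 * df["size"] + 0.5 * df["Age"] + 0.01 * df["prize"] + rng.normal(0, 1, 80)
)

X_train, X_test, y_train, y_test = train_test_split(
    df[["size", "Age", "prize"]], df["Number"], test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
residuals = y_test - y_pred

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# 1. Actual vs predicted, with a y = x line marking perfect predictions.
axes[0].scatter(y_test, y_pred)
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
axes[0].plot(lims, lims, linestyle="--")
axes[0].set(xlabel="Actual Number", ylabel="Predicted Number",
            title="Actual vs predicted")

# 2. Residuals against predictions; points should scatter around zero.
axes[1].scatter(y_pred, residuals)
axes[1].axhline(0, linestyle="--")
axes[1].set(xlabel="Predicted Number", ylabel="Residual", title="Residuals")

# 3. Correlation heatmap of all numeric columns.
sns.heatmap(df.corr(), annot=True, cmap="coolwarm", ax=axes[2])
axes[2].set_title("Correlation heatmap")

fig.tight_layout()
```

In a notebook, the figure is displayed inline at the end of the cell, and the `matplotlib.use("Agg")` line can be dropped.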
---
## About **Linear Regression**
### Advantages
- Simple to implement and efficient to train.
- Overfitting can be reduced with regularization (for example, Ridge or Lasso).
- Performs well when the relationship between the features and the target is approximately linear.
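To illustrate the regularization point, the sketch below compares ordinary least squares with scikit-learn's `Ridge` on a small, noisy synthetic dataset. The data and `alpha=10.0` are arbitrary choices for the example. Ridge is not used in the repository's workflow as described.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Few samples, many features, noisy target: a setting where OLS tends to overfit.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))
y = 3.0 * X[:, 0] + rng.normal(scale=2.0, size=20)  # only feature 0 matters

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha sets the strength of the L2 penalty

# The L2 penalty shrinks the coefficient vector toward zero.
ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)

print(f"OLS coefficient norm:   {ols_norm:.3f}")
print(f"Ridge coefficient norm: {ridge_norm:.3f}")
```

The Ridge coefficients have a smaller norm than the OLS ones, which is how the penalty limits the model's ability to fit noise. In practice, `alpha` would be chosen by cross-validation.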
### Disadvantages
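- Assumes a linear relationship between the features and the target, and independent observations; both rarely hold exactly in real data.
- Can overfit noisy data, especially when there are many features relative to the number of samples.
- Sensitive to outliers.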
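The sensitivity to outliers can be seen in a small experiment. The sketch below uses invented data: ten points lying exactly on `y = 2x + 1`, then the same points with one target value replaced by an extreme one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Ten points exactly on the line y = 2x + 1.
x = np.arange(10, dtype=float).reshape(-1, 1)
y_clean = 2.0 * x.ravel() + 1.0

# Same data, but the last target is replaced by an extreme value.
y_outlier = y_clean.copy()
y_outlier[-1] = 100.0

clean_slope = LinearRegression().fit(x, y_clean).coef_[0]
outlier_slope = LinearRegression().fit(x, y_outlier).coef_[0]

print(f"Slope without outlier: {clean_slope:.3f}")
print(f"Slope with one outlier: {outlier_slope:.3f}")
```

A single corrupted point more than doubles the fitted slope, because squared error weights large residuals heavily. Robust estimators such as scikit-learn's `HuberRegressor` are one common alternative when outliers are expected.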