https://github.com/potakaaa/regression-from-scratch
Linear regression formulas built from scratch.
Last synced: 9 months ago
- Host: GitHub
- URL: https://github.com/potakaaa/regression-from-scratch
- Owner: potakaaa
- Created: 2025-09-11T13:51:06.000Z (9 months ago)
- Default Branch: development
- Last Pushed: 2025-09-13T17:15:53.000Z (9 months ago)
- Last Synced: 2025-09-13T19:19:56.157Z (9 months ago)
- Language: Python
- Size: 7.81 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Linear Regression From Scratch
A complete implementation of linear regression (single-variable and multivariable) built entirely from scratch, without machine learning libraries such as scikit-learn.
## 📂 Project Structure
```
├── main.py # Main entry point and pipeline orchestration
├── data/
│ ├── __init__.py
│ └── loader.py # Data loading functionality
├── utils/
│ ├── __init__.py
│ ├── data_split.py # Train/test splitting utilities
│ └── preprocessing.py # Feature normalization/scaling
├── model/
│ ├── __init__.py
│ ├── parameters.py # Weight initialization
│ ├── predict.py # Hypothesis function
│ ├── gradients.py # Gradient computation
│ ├── update.py # Parameter updates
│ └── train.py # Main training loop
├── metrics/
│ ├── __init__.py
│ ├── loss.py # MSE, RMSE calculations
│ └── evaluation.py # R², NRMSE metrics
└── visualization/
├── __init__.py
└── plot.py # Training curves and regression plots
```
## 🚀 Implementation Pipeline
1. **Data Preparation**
- Load data from CSV/text files
- Split into training and testing sets
- Optional feature normalization
2. **Model Setup**
- Initialize weights and bias parameters
- Define hypothesis function
3. **Training**
- Implement gradient descent algorithm
- Compute cost function (MSE)
- Update parameters iteratively
4. **Evaluation**
- Calculate performance metrics (R², RMSE)
- Validate on test set
5. **Visualization** (Optional)
- Plot training loss curves
- Visualize regression line vs actual data
## 📋 Function Signatures
### Data Module
- `load_data(filepath)` → Load and return X, y
- `train_test_split(X, y, test_size=0.2, seed=42)` → Split data
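A minimal pure-Python sketch of the two data functions. It assumes a header-less numeric CSV whose last column is the target, and it assumes `train_test_split` returns `(X_train, X_test, y_train, y_test)` in the same order scikit-learn uses; neither detail is documented by the repository.

```python
import csv
import random


def load_data(filepath):
    """Read a numeric CSV (assumed: no header row, last column is the target)."""
    X, y = [], []
    with open(filepath, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines
            values = [float(v) for v in row]
            X.append(values[:-1])  # all but the last column are features
            y.append(values[-1])   # last column is the target
    return X, y


def train_test_split(X, y, test_size=0.2, seed=42):
    """Shuffle row indices with a fixed seed and hold out a test_size fraction."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # local RNG: global random state is untouched
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test
```

Using a private `random.Random(seed)` keeps the split reproducible without reseeding the global generator that other code may rely on.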
### Utils Module
- `normalize(X)` → Min-max scale features to [0, 1] or standardize them (zero mean, unit variance)
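One way `normalize` could offer both options is a `method` switch. The `method` parameter and its `"minmax"`/`"zscore"` values are hypothetical additions to the documented `normalize(X)` signature, shown here on a plain list-of-rows matrix.

```python
import math


def normalize(X, method="minmax"):
    """Column-wise scaling of a list-of-rows matrix.

    method="minmax": rescale each feature to [0, 1].
    method="zscore": subtract the mean, divide by the population standard deviation.
    Constant columns map to 0.0 to avoid division by zero.
    """
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    scaled_cols = []
    for col in columns:
        if method == "minmax":
            lo, hi = min(col), max(col)
            span = hi - lo
            scaled_cols.append([(v - lo) / span if span else 0.0 for v in col])
        elif method == "zscore":
            mean = sum(col) / len(col)
            std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
            scaled_cols.append([(v - mean) / std if std else 0.0 for v in col])
        else:
            raise ValueError(f"unknown method: {method!r}")
    # transpose the scaled columns back into rows
    return [list(row) for row in zip(*scaled_cols)]
```

In a full pipeline the min/max (or mean/std) would normally be computed on the training split and reused for the test split; this sketch simply scales whatever matrix it receives.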
### Model Module
- `initialize_weights(n_features)` → Return weights, bias
- `predict(X, weights, bias)` → Return predictions
- `compute_gradients(X, y, weights, bias)` → Return gradients
- `update_weights(weights, bias, gradients, lr)` → Update parameters
- `train(X, y, lr, epochs)` → Train model and return parameters
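The five model functions fit together as full-batch gradient descent. This is a pure-Python sketch on lists of rows rather than NumPy arrays; the `seed` argument, the small random initialization, and the cost `history` returned alongside the parameters by `train` are assumptions, not the repository's documented behavior.

```python
import random


def initialize_weights(n_features, seed=None):
    """Small random weights and a zero bias (all zeros would also work here)."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.01, 0.01) for _ in range(n_features)]
    return weights, 0.0


def predict(X, weights, bias):
    """Hypothesis h(x) = w1*x1 + ... + wn*xn + b for every row of X."""
    return [sum(w * x for w, x in zip(weights, row)) + bias for row in X]


def compute_gradients(X, y, weights, bias):
    """Gradients of J(w, b) = 1/(2m) * sum((h(x) - y)^2) with respect to w and b."""
    m = len(X)
    errors = [p - t for p, t in zip(predict(X, weights, bias), y)]
    dw = [sum(e * row[j] for e, row in zip(errors, X)) / m for j in range(len(weights))]
    db = sum(errors) / m
    return dw, db


def update_weights(weights, bias, gradients, lr):
    """One step: w := w - lr * dJ/dw, b := b - lr * dJ/db."""
    dw, db = gradients
    new_weights = [w - lr * g for w, g in zip(weights, dw)]
    return new_weights, bias - lr * db


def train(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent; returns (weights, bias, per-epoch cost history)."""
    weights, bias = initialize_weights(len(X[0]), seed=0)
    m = len(X)
    history = []
    for _ in range(epochs):
        preds = predict(X, weights, bias)
        history.append(sum((p - t) ** 2 for p, t in zip(preds, y)) / (2 * m))
        gradients = compute_gradients(X, y, weights, bias)
        weights, bias = update_weights(weights, bias, gradients, lr)
    return weights, bias, history
```

For example, fitting `y = 3x + 2` on inputs already scaled to [0, 1] with `train(X, y, lr=0.5, epochs=2000)` should recover a weight close to 3 and a bias close to 2; unscaled inputs typically need a smaller learning rate to avoid divergence.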
### Metrics Module
- `mse(y_true, y_pred)` → Mean Squared Error
- `rmse(y_true, y_pred)` → Root Mean Squared Error
- `r2_score(y_true, y_pred)` → R² coefficient
- `nrmse(y_true, y_pred)` → Normalized RMSE
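The metric functions are short enough to write straight from their definitions. NRMSE has several conventions (dividing RMSE by the range, the mean, or the standard deviation of `y_true`); dividing by the range here is an assumption.

```python
import math


def mse(y_true, y_pred):
    """Mean of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of MSE, expressed in the same units as y."""
    return math.sqrt(mse(y_true, y_pred))


def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot; 1.0 is a perfect fit, 0.0 equals always predicting the mean.

    Undefined (division by zero) when y_true is constant.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def nrmse(y_true, y_pred):
    """RMSE divided by the range of y_true (one of several common normalizations)."""
    return rmse(y_true, y_pred) / (max(y_true) - min(y_true))
```

With `y_true = [1, 2, 3, 4]` and `y_pred = [1, 2, 3, 5]`, only one residual is nonzero, so MSE is 0.25, RMSE is 0.5, and R² is 1 - 1/5 = 0.8.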
### Visualization Module
- `plot_loss(history)` → Plot training loss curve
- `plot_regression_line(X, y, y_pred)` → Scatter plot with regression line
## 🎯 Key Features
- **No external ML libraries**: Pure Python/NumPy implementation
- **Modular design**: Each component in separate files
- **Educational focus**: Step-by-step implementation for learning
- **Multiple metrics**: MSE, RMSE, R², and NRMSE for evaluation
- **Visualization support**: Training progress and results plotting
## 🏃 Getting Started
1. Implement functions in each module (follow the comments in each file)
2. Run `python main.py` to execute the complete pipeline
3. Modify hyperparameters and observe results
4. Use visualization functions to understand model behavior
## 📊 Mathematical Foundation
The implementation follows the standard linear regression approach:
**Hypothesis**: `h(x) = w₁x₁ + w₂x₂ + ... + wₙxₙ + b`
**Cost Function**: `J(w,b) = (1/(2m)) Σᵢ₌₁ᵐ (h(xⁱ) - yⁱ)²`
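A small worked example makes the cost easy to check by hand. `cost` is a hypothetical helper, not one of the documented functions; because of the `1/(2m)` factor it equals half of the documented `mse`.

```python
def cost(X, y, weights, bias):
    """J(w, b) = 1/(2m) * sum over examples of (h(x) - y)^2, i.e. half the MSE."""
    m = len(X)
    total = 0.0
    for row, target in zip(X, y):
        h = sum(w * x for w, x in zip(weights, row)) + bias  # hypothesis h(x)
        total += (h - target) ** 2
    return total / (2 * m)


# Worked example: w = 1, b = 0 on the points (1, 2), (2, 4), (3, 6).
# The residuals are -1, -2, -3, so J = (1 + 4 + 9) / (2 * 3) = 14 / 6.
print(cost([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0], [1.0], 0.0))
```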
**Gradient Descent**:
- `w := w - α * ∂J/∂w`
- `b := b - α * ∂J/∂b`
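Where:
- `m` = number of training examples
- `n` = number of features
- `xⁱ`, `yⁱ` = feature vector and target of the i-th training example (the superscript is an index, not an exponent)
- `α` = learning rate
- `w` = weight parameters `w₁, …, wₙ`
- `b` = bias term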
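Differentiating `J(w,b)` gives the partial derivatives the update rules need; the `1/2` in the cost cancels the `2` produced by differentiating the square. Here the example index is written in parentheses, x^{(i)}, so it cannot be mistaken for an exponent, and x_j^{(i)} is the j-th feature of the i-th example:

```math
\frac{\partial J}{\partial w_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h(x^{(i)}) - y^{(i)}\right)x_j^{(i)},
\qquad
\frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\left(h(x^{(i)}) - y^{(i)}\right)
```

Stacking the examples into an m × n matrix X, the weight gradient in vector form is (1/m) Xᵀ(Xw + b − y), which is what a vectorized NumPy version would compute in one line.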