{"id":25819036,"url":"https://github.com/pngo1997/multiple-regression-and-feature-selection-analysis","last_synced_at":"2026-05-12T08:31:38.510Z","repository":{"id":275032700,"uuid":"924855474","full_name":"pngo1997/Multiple-Regression-and-Feature-Selection-Analysis","owner":"pngo1997","description":"Explores multiple linear regression, feature selection, Ridge \u0026 Lasso regression, and Stochastic Gradient Descent (SGD) regression.","archived":false,"fork":false,"pushed_at":"2025-01-30T19:15:46.000Z","size":1317,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-30T20:25:34.482Z","etag":null,"topics":["feature-selection","lasso-regression","multiple-linear-regression","python","ridge-regression","stochastic-gradient-descent"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pngo1997.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-30T19:11:50.000Z","updated_at":"2025-01-30T19:34:29.000Z","dependencies_parsed_at":"2025-01-30T20:36:22.710Z","dependency_job_id":null,"html_url":"https://github.com/pngo1997/Multiple-Regression-and-Feature-Selection-Analysis","commit_stats":null,"previous_names":["pngo1997/multiple-regression-and-feature-selection-analysis"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pngo1997%2FMultiple-Regression-and-Feature-Selection-Analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pngo1997%2FMultiple-Regression-and-Feature-Selection-Analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pngo1997%2FMultiple-Regression-and-Feature-Selection-Analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pngo1997%2FMultiple-Regression-and-Feature-Selection-Analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pngo1997","download_url":"https://codeload.github.com/pngo1997/Multiple-Regression-and-Feature-Selection-Analysis/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241122324,"owners_count":19913456,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["feature-selection","lasso-regression","multiple-linear-regression","python","ridge-regression","stochastic-gradient-descent"],"created_at":"2025-02-28T08:14:26.126Z","updated_at":"2025-11-23T08:02:43.639Z","avatar_url":"https://github.com/pngo1997.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🏗️ Multiple Regression and Feature Selection Analysis  \n\n## 📜 Overview  \nThis project explores **multiple linear regression, feature selection, Ridge \u0026 Lasso regression, and Stochastic Gradient Descent (SGD) regression**. The dataset is split into **80% training and 20% testing**, and different regression techniques are applied to predict a target variable. The analysis includes **cross-validation, feature selection, and model optimization** to improve regression performance.  \n\n## 🎯 Problem Explanation  \nThe project aims to:  \n1. **Perform standard multiple linear regression** and evaluate its effectiveness.  \n2. **Select the most informative features** using `SelectPercentile`.  \n3. **Optimize Ridge \u0026 Lasso Regression** by tuning the **alpha parameter**.  \n4. **Train a Stochastic Gradient Descent (SGD) Regressor** with grid search for hyperparameter selection.  \n5. **Compare models using RMSE, MAE, and cross-validation performance**.  \n\n## 🛠️ Implementation Details  \n### **1. Data Preprocessing**  \n- **Missing values handled** using mean imputation.  \n- **Basic statistics computed** (mean, std dev, min, max).  \n- **Target variable extracted**, and dataset is split (80% train, 20% test).  \n\n### **2. Multiple Linear Regression**  \n- Standard **multiple linear regression** applied.  \n- **RMSE computed** on training data.  \n- **Regression coefficients plotted** to visualize feature importance.  \n- **10-fold cross-validation RMSE** compared to training RMSE.  \n\n### **3. Feature Selection with Regression**  \n- `SelectPercentile` with `f_regression` used to identify top features.  \n- **K-fold cross-validation (k=5)** determines the optimal percentage of features.  \n- **Mean Absolute Error (MAE) plotted** vs. feature selection percentage.  \n\n### **4. Ridge \u0026 Lasso Regression with Alpha Optimization**  \n- A function is implemented to:  \n  - Accept **data, target variable, parameter range (alpha), and model type**.  \n  - Perform **K-fold cross-validation (k=5)**.  \n  - **Plot error values** vs. alpha for Ridge \u0026 Lasso regression.  \n  - Train on the best alpha and **evaluate on test data**.  \n- **Bias-variance trade-off analyzed**.  \n\n### **5. Stochastic Gradient Descent Regression**  \n- **Features standardized** using `StandardScaler`.  \n- **GridSearchCV applied** to compare penalty parameters (`l2`, `l1`) and different alpha values (0.0001 to 10).  \n- **Elastic Net model selection** performed to find the best `l1_ratio`.  \n\n## 🚀 Technologies Used  \n- **Python** (for regression modeling and evaluation).  \n- **Pandas \u0026 NumPy** (for data preprocessing and statistical computations).  \n- **Scikit-learn** (for regression models, feature selection, and cross-validation).  \n- **Matplotlib \u0026 Seaborn** (for data visualization).  \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpngo1997%2Fmultiple-regression-and-feature-selection-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpngo1997%2Fmultiple-regression-and-feature-selection-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpngo1997%2Fmultiple-regression-and-feature-selection-analysis/lists"}