{"id":31132909,"url":"https://github.com/sjain2580/simple-linear-regression-model","last_synced_at":"2026-04-30T08:30:57.047Z","repository":{"id":314427206,"uuid":"1055476015","full_name":"sjain2580/Simple-Linear-Regression-Model","owner":"sjain2580","description":"This project demonstrates a simple, yet robust, multiple linear regression model built with Python and scikit-learn to predict median house values in California.","archived":false,"fork":false,"pushed_at":"2025-09-12T11:02:22.000Z","size":424,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-12T12:26:13.345Z","etag":null,"topics":["joblib","linear-regression","matplotlib","matplotlib-pyplot","numpy","python","scikit-learn"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sjain2580.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-12T10:17:05.000Z","updated_at":"2025-09-12T11:02:26.000Z","dependencies_parsed_at":"2025-09-12T12:26:22.169Z","dependency_job_id":"45a43600-b874-4804-9b37-21a9d64a76f7","html_url":"https://github.com/sjain2580/Simple-Linear-Regression-Model","commit_stats":null,"previous_names":["sjain2580/simple-linear-regression-model"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/sjain2580/Simple-Linear-Regression-Model","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/repositories/sjain2580%2FSimple-Linear-Regression-Model","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sjain2580%2FSimple-Linear-Regression-Model/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sjain2580%2FSimple-Linear-Regression-Model/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sjain2580%2FSimple-Linear-Regression-Model/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sjain2580","download_url":"https://codeload.github.com/sjain2580/Simple-Linear-Regression-Model/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sjain2580%2FSimple-Linear-Regression-Model/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275712384,"owners_count":25514205,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-18T02:00:09.552Z","response_time":77,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["joblib","linear-regression","matplotlib","matplotlib-pyplot","numpy","python","scikit-learn"],"created_at":"2025-09-18T05:02:09.994Z","updated_at":"2025-09-18T05:03:43.883Z","avatar_url":"https://github.com/sjain2580.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Simple Linear Regression Model - California Housing Price Prediction with Linear 
Regression\n## Overview\n\nThis project demonstrates a simple, yet robust, multiple linear regression model built with Python and scikit-learn to predict median house values in California.\n\n## Features\n\n- Multiple Features: The model uses multiple features (Median Income, House Age, and Average Rooms) for more accurate predictions.\n\n- Data Preprocessing: It includes a machine learning pipeline to handle data scaling, a crucial step for many models.\n\n- Model Persistence: The trained model is automatically saved to disk (linear_regression_model.joblib), allowing for easy reuse without retraining.\n\n- Comprehensive Evaluation: The script calculates and prints key metrics (Mean Squared Error and R-squared) to evaluate the model's performance.\n\n- Data Visualization: It generates and saves multiple plots (housing_prices_plot.png and housing_prices_residual_plot.png) for visual analysis.\n\n- Prediction Functionality: The script includes a practical example of how to use the trained model to make a prediction on new, unseen data.\n\n## Technologies used\n\n- Python: The core programming language for the project.\n\n- scikit-learn: A powerful machine learning library used for building the model, data splitting, and evaluation.\n\n- NumPy: A fundamental library for numerical operations and handling the dataset arrays.\n\n- Matplotlib: Used for creating the data visualizations, including the scatter and residual plots.\n\n- joblib: A library for saving and loading the trained machine learning model.\n\n## Model used (Architecture)\n\nThe core of this project is a LinearRegression model, which is a fundamental algorithm in supervised machine learning. The model is implemented within a scikit-learn pipeline. This pipeline's architecture consists of two main stages:\n\n1. Data Preprocessing: The StandardScaler scales the features to have a mean of 0 and a standard deviation of 1. 
For ordinary least squares this does not change the predictions, but it puts the coefficients on a comparable scale, so their magnitudes can be compared across features, and it keeps the pipeline ready for regularized or gradient-based models, which are sensitive to feature scale.\n\n2. Regression Model: The LinearRegression estimator fits a linear model to the preprocessed data, finding the best-fit hyperplane that minimizes the sum of squared errors between the predicted and actual values.\n\n## Data Processing\n\nThe project performs the following data processing steps:\n\n- Data Splitting: The dataset is divided into a training set (80%) and a testing set (20%) so that the model's performance is evaluated on unseen data.\n\n- Feature Scaling: A StandardScaler is applied to the input features, transforming each one to zero mean and unit variance. This puts features with different magnitudes on a common scale.\n\n## Data Analysis\n\nThis project performs data analysis through both quantitative metrics and visual inspection:\n\n- Quantitative Metrics: The model's performance is evaluated using two standard metrics:\n\n  - Mean Squared Error (MSE): Measures the average squared difference between the predicted values and the actual values. A lower MSE indicates a better fit.\n\n  - R-squared (R2): Represents the proportion of the variance in the dependent variable that can be predicted from the independent variables. A score closer to 1.0 indicates a stronger fit.\n\n- Visual Inspection: A prediction plot and a residual plot, described under Visualization.\n\n## Model Training\n\nTraining is designed to be efficient and reproducible:\n\n- Training: The fit() method is called on the machine learning pipeline, which first scales the training data and then trains the LinearRegression model.\n\n- Persistence: Once trained, the entire pipeline is saved to a .joblib file. This \"persists\" the model, so it can be loaded directly to make predictions without retraining. 
The script checks whether this file exists and either loads the existing model or trains a new one.\n\n## Prerequisites\n\n- Python 3.11+\n- Required packages (install via `pip`): the libraries listed under Technologies used (scikit-learn, NumPy, Matplotlib, joblib)\n\n## How to Run the Project\n\n1. Clone this repository to your local machine:\n\n```bash\ngit clone https://github.com/sjain2580/Simple-Linear-Regression-Model.git\ncd Simple-Linear-Regression-Model\n```\n\n2. Create a virtual environment (optional but recommended) with `python -m venv venv`, then activate it:\n\n- On Windows:\n\n```bash\n.\\venv\\Scripts\\activate\n```\n\n- On macOS/Linux:\n\n```bash\nsource venv/bin/activate\n```\n\n3. Install the required libraries:\n\n```bash\npip install -r requirements.txt\n```\n\n4. Run the main Python script from your terminal:\n\n```bash\npython simple_linear_regression.py\n```\n\n## Visualization\n\n- Prediction Plot: Compares the model's predicted house values against the actual values to show how well the linear relationship is captured.\n![Housing Prices Plot](./housing_prices_plot.png)\n\n- Residual Plot: Plots the difference between the actual and predicted values. A good residual plot shows a random scatter of points around the zero line, suggesting that a linear model is adequate and that it is not systematically under- or over-predicting.\n![Residual Plot](./housing_prices_residual_plot.png)\n\n## Contributors\n\n**\u003chttps://github.com/sjain2580\u003e**\n\nFeel free to fork this repository, open issues, or submit pull requests to improve the project. 
Suggestions for model enhancement or additional visualizations are welcome!\n\n## Connect with Me\n\nFeel free to reach out if you have any questions or just want to connect!\n**[![LinkedIn](https://img.shields.io/badge/-LinkedIn-0A66C2?style=flat-square\u0026logo=linkedin\u0026logoColor=white)](https://www.linkedin.com/in/sjain04/)**\n**[![GitHub](https://img.shields.io/badge/-GitHub-181717?style=flat-square\u0026logo=github\u0026logoColor=white)](https://github.com/sjain2580)**\n**[![Email](https://img.shields.io/badge/-Email-D14836?style=flat-square\u0026logo=gmail\u0026logoColor=white)](mailto:sjain040395@gmail.com)**\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsjain2580%2Fsimple-linear-regression-model","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsjain2580%2Fsimple-linear-regression-model","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsjain2580%2Fsimple-linear-regression-model/lists"}