{"id":24794075,"url":"https://github.com/njaffe/eda_example_2025","last_synced_at":"2026-05-09T15:31:29.261Z","repository":{"id":272844085,"uuid":"917441244","full_name":"njaffe/eda_example_2025","owner":"njaffe","description":"Sample end-to-end data analysis walkthrough using Python and Scikit-learn.","archived":false,"fork":false,"pushed_at":"2025-01-16T23:39:18.000Z","size":4278,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-29T22:42:51.352Z","etag":null,"topics":["data-science","data-visualization","jupyter-notebooks","machine-learning","python","regression","scikit-learn"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/njaffe.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-16T01:45:27.000Z","updated_at":"2025-01-17T19:22:57.000Z","dependencies_parsed_at":"2025-01-17T00:34:31.592Z","dependency_job_id":null,"html_url":"https://github.com/njaffe/eda_example_2025","commit_stats":null,"previous_names":["njaffe/eda_example_2025"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/njaffe%2Feda_example_2025","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/njaffe%2Feda_example_2025/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/njaffe%2Feda_example_2025/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/njaffe
%2Feda_example_2025/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/njaffe","download_url":"https://codeload.github.com/njaffe/eda_example_2025/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245321350,"owners_count":20596334,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","data-visualization","jupyter-notebooks","machine-learning","python","regression","scikit-learn"],"created_at":"2025-01-29T22:32:48.978Z","updated_at":"2026-05-09T15:31:24.236Z","avatar_url":"https://github.com/njaffe.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Housing Data Analysis and Model Comparison\n\nThis repository contains a series of Jupyter notebooks that demonstrate the process of analyzing a housing dataset and comparing various machine learning models for predicting housing prices. The analysis is broken down into distinct steps to ensure modularity and easy understanding of each stage of the process.\n\n## Project Structure\n\nThe project consists of the following notebooks and files:\n\n### Notebooks:\n\n- 1_data_loading_preprocessing.ipynb\n    - Loads the housing dataset and performs initial data preprocessing steps. This includes handling missing values, encoding categorical variables, and saving the cleaned data for further use.\n\n- 2_modeling-linreg.ipynb\n    - Implements a simple linear regression model on the processed dataset. 
The model is trained and evaluated with performance metrics such as Mean Squared Error (MSE).\n\n- 3_modeling-quad_factor.ipynb\n    - Extends the modeling approach by adding quadratic features to the dataset and fitting a quadratic regression model to improve performance.\n\n- 4_modeling-regularization.ipynb\n    - Applies regularization techniques such as Lasso and Ridge regression to reduce overfitting and improve model generalization.\n\n- 5_compare_to_other_models.ipynb\n    - Compares the performance of the linear, quadratic, and regularized models with other machine learning models, such as decision trees and random forests.\n\n- 6_compare_model_performance.ipynb\n    - Evaluates and compares the performance of all models using metrics such as R-squared and MSE, along with visualizations of prediction errors.\n\n### Other Files:\n\n- data/\n    - Contains the raw dataset used for the analysis. The relevant notebooks load their data from this folder.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnjaffe%2Feda_example_2025","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnjaffe%2Feda_example_2025","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnjaffe%2Feda_example_2025/lists"}