{"id":25042481,"url":"https://github.com/s1dewalker/model_validation","last_synced_at":"2025-06-17T03:35:54.733Z","repository":{"id":268202658,"uuid":"903605375","full_name":"s1dewalker/Model_Validation","owner":"s1dewalker","description":"Model Management in Python. Steps involved in Model Validation and tuning. Testing Model Assumptions in Factor Analysis with OLS Regression.","archived":false,"fork":false,"pushed_at":"2025-02-07T04:19:20.000Z","size":6424,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-30T23:14:30.797Z","etag":null,"topics":["assumptions","bias-variance","cross-validation","hyperparameter-tuning","linear-regression-models","model-validation","ols-regression","python","regression","regression-models","tuning"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/s1dewalker.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-15T03:31:35.000Z","updated_at":"2025-02-07T04:19:23.000Z","dependencies_parsed_at":null,"dependency_job_id":"2552807a-8f25-4eb1-b9fb-5f080ffa3d0a","html_url":"https://github.com/s1dewalker/Model_Validation","commit_stats":null,"previous_names":["s1dewalker/model_validation"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/s1dewalker/Model_Validation","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/s1dewalker%2FModel_Validation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/s1dewalker%2FModel_Validation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/s1dewalker%2FModel_Validation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/s1dewalker%2FModel_Validation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/s1dewalker","download_url":"https://codeload.github.com/s1dewalker/Model_Validation/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/s1dewalker%2FModel_Validation/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260286414,"owners_count":22986585,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["assumptions","bias-variance","cross-validation","hyperparameter-tuning","linear-regression-models","model-validation","ols-regression","python","regression","regression-models","tuning"],"created_at":"2025-02-06T04:46:25.179Z","updated_at":"2025-06-17T03:35:54.713Z","avatar_url":"https://github.com/s1dewalker.png","language":"Jupyter Notebook","readme":"\u003cimg src=\"sc/MODEL VALIDATION.png\" alt=\"Description\" width=\"1000\"\u003e\n\u003cbr/\u003e\n\n## Example 1: Model validation of Assumptions of Linear regression in Fama French 3-Factor Model\n\n### 1. Checking Multicollinearity of features or independent variables w/ Correlation matrix\n### 2. Checking Linearity w/ Scatter plots\n### 3. Checking Independence of residuals w/ Autocorrelation Function (ACF) and D-W test\n### 4. Checking Normality of residuals w/ histogram\n### 5. 
Checking Homoscedasticity (equal variance) of Residuals w/ scatter plot of residuals and fitted values\n\u003cbr/\u003e\n\n\u003cimg src=\"sc/corr_sp.JPG\" alt=\"Description\" width=\"700\"\u003e\n\n\u003cimg src=\"sc/resid.JPG\" alt=\"Description\" width=\"700\"\u003e\n\n\u003cbr/\u003e\n\nConsequences: \u003cbr/\u003e\n### 1. Multicollinearity = Redundancy = The model struggles to tell which feature actually contributes to predicting the target \n### 2. Non-linearity = Model won't capture the relationship closely, leading to large errors in fitting\n### 3. Autocorrelation in residuals = The model is missing something important. Check for an omitted feature\n### 4. Non-Normality of residuals = Tests that assume normally distributed residuals won't be reliable. Consider applying transformations to the features.\n### 5. Heteroscedasticity of residuals (unequal variance) = less precise estimates\n\u003cbr/\u003e\n\n### [Check out Model Validation for Linear Regression in Factor analysis in Python](https://github.com/s1dewalker/Model_Validation/blob/main/Multi_Factor_Analysis3.ipynb)\n\n\u003cbr/\u003e\n\n## Example 2: Model validation and tuning in Random Forest Regression, on continuous data\n\n### 1. Get the data\n### 2. Define the target (y) and features (X)\n### 3. Split the data into training and testing sets (plus a validation set if required)\n### 4. Instantiate a model, set parameters, and fit it on the training set | `X_train, y_train`\n### 5. Predict on `X_test`\n### 6. Accuracy or Error metrics on `y_test` | Ex: R squared\n### 7. Bias-Variance trade-off check | Balancing underfitting and overfitting\n### 8. Iterate to tune the model (from step 4)\n### 9. Cross Validation | if the model is not generalizing well\n### 10. 
Selecting the best model w/ Hyperparameter tuning\n\u003cbr/\u003e\n\n### [Check out Model Validation and Tuning for RFR in Python](https://github.com/s1dewalker/Model_Validation/blob/main/Model_Validation.ipynb) \n\n\u003cbr/\u003e\u003cbr/\u003e\n\nA few details:\n\n## Bias-Variance trade-off \n\n\u003cimg src=\"sc/biasvariance.JPG\" alt=\"Description\" width=\"500\"\u003e\n\n\n**Bias = failing to find the relationship b/w data and response** = ERROR due to OVERLY SIMPLISTIC models (underfitting) \u003cbr/\u003e\n\n**Variance = following training data too closely** = ERROR due to OVERLY COMPLEX models (overfitting) that are SENSITIVE TO FLUCTUATIONS (noise) in the training data \u003cbr/\u003e\n\u003cbr/\u003e\n\u003cbr/\u003e\n\nHigh Bias + Low Variance: Underfitting (simpler models) \u003cbr/\u003e\nLow Bias + High Variance: Overfitting (complex models) \u003cbr/\u003e\n\u003cbr/\u003e\n### **Training error high = Underfitting** \n### **Testing error \u003e\u003e Training error = Overfitting** \u003cbr/\u003e\n \u003cbr/\u003e\n\n\n## Cross Validation - An efficient method to find the balance\n\u003cimg src=\"sc/cvimg2.png\" alt=\"Description\" width=\"500\"\u003e\n\n###### by sharpsightlabs.com\n\n### The data is split into distinct subsets (folds). Each subset is used once as the test set while the remaining subsets form the training set. Results from all splits are averaged. \u003cbr/\u003e\n\u003cbr/\u003e\nWhy use? \u003cbr/\u003e\n\n- Better Generalization: helps when our models are not generalizing well (generalization is a model's ability to perform well on new, unseen data, not just the data it was trained on)\n- Reliable Evaluation\n- Efficient use of data (if we have limited data)\n \u003cbr/\u003e\n \nTypes: \u003cbr/\u003e\n1. **cross_val_score**\n\u003cimg src=\"sc/cvs.JPG\" alt=\"Description\" width=\"500\"\u003e\n\n \u003cbr/\u003e\n \n2. 
**Leave-one-out cross-validation (LOOCV)**\n\nUseful when data is limited, but computationally expensive \u003cbr/\u003e\n**Each data point is used once as the test set** \u003cbr/\u003e\n\n`cv = X.shape[0]`\n\n\u003cbr/\u003e\n\n\n\n##### [LinkedIn](https://www.linkedin.com/in/sujay-bhaumik-d12/) | s1dewalker23@gmail.com | [Research Works](https://github.com/s1dewalker/Research-Works)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fs1dewalker%2Fmodel_validation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fs1dewalker%2Fmodel_validation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fs1dewalker%2Fmodel_validation/lists"}