Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/s1dewalker/model_validation

Model Management in Python. Steps involved in Model Validation and tuning. Testing Model Assumptions in Factor Analysis with OLS Regression.

assumptions bias-variance cross-validation hyperparameter-tuning linear-regression-models model-validation ols-regression python regression regression-models tuning


## Example 1: Model validation of OLS regression assumptions in the Fama-French 3-factor model

### 1. Checking multicollinearity of features (independent variables) w/ a correlation matrix
### 2. Checking linearity w/ scatter plots
### 3. Checking independence of residuals w/ the autocorrelation function (ACF) and the Durbin-Watson (D-W) test
### 4. Checking normality of residuals w/ a histogram
### 5. Checking homoscedasticity (equal variance) of residuals w/ a scatter plot of residuals vs. fitted values



Consequences:

### 1. Multicollinearity = redundancy = it becomes difficult for the model to tell which feature actually contributes to predicting the target
### 2. Non-linearity = the model won't capture the relationship closely, leading to large fitting errors
### 3. Autocorrelation in residuals = the model is missing something important; check for an omitted feature
### 4. Non-normality of residuals = tests that assume normally distributed residuals won't hold; apply transformations to the features
### 5. Heteroscedasticity (unequal variance) of residuals = less precision in the estimates

### [Check out Model Validation for OLS Regression in Factor analysis in Python](https://github.com/s1dewalker/Model_Validation/blob/main/Multi_Factor_Analysis3.ipynb)


## Example 2: Model validation and tuning in Random Forest Regression on continuous data

### 1. Get the data
### 2. Define the target (y) and features (X)
### 3. Split the data into training and testing sets (and a validation set if required)
### 4. Initialize a model, set parameters, and fit the training set | `X_train, y_train`
### 5. Predict on `X_test`
### 6. Compute accuracy or error metrics on `y_test` | Ex: R squared
### 7. Bias-variance trade-off check | balancing underfitting and overfitting
### 8. Iterate to tune the model (from step 4)
### 9. Cross-validation | if the model is not generalizing well
### 10. Select the best model w/ hyperparameter tuning
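Steps 1-6 of the workflow above can be sketched with scikit-learn; the dataset here is synthetic, standing in for whatever continuous data the notebook actually uses:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# 1-2. Get the data; define the target (y) and features (X)
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=42)

# 3. Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 4. Initialize a model, set parameters, and fit the training set
rfr = RandomForestRegressor(n_estimators=200, random_state=42)
rfr.fit(X_train, y_train)

# 5. Predict on X_test
y_pred = rfr.predict(X_test)

# 6. Error metric on y_test: R squared
print("R^2:", r2_score(y_test, y_pred))
```

From here, steps 7-10 repeat the fit/predict/score loop with different parameters (e.g. `n_estimators`, `max_depth`) until train and test scores are both acceptable.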

### [Check out Model Validation and Tuning for RFR in Python](https://github.com/s1dewalker/Model_Validation/blob/main/Model_Validation.ipynb)



A few details:

## Bias-Variance trade-off


**Bias = failing to capture the relationship between the data and the response** = ERROR due to OVERLY SIMPLISTIC models (underfitting)

**Variance = following the training data too closely** = ERROR due to OVERLY COMPLEX models (overfitting) that are SENSITIVE TO FLUCTUATIONS (noise) in the training data




High Bias + Low Variance: Underfitting (simpler models)

Low Bias + High Variance: Overfitting (complex models)



### **Training error high = Underfitting**
### **Testing error >> Training error = Overfitting**
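Both rules above can be seen by comparing train and test scores of a deliberately shallow and a deliberately deep model; a decision tree on synthetic data is used here purely for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_depth=1 underfits (both scores low); unlimited depth tends to overfit
# (train score near perfect, test score much lower)
for depth in (1, None):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(f"max_depth={depth}: train R^2={tree.score(X_train, y_train):.2f}, "
          f"test R^2={tree.score(X_test, y_test):.2f}")
```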


## Cross Validation
###### (cross-validation diagram by sharpsightlabs.com)

### Splitting the data into distinct subsets; each subset is used once as the test set while the remaining subsets form the training set. Results from all splits are averaged.



Why use?

- Better generalization: useful when models are not generalizing well (generalization is a model's ability to perform well on new, unseen data, not just the data it was trained on)
- Reliable Evaluation
- Efficient use of data (if we have limited data)



Types:

1. **k-fold cross-validation** | e.g. scikit-learn's `cross_val_score` with an integer `cv`
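A minimal `cross_val_score` sketch on synthetic data (the model and fold count are illustrative choices, not prescribed by the repo):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=6, noise=15.0, random_state=1)

# 5-fold cross-validation: each fold is the test set once, results are averaged
scores = cross_val_score(RandomForestRegressor(n_estimators=100, random_state=1),
                         X, y, cv=5, scoring="r2")
print("Per-fold R^2:", scores.round(2))
print("Mean R^2:", scores.mean().round(2))
```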




2. **Leave-one-out-cross-validation (LOOCV)**

Use when data is limited; note that it is computationally expensive

**Each data point is used as a test set**

`cv = X.shape[0]`
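Setting `cv` to the number of samples, as above, makes every fold a single point. A small sketch with synthetic data and an illustrative linear model (R squared is undefined on a one-point test set, so a mean-squared-error scorer is used here):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=40, n_features=3, noise=5.0, random_state=2)

# cv = X.shape[0] -> leave-one-out: n folds, each holding out one data point
scores = cross_val_score(LinearRegression(), X, y,
                         cv=X.shape[0],
                         scoring="neg_mean_squared_error")
print("Number of folds:", len(scores))  # equals the number of samples
print("Mean MSE:", -scores.mean().round(2))
```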


##### [LinkedIn](https://www.linkedin.com/in/sujay-bhaumik-d12/) | [email protected] | [Research Works](https://github.com/s1dewalker/Research-Works)