
https://github.com/s1dewalker/model_validation

Model Management in Python. Steps involved in Model Validation and tuning. Testing Model Assumptions in Factor Analysis with OLS Regression.

assumptions bias-variance cross-validation hyperparameter-tuning linear-regression-models model-validation ols-regression python regression regression-models tuning


## Example 1: Validating the Assumptions of Linear Regression in the Fama-French 3-Factor Model

### 1. Checking Multicollinearity of features (independent variables) w/ a correlation matrix
### 2. Checking Linearity w/ scatter plots
### 3. Checking Independence of residuals w/ the Autocorrelation Function (ACF) and the Durbin-Watson (D-W) test
### 4. Checking Normality of residuals w/ a histogram
### 5. Checking Homoscedasticity (equal variance) of residuals w/ a scatter plot of residuals vs. fitted values
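Each of these visual checks has a simple numerical counterpart. Below is a minimal NumPy sketch, assuming synthetic data stands in for the three Fama-French factors and a portfolio's excess return (the arrays `X`, `y` and the coefficients are hypothetical, not the notebook's data). It fits OLS by least squares, then computes the feature correlation matrix, the Durbin-Watson statistic and lag-1 residual autocorrelation, residual skewness and excess kurtosis, and the correlation between absolute residuals and fitted values. Linearity (check 2) is still best judged from scatter plots.

```python
import numpy as np

# Hypothetical synthetic stand-in for the 3 factors (e.g. Mkt-RF, SMB, HML)
# and a portfolio's excess return; not the original notebook's data.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
y = 0.01 + X @ np.array([1.0, 0.3, -0.2]) + rng.normal(scale=0.5, size=n)

# 1. Multicollinearity: pairwise correlations between the features
corr = np.corrcoef(X, rowvar=False)

# Fit OLS with an intercept column via least squares
X_design = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
fitted = X_design @ beta
resid = y - fitted

# 3. Independence: Durbin-Watson statistic (close to 2 = no lag-1 autocorrelation)
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
# Lag-1 autocorrelation of the residuals (first non-trivial ACF value)
acf1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]

# 4. Normality: skewness and excess kurtosis are both near 0 for normal residuals
z = (resid - resid.mean()) / resid.std()
skew = np.mean(z ** 3)
excess_kurt = np.mean(z ** 4) - 3

# 5. Homoscedasticity: does the residual spread change with the fitted values?
spread_vs_fit = np.corrcoef(np.abs(resid), fitted)[0, 1]

print("feature correlations:\n", corr.round(2))
print(f"Durbin-Watson = {dw:.2f}, lag-1 ACF = {acf1:.3f}")
print(f"skew = {skew:.2f}, excess kurtosis = {excess_kurt:.2f}")
print(f"corr(|resid|, fitted) = {spread_vs_fit:.3f}")
```

On real factor data, a Durbin-Watson value far from 2 or a clear trend in `spread_vs_fit` would be the cue to look at the ACF and residual plots more closely.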



Consequences of violating each assumption:

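### 1. Multicollinearity = Redundancy = It becomes difficult for the model to tell which of the correlated features is actually contributing to predicting the target; coefficient estimates become unstable and their standard errors inflate
### 2. Non-linearity = A linear model won't capture the relationship closely, leading to large, systematic errors in fitting
### 3. Autocorrelation in residuals = The model is missing something important, such as an omitted feature; the usual standard errors are also no longer reliable
### 4. Non-normality of residuals = t-tests, F-tests and confidence intervals that assume normally distributed residuals may not hold, especially in small samples. Consider transforming the features or the target (e.g., a log transform).
### 5. No homoscedasticity of residuals = Less precision in the estimates: OLS coefficients remain unbiased but are no longer the most efficient, and the usual standard errors become unreliable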
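To put a number on consequence 1, the variance inflation factor (VIF) is a common complement to the correlation matrix: VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing feature j on all the other features. Below is a minimal NumPy sketch with a hypothetical `vif` helper and deliberately redundant synthetic features. A frequently quoted rule of thumb treats values above roughly 5 to 10 as a warning sign.

```python
import numpy as np

def vif(X):
    """Hypothetical helper: variance inflation factor for each column of X.

    X should contain only the features (no intercept column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Regress feature j on an intercept plus all other features
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(1)
a = rng.normal(size=300)
b = rng.normal(size=300)
c = a + 0.05 * rng.normal(size=300)  # nearly a copy of `a`: a redundant feature
X = np.column_stack([a, b, c])

print(vif(X).round(1))  # `a` and `c` get very large VIFs, `b` stays near 1
```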

### [Check out Model Validation for Linear Regression in Factor analysis in Python](https://github.com/s1dewalker/Model_Validation/blob/main/Multi_Factor_Analysis3.ipynb)


## Example 2: Model Validation and Tuning in Random Forest Regression on Continuous Data

### 1. Get the data
### 2. Define the target (y) and features (X)
### 3. Split the data into training and testing sets (plus a validation set if required)
### 4. Initialize a model, set its parameters, and fit it on the training set | `X_train, y_train`
### 5. Predict on `X_test`
### 6. Compute accuracy or error metrics against `y_test` | e.g., R squared
### 7. Check the bias-variance trade-off | Balancing underfitting and overfitting
### 8. Iterate to tune the model (back to step 4)
### 9. Cross-validation | if the model is not generalizing well
### 10. Selecting the best model w/ Hyperparameter tuning
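The steps above can be sketched end to end with scikit-learn. This is an illustration under assumptions, not the notebook's code: the data comes from `make_regression` instead of the original dataset, and the 75/25 split, the `n_estimators`/`max_depth` values and the parameter grid are arbitrary choices.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Steps 1-2: synthetic continuous target y and features X (hypothetical data)
X, y = make_regression(n_samples=400, n_features=6, n_informative=4,
                       noise=10.0, random_state=42)

# Step 3: hold out 25% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Step 4: initialize the model with some parameters and fit on the training set
model = RandomForestRegressor(n_estimators=100, max_depth=6, random_state=42)
model.fit(X_train, y_train)

# Steps 5-6: predict and score on both sets
train_r2 = r2_score(y_train, model.predict(X_train))
test_r2 = r2_score(y_test, model.predict(X_test))

# Step 7: a large train/test gap hints at overfitting; low scores on both at underfitting
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")

# Step 9: 5-fold cross-validation on the training set for a steadier estimate
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")
print(f"CV R^2 = {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# Step 10: grid search; every candidate is scored with 5-fold CV,
# then the best one is refit on the full training set
param_grid = {"n_estimators": [50, 100], "max_depth": [4, 8, None]}
search = GridSearchCV(RandomForestRegressor(random_state=42), param_grid,
                      cv=5, scoring="r2")
search.fit(X_train, y_train)
best_model = search.best_estimator_
best_test_r2 = r2_score(y_test, best_model.predict(X_test))
print(search.best_params_, f"test R^2 = {best_test_r2:.3f}")
```

Keeping the test set out of both `cross_val_score` and `GridSearchCV` means the final test score is still an honest estimate for the selected model.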

### [Check out Model Validation and Tuning for RFR in Python](https://github.com/s1dewalker/Model_Validation/blob/main/Model_Validation.ipynb)



A Few Details:

## Bias-Variance trade-off


**Bias = failing to capture the relationship between the features and the response** = ERROR due to OVERLY SIMPLISTIC models (underfitting)

**Variance = following training data too closely** = ERROR due to OVERLY COMPLEX models (overfitting) that are SENSITIVE TO FLUCTUATIONS (noise) in the training data




High Bias + Low Variance: Underfitting (simpler models)

Low Bias + High Variance: Overfitting (complex models)



### **High training error = Underfitting**
### **Testing error >> Training error = Overfitting**
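A small experiment makes these two rules concrete. This sketch assumes hypothetical noisy sine-wave data and uses decision trees of increasing `max_depth` as the model-complexity knob: the depth-1 tree should show high training error (underfitting), while the fully grown tree drives training error to zero yet does worse on the test set (overfitting).

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: y = sin(x) + noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

errors = {}
for depth in [1, 3, None]:  # None lets the tree grow until every leaf is pure
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, tree.predict(X_train))
    test_mse = mean_squared_error(y_test, tree.predict(X_test))
    errors[depth] = (train_mse, test_mse)
    print(f"max_depth={depth}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
```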


## Cross-Validation: An efficient method to find the balance

###### by sharpsightlabs.com

### Split the data into k distinct subsets (folds). Each fold is used once as the test set while the remaining folds form the training set, and the results from all splits are averaged.
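This splitting scheme is k-fold cross-validation. Below is a minimal sketch of the loop with scikit-learn's `KFold` and a linear model on synthetic data (k = 5 and the data are assumptions for illustration). It also counts how often each row lands in a test fold, to confirm every row is tested exactly once before the fold scores are averaged.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold

# Hypothetical synthetic regression data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=100)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
tested = np.zeros(len(y), dtype=int)  # times each row appears in a test fold
for train_idx, test_idx in kf.split(X):
    # Fit on the other k-1 folds, score on the held-out fold
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    fold_scores.append(r2_score(y[test_idx], model.predict(X[test_idx])))
    tested[test_idx] += 1

print("per-fold R^2:", np.round(fold_scores, 3))
print("mean R^2:", round(float(np.mean(fold_scores)), 3))
```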



Why use it?

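- Better generalization: useful when a model is not generalizing well, since it gives a more honest estimate of performance on unseen data (generalization refers to a model's ability to perform well on new, unseen data, not just the data it was trained on)
- More reliable evaluation: the score no longer depends on a single, possibly lucky, train/test split
- Efficient use of data: every observation is used for both training and testing, which matters when data is limited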



Types:

1. **k-fold cross-validation with `cross_val_score`**
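`cross_val_score` wraps the split-fit-score loop in a single call. This sketch assumes synthetic data and a `RandomForestRegressor` like the one in Example 2 (the parameter values are illustrative). With an integer `cv` and a regressor, scikit-learn splits with unshuffled `KFold`; the mean of the fold scores is the cross-validated estimate, and their spread shows how sensitive that estimate is to the split.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical synthetic data
X, y = make_regression(n_samples=200, n_features=5, n_informative=3,
                       noise=5.0, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)

# cv=5 -> 5 folds; each fold is scored with R^2 after fitting on the other 4
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(scores.round(3))
print(f"mean = {scores.mean():.3f}, std = {scores.std():.3f}")
```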




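2. **Leave-one-out cross-validation (LOOCV)**

Useful when data is limited, but computationally expensive: the model is refit once per data point

**Each data point is used once as its own test set**

`cv = X.shape[0]` (as many folds as rows; for a regressor, `cv=LeaveOneOut()` from `sklearn.model_selection` is equivalent)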
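A sketch of LOOCV in scikit-learn on a small synthetic dataset (a hypothetical stand-in). It passes `cv=LeaveOneOut()` and, for comparison, the `cv = X.shape[0]` shortcut; for a regressor both yield one fold per row. Because R² is undefined on a single test point, the sketch scores with `neg_mean_squared_error` rather than the default R².

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Hypothetical small dataset, where LOOCV is most attractive
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = X @ np.array([1.5, -0.7]) + rng.normal(scale=0.3, size=40)

model = LinearRegression()
# One fold per row; squared error is well defined for a single test point
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut(),
                             scoring="neg_mean_squared_error")
# Integer cv = number of rows -> KFold(n_splits=40), i.e. the same splits
kfold_n_scores = cross_val_score(model, X, y, cv=X.shape[0],
                                 scoring="neg_mean_squared_error")

loo_mse = -loo_scores.mean()
print(f"{len(loo_scores)} fits, LOOCV MSE = {loo_mse:.4f}")
```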


##### [LinkedIn](https://www.linkedin.com/in/sujay-bhaumik-d12/) | [Research Works](https://github.com/s1dewalker/Research-Works)