https://github.com/desininja/vehicle-insurance
Data science project
- Host: GitHub
- URL: https://github.com/desininja/vehicle-insurance
- Owner: desininja
- Created: 2020-06-11T07:17:57.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2021-06-30T21:00:24.000Z (over 4 years ago)
- Last Synced: 2025-01-02T21:17:03.365Z (11 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 670 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Vehicle-insurance
Vehicle Insurance data:
This dataset contains multiple features describing each customer's vehicle and insurance type.
OBJECTIVE: The business requirement is to increase CLV (customer lifetime value), so CLV is the target variable.
Data Cleansing:
This dataset is already fairly clean, but it contains a few outliers. These outliers are removed.
Why remove Outliers?
Outliers are unusual values in a dataset. They can distort statistical analyses and violate the assumptions those analyses rely on.
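The README does not say which rule the notebook uses to detect outliers. As a minimal sketch, assuming the common interquartile-range (IQR) rule, the step could look like the following. The function name `remove_outliers_iqr`, the multiplier `k=1.5`, and the sample values are illustrative assumptions, not taken from the project.

```python
import numpy as np

def remove_outliers_iqr(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    mask = (values >= lower) & (values <= upper)
    return values[mask]

# Hypothetical CLV-like values with one extreme entry (not real project data).
clv = [2200.0, 2500.0, 2700.0, 3100.0, 3300.0, 3600.0, 41000.0]
cleaned = remove_outliers_iqr(clv)
print(cleaned)  # the 41000.0 entry is dropped
```

A larger `k` keeps more of the tail; whether 1.5 is appropriate depends on how skewed the CLV distribution is.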
Feature selection:
This step removes unwanted features.
VIF and the correlation coefficient can be used to decide which features to keep.
VIF: Variance Inflation Factor
It is a measure of collinearity among predictor variables within a multiple regression. For a given predictor, it is the ratio of the variance of that predictor's beta in the full model to the variance of the same beta if the predictor were fit alone.
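An equivalent way to compute VIF for predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. The sketch below applies that formula with plain NumPy on synthetic data. The function name `vif` and the generated columns are assumptions made for illustration; the notebook may compute VIF differently.

```python
import numpy as np

def vif(X):
    """Return the VIF of each column of X (rows = samples, columns = predictors)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + remaining predictors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic predictors: c is almost a copy of a, b is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.1 * rng.normal(size=200)
vifs = vif(np.column_stack([a, b, c]))
print(vifs)  # a and c get large values, b stays close to 1
```

A frequently quoted rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic collinearity, though the cutoff is a convention rather than a strict test. For real work, `statsmodels.stats.outliers_influence.variance_inflation_factor` offers a ready-made implementation.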
Correlation Coefficient:
A positive Pearson coefficient means that one variable's value increases as the other's increases. A negative Pearson coefficient means that one variable decreases as the other increases. Correlation coefficients of -1 or +1 mean the relationship is exactly linear.
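A small example with `numpy.corrcoef` illustrates the three cases. The arrays are made-up values chosen so the results are easy to check by hand.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = 2.0 * x + 1.0                            # rises exactly linearly with x
y_down = -3.0 * x + 10.0                        # falls exactly linearly as x rises
y_mixed = np.array([2.0, 1.0, 4.0, 3.0, 5.0])   # rises with x, but not perfectly

# corrcoef returns a 2x2 matrix; the off-diagonal entry is the Pearson r.
r_up = np.corrcoef(x, y_up)[0, 1]        # +1 (up to floating-point rounding)
r_down = np.corrcoef(x, y_down)[0, 1]    # -1 (up to floating-point rounding)
r_mixed = np.corrcoef(x, y_mixed)[0, 1]  # 0.8

print(r_up, r_down, r_mixed)
```

Pearson's coefficient only captures linear association, so a value near 0 does not rule out a nonlinear relationship between a feature and CLV.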
Log transformation and Normalisation:
Many ML algorithms perform better or converge faster when features are on a relatively similar scale and/or close to normally distributed.
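As a sketch of this step, the snippet below applies a log transform followed by z-score standardization. "Normalisation" could also mean min-max scaling; z-score is assumed here, and the function name `log_standardize` and the income-like values are hypothetical.

```python
import numpy as np

def log_standardize(values):
    """Apply log(1 + x), then rescale to zero mean and unit standard deviation."""
    values = np.asarray(values, dtype=float)
    logged = np.log1p(values)  # compresses the long right tail
    return (logged - logged.mean()) / logged.std()

# Hypothetical right-skewed feature (not real project data).
income = np.array([12000.0, 20000.0, 35000.0, 48000.0, 250000.0])
scaled = log_standardize(income)
print(scaled)
```

The log transform is monotonic, so the ordering of customers is preserved while the gap between the extreme value and the rest shrinks. When evaluating a model, the mean and standard deviation should be computed on the training split only and reused for the test split.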
Different ML algorithms are applied to the dataset for prediction. Their accuracies are reported in the notebook.
Please take a look at my work. I am open to suggestions.