https://github.com/invictusaman/insurance-cost-analysis-regression
Showcasing Simple Linear, Multiple Linear, Polynomial and Ridge Regression on Insurance Cost Dataset to predict insurance price. Also, I have generated a report using Quarto.
jupyter-notebook linear-regression machine-learning multiple-linear-regression notebooks pipeline polynomial-regression quarto regression ridge-regression
- Host: GitHub
- URL: https://github.com/invictusaman/insurance-cost-analysis-regression
- Owner: invictusaman
- Created: 2024-09-14T17:35:49.000Z (3 months ago)
- Default Branch: master
- Last Pushed: 2024-09-14T21:19:03.000Z (3 months ago)
- Last Synced: 2024-11-07T12:58:05.117Z (about 2 months ago)
- Topics: jupyter-notebook, linear-regression, machine-learning, multiple-linear-regression, notebooks, pipeline, polynomial-regression, quarto, regression, ridge-regression
- Language: Jupyter Notebook
- Homepage:
- Size: 197 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Project Scenario
You have to perform data analytics on a medical insurance charges dataset. This is a filtered and modified version of the [Medical Insurance Price Prediction](https://www.kaggle.com/datasets/harishkumardatalab/medical-insurance-price-prediction?resource=download) dataset, available under the [CC0 1.0 Universal License](https://creativecommons.org/publicdomain/zero/1.0/legalcode) on the Kaggle website.
## About the dataset
Download it from the `Dataset` folder.
## Getting the notebook
You can find the `Jupyter` notebook in the Jupyter Notebook folder.
I have also created a `Quarto` notebook to generate my report.
## Report and Findings
View the report and analysis of this dataset in the `Report` folder. It was generated using `Quarto`.
## Parameters
The parameters used in the dataset are:
- **Age**: Age of the insured. Integer quantity.
- **Gender**: Gender of the insured. This parameter has been mapped to numerical values as follows:

  | Gender | Assigned Value |
  | ------ | -------------- |
  | Female | 1 |
  | Male | 2 |

- **BMI**: Body Mass Index of the insured. Float quantity.
- **No_of_Children**: Number of children the insured person has. Integer quantity.
- **Smoker**: Whether the insured person is a smoker or not. This parameter has been mapped to numerical values as follows:

  | Smoker | Assigned Value |
  | ---------- | -------------- |
  | Smoker | 1 |
  | Non smoker | 2 |

- **Region**: The region of the USA the insured belongs to. This parameter has been mapped to numerical values as follows:

  | Region | Assigned Value |
  | --------- | -------------- |
  | Northwest | 1 |
  | Northeast | 2 |
  | Southwest | 3 |
  | Southeast | 4 |

- **Charges**: Charges for the insurance in USD. Float quantity.
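
The numeric codes in the tables above can be turned back into readable labels when exploring the data. The following is a minimal sketch, not code from the notebook: the `decode` helper and the two sample rows are hypothetical, and only the column names and code values come from the parameter list.

```python
import pandas as pd

# Code-to-label mappings copied from the parameter tables above.
GENDER = {1: "Female", 2: "Male"}
SMOKER = {1: "Smoker", 2: "Non smoker"}
REGION = {1: "Northwest", 2: "Northeast", 3: "Southwest", 4: "Southeast"}


def decode(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the coded columns replaced by readable labels."""
    out = df.copy()
    out["Gender"] = out["Gender"].map(GENDER)
    out["Smoker"] = out["Smoker"].map(SMOKER)
    out["Region"] = out["Region"].map(REGION)
    return out


# Two made-up rows for illustration only (ASSUMPTION: not taken from the dataset).
sample = pd.DataFrame({
    "Age": [19, 45],
    "Gender": [1, 2],
    "BMI": [27.9, 30.1],
    "No_of_Children": [0, 2],
    "Smoker": [1, 2],
    "Region": [3, 4],
    "Charges": [16884.92, 7500.00],
})

decoded = decode(sample)
print(decoded)
```

Keeping the mappings in plain dictionaries makes it easy to switch between the coded form (convenient for regression) and the labelled form (convenient for plots and tables).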
## Objectives
1. Load the data as a pandas dataframe.
2. Clean the data, taking care of the blank entries.
3. Run exploratory data analysis (EDA) and identify the attributes that most affect the charges.
4. Develop single variable and multi-variable Linear Regression models for predicting the charges.
5. Use Ridge Regression to refine the performance of Linear Regression models.

### Hey, [Visit my portfolio](https://amanbhattarai.com.np)
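
For orientation, the five objectives listed earlier could be sketched roughly as follows with `pandas` and `scikit-learn`. This is an illustrative outline under stated assumptions, not the code from the notebook: the data is synthetic (generated so the example runs without the CSV), the mean imputation for blank `BMI` entries, the choice of `Smoker` as the single predictor, the degree-2 polynomial features and `alpha=0.1` are all assumptions made for the sketch.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# 1. Load. ASSUMPTION: synthetic stand-in data with the README's column names.
#    With the real file, read the CSV from the Dataset folder via pd.read_csv.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "Age": rng.integers(18, 65, n),
    "Gender": rng.integers(1, 3, n),
    "BMI": rng.normal(30, 6, n),
    "No_of_Children": rng.integers(0, 5, n),
    "Smoker": rng.integers(1, 3, n),  # 1 = smoker, 2 = non smoker
    "Region": rng.integers(1, 5, n),
})
df["Charges"] = (250 * df["Age"] + 300 * df["BMI"]
                 + 20000 * (df["Smoker"] == 1) + rng.normal(0, 2000, n))
# Blank out a few BMI values to imitate missing entries.
df.loc[df.sample(10, random_state=0).index, "BMI"] = np.nan

# 2. Clean: fill blank BMI entries with the column mean (one possible strategy).
df["BMI"] = df["BMI"].fillna(df["BMI"].mean())

# 3. EDA: rank attributes by absolute correlation with Charges.
corr = df.corr()["Charges"].drop("Charges").abs().sort_values(ascending=False)
print(corr)

# 4. Single-variable and multi-variable linear regression.
X, y = df.drop(columns="Charges"), df["Charges"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)
single = LinearRegression().fit(X_train[["Smoker"]], y_train)
multi = LinearRegression().fit(X_train, y_train)
r2_single = single.score(X_test[["Smoker"]], y_test)
r2_multi = multi.score(X_test, y_test)

# 5. Ridge regression on scaled degree-2 polynomial features, in one pipeline.
ridge = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("ridge", Ridge(alpha=0.1)),
]).fit(X_train, y_train)
r2_ridge = ridge.score(X_test, y_test)

print(f"R^2 single: {r2_single:.3f}, multi: {r2_multi:.3f}, ridge: {r2_ridge:.3f}")
```

Wrapping the scaling, polynomial expansion and Ridge estimator in a `Pipeline` means the transforms are fitted on the training split only, so the test score is not contaminated by information from the held-out rows.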