https://github.com/rahulsm20/insurance-data

A data analytics project dealing with risk assessment and its effects in health insurance.

data-analysis data-analytics machine-learning matplotlib numpy pandas python scikit-learn


# Insurance Dataset Analysis Project
[Tableau Dashboard](https://public.tableau.com/views/InsuranceData_16744970011090/Dashboard1?:language=en-US&:display_count=n&:origin=viz_share_link)
### Objectives
This project has three objectives: to analyze the dataset for patterns and trends; to draw conclusions about how factors such as region, age, gender, and pre-existing conditions affect a customer's insurance charges; and to develop a model that predicts an individual's charges from their age, sex, region, smoker status, and number of children.
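
A minimal sketch of the prediction objective, assuming a linear regression over one-hot encoded categorical features. The README does not say which model the project uses, so `LinearRegression`, the `build_model` helper, the column names (`children` in particular), and the toy rows below are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical column names; adjust them to match the actual dataset headers.
CATEGORICAL = ["sex", "region", "smoker"]
NUMERIC = ["age", "children"]
TARGET = "charges"


def build_model() -> Pipeline:
    """One-hot encode categorical features, pass numeric ones through, then regress."""
    preprocess = ColumnTransformer(
        transformers=[("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL)],
        remainder="passthrough",  # age and children are used as-is
    )
    return Pipeline([("preprocess", preprocess), ("regress", LinearRegression())])


# Made-up rows for illustration only; these are NOT values from the project's dataset.
toy = pd.DataFrame({
    "age": [19, 45, 33, 60, 27, 52],
    "sex": ["female", "male", "male", "female", "female", "male"],
    "region": ["southwest", "southeast", "northwest", "northeast", "southeast", "southwest"],
    "smoker": ["yes", "no", "no", "yes", "no", "yes"],
    "children": [0, 2, 1, 0, 3, 1],
    "charges": [17000.0, 8000.0, 5000.0, 30000.0, 4000.0, 26000.0],
})

model = build_model()
model.fit(toy[CATEGORICAL + NUMERIC], toy[TARGET])

# Predict charges for one hypothetical individual.
new_customer = pd.DataFrame([
    {"age": 40, "sex": "female", "region": "southeast", "smoker": "no", "children": 2}
])
predicted = model.predict(new_customer[CATEGORICAL + NUMERIC])
print(f"Predicted charges: {predicted[0]:.2f}")
```

On real data, it would be sensible to hold out a test split and report an error metric before trusting any prediction; six toy rows are only enough to show the shape of the pipeline.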

### About the data
The dataset used for this project consists of the following columns:

- age: age of the individual
- sex: male or female
- bmi: body mass index of the individual
- region: the region where the individual resides
- charges: the cost of insurance for the individual
- smoker: whether the individual is a smoker (yes or no)
- number of children: number of children the individual has

The dataset is reasonably large and contains a good mix of demographic data and insurance charges.

## Table of Contents

- [Overview](#overview)
- [Methodology](#methodology)
- [Conclusions](#conclusions)

## Overview

The insurance dataset contains information on various policyholders, including their age, gender, BMI, smoking status, region, and insurance charges. The goal of this analysis is to use the dataset to gain insights into the insurance market and inform decision-making.

## Methodology

We analyzed the dataset by performing the following steps:

1. Data exploration: We explored the dataset to understand its size, structure, and format. We also checked for missing values and outliers.

2. Identify trends and patterns: We identified trends and patterns within the data, such as the correlation between smoking status and insurance charges, and the distribution of insurance charges across different regions.

3. Segment the data: We segmented the data by age, gender, BMI, smoking status, and region to analyze the trends and patterns for each group.

4. Visualize the data: We created various charts and graphs to visualize the relationships between different variables, such as scatterplots to show the correlation between age and insurance charges, and bar charts to compare the average insurance charges for different regions.

5. Draw conclusions: We drew conclusions from the insights and used them to inform decision-making.
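
The steps above can be sketched with pandas and matplotlib roughly as follows. This is an illustration under stated assumptions rather than the project's actual notebook: the column names, the 1.5 × IQR outlier rule, the age bands, and the toy rows are all hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration only; these are NOT values from the project's dataset.
toy = pd.DataFrame({
    "age": [19, 45, 33, 60, 27, 52],
    "region": ["southwest", "southeast", "northwest", "northeast", "southeast", "southwest"],
    "smoker": ["yes", "no", "no", "yes", "no", "yes"],
    "charges": [17000.0, 8000.0, 5000.0, 30000.0, 4000.0, 26000.0],
})

# Step 1: size, structure, missing values, and outliers (assumed 1.5 * IQR rule).
print(toy.shape)
print(toy.dtypes)
missing_per_column = toy.isna().sum()
q1, q3 = toy["charges"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = toy[(toy["charges"] < q1 - 1.5 * iqr) | (toy["charges"] > q3 + 1.5 * iqr)]

# Step 2: trends, e.g. mean charges by smoking status and by region.
mean_by_smoker = toy.groupby("smoker")["charges"].mean()
mean_by_region = toy.groupby("region")["charges"].mean().sort_values(ascending=False)

# Step 3: segment by a derived age band (band edges are arbitrary here).
toy["age_band"] = pd.cut(toy["age"], bins=[17, 30, 45, 65], labels=["18-30", "31-45", "46-65"])
mean_by_age_band = toy.groupby("age_band", observed=True)["charges"].mean()

# Step 4: scatterplot of age vs. charges and bar chart of mean charges per region.
fig, (ax_scatter, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))
ax_scatter.scatter(toy["age"], toy["charges"])
ax_scatter.set_xlabel("age")
ax_scatter.set_ylabel("charges")
ax_bar.bar(mean_by_region.index, mean_by_region.values)
ax_bar.set_ylabel("mean charges")
fig.tight_layout()

print(mean_by_smoker)
print(mean_by_region)
print(mean_by_age_band)
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` to display the figure.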

## Conclusions
1. We observe a strong correlation between smoking and higher insurance charges.
   Average insurance charges are $32,050.23 for smokers and $8,434.27 for non-smokers.
   Non-smokers therefore pay 73.68% less on average than smokers; equivalently, smokers pay about 3.8 times as much.
2. We also observe a correlation between smoking, BMI, and insurance charges.
   Charges for a smoker with above-average BMI are 2.15x higher than the sample average.
3. Average charges for male customers are 9.94% higher than those for female customers.
4. Insurance charges also vary by region, with the Southeast the highest at $14,735.41.
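
The 73.68% figure in the first conclusion depends on which group's average is taken as the baseline. The short check below uses only the two averages reported above; the variable names are illustrative.

```python
# Averages reported in the conclusions above.
smoker_mean = 32050.23
non_smoker_mean = 8434.27

gap = smoker_mean - non_smoker_mean

# Gap as a share of the smoker average (how much less non-smokers pay).
relative_to_smokers = gap / smoker_mean * 100

# Gap as a share of the non-smoker average (how much more smokers pay).
relative_to_non_smokers = gap / non_smoker_mean * 100

# Simple ratio of the two averages.
ratio = smoker_mean / non_smoker_mean

print(f"Non-smokers pay {relative_to_smokers:.2f}% less than smokers")   # 73.68%
print(f"Smokers pay {relative_to_non_smokers:.2f}% more than non-smokers")  # 280.00%
print(f"Smokers pay {ratio:.2f}x what non-smokers pay")                  # 3.80x
```

Stating the baseline explicitly avoids reading 73.68% as the amount by which smokers' charges exceed non-smokers' charges, which is much larger.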