https://github.com/easonlai/eda_for_prudential_life_insurance_sample_data

Notebook sample of Exploratory Data Analysis (EDA) for Prudential Life Insurance Sample Data

azure-databricks azuredatabricks data-analysis data-analysis-python data-analytics databricks databricks-notebooks eda exploratory-data-analysis insurance insurance-sample-data jupyter-notebook python python3


# Exploratory Data Analysis (EDA) for Prudential Life Insurance Sample Data

I found a great insurance sample dataset on [Kaggle](https://www.kaggle.com/), from the competition ["Prudential Life Insurance Assessment - Can you make buying life insurance easier?"](https://www.kaggle.com/c/prudential-life-insurance-assessment). It is well suited to practicing data analysis. As usual, I use [Jupyter Notebook](https://jupyter.org/) and [Azure Databricks](https://docs.microsoft.com/en-us/azure/databricks/scenarios/what-is-azure-databricks) notebooks to perform the analysis.

Data Fields Description
* Id, A unique identifier associated with an application.
* Product_Info_1-7, A set of normalized variables relating to the product applied for
* Ins_Age, Normalized age of applicant
* Ht, Normalized height of applicant
* Wt, Normalized weight of applicant
* BMI, Normalized BMI of applicant
* Employment_Info_1-6, A set of normalized variables relating to the employment history of the applicant.
* InsuredInfo_1-6, A set of normalized variables providing information about the applicant.
* Insurance_History_1-9, A set of normalized variables relating to the insurance history of the applicant.
* Family_Hist_1-5, A set of normalized variables relating to the family history of the applicant.
* Medical_History_1-41, A set of normalized variables relating to the medical history of the applicant.
* Medical_Keyword_1-48, A set of dummy variables indicating the presence or absence of a medical keyword associated with the application.
* Response, The target variable: an ordinal variable relating to the final decision associated with an application.
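
The field families above share a naming pattern (a common prefix plus a numeric suffix), so it can help to group the columns by prefix before analysis. Below is a minimal sketch using only the standard library. The helper name `group_columns` is hypothetical and not taken from the notebooks, and the sketch assumes the CSV header uses the names listed above.

```python
import re
from collections import defaultdict

# Hypothetical helper (not from the notebooks): group column names such as
# "Medical_History_12" under their shared prefix "Medical_History".
_SUFFIX = re.compile(r"^(?P<prefix>.+)_(?P<index>\d+)$")


def group_columns(columns):
    """Map each field family (prefix) to the list of its column names."""
    groups = defaultdict(list)
    for name in columns:
        match = _SUFFIX.match(name)
        # Columns without a numeric suffix (Id, Ins_Age, Ht, Wt, BMI, Response)
        # end up in a single-item group keyed by their own name.
        prefix = match.group("prefix") if match else name
        groups[prefix].append(name)
    return dict(groups)


# Illustrative header subset only; the real file has many more columns.
header = ["Id", "Product_Info_1", "Product_Info_2", "Ins_Age",
          "Medical_Keyword_1", "Response"]
groups = group_columns(header)
```

With the header of the real file, `groups["Medical_Keyword"]` would hold all the dummy-variable columns, which makes it easy to summarize one family at a time.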

File Content Description
* data/prudential_life_insurance_sample_data.csv <-- Sample data from Kaggle
* eda_for_prudential_life_insurance_sample_data.ipynb <-- Notebook sample of EDA
* eda_for_prudential_life_insurance_sample_data_databricks.ipynb <-- Notebook sample for Databricks
* eda_for_prudential_life_insurance_sample_data_databricks.html <-- Notebook HTML export from Databricks
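
A common first step with this kind of data is to look at how the target variable is distributed. The sketch below counts the values of the `Response` column with the standard library only. It is not the approach used in the notebooks, the function name `response_distribution` is hypothetical, and the in-memory rows are made-up placeholders rather than real records from the dataset.

```python
import csv
import io
from collections import Counter


def response_distribution(csv_file):
    """Count how often each value of the Response column occurs.

    `csv_file` is any open text file (or file-like object) with a header row
    that contains a "Response" column. Values are kept as strings.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row["Response"] for row in reader)


# Made-up placeholder rows, used only so the sketch runs without the real file.
sample = io.StringIO("Id,Response\n1,8\n2,4\n3,8\n")
counts = response_distribution(sample)
```

To run it against the sample data, pass a real file handle instead, for example `open("data/prudential_life_insurance_sample_data.csv", newline="")`.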