Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/easonlai/eda_for_prudential_life_insurance_sample_data
Notebook sample of Exploratory Data Analysis (EDA) for Prudential Life Insurance Sample Data
- Host: GitHub
- URL: https://github.com/easonlai/eda_for_prudential_life_insurance_sample_data
- Owner: easonlai
- Created: 2021-07-30T05:55:26.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-07-31T15:44:12.000Z (over 3 years ago)
- Last Synced: 2024-11-10T23:39:03.095Z (2 months ago)
- Topics: azure-databricks, azuredatabricks, data-analysis, data-analysis-python, data-analytics, databricks, databricks-notebooks, eda, exploratory-data-analysis, insurance, insurance-sample-data, jupyter-notebook, python, python3
- Language: Jupyter Notebook
- Homepage:
- Size: 4.18 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Exploratory Data Analysis (EDA) for Prudential Life Insurance Sample Data
I found a great insurance sample dataset on [Kaggle](https://www.kaggle.com/), from the competition ["Prudential Life Insurance Assessment - Can you make buying life insurance easier?"](https://www.kaggle.com/c/prudential-life-insurance-assessment). This sample data is great for practicing data analysis. As usual, I use [Jupyter Notebook](https://jupyter.org/) and an [Azure Databricks](https://docs.microsoft.com/en-us/azure/databricks/scenarios/what-is-azure-databricks) notebook to perform the analysis.
## Data Fields Description
* Id, A unique identifier associated with an application.
* Product_Info_1-7, A set of normalized variables relating to the product applied for.
* Ins_Age, Normalized age of applicant.
* Ht, Normalized height of applicant.
* Wt, Normalized weight of applicant.
* BMI, Normalized BMI of applicant.
* Employment_Info_1-6, A set of normalized variables relating to the employment history of the applicant.
* InsuredInfo_1-6, A set of normalized variables providing information about the applicant.
* Insurance_History_1-9, A set of normalized variables relating to the insurance history of the applicant.
* Family_Hist_1-5, A set of normalized variables relating to the family history of the applicant.
* Medical_History_1-41, A set of normalized variables relating to the medical history of the applicant.
* Medical_Keyword_1-48, A set of dummy variables relating to the presence of/absence of a medical keyword being associated with the application.
* Response, This is the target variable, an ordinal variable relating to the final decision associated with an application.
## File Content Description
* data/prudential_life_insurance_sample_data.csv <-- Sample data from Kaggle
* eda_for_prudential_life_insurance_sample_data.ipynb <-- Notebook sample of EDA
* eda_for_prudential_life_insurance_sample_data_databricks.ipynb <-- Notebook sample for Databricks
* eda_for_prudential_life_insurance_sample_data_databricks.html <-- Notebook HTML export from Databricks
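For a quick first look at the CSV before opening the notebooks, here is a minimal sketch that groups the columns into the families listed under the data fields description and counts the values of `Response`. It is not taken from the notebooks. It assumes the file has a header row and that the column names follow the pattern in the field list (for example `Product_Info_1`, `Medical_Keyword_48`). The helper names `group_columns`, `response_counts`, and `summarize` are hypothetical, and it uses only the Python standard library.

```python
import csv
import re
from collections import Counter, defaultdict
from pathlib import Path

# Path as listed in the file content description (relative to the repo root).
DATA_PATH = Path("data/prudential_life_insurance_sample_data.csv")


def group_columns(columns):
    """Group column names by family, e.g. Medical_History_3 -> Medical_History.

    Assumption: family members differ only by a trailing "_<number>" suffix.
    """
    groups = defaultdict(list)
    for name in columns:
        # Names without a numeric suffix (Id, Ins_Age, BMI, ...) form their own group.
        family = re.sub(r"_\d+$", "", name)
        groups[family].append(name)
    return dict(groups)


def response_counts(rows, target="Response"):
    """Count how often each value of the target column occurs."""
    return Counter(row[target] for row in rows)


def summarize(path=DATA_PATH):
    """Read the CSV once and return (column groups, Response value counts)."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        columns = reader.fieldnames or []
    return group_columns(columns), response_counts(rows)


if __name__ == "__main__":
    if DATA_PATH.exists():
        groups, counts = summarize()
        for family, names in groups.items():
            print(f"{family}: {len(names)} column(s)")
        for value, n in sorted(counts.items()):
            print(f"Response={value}: {n}")
    else:
        print(f"{DATA_PATH} not found; run this from the repository root.")
```

Run from the repository root, this prints one line per column family and one line per `Response` value, which is a quick way to check that the file matches the field list above.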