https://github.com/archanakokate/diabetes_prediction_capstoneproject

Analyzing and modelling the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) dataset for accurate prediction of diabetes in patients, with Tableau dashboard visualization.
data-engineering data-visualization exploratory-data-analysis machine-learning-algorithms tableau-dashboards


# Diabetes_Prediction_Capstone Project

NIDDK (National Institute of Diabetes and Digestive and Kidney Diseases) research creates knowledge about and treatments for the most chronic, costly, and consequential diseases.

The dataset used in this project is originally from NIDDK. The objective is to build a model to accurately predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset.

#### Project Tasks:
1. Perform descriptive analysis.
2. Visually explore the variables; you may need to examine their distributions using histograms. Treat missing values accordingly.
3. The dataset contains both integer and float variables. Create a count (frequency) plot showing the data types and the number of variables of each type.
4. Check the balance of the data by plotting the count of outcomes by their value. Describe your findings and plan the future course of action.
5. Create scatter charts between pairs of variables to understand their relationships. Describe your findings.
6. Perform correlation analysis. Visually explore it using a heat map.
7. Devise strategies for model building. It is important to decide on the right validation framework. Explain your thought process. Would cross-validation be useful in this scenario?
8. Apply an appropriate classification algorithm to build a model. Compare various models with
the results from KNN.
9. Create a classification report by analysing sensitivity, specificity, AUC (ROC curve), etc. Be as descriptive as possible in explaining which values of these parameters you settled on, and why.

10. Create a dashboard in Tableau by choosing appropriate chart types and metrics useful for the business. The dashboard must include the following:
a) Pie chart to describe the diabetic/non-diabetic population.

b) Scatter charts between relevant variables to analyse the relationship.

c) Histogram/frequency charts to analyse the distribution of the data.

d) Heatmap of correlation analysis among the relevant variables.

e) Create bins of Age values (20-25, 25-30, 30-35, etc.) and analyse different variables for these age brackets using a bubble chart.
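
As a rough illustration of the metrics named in task 9, the sketch below computes sensitivity, specificity, and ROC AUC in plain Python. The function names, the toy labels and scores, and the 0.5 decision threshold are assumptions made for this example; they are not taken from the project's notebooks or from the NIDDK data.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 means diabetic."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True positive rate: share of diabetic patients that were flagged."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)  # assumes at least one positive label


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True negative rate: share of non-diabetic patients that were cleared."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)  # assumes at least one negative label


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (pairwise, O(n_pos * n_neg))."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and model scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(sensitivity(y_true, y_pred), specificity(y_true, y_pred), roc_auc(y_true, scores))
```

On this toy data, sensitivity and specificity are both 0.75 at the assumed threshold. Lowering the threshold generally raises sensitivity at the cost of specificity, which is the trade-off the task asks you to justify. In practice, `sklearn.metrics.confusion_matrix` and `sklearn.metrics.roc_auc_score` offer equivalent calculations.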