Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/shahardekel/diabetes-analysis
bigquery cognos-dashboard python sql
Last synced: 9 days ago
- Host: GitHub
- URL: https://github.com/shahardekel/diabetes-analysis
- Owner: shahardekel
- Created: 2024-10-30T13:33:29.000Z (9 days ago)
- Default Branch: main
- Last Pushed: 2024-10-30T14:09:26.000Z (9 days ago)
- Last Synced: 2024-10-30T14:39:38.218Z (9 days ago)
- Topics: bigquery, cognos-dashboard, python, sql
- Language: Jupyter Notebook
- Homepage:
- Size: 16.7 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Diabetes Hospital Data Cleaning and Preprocessing
## Overview
This project documents the processing and cleaning of a comprehensive healthcare dataset covering 10 years (1999-2008) of clinical care data from 130 US hospitals. The dataset focuses on hospital records of patients diagnosed with diabetes, including laboratory tests, medications, and hospital stays.
Afterwards, the cleaned data was uploaded to IBM Cognos, where four dashboards were created to visualize different aspects of the diabetes data.
## Dataset Description
- Source: Clinical care data from 130 US hospitals and integrated delivery networks
- Time Period: 1999-2008
- Focus: Diabetes-related hospital stays (up to 14 days)
- Format: BigQuery database ("ALL_DATA")
- Link: https://www.archive.ics.uci.edu/dataset/296/diabetes+130-us+hospitals+for+years+1999-2008
## Dependencies
- google.cloud.bigquery
- pandas
- numpy
- seaborn
- matplotlib.pyplot
## Key Features
The notebook performs the following data cleaning operations:
- Connects to BigQuery database and retrieves data
- Handles missing values
- Calculates null value percentages for each column
- Standardizes column values
- Removes columns with 100% null values
- Eliminates duplicate records

The cleaning process focuses on handling missing values, standardizing data, and removing duplicates, so that the dataset keeps its integrity and is ready for accurate analysis.
## Output
The notebook produces a cleaned and preprocessed pandas DataFrame ready for further analysis and visualization.
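
The cleaning steps listed under Key Features can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names, the `table_path` argument, the choice of `"?"` and empty strings as missing-value placeholders, and the toy rows are all assumptions made for the example. Only the `ALL_DATA` table name and the pandas and BigQuery dependencies come from this README.

```python
import pandas as pd


def load_all_data(table_path: str) -> pd.DataFrame:
    """Fetch every row of a BigQuery table into a DataFrame.

    `table_path` is a hypothetical fully qualified name such as
    "my-project.my_dataset.ALL_DATA"; credentials come from the environment.
    """
    # Imported lazily so the cleaning helpers below also run offline.
    from google.cloud import bigquery

    client = bigquery.Client()
    return client.query(f"SELECT * FROM `{table_path}`").to_dataframe()


def null_percentages(df: pd.DataFrame) -> pd.Series:
    """Percentage of missing values per column, highest first."""
    return (df.isna().mean() * 100).sort_values(ascending=False)


def clean(df: pd.DataFrame, placeholders=("?", "")) -> pd.DataFrame:
    """Standardize values, drop all-null columns, remove duplicate rows."""

    def normalize(value):
        # Trim whitespace and turn placeholder strings (assumed here to
        # mark missing data) into real missing values.
        if isinstance(value, str):
            value = value.strip()
            if value in placeholders:
                return None
        return value

    out = df.apply(lambda col: col.map(normalize))
    # Remove columns in which every value is missing.
    out = out.dropna(axis="columns", how="all")
    # Deduplicate after standardizing, so near-identical rows collapse.
    return out.drop_duplicates().reset_index(drop=True)


# Toy rows for illustration only (not taken from the real dataset).
raw = pd.DataFrame(
    {
        "encounter_id": [1, 2, 3, 3],
        "race": [" Caucasian", "?", "AfricanAmerican", "AfricanAmerican "],
        "weight": ["?", "?", "?", "?"],
    }
)
cleaned = clean(raw)
print(null_percentages(cleaned))
```

In this sketch, standardizing runs before deduplication on purpose: two rows that differ only by stray whitespace become identical and are then removed together. With real data, `load_all_data` would replace the toy `raw` frame.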
The cleaned data was then uploaded to IBM Cognos, where four dashboards visualize different aspects of the diabetes data:
1. Patient Demographics: offers insights into the characteristics of patients in the dataset. It breaks down demographics such as age, gender, and race, providing a snapshot of the population's diversity.
2. Medication Usage: provides an overview of the top seven diabetes medications used by patients.
3. Diabetes Management Outcomes: helps assess the effectiveness of diabetes management by examining key health metrics and treatment impacts.
4. Resource Utilization: helps analyze how healthcare resources are allocated for diabetes patients. It identifies patterns in procedure use, hospital visits, and length of stay, aiding efficient resource management and cost reduction.