Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ibensusan/medical-data-visualizer
Medical Data Visualizer Project from FreeCodeCamp using Python
- Host: GitHub
- URL: https://github.com/ibensusan/medical-data-visualizer
- Owner: iBensusan
- Created: 2024-09-25T12:18:38.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-09-28T09:20:51.000Z (3 months ago)
- Last Synced: 2024-10-18T07:02:30.623Z (3 months ago)
- Language: Python
- Size: 2.93 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Project: Medical Data Analysis and Visualization
This project demonstrates how to preprocess, analyze, and visualize medical examination data, focusing on BMI calculation, normalization of health indicators, and correlation analysis.
## Objectives:
1. **Data Preprocessing**:
- Load and clean the dataset using pandas.
- Calculate the Body Mass Index (BMI) and create a new column `overweight` to indicate individuals with a BMI greater than 25.
- Normalize the `cholesterol` and `gluc` columns by adjusting the values to either `0` (normal) or `1` (above normal).
2. **Categorical Analysis**:
- Transform the dataset into a long format using `pd.melt()` to facilitate categorical plotting.
- Create a categorical plot to compare the distribution of health indicators (cholesterol, glucose, smoking, alcohol consumption, activity, and overweight) for individuals with and without cardiovascular disease.
3. **Outlier Removal**:
- Filter the dataset to remove inconsistent and extreme values:
  - Ensure systolic blood pressure (`ap_hi`) is greater than or equal to diastolic blood pressure (`ap_lo`).
  - Remove outliers based on height and weight using the 2.5th and 97.5th percentiles.
4. **Correlation Analysis**:
- Calculate a correlation matrix for numerical variables in the cleaned dataset.
- Visualize the correlations using a heatmap, masking the upper triangle to improve readability.
## Tools and Libraries:
- **Pandas**: For data loading, manipulation, and cleaning.
- **NumPy**: For numerical operations and matrix calculations.
- **Matplotlib & Seaborn**: For data visualization, including categorical plots and heatmaps.
## Outcomes:
- A cleaned and preprocessed dataset ready for analysis.
- Visualization of the distribution of health indicators based on cardiovascular disease presence.
- Identification of key correlations between medical variables using a heatmap.
- Understanding of how cholesterol, glucose levels, smoking, alcohol consumption, activity, and BMI relate to cardiovascular disease.
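## Example Sketch:
The data steps listed under Objectives can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code. It runs on a tiny made-up sample rather than the real dataset, and it assumes that height is in centimetres, weight is in kilograms, a raw `cholesterol` or `gluc` value of `1` means normal, and that the remaining columns are named `height`, `weight`, `smoke`, `alco`, `active`, and `cardio`. The function names are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up sample rows standing in for the real dataset.
# Only cholesterol, gluc, ap_hi, ap_lo and overweight are named in the README;
# every other column name and all units here are assumptions.
df = pd.DataFrame({
    "height": [170, 160, 180, 150, 175, 165],          # assumed centimetres
    "weight": [70.0, 80.0, 75.0, 45.0, 95.0, 60.0],    # assumed kilograms
    "ap_hi": [120, 140, 110, 90, 130, 80],
    "ap_lo": [80, 90, 70, 100, 85, 60],
    "cholesterol": [1, 3, 1, 2, 1, 1],
    "gluc": [1, 1, 2, 1, 3, 1],
    "smoke": [0, 1, 0, 0, 1, 0],
    "alco": [0, 0, 1, 0, 1, 0],
    "active": [1, 0, 1, 1, 0, 1],
    "cardio": [0, 1, 0, 1, 1, 0],
})

INDICATORS = ["cholesterol", "gluc", "smoke", "alco", "active", "overweight"]


def preprocess(data: pd.DataFrame) -> pd.DataFrame:
    """Add the overweight flag and normalize cholesterol and gluc to 0/1."""
    out = data.copy()
    # BMI = weight (kg) / height (m)^2; height is assumed to be stored in cm.
    bmi = out["weight"] / (out["height"] / 100) ** 2
    out["overweight"] = (bmi > 25).astype(int)
    # Assumption: a raw value of 1 is normal, anything higher is above normal.
    for col in ("cholesterol", "gluc"):
        out[col] = (out[col] > 1).astype(int)
    return out


def to_long(data: pd.DataFrame) -> pd.DataFrame:
    """Reshape to long format: one row per (person, indicator) pair."""
    return pd.melt(data, id_vars=["cardio"], value_vars=INDICATORS)


def remove_outliers(data: pd.DataFrame) -> pd.DataFrame:
    """Keep rows with ap_lo <= ap_hi and height/weight within the 2.5th-97.5th percentiles."""
    keep = (
        (data["ap_lo"] <= data["ap_hi"])
        & data["height"].between(data["height"].quantile(0.025),
                                 data["height"].quantile(0.975))
        & data["weight"].between(data["weight"].quantile(0.025),
                                 data["weight"].quantile(0.975))
    )
    return data[keep]


def correlation_with_mask(data: pd.DataFrame):
    """Return the correlation matrix and a boolean mask covering its upper triangle."""
    corr = data.corr()
    # True on and above the diagonal, so only the lower triangle stays visible.
    mask = np.triu(np.ones(corr.shape, dtype=bool))
    return corr, mask


prepared = preprocess(df)
long_df = to_long(prepared)
clean = remove_outliers(prepared)
corr, mask = correlation_with_mask(clean)
```

With Seaborn installed, `long_df` could then be passed to `sns.catplot(data=long_df, kind="count", x="variable", hue="value", col="cardio")` and the matrix to `sns.heatmap(corr, mask=mask, annot=True)`. On a sample this small the correlation values are not meaningful; the sketch only shows the shape of each step.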