https://github.com/mehradi-github/ref-jupyter-datapreparation
Data preparation on Heart Disease Dataset from UCI
- Host: GitHub
- URL: https://github.com/mehradi-github/ref-jupyter-datapreparation
- Owner: mehradi-github
- Created: 2025-11-03T06:07:59.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-11-03T06:40:23.000Z (8 months ago)
- Last Synced: 2025-11-03T08:22:35.903Z (8 months ago)
- Language: Jupyter Notebook
- Size: 4.88 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data Preparation for the UCI Heart Disease Dataset
This notebook walks through the step-by-step data preparation process for the "processed.cleveland.data" file from the UCI Machine Learning Repository.
## Main Preparation Tasks
1. **Load Data:** Load the dataset from the web and assign the correct column names.
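2. **Handle Missing Values:** The dataset marks missing data with `'?'`. We will replace these markers with proper missing values (`NaN`) and then impute them.
3. **Transform Target Variable:** Convert the multi-class target (0-4) to a binary target: 0 (no disease) stays 0, and any value from 1 to 4 (disease present) becomes 1.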
4. **Encode & Scale Features:** Use `ColumnTransformer` to properly one-hot encode categorical features and scale numerical features.