https://github.com/abhijeet107/task-1
Data Cleaning and Preprocessing
datacleaning excel pandas python
Last synced: 9 months ago
- Host: GitHub
- URL: https://github.com/abhijeet107/task-1
- Owner: Abhijeet107
- Created: 2025-04-07T14:01:04.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-04-07T14:03:23.000Z (9 months ago)
- Last Synced: 2025-04-07T15:22:48.122Z (9 months ago)
- Topics: datacleaning, excel, pandas, python
- Language: Jupyter Notebook
- Homepage: https://www.kaggle.com/datasets/joniarroba/noshowappointments
- Size: 2.94 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Task-1
Data Cleaning and Preprocessing
# Medical Appointment No-Show Dataset
This project involves cleaning and preprocessing the [KaggleV2-May-2016.csv](https://www.kaggle.com/datasets/joniarroba/noshowappointments) dataset containing information on 110,000+ medical appointments in Brazil, with the aim of identifying patterns behind patient no-shows.
## Cleaning Summary
The following changes were made to prepare the dataset for analysis:
- **Renamed Columns**:
  - `Handcap` → `Handicap`
  - `No-show` → `No_Show`
- **Converted to DateTime**:
  - `ScheduledDay` and `AppointmentDay`
- **Mapped `No_Show` to Boolean**:
  - `Yes` → `True` (No-show)
  - `No` → `False` (Showed up)
- **Removed Invalid Rows**:
  - 1 entry with a negative age removed
- **Formatted ID Columns**:
  - `PatientId` converted to string to prevent scientific notation
- **Duplicate Check**:
  - No duplicate rows found
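
The steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the contents of `clean_data.py`: the function name `clean_appointments` and the three-row sample frame are hypothetical, and the sketch assumes that `PatientId` values are whole numbers.

```python
import pandas as pd


def clean_appointments(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps from the summary to a raw appointments frame."""
    # Fix the misspelled column and make the target column a valid identifier.
    out = df.rename(columns={"Handcap": "Handicap", "No-show": "No_Show"})

    # Parse both timestamp columns into pandas datetimes.
    for col in ("ScheduledDay", "AppointmentDay"):
        out[col] = pd.to_datetime(out[col])

    # "Yes" means the patient did NOT show up, so it maps to True.
    out["No_Show"] = out["No_Show"].map({"Yes": True, "No": False})

    # Drop rows with an impossible (negative) age.
    out = out[out["Age"] >= 0].copy()

    # Store IDs as text so they are never rendered in scientific notation.
    # Assumption: IDs are whole numbers; a fractional ID would be truncated here.
    out["PatientId"] = out["PatientId"].astype("int64").astype(str)

    # Remove exact duplicate rows, if any.
    return out.drop_duplicates().reset_index(drop=True)


# Tiny hand-made sample (not real data) using the raw column names.
raw = pd.DataFrame({
    "PatientId": [2.987250e13, 5.589978e14, 4.262962e12],
    "ScheduledDay": ["2016-04-29T18:38:08Z", "2016-04-29T16:08:27Z", "2016-04-29T16:19:04Z"],
    "AppointmentDay": ["2016-04-29T00:00:00Z"] * 3,
    "Age": [62, -1, 8],
    "Handcap": [0, 0, 0],
    "No-show": ["No", "No", "Yes"],
})

cleaned = clean_appointments(raw)
print(cleaned.dtypes)
```

Note that the boolean mapping inverts the intuitive reading of the raw column: `True` in `No_Show` marks a missed appointment.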
## Files
- `KaggleV2-May-2016.csv`: Original dataset
- `cleaned_appointments.csv`: Cleaned dataset ready for analysis
- `clean_data.py`: Python script used for cleaning
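
One caveat when reloading a cleaned CSV: the file format does not store column types, so the string IDs and datetime columns need to be requested again at read time. The sketch below shows one way to do that with `pd.read_csv`; the in-memory two-row CSV is a hypothetical stand-in for `cleaned_appointments.csv`, and its exact timestamp format is an assumption.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for cleaned_appointments.csv (not real data).
cleaned_csv = io.StringIO(
    "PatientId,ScheduledDay,AppointmentDay,Age,Handicap,No_Show\n"
    "29872500000000,2016-04-29 18:38:08+00:00,2016-04-29 00:00:00+00:00,62,0,False\n"
    "4262962000000,2016-04-29 16:19:04+00:00,2016-04-29 00:00:00+00:00,8,0,True\n"
)

df = pd.read_csv(
    cleaned_csv,
    dtype={"PatientId": str},  # keep IDs as text instead of re-inferring a number
    parse_dates=["ScheduledDay", "AppointmentDay"],  # restore datetime columns
)
print(df.dtypes)
```

With a real file, the `io.StringIO` object would be replaced by the path to the CSV.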
## Next Steps
- Perform EDA (Exploratory Data Analysis)
- Build predictive models for patient no-shows
- Visualize trends and demographic patterns
## Credits
Dataset from Kaggle: [No-show appointments](https://www.kaggle.com/datasets/joniarroba/noshowappointments)
---
Maintained by: *Abhijeet Kuanr*