https://github.com/abhijeet107/task-1


# Task-1
Data Cleaning and Preprocessing
# Medical Appointment No-Show Dataset 🏥

This project cleans and preprocesses the [KaggleV2-May-2016.csv](https://www.kaggle.com/datasets/joniarroba/noshowappointments) dataset, which covers more than 110,000 medical appointments in Brazil, so that it can be used to identify patterns behind patient no-shows.

## 🧹 Cleaning Summary

The following changes were made to prepare the dataset for analysis:

- ✅ **Renamed Columns**:
  - `Handcap` → `Handicap`
  - `No-show` → `No_Show`
- 📅 **Converted to DateTime**:
  - `ScheduledDay` and `AppointmentDay`
- 🔁 **Mapped `No_Show` to Boolean**:
  - `Yes` → `True` (No-show)
  - `No` → `False` (Showed up)
- 🧼 **Removed Invalid Rows**:
  - 1 entry with a negative age removed
- 🔢 **Formatted ID Columns**:
  - `PatientId` converted to string to prevent scientific notation
- 🧯 **Duplicate Check**:
  - No duplicate rows found
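
The steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not a copy of `clean_data.py`: the function name `clean_appointments` is hypothetical, the `Age` column name is taken from the Kaggle dataset, and the `PatientId` conversion assumes the IDs are whole numbers that were read in as floats.

```python
import pandas as pd


def clean_appointments(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps from the summary to a raw appointments frame.

    Hypothetical helper for illustration; the caller's frame is not modified.
    """
    # Fix the misspelled column and make the target column name identifier-friendly.
    df = df.rename(columns={"Handcap": "Handicap", "No-show": "No_Show"})

    # Parse both timestamp columns into datetime values.
    for col in ("ScheduledDay", "AppointmentDay"):
        df[col] = pd.to_datetime(df[col])

    # "Yes" means the patient did NOT show up, so it maps to True.
    df["No_Show"] = df["No_Show"].map({"Yes": True, "No": False})

    # Drop rows with a negative age.
    df = df[df["Age"] >= 0].copy()

    # Assumption: IDs are whole numbers stored as floats. Going through int64
    # first avoids a trailing ".0" or scientific notation in the string form.
    df["PatientId"] = df["PatientId"].astype("int64").astype(str)

    # Remove exact duplicate rows, if any exist.
    return df.drop_duplicates()
```

With the raw file on disk, a call such as `clean_appointments(pd.read_csv("KaggleV2-May-2016.csv")).to_csv("cleaned_appointments.csv", index=False)` would produce the cleaned file.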

## 📁 Files

- `KaggleV2-May-2016.csv` – Original dataset
- `cleaned_appointments.csv` – Cleaned dataset ready for analysis
- `clean_data.py` – Python script used for cleaning
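
CSV files do not store column types, so the string IDs and datetime columns need to be requested again when the cleaned file is read back. A minimal sketch, assuming the cleaned file keeps the column names from the cleaning summary (`load_cleaned` is a hypothetical helper name):

```python
import pandas as pd


def load_cleaned(source) -> pd.DataFrame:
    """Read the cleaned CSV, restoring the intended column types.

    `source` can be a file path or any file-like object accepted by read_csv.
    """
    return pd.read_csv(
        source,
        dtype={"PatientId": str},  # keep IDs as text, not floats
        parse_dates=["ScheduledDay", "AppointmentDay"],
    )
```

For example, `load_cleaned("cleaned_appointments.csv")` would return a frame ready for analysis.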

## 📊 Next Steps

- Perform EDA (Exploratory Data Analysis)
- Build predictive models for patient no-shows
- Visualize trends and demographic patterns

## 📌 Credits

Dataset from Kaggle: [No-show appointments](https://www.kaggle.com/datasets/joniarroba/noshowappointments)

---

🧠 Maintained by: *Abhijeet Kuanr*