https://github.com/deller23/hotel_booking_data_cleaning

Efficiently transforming raw hotel booking data into actionable insights! This project leverages Python and Pandas for advanced data cleaning—handling missing values, detecting outliers, and optimizing features—ensuring a high-quality dataset ready for analysis and modeling.

data-analysis data-cleaning data-preprocessing data-visualization data-wrangling pandas python


# Hotel Booking Data Cleaning

## 📌 Project Overview

This project cleans and preprocesses a hotel booking dataset with Python and Pandas. The cleaning steps include handling missing values, treating outliers, and converting categorical features to numerical form, so that the resulting dataset is ready for analysis and modeling.

## 📂 Files in This Repository

- `hotel_bookings_cleaned.ipynb` → Jupyter Notebook containing the full data preprocessing steps.
- `hotel_bookings.csv` → Raw dataset (download it separately if it is not included in the repository).
- `README.md` → This file, explaining the project and how to run it.

## 📥 Dataset

The dataset can be downloaded from [Kaggle](https://www.kaggle.com/datasets) or another source that provides it. Save it as `hotel_bookings.csv` in the **same directory** as the notebook before running the code.
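
Before running the notebook, it can help to confirm that the file is where the code expects it and to see which columns contain missing values. The following is a minimal sketch, not code taken from the notebook: `load_bookings` and `missing_summary` are hypothetical helper names, and the only assumption about the data is the `hotel_bookings.csv` filename mentioned above.

```python
from pathlib import Path

import pandas as pd

# Assumed location: the raw CSV sits next to the notebook.
DATA_PATH = Path("hotel_bookings.csv")


def load_bookings(path=DATA_PATH):
    """Read the raw CSV, failing early with a clear message if it is absent."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found. Place the dataset in the same directory as the notebook."
        )
    return pd.read_csv(path)


def missing_summary(df):
    """Return the number of missing values per column, for columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)
```

Calling `missing_summary(load_bookings())` in a notebook cell gives a quick view of which columns need attention before cleaning.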

## 🛠️ Steps Performed in Data Cleaning

1. **Handling Missing Values** → Filled or removed missing data in columns such as `agent` and `country`.
2. **Removing Outliers** → Identified and treated outliers in numerical columns such as `adr`.
3. **Encoding Categorical Features** → Converted categorical variables such as `arrival_date_month` into numerical format.
4. **Dropping Unnecessary Columns** → Removed columns that are not useful for analysis.
5. **Final Cleaned Dataset Output** → The cleaned data is saved as `hotel_bookings_cleaned.csv`.
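
The steps above can be sketched in Pandas roughly as follows. This is an illustration under stated assumptions, not the notebook's actual code: the fill values (`0` and `"Unknown"`), the 1.5 × IQR rule for `adr`, the month-name-to-number mapping, and the choice of `company` as a column to drop are all assumptions made for the example, and `clean_bookings` is a hypothetical function name.

```python
import pandas as pd

# Assumed encoding: month names mapped to 1..12.
MONTH_TO_NUMBER = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}


def clean_bookings(df, drop_columns=("company",)):
    """Apply steps 1-4 to a copy of the raw bookings frame."""
    df = df.copy()

    # 1. Missing values: assumed fill values, not necessarily the notebook's.
    df["agent"] = df["agent"].fillna(0)
    df["country"] = df["country"].fillna("Unknown")

    # 2. Outliers: keep rows whose adr lies within 1.5 * IQR of the quartiles.
    q1, q3 = df["adr"].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = df["adr"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    df = df.loc[in_range].copy()

    # 3. Encoding: convert month names to integers.
    df["arrival_date_month"] = df["arrival_date_month"].map(MONTH_TO_NUMBER)

    # 4. Drop columns not needed for analysis (hypothetical choice).
    df = df.drop(columns=list(drop_columns), errors="ignore")
    return df.reset_index(drop=True)


# Small made-up sample to show the effect; these rows are not real data.
sample = pd.DataFrame({
    "agent": [9.0, None, 240.0, 9.0, 14.0],
    "country": ["PRT", None, "GBR", "ESP", "PRT"],
    "adr": [75.0, 80.0, 95.0, 100.0, 5400.0],
    "arrival_date_month": ["July", "July", "August", "January", "March"],
    "company": [None, None, 110.0, None, None],
})
cleaned = clean_bookings(sample)
print(cleaned)

# 5. With the real data, the result would be saved along these lines:
# clean_bookings(pd.read_csv("hotel_bookings.csv")).to_csv(
#     "hotel_bookings_cleaned.csv", index=False)
```

In the sample, the row with an `adr` of 5400 falls outside the IQR bounds and is removed, while the missing `agent` and `country` values are filled rather than dropped. Whether to fill, drop, or cap values depends on the analysis, so the thresholds and fill values here should be adjusted to match the notebook.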

## 🚀 How to Run the Notebook

### **Using Jupyter Notebook (Locally)**

1. Install Jupyter and the required libraries (Pandas and NumPy) if they are not already installed:
```bash
pip install notebook pandas numpy
```
2. Start Jupyter from the directory that contains the notebook:
```bash
jupyter notebook
```
3. Navigate to `hotel_bookings_cleaned.ipynb` and run all cells.
4. The cleaned dataset `hotel_bookings_cleaned.csv` will be generated in the same directory.

### **Using Google Colab**

1. Upload `hotel_bookings_cleaned.ipynb` and `hotel_bookings.csv` to Colab.
2. Run all cells.
3. Download the cleaned CSV using:
```python
from google.colab import files
files.download("hotel_bookings_cleaned.csv")
```

## 📌 Notes

- If you encounter missing file errors, make sure `hotel_bookings.csv` is in the correct location.
- You can modify the notebook to include additional data processing steps as needed.
