https://github.com/deller23/hotel_booking_data_cleaning
Efficiently transforming raw hotel booking data into actionable insights! This project leverages Python and Pandas for advanced data cleaning—handling missing values, detecting outliers, and optimizing features—ensuring a high-quality dataset ready for analysis and modeling.
data-analysis data-cleaning data-preprocessing data-visualization data-wrangling pandas python
Last synced: about 1 year ago
- Host: GitHub
- URL: https://github.com/deller23/hotel_booking_data_cleaning
- Owner: Deller23
- Created: 2025-03-31T05:29:33.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-31T06:21:55.000Z (about 1 year ago)
- Last Synced: 2025-03-31T08:41:21.250Z (about 1 year ago)
- Topics: data-analysis, data-cleaning, data-preprocessing, data-visualization, data-wrangling, pandas, python
- Language: Jupyter Notebook
- Homepage:
- Size: 2.04 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Hotel Booking Data Cleaning
## 📌 Project Overview
This project cleans and preprocesses a hotel booking dataset. The cleaning steps include handling missing values, treating outliers, and encoding categorical features.
## 📂 Files in This Repository
- `hotel_bookings_cleaned.ipynb` → Jupyter Notebook containing the full data preprocessing steps.
- `hotel_bookings.csv` → Raw dataset (download it separately if it is not included in the repository).
- `README.md` → This file, explaining the project and how to run it.
## 📥 Dataset
Download the dataset from [Kaggle](https://www.kaggle.com/datasets) or another source and save it as `hotel_bookings.csv` in the **same directory** as the notebook before running the code.
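Before running the notebook, a quick check like the one below can confirm that the file is where it is expected. This is a minimal sketch: the helper name `load_raw_bookings` is hypothetical and not part of the repository, and it assumes the raw file is read from the working directory under the name `hotel_bookings.csv`.

```python
from pathlib import Path

import pandas as pd


def load_raw_bookings(csv_path="hotel_bookings.csv"):
    """Load the raw CSV, failing early with a clear message if it is missing.

    Hypothetical helper for illustration; not part of the repository.
    """
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path.resolve()} not found. Download the dataset and place it "
            "in the same directory as the notebook."
        )
    return pd.read_csv(path)
```

Calling `load_raw_bookings()` from the notebook's directory returns a DataFrame, or raises an error that names the full path it looked for.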
## 🛠️ Steps Performed in Data Cleaning
1. **Handling Missing Values** → Filled or removed missing data in columns such as `agent` and `country`.
2. **Removing Outliers** → Identified and treated outliers in numerical columns such as `adr`.
3. **Encoding Categorical Features** → Converted categorical variables (such as `arrival_date_month`) into numerical format.
4. **Dropping Unnecessary Columns** → Removed columns that are not useful for analysis.
5. **Final Cleaned Dataset Output** → The cleaned data is saved as `hotel_bookings_cleaned.csv`.
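The steps above can be sketched in pandas roughly as follows. This is an illustration under stated assumptions, not the notebook's actual code: the fill values, the 1.5 × IQR outlier rule, the month-to-number mapping, and the dropped column (`company`) are all choices made for this example, and the function name `clean_bookings` is hypothetical.

```python
import pandas as pd

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def clean_bookings(df):
    """Apply steps 1-4 to a copy of df. All concrete choices are assumptions."""
    df = df.copy()

    # 1. Missing values: treat a missing agent as "no agent" (0) and
    #    label a missing country as "Unknown".
    df["agent"] = df["agent"].fillna(0).astype(int)
    df["country"] = df["country"].fillna("Unknown")

    # 2. Outliers: keep rows whose adr lies within 1.5 * IQR of the quartiles.
    q1, q3 = df["adr"].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = df["adr"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    df = df[in_range].copy()

    # 3. Encoding: map month names to the numbers 1..12.
    month_numbers = {name: i for i, name in enumerate(MONTHS, start=1)}
    df["arrival_date_month"] = df["arrival_date_month"].map(month_numbers)

    # 4. Drop columns: `company` is an assumed example; errors="ignore"
    #    makes this a no-op if the column is absent.
    df = df.drop(columns=["company"], errors="ignore")

    return df.reset_index(drop=True)
```

Step 5 then amounts to `clean_bookings(pd.read_csv("hotel_bookings.csv")).to_csv("hotel_bookings_cleaned.csv", index=False)`. Dropping outlier rows is only one option; capping `adr` at the bounds would keep the rows instead.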
## 🚀 How to Run the Notebook
### **Using Jupyter Notebook (Locally)**
1. Install Jupyter and the required libraries if they are not already installed:
```bash
pip install notebook pandas numpy
```
2. Start Jupyter from the directory that contains the notebook:
```bash
jupyter notebook
```
3. Navigate to `hotel_bookings_cleaned.ipynb` and run all cells.
4. The cleaned dataset `hotel_bookings_cleaned.csv` will be generated in the same directory.
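Once the run finishes, a short check can confirm that the output file was written and looks sane. The helper below is a hypothetical sketch, not part of the repository; it assumes only the output filename given above. Whether any missing values remain depends on the choices made in the notebook.

```python
import pandas as pd


def summarize_cleaned(csv_path="hotel_bookings_cleaned.csv"):
    """Return row count, column count, and remaining missing values per column.

    Hypothetical helper for illustration; not part of the repository.
    """
    df = pd.read_csv(csv_path)
    missing = df.isna().sum()
    return {
        "rows": len(df),
        "columns": df.shape[1],
        # Only list columns that still contain missing values.
        "missing": missing[missing > 0].to_dict(),
    }
```

Running `summarize_cleaned()` in the notebook's directory gives a small dictionary that is easy to compare before and after changing a cleaning step.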
### **Using Google Colab**
1. Upload `hotel_bookings_cleaned.ipynb` and `hotel_bookings.csv` to Colab.
2. Run all cells.
3. Download the cleaned CSV using:
```python
from google.colab import files
files.download("hotel_bookings_cleaned.csv")
```
## 📌 Notes
- If you see a file-not-found error, make sure `hotel_bookings.csv` is in the same directory as the notebook.
- You can modify the notebook to include additional data processing steps as needed.