Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/fayzankj/airbnb-nyc-data-cleaning
Data cleaning and preprocessing project for the Airbnb NYC dataset from Kaggle. This repository documents the process of handling missing values, removing duplicates, and addressing inconsistencies. The cleaning workflow is detailed in a Jupyter Notebook for clarity.
- Host: GitHub
- URL: https://github.com/fayzankj/airbnb-nyc-data-cleaning
- Owner: fayzankj
- License: mit
- Created: 2024-09-06T19:07:08.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-09-06T19:14:55.000Z (2 months ago)
- Last Synced: 2024-10-10T17:21:27.787Z (about 1 month ago)
- Topics: airbnb, data-cleaning, data-preprocessing, jupyter-notebook, kaggle, nyc, python
- Language: Jupyter Notebook
- Homepage:
- Size: 5.14 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Airbnb NYC Data Cleaning Challenge
## Project Overview
This project involves cleaning and preprocessing the [Airbnb NYC dataset](https://www.kaggle.com/datasets/thedevastator/airbnbs-nyc-overview?resource=download) from Kaggle. The dataset includes details about Airbnb listings in New York City. The primary goal was to handle missing values, outliers, and inconsistencies to prepare the dataset for further analysis or modeling.
## Dataset
- **Original Dataset:** [`Airbnb_NYC.csv`](https://github.com/fayzankj/airbnb-nyc-data-cleaning/blob/main/Airbnb_NYC.csv)
- **Cleaned Dataset:** [`cleaned_airbnb_nyc.csv`](https://github.com/fayzankj/airbnb-nyc-data-cleaning/blob/main/cleaned_airbnb_nyc.csv)
## Jupyter Notebook
The data cleaning process is thoroughly documented in the Jupyter Notebook:
- [`Data_Cleaning_Notebook.ipynb`](https://github.com/fayzankj/airbnb-nyc-data-cleaning/blob/main/Data_Cleaning_Notebook.ipynb)
## Project Steps
1. **Initial Data Exploration**
   - Loading the dataset
   - Overview of the data
   - Checking for missing values and duplicates
2. **Data Cleaning**
   - Handling missing values
   - Removing duplicates
   - Dealing with inconsistent data
   - Handling outliers
3. **Data Preprocessing**
   - Feature engineering
   - Data type conversion
## Key Findings
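The checks behind these findings can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the sample rows are made up, and only the column names are taken from this README.

```python
import pandas as pd

# Tiny made-up sample standing in for the real data. In the notebook the
# frame would presumably come from pd.read_csv("Airbnb_NYC.csv") instead.
df = pd.DataFrame(
    {
        "name": ["Cozy loft", None, "Sunny room", "Cozy loft"],
        "host_name": ["Ana", "Ben", None, "Ana"],
        "price": [120, 95, 80, 120],
        "reviews_per_month": [1.2, None, 0.4, 1.2],
        "last_review": ["2019-05-01", None, "2019-06-15", "2019-05-01"],
    }
)

# Count missing values per column.
missing_counts = df.isna().sum()

# Count rows that exactly repeat an earlier row in every column.
duplicate_count = int(df.duplicated().sum())

print(missing_counts)
print("duplicate rows:", duplicate_count)
```

On the real dataset, `df.info()` and `df.describe()` give a similar overview of column types and value ranges.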
- **Missing Values:** Identified and addressed missing values in columns such as `name`, `host_name`, `reviews_per_month` and `last_review`.
- **Duplicates:** Removed duplicate entries to ensure dataset integrity.
- **Inconsistent Data:** Standardized values in columns with inconsistent entries.
- **Outliers:** Detected and removed outliers from numerical columns like `price`.
## Getting Started
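For orientation before opening the notebook, here is a rough pandas sketch of the kind of cleaning summarized under Key Findings. The README does not say which fill values or outlier rule the notebook uses, so the `"Unknown"` placeholder, the zero fill for `reviews_per_month`, and the 1.5 × IQR rule below are all assumptions, and the sample rows are made up.

```python
import pandas as pd

# Made-up sample rows; only the column names come from this README.
df = pd.DataFrame(
    {
        "name": ["Cozy loft", None, "Sunny room", "Cozy loft", "Penthouse", "Studio"],
        "host_name": ["Ana", "Ben", None, "Ana", "Cy", "Dee"],
        "price": [120, 95, 80, 120, 9999, 110],
        "reviews_per_month": [1.2, None, 0.4, 1.2, 0.1, 2.0],
        "last_review": ["2019-05-01", None, "2019-06-15",
                        "2019-05-01", "2019-01-10", "2019-07-01"],
    }
)

# Remove exact duplicate rows.
cleaned = df.drop_duplicates().copy()

# Fill missing text fields with a placeholder (assumed strategy).
cleaned["name"] = cleaned["name"].fillna("Unknown")
cleaned["host_name"] = cleaned["host_name"].fillna("Unknown")

# Treat a missing review rate as zero reviews per month (assumed strategy).
cleaned["reviews_per_month"] = cleaned["reviews_per_month"].fillna(0)

# Data type conversion: parse dates; missing entries become NaT.
cleaned["last_review"] = pd.to_datetime(cleaned["last_review"])

# Drop price outliers with the 1.5 * IQR rule (assumed rule).
q1 = cleaned["price"].quantile(0.25)
q3 = cleaned["price"].quantile(0.75)
iqr = q3 - q1
in_range = cleaned["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
cleaned = cleaned[in_range].reset_index(drop=True)

print(cleaned)
```

Whether to drop or cap outliers, and whether to fill or drop missing rows, depends on the downstream analysis. The notebook is the authoritative record of what was actually done.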
To run the Jupyter Notebook locally:
1. Clone the repository:
```bash
git clone https://github.com/fayzankj/Airbnb-NYC-Data-Cleaning.git