https://github.com/siddh30/The-Airbnb-Classification-Project
This project is from the Airbnb Recruitment Challenge on Kaggle. The challenge is a multi-class classification problem: predicting a new user's first booking destination.
- Host: GitHub
- URL: https://github.com/siddh30/The-Airbnb-Classification-Project
- Owner: siddh30
- Created: 2019-05-14T23:19:01.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2022-02-22T19:11:13.000Z (about 3 years ago)
- Last Synced: 2024-08-13T07:14:23.440Z (8 months ago)
- Topics: airbnb, binary-classification, data-mining, kaggle-challenge, multi-class-classification, r-programming
- Language: R
- Homepage:
- Size: 33.9 MB
- Stars: 11
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# The Airbnb Classification Project
In this Kaggle challenge by Airbnb, we are given a list of users along with their demographics, web session records, and some summary statistics. The task is to predict the country of a new user's first booking.
There are 12 possible outcomes for the destination country: 'US', 'FR', 'CA', 'GB', 'ES', 'IT', 'PT', 'NL', 'DE', 'AU', 'NDF' (no destination found), and 'other'. Note that 'NDF' differs from 'other': 'other' means there was a booking, but to a country not included in the list, while 'NDF' means there was no booking.
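The distinction between 'NDF' and 'other' can be made concrete with a small sketch in base R. The sample labels and the variable names (`destinations`, `bookings`, `made_booking`, `listed_country`) are illustrative assumptions, not taken from the project code.

```r
# The 12 destination classes named above.
destinations <- c("US", "FR", "CA", "GB", "ES", "IT", "PT", "NL",
                  "DE", "AU", "NDF", "other")

# Hypothetical sample of labels, stored as a factor with all 12 levels.
bookings <- factor(c("US", "NDF", "other", "FR"), levels = destinations)

# 'NDF' means no booking was made at all.
made_booking <- bookings != "NDF"

# 'other' is a real booking, but to a country outside the listed ten.
listed_country <- made_booking & bookings != "other"

print(data.frame(bookings, made_booking, listed_country))
```

Treating the labels as a factor with fixed levels keeps all 12 classes present even when a sample happens to omit some of them.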
### We have implemented this project in four stages.
1. Data visualisation and analysis of the entire dataset.
2. Data preprocessing, which includes using one-hot encoding to create binary labels for the different countries present in the country_destination column.
   Of these newly created variables, we used the US indicator for our binary classification.
3. Implementation of different models. These include:
   1) Naive Bayes
   2) K-Nearest Neighbours (KNN)
   3) Artificial Neural Network (ANN)
   4) C5.0 (C50)
   5) Random Forest
   6) XGBoost (Extreme Gradient Boosting) for multi-class classification.
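The preprocessing in stage 2 can be sketched in base R as follows. This is a minimal illustration under stated assumptions: the toy `users` data frame, the `dest_` column prefix, and the `is_US` name are hypothetical, and the project itself may build the indicators differently.

```r
# Toy stand-in for the training data (hypothetical rows, not real Kaggle data).
users <- data.frame(
  id = c("u1", "u2", "u3", "u4", "u5"),
  country_destination = c("US", "NDF", "FR", "other", "US"),
  stringsAsFactors = FALSE
)

# The 12 destination classes listed in the README.
destinations <- c("US", "FR", "CA", "GB", "ES", "IT", "PT", "NL",
                  "DE", "AU", "NDF", "other")

# One-hot encoding: compare every row against every class, giving a
# rows-by-classes matrix of 0/1 indicators.
one_hot <- outer(users$country_destination, destinations, "==") + 0L
colnames(one_hot) <- paste0("dest_", destinations)

# Binary target for the US-versus-rest classification task.
users$is_US <- one_hot[, "dest_US"]

print(users)
```

Each row of `one_hot` contains exactly one 1, so any single column, such as `dest_US`, can serve as the label for a binary classifier, while the original column remains the label for the multi-class models.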