https://github.com/nataliabeltranarg/nlp-booking-scraping-sentimentanalysis
Natural Language Processing sentiment analysis of Booking.com. Rental price impact of SONAR festival in Barcelona using difference in difference and OLS.
beautifulsoup data-science natural-language-processing nltk python selenium text-mining webscraping
- Host: GitHub
- URL: https://github.com/nataliabeltranarg/nlp-booking-scraping-sentimentanalysis
- Owner: nataliabeltranarg
- Created: 2024-01-23T16:19:02.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2024-07-20T11:54:49.000Z (6 months ago)
- Last Synced: 2024-11-22T02:13:42.747Z (about 2 months ago)
- Topics: beautifulsoup, data-science, natural-language-processing, nltk, python, selenium, text-mining, webscraping
- Language: Jupyter Notebook
- Homepage:
- Size: 2.4 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Natural Language Processing: Sentiment Analysis of Booking.com
## Project Overview
This project explores how major annual events in Barcelona, Spain, influence rental prices. We focus on the SONAR festival, held annually in June, which attracts an influx of visitors to the city. The data was obtained by scraping Booking.com, solely for research purposes, as part of an assignment for the Introduction to Text Mining and Natural Language Processing course at the Barcelona School of Economics. Our research compares rental prices in Barcelona during the festival week of June 12th to 16th, 2024, with a control week of June 5th to 9th, 2024. We also extend the analysis with a comparison against the control city of Alicante, Spain.
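The comparison described above is a two-by-two difference-in-differences design: the price change in Barcelona between the control week and the festival week, minus the same change in Alicante. The sketch below computes that estimate from group means using only the standard library. The prices are invented for illustration and are not taken from the scraped data; the function name `did_estimate` is likewise hypothetical.

```python
from statistics import mean

# Hypothetical nightly prices in EUR, invented purely for illustration.
# They are NOT values from the scraped Booking.com data.
prices = {
    ("Barcelona", "before"): [150.0, 170.0, 160.0],
    ("Barcelona", "festival"): [210.0, 230.0, 220.0],
    ("Alicante", "before"): [90.0, 110.0, 100.0],
    ("Alicante", "festival"): [95.0, 115.0, 105.0],
}


def did_estimate(prices, treated="Barcelona", control="Alicante"):
    """Return the 2x2 difference-in-differences estimate from group means."""
    # Price change in the treated city between the two weeks.
    treated_change = mean(prices[(treated, "festival")]) - mean(prices[(treated, "before")])
    # Price change in the control city over the same two weeks.
    control_change = mean(prices[(control, "festival")]) - mean(prices[(control, "before")])
    # The control change absorbs seasonal movement common to both cities.
    return treated_change - control_change


effect = did_estimate(prices)
print(f"Estimated festival effect: {effect:.2f} EUR per night")
```

In a regression setting, the same quantity is the coefficient on the interaction term when price is regressed on a treated-city dummy, a festival-week dummy, and their product.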
## Section 1: Web Scraping
This section focuses on the web scraping aspect of the project. We collected data on hotel names, dates, room prices, and hotel descriptions. For more detailed information on the event, cities, time periods, and the scraping process, please refer to our notebook titled '**scraping_booking.ipynb**'.

#### Notebook Contents
1. **Future Event Selection**
2. **Time Period & City Selection**
3. **Scraping Pipeline**
4. **Scrape Information**
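As a rough illustration of the city and time period selection step, the sketch below enumerates the four scrape jobs (two cities, two weeks) and names each output file the way the repository's CSV files are named. It is not the notebook's actual code. The `ScrapeJob` class is hypothetical, mapping the stated weeks to check-in and check-out dates is an assumption, and so is the search URL layout with its `ss`, `checkin` and `checkout` query parameters; check these against the live site before relying on them.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class ScrapeJob:
    """One city/week combination to scrape (hypothetical helper)."""
    city: str
    period: str    # "before" (control week) or "festival" (treatment week)
    checkin: str   # ISO date, assumed to be the first day of the week studied
    checkout: str  # ISO date, assumed to be the last day of the week studied

    @property
    def output_file(self) -> str:
        # Matches the naming of the CSV files in the repository tree.
        return f"{self.city}_{self.period}.csv"

    @property
    def search_url(self) -> str:
        # Assumed URL layout and parameter names; not confirmed by the README.
        query = urlencode({"ss": self.city, "checkin": self.checkin, "checkout": self.checkout})
        return f"https://www.booking.com/searchresults.html?{query}"


# Weeks taken from the project overview.
PERIODS = {
    "before": ("2024-06-05", "2024-06-09"),
    "festival": ("2024-06-12", "2024-06-16"),
}
CITIES = ("Barcelona", "Alicante")


def build_jobs():
    """Return one job per (city, period) pair, four in total."""
    return [ScrapeJob(city, period, *dates) for city in CITIES for period, dates in PERIODS.items()]


for job in build_jobs():
    print(job.output_file, job.search_url)
```

Each job could then be handed to a Selenium or BeautifulSoup routine that visits `search_url` and writes the collected rows to `output_file`.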
## Section 2: Sentiment Analysis
This section focuses on the analysis aspect of the project. We extracted and analyzed text features from hotel descriptions to understand their impact on rental prices. Alicante was chosen as the control city for comparison. For more detailed information on the sentiment analysis and its results, please refer to our notebook titled '**sentiment_analysis.ipynb**'.

#### Notebook Contents
5. **Difference in Difference Equations**
6. **Regression Models**
7. **Feature Extraction Control**
8. **Hotel Fixed Effect Regression**

#### Libraries Used
* BeautifulSoup
* NumPy
* Pandas
* Selenium
* NLTK
* Matplotlib
* Translator
* Statsmodels
* Scikit-learn

## Contact the Authors
For additional questions, feel free to reach out to the authors of this project:
* Natalia Beltrán ([email protected])
* Harry Morley ([email protected])
* Xi Cheng ([email protected])
## How to navigate the repository
```bash
├── Booking Data
│   ├── Control Data
│   │   ├── Alicante_before.csv
│   │   └── Barcelona_before.csv
│   └── Treatment Data
│       ├── Alicante_festival.csv
│       └── Barcelona_festival.csv
├── scraping_booking.ipynb
├── sentiment_analysis.ipynb
└── README.md
```
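Given the layout above, the four CSV files can be stacked into a single panel with indicator columns for city and week, which is the shape a difference-in-differences regression expects. The sketch below is one possible way to do this with the standard library and is not taken from the notebooks. The function name `load_panel` and the added columns `city`, `treated` and `post` are assumptions, and no assumption is made about the columns already in the files.

```python
import csv
from pathlib import Path

# (folder, period label used in the file name, festival-week flag)
FOLDERS = [("Control Data", "before", 0), ("Treatment Data", "festival", 1)]
# 1 marks the treated city, 0 the control city.
CITIES = {"Barcelona": 1, "Alicante": 0}


def load_panel(root="Booking Data"):
    """Read all four CSV files and tag each row with city, treated and post."""
    rows = []
    for folder, period, post in FOLDERS:
        for city, treated in CITIES.items():
            path = Path(root) / folder / f"{city}_{period}.csv"
            with path.open(newline="", encoding="utf-8") as handle:
                for record in csv.DictReader(handle):
                    # Existing columns with the same names would be overwritten.
                    record.update(city=city, treated=treated, post=post)
                    rows.append(record)
    return rows


# Only runs when executed from the repository root, where the data folder exists.
if Path("Booking Data").is_dir():
    panel = load_panel()
    print(len(panel), "rows loaded")
```

Rows with `treated == 1` and `post == 1` are the Barcelona festival-week observations; the product of the two flags is the interaction term of interest.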