https://github.com/nathanaelmutua/british-airways-data-science-challenge
My solutions for the Forage program: web scraping, data cleaning, analysis, and visualization to extract business insights. Demonstrating practical data science skills for real-world problem-solving.
british-airways british-airways-virtual-program data-science data-visualization dataanalysis forage internship-project internship-task jupyter-notebook python sentiment-analysis webscraping
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/nathanaelmutua/british-airways-data-science-challenge
- Owner: NathanaelMutua
- License: mit
- Created: 2025-02-07T04:48:33.000Z (8 months ago)
- Default Branch: master
- Last Pushed: 2025-03-13T21:54:05.000Z (7 months ago)
- Last Synced: 2025-05-29T20:35:31.040Z (4 months ago)
- Topics: british-airways, british-airways-virtual-program, data-science, data-visualization, dataanalysis, forage, internship-project, internship-task, jupyter-notebook, python, sentiment-analysis, webscraping
- Language: Jupyter Notebook
- Homepage:
- Size: 6.59 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README

# **British Airways Data Science Internship**
## **Description**
My solutions for the Forage program, covering web scraping, data cleaning, analysis, and visualization to extract business insights. They demonstrate practical data science skills for real-world problem-solving.
## **Task 1**
### **1. Scrape data from the web**
The first step is to scrape review data from [Skytrax](https://www.airlinequality.com/airline-reviews/british-airways/). I will use Jupyter Notebook to gather the data, clean it, and analyze it.
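A minimal sketch of the scraping step, using only the Python standard library. The pagination query string and the `text_content` CSS class are assumptions about how the Skytrax pages are laid out, not something this README confirms, and the helper names (`build_page_url`, `extract_reviews`, `scrape_reviews`) are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

BASE_URL = "https://www.airlinequality.com/airline-reviews/british-airways"


def build_page_url(page: int, page_size: int = 100) -> str:
    # Assumed pagination scheme; adjust if the site uses a different one.
    return f"{BASE_URL}/page/{page}/?sortby=post_date%3ADesc&pagesize={page_size}"


class ReviewParser(HTMLParser):
    """Collects the text inside every <div> carrying the review CSS class."""

    def __init__(self, review_class: str = "text_content"):  # assumed class name
        super().__init__()
        self.review_class = review_class
        self.reviews = []
        self._depth = 0      # > 0 while inside a review <div>
        self._chunks = []    # text fragments of the current review

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            self._depth += 1  # nested <div> inside a review
        elif self.review_class in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace into single spaces.
                self.reviews.append(" ".join("".join(self._chunks).split()))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_reviews(html: str) -> list:
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    return parser.reviews


def fetch_page(url: str) -> str:
    # Some sites reject requests without a User-Agent header.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_reviews(pages: int, page_size: int = 100) -> list:
    reviews = []
    for page in range(1, pages + 1):
        reviews.extend(extract_reviews(fetch_page(build_page_url(page, page_size))))
    return reviews
```

Calling `scrape_reviews(pages=2)` would request two pages and return the review texts as a list of strings. Parsing is kept separate from fetching so it can be checked against a saved HTML file without touching the network.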
### **2. Clean the Data**
We start by cleaning the messy text dataset to prepare it for analysis. We then explore insights using techniques like topic modeling, sentiment analysis, and word clouds.
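As a rough illustration of the cleaning step, the sketch below strips a verification prefix, normalizes the text, and counts the most frequent words (the raw input for a word cloud). The prefix pattern (`Trip Verified |` / `Not Verified |`) and the tiny stopword list are assumptions made for this example, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumed prefix format at the start of each scraped review.
VERIFIED_PREFIX = re.compile(
    r"^\s*(?:✅\s*)?(?:Trip Verified|Not Verified)\s*\|\s*", re.IGNORECASE
)

# Illustrative stopword list only; a real analysis would use a fuller one.
STOPWORDS = {
    "the", "a", "an", "and", "to", "was", "were", "is", "of",
    "in", "on", "for", "it", "i", "we", "my", "with", "at",
}


def clean_review(text: str) -> str:
    text = VERIFIED_PREFIX.sub("", text)   # drop the verification marker
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep letters and whitespace only
    return " ".join(text.split())          # collapse repeated whitespace


def word_frequencies(reviews, top_n: int = 10):
    counts = Counter()
    for review in reviews:
        words = clean_review(review).split()
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)
```

The resulting `(word, count)` pairs can feed a word cloud or a simple bar chart. Sentiment analysis and topic modeling would run on the output of `clean_review`.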
### **3. Present Insights**
Summarize my key findings in a single PowerPoint slide. Include visualizations, metrics, and clear explanations to convey the results quickly.
## **Task 2**
### **Description**
We will explore the dataset in the Jupyter Notebook to understand its structure and key statistics, prepare it for modeling by creating relevant features, and then train a machine learning model, such as a Random Forest, to predict customer bookings and assess variable importance.
### **1. Explore and Prepare the Dataset:**
Use the provided Jupyter Notebook to understand the dataset’s columns and statistics.
Clean the data and engineer new features that may improve predictive performance.
### **2. Train a Machine Learning Model:**
Build a predictive model to determine if a customer will make a booking.
Use an algorithm (e.g., Random Forest) that provides insights into the importance of each feature.
### **3. Evaluate and Present Findings:**
Assess model performance with cross-validation.
Report relevant metrics to demonstrate how well the model predicts bookings.
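The training and evaluation steps above could look roughly like the following scikit-learn sketch. It runs on synthetic stand-in data from `make_classification`, because the booking dataset and its column names are not reproduced here. The feature names, hyperparameters, and choice of metrics are all assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Synthetic placeholder data, NOT the real booking dataset.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, random_state=42
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

model = RandomForestClassifier(n_estimators=100, random_state=42)

# 5-fold cross-validation, reporting accuracy and ROC AUC on each held-out fold.
cv_results = cross_validate(model, X, y, cv=5, scoring=["accuracy", "roc_auc"])
mean_accuracy = cv_results["test_accuracy"].mean()
mean_auc = cv_results["test_roc_auc"].mean()

# Fit on all data to read off impurity-based feature importances.
model.fit(X, y)
importances = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

print(f"Mean CV accuracy: {mean_accuracy:.3f}")
print(f"Mean CV ROC AUC:  {mean_auc:.3f}")
for name, score in importances:
    print(f"{name}: {score:.3f}")
```

Swapping the synthetic `X` and `y` for the engineered feature matrix and the booking target column would give the figures to report. Impurity-based importances are only one option, and permutation importance is a common alternative.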