
https://github.com/nathanaelmutua/british-airways-data-science-challenge

My solutions for the Forage program: web scraping, data cleaning, analysis, and visualization to extract business insights. Demonstrating practical data science skills for real-world problem-solving.

british-airways british-airways-virtual-program data-science data-visualization dataanalysis forage internship-project internship-task jupyter-notebook python sentiment-analysis webscraping


![british-airways horizontal-blue-fill](https://github.com/user-attachments/assets/f41a94bf-9017-4aba-a580-18cb42e61537)

# **British Airways Data Science Internship**

## **Description**
This repository contains my solutions for the British Airways data science program on Forage. The work covers web scraping, data cleaning, analysis, and visualization to extract business insights.

It demonstrates practical data science skills for real-world problem-solving.

## **Task 1**
### **1. Scrape data from the web**
The first step is to scrape British Airways customer review data from [Skytrax](https://www.airlinequality.com/airline-reviews/british-airways/).

I use a Jupyter Notebook to gather the data, clean it, and analyze it.
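The scraping step could look roughly like the sketch below, which uses only the Python standard library. It is not necessarily the code in the notebook. The `text_content` class name, the `/page/{n}/` path, and the `sortby` and `pagesize` query parameters are assumptions about how the Skytrax pages are laid out and should be checked against the live site. `ReviewParser`, `parse_reviews`, and `fetch_reviews` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

BASE_URL = "https://www.airlinequality.com/airline-reviews/british-airways"


class ReviewParser(HTMLParser):
    """Collect the text of every <div> whose class list contains "text_content".

    Assumption: each review body sits in such a <div>.
    """

    def __init__(self):
        super().__init__()
        self.reviews = []   # finished review strings
        self._depth = 0     # > 0 while inside a review <div>
        self._chunks = []   # text pieces of the review being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth > 0:
            self._depth += 1          # nested <div> inside a review
        elif "text_content" in classes:
            self._depth = 1           # a new review starts here

    def handle_endtag(self, tag):
        if tag == "div" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:      # the review <div> just closed
                text = "".join(self._chunks)
                self.reviews.append(" ".join(text.split()))
                self._chunks = []

    def handle_data(self, data):
        if self._depth > 0:
            self._chunks.append(data)


def parse_reviews(html):
    """Return the review texts found in one HTML page."""
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    return parser.reviews


def fetch_reviews(page=1, page_size=100):
    """Download and parse one page of reviews (performs a network request).

    The URL pattern is an assumption, not a documented API.
    """
    url = f"{BASE_URL}/page/{page}/?sortby=post_date%3ADesc&pagesize={page_size}"
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        html = response.read().decode("utf-8", errors="replace")
    return parse_reviews(html)
```

Keeping the parsing in `parse_reviews`, separate from the download, makes it possible to test against a saved HTML file without hitting the site repeatedly.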

### **2. Clean the Data**
The scraped review text is messy, so I clean it first to prepare it for analysis.

I then explore the cleaned text with techniques such as topic modeling, sentiment analysis, and word clouds.
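As a minimal sketch of the cleaning step, the snippet below strips a verification prefix, lowercases the text, drops everything except letters, and counts the remaining words (the same counts a word cloud is built from). It assumes that reviews start with a prefix such as `✅ Trip Verified |` or `Not Verified |`. The stopword list is a tiny illustrative set, and `clean_review` and `top_words` are hypothetical helper names.

```python
import re
from collections import Counter

# Assumed prefix format: "✅ Trip Verified | ..." or "Not Verified | ...".
PREFIX = re.compile(r"^\W*(?:Trip Verified|Not Verified)\s*\|\s*", re.IGNORECASE)

# Illustrative stopword list only; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "to", "was", "is", "of", "in", "it", "i", "on", "for"}


def clean_review(text):
    """Remove the verification prefix, lowercase, and keep letters only."""
    text = PREFIX.sub("", text, count=1)
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)   # digits and punctuation become spaces
    return " ".join(text.split())           # collapse repeated whitespace


def top_words(reviews, n=10):
    """Return the n most frequent non-stopword tokens across all reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(w for w in clean_review(review).split() if w not in STOPWORDS)
    return counts.most_common(n)
```

For example, `clean_review("✅ Trip Verified | The crew was GREAT, 10/10!")` yields `"the crew was great"`.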

### **3. Present Insights**
I summarize the key findings in a single PowerPoint slide.

The slide includes visualizations, metrics, and clear explanations so that the results can be understood quickly.

## **Task 2**

### **Description**
I explore the dataset in the provided Jupyter Notebook, analyzing its structure and key statistics. I then prepare it for modeling by creating relevant features and train a machine learning model, such as a Random Forest, to predict customer bookings while assessing variable importance.

### **1. Explore and Prepare the Dataset**

Use the provided Jupyter Notebook to understand the dataset's columns and statistics.

Clean the data and engineer new features that may improve predictive performance.
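To illustrate what feature engineering can look like here, the sketch below builds a tiny stand-in DataFrame and derives two flags before one-hot encoding the categorical columns. The column names (`sales_channel`, `flight_day`, `purchase_lead`, `booking_complete`) and the 7-day threshold are hypothetical; the real dataset's schema may differ.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset's columns may differ.
df = pd.DataFrame({
    "sales_channel": ["Internet", "Mobile", "Internet", "Internet"],
    "flight_day": ["Sat", "Wed", "Sun", "Mon"],
    "purchase_lead": [262, 112, 3, 96],      # days between purchase and departure
    "booking_complete": [0, 0, 1, 0],        # target: 1 if the booking was completed
})


def prepare_features(frame):
    """Add simple derived features and one-hot encode categorical columns."""
    out = frame.copy()
    # Flag flights that depart on a weekend.
    out["is_weekend"] = out["flight_day"].isin(["Sat", "Sun"]).astype(int)
    # Flag purchases made within a week of departure (threshold is an assumption).
    out["last_minute"] = (out["purchase_lead"] <= 7).astype(int)
    # Replace each categorical column with 0/1 indicator columns.
    return pd.get_dummies(out, columns=["sales_channel", "flight_day"], dtype=int)


features = prepare_features(df)
print(features.dtypes)
```

Calling `df.info()` and `df.describe()` before this step is a quick way to check column types, missing values, and value ranges.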

### **2. Train a Machine Learning Model**

Build a predictive model to determine whether a customer will make a booking.

Use an algorithm (e.g., Random Forest) that provides insight into the importance of each feature.
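A minimal sketch of this step with scikit-learn's `RandomForestClassifier` follows. The data is synthetic and the feature names are hypothetical, chosen only so that the importance ranking has an obvious expected winner. It is not the project's actual dataset or result.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 500

# Synthetic stand-in features (names are hypothetical).
feature_names = ["purchase_lead", "flight_hour", "wants_extra_baggage"]
X = np.column_stack([
    rng.integers(0, 365, n),   # purchase_lead
    rng.integers(0, 24, n),    # flight_hour
    rng.integers(0, 2, n),     # wants_extra_baggage
])
# Synthetic target: a "booking" happens only when the lead time is short,
# so purchase_lead should dominate the importances.
y = (X[:, 0] < 60).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
# Impurity-based importances; they sum to 1 across features.
importances = dict(zip(feature_names, model.feature_importances_))

for name, value in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.3f}")
```

Impurity-based importances can favor high-cardinality features, so `sklearn.inspection.permutation_importance` is worth considering as a cross-check on real data.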

### **3. Evaluate and Present Findings**

Assess model performance with cross-validation.

Report relevant metrics to demonstrate how well the model predicts bookings.
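The evaluation step could be sketched with scikit-learn's `cross_validate`, as below. The data comes from `make_classification` and is purely synthetic. The 85/15 class split is an assumption meant to mimic a target where bookings are the minority class, which is why ROC AUC is reported alongside accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic, imbalanced stand-in data (the class ratio is an assumption).
X, y = make_classification(
    n_samples=400,
    n_features=8,
    n_informative=4,
    weights=[0.85, 0.15],
    random_state=0,
)

model = RandomForestClassifier(n_estimators=100, random_state=0)

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
metrics = ["accuracy", "roc_auc"]
scores = cross_validate(model, X, y, cv=cv, scoring=metrics)

# Average each metric over the five folds.
summary = {name: scores[f"test_{name}"].mean() for name in metrics}
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Reporting the per-fold spread (for example the standard deviation of `scores["test_roc_auc"]`) alongside the mean gives a sense of how stable the estimate is.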