https://github.com/edaaydinea/britishairwaysdatascience



# Data Science Virtual Internship at British Airways

## Introduction

This repository contains the code and data for the Data Science Virtual Internship at British Airways, which I completed in January 2025. The internship consisted of two tasks:

1. Task 1: Web Scraping to gain company insights
2. Task 2: Predicting customer buying behaviour

## Task 1: Web Scraping to gain company insights

I started by scraping customer review data from Skytrax, an airline review website.

### Data Collection

I focused on reviews specifically about the airline itself and collected as many as I could, since a larger sample makes the analysis more reliable. To get started, I ran the Python code in the “Jupyter Notebook” supplied with the task resources.
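As an illustration of the parsing step (a minimal sketch, not the notebook's actual code), the snippet below pulls review text out of an HTML page using only the standard library. It assumes each review body sits in a `<div class="text_content">` element; that class name, the `ReviewTextParser` class, and the sample HTML are assumptions made for the example. Fetching the pages is left out so the sketch runs offline.

```python
from html.parser import HTMLParser


class ReviewTextParser(HTMLParser):
    """Collect the text inside every <div class="text_content"> element."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._depth = 0    # > 0 while we are inside a review div
        self._chunks = []  # text fragments of the current review

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Track nested divs so we know which </div> closes the review.
            if tag == "div":
                self._depth += 1
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if "text_content" in classes:
                self._depth = 1
                self._chunks = []

    def handle_endtag(self, tag):
        if self._depth and tag == "div":
            self._depth -= 1
            if self._depth == 0:
                # Join fragments and collapse runs of whitespace.
                self.reviews.append(" ".join("".join(self._chunks).split()))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_reviews(html):
    """Return the review texts found in one page of HTML."""
    parser = ReviewTextParser()
    parser.feed(html)
    parser.close()
    return parser.reviews


# Hypothetical page fragment standing in for a downloaded review page.
sample_html = """
<article><div class="text_content">Trip Verified | Smooth flight, <b>friendly</b> crew.</div></article>
<article><div class="text_content">Not Verified | Two hour delay and no updates.</div></article>
"""
reviews = extract_reviews(sample_html)
print(reviews)
```

In practice the same function would be called once per downloaded page and the results appended to a single list before saving.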

### Data Analysis

Once I had the dataset, I needed to prepare it: the raw data was messy and consisted purely of free text. After cleaning it, I carried out my own analysis to uncover insights. As a starting point, I used topic modelling, sentiment analysis, and word clouds to summarise the content of the reviews. I completed this task in Python, drawing on the documentation linked in the task resources, although any tool would work.
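The cleaning step and the word counts that feed a word cloud can be sketched as follows. This is a simplified illustration rather than the notebook's code: the stop-word list is deliberately tiny, the helper names are hypothetical, and it assumes each review may start with a verification marker followed by a `|` character.

```python
import re
from collections import Counter

# A very small, illustrative stop-word list (a real analysis would use a fuller one).
STOPWORDS = {"the", "a", "an", "and", "was", "to", "of", "in", "on",
             "is", "it", "i", "for", "with", "but", "no"}


def clean_review(text):
    """Lowercase a review, strip punctuation and stop words, return its tokens."""
    # Assumption: anything before the first "|" is a verification marker.
    text = text.split("|", 1)[-1]
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep letters and whitespace only
    return [word for word in text.split() if word not in STOPWORDS]


def top_words(reviews, n=10):
    """Count tokens across all reviews and return the n most frequent."""
    counts = Counter()
    for review in reviews:
        counts.update(clean_review(review))
    return counts.most_common(n)


# Hypothetical reviews for demonstration.
reviews = [
    "Trip Verified | Great crew, great food.",
    "Not Verified | Delayed flight, rude crew.",
]
print(top_words(reviews, n=2))
```

The resulting frequencies are the kind of input a word cloud is drawn from, and the cleaned tokens can also be passed on to a sentiment or topic model.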

### Presentation of Insights

I summarised my findings within a single PowerPoint slide, so that they could be presented at the next board meeting. I created visualisations and metrics to include within this slide, as well as clear and concise explanations to quickly provide the key points from my analysis. I used the “PowerPoint Template” provided to complete the slide.

## Task 2: Predicting customer buying behaviour

For this task, I used a Jupyter Notebook to perform predictive modeling on customer booking data.

### Exploratory Data Analysis

First, I explored the data to understand its structure and statistical properties. This included loading the dataset, checking for missing values, and visualizing the distribution of various features.
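A first pass over the data might look like the sketch below. The tiny DataFrame and its column names (`purchase_lead`, `sales_channel`, `booking_complete`) are hypothetical stand-ins for the real booking dataset, which would normally be loaded from a file instead.

```python
import pandas as pd

# Hypothetical sample standing in for the customer booking dataset.
df = pd.DataFrame({
    "purchase_lead": [12, 45, 3, None, 200],
    "sales_channel": ["Internet", "Mobile", "Internet", "Internet", None],
    "booking_complete": [0, 1, 0, 0, 1],
})

# Structure: number of rows/columns and the dtype of each column.
print(df.shape)
print(df.dtypes)

# Missing values per column.
missing = df.isna().sum()
print(missing)

# Summary statistics for the numeric features.
print(df.describe())

# Share of each class in the target, to check for imbalance.
target_share = df["booking_complete"].value_counts(normalize=True)
print(target_share)
```

Histograms or bar charts of individual columns would be the natural next step for inspecting distributions.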

### Data Preparation

I handled outliers and missing values to ensure the data was clean and ready for modeling. This involved identifying and replacing outliers, as well as encoding categorical variables using one-hot encoding and label encoding.
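The sketch below shows one common way to carry out these steps. The 1.5 × IQR rule for replacing outliers is an assumption (the exact rule used in the notebook is not stated here), and the column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data: one numeric column with an obvious outlier, two categoricals.
df = pd.DataFrame({
    "length_of_stay": [2, 3, 4, 5, 6, 7, 300],
    "sales_channel": ["Internet", "Mobile", "Internet", "Mobile",
                      "Internet", "Internet", "Mobile"],
    "trip_type": ["RoundTrip", "OneWay", "RoundTrip", "RoundTrip",
                  "OneWay", "RoundTrip", "RoundTrip"],
})


def clip_outliers_iqr(series, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the nearest bound."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


df["length_of_stay"] = clip_outliers_iqr(df["length_of_stay"])

# Label encoding: map each category to an integer code (alphabetical order).
df["trip_type_code"] = df["trip_type"].astype("category").cat.codes

# One-hot encoding: one indicator column per category.
encoded = pd.get_dummies(df, columns=["sales_channel"])
print(encoded.head())
```

Label encoding suits binary or ordered categories, while one-hot encoding avoids implying an order between unrelated categories.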

### Feature Engineering

I performed feature engineering to create new features and transform existing ones. This included target encoding for high-cardinality features and creating new features based on domain knowledge.
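Target encoding replaces each category with a (smoothed) average of the target for that category. The sketch below is one simple variant, with a hypothetical `route` column, hypothetical helper names, and an assumed smoothing scheme that pulls rare categories towards the global mean. The encoding is fitted on training rows only, so the target of held-out rows does not leak into the feature.

```python
import pandas as pd


def fit_target_encoding(categories, target, smoothing=10.0):
    """Learn a smoothed mean of the target for each category."""
    global_mean = target.mean()
    stats = target.groupby(categories).agg(["mean", "count"])
    # Weighted blend of the category mean and the global mean.
    encoding = (stats["count"] * stats["mean"] + smoothing * global_mean) / (
        stats["count"] + smoothing
    )
    return encoding, global_mean


def apply_target_encoding(categories, encoding, global_mean):
    """Map categories to their encoded value; unseen ones get the global mean."""
    return categories.map(encoding).fillna(global_mean)


# Hypothetical training data.
train = pd.DataFrame({
    "route": ["AKLKUL", "AKLKUL", "PENTPE", "PENTPE", "PENTPE", "PENTPE"],
    "booking_complete": [1, 0, 1, 1, 1, 0],
})
encoding, global_mean = fit_target_encoding(
    train["route"], train["booking_complete"], smoothing=2.0
)

# Apply to new rows, including a route not seen during fitting.
new_routes = pd.Series(["PENTPE", "UNSEEN"])
encoded_routes = apply_target_encoding(new_routes, encoding, global_mean)
print(encoded_routes)
```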

### Model Training

I split the data into training, validation, and test sets. I used SMOTE to handle class imbalance in the training set. I trained several models, including Logistic Regression and Random Forest, and tuned their hyperparameters using GridSearchCV.
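The workflow above can be sketched end to end on synthetic data. Several things here are assumptions for illustration: the data comes from `make_classification`, the 60/20/20 split ratio and the parameter grid are arbitrary, only Random Forest is tuned for brevity, and `smote_oversample` is a hand-written SMOTE-style interpolation for a binary target rather than the implementation from the imbalanced-learn library.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import NearestNeighbors


def smote_oversample(X, y, minority_label=1, k=5, seed=0):
    """Add synthetic minority rows by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    n_new = int((y != minority_label).sum() - len(X_min))
    if n_new <= 0:
        return X, y
    # k + 1 neighbours because each point is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)            # starting points
    picked = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour each
    gap = rng.random((n_new, 1))                               # position on the segment
    synthetic = X_min[base] + gap * (X_min[picked] - X_min[base])
    return np.vstack([X, synthetic]), np.concatenate([y, np.full(n_new, minority_label)])


# Synthetic, imbalanced stand-in for the booking data.
X, y = make_classification(n_samples=600, n_features=8, weights=[0.85, 0.15],
                           random_state=42)

# 60% train, 20% validation, 20% test, stratified on the target.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=42)

# Oversample the training set only; validation and test stay untouched.
X_res, y_res = smote_oversample(X_train, y_train)

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [5, None]},
    cv=3,
    scoring="f1",
)
search.fit(X_res, y_res)
print(search.best_params_, search.score(X_val, y_val))
```

One caveat with this simple arrangement: because oversampling happens before cross-validation, synthetic rows can end up in the held-out folds, so the cross-validated score is optimistic. Placing the resampling step inside a pipeline that is refitted per fold avoids this.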

### Model Evaluation

I evaluated the models using accuracy and per-class classification reports. I then compared their performance on the validation and test sets to select the best model.
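The two metrics can be computed with scikit-learn as in the sketch below. The labels and predictions are made-up values for illustration, not results from the notebook.

```python
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical true labels and model predictions (1 = booking completed).
y_true = [0, 0, 0, 0, 1, 1, 0, 0, 1, 0]
y_pred = [0, 0, 0, 0, 1, 0, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)

# Per-class precision, recall and F1 as a nested dictionary.
report = classification_report(y_true, y_pred, output_dict=True, zero_division=0)

# Baseline: always predict the majority class.
baseline_accuracy = accuracy_score(y_true, [0] * len(y_true))

print(f"accuracy={accuracy:.2f}, baseline={baseline_accuracy:.2f}")
print(classification_report(y_true, y_pred, zero_division=0))
```

With an imbalanced target, the majority-class baseline already scores well on accuracy, which is why the per-class recall and precision in the report matter when choosing between models.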

### Presentation of Insights

I summarised the findings and model performance within a single PowerPoint slide, including visualisations and key metrics. This slide was prepared using the provided “PowerPoint Template” and submitted for review.