https://github.com/francis-calingo/ibm-capstone-data-science-for-rocket-science


Host: GitHub
URL: https://github.com/francis-calingo/ibm-capstone-data-science-for-rocket-science
Owner: Francis-Calingo
Created: 2025-03-07T22:38:00.000Z (about 2 months ago)
Default Branch: main
Last Pushed: 2025-03-22T23:38:23.000Z (about 1 month ago)
Last Synced: 2025-03-23T00:23:50.658Z (about 1 month ago)
Topics: api, dashboard, data-visualization, data-wrangling, exploratory-data-analysis, folium-maps, geospatial-analysis, machine-learning, presentation, sqlite
Language: Jupyter Notebook
Homepage:
Size: 6.42 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# IBM Data Science Professional Certificate Capstone Project: Using Data Science Concepts to Optimize Rocket Launches

## Table of Contents
* [Introduction](#introduction)
* [Code and Resources Used](#code-and-resources-used)
* [Summary of Methodologies and Results](#summary-of-methodologies-and-results)
  * [Data Collection](#data-collection)
  * [Data Wrangling](#data-wrangling)
  * [Exploratory Data Analysis](#exploratory-data-analysis)
  * [Geospatial Analysis with Folium](#geospatial-analysis-with-folium)
  * [Dashboard Visualization with Plotly Dash](#dashboard-visualization-with-plotly-dash)
  * [Predictive Modelling with Machine Learning](#predictive-modelling-with-machine-learning)
* [Discussion and Next Steps](#discussion-and-next-steps)

## Introduction

Combined several data science techniques to extract and visualize SpaceX launch data and, in the role of a data scientist at the hypothetical aerospace company SpaceY, used predictive modelling to help SpaceY improve the success rate of its rocket launches (and thus save money).

Presented the findings and insights in the attached PowerPoint presentation, modelled on the kind of briefing given to executives and other non-technical audiences.

Part of IBM's Data Science Professional Certificate Program.

Draws on several data science topics:

* Data collection
* Data wrangling
* Exploratory Data Analysis with Pandas and SQL queries
* Data Visualization with Matplotlib and Folium
* Dashboarding with Plotly Dash
* Predictive Modelling with Machine Learning
* Communicating and Presenting to Non-Technical Audiences

## Code and Resources Used

* IDEs used: Google Colab, Jupyter Notebook
* Python version: 3.10.12
* SQL tool: sqlite3
* SQL queries and functions: `SELECT DISTINCT`, `FROM`, `ORDER BY`, `SELECT *`, `WHERE`, `LIMIT`, `SELECT SUM() AS`, `SELECT MIN() AS`, `BETWEEN`, `COUNT(*)`, `GROUP BY`, `SELECT MAX()`, `SELECT substr()`
* Presentation tool: Microsoft 365 PowerPoint
* Libraries and packages:
  * Data collection via API: requests, pandas, numpy
  * Data collection via web scraping: sys, requests, BeautifulSoup, re, unicodedata, pandas
  * Data wrangling: pandas, numpy
  * Exploratory Data Analysis with SQL queries: csv, sqlite3, prettytable
  * Exploratory Data Analysis with Pandas and Matplotlib: pandas, numpy, seaborn, matplotlib
  * Data Visualization with Folium: folium (including MarkerCluster, MousePosition, DivIcon), wget, pandas
  * Dashboarding with Plotly Dash: pandas, dash (including dash_html_components, dash_core_components, Input, Output), plotly.express
  * Predictive Modelling with Machine Learning: pandas, numpy, seaborn, matplotlib, sklearn (including preprocessing, train_test_split, GridSearchCV, LogisticRegression, SVC, DecisionTreeClassifier, KNeighborsClassifier)

## Summary of Methodologies and Results

#### Data Collection

Process (data collection from API):

First, use the `requests` library to request and parse SpaceX launch data from the SpaceX API (https://api.spacexdata.com/v4/rockets/).

Second, use pandas to filter for Falcon 9 launches and handle missing values.
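The filtering step can be sketched with pandas. This is a minimal sketch, not the notebook's code: the toy records stand in for the API response (which would normally be fetched with `requests.get(...).json()`), the column names `BoosterVersion` and `PayloadMass` are assumptions, and filling missing payload masses with the column mean is one common imputation choice among several.

```python
import numpy as np
import pandas as pd


def filter_falcon9(records: list[dict]) -> pd.DataFrame:
    """Flatten launch records, keep Falcon 9 launches, and impute missing payload mass.

    The keys "BoosterVersion" and "PayloadMass" are hypothetical; adapt them to
    the fields actually returned by the API.
    """
    # json_normalize flattens (possibly nested) JSON records into a DataFrame.
    df = pd.json_normalize(records)

    # Keep only Falcon 9 launches (e.g. drop Falcon 1 rows).
    falcon9 = df[df["BoosterVersion"] == "Falcon 9"].copy()

    # Replace missing payload masses with the mean of the known values.
    mean_mass = falcon9["PayloadMass"].mean()
    falcon9["PayloadMass"] = falcon9["PayloadMass"].fillna(mean_mass)
    return falcon9.reset_index(drop=True)


# Toy records standing in for the API response (values are made up).
sample = [
    {"FlightNumber": 1, "BoosterVersion": "Falcon 1", "PayloadMass": 20.0},
    {"FlightNumber": 2, "BoosterVersion": "Falcon 9", "PayloadMass": 500.0},
    {"FlightNumber": 3, "BoosterVersion": "Falcon 9", "PayloadMass": np.nan},
    {"FlightNumber": 4, "BoosterVersion": "Falcon 9", "PayloadMass": 1500.0},
]
launches = filter_falcon9(sample)
print(launches)
```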

Process (data collection from webscraping):

First, use the `requests` library to download the Wikipedia list of Falcon 9 launches (https://en.wikipedia.org/wiki/List_of_Falcon_9_and_Falcon_Heavy_launches) and BeautifulSoup to parse its HTML.

Second, extract all column (variable) names from the HTML table header.

Third, create a data frame by parsing the launch HTML tables.
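The scraping steps can be sketched with BeautifulSoup and pandas. The HTML below is a hand-written, illustrative stand-in for one launch table (the real page would first be downloaded with `requests.get(...).text`, and its tables are messier, with footnotes and merged cells); `clean_header_cell` and `parse_launch_table` are hypothetical helper names, not the author's functions.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hand-written stand-in for one launch table (illustrative values only).
HTML = """
<table class="wikitable">
  <tr>
    <th>Flight No.</th>
    <th>Date and<br>time (UTC)</th>
    <th>Launch site<sup>[a]</sup></th>
    <th>Payload</th>
  </tr>
  <tr><th>1</th><td>4 June 2010</td><td>CCAFS</td><td>Dragon Qualification Unit</td></tr>
  <tr><th>2</th><td>8 December 2010</td><td>CCAFS</td><td>Dragon</td></tr>
</table>
"""


def clean_header_cell(cell) -> str:
    """Return a header cell's visible text without footnote markers."""
    for sup in cell.find_all("sup"):
        sup.decompose()  # drop footnote references such as [a]
    # separator=" " joins the text pieces split by <br> with a single space.
    return cell.get_text(" ", strip=True)


def parse_launch_table(html: str) -> pd.DataFrame:
    """Build a DataFrame from the first 'wikitable' in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    rows = table.find_all("tr")

    # Column names come from the <th> cells of the header row.
    columns = [clean_header_cell(th) for th in rows[0].find_all("th")]

    # Each later row is one launch; its first cell is a <th> flight number.
    records = []
    for row in rows[1:]:
        values = [c.get_text(" ", strip=True) for c in row.find_all(["th", "td"])]
        if len(values) == len(columns):  # skip rows that don't line up with the header
            records.append(dict(zip(columns, values)))
    return pd.DataFrame(records, columns=columns)


launch_df = parse_launch_table(HTML)
print(launch_df)
```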

#### Data Wrangling

Exploratory Data Analysis (EDA) and data cleaning (e.g., checking for null values) were performed on the dataset. The results were used to summarize the number of launches per site, the number of occurrences of each orbit type, and the number of occurrences of each mission outcome. The final step was creating a landing outcome label from the dataset's Outcome column.
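A minimal pandas sketch of these wrangling steps, run on made-up records. The column names `LaunchSite`, `Orbit`, and `Class` are assumptions, and the labelling rule assumes each Outcome value looks like `True ASDS` or `False Ocean`, where the first word records whether the landing succeeded.

```python
import pandas as pd

# Toy launch records (made up) with the kinds of columns the summary refers to.
df = pd.DataFrame(
    {
        "LaunchSite": ["CCAFS SLC-40", "KSC LC-39A", "CCAFS SLC-40", "VAFB SLC-4E", "KSC LC-39A"],
        "Orbit": ["LEO", "GTO", "ISS", "PO", "GTO"],
        "Outcome": ["None None", "True ASDS", "False Ocean", "True RTLS", "False ASDS"],
    }
)

# Data cleaning: count missing values per column.
print(df.isnull().sum())

# Summaries: launches per site, occurrences of each orbit type and of each outcome.
print(df["LaunchSite"].value_counts())
print(df["Orbit"].value_counts())
print(df["Outcome"].value_counts())

# Landing outcome label: 1 if the booster landed successfully, else 0.
# Assumes outcome strings of the form "<True|False|None> <landing type>".
df["Class"] = df["Outcome"].str.startswith("True").astype(int)
print(df[["Outcome", "Class"]])
```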

#### Exploratory Data Analysis

SpaceX uses four launch sites: CCAFS LC-40, CCAFS SLC-40, KSC LC-39A, and VAFB SLC-4E. The success rate at each site improved over time. Launches with payloads over 8,000 kg had a very high success rate across launch sites, as did launches with payloads over 9,000 kg across orbit types. ES-L1, GEO, HEO, and SSO were the most successful orbit types. The total payload mass for NASA launches was 111,268 kg.
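The SQL side of this analysis can be sketched with `sqlite3` against an in-memory table. The table name `launches`, its columns, and its rows are all hypothetical, so the numbers printed are not the project's results; the queries only illustrate clauses from the list under Code and Resources Used (`SELECT DISTINCT`, `ORDER BY`, `SUM() AS`, `WHERE`, `BETWEEN`, `COUNT(*)`, `GROUP BY`). Matching NASA customers with `LIKE '%NASA%'` is an assumption about how the customer field is filtered.

```python
import sqlite3

# In-memory database with a few made-up rows; table and column names are hypothetical.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute(
    "CREATE TABLE launches "
    "(Launch_Site TEXT, Customer TEXT, Payload_Mass_kg REAL, Mission_Outcome TEXT)"
)
cur.executemany(
    "INSERT INTO launches VALUES (?, ?, ?, ?)",
    [
        ("CCAFS LC-40", "NASA (CRS)", 2000.0, "Success"),
        ("KSC LC-39A", "SES", 5300.0, "Failure"),
        ("KSC LC-39A", "NASA (CCP)", 12000.0, "Success"),
        ("VAFB SLC-4E", "Iridium", 9600.0, "Success"),
    ],
)

# Distinct launch sites (SELECT DISTINCT ... ORDER BY).
sites = [row[0] for row in cur.execute(
    "SELECT DISTINCT Launch_Site FROM launches ORDER BY Launch_Site"
)]

# Total payload mass for NASA customers (SUM ... AS with a WHERE filter).
(nasa_total,) = cur.execute(
    "SELECT SUM(Payload_Mass_kg) AS total_mass FROM launches WHERE Customer LIKE '%NASA%'"
).fetchone()

# Sites that launched payloads in a mass range (WHERE ... BETWEEN).
mid_payload_sites = [row[0] for row in cur.execute(
    "SELECT Launch_Site FROM launches WHERE Payload_Mass_kg BETWEEN 4000 AND 6000"
)]

# Launch count per mission outcome (COUNT(*) with GROUP BY).
outcome_counts = dict(cur.execute(
    "SELECT Mission_Outcome, COUNT(*) FROM launches GROUP BY Mission_Outcome"
))

print(sites, nasa_total, mid_payload_sites, outcome_counts)
con.close()
```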

#### Geospatial Analysis with Folium

The sites are concentrated on the coasts of Southern California and Florida, likely for safety reasons: if a launch fails, debris has a better chance of falling into the ocean rather than onto densely populated centres. Despite their relative isolation, the sites have enough infrastructure nearby to sustain their operations.

![image](https://github.com/user-attachments/assets/1bc051b7-b6de-400d-9898-1bc4f6d74933)

![image](https://github.com/user-attachments/assets/9259c91b-430f-4b94-b94c-8eb39b326149)
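Folium draws the map, but statements about what lies near a site rest on distances between coordinates. A common way to compute them is the haversine great-circle formula, sketched below; `haversine_km` is a hypothetical helper, and both coordinate pairs are approximate, illustrative values rather than points taken from the project's maps (in practice they might be read off the map with MousePosition).

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Approximate, illustrative coordinates.
site = (28.5620, -80.5774)       # roughly Cape Canaveral SLC-40
coastline = (28.5637, -80.5716)  # hypothetical nearby coastline point
print(f"Distance to coastline: {haversine_km(*site, *coastline):.2f} km")
```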

#### Dashboard Visualization with Plotly Dash

KSC LC-39A accounts for 41.7% of all successful launches, making it the most successful site by that metric, followed by CCAFS LC-40 with 29.2%.

![image](https://github.com/user-attachments/assets/40185f60-93b2-4261-ae1d-e5dd428d1f1c)
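The percentages above are each site's share of all successful launches. A minimal pandas sketch of that arithmetic, on made-up data where a hypothetical `class` column is 1 for a successful launch and 0 otherwise:

```python
import pandas as pd

# Toy launch records (made up): one row per launch, class = 1 for a successful launch.
df = pd.DataFrame(
    {
        "Launch Site": ["KSC LC-39A", "KSC LC-39A", "CCAFS LC-40",
                        "VAFB SLC-4E", "CCAFS LC-40", "KSC LC-39A"],
        "class": [1, 1, 1, 0, 0, 1],
    }
)

# Successful launches per site (summing the 0/1 label counts successes).
successes = df.groupby("Launch Site")["class"].sum()

# Each site's share of all successful launches, as a percentage.
share = (successes / successes.sum() * 100).round(1)
print(share.sort_values(ascending=False))
```

In the Dash app, a pie chart built with `plotly.express.pie(df, values="class", names="Launch Site")` would display these same shares, because `px.pie` sums the `values` column within each `names` group; the column names remain assumptions.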

#### Predictive Modelling with Machine Learning

Although the Decision Tree Classifier had the lowest test accuracy, it had by far the highest overall accuracy, suggesting that it is the machine learning algorithm SpaceY should deploy to predict successful landings and launches most accurately.

![image](https://github.com/user-attachments/assets/df1c92a6-6770-4093-8cc2-83fecbec77f1)
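The model comparison can be sketched with scikit-learn's `GridSearchCV`. Synthetic data from `make_classification` stands in for the launch features, and the hyperparameter grids, `cv=10`, and the 80/20 split are illustrative assumptions rather than the notebook's settings. The sketch reports two numbers per model: `best_score_`, the mean cross-validated accuracy on the training split, and accuracy on the held-out test split, which is one way a model can rank differently on the two measures.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the launch feature matrix X and landing label Y.
X, Y = make_classification(n_samples=90, n_features=8, random_state=0)

# Hold out 20% of rows for testing, then standardize using training statistics only.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=2)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Small, hypothetical hyperparameter grids for each model.
candidates = {
    "Logistic Regression": (LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 1.0]}),
    "SVM": (SVC(), {"kernel": ["linear", "rbf"], "C": [0.1, 1.0, 10.0]}),
    "Decision Tree": (
        DecisionTreeClassifier(random_state=0),
        {"criterion": ["gini", "entropy"], "max_depth": [2, 4, 8]},
    ),
    "KNN": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
}

results = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=10)  # 10-fold CV on the training split
    search.fit(X_train, Y_train)
    results[name] = {
        # Mean cross-validated accuracy of the best hyperparameters.
        "cv_accuracy": search.best_score_,
        # Accuracy of the refit best model on the held-out rows.
        "test_accuracy": search.score(X_test, Y_test),
    }

for name, r in results.items():
    print(f"{name:20s} cv={r['cv_accuracy']:.3f} test={r['test_accuracy']:.3f}")
```

With only 18 test rows in this sketch, test accuracy moves in steps of 1/18 (about 5.6 percentage points), so a small test set can reorder models by chance.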

## Discussion and Next Steps

This project illustrates that data science is a versatile, cross-sector field with use cases beyond the business world and Big Tech. Here, data parsing and visualization techniques, as well as predictive modelling, helped SpaceY plan successful rocket launches. As next steps, SpaceY could use the insights from this research to establish partnerships with manufacturers and secure government funding.
