https://github.com/mg380/ibm-applied-data-science-capstone
This Capstone is the 10th (final) course in the IBM Data Science Professional Certificate specialization; it summarizes, in the form of a project, the material covered during the specialization.
- Host: GitHub
- URL: https://github.com/mg380/ibm-applied-data-science-capstone
- Owner: mg380
- Created: 2024-08-19T04:40:20.000Z
- Default Branch: main
- Last Pushed: 2024-08-23T04:42:20.000Z
- Last Synced: 2024-09-23T06:02:57.454Z
- Topics: capstone, data, data-analysis, data-science, datascience, ibm, machine-learning, plotly, python, scikit-learn, sql
- Language: Jupyter Notebook
- Size: 6.44 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# IBM Applied Data Science Capstone
This Capstone is the professional certification project for the [IBM Data Science Professional Certificate](https://www.coursera.org/professional-certificates/ibm-data-science) specialization, and summarizes the project materials learned during this specialization.
## Final Presentation
Please refer to the final presentation document for an overview of the task and results. The slides guide the viewer through the different stages of the project and provide links to the appropriate coding tasks: [final presentation (PDF)](https://github.com/mg380/IBM-Applied-Data-Science-Capstone/blob/main/ds-capstone-template-coursera.pdf)
## :page_facing_up: Project Background
SpaceX is the most successful company of the commercial space
age and has made space travel more affordable. The company
advertises Falcon 9 rocket launches on its website at a cost of
62 million dollars; other providers charge upward of 165 million
dollars per launch. Much of the savings comes from SpaceX's ability
to reuse the first stage. Therefore, if we can determine whether the
first stage will land, we can estimate the cost of a launch. Using
public information and machine learning models, we predict whether
SpaceX will reuse the first stage.
## :page_facing_up: Questions to be answered
- How do variables such as payload mass, launch site, number of
flights, and orbits affect the success of the first stage landing?
- Does the rate of successful landings increase over the years?
- What is the best algorithm that can be used for binary classification
in this case?
## :page_facing_up: Methodology
### 1. Data collection methodology
- Using the SpaceX REST API
- Using web scraping from Wikipedia
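As a rough illustration of the API route, the sketch below downloads past launches and flattens each record into a row. The endpoint URL and the field names (`flight_number`, `date_utc`, `launchpad`, `cores`, `landing_success`, `reused`) are assumptions based on version 4 of the public SpaceX API, not details taken from this repository's notebooks. `fetch_past_launches` and `flatten_launch` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

# Assumed endpoint of the public SpaceX REST API (v4); adjust if it differs.
SPACEX_PAST_LAUNCHES_URL = "https://api.spacexdata.com/v4/launches/past"


def fetch_past_launches(url=SPACEX_PAST_LAUNCHES_URL, timeout=30):
    """Download the list of past launches and return the parsed JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def flatten_launch(launch):
    """Reduce one launch record to fields relevant to landing prediction.

    The keys read here are assumptions about the response schema; missing
    keys simply yield None instead of raising.
    """
    cores = launch.get("cores") or [{}]
    first_core = cores[0]
    return {
        "flight_number": launch.get("flight_number"),
        "date_utc": launch.get("date_utc"),
        "launchpad": launch.get("launchpad"),
        "landing_success": first_core.get("landing_success"),
        "core_reused": first_core.get("reused"),
    }
```

With network access, `rows = [flatten_launch(l) for l in fetch_past_launches()]` would produce one dictionary per launch, ready to load into a table for wrangling.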
### 2. Performed data wrangling
- Filtering the data
- Dealing with missing values
- Using one-hot encoding to prepare the data for binary classification
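The three wrangling steps above could look roughly like the following pandas sketch. The sample rows and column names (`BoosterVersion`, `PayloadMass`, `LaunchSite`, `Orbit`, `Landed`) are hypothetical and chosen for illustration; filling gaps with the column mean is one possible strategy, assumed here rather than confirmed by the project.

```python
import pandas as pd

# Hypothetical sample rows; not real launch data.
raw = pd.DataFrame(
    {
        "BoosterVersion": ["Falcon 1", "Falcon 9", "Falcon 9", "Falcon 9", "Falcon 9"],
        "PayloadMass": [20.0, 6000.0, None, 3000.0, 4500.0],
        "LaunchSite": ["Kwajalein", "CCAFS SLC 40", "KSC LC 39A", "VAFB SLC 4E", "CCAFS SLC 40"],
        "Orbit": ["LEO", "GTO", "LEO", "ISS", "GTO"],
        "Landed": [False, True, False, True, True],
    }
)

# 1. Filter: keep only the rows of interest (here, Falcon 9 launches).
launches = raw[raw["BoosterVersion"] == "Falcon 9"].reset_index(drop=True)

# 2. Missing values: fill gaps in payload mass with the column mean.
launches["PayloadMass"] = launches["PayloadMass"].fillna(launches["PayloadMass"].mean())

# 3. One-hot encode the categorical columns into 0/1 indicator columns.
features = pd.get_dummies(
    launches[["PayloadMass", "LaunchSite", "Orbit"]],
    columns=["LaunchSite", "Orbit"],
    dtype=float,
)

# Binary target for classification: 1 if the first stage landed, else 0.
target = launches["Landed"].astype(int)

print(features.columns.tolist())
```

Each category value becomes its own column (for example `Orbit_GTO`), so the resulting feature table is fully numeric.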
### 3. Performed exploratory data analysis (EDA) using visualization and SQL
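As a small example of the SQL side of this step, the sketch below computes the landing rate per year, which relates to the question of whether successful landings increase over time. It uses an in-memory SQLite database as a stand-in for whatever database the project used; the table name, columns, and rows are hypothetical.

```python
import sqlite3

# Hypothetical rows (year, launch_site, landed); not real launch data.
rows = [
    (2015, "CCAFS SLC 40", 0),
    (2015, "CCAFS SLC 40", 1),
    (2016, "VAFB SLC 4E", 1),
    (2016, "CCAFS SLC 40", 0),
    (2016, "CCAFS SLC 40", 1),
    (2017, "KSC LC 39A", 1),
    (2017, "KSC LC 39A", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE launches (year INTEGER, launch_site TEXT, landed INTEGER)")
conn.executemany("INSERT INTO launches VALUES (?, ?, ?)", rows)

# AVG over a 0/1 column gives the fraction of successful landings.
yearly_rates = conn.execute(
    """
    SELECT year, COUNT(*) AS launches, AVG(landed) AS landing_rate
    FROM launches
    GROUP BY year
    ORDER BY year
    """
).fetchall()
conn.close()

for year, n_launches, rate in yearly_rates:
    print(f"{year}: {n_launches} launches, landing rate {rate:.2f}")
```

Swapping `year` for `launch_site` in the `SELECT` and `GROUP BY` clauses would give the same breakdown per launch site.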
### 4. Performed interactive visual analytics using Folium and Plotly Dash
### 5. Performed predictive analysis using classification models
- Building, tuning, and evaluating classification models to find the
best-performing one
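A minimal sketch of the build, tune, and evaluate loop with scikit-learn follows. It runs on synthetic data from `make_classification` as a stand-in for the encoded launch features, and it uses logistic regression with a small grid over `C` purely as an example; the README does not state which models or hyperparameters the project compared.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded launch features and landing labels.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Build: scale the features, then fit a classifier.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Tune: cross-validated grid search over the regularization strength.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Evaluate: compare cross-validated accuracy with held-out accuracy.
print("best params:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
print("held-out accuracy:", round(search.score(X_test, y_test), 3))
```

Repeating the same pattern with other estimators and comparing their held-out scores is one way to address the question of which algorithm suits this binary classification task best.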