https://github.com/dmarks84/coursework_project_airfoil-noise-prediction

Project for IBM Data Engineering & Python course on ML & AI -- Created predictions for noise of an airfoil based on various physical features
apache-spark api automation data-modeling etl linear-algebra numpy pandas pipelines python regression statistics supervised-ml

## Project (Project_Airfoil-Noise-Prediction)
### Part of the Coursera series: IBM Data Engineering & Python

## Summary
I took on the role of data engineer at a fictional aeronautics consulting company that prides itself on efficiently designing airfoils for use in planes and sports cars. The data scientists in the office work with different algorithms and with data in different formats. While they are good at machine learning, they counted on me to handle ETL jobs and build ML pipelines. In this project I used a modified version of the NASA Airfoil Self-Noise dataset. I cleaned the dataset by dropping duplicate rows and removing rows with null values. I then built an ML pipeline to train a model that predicts SoundLevel from all the other columns. I evaluated the model and then persisted it for future use. The steps were:
- Part 1: Perform ETL activity
  - Load a CSV dataset
  - Remove duplicates, if any
  - Drop rows with null values, if any
  - Make transformations
  - Store the cleaned data in Parquet format
- Part 2: Create a Machine Learning Pipeline
  - Create a machine learning pipeline for prediction
- Part 3: Evaluate the Model
  - Evaluate the model using relevant metrics
- Part 4: Persist the Model
  - Save the model for future production use
  - Load and verify the stored model
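
The cleaning steps in Part 1 and the evaluation step in Part 3 can be illustrated with a small, library-free sketch. This is not the project's code: the project lists Apache Spark among its tools, while the sketch below uses only the Python standard library so that the logic is easy to follow. The sample rows, the `Frequency` and `AngleOfAttack` column names, and all function names are assumptions made for illustration; only `SoundLevel` is named in the summary above. The choice of RMSE, MAE, and R² is also an assumption, since the summary says only "relevant metrics".

```python
import csv
import io
import math

# Hypothetical sample data. Only SoundLevel is named in the README;
# the other column names and all values are made up for illustration.
RAW_CSV = """Frequency,AngleOfAttack,SoundLevel
800,0.0,126.2
800,0.0,126.2
1000,0.0,125.2
1250,,125.9
1600,1.5,127.6
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def drop_nulls(rows):
    """Discard rows in which any field is missing or empty."""
    return [r for r in rows if all(v not in (None, "") for v in r.values())]


def regression_metrics(actual, predicted):
    """Return RMSE, MAE and R^2 for paired actual/predicted values.

    Assumes the actual values are not all identical (otherwise the
    total sum of squares is zero and R^2 is undefined).
    """
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - (mse * n) / ss_tot
    return {"rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Part 1 (sketch): load, de-duplicate, then drop incomplete rows.
cleaned = drop_nulls(drop_duplicates(load_rows(RAW_CSV)))
```

In a Spark-based version, the two cleaning helpers would typically correspond to `DataFrame.dropDuplicates()` and `DataFrame.dropna()`, and the metrics would usually come from an evaluator rather than hand-written formulas. The hand-written forms here are only meant to show what those steps compute.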

## Skills (Developed & Applied)
Programming, Python, Statistics, Linear Algebra, NumPy, Pandas, ETL/ELT & Data Pipelines, Apache Spark, Automation, APIs, Data Modeling, Data Summarization, Regression, Supervised ML