https://github.com/rishisankineni/machine-learning-pipeline-lr-pyspark
Power Plant ML Pipeline Application - Apache Spark
- Host: GitHub
- URL: https://github.com/rishisankineni/machine-learning-pipeline-lr-pyspark
- Owner: RishiSankineni
- Created: 2016-10-09T19:33:48.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2016-12-12T01:06:26.000Z (about 8 years ago)
- Last Synced: 2024-06-24T23:28:35.695Z (6 months ago)
- Topics: apache-spark, edx-course, pyspark
- Language: Jupyter Notebook
- Homepage:
- Size: 43 KB
- Stars: 12
- Watchers: 4
- Forks: 7
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# MLPipeline-Lab1-EdX
# ![Spark Logo](http://spark-mooc.github.io/web-assets/images/ta_Spark-logo-small.png) + ![Python Logo](http://spark-mooc.github.io/web-assets/images/python-logo-master-v3-TM-flattened_small.png)
Power Plant Machine Learning Pipeline Application - edX Lab 1 - Big Data Analysis with Apache Spark
This notebook is an end-to-end exercise in performing Extract-Transform-Load (ETL) and Exploratory Data Analysis (EDA) on a real-world dataset, then applying several machine learning algorithms to solve a supervised regression problem on that dataset.
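As a rough illustration of the ETL and EDA steps described above, the sketch below uses only the Python standard library in place of Spark. It assumes the dataset's columns are AT (ambient temperature), V (exhaust vacuum), AP (ambient pressure), RH (relative humidity) and PE (net power output), as in the UCI Combined Cycle Power Plant dataset; this README does not list the columns, and the sample rows are invented. The hypothetical `describe()` helper loosely mirrors the count/mean/stddev/min/max summary that Spark's `DataFrame.describe()` produces.

```python
import csv
import io
import statistics

# Invented sample rows (NOT real plant readings) in an assumed column layout:
# AT = ambient temperature, V = exhaust vacuum, AP = ambient pressure,
# RH = relative humidity, PE = net electrical power output (the label).
RAW_CSV = """AT,V,AP,RH,PE
10.0,40.0,1015.0,80.0,475.0
20.0,55.0,1012.0,70.0,450.0
30.0,70.0,1008.0,50.0,430.0
not_a_number,60.0,1010.0,65.0,440.0
25.0,65.0,1010.0,60.0,440.0
"""


def extract_transform(text):
    """Extract rows from CSV text and cast every field to float.

    Rows with a missing or non-numeric value are dropped, a simple
    stand-in for the cleaning step of an ETL pipeline.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        try:
            rows.append({key: float(value) for key, value in record.items()})
        except (TypeError, ValueError):
            continue  # skip the malformed row
    return rows


def describe(rows):
    """Per-column count, mean, sample stdev, min and max for exploratory analysis."""
    summary = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        summary[column] = {
            "count": len(values),
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


if __name__ == "__main__":
    table = extract_transform(RAW_CSV)
    for column, stats in describe(table).items():
        print(column, {name: round(value, 2) for name, value in stats.items()})
```

The row with `not_a_number` in the AT column is dropped during the transform step, so the summary covers the four well-formed rows.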
**This notebook covers:**
* *Part 1: Business Understanding*
* *Part 2: Load Your Data*
* *Part 3: Explore Your Data*
* *Part 4: Visualize Your Data*
* *Part 5: Data Preparation*
* *Part 6: Data Modeling*
* *Part 7: Tuning and Evaluation*

*Our goal is to accurately predict power output given a set of environmental readings from various sensors in a natural gas-fired power generation plant.*
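To make the modeling goal concrete, here is a minimal plain-Python sketch of the split, fit and evaluate steps that Parts 5 to 7 loosely correspond to. It is not the notebook's code: it swaps Spark ML for ordinary least squares solved through the normal equations, generates synthetic rows from invented coefficients (not real plant data), and assumes the UCI Combined Cycle Power Plant column names (AT, V, AP, RH as features, PE as the label). All function names are hypothetical. In Spark, the closest counterparts would be `DataFrame.randomSplit`, `pyspark.ml.regression.LinearRegression` and `pyspark.ml.evaluation.RegressionEvaluator` with `metricName="rmse"`.

```python
import math
import random

FEATURES = ["AT", "V", "AP", "RH"]  # assumed feature columns
LABEL = "PE"                        # assumed label column (power output)
# Invented coefficients used only to generate synthetic data.
TRUE_INTERCEPT = 450.0
TRUE_WEIGHTS = [-2.0, -0.3, 0.05, -0.15]


def make_synthetic_rows(n, seed=0):
    """Generate n synthetic rows where PE is linear in the features plus N(0, 1) noise."""
    rng = random.Random(seed)
    ranges = {"AT": (2.0, 35.0), "V": (25.0, 80.0), "AP": (990.0, 1035.0), "RH": (25.0, 100.0)}
    rows = []
    for _ in range(n):
        row = {f: rng.uniform(*ranges[f]) for f in FEATURES}
        signal = TRUE_INTERCEPT + sum(w * row[f] for w, f in zip(TRUE_WEIGHTS, FEATURES))
        row[LABEL] = signal + rng.gauss(0.0, 1.0)
        rows.append(row)
    return rows


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and cut it into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_linear_regression(rows, features, label):
    """Ordinary least squares: solve (X^T X) w = X^T y by Gaussian elimination."""
    X = [[1.0] + [row[f] for f in features] for row in rows]  # leading 1.0 = intercept
    y = [row[label] for row in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * t for x, t in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: swap in the row with the largest entry in this column.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * k
    for i in range(k - 1, -1, -1):
        w[i] = (A[i][k] - sum(A[i][j] * w[j] for j in range(i + 1, k))) / A[i][i]
    return w  # w[0] is the intercept; w[1:] line up with `features`


def predict(weights, row, features):
    """Linear prediction: intercept plus weighted sum of the feature values."""
    return weights[0] + sum(w * row[f] for w, f in zip(weights[1:], features))


def rmse(weights, rows, features, label):
    """Root mean squared error of the model's predictions on `rows`."""
    squared = [(predict(weights, r, features) - r[label]) ** 2 for r in rows]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    train, test = train_test_split(make_synthetic_rows(500))
    weights = fit_linear_regression(train, FEATURES, LABEL)
    print("intercept:", round(weights[0], 2))
    for name, w in zip(FEATURES, weights[1:]):
        print(name, round(w, 3))
    print("test RMSE:", round(rmse(weights, test, FEATURES, LABEL), 3))
```

Because the synthetic labels come from a known linear function plus unit-variance noise, the fitted feature weights should land close to `TRUE_WEIGHTS` and the held-out RMSE near 1; on real sensor readings neither outcome is guaranteed, which is where the tuning and evaluation in Part 7 come in.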