https://github.com/dominodatalab/reference-project-wind-turbine



# Wind Turbine Output Prediction using SCADA data

## License
This template is licensed under Apache 2.0 and contains the following open source components:
* scikit-learn [BSD 3](https://github.com/scikit-learn/scikit-learn/blob/main/COPYING)
* pandas [BSD 3](https://github.com/pandas-dev/pandas/blob/main/LICENSE)
* matplotlib [Matplotlib license](https://matplotlib.org/stable/users/project/license.html)

## Context
In this project we train a predictive model on Supervisory Control and Data Acquisition (SCADA) data captured from a physical wind turbine. SCADA systems are used for controlling, monitoring, and analyzing industrial devices and processes. The SCADA concept was developed as a universal means of remote access to a variety of local control modules, which may come from different manufacturers, through standard automation protocols.

Here we demonstrate how to train a machine learning model on a freely available SCADA dataset from [Kaggle](https://www.kaggle.com/datasets/berkerisen/wind-turbine-scada-dataset).

## Dataset
The samples in this dataset are distributed as a CSV file with the following attributes:

* Date/Time --- The timestamp of the observation (10-minute intervals)
* LV ActivePower (kW) --- The amount of power generated by the turbine at that timestamp (in kW)
* Wind Speed (m/s) --- The wind speed as measured at the hub height of the turbine
* Theoretical_Power_Curve (KWh) --- The theoretical power that the turbine generates at that wind speed, as provided by the turbine manufacturer
* Wind Direction (degrees) --- The wind direction at the hub height of the turbine (the turbine turns in this direction automatically)
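
As a rough illustration of the schema above, the following sketch loads a few rows shaped like `data/T1.csv` with pandas. The sample values are made up, the short column names are hypothetical, and the `%d %m %Y %H:%M` timestamp format is an assumption about how the file encodes `Date/Time`; adjust it if the actual file differs.

```python
import io

import pandas as pd

# Made-up rows that mimic the column layout described above (not real measurements).
SAMPLE_CSV = """Date/Time,LV ActivePower (kW),Wind Speed (m/s),Theoretical_Power_Curve (KWh),Wind Direction (degrees)
01 01 2018 00:00,380.0,5.3,416.3,259.9
01 01 2018 00:10,453.7,5.6,519.9,268.6
01 01 2018 00:20,306.3,5.2,390.9,272.5
"""

# Hypothetical short names, assigned by position so that the exact header
# spelling in the file does not matter.
COLUMNS = [
    "timestamp",
    "active_power_kw",
    "wind_speed_ms",
    "theoretical_power",
    "wind_direction_deg",
]


def load_scada(source):
    """Read SCADA rows and return a DataFrame indexed by timestamp."""
    # header=0 together with names= replaces the header row in the file.
    df = pd.read_csv(source, header=0, names=COLUMNS)
    # Assumed timestamp layout: day month year hour:minute.
    df["timestamp"] = pd.to_datetime(df["timestamp"], format="%d %m %Y %H:%M")
    return df.set_index("timestamp").sort_index()


df = load_scada(io.StringIO(SAMPLE_CSV))

# Spacing between consecutive observations; 10 minutes for the sample rows.
step = df.index.to_series().diff().dropna()
print(df.dtypes)
print(step.unique())
```

Passing a path such as `data/T1.csv` instead of the in-memory sample should work the same way, provided the file has the same five columns in the same order.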

## Assets
This project contains the following assets:

* ```WindTurbineScada.ipynb``` --- a notebook demonstrating data ingestion, exploratory data analysis, model building, and evaluation
* ```train.py``` --- a model training script, which can be run as a [Domino job](https://docs.dominodatalab.com/en/latest/user_guide/942549/jobs/) to retrain the model (e.g. when new data is available)
* ```score.py``` --- a scoring function, which can be deployed as a [Domino Model API](https://docs.dominodatalab.com/en/latest/user_guide/8dbc91/deploy-models-at-rest/)
* ```model.bin``` --- a pickled version of a pre-trained ```ExtraTreesRegressor``` model
* ```data/T1.csv``` --- the original dataset
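
The split between a training script and a scoring function can be pictured with the sketch below. It is not the code from this repository: the feature choice (wind speed and wind direction), the hyperparameters, the function names `train_model` and `predict`, and the synthetic data are all assumptions made for illustration. Only the use of a pickled `ExtraTreesRegressor` comes from the asset list above.

```python
import pickle

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor


def train_model(X, y):
    """Fit an ExtraTreesRegressor (hyperparameters are illustrative only)."""
    model = ExtraTreesRegressor(n_estimators=50, random_state=0)
    model.fit(X, y)
    return model


def predict(model, wind_speed, wind_direction):
    """Hypothetical scoring function: one observation in, one prediction out."""
    features = np.array([[wind_speed, wind_direction]])
    # Return a plain dict of built-in types so the result is JSON-serializable.
    return {"prediction": float(model.predict(features)[0])}


# Synthetic stand-in data (not the Kaggle dataset): power grows with the cube
# of wind speed and is capped at an arbitrary 3600 kW.
rng = np.random.default_rng(0)
wind_speed = rng.uniform(0, 25, size=500)
wind_direction = rng.uniform(0, 360, size=500)
power_kw = np.clip(3.0 * wind_speed**3, 0, 3600)

X = np.column_stack([wind_speed, wind_direction])
model = train_model(X, power_kw)

# Round-trip through pickle, standing in for the file that a training job
# would write and a scoring function would read.
restored = pickle.loads(pickle.dumps(model))
result = predict(restored, wind_speed=8.0, wind_direction=180.0)
print(result)
```

In a file-based setup, `pickle.dump(model, f)` with a file opened in binary write mode would take the place of the in-memory round trip, and the scoring side would call `pickle.load` once at import time rather than on every request.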

### Hardware Requirements
This project works with a standard small-sized hardware tier, such as the small-k8s tier on all Domino deployments.

### Environment Requirements
This project can be run with a Domino Standard Compute Environment that has Python 3.9 or above.