https://github.com/compcode1/energy-output-prediction
Develop a predictive model to forecast the hourly electrical energy output of a Combined Cycle Power Plant (CCPP) based on ambient environmental conditions.
https://github.com/compcode1/energy-output-prediction
machine-learning random-forest xgboost
Last synced: 6 months ago
JSON representation
Develop a predictive model to forecast the hourly electrical energy output of a Combined Cycle Power Plant (CCPP) based on ambient environmental conditions.
- Host: GitHub
- URL: https://github.com/compcode1/energy-output-prediction
- Owner: Compcode1
- License: gpl-3.0
- Created: 2024-11-11T02:29:39.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-11T02:36:44.000Z (over 1 year ago)
- Last Synced: 2024-11-11T03:26:17.679Z (over 1 year ago)
- Topics: machine-learning, random-forest, xgboost
- Language: Jupyter Notebook
- Homepage:
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
In this project, we chose Random Forest and XGBoost as our primary algorithms for predicting the energy output of a Combined Cycle Power Plant (CCPP) based on environmental conditions. These models were selected due to their effectiveness with regression tasks and their ability to handle complex, nonlinear relationships within the data.
Random Forest: This ensemble method is known for its robustness and simplicity. It reduces overfitting by averaging the predictions of multiple decision trees, each trained on different data samples.
XGBoost: This gradient boosting algorithm improves prediction accuracy through an iterative process, correcting errors from previous trees. It is highly efficient and offers powerful hyperparameter tuning capabilities, making it well-suited for this regression task.
The task type is regression, as we aim to predict a continuous output (electrical energy in MW) based on the features Temperature (AT), Exhaust Vacuum (V), Ambient Pressure (AP), and Relative Humidity (RH).
6.2.2 Model Building
To determine the best model for this task, we followed a systematic model-building approach:
Baseline Models: Both Random Forest and XGBoost models were initially trained with default settings to establish baseline performance.
Validation Strategy: We split the data into training, validation, and test sets with an 80-10-10 split. This setup allowed us to tune the models on the validation set and test them on unseen data for final performance evaluation.
Hyperparameter Tuning:
For Random Forest, we used a grid search to tune parameters like n_estimators, max_depth, and min_samples_split.
For XGBoost, we used grid search to optimize n_estimators, max_depth, learning_rate, and colsample_bytree.
Model Comparison: After tuning, we compared both models based on performance metrics on the validation set. XGBoost showed slightly better performance, so it was selected as the final model.
6.2.3 Model Evaluation
The final XGBoost model was evaluated on the test set using the following metrics:
Mean Absolute Error (MAE): 2.02 MW
Root Mean Squared Error (RMSE): 2.97 MW
R-squared (R²): 0.970
These metrics confirm the model’s ability to generalize well on new data. The low MAE and RMSE values indicate high accuracy, while the high R² value (97.0%) suggests that the model captures most of the variance in energy output.
6.2.4 Model Interpretation
To interpret the model’s predictions, we conducted a feature importance analysis using SHAP (SHapley Additive exPlanations). SHAP values helped us understand the impact of each feature on the model's predictions:
Temperature (AT): The most influential feature, where higher temperatures decrease predicted power output, while lower temperatures increase it.
Exhaust Vacuum (V): The second most impactful feature, with higher values reducing power output predictions.
Relative Humidity (RH): Shows a non-linear impact, suggesting a more complex relationship with power output.
Ambient Pressure (AP): Has a smaller impact on predictions, with a narrow spread of SHAP values.
Implications for CCPP Energy Optimization
The feature importance analysis provides actionable insights for optimizing power output at a Combined Cycle Power Plant. For example:
Monitoring Temperature and Exhaust Vacuum closely can help predict power output levels, allowing the plant to adjust operating parameters in response to environmental conditions.
Understanding the effect of Relative Humidity and Ambient Pressure enables more nuanced adjustments, potentially helping the plant to maintain stable power output across different weather conditions.
This model interpretation improves the explainability of our predictions and highlights critical environmental factors affecting power output, supporting the power plant's goal of optimizing energy production efficiency.