https://github.com/hinzy97/spark-dynamic-executor-time-prediction
Neural Network Models for Predicting Execution Time with Dynamic Executor Allocation in Apache Spark.
- Host: GitHub
- URL: https://github.com/hinzy97/spark-dynamic-executor-time-prediction
- Owner: hinzy97
- License: other
- Created: 2025-07-15T07:54:17.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-07-24T05:43:59.000Z (10 months ago)
- Last Synced: 2025-09-05T13:27:04.885Z (9 months ago)
- Topics: apache-spark, big-data-analytics, deep-learning, distributed-computing, dynamic-allocation, execution-time-prediction, machine-learning, neural-networks, performance-modeling, spark
- Language: Jupyter Notebook
- Homepage: https://www.researchgate.net/publication/381108033_Execution_Time_Prediction_Model_that_Considers_Dynamic_Allocation_of_Spark_Executors#fullTextFileContent
- Size: 146 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
README
# NN Execution Time Prediction
This repository contains neural network models for predicting execution time of Spark applications, based on the paper:
**Tariq, H., & Das, O. (2023). Execution Time Prediction Model that Considers Dynamic Allocation of Spark Executors.**
Published in: *EPEW/ASMTA 2023, Lecture Notes in Computer Science (LNCS)*.
DOI: https://doi.org/10.1007/978-3-031-43185-2_23
---
## 🧠 What It Does
The goal is to accurately predict the **total runtime** of Spark applications affected by dynamic executor behavior.
Two types of neural network models are implemented:
- **Black-box model**: Uses only three selected input features: 'Datasize', 'IdleTimeout', and 'BacklogTimeout'.
- **White-box model**: Uses detailed stage-level features, including task metrics and executor timelines, along with 'Datasize', 'IdleTimeout', and 'BacklogTimeout'.
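The feature names 'IdleTimeout' and 'BacklogTimeout' are not defined in this README. A plausible reading, stated here as an assumption, is that they correspond to Spark's dynamic allocation settings `spark.dynamicAllocation.executorIdleTimeout` and `spark.dynamicAllocation.schedulerBacklogTimeout`. Under that assumption, one configuration point of the experiment could be expressed as follows. The helper names `dynamic_allocation_conf` and `to_submit_args` are hypothetical and do not come from the repository.

```python
# Sketch only: the mapping from the README's feature names to Spark
# properties is an assumption, not something the repository states.

def dynamic_allocation_conf(idle_timeout_s, backlog_timeout_s):
    """Return Spark properties for one dynamic-allocation configuration."""
    return {
        "spark.dynamicAllocation.enabled": "true",
        # Assumed counterpart of 'IdleTimeout': how long an executor may sit
        # idle before Spark releases it.
        "spark.dynamicAllocation.executorIdleTimeout": f"{idle_timeout_s}s",
        # Assumed counterpart of 'BacklogTimeout': how long tasks may stay
        # pending before Spark requests more executors.
        "spark.dynamicAllocation.schedulerBacklogTimeout": f"{backlog_timeout_s}s",
    }


def to_submit_args(conf):
    """Flatten a property dict into spark-submit style --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


conf = dynamic_allocation_conf(idle_timeout_s=60, backlog_timeout_s=1)
print(" ".join(to_submit_args(conf)))
```

Each training sample would then pair one such configuration (plus the data size) with the runtime measured for that run.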
Workloads include:
- **TPC-DS SQL queries**: Q26, Q52, Q70
- **KMeans clustering**
## Structure
- `km_nn_blackbox.txt`: NN model using black-box features for KMeans.
- `km_nn_whitebox.txt`: NN model using white-box features for KMeans.
- `query26_nn_blackbox.txt`: NN model using black-box features for Query-26.
- `query26_nn_whitebox.txt`: NN model using white-box features for Query-26.
- `q52_NN_black box.ipynb`: NN model using black-box features for Query-52.
- `q52_NN_whitebox.ipynb`: NN model using white-box features for Query-52.
- `q70_NN_black box.ipynb`: NN model using black-box features for Query-70.
- `q70_NN_whitebox.ipynb`: NN model using white-box features for Query-70.
- `kmeansdata.csv`: Input data for KMeans models.
- `query26_train_blackbox.csv`: Black-box feature data for Query-26.
- `query26_train_whitebox.csv`: White-box feature data for Query-26.
- `query52train.csv`: Black-box feature data for Query-52.
- `query52train1.csv`: White-box feature data for Query-52.
- `query70train.csv`: Black-box feature data for Query-70.
- `query70train1.csv`: White-box feature data for Query-70.
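As a rough illustration of how one of the CSV files listed above might be split into model inputs and a target, here is a minimal sketch using only the standard library. The three black-box column names come from this README. The target column name `ExecutionTime`, the helper `load_xy`, and the sample rows are hypothetical, so the real headers should be checked before reuse.

```python
import csv
import io

BLACKBOX_FEATURES = ["Datasize", "IdleTimeout", "BacklogTimeout"]
TARGET = "ExecutionTime"  # hypothetical name for the measured-runtime column


def load_xy(csv_file, feature_names, target=TARGET):
    """Read an open CSV file and return (X, y) as lists of floats."""
    X, y = [], []
    for row in csv.DictReader(csv_file):
        X.append([float(row[name]) for name in feature_names])
        y.append(float(row[target]))
    return X, y


# Made-up rows standing in for a file such as query52train.csv.
sample = io.StringIO(
    "Datasize,IdleTimeout,BacklogTimeout,ExecutionTime\n"
    "10,60,1,120.5\n"
    "20,30,2,210.0\n"
)
X, y = load_xy(sample, BLACKBOX_FEATURES)
print(X, y)
```

With a real file, the same function would be called as `with open("query52train.csv", newline="") as f: X, y = load_xy(f, BLACKBOX_FEATURES)`, provided the header names match.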
## How to Run
1. Open one of the Jupyter notebooks inside the `NN` folder.
2. Run all cells of the notebook to view predictions and plots.
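For readers who want a feel for the black-box setup without opening the notebooks, the sketch below trains a very small neural network on three inputs and reports the error before and after training. It is not the architecture used in this repository: the single tanh hidden layer, the plain stochastic gradient descent loop, the hyperparameters, and the synthetic data produced by `fake_runtime` are all assumptions chosen to keep the example dependency-free.

```python
import math
import random

random.seed(0)


def fake_runtime(datasize, idle, backlog):
    """Invented runtime formula, used only to generate toy training data."""
    return 5.0 * datasize + 0.2 * idle + 3.0 * backlog


# Toy samples of (Datasize, IdleTimeout, BacklogTimeout); all values are made up.
rows = [(random.uniform(1, 10), random.uniform(10, 120), random.uniform(1, 10))
        for _ in range(60)]
X = [[d / 10, i / 120, b / 10] for d, i, b in rows]  # scale inputs to about [0, 1]
y = [fake_runtime(*r) / 100 for r in rows]           # scale the target similarly

H = 8  # number of hidden units (arbitrary choice)
W1 = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(H)]
b1 = [0.0] * H
W2 = [random.uniform(-0.5, 0.5) for _ in range(H)]
b2 = 0.0


def forward(x):
    """Return hidden activations and the scalar prediction for one sample."""
    h = [math.tanh(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j]) for j in range(H)]
    return h, sum(W2[j] * h[j] for j in range(H)) + b2


def mse():
    """Mean squared error over the whole toy data set."""
    return sum((forward(x)[1] - t) ** 2 for x, t in zip(X, y)) / len(X)


loss_before = mse()
lr = 0.05
for epoch in range(200):
    for x, t in zip(X, y):
        h, pred = forward(x)
        err = pred - t  # gradient of 0.5 * (pred - t)^2 with respect to pred
        for j in range(H):
            grad_h = err * W2[j] * (1 - h[j] ** 2)  # backpropagate through tanh
            W2[j] -= lr * err * h[j]
            for k in range(3):
                W1[j][k] -= lr * grad_h * x[k]
            b1[j] -= lr * grad_h
        b2 -= lr * err
loss_after = mse()
print(f"MSE before training: {loss_before:.4f}, after: {loss_after:.4f}")
```

A white-box variant would differ only in the width of the input vector, since the stage-level features are added alongside the three used here.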
---
## 🔧 Future Work
- Integration with Spark UI for real-time feature extraction
- Coupling Dynamic Allocation Model (DAM) with an **optimization framework** for executor recommendation
- Extending DAM for **multi-job workloads** or streaming scenarios
---
## 📢 Citation
If you use this code or build upon it, please cite the original paper:
```
@inproceedings{tariq2023execution,
title={Execution Time Prediction Model that Considers Dynamic Allocation of Spark Executors},
author={Tariq, Hina and Das, Olivia},
booktitle={Computer Performance Engineering (EPEW/ASMTA)},
series={Lecture Notes in Computer Science},
volume={14231},
pages={340--352},
year={2023},
publisher={Springer},
doi={10.1007/978-3-031-43185-2_23}
}
```
---