{"id":29616594,"url":"https://github.com/hinzy97/spark-dynamic-executor-time-prediction","last_synced_at":"2026-05-18T06:38:07.368Z","repository":{"id":304798130,"uuid":"1020011294","full_name":"hinzy97/spark-dynamic-executor-time-prediction","owner":"hinzy97","description":"Neural Network Models for Predicting Execution Time with Dynamic Executor Allocation in Apache Spark.","archived":false,"fork":false,"pushed_at":"2025-07-24T05:43:59.000Z","size":150,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-05T13:27:04.885Z","etag":null,"topics":["apache-spark","big-data-analytics","deep-learning","distributed-computing","dynamic-allocation","execution-time-prediction","machine-learning","neural-networks","performance-modeling","spark"],"latest_commit_sha":null,"homepage":"https://www.researchgate.net/publication/381108033_Execution_Time_Prediction_Model_that_Considers_Dynamic_Allocation_of_Spark_Executors#fullTextFileContent","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hinzy97.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-07-15T07:54:17.000Z","updated_at":"2025-07-24T05:44:03.000Z","dependencies_parsed_at":"2025-07-15T18:34:37.712Z","dependency_job_id":"49ce2da1-2307-4a0f-b32a-d206e2f31292","html_url":"https://github.com/hinzy97/spark-dynamic-executor-time-prediction","commit_stats":null,"previous_names":["hinzy97/spark-dynamic-executor-time-prediction"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/hinzy97/spark-dynamic-executor-time-prediction","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hinzy97%2Fspark-dynamic-executor-time-prediction","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hinzy97%2Fspark-dynamic-executor-time-prediction/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hinzy97%2Fspark-dynamic-executor-time-prediction/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hinzy97%2Fspark-dynamic-executor-time-prediction/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hinzy97","download_url":"https://codeload.github.com/hinzy97/spark-dynamic-executor-time-prediction/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hinzy97%2Fspark-dynamic-executor-time-prediction/sbom","scorecard":null,"host"
:{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33167830,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-18T05:43:36.989Z","status":"ssl_error","status_checked_at":"2026-05-18T05:43:19.133Z","response_time":71,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-spark","big-data-analytics","deep-learning","distributed-computing","dynamic-allocation","execution-time-prediction","machine-learning","neural-networks","performance-modeling","spark"],"created_at":"2025-07-21T01:01:37.252Z","updated_at":"2026-05-18T06:38:07.349Z","avatar_url":"https://github.com/hinzy97.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# NN Execution Time Prediction\n\nThis repository contains neural network models for predicting execution time of Spark applications, based on the paper:\n\n**Tariq, H., \u0026 Das, O. (2023). Execution Time Prediction Model that Considers Dynamic Allocation of Spark Executors.**  \nPublished in: *EPEW/ASMTA 2023, Lecture Notes in Computer Science (LNCS)*.  
\nDOI: https://doi.org/10.1007/978-3-031-43185-2_23\n\n---\n\n## 🧠 What It Does\nThe goal is to accurately predict the **total runtime** of Spark applications affected by dynamic executor behavior.\nTwo types of neural network models are implemented:\n- **Black-box model**: Uses only the input features 'Datasize', 'IdleTimeout', and 'BacklogTimeout'.\n- **White-box model**: Uses detailed stage-level features, including task metrics and executor timelines, along with 'Datasize', 'IdleTimeout', and 'BacklogTimeout'.\n\nWorkloads include:\n\n- **TPC-DS SQL queries**: Q26, Q52, Q70\n- **KMeans clustering**\n\n\n\n## Structure\n\n- `km_nn_blackbox.txt`: NN model using blackbox features for KMeans.\n- `km_nn_whitebox.txt`: NN model using whitebox features for KMeans.\n- `query26_nn_blackbox.txt`: NN model using blackbox features for Query-26.\n- `query26_nn_whitebox.txt`: NN model using whitebox features for Query-26.\n- `q52_NN_black box.ipynb`: Blackbox NN model for Query-52.\n- `q52_NN_whitebox.ipynb`: Whitebox NN model for Query-52.\n- `q70_NN_black box.ipynb`: Blackbox NN model for Query-70.\n- `q70_NN_whitebox.ipynb`: Whitebox NN model for Query-70.\n- `kmeansdata.csv`: Input data for KMeans models.\n- `query26_train_blackbox.csv`: Blackbox feature data for Query-26.\n- `query26_train_whitebox.csv`: Whitebox feature data for Query-26.\n- `query52train.csv`: Blackbox feature data for Query-52.\n- `query52train1.csv`: Whitebox feature data for Query-52.\n- `query70train.csv`: Blackbox feature data for Query-70.\n- `query70train1.csv`: Whitebox feature data for Query-70.\n\n\n## How to Run\n\n1. Open a Jupyter notebook inside the `NN` folder.\n2. 
Run the notebook to view predictions and plots.\n\n---\n\n## 🔧 Future Work\n\n- Integration with Spark UI for real-time feature extraction\n- Coupling Dynamic Allocation Model (DAM) with an **optimization framework** for executor recommendation\n- Extending DAM for **multi-job workloads** or streaming scenarios\n\n---\n\nIf you use this code or build upon it, please cite the original paper:\n\n---\n\n## 📢 Citation\n\n```\n@inproceedings{tariq2023execution,\n  title={Execution Time Prediction Model that Considers Dynamic Allocation of Spark Executors},\n  author={Tariq, Hina and Das, Olivia},\n  booktitle={Computer Performance Engineering (EPEW/ASMTA)},\n  series={Lecture Notes in Computer Science},\n  volume={14231},\n  pages={340--352},\n  year={2023},\n  publisher={Springer},\n  doi={10.1007/978-3-031-43185-2_23}\n}\n```\n---\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhinzy97%2Fspark-dynamic-executor-time-prediction","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhinzy97%2Fspark-dynamic-executor-time-prediction","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhinzy97%2Fspark-dynamic-executor-time-prediction/lists"}