https://github.com/mgarralda/spark-self-tuning-framework
A Self-Tuning Framework for Cost-Aware Apache Spark Configuration
- Host: GitHub
- URL: https://github.com/mgarralda/spark-self-tuning-framework
- Owner: mgarralda
- License: other
- Created: 2025-10-18T05:58:58.000Z
- Default Branch: main
- Last Pushed: 2025-10-28T08:08:01.000Z
- Last Synced: 2025-10-28T10:08:29.168Z
- Topics: bayesian-optimization, big-data, metaheuristic-optimisation, spark-tuning, transfer-learning
- Language: Python
- Homepage: https://doi.org/10.1016/j.future.2025.107730
- Size: 5.75 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
README
# Spark Self-Tuning Framework (STL–ILS–TS–BO)
Implementation of the framework proposed in
**“A hybrid metaheuristics–Bayesian Optimization framework with safe transfer learning for continuous Spark tuning”**
(*Future Generation Computer Systems*, 2025).
DOI: https://doi.org/10.1016/j.future.2025.108325
---
## 🧠 Overview
The **Spark Self-Tuning Framework** provides continuous and adaptive optimization of Apache Spark configurations by combining:
- **Bayesian Optimization (BO)** with a custom acquisition function (`LCB`)
- **Compositional surrogate models** for performance and uncertainty estimation
- **Iterated Local Search + Tabu Search (ILS–TS)** for guided exploration and local refinement
- **Safe Transfer Learning (STL-PARN)** to reuse historical workload executions
- **Baseline implementations**: *Garralda*, *TurBO*, *YORO*, and *Naïve BO*
This framework enables cost-aware, knowledge-driven configuration tuning for complex Spark workloads.
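To make the acquisition step in the list above more concrete, here is a minimal sketch of how a lower confidence bound can rank candidate configurations. It assumes `LCB` denotes the standard form `mean - kappa * std` for a minimisation objective such as runtime or cost; the repository describes its acquisition function as custom, so the actual implementation may differ. The function names, the `kappa` default, the configuration keys, and the prediction values are all hypothetical and are not taken from the repository.

```python
from typing import Dict, List, Tuple

# Hypothetical record: (configuration, predicted mean, predictive std).
Prediction = Tuple[Dict[str, int], float, float]


def lower_confidence_bound(mean: float, std: float, kappa: float = 2.0) -> float:
    """Standard LCB for minimisation: a lower score is more promising."""
    if std < 0:
        raise ValueError("std must be non-negative")
    return mean - kappa * std


def select_next(candidates: List[Prediction], kappa: float = 2.0) -> Dict[str, int]:
    """Return the configuration with the smallest LCB score."""
    best_cfg, _, _ = min(
        candidates,
        key=lambda c: lower_confidence_bound(c[1], c[2], kappa),
    )
    return best_cfg


# Made-up surrogate outputs (runtime in seconds), for illustration only.
predictions: List[Prediction] = [
    ({"executor_cores": 2, "executor_memory_gb": 4}, 120.0, 5.0),
    ({"executor_cores": 4, "executor_memory_gb": 8}, 110.0, 2.0),
    ({"executor_cores": 8, "executor_memory_gb": 16}, 115.0, 20.0),
]

chosen = select_next(predictions, kappa=2.0)
print(chosen)
```

With `kappa=2.0` the third candidate wins because its large uncertainty lowers its score (115 - 40 = 75), which is the exploratory behaviour. Setting `kappa=0.0` reduces the rule to picking the lowest predicted mean.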
---
## 📁 Project Structure
```
project-root/
├── src/                           # Core framework
│   └── framework/
│       ├── proposed/              # Main optimization method
│       ├── metaheuristics/        # Tabu + ILS modules
│       ├── bayesian_optimization/
│       └── safe_transfer_learning/
├── src_resources/                 # Experiment runners
├── resources/                     # Datasets & results
```
---
## 📊 Data
Experimental data and benchmarks are provided under:
```
resources/
├── dataset/
│   ├── historical_dataset.json
│   ├── lhs_initialization.json
├── experiment_results/
│   ├── performance_model/
│   ├── optimization_model/
```
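As a starting point for exploring these files, the sketch below loads a JSON file and reports only its top-level shape. The README does not document the schema, so the code makes no assumption about field names. The helper names `load_json` and `describe` are hypothetical, and the paths assume the script is run from the repository root.

```python
import json
from pathlib import Path
from typing import Any


def load_json(path: Path) -> Any:
    """Parse a JSON file and return the resulting Python object."""
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def describe(data: Any) -> str:
    """Summarise the top-level structure without assuming a schema."""
    if isinstance(data, dict):
        return f"object with {len(data)} top-level key(s)"
    if isinstance(data, list):
        return f"array with {len(data)} item(s)"
    return type(data).__name__


if __name__ == "__main__":
    # File names come from the layout above; existence is checked first.
    dataset_dir = Path("resources") / "dataset"
    for name in ("historical_dataset.json", "lhs_initialization.json"):
        path = dataset_dir / name
        if path.exists():
            print(f"{name}: {describe(load_json(path))}")
        else:
            print(f"{name}: not found (run from the repository root)")
```

Inspecting the shape first is a cheap way to decide how to iterate over the records before writing any analysis code against them.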
---
## 📜 License
This project is released under the following terms:
- **CC BY-NC 4.0** for academic and research use
- **Commercial use is not allowed.**
Any use of this software or its derivatives for commercial purposes is strictly prohibited.
Distributed on an “AS IS” basis, without warranties or conditions of any kind.
See the [LICENSE](LICENSE) file for details.
---
## 📚 Citation
If you use this framework, its methodology, infrastructure, datasets, or derived components in research, benchmarking studies, technical documentation, or industrial reports, please cite the associated article and/or doctoral thesis.
### Article
```bibtex
@article{GarraldaBarrio2025,
title = {A hybrid metaheuristics–{Bayesian} optimization framework with safe transfer learning for continuous {Spark} tuning},
author = {Mariano Garralda-Barrio and Carlos Eiras-Franco and Verónica Bolón-Canedo},
journal = {Future Generation Computer Systems},
pages = {108325},
year = {2025},
issn = {0167-739X},
doi = {10.1016/j.future.2025.108325},
publisher = {Elsevier},
note = {Code available at \url{https://github.com/mgarralda/spark-self-tuning-framework}},
keywords = {Performance modeling, Big data, Machine learning, Apache Spark, Distributed computing}
}
```
### Doctoral Thesis
```bibtex
@phdthesis{GarraldaBarrio2026,
author = {Mariano Garralda Barrio},
title = {AI-Driven Optimization in Distributed Computing Systems: A Self-Tuning Framework},
school = {University of Coruña},
year = {2026},
type = {Doctoral Thesis},
url = {https://hdl.handle.net/2183/48114}
}
```
### References
- Garralda-Barrio, M., Eiras-Franco, C., & Bolón-Canedo, V. (2025).
*A hybrid metaheuristics–Bayesian optimization framework with safe transfer learning for continuous Spark tuning*.
Future Generation Computer Systems.
https://doi.org/10.1016/j.future.2025.108325
- Garralda Barrio, M. (2026).
*AI-Driven Optimization in Distributed Computing Systems: A Self-Tuning Framework*.
Doctoral Thesis, University of Coruña.
https://hdl.handle.net/2183/48114
---
## 📬 Contact
For questions, collaborations, or feedback, please contact:
**Mariano Garralda**
[mariano.garralda@udc.es](mailto:mariano.garralda@udc.es)
Universidade da Coruña (UDC)
---