https://github.com/davidandw190/data-locality-scheduler
A work-in-progress, Knative-compatible, K8s scheduler designed for data-intensive workflows across geo-distributed, heterogeneous edge-cloud infrastructures, optimizing data transfer costs against computational migration overhead through adaptive locality-aware placement decisions.
- Host: GitHub
- URL: https://github.com/davidandw190/data-locality-scheduler
- Owner: davidandw190
- Created: 2025-03-07T11:47:43.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-15T17:17:38.000Z (12 months ago)
- Last Synced: 2025-06-15T17:25:44.419Z (12 months ago)
- Topics: cloud-native, data-locality, edge-cloud, edge-computing, faas, go, k8s, knative, kubernetes, scheduler
- Language: Python
- Homepage:
- Size: 5.3 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
# Data-Locality-Aware Kubernetes Scheduler
> **Bachelor's Thesis Project**
> *Data-Locality-Aware Scheduling for Serverless Containerized Workflows across the Edge-Cloud Continuum*
> **Author:** Andrei-David Nan
> **Supervisor:** Lect. Dr. Adrian Spătaru
> **Institution:** West University of Timișoara, Faculty of Mathematics and Computer Science
## Overview
This project implements a data-locality-aware scheduler extension for Kubernetes that optimizes containerized workload placement across edge-cloud environments. The scheduler uses a Multi-Criteria Decision Making (MCDM) algorithm to balance resource availability, node capabilities, and data locality when making scheduling decisions.
## Architecture
The system consists of three main components:
### 1. Data-Locality-Aware Scheduler (`cmd/scheduler`)
- Extends Kubernetes scheduler with MCDM algorithm
- Maintains Bandwidth Graph and Data Index for locality decisions
- Registers as alternative scheduler (`schedulerName: "data-locality-scheduler"`)
- Implements dynamic priority function weight adjustment based on workload classification
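To illustrate how a Bandwidth Graph and a Data Index can feed a locality decision, here is a minimal Python sketch. It is not the project's implementation: the type aliases, the `transfer_seconds` and `best_node` helpers, the node names, and the 1 MB/s fallback for unknown links are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical structures: link bandwidth in MB/s between node pairs, and an
# index mapping each data object to (node holding it, size in MB).
BandwidthGraph = Dict[Tuple[str, str], float]
DataIndex = Dict[str, Tuple[str, float]]


def transfer_seconds(node: str, inputs: List[str],
                     index: DataIndex, graph: BandwidthGraph) -> float:
    """Estimate the seconds needed to pull every input object onto `node`."""
    total = 0.0
    for obj in inputs:
        holder, size_mb = index[obj]
        if holder == node:
            continue  # object is already local, no transfer needed
        # Assumed: links are symmetric, and an unknown link falls back to 1 MB/s.
        bw = graph.get((holder, node)) or graph.get((node, holder)) or 1.0
        total += size_mb / bw
    return total


def best_node(candidates: List[str], inputs: List[str],
              index: DataIndex, graph: BandwidthGraph) -> str:
    """Pick the candidate with the lowest estimated transfer time."""
    return min(candidates, key=lambda n: transfer_seconds(n, inputs, index, graph))


# Example data (made up): two objects stored on different nodes.
index = {"frames.tar": ("edge-1", 800.0), "model.bin": ("cloud-1", 200.0)}
graph = {("edge-1", "cloud-1"): 10.0,
         ("edge-1", "edge-2"): 50.0,
         ("cloud-1", "edge-2"): 20.0}
inputs = ["frames.tar", "model.bin"]

chosen = best_node(["edge-1", "edge-2", "cloud-1"], inputs, index, graph)
print(chosen)
```

In this example the node that already holds the larger object wins, because moving the smaller object over the slow link is still cheaper than moving the larger one anywhere.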
### 2. Node Capability Daemon (`cmd/node-daemon`)
- Runs as DaemonSet across all cluster nodes
- Detects hardware capabilities and storage characteristics
- Updates node labels with capability information
- Discovers local data objects (MinIO buckets, volumes) and works with the scheduler's Storage Index, which tracks where data items are placed
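The step of turning detected capabilities into node labels could look roughly like the sketch below. The `example.com` label prefix, the capability names, and the `capability_labels` helper are hypothetical; the label keys the daemon actually writes are not listed in this README. The only external rule relied on is the Kubernetes label-value syntax (at most 63 characters, alphanumerics plus `-`, `_`, `.`, starting and ending with an alphanumeric).

```python
import re

# Hypothetical prefix; the project's real label keys are not shown in this README.
LABEL_PREFIX = "example.com"


def capability_labels(caps: dict) -> dict:
    """Turn detected capabilities into Kubernetes-style node labels."""
    labels = {}
    for key, value in caps.items():
        if isinstance(value, bool):
            text = "true" if value else "false"
        else:
            text = str(value)
        # Replace characters that are not valid in a label value, cap the
        # length at 63, and trim separators from both ends.
        text = re.sub(r"[^A-Za-z0-9._-]", "-", text)[:63].strip("-_.")
        labels[f"{LABEL_PREFIX}/{key}"] = text
    return labels


labels = capability_labels({"gpu": True, "storage-type": "nvme ssd", "cpu-cores": 8})
for name, value in labels.items():
    print(f"{name}={value}")
```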
### 3. Knative Integration Webhook (`integration/knative`)
- Intercepts Knative service deployments
- Enriches serverless functions with scheduling annotations
- Enables data-locality optimization for FaaS workloads
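As a rough picture of what "enriching" a Knative Service might involve, the sketch below adds a scheduler name and a data-input annotation to a Service manifest held as a Python dictionary. The project's own webhook code is not shown here; the `enrich_service` helper and the `example.com/data-inputs` annotation key are hypothetical, and only the scheduler name is taken from this README.

```python
import copy

SCHEDULER_NAME = "data-locality-scheduler"      # scheduler name used in this README
INPUTS_ANNOTATION = "example.com/data-inputs"   # hypothetical annotation key


def enrich_service(service: dict, data_inputs: list) -> dict:
    """Return a copy of a Knative Service manifest with scheduling hints added."""
    enriched = copy.deepcopy(service)  # leave the caller's manifest untouched
    template = enriched.setdefault("spec", {}).setdefault("template", {})
    annotations = template.setdefault("metadata", {}).setdefault("annotations", {})
    # Record which data objects the function reads (assumed comma-separated).
    annotations[INPUTS_ANNOTATION] = ",".join(data_inputs)
    # Route the revision's pods to the custom scheduler.
    template.setdefault("spec", {})["schedulerName"] = SCHEDULER_NAME
    return enriched


service = {
    "apiVersion": "serving.knative.dev/v1",
    "kind": "Service",
    "metadata": {"name": "resize"},
    "spec": {"template": {"spec": {"containers": [{"image": "resize:latest"}]}}},
}
enriched = enrich_service(service, ["images/raw", "images/thumbs"])
print(enriched["spec"]["template"]["spec"]["schedulerName"])
```

A real admission webhook would return this change as a patch in its admission response rather than as a full object; the dictionary form is used here only to keep the example self-contained.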
## Project Structure
```text
├── cmd/
│   ├── node-daemon/         # Node capability detection daemon
│   └── scheduler/           # Data-locality-aware scheduler
├── pkg/
│   ├── daemon/              # Node daemon implementation
│   ├── scheduler/           # Scheduler algorithms and logic
│   └── storage/             # Storage abstraction and indexing
├── deployments/
│   ├── 00-core/             # Scheduler and daemon deployments
│   ├── 01-storage/          # MinIO storage setup
│   ├── 02-test/             # Test workload definitions
│   └── 03-validation/       # Validation and stress tests
├── benchmarks/
├── integration/knative/     # Knative serverless integration
├── config/                  # Configuration files
└── build/                   # Docker build files
```
## Quick Start
### Prerequisites
- Kubernetes cluster (v1.23+)
- `kubectl` configured
- Docker
### Clone the Repository
```bash
# Clone the data locality scheduler project
git clone https://github.com/davidandw190/data-locality-scheduler.git
cd data-locality-scheduler/
# Navigate to the benchmarking framework
cd benchmarks/simulated
```
### Install Dependencies
```bash
# Install Python dependencies
pip install -r requirements.txt
# Check framework help
python benchmark_runner.py --help
```
### Basic Deployment
1. **Deploy core components:**
```bash
kubectl apply -f deployments/00-core/
```
2. **Set up storage (optional):**
```bash
kubectl apply -f deployments/01-storage/
```
### Using the Scheduler
Specify the scheduler in your pod spec:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-workload
spec:
  schedulerName: "data-locality-scheduler"
  containers:
    - name: app
      image: my-app:latest
```
## Configuration
The scheduler behavior can be customized via `config/scheduler-config.yaml`:
```yaml
scheduler:
  weights:
    default:
      resource: 0.20
      affinity: 0.10
      nodeType: 0.15
      capabilities: 0.15
      dataLocality: 0.40
    dataIntensive:
      dataLocality: 0.70
      resource: 0.10
```
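The sketch below shows one way such weights could drive a placement decision. It uses a plain weighted sum, which is only one of several MCDM methods, and it assumes that a profile such as `dataIntensive` is overlaid on the defaults and the result rescaled to sum to 1. This README does not state how the scheduler combines profiles or which MCDM method it uses, so treat both as assumptions. The `effective_weights` and `score` helpers and the per-node scores are made up for the example.

```python
# Weights copied from the configuration example in this README.
DEFAULT = {"resource": 0.20, "affinity": 0.10, "nodeType": 0.15,
           "capabilities": 0.15, "dataLocality": 0.40}
DATA_INTENSIVE = {"dataLocality": 0.70, "resource": 0.10}


def effective_weights(default: dict, override: dict) -> dict:
    """Overlay a profile on the defaults, then rescale so the weights sum to 1."""
    merged = {**default, **override}
    total = sum(merged.values())
    return {name: w / total for name, w in merged.items()}


def score(node_scores: dict, weights: dict) -> float:
    """Weighted sum of per-criterion scores, each assumed to lie in [0, 1]."""
    return sum(weights[name] * node_scores.get(name, 0.0) for name in weights)


# Made-up per-criterion scores: the edge node holds the data, the cloud node
# has more free resources and better capabilities.
nodes = {
    "edge": {"resource": 0.2, "affinity": 0.5, "nodeType": 0.5,
             "capabilities": 0.5, "dataLocality": 0.9},
    "cloud": {"resource": 1.0, "affinity": 0.5, "nodeType": 1.0,
              "capabilities": 1.0, "dataLocality": 0.3},
}

for label, profile in (("default", {}), ("dataIntensive", DATA_INTENSIVE)):
    weights = effective_weights(DEFAULT, profile)
    winner = max(nodes, key=lambda n: score(nodes[n], weights))
    print(label, "->", winner)
```

With these numbers the preferred node changes between the two profiles, which is the effect the larger `dataLocality` weight is meant to have for data-heavy workloads.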
## Knative Integration
For serverless workloads using Knative:
1. **Deploy the webhook**:
```bash
kubectl apply -f integration/knative/manifests/
```
2. **Deploy Knative services normally** - they will automatically use the data-locality scheduler:
```bash
kubectl apply -f - <