https://github.com/davidandw190/data-locality-scheduler

A work-in-progress, Knative-compatible, K8s scheduler designed for data-intensive workflows across geo-distributed, heterogeneous edge-cloud infrastructures, optimizing data transfer costs against computational migration overhead through adaptive locality-aware placement decisions.
cloud-native data-locality edge-cloud edge-computing faas go k8s knative kubernetes scheduler


# Data-Locality-Aware Kubernetes Scheduler

> **Bachelor's Thesis Project**
> *Data-Locality-Aware Scheduling for Serverless Containerized Workflows across the Edge-Cloud Continuum*
> **Author:** Andrei-David Nan
> **Supervisor:** Lect. Dr. Adrian Spătaru
> **Institution:** West University of Timișoara, Faculty of Mathematics and Computer Science

## Overview

This project implements a data-locality-aware scheduler extension for Kubernetes that optimizes containerized workload placement across edge-cloud environments. The scheduler uses a Multi-Criteria Decision Making (MCDM) algorithm to balance resource availability, node capabilities, and data locality when making scheduling decisions.

## Architecture

The system consists of three main components:

### 1. Data-Locality-Aware Scheduler (`cmd/scheduler`)
- Extends Kubernetes scheduler with MCDM algorithm
- Maintains Bandwidth Graph and Data Index for locality decisions
- Registers as alternative scheduler (`schedulerName: "data-locality-scheduler"`)
- Implements dynamic priority function weight adjustment based on workload classification
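
To illustrate how a Bandwidth Graph and a Data Index could combine into a locality estimate, the sketch below computes the time needed to move a pod's input objects to each candidate node. It is written in Python for brevity and is not the scheduler's own code; `BandwidthGraph`, `DataIndex`, `estimate_transfer_seconds`, the node names, and all figures are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BandwidthGraph:
    """Hypothetical directed graph of measured bandwidth between nodes (bytes/s)."""
    links: dict = field(default_factory=dict)

    def set_link(self, src, dst, bytes_per_sec):
        self.links[(src, dst)] = bytes_per_sec

    def bandwidth(self, src, dst):
        # Data already on the target node needs no network transfer.
        if src == dst:
            return float("inf")
        return self.links.get((src, dst), 0.0)


@dataclass
class DataIndex:
    """Hypothetical index: object key -> (node holding it, size in bytes)."""
    objects: dict = field(default_factory=dict)

    def add(self, key, node, size_bytes):
        self.objects[key] = (node, size_bytes)


def estimate_transfer_seconds(inputs, target, index, graph):
    """Sum the estimated time to move each input object to `target`."""
    total = 0.0
    for key in inputs:
        node, size = index.objects[key]
        bw = graph.bandwidth(node, target)
        if bw == 0.0:
            return float("inf")  # no known link: treat the node as unreachable
        total += size / bw
    return total


graph = BandwidthGraph()
graph.set_link("cloud-1", "edge-1", 10_000_000)  # 10 MB/s downlink
graph.set_link("edge-1", "cloud-1", 5_000_000)   # 5 MB/s uplink

index = DataIndex()
index.add("bucket/frames.tar", "edge-1", 100_000_000)  # 100 MB stored on edge-1

inputs = ["bucket/frames.tar"]
costs = {node: estimate_transfer_seconds(inputs, node, index, graph)
         for node in ("edge-1", "cloud-1")}
best = min(costs, key=costs.get)
print(best, costs)
```

A lower estimate means better data locality for that node. A real scheduler would presumably normalize this value before combining it with the other criteria.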

### 2. Node Capability Daemon (`cmd/node-daemon`)
- Runs as DaemonSet across all cluster nodes
- Detects hardware capabilities and storage characteristics
- Updates node labels with capability information
- Discovers local data objects (MinIO buckets, volumes) and works with the scheduler's Storage Index, which tracks data item placements
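
As a rough illustration of the labeling step, the sketch below turns a dictionary of detected capabilities into node labels. The `example.com` prefix, the capability names, and both helper functions are hypothetical and do not reflect the labels the daemon actually writes. The sanitizing rule follows the Kubernetes label value syntax (at most 63 characters, alphanumerics plus `-`, `_`, and `.`, with an alphanumeric character at each end).

```python
import re

LABEL_PREFIX = "example.com"  # hypothetical prefix, not the project's real one


def sanitize_label_value(value):
    """Coerce a value into a valid Kubernetes label value."""
    # Replace disallowed characters, then enforce the 63-character limit.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "-", str(value))[:63]
    # Values must start and end with an alphanumeric character.
    return cleaned.strip("-_.")


def capability_labels(capabilities):
    """Turn a flat dict of detected capabilities into node labels."""
    labels = {}
    for name, value in capabilities.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        labels[f"{LABEL_PREFIX}/{name}"] = sanitize_label_value(value)
    return labels


# Hypothetical detection result for one edge node.
detected = {
    "node-type": "edge",
    "storage-type": "NVMe SSD",
    "gpu": False,
    "local-buckets": 3,
}
labels = capability_labels(detected)
print(labels)
```

Each resulting key/value pair could then be applied to the node object, for example with `kubectl label node <node-name> key=value`.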

### 3. Knative Integration Webhook (`integration/knative`)
- Intercepts Knative service deployments
- Enriches serverless functions with scheduling annotations
- Enables data-locality optimization for FaaS workloads
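
The enrichment step can be pictured as a mutating admission webhook that returns a JSON Patch. The sketch below assumes the webhook acts on the pods that Knative creates and that each pod already carries a `metadata` object; the annotation key `example.com/input-data` and the helper names are hypothetical. The `AdmissionReview` response shape (`patchType: JSONPatch` with a base64-encoded patch) is the standard Kubernetes one.

```python
import base64
import json

SCHEDULER_NAME = "data-locality-scheduler"
INPUT_ANNOTATION = "example.com/input-data"  # hypothetical annotation key


def escape_pointer(token):
    """Escape a key for use in a JSON Pointer path (RFC 6901)."""
    return token.replace("~", "~0").replace("/", "~1")


def build_patch(pod, input_refs):
    """Return JSON Patch operations that route the pod to the scheduler
    and record its input data references as an annotation."""
    ops = [{"op": "add", "path": "/spec/schedulerName", "value": SCHEDULER_NAME}]
    if "annotations" not in pod.get("metadata", {}):
        # The parent object must exist before a child key can be added.
        ops.append({"op": "add", "path": "/metadata/annotations", "value": {}})
    ops.append({
        "op": "add",
        "path": "/metadata/annotations/" + escape_pointer(INPUT_ANNOTATION),
        "value": ",".join(input_refs),
    })
    return ops


def admission_response(uid, ops):
    """Wrap the patch in an AdmissionReview response body."""
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": uid,
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": base64.b64encode(json.dumps(ops).encode()).decode(),
        },
    }


pod = {"metadata": {"name": "fn"}, "spec": {"containers": []}}
ops = build_patch(pod, ["minio/input/a.csv", "minio/input/b.csv"])
response = admission_response("request-uid", ops)
print(json.dumps(ops, indent=2))
```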

## Project Structure

```text
├── cmd/
│   ├── node-daemon/        # Node capability detection daemon
│   └── scheduler/          # Data-locality-aware scheduler
├── pkg/
│   ├── daemon/             # Node daemon implementation
│   ├── scheduler/          # Scheduler algorithms and logic
│   └── storage/            # Storage abstraction and indexing
├── deployments/
│   ├── 00-core/            # Scheduler and daemon deployments
│   ├── 01-storage/         # MinIO storage setup
│   ├── 02-test/            # Test workload definitions
│   └── 03-validation/      # Validation and stress tests
├── benchmarks/
├── integration/knative/    # Knative serverless integration
├── config/                 # Configuration files
└── build/                  # Docker build files
```

## Quick Start

### Prerequisites

- Kubernetes cluster (v1.23+)
- `kubectl` configured
- Docker

### Clone the Repository

```bash
# Clone the data locality scheduler project
git clone https://github.com/davidandw190/data-locality-scheduler
cd data-locality-scheduler/

# Navigate to the benchmarking framework
cd benchmarks/simulated
```

### Install Dependencies

```bash
# Install Python dependencies
pip install -r requirements.txt

# Check framework help
python benchmark_runner.py --help
```

### Basic Deployment

1. **Deploy core components** (from the repository root):

```bash
kubectl apply -f deployments/00-core/
```
2. **Set up storage (optional)**:

```bash
kubectl apply -f deployments/01-storage/
```

### Using the Scheduler

Specify the scheduler in your pod spec:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-workload
spec:
  schedulerName: "data-locality-scheduler"
  containers:
    - name: app
      image: my-app:latest
```

## Configuration

The scheduler behavior can be customized via `config/scheduler-config.yaml`:

```yaml
scheduler:
  weights:
    default:
      resource: 0.20
      affinity: 0.10
      nodeType: 0.15
      capabilities: 0.15
      dataLocality: 0.40
    dataIntensive:
      dataLocality: 0.70
      resource: 0.10
```
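
To show how such weights might feed a multi-criteria score, the sketch below computes a weighted sum per node under both profiles. It is an illustration in Python, not the scheduler's algorithm. In particular, the rule that rescales the unspecified `dataIntensive` weights so the total stays at 1.0 is an assumption, and the per-criterion node scores are invented.

```python
DEFAULT_WEIGHTS = {
    "resource": 0.20,
    "affinity": 0.10,
    "nodeType": 0.15,
    "capabilities": 0.15,
    "dataLocality": 0.40,
}
DATA_INTENSIVE_OVERRIDES = {"dataLocality": 0.70, "resource": 0.10}


def effective_weights(default, overrides):
    """Apply overrides, then rescale the untouched weights so the total is 1.0.
    The rescaling rule is an assumption made for this sketch."""
    fixed = sum(overrides.values())
    rest = {k: v for k, v in default.items() if k not in overrides}
    rest_total = sum(rest.values())
    scale = (1.0 - fixed) / rest_total if rest_total else 0.0
    weights = {k: v * scale for k, v in rest.items()}
    weights.update(overrides)
    return weights


def node_score(criteria, weights):
    """Weighted sum of per-criterion scores, each expected in [0, 1]."""
    return sum(weights[name] * criteria.get(name, 0.0) for name in weights)


# Invented scores: the edge node holds the data, the cloud node has more capacity.
edge = {"resource": 0.2, "affinity": 0.5, "nodeType": 0.3,
        "capabilities": 0.3, "dataLocality": 1.0}
cloud = {"resource": 1.0, "affinity": 0.5, "nodeType": 1.0,
         "capabilities": 1.0, "dataLocality": 0.2}

di_weights = effective_weights(DEFAULT_WEIGHTS, DATA_INTENSIVE_OVERRIDES)
for label, weights in (("default", DEFAULT_WEIGHTS), ("dataIntensive", di_weights)):
    print(label, "edge:", round(node_score(edge, weights), 3),
          "cloud:", round(node_score(cloud, weights), 3))
```

With these invented inputs, the default profile favors the cloud node, while the data-intensive profile shifts the decision to the edge node that already holds the data.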

## Knative Integration

For serverless workloads using Knative:

1. **Deploy the webhook**:

```bash
kubectl apply -f integration/knative/manifests/
```
2. **Deploy Knative services normally** - they will automatically use the data-locality scheduler:

```bash
kubectl apply -f - <