
# ![HKube](https://user-images.githubusercontent.com/27515937/59049270-4cffa000-8890-11e9-8281-4aa97b1ecca3.png)

> HKube is a cloud-native open source framework for running **[distributed](https://en.wikipedia.org/wiki/Distributed_computing) pipelines of algorithms** on [Kubernetes](https://kubernetes.io/).
>
> HKube optimally **utilizes** a pipeline's resources, based on **user priorities** and **[heuristics](https://en.wikipedia.org/wiki/Heuristic)**.

## Features

- **Distributed pipeline of algorithms**

  - Receives a [DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph) as input and automatically parallelizes your algorithms over the cluster.
  - Manages the complications of distributed processing, keeping your code simple (even single threaded).

- **Language Agnostic** - As a container-based framework, HKube is designed to facilitate writing your algorithm in any language.

- **Batch Algorithms** - Run algorithms as a batch - multiple instances of the same algorithm - in order to accelerate running time.

- **Optimize Hardware Utilization**

  - Containers are **automatically** placed based on their resource requirements and other constraints, without sacrificing availability.
  - Mixes critical and best-effort workloads in order to **drive up utilization** and save resources.
  - **Efficient execution** and scheduling by heuristics that combine pipeline and algorithm metrics with user requirements.

- **Build API** - Just upload your code; you **don't have to worry** about building containers and integrating them with the HKube API.

- **Cluster Debugging**

  - Debug a **part of a pipeline** based on previous results.
  - Debug a **single algorithm** in your IDE, while the rest of the algorithms run in the cluster.

- **Jupyter Integration** - Scale your [Jupyter](https://jupyter.org/) tasks with HKube.

## User Guide

- [Installation](#installation)
  - [Dependencies](#dependencies)
  - [Helm](#helm)
- [APIs](#apis)
  - [UI Dashboard](#ui-dashboard)
  - [REST API](#rest-api)
  - [CLI](#cli)
- [API Usage Example](#api-usage-example)
  - [The Problem](#the-problem)
  - [Solution](#solution)
    - [Range Algorithm](#range-algorithm)
    - [Multiply Algorithm](#multiply-algorithm)
    - [Reduce Algorithm](#reduce-algorithm)
  - [Building a Pipeline](#building-a-pipeline)
    - [Pipeline Descriptor](#pipeline-descriptor)
    - [Node dependencies](#node-dependencies)
    - [JSON Breakdown](#json-breakdown)
    - [Advanced Options](#advanced-options)
  - [Algorithm](#algorithm)
    - [Implementing the Algorithms](#implementing-the-algorithms)
      - [Range (Python)](#range-python)
      - [Multiply (Python)](#multiply-python)
      - [Reduce (Javascript)](#reduce-javascript)
  - [Integrate Algorithms](#integrate-algorithms)
  - [Integrate Pipeline](#integrate-pipeline)
    - [Raw - Ad-hoc pipeline running](#raw---ad-hoc-pipeline-running)
    - [Stored - Storing the pipeline descriptor for next running](#stored---storing-the-pipeline-descriptor-for-next-running)
  - [Monitor Pipeline Results](#monitor-pipeline-results)

## Installation

### Dependencies

HKube runs on top of Kubernetes, so in order to run HKube we first have to install its prerequisites.

- **Kubernetes** - Install [Kubernetes](https://kubernetes.io/docs/user-journeys/users/application-developer/foundational/#section-1) or [Minikube](https://kubernetes.io/docs/tasks/tools/install-minikube/) or [microk8s](https://microk8s.io/).

- **Helm** - HKube installation uses [Helm](https://helm.sh/), follow the [installation guide](https://helm.sh/docs/using_helm/#installing-helm).

### Helm

1. Add the [HKube Helm repository](http://hkube.io/helm/) to `helm`:

```bash
helm repo add hkube http://hkube.io/helm/
```
2. Configure a docker registry for [builds](http://hkube.io/learn/algorithms/#the-easy-way).
Create a `values.yaml` file for custom Helm values:

```yaml
build_secret:
  # pull secret is only needed if docker hub is not accessible
  pull:
    registry: ''
    namespace: ''
    username: ''
    password: ''
  # enter your docker hub / other registry credentials
  push:
    registry: '' # can be left empty for docker hub
    namespace: '' # registry namespace - usually your username
    username: ''
    password: ''
```

3. Install the HKube chart:

```console
helm install hkube/hkube -f ./values.yaml --name my-release
```

> This command installs HKube in a minimal configuration for **development**. Check [production-deployment](http://hkube.io/learn/install/#production-deployment).

## APIs

There are three ways to communicate with HKube: **Dashboard**, **REST API** and **CLI**.

### UI Dashboard

[Dashboard](http://hkube.io/tech/dashboard/) is a web-based HKube user interface. The Dashboard supports every functionality HKube has to offer.

![ui](https://user-images.githubusercontent.com/27515937/59031674-051b5180-886d-11e9-9806-ecce2e3ba8f0.png)

### REST API

HKube exposes its functionality through a REST API.

- [API Spec](http://hkube.io/spec/)
- [Swagger-UI](http://petstore.swagger.io/?url=https://raw.githubusercontent.com/kube-HPC/api-server/master/api/rest-api/swagger.json) - locally `{yourDomain}/hkube/api-server/swagger-ui`
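
For example, a stored pipeline can be executed with a single POST request. The following is a hedged sketch, not taken verbatim from the spec: the `/exec/stored` route and body shape are assumptions consistent with the status/results routes referenced later in this README, and the `numbers` pipeline is the one built in the [API Usage Example](#api-usage-example) below; consult the API spec above for the authoritative contract.

```python
# A hedged sketch of executing a stored pipeline over the REST API.
# Route and body shape are assumptions; see the API spec for your deployment.
import requests

# base path as noted above: {yourDomain}/hkube/api-server
BASE = 'http://localhost/hkube/api-server/api/v1'  # adjust to your cluster

resp = requests.post(
    f'{BASE}/exec/stored',
    json={'name': 'numbers', 'flowInput': {'data': 5, 'mul': 2}},
)
resp.raise_for_status()
print(resp.json())  # expected to include the jobId of the new execution
```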

### CLI

`hkubectl` is the HKube command-line tool.

```bash
hkubectl [type] [command] [name]

# More information
hkubectl --help
```

Download the [latest version](https://github.com/kube-HPC/hkubectl/releases) of `hkubectl`:

```bash
curl -Lo hkubectl https://github.com/kube-HPC/hkubectl/releases/latest/download/hkubectl-linux \
&& chmod +x hkubectl \
&& sudo mv hkubectl /usr/local/bin/
```
> For macOS, replace `hkubectl-linux` with `hkubectl-macos`.
> For Windows, download `hkubectl-win.exe`.

Configure `hkubectl` to work with your running Kubernetes:

```bash
# Config
hkubectl config set endpoint ${KUBERNETES-MASTER-IP}

hkubectl config set rejectUnauthorized false
```

> Make sure `kubectl` is configured for your cluster.
>
> HKube requires certain pods to run with privileged security permissions; consult your Kubernetes installation guide to see how this is done.

## API Usage Example

### The Problem

We want to solve the following problem, given an input and a desired output:

- _Input:_ Two numbers `N` and `k`.
- _Desired Output:_ A number `M` such that `M = (1 + 2 + ... + N) * k`.

For example, `N=5`, `k=2` results in `M = (1+2+3+4+5) * 2 = 30`.
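
Before building the pipeline, here is a quick plain-Python sanity check of the formula above (no HKube involved):

```python
# Plain-Python check of the expected pipeline output.
N, k = 5, 2
M = sum(i * k for i in range(1, N + 1))  # (1 + 2 + ... + N) * k
print(M)  # 30
```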

### Solution

We will solve **the problem** by running a distributed pipeline of three algorithms: Range, Multiply and Reduce.

#### Range Algorithm

Creates an array of length `N`.

```console
N = 5
5 -> [1,2,3,4,5]
```

#### Multiply Algorithm

Multiplies the data received from the `Range` algorithm by `k`.

```console
k = 2
[1,2,3,4,5] * (2) -> [2,4,6,8,10]
```

#### Reduce Algorithm

Waits until all instances of the `Multiply` algorithm have finished, then sums the received data.

```console
[2,4,6,8,10] -> 30
```

### Building a Pipeline

We will **implement the algorithms** using various languages and **construct a pipeline** from them using **HKube**.

![PipelineExample](https://user-images.githubusercontent.com/27515937/59348861-e9a6bf80-8d20-11e9-8d7b-76efedeb669f.png)

#### Pipeline Descriptor

The **pipeline descriptor** is a **JSON object** that describes the pipeline's **nodes** and links them by defining the dependencies between them.

```json
{
  "name": "numbers",
  "nodes": [
    {
      "nodeName": "Range",
      "algorithmName": "range",
      "input": ["@flowInput.data"]
    },
    {
      "nodeName": "Multiply",
      "algorithmName": "multiply",
      "input": ["#@Range", "@flowInput.mul"]
    },
    {
      "nodeName": "Reduce",
      "algorithmName": "reduce",
      "input": ["@Multiply"]
    }
  ],
  "flowInput": {
    "data": 5,
    "mul": 2
  }
}
```

> Note the `flowInput`: `data` = N = 5, `mul` = k = 2

#### Node dependencies

HKube [supports special signs](http://hkube.io/learn/execution/#batch) in a node's `input` for defining the pipeline's execution flow.

In our case we used:

**(@)**  —  References an input parameter for the algorithm.

**(#)**  —  Executes nodes in parallel and reduces the results into a single node.

**(#@)** — Combining `#` and `@` creates batch processing over a node's results.
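
For the descriptor above, `#@Range` takes the array produced by `Range` and spawns one `Multiply` instance per item, while `@flowInput.mul` is passed to every instance:

```console
Range          -> [1,2,3,4,5]
#@Range        -> five Multiply instances, one per item: 1, 2, 3, 4, 5
@flowInput.mul -> 2 (sent to every instance)
```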

![JSON](https://user-images.githubusercontent.com/27515937/59355883-815fda00-8d30-11e9-963c-c13b18caf54e.png)

#### JSON Breakdown

We created a pipeline named `numbers`.

```json
"name":"numbers"
```

The pipeline is defined by three nodes.

```json
"nodes": [
  {
    "nodeName": "Range",
    "algorithmName": "range",
    "input": ["@flowInput.data"]
  },
  {
    "nodeName": "Multiply",
    "algorithmName": "multiply",
    "input": ["#@Range", "@flowInput.mul"]
  },
  {
    "nodeName": "Reduce",
    "algorithmName": "reduce",
    "input": ["@Multiply"]
  }
]
```

In HKube, nodes are linked by defining algorithm inputs: `Multiply` runs after the `Range` algorithm because of the input dependency between them.

Keep in mind that HKube transports the results between nodes **automatically**; for this, HKube currently supports two types of transportation layers: _object storage_ and _file system_.

![Group 4 (3)](https://user-images.githubusercontent.com/27515937/59355963-a3595c80-8d30-11e9-88b0-96084085103e.png)

The `flowInput` is where the pipeline's inputs are defined:

```json
"flowInput": {
  "data": 5,
  "mul": 2
}
```

In our case we used a _numeric type_, but it can be any [JSON type](https://json-schema.org/understanding-json-schema/reference/type.html) (`Object`, `String`, etc.).
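
For illustration, a `flowInput` mixing JSON types might look like this (the extra keys here are hypothetical and not used by our pipeline):

```json
"flowInput": {
  "data": 5,
  "mul": 2,
  "settings": { "verbose": true },
  "label": "my-run"
}
```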

#### Advanced Options

More features can be defined in the descriptor file.

```json
"webhooks": {
  "progress": "http://my-url-to-progress",
  "result": "http://my-url-to-result"
},
"priority": 3,
"triggers": {
  "pipelines": [],
  "cron": {}
},
"options": {
  "batchTolerance": 80,
  "concurrentPipelines": 2,
  "ttl": 3600,
  "progressVerbosityLevel": "info"
}
```

- **webhooks** - There are two types of webhooks, _progress_ and _result_ (a minimal receiver sketch follows this list).

  > You can also fetch the same data from the REST API:
  >
  > - progress: `/api/v1/exec/status/{jobId}`
  > - result: `/api/v1/exec/results/{jobId}`

- **priority** - HKube supports five priority levels, where five is the highest. These priorities, together with the metrics HKube gathers, help decide which algorithms should run first.

- **triggers** - There are two types of triggers that HKube currently supports: `cron` and `pipeline`.

  - **cron** - HKube can schedule your stored pipelines based on a cron pattern.
    > Check the [cron editor](https://crontab.guru/) to help construct your cron.
  - **pipeline** - You can set your pipeline to run each time another pipeline (or pipelines) finishes successfully.

- **options** - Other options that can be configured:

  - **Batch Tolerance** - A threshold that lets you control at what _percentage_ of failed batch processing the entire pipeline should fail.
  - **Concurrency** - Pipeline concurrency defines the number of instances of the pipeline that are allowed to run at the same time.
  - **TTL** - Time to live (TTL) limits the lifetime of a pipeline in the cluster. A stop will be sent if the pipeline runs for more than `ttl` seconds.
  - **Verbosity Level** - Controls which progress events the client will be notified about. The severity levels ascend from least to most important: `trace`, `debug`, `info`, `warn`, `error`, `critical`.
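
As a rough illustration of the webhook side, here is a minimal Python receiver for the _progress_ and _result_ URLs above. The payload schema is not specified in this README, so this sketch simply logs whatever JSON arrives:

```python
# A minimal sketch of a webhook receiver for HKube progress/result callbacks.
# The payload schema is an assumption; we just log the JSON body we receive.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get('Content-Length', 0))
        payload = json.loads(self.rfile.read(length) or b'{}')
        print(f'webhook {self.path}: {payload}')
        self.send_response(200)
        self.end_headers()

# Point "progress"/"result" in the descriptor at http://<your-host>:9000/...
HTTPServer(('', 9000), WebhookHandler).serve_forever()
```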

### Algorithm

The pipeline is built from algorithms, which are containerized with Docker.

There are two ways to integrate your algorithm into HKube:

- **Seamless Integration** - As written above, HKube can automatically build your Docker container with HKube's websocket wrapper.
- **Code writing** - To add an algorithm to HKube manually, you need to wrap your algorithm with HKube. HKube already has wrappers for `Python`, `JavaScript`, `Java` and `.NET Core`.

#### Implementing the Algorithms

We will create the algorithms to solve [the problem](#the-problem). HKube currently supports two languages for auto-build: _Python_ and _JavaScript_.

> Important notes:
>
> - **Installing dependencies**
>   During the container build, HKube searches for a _requirements.txt_ file and tries to install the listed packages with the pip package manager.
> - **Advanced Operations**
>   HKube can build the algorithm from just a `start` function, but for advanced operations such as one-time initiation and graceful stopping you have to implement two other functions, `init` and `stop` (see the sketch below).
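
A minimal sketch of that shape (the `init`/`stop` signatures are assumptions, modeled on the `start(args)` examples below):

```python
# Minimal shape of an HKube algorithm with the optional lifecycle hooks.
# Only start() is required; init() and stop() are the optional hooks
# mentioned above (signatures assumed, not taken from the HKube spec).

def init(args):
    # one-time initiation, e.g. loading a model or opening a connection
    print('algorithm: init')

def start(args):
    # main entry point; inputs arrive in args['input']
    print('algorithm: start')
    return args['input'][0]

def stop(args):
    # graceful stop, e.g. flushing state and releasing resources
    print('algorithm: stop')
```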

##### Range (Python)

```Python
def start(args):
    print('algorithm: range start')
    input = args['input'][0]
    # produce [1, 2, ..., N] as described above
    array = list(range(1, input + 1))
    return array
```

The `start` method is called with the `args` parameter; the algorithm's inputs appear in its `input` property.

The `input` property is an array, so we take the first element (`"input": ["@flowInput.data"]` — as you can see, we placed `data` as the first argument).

##### Multiply (Python)

```Python
def start(args):
    print('algorithm: multiply start')
    input = args['input'][0]
    mul = args['input'][1]
    return input * mul
```

We sent two parameters: `"input": ["#@Range", "@flowInput.mul"]`. The first is the output from `Range`, which is an array of numbers; because we used the **batch** sign **(#)**, each `Multiply` instance gets one item from that array. The second parameter is the `mul` value from the `flowInput` object.

##### Reduce (Javascript)

```javascript
module.exports.start = args => {
  console.log('algorithm: reduce start');
  const input = args.input[0];
  return input.reduce((acc, cur) => acc + cur);
};
```
```

We placed `["@Multiply"]` in the input parameter. HKube collects all the results from the `Multiply` instances and sends them as an array in the first input parameter.

### Integrate Algorithms

After creating the [algorithms](#implementing-the-algorithms), we integrate them with the [CLI](#cli).

> This can also be done through the [Dashboard](#ui-dashboard).

Create a `yaml` (or `JSON`) file that defines the **algorithm**:

```yaml
# range.yml
name: range
env: python # can be python or javascript
resources:
  cpu: 0.5
  gpu: 1 # if not needed, just remove it from the file
  mem: 512Mi

code:
  path: /path-to-algorithm/range.tar.gz
  entryPoint: main.py
```

Add it with the [CLI](#cli):

```console
hkubectl algorithm apply --f range.yml
```

> Keep in mind we have to do this **for each one of the algorithms** (see the example below).
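
For instance, a matching file for the JavaScript `reduce` algorithm could look like this (values are illustrative):

```yaml
# reduce.yml (illustrative)
name: reduce
env: javascript
resources:
  cpu: 0.5
  mem: 512Mi

code:
  path: /path-to-algorithm/reduce.tar.gz
  entryPoint: main.js
```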

### Integrate Pipeline

Create a `yaml` (or `JSON`) file that defines the **pipeline**:

```yml
# numbers.yml
name: numbers
nodes:
  - nodeName: Range
    algorithmName: range
    input:
      - '@flowInput.data'
  - nodeName: Multiply
    algorithmName: multiply
    input:
      - '#@Range'
      - '@flowInput.mul'
  - nodeName: Reduce
    algorithmName: reduce
    input:
      - '@Multiply'
flowInput:
  data: 5
  mul: 2
```

#### Raw - Ad-hoc pipeline running

To run our pipeline as raw data:

```bash
hkubectl exec raw --f numbers.yml
```

#### Stored - Storing the pipeline descriptor for next running

First we store the pipeline:

```bash
hkubectl pipeline store --f numbers.yml
```

Then you can execute it (if a `flowInput` is available in the stored descriptor):

```bash
# flowInput stored
hkubectl exec stored numbers
```

To execute the pipeline with a different input, create a `yaml` (or `JSON`) file with a `flowInput` key:

```yml
# otherFlowInput.yml
flowInput:
  data: 500
  mul: 200
```

Then you can execute it by pipeline `name`:

```bash
# Executes pipeline "numbers" with data=500, mul=200
hkubectl exec stored numbers --f otherFlowInput.yml
```

### Monitor Pipeline Results

As a result of executing a pipeline, HKube returns a `jobId`.

```console
# Job ID returned after execution
result:
  jobId: numbers:a56c97cb-5d62-4990-817c-04a8b0448b7c.numbers
```

This unique identifier helps to **query** this **specific pipeline execution**:

- **Stop** pipeline execution:
  `hkubectl exec stop <jobId> [reason]`

- **Track** pipeline status:
  `hkubectl exec status <jobId>`

- **Track** pipeline result:
  `hkubectl exec result <jobId>`
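
The same information is available over the REST API. Below is a hedged sketch that polls by `jobId`, assuming the status/results routes listed under [Advanced Options](#advanced-options) and the base path shown in [REST API](#rest-api); field names and status values are assumptions, so consult the API spec for your deployment.

```python
# A sketch of polling a pipeline execution by jobId over the REST API.
# Base URL, field names and status values are assumptions.
import time
import requests

BASE = 'http://localhost/hkube/api-server/api/v1'  # adjust to your cluster
job_id = 'numbers:a56c97cb-5d62-4990-817c-04a8b0448b7c.numbers'

# poll status until the pipeline leaves the (assumed) 'active' state
while True:
    status = requests.get(f'{BASE}/exec/status/{job_id}').json()
    print(status)
    if status.get('status') != 'active':
        break
    time.sleep(5)

# fetch the final results
print(requests.get(f'{BASE}/exec/results/{job_id}').json())
```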