https://github.com/kube-HPC/hkube
🐟 High Performance Computing over Kubernetes - Core Repo 🎣
- Host: GitHub
- URL: https://github.com/kube-HPC/hkube
- Owner: kube-HPC
- License: MIT
- Created: 2017-11-21T13:57:01.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2025-04-10T13:56:31.000Z (23 days ago)
- Last Synced: 2025-04-10T14:58:40.809Z (23 days ago)
- Topics: algorithm, cluster, hkube, kubernetes, pipeline
- Language: JavaScript
- Homepage: http://hkube.org
- Size: 134 MB
- Stars: 307
- Watchers: 9
- Forks: 21
- Open Issues: 68
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- awesome-repositories - kube-HPC/hkube - 🐟 High Performance Computing over Kubernetes - Core Repo 🎣 (JavaScript)
README
# HKube
> HKube is a cloud-native open-source framework for running **[distributed](https://en.wikipedia.org/wiki/Distributed_computing) pipelines of algorithms** on [Kubernetes](https://kubernetes.io/).
>
> HKube optimally **utilizes** a pipeline's resources, based on **user priorities** and **[heuristics](https://en.wikipedia.org/wiki/Heuristic)**.

## Features
- **Distributed pipeline of algorithms**
  - Receives a [DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph) as input and automatically parallelizes your algorithms over the cluster.
  - Manages the complications of distributed processing, keeping your code simple (even single-threaded).
- **Language Agnostic** - As a container-based framework, HKube is designed to let you write your algorithms in any language.
- **Batch Algorithms** - Run multiple instances of the same algorithm as a batch in order to accelerate running time.
- **Optimize Hardware Utilization**
  - Containers are placed **automatically** based on their resource requirements and other constraints, without sacrificing availability.
  - Mixes critical and best-effort workloads in order to **drive up utilization** and save resources.
  - **Efficient execution** and clustering driven by heuristics that combine pipeline and algorithm metrics with user requirements.
- **Build API** - Just upload your code; you **don't have to worry** about building containers and integrating them with the HKube API.
- **Cluster Debugging**
  - Debug a **part of a pipeline** based on previous results.
  - Debug a **single algorithm** in your IDE, while the rest of the algorithms run in the cluster.
- **Jupyter Integration** - Scale your [Jupyter](https://jupyter.org/) tasks with HKube.
## User Guide
- [Installation](#installation)
- [Dependencies](#dependencies)
- [Helm](#helm)
- [APIs](#apis)
- [UI Dashboard](#ui-dashboard)
- [REST API](#rest-api)
- [CLI](#cli)
- [API Usage Example](#api-usage-example)
- [The Problem](#the-problem)
- [Solution](#solution)
- [Range Algorithm](#range-algorithm)
- [Multiply Algorithm](#multiply-algorithm)
- [Reduce Algorithm](#reduce-algorithm)
- [Building a Pipeline](#building-a-pipeline)
- [Pipeline Descriptor](#pipeline-descriptor)
- [Node dependencies](#node-dependencies)
- [JSON Breakdown](#json-breakdown)
- [Advanced Options](#advanced-options)
- [Algorithm](#algorithm)
- [Implementing the Algorithms](#implementing-the-algorithms)
- [Range (Python)](#range-python)
- [Multiply (Python)](#multiply-python)
- [Reduce (Javascript)](#reduce-javascript)
- [Integrate Algorithms](#integrate-algorithms)
- [Integrate Pipeline](#integrate-pipeline)
- [Raw - Ad-hoc pipeline running](#raw---ad-hoc-pipeline-running)
- [Stored - Storing the pipeline descriptor for next running](#stored---storing-the-pipeline-descriptor-for-next-running)
- [Monitor Pipeline Results](#monitor-pipeline-results)

## Installation
### Dependencies
HKube runs on top of Kubernetes, so in order to run HKube we first have to install its prerequisites.
- **Kubernetes** - Install [Kubernetes](https://kubernetes.io/docs/user-journeys/users/application-developer/foundational/#section-1) or [Minikube](https://kubernetes.io/docs/tasks/tools/install-minikube/) or [microk8s](https://microk8s.io/).
- **Helm** - HKube installation uses [Helm](https://helm.sh/), follow the [installation guide](https://helm.sh/docs/using_helm/#installing-helm).
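Before installing, you can sanity-check that both prerequisites are in place (standard `kubectl`/`helm` commands, nothing HKube-specific):

```bash
# Verify kubectl can reach the cluster and helm is installed.
kubectl version
kubectl get nodes
helm version
```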
### Helm
1. Add the [HKube Helm repository](http://hkube.io/helm/) to `helm`:
```bash
helm repo add hkube http://hkube.io/helm/
```
2. Configure a docker registry for [builds](http://hkube.io/learn/algorithms/#the-easy-way) by creating a `values.yaml` file with custom helm values:
```yaml
build_secret:
  # pull secret is only needed if docker hub is not accessible
  pull:
    registry: ''
    namespace: ''
    username: ''
    password: ''
  # enter your docker hub / other registry credentials
  push:
    registry: '' # can be left empty for docker hub
    namespace: '' # registry namespace - usually your username
    username: ''
    password: ''
```
3. Install the HKube chart
```console
helm install hkube/hkube -f ./values.yaml --name my-release
```

> This command installs HKube in a minimal configuration for **development**. Check [production-deployment](http://hkube.io/learn/install/#production-deployment).
## APIs
There are three ways to communicate with HKube: **Dashboard**, **REST API** and **CLI**.
### UI Dashboard
[Dashboard](http://hkube.io/tech/dashboard/) is a web-based HKube user interface that supports all the functionality HKube has to offer.

### REST API
HKube exposes its functionality via a REST API.
- [API Spec](http://hkube.io/spec/)
- [Swagger-UI](http://petstore.swagger.io/?url=https://raw.githubusercontent.com/kube-HPC/api-server/master/api/rest-api/swagger.json) - available locally at `{yourDomain}/hkube/api-server/swagger-ui`

### CLI
`hkubectl` is HKube's command-line tool.
```bash
hkubectl [type] [command] [name]

# More information
hkubectl --help
```

Download the [latest version](https://github.com/kube-HPC/hkubectl/releases) of `hkubectl`:
```bash
curl -Lo hkubectl https://github.com/kube-HPC/hkubectl/releases/latest/download/hkubectl-linux \
&& chmod +x hkubectl \
&& sudo mv hkubectl /usr/local/bin/
```
> For Mac, replace with `hkubectl-macos`.
> For Windows, download `hkubectl-win.exe`.

Configure `hkubectl` to work with your running Kubernetes cluster:
```bash
# Config
hkubectl config set endpoint ${KUBERNETES-MASTER-IP}
hkubectl config set rejectUnauthorized false
```

> Make sure `kubectl` is configured to your cluster.
>
> HKube requires that certain pods run with privileged security permissions; consult your Kubernetes installation to see how this is done.

## API Usage Example
### The Problem
We want to solve the following problem, given an input and a desired output:
- _Input:_ Two numbers `N`, `k`.
- _Desired Output:_ A number `M` such that `M = (1*k) + (2*k) + ... + (N*k)`, i.e. the sum of `1..N` multiplied by `k`.

For example, `N=5`, `k=2` results in `M = (1+2+3+4+5) * 2 = 30`.
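A quick sanity check of the arithmetic in plain Python (outside HKube), mirroring what the pipeline below computes:

```python
# M is the sum of 1..N, each term multiplied by k.
N, k = 5, 2
M = sum(i * k for i in range(1, N + 1))
print(M)  # 30
```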
### Solution
We will solve **the problem** by running a distributed pipeline of three algorithms: Range, Multiply and Reduce.
#### Range Algorithm
Creates an array of the numbers `1` through `N` (an array of length `N`).
```console
N = 5
5 -> [1,2,3,4,5]
```
#### Multiply Algorithm
Multiplies each element of the data received from the `Range` algorithm by `k`.
```console
k = 2
[1,2,3,4,5] * (2) -> [2,4,6,8,10]
```
#### Reduce Algorithm
The algorithm waits until all instances of the `Multiply` algorithm finish, then sums the received data.
```console
[2,4,6,8,10] -> 30
```

### Building a Pipeline
We will **implement the algorithms** using various languages and **construct a pipeline** from them using **HKube**.

#### Pipeline Descriptor
The **pipeline descriptor** is a **JSON object** that describes the pipeline and links the **nodes** together by defining the dependencies between them.
```json
{
  "name": "numbers",
  "nodes": [
    {
      "nodeName": "Range",
      "algorithmName": "range",
      "input": ["@flowInput.data"]
    },
    {
      "nodeName": "Multiply",
      "algorithmName": "multiply",
      "input": ["#@Range", "@flowInput.mul"]
    },
    {
      "nodeName": "Reduce",
      "algorithmName": "reduce",
      "input": ["@Multiply"]
    }
  ],
  "flowInput": {
    "data": 5,
    "mul": 2
  }
}
```

> Note the `flowInput`: `data` = N = 5, `mul` = k = 2
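The same descriptor can also be submitted over the REST API. A minimal sketch, assuming an ad-hoc execution endpoint under the api-server (verify the exact route against the [API Spec](http://hkube.io/spec/) above) and the descriptor saved as a hypothetical `numbers.json`:

```bash
# POST the pipeline descriptor for ad-hoc (raw) execution - assumed route.
curl -X POST "http://yourDomain/hkube/api-server/api/v1/exec/raw" \
  -H "Content-Type: application/json" \
  -d @numbers.json
```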
#### Node dependencies
HKube [allows special signs](http://hkube.io/learn/execution/#batch) in a node's `input` for defining the pipeline execution flow.
In our case we used:

**(@)** — References input parameters for the algorithm.

**(#)** — Executes nodes in parallel and reduces the results into a single node.

**(#@)** — Combining `#` and `@` creates batch processing over a node's results.
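For contrast, dropping the `#` would hand the entire result to a single instance; that is exactly how `Reduce` consumes all of `Multiply`'s outputs below. A hypothetical, non-batched variant of the `Multiply` node would look like this:

```json
{
  "nodeName": "Multiply",
  "algorithmName": "multiply",
  "input": ["@Range", "@flowInput.mul"]
}
```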

#### JSON Breakdown
We created a pipeline named `numbers`:
```json
"name":"numbers"
```
The pipeline is defined by three nodes:
```json
"nodes":[
{
"nodeName":"Range",
"algorithmName":"range",
"input":["@flowInput.data"]
},
{
"nodeName":"Multiply",
"algorithmName":"multiply",
"input":["#@Range","@flowInput.mul"]
},
{
"nodeName":"Reduce",
"algorithmName":"reduce",
"input":["@Multiply"]
},
]
```In HKube, the linkage between the nodes is done by defining the algorithm inputs. `Multiply` will be run after `Range` algorithm because of the input dependency between them.
Keep in mind that HKube transports the results between nodes **automatically**; for this, HKube currently supports two types of transport layers: _object storage_ and _file system_.

The `flowInput` is where the pipeline's inputs are defined:
```json
"flowInput":{
"data":5,
"mul":2
}
```In our case we used _Numeric Type_ but it can be any [JSON type](https://json-schema.org/understanding-json-schema/reference/type.html) (`Object`, `String` etc).
#### Advanced Options
More features can be defined in the descriptor file:
```json
"webhooks": {
  "progress": "http://my-url-to-progress",
  "result": "http://my-url-to-result"
},
"priority": 3,
"triggers": {
  "pipelines": [],
  "cron": {}
},
"options": {
  "batchTolerance": 80,
  "concurrentPipelines": 2,
  "ttl": 3600,
  "progressVerbosityLevel": "info"
}
```

- **webhooks** - There are two types of webhooks, _progress_ and _result_.
> You can also fetch the same data from the REST API (see the example after this list).
  - progress: `{jobId}/api/v1/exec/status`
  - result: `{jobId}/api/v1/exec/results`
- **priority** - HKube supports five levels of priority, five being the highest. These priorities, together with the metrics HKube gathers, help decide which algorithms should run first.
- **triggers** - There are two types of triggers that HKube currently supports: `cron` and `pipeline`.
  - **cron** - HKube can schedule your stored pipelines based on a cron pattern.
> Check [cron editor](https://crontab.guru/) in order to construct your cron.
  - **pipeline** - You can set your pipelines to run each time another pipeline (or pipelines) finishes successfully.
- **options** - Additional options can be configured:
  - **Batch Tolerance** - A threshold that controls what _percentage_ of the batch processing may fail before the entire pipeline fails.
  - **Concurrency** - Pipeline concurrency defines the number of pipelines that are allowed to run at the same time.
  - **TTL** - Time to live (TTL) limits the lifetime of a pipeline in the cluster; a stop signal is sent if the pipeline runs for more than `ttl` seconds.
  - **Verbosity Level** - Controls what type of progress events the client is notified about. The severity levels ascend from least to most important: `trace`, `debug`, `info`, `warn`, `error`, `critical`.
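As noted in the **webhooks** bullet, the same status and result data can be pulled from the REST API. A hedged sketch, assuming the `jobId` is passed as a path parameter (verify the exact shape in the Swagger-UI linked above):

```bash
# Base URL follows the locally-hosted Swagger-UI convention above - adjust to your domain.
BASE="http://yourDomain/hkube/api-server/api/v1"
JOB_ID="numbers:a56c97cb-5d62-4990-817c-04a8b0448b7c.numbers"

# Poll pipeline progress (the same data the progress webhook pushes).
curl "$BASE/exec/status/$JOB_ID"

# Fetch the final result once the pipeline finishes.
curl "$BASE/exec/results/$JOB_ID"
```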
### Algorithm
The pipeline is built from algorithms that are containerized with Docker.
There are two ways to integrate your algorithm into HKube:
- **Seamless Integration** - As noted above, HKube can automatically build your Docker container with HKube's websocket wrapper.
- **Code writing** - To add an algorithm to HKube manually, you need to wrap your algorithm with HKube's wrapper. HKube already has wrappers for `python`, `javascript`, `java` and `.NET core`.

#### Implementing the Algorithms
We will create the algorithms to solve [the problem](#the-problem). HKube currently supports two languages for auto-build: _Python_ and _JavaScript_.
> Important notes:
>
> - **Installing dependencies**
>   During the container build, HKube searches for a `requirements.txt` file and tries to install the packages listed there with the pip package manager.
> - **Advanced Operations**
>   HKube can build the algorithm from just a `start` function, but for advanced operations such as one-time initialization and graceful shutdown you have to implement two additional functions, `init` and `stop`.

##### Range (Python)
```Python
def start(args):
    print('algorithm: range start')
    # inputs arrive under the 'input' key; the first item is N
    input = args['input'][0]
    # build [1..N] so that 5 -> [1, 2, 3, 4, 5]
    array = list(range(1, input + 1))
    return array
```

The `start` method is called with the `args` parameter; the inputs to the algorithm appear in its `input` property.
The `input` property is an array, so we take the first element; in `"input": ["@flowInput.data"]` we placed `data` as the first argument.
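After defining `start` above, you can sanity-check it outside HKube by handing it an `args` dict of the same shape (plain Python, not an HKube API):

```python
# 'args' mimics what the HKube wrapper passes in: inputs arrive under 'input'.
args = {'input': [5]}
print(start(args))  # [1, 2, 3, 4, 5]
```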
##### Multiply (Python)
```Python
def start(args):
    print('algorithm: multiply start')
    # first input: one item from Range's array (batched by '#@Range')
    input = args['input'][0]
    # second input: the 'mul' parameter from flowInput
    mul = args['input'][1]
    return input * mul
```

We passed two parameters, `"input": ["#@Range", "@flowInput.mul"]`. The first is the output of `Range`, an array of numbers; because we used the **batch** sign **(#)**, each `multiply` instance receives one item of that array. The second parameter is the `mul` parameter from the `flowInput` object.
##### Reduce (Javascript)
```javascript
module.exports.start = args => {
    console.log('algorithm: reduce start');
    // HKube collects all Multiply results into one array
    const input = args.input[0];
    return input.reduce((acc, cur) => acc + cur);
};
```

We placed `["@Multiply"]` in the input parameter; HKube collects all the data from the `multiply` instances and sends it as an array in the first input parameter.
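To see the batch semantics end to end, here is a plain-Python simulation of the whole pipeline (no HKube involved): `#@Range` fans each array item out to its own `multiply` instance, and `@Multiply` hands the collected results to `reduce`:

```python
def range_alg(n):
    return list(range(1, n + 1))

def multiply_alg(item, mul):
    return item * mul

def reduce_alg(items):
    return sum(items)

data, mul = 5, 2                                     # flowInput
ranged = range_alg(data)                             # Range: [1, 2, 3, 4, 5]
multiplied = [multiply_alg(i, mul) for i in ranged]  # one "instance" per item (#@Range)
print(reduce_alg(multiplied))                        # Reduce: 30
```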
### Integrate Algorithms
After we have created the [algorithms](#implementing-the-algorithms), we integrate them with the [CLI](#cli).
> This can also be done through the [Dashboard](#ui-dashboard).
Create a `yaml` (or `JSON`) that defines the **algorithm**:
```yaml
# range.yml
name: range
env: python # can be python or javascript
resources:
  cpu: 0.5
  gpu: 1 # if not needed just remove it from the file
  mem: 512Mi
code:
  path: /path-to-algorithm/range.tar.gz
  entryPoint: main.py
```

Add it with the [CLI](#cli):
```console
hkubectl algorithm apply --f range.yml
```

> Keep in mind we have to do this **for each one of the algorithms** (see the sketch below).
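Assuming you create analogous `multiply.yml` and `reduce.yml` files (hypothetical names) for the other two algorithms, registering all three looks like:

```bash
# One apply per algorithm descriptor file.
for f in range.yml multiply.yml reduce.yml; do
  hkubectl algorithm apply --f "$f"
done
```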
### Integrate Pipeline
Create a `yaml` (or `JSON`) that defines the **pipeline**:
```yml
# numbers.yml
name: numbers
nodes:
  - nodeName: Range
    algorithmName: range
    input:
      - '@flowInput.data'
  - nodeName: Multiply
    algorithmName: multiply
    input:
      - '#@Range'
      - '@flowInput.mul'
  - nodeName: Reduce
    algorithmName: reduce
    input:
      - '@Multiply'
flowInput:
  data: 5
  mul: 2
```

#### Raw - Ad-hoc pipeline running
To run our pipeline as raw data:
```bash
hkubectl exec raw --f numbers.yml
```

#### Stored - Storing the pipeline descriptor for next running
First we store the pipeline:
```bash
hkubectl pipeline store --f numbers.yml
```

Then you can execute it (if a `flowInput` is available):
```bash
# flowInput stored
hkubectl exec stored numbers
```

To execute the pipeline with a different input, create a `yaml` (or `JSON`) file with a `flowInput` key:
```yml
# otherFlowInput.yml
flowInput:
data: 500
mul: 200
```

Then you can execute it by the pipeline's `name`:
```bash
# Executes pipeline "numbers" with data=500, mul=200
hkubectl exec stored numbers --f otherFlowInput.yml
```

### Monitor Pipeline Results
As a result of executing a pipeline, HKube returns a `jobId`:
```bash
# Job ID returned after execution.
result:
  jobId: numbers:a56c97cb-5d62-4990-817c-04a8b0448b7c.numbers
```

This unique identifier helps to **query** this **specific pipeline execution**.
- **Stop** pipeline execution:
  `hkubectl exec stop <jobId> [reason]`
- **Track** pipeline status:
  `hkubectl exec status <jobId>`
- **Track** pipeline result:
  `hkubectl exec result <jobId>`