{"id":13646837,"url":"https://github.com/kube-HPC/hkube","last_synced_at":"2025-04-21T21:31:55.624Z","repository":{"id":37413092,"uuid":"111555721","full_name":"kube-HPC/hkube","owner":"kube-HPC","description":"🐟 High Performance Computing over Kubernetes - Core Repo 🎣","archived":false,"fork":false,"pushed_at":"2025-04-10T13:56:31.000Z","size":140521,"stargazers_count":307,"open_issues_count":68,"forks_count":21,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-04-10T14:58:40.809Z","etag":null,"topics":["algorithm","cluster","hkube","kubernetes","pipeline"],"latest_commit_sha":null,"homepage":"http://hkube.org","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kube-HPC.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2017-11-21T13:57:01.000Z","updated_at":"2025-04-10T13:56:34.000Z","dependencies_parsed_at":"2023-09-26T12:22:25.533Z","dependency_job_id":"16d84dd1-4f16-40b7-b140-b8be9bb53142","html_url":"https://github.com/kube-HPC/hkube","commit_stats":null,"previous_names":[],"tags_count":1004,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kube-HPC%2Fhkube","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kube-HPC%2Fhkube/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kube-HPC%2Fhkube/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kube-HPC%2Fhkube/manifests","owner_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kube-HPC","download_url":"https://codeload.github.com/kube-HPC/hkube/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250136787,"owners_count":21380891,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithm","cluster","hkube","kubernetes","pipeline"],"created_at":"2024-08-02T01:03:09.413Z","updated_at":"2025-04-21T21:31:55.611Z","avatar_url":"https://github.com/kube-HPC.png","language":"JavaScript","funding_links":[],"categories":["JavaScript"],"sub_categories":[],"readme":"# ![HKube](https://user-images.githubusercontent.com/27515937/59049270-4cffa000-8890-11e9-8281-4aa97b1ecca3.png) \u003c!-- omit in toc --\u003e\n\n\u003e HKube is a cloud-native open source framework for running **[distributed](https://en.wikipedia.org/wiki/Distributed_computing) pipelines of algorithms**, built on [Kubernetes](https://kubernetes.io/).\n\u003e\n\u003e HKube optimally **utilizes** a pipeline's resources, based on **user priorities** and **[heuristics](https://en.wikipedia.org/wiki/Heuristic)**.\n\n## Features \u003c!-- omit in toc --\u003e\n\n- **Distributed pipeline of algorithms**\n\n  - Receives a [DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph) as input and automatically parallelizes your algorithms over the cluster.\n  - Manages the complications of distributed processing, so you can keep your code simple (even single-threaded).\n\n- **Language Agnostic** - As a container-based framework, HKube is designed to facilitate the use of any language for your algorithm.\n\n- **Batch 
Algorithms** - Run multiple instances of the same algorithm as a batch in order to reduce the running time.\n\n- **Optimize Hardware Utilization**\n\n  - Containers are **automatically** placed based on their resource requirements and other constraints, without sacrificing availability.\n  - Mixes critical and best-effort workloads in order to **drive up utilization** and save resources.\n  - **Efficient execution** and clustering by heuristics that combine pipeline and algorithm metrics with user requirements.\n\n- **Build API** - Just upload your code; you **don't have to worry** about building containers and integrating them with the HKube API.\n\n- **Cluster Debugging**\n\n  - Debug a **part of a pipeline** based on previous results.\n  - Debug a **single algorithm** in your IDE, while the rest of the algorithms run in the cluster.\n\n- **Jupyter Integration** - Scale your [Jupyter](https://jupyter.org/) tasks with HKube.\n\n## User Guide \u003c!-- omit in toc --\u003e\n\n\u003c!-- TOC --\u003e\n\n- [Installation](#installation)\n  - [Dependencies](#dependencies)\n  - [Helm](#helm)\n- [APIs](#apis)\n  - [UI Dashboard](#ui-dashboard)\n  - [REST API](#rest-api)\n  - [CLI](#cli)\n- [API Usage Example](#api-usage-example)\n  - [The Problem](#the-problem)\n  - [Solution](#solution)\n    - [Range Algorithm](#range-algorithm)\n    - [Multiply Algorithm](#multiply-algorithm)\n    - [Reduce Algorithm](#reduce-algorithm)\n  - [Building a Pipeline](#building-a-pipeline)\n    - [Pipeline Descriptor](#pipeline-descriptor)\n    - [Node dependencies](#node-dependencies)\n    - [JSON Breakdown](#json-breakdown)\n    - [Advance Options](#advance-options)\n  - [Algorithm](#algorithm)\n    - [Implementing the Algorithms](#implementing-the-algorithms)\n      - [Range (Python)](#range-python)\n      - [Multiply (Python)](#multiply-python)\n      - [Reduce (Javascript)](#reduce-javascript)\n  - [Integrate Algorithms](#integrate-algorithms)\n  - 
[Integrate Pipeline](#integrate-pipeline)\n    - [Raw - Ad-hoc pipeline running](#raw---ad-hoc-pipeline-running)\n    - [Stored - Storing the pipeline descriptor for next running](#stored---storing-the-pipeline-descriptor-for-next-running)\n  - [Monitor Pipeline Results](#monitor-pipeline-results)\n\n## Installation\n\n### Dependencies\n\nHKube runs on top of Kubernetes, so in order to run HKube we have to install its prerequisites.\n\n- **Kubernetes** - Install [Kubernetes](https://kubernetes.io/docs/user-journeys/users/application-developer/foundational/#section-1) or [Minikube](https://kubernetes.io/docs/tasks/tools/install-minikube/) or [microk8s](https://microk8s.io/).\n\n- **Helm** - HKube installation uses [Helm](https://helm.sh/); follow the [installation guide](https://helm.sh/docs/using_helm/#installing-helm).\n\n### Helm\n\n1. Add the [HKube Helm repository](http://hkube.io/helm/) to `helm`:\n\n   ```bash\n   helm repo add hkube http://hkube.io/helm/\n   ```\n2. Configure a docker registry for [builds](http://hkube.io/learn/algorithms/#the-easy-way)  \nCreate a ```values.yaml``` file for custom helm values\n```yaml\nbuild_secret:\n# pull secret is only needed if docker hub is not accessible\n  pull:\n    registry: ''\n    namespace: ''\n    username: ''\n    password: ''\n# enter your docker hub / other registry credentials\n  push:\n    registry: '' # can be left empty for docker hub\n    namespace: '' # registry namespace - usually your username\n    username: ''\n    password: ''\n```\n\n3. Install the HKube chart\n\n   ```console\n   helm install hkube/hkube  -f ./values.yaml --name my-release\n   ```\n\n\u003e This command installs HKube in a minimal configuration for **development**. 
Check [production-deployment](http://hkube.io/learn/install/#production-deployment).\n\n## APIs\n\nThere are three ways to communicate with HKube: **Dashboard**, **REST API** and **CLI**.\n\n### UI Dashboard\n\n[Dashboard](http://hkube.io/tech/dashboard/) is a web-based HKube user interface. The Dashboard supports all the functionality HKube has to offer.\n\n![ui](https://user-images.githubusercontent.com/27515937/59031674-051b5180-886d-11e9-9806-ecce2e3ba8f0.png)\n\n### REST API\n\nHKube exposes its functionality through a REST API.\n\n- [API Spec](http://hkube.io/spec/)\n- [Swagger-UI](http://petstore.swagger.io/?url=https://raw.githubusercontent.com/kube-HPC/api-server/master/api/rest-api/swagger.json) - locally `{yourDomain}/hkube/api-server/swagger-ui`\n\n### CLI\n\n`hkubectl` is the HKube command-line tool.\n\n```bash\nhkubectl [type] [command] [name]\n\n# More information\nhkubectl --help\n```\n\nDownload the [latest version](https://github.com/kube-HPC/hkubectl/releases) of `hkubectl`.\n\n```bash\ncurl -Lo hkubectl https://github.com/kube-HPC/hkubectl/releases/latest/download/hkubectl-linux \\\n\u0026\u0026 chmod +x hkubectl \\\n\u0026\u0026 sudo mv hkubectl /usr/local/bin/\n```\n\u003e For macOS, replace `hkubectl-linux` with `hkubectl-macos`  \n\u003e For Windows, download `hkubectl-win.exe`  \n\nConfigure `hkubectl` for your running Kubernetes cluster.\n\n```bash\n# Config\nhkubectl config set endpoint ${KUBERNETES-MASTER-IP}\n\nhkubectl config set rejectUnauthorized false\n```\n\n\u003e Make sure `kubectl` is configured to your cluster.\n\u003e\n\u003e HKube requires certain pods to run with privileged security permissions; consult the documentation of your Kubernetes installation to see how this is done.\n\n## API Usage Example\n\n### The Problem\n\nWe want to solve the following problem, given an input and a desired output:\n\n- _Input:_ Two numbers `N`, `k`.\n- _Desired Output:_ A number `M` such that: \u003cdiv style=\"text-align:center\"\u003e\u003cimg 
src=\"https://latex.codecogs.com/svg.latex?M\u0026space;=\u0026space;\\sum_{i=1}^N\u0026space;k\\cdot\u0026space;i\" title=\"M = \\sum_{i=1}^N k\\cdot i\" /\u003e\u003c/div\u003e\n\nFor example: `N=5`, `k=2` will result in: \u003cdiv style=\"text-align:center\"\u003e\u003cimg src=\"https://latex.codecogs.com/svg.latex?2\\cdot1\u0026plus;2\\cdot\u0026space;2\u0026space;\u0026plus;\u0026space;2\\cdot\u0026space;3\u0026space;\u0026plus;\u0026space;2\\cdot\u0026space;4\u0026space;\u0026plus;\u0026space;2\\cdot\u0026space;5\u0026space;=\u0026space;2\u0026space;\u0026plus;\u0026space;4\u0026space;\u0026plus;6\u0026plus;8\u0026plus;10\u0026space;=\u0026space;30\u0026space;=\u0026space;M\" title=\"2\\cdot1+2\\cdot 2 + 2\\cdot 3 + 2\\cdot 4 + 2\\cdot 5 = 2 + 4 +6+8+10 = 30 = M\" /\u003e\u003c/div\u003e\n\n### Solution\n\nWe will solve **the problem** by running a distributed pipeline of three algorithms: Range, Multiply and Reduce.\n\n#### Range Algorithm\n\nCreates an array of the integers from 1 to `N`.\n\n```console\n N = 5\n 5 -\u003e [1,2,3,4,5]\n```\n\n#### Multiply Algorithm\n\nMultiplies the data received from the `Range Algorithm` by `k`.\n\n```console\nk = 2\n[1,2,3,4,5] * (2) -\u003e [2,4,6,8,10]\n```\n\n#### Reduce Algorithm\n\nThe algorithm waits until all the instances of the `Multiply Algorithm` have finished, then sums the received data.\n\n```console\n[2,4,6,8,10] -\u003e 30\n```\n\n### Building a Pipeline\n\nWe will **implement the algorithms** using various languages and **construct a pipeline** from them using **HKube**.\n\n![PipelineExample](https://user-images.githubusercontent.com/27515937/59348861-e9a6bf80-8d20-11e9-8d7b-76efedeb669f.png)\n\n#### Pipeline Descriptor\n\nThe **pipeline descriptor** is a **JSON object** which describes the pipeline and links its **nodes** by defining the dependencies between them.\n\n```json\n{\n  \"name\": \"numbers\",\n  \"nodes\": [\n    {\n      \"nodeName\": \"Range\",\n      \"algorithmName\": \"range\",\n  
    \"input\": [\"@flowInput.data\"]\n    },\n    {\n      \"nodeName\": \"Multiply\",\n      \"algorithmName\": \"multiply\",\n      \"input\": [\"#@Range\", \"@flowInput.mul\"]\n    },\n    {\n      \"nodeName\": \"Reduce\",\n      \"algorithmName\": \"reduce\",\n      \"input\": [\"@Multiply\"]\n    }\n  ],\n  \"flowInput\": {\n    \"data\": 5,\n    \"mul\": 2\n  }\n}\n```\n\n\u003e Note the `flowInput`: `data` = N = 5, `mul` = k = 2\n\n#### Node dependencies\n\nHKube [allows special signs](http://hkube.io/learn/execution/#batch) in a node's `input` for defining the pipeline execution flow.\n\nIn our case we used:\n\n**(@)** - References input parameters for the algorithm.\n\n**(#)** - Executes nodes in parallel and reduces the results into a single node.\n\n**(\\#@)** - Combining `#` and `@` creates batch processing over a node's results.\n\n![JSON](https://user-images.githubusercontent.com/27515937/59355883-815fda00-8d30-11e9-963c-c13b18caf54e.png)\n\n#### JSON Breakdown\n\nWe created a pipeline named `numbers`.\n\n```json\n    \"name\":\"numbers\"\n```\n\nThe pipeline is defined by three nodes.\n\n```json\n\"nodes\": [\n    {\n        \"nodeName\": \"Range\",\n        \"algorithmName\": \"range\",\n        \"input\": [\"@flowInput.data\"]\n    },\n    {\n        \"nodeName\": \"Multiply\",\n        \"algorithmName\": \"multiply\",\n        \"input\": [\"#@Range\", \"@flowInput.mul\"]\n    },\n    {\n        \"nodeName\": \"Reduce\",\n        \"algorithmName\": \"reduce\",\n        \"input\": [\"@Multiply\"]\n    }\n]\n```\n\nIn HKube, the linkage between the nodes is done by defining the algorithm inputs. 
`Multiply` runs after the `Range` algorithm because of the input dependency between them.\n\nKeep in mind that HKube transports the results between the nodes **automatically**. To do so, HKube currently supports two types of transport layers: _object storage_ and _file system_.\n\n![Group 4 (3)](https://user-images.githubusercontent.com/27515937/59355963-a3595c80-8d30-11e9-88b0-96084085103e.png)\n\nThe `flowInput` is the place to define the pipeline inputs:\n\n```json\n\"flowInput\":{\n    \"data\":5,\n    \"mul\":2\n}\n```\n\nIn our case we used the _Numeric Type_, but it can be any [JSON type](https://json-schema.org/understanding-json-schema/reference/type.html) (`Object`, `String` etc).\n\n#### Advance Options\n\nThere are more features that can be defined in the descriptor file.\n\n```JSON\n\"webhooks\": {\n    \"progress\": \"http://my-url-to-progress\",\n    \"result\": \"http://my-url-to-result\"\n},\n\"priority\": 3,\n\"triggers\": {\n    \"pipelines\": [],\n    \"cron\": {}\n},\n\"options\": {\n    \"batchTolerance\": 80,\n    \"concurrentPipelines\": 2,\n    \"ttl\": 3600,\n    \"progressVerbosityLevel\": \"info\"\n}\n```\n\n- **webhooks** - There are two types of webhooks, _progress_ and _result_.\n\n  \u003e You can also fetch the same data from the REST API.\n\n  - progress: `{jobId}/api/v1/exec/status`\n  - result: `{jobId}/api/v1/exec/results`\n\n- **priority** - HKube supports five priority levels; five is the highest. 
These priorities, together with the metrics that HKube gathers, help decide which algorithms should run first.\n\n- **triggers** - HKube currently supports two types of triggers: `cron` and `pipeline`.\n\n  - **cron** - HKube can schedule your stored pipelines based on a cron pattern.\n    \u003e Use a [cron editor](https://crontab.guru/) to construct your cron pattern.\n  - **pipeline** - You can set a pipeline to run each time one or more other pipelines finish successfully.\n\n- **options** - Further options that can be configured:\n\n  - **Batch Tolerance** - Batch Tolerance is a threshold setting that controls what _percentage_ of the batch processing may fail before the entire pipeline fails.\n  - **Concurrency** - Pipeline Concurrency defines the number of pipelines that are allowed to run at the same time.\n  - **TTL** - Time to live (TTL) limits the lifetime of a pipeline in the cluster. A stop is sent if the pipeline runs for more than the TTL (in seconds).\n  - **Verbosity Level** - The Verbosity Level is a setting that controls which types of progress events the client is notified about. The severity levels, in ascending order from least to most important, are: `trace` `debug` `info` `warn` `error` `critical`.\n\n### Algorithm\n\nThe pipeline is built from algorithms which are containerized with Docker.\n\nThere are two ways to integrate your algorithm into HKube:\n\n- **Seamless Integration** - As described above, HKube can automatically build your Docker image with HKube's websocket wrapper.\n- **Code writing** - In order to add an algorithm to HKube manually, you need to wrap your algorithm with HKube. 
HKube already has wrappers for `python`, `javaScript`, `java` and `.NET core`.\n\n#### Implementing the Algorithms\n\nWe will create the algorithms to solve [the problem](#the-problem). HKube currently supports two languages for automatic builds: _Python_ and _JavaScript_.\n\n\u003e Important notes:\n\u003e\n\u003e - **Installing dependencies**\n\u003e   During the container build, HKube will search for the _requirement.txt_ file and try to install the packages using the pip package manager.\n\u003e - **Advanced Operations**\n\u003e   HKube can build the algorithm with only a `start` function implemented, but for advanced operations such as one-time initialization and graceful stopping you have to implement two other functions, `init` and `stop`.\n\n##### Range (Python)\n\n```Python\ndef start(args):\n    print('algorithm: range start')\n    input = args['input'][0]\n    # Build [1, 2, ..., N]; range(input) alone would yield 0..N-1\n    array = list(range(1, input + 1))\n    return array\n```\n\nThe `start` method is called with the `args` parameter; the inputs to the algorithm appear in its `input` property.\n\nThe `input` property is an array, so we take its first element (in `\"input\":[\"@flowInput.data\"]` we placed `data` as the first argument).\n\n##### Multiply (Python)\n\n```Python\ndef start(args):\n    print('algorithm: multiply start')\n    input = args['input'][0]\n    mul = args['input'][1]\n    return input * mul\n```\n\nWe sent two parameters, `\"input\":[\"#@Range\",\"@flowInput.mul\"]`. The first one is the output of `Range`, which is an array of numbers, but because we are using the **batch** sign **(#)**, each multiply algorithm instance gets one item from the array. The second parameter is the `mul` parameter from the `flowInput` object.\n\n##### Reduce (Javascript)\n\n```javascript\nmodule.exports.start = args =\u003e {\n  console.log('algorithm: reduce start');\n  const input = args.input[0];\n  // Start from 0 so that an empty array does not throw\n  return input.reduce((acc, cur) =\u003e acc + cur, 0);\n};\n```\n\nWe placed 
`[\"@Multiply\"]` in the input parameter, so HKube will collect all the data from the multiply algorithm and send it as an array in the first input parameter.\n\n### Integrate Algorithms\n\nAfter creating the [algorithms](#implementing-the-algorithms), we will integrate them with the [CLI](#cli).\n\n\u003e This can also be done through the [Dashboard](#ui-dashboard).\n\nCreate a `yaml` (or `JSON`) that defines the **algorithm**:\n\n```yaml\n# range.yml\nname: range\nenv: python # can be python or javascript\nresources:\n  cpu: 0.5\n  gpu: 1 # if not needed just remove it from the file\n  mem: 512Mi\n\ncode:\n  path: /path-to-algorithm/range.tar.gz\n  entryPoint: main.py\n```\n\nAdd it with the [CLI](#cli):\n\n```console\nhkubectl algorithm apply --f range.yml\n```\n\n\u003e Keep in mind we have to do it **for each one of the algorithms**.\n\n### Integrate Pipeline\n\nCreate a `yaml` (or `JSON`) that defines the **pipeline**:\n\n```yml\n# numbers.yml\nname: numbers\nnodes:\n  - nodeName: Range\n    algorithmName: range\n    input:\n      - '@flowInput.data'\n  - nodeName: Multiply\n    algorithmName: multiply\n    input:\n      - '#@Range'\n      - '@flowInput.mul'\n  - nodeName: Reduce\n    algorithmName: reduce\n    input:\n      - '@Multiply'\nflowInput:\n  data: 5\n  mul: 2\n```\n\n#### Raw - Ad-hoc pipeline running\n\nTo run our pipeline as raw data:\n\n```bash\nhkubectl exec raw --f numbers.yml\n```\n\n#### Stored - Storing the pipeline descriptor for next running\n\nFirst we store the pipeline:\n\n```bash\nhkubectl pipeline store --f numbers.yml\n```\n\nThen you can execute it (if `flowInput` is available):\n\n```bash\n# flowInput stored\nhkubectl exec stored numbers\n```\n\nTo execute the pipeline with different input, create a `yaml` (or `JSON`) file with a `flowInput` key:\n\n```yml\n# otherFlowInput.yml\nflowInput:\n  data: 500\n  mul: 200\n```\n\nThen you can execute it by pipeline `name`:\n\n```bash\n# Executes pipeline \"numbers\" with data=500, mul=200\nhkubectl exec 
stored numbers --f otherFlowInput.yml\n```\n\n### Monitor Pipeline Results\n\nAs a result of executing a pipeline, HKube returns a `jobId`.\n\n```bash\n# Job ID returned after execution.\nresult:\n  jobId: numbers:a56c97cb-5d62-4990-817c-04a8b0448b7c.numbers\n```\n\nThis unique identifier helps to **query** this **specific pipeline execution**.\n\n- **Stop** pipeline execution:\n  `hkubectl exec stop \u003cjobId\u003e [reason]`\n\n- **Track** pipeline status:\n  `hkubectl exec status \u003cjobId\u003e`\n\n- **Track** pipeline result:\n  `hkubectl exec result \u003cjobId\u003e`\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkube-HPC%2Fhkube","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkube-HPC%2Fhkube","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkube-HPC%2Fhkube/lists"}