{"id":21683777,"url":"https://github.com/open-eo/openeo-geotrellis-kubernetes","last_synced_at":"2025-10-24T13:11:03.279Z","repository":{"id":43012161,"uuid":"274414282","full_name":"Open-EO/openeo-geotrellis-kubernetes","owner":"Open-EO","description":"Contains scripts to run openeo geotrellis backend on a Kubernetes cluster on DIAS.","archived":false,"fork":false,"pushed_at":"2025-04-10T13:27:25.000Z","size":841,"stargazers_count":3,"open_issues_count":6,"forks_count":4,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-04-10T14:51:57.801Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Open-EO.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-06-23T13:33:58.000Z","updated_at":"2025-04-07T14:38:26.000Z","dependencies_parsed_at":"2023-09-22T16:16:11.761Z","dependency_job_id":"3d729ba9-1a27-46b1-b305-cf1db0e856dc","html_url":"https://github.com/Open-EO/openeo-geotrellis-kubernetes","commit_stats":{"total_commits":394,"total_committers":7,"mean_commits":"56.285714285714285","dds":0.4467005076142132,"last_synced_commit":"68b96d99729c903105a73bd9303689b16431722e"},"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Open-EO%2Fopeneo-geotrellis-kubernetes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Open-EO%2Fopeneo-geotrellis-kubernetes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Open-EO%2Fopeneo-geotrellis-kubernetes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Open-EO%2Fopeneo-geotrellis-kubernetes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Open-EO","download_url":"https://codeload.github.com/Open-EO/openeo-geotrellis-kubernetes/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248537003,"owners_count":21120688,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-25T16:13:28.134Z","updated_at":"2025-10-24T13:10:58.248Z","avatar_url":"https://github.com/Open-EO.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# openeo-geotrellis-kubernetes\n\n## Introduction\n\nThis repository contains all the configuration and documentation required to deploy the OpenEO service on Kubernetes in a DIAS cloud environment. The [CloudFerro CreoDIAS][1] OpenStack cloud was used for this setup, but the configuration should work for any OpenStack environment.\n\nThe goal is to have a working OpenEO deployment, running as a Spark job on Kubernetes on OpenStack.\n\n## Prerequisites\n\n  * A CreoDIAS account with credits to provision OpenStack resources\n  * Some Kubernetes knowledge\n  * A working Kubernetes cluster (see [Setup the Kubernetes environment](#setup-the-kubernetes-environment))\n\n## Setup the Kubernetes environment\n\nTo deploy a Kubernetes cluster on the CreoDIAS OpenStack cloud, we use [RKE][3] (Rancher Kubernetes Engine). 
This tool deploys a cluster whose components run in containers, quickly and reliably. Rancher also provides a [Terraform provider for RKE][16], which we use to keep all of our infrastructure provisioning within Terraform.\n\nSteps we took to create a cluster:\n\n1. Create an OpenStack image with Packer, containing all necessary dependencies (Docker, users, ...)\n2. Provision instances, networks, security groups, ...\n3. Provision the Kubernetes cluster itself\n\n## The Spark operator\n\nSince Spark 2.3.0, you can use Kubernetes as the scheduler for your Spark jobs ([docs][4]). With this functionality, you use the regular `spark-submit` tool with Kubernetes-specific parameters. While this works, it has some drawbacks:\n\n  * You can't manage your Spark applications as regular Kubernetes objects, and therefore can't use `kubectl` to manage them\n  * No native cron support\n  * No automatic retries\n  * No built-in monitoring\n\nTo address these shortcomings, GoogleCloudPlatform developed the [spark-on-k8s-operator][5]. This operator lets you schedule Spark applications as native Kubernetes resources, using CRDs (Custom Resource Definitions). The operator also manages the lifecycle of the Spark applications and provides many other features (see the GitHub repository for a complete feature list).\n\nNow that we've chosen to use the operator instead of plain `spark-submit` commands, we have to install the operator on the cluster. The easiest way to do this is with the [Helm chart][6]. Instructions for installing Helm itself can be found [here][7].\n\nWith Helm installed, we can now install the operator. First, add the operator's chart repository to Helm:\n\n```\nhelm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator\n```\n\nYou can choose the Kubernetes namespace in which the Spark operator will be installed. In this guide, 
we'll create a separate namespace for the operator and one for the Spark jobs. The namespace for the Spark jobs has to be created separately, as the Helm chart doesn't create it for us.\n\n```\nkubectl create namespace spark-jobs\nhelm install spark-operator/spark-operator --generate-name --create-namespace --namespace spark-operator --set sparkJobNamespace=spark-jobs --set webhook.enable=true --set image.tag=v1beta2-1.3.0-3.1.1\n```\n\nLet's break down the different options passed to the `helm install` command:\n\n| Option                        | Explanation                                                               |\n|-------------------------------|---------------------------------------------------------------------------|\n| spark-operator/spark-operator | The Helm chart that will be installed                                     |\n| --generate-name               | Generate a name for the Helm release (you can also provide your own name) |\n| --create-namespace            | Create the namespace if it doesn't exist yet (requires Helm 3.2+)         |\n| --namespace                   | The namespace the operator will be installed in                           |\n| --set sparkJobNamespace       | The namespace where the Spark jobs will be deployed                       |\n| --set webhook.enable          | Enable the mutating admission webhook                                     |\n| --set image.tag               | The version of the operator image to deploy                               |\n\nWith the operator installed, you should see output similar to the following:\n\n```\nhelm list -n spark-operator\nNAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION\nspark-operator  spark-operator  1               2022-05-13 06:17:20.165279131 +0000 UTC deployed        spark-operator-1.1.15   v1beta2-1.3.1-3.1.1\n```\n\nand:\n\n```\nkubectl get pods -n spark-operator\nNAME                                        READY   STATUS    RESTARTS   AGE\nsparkoperator-1593174963-556544cb66-v5v7f   1/1     Running   0          11m\n```\n\nThe Spark operator is now up and running in its own namespace.\n\n## Configuring collections\n\nFor data access, openEO requires you to register the configuration of 'collections' in a file inside the Docker image. The default location is `/opt/layercatalog.json`, but this can be changed with the `OPENEO_CATALOG_FILES` environment variable.\n\nThe format of this file mainly follows the STAC collection metadata specification, but there are also a number of custom properties. Documentation of these properties is rather sparse, so for now, working from existing examples is the best approach. In the best case, when working from a properly configured STAC collection, little additional configuration is needed.\n\nTo update the configuration in the Docker image, we recommend rebuilding the image. Another approach might be to use a Kubernetes ConfigMap, see:\nhttps://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/#define-the-key-to-use-when-creating-a-configmap-from-a-file\n\nAn example of a layer catalog:\nhttps://github.com/Open-EO/openeo-geotrellis-kubernetes/blob/master/docker/creo_layercatalog.json\n\nFor reference, this is the Python code that processes the configuration in the layer catalog:\nhttps://github.com/Open-EO/openeo-geopyspark-driver/blob/master/openeogeotrellis/layercatalog.py\n\n### STAC-based collection\n\nCustom configuration for a STAC collection. Note that the `opensearch_*` properties are used, but the backend tries to determine automatically what to use.\n\n```\n\"_vito\": {\n  \"data_source\": {\n    \"type\": \"file-s2\",\n    \"opensearch_collection_id\": \"S2\",\n    \"opensearch_endpoint\": \"https://resto.c-scale.zcu.cz\",\n    \"provider:backend\": \"incd\"\n  }\n}\n```\n\n## Deploy the OpenEO Spark job\n\nNow that we have the Spark operator running, it's time to deploy our application on the cluster. 
As Kubernetes is a container orchestrator, we need to package our application into a container image. The files needed to build this image can be found in the [docker][8] directory. A prebuilt image is available at `vito-docker.artifactory.vgt.vito.be/openeo-geotrellis-kube`.\n\nAs we are using the Spark operator, we can define our Spark job as a Kubernetes resource rather than a `spark-submit` script. To have a fully functional application, however, we need more than a `SparkApplication` Kubernetes resource: we also need an Ingress, ServiceAccounts, RBAC rules, ... A [Helm chart][9] was written to manage all of these parts. Instructions on how to use this chart can be found in its `README.md` file.\n\nAfter creating a `values.yaml` file with your values, you can invoke a regular `helm install` command to deploy your instance of openEO to your Kubernetes cluster.\n\n### Relevant config properties\n\nThe backend is mostly configured via environment variables in `values.yaml`. Please review these values for your backend:\n\n| Variable              | Description                                       |\n|-----------------------|---------------------------------------------------|\n| SWIFT_URL             | URL of the S3 API used to store batch job results |\n| AWS_ACCESS_KEY_ID     | S3 access key used to manage batch job results    |\n| AWS_SECRET_ACCESS_KEY | S3 secret access key                              |\n| ZOOKEEPERNODES        | Zookeeper cluster used for persistence            |\n\n## Deploy an openEO job-tracker cron job\n\nTo track the status of batch jobs, we need a job-tracker cron job. An example of such a job can be found in [examples/job-tracker.yaml][17].\n\n## Monitoring\n\nThe Spark operator also provides an easy way to monitor the Spark applications it submits. 
In the `openeo.yaml` manifest, you can find the necessary configuration:\n\n```\nmonitoring:\n  exposeDriverMetrics: true\n  exposeExecutorMetrics: true\n  prometheus:\n    jmxExporterJar: \"/opt/jmx_prometheus_javaagent-0.13.0.jar\"\n    port: 8090\n```\n\nYou can also choose to expose only the executor metrics, for example.\n\nThis monitoring section uses a default configuration file for the [jmx_exporter][12], which can be found [here][13]. This configuration can also be overridden: via a Kubernetes [ConfigMap][14], you can mount a different `prometheus.yaml` file in your driver and executor.\n\n**For ConfigMaps to work, you need to enable the [mutating admission webhook][15] for the Spark operator.**\n\nFirst, create a ConfigMap containing the `prometheus.yaml` file:\n\n```\nkubectl create configmap prometheus-jmx-config --from-file=prometheus.yaml\n```\n\nThis ConfigMap can now be mounted in your driver and executor pods:\n\n```\ndriver:\n  configMaps:\n    - name: prometheus-jmx-config\n      path: /opt/prometheus_config\nexecutor:\n  configMaps:\n    - name: prometheus-jmx-config\n      path: /opt/prometheus_config\n```\n\nThe mounted file can then be referenced in the monitoring configuration:\n\n```\nmonitoring:\n  prometheus:\n    configFile: /opt/prometheus_config/prometheus.yaml\n```\n\nThe new metrics should now appear in your Prometheus instance.\n\n## Additional services running in our cluster\n\n| Service              | Function                                    |\n|----------------------|---------------------------------------------|\n| Prometheus           | Monitoring                                  |\n| Grafana              | Visualize Prometheus metrics                |\n| Alertmanager         | Alerting                                    |\n| Spark History Server | Overview of historical jobs                 |\n| Zookeeper            | Keep track of batch jobs                    |\n| Traefik              | Ingress                                     |\n| Kubecost             | Monitoring of costs per namespace, pod, ... |\n| Filebeat             | Log aggregation                             |\n| RKE Pushprox         | Helper for metrics of RKE components        |\n| Cinder CSI           | Dynamic OpenStack Cinder volumes            |\n\n[1]: https://creodias.eu/\n[2]: https://creodias.eu/faq-other/-/asset_publisher/SIs09LQL6Gct/content/how-to-configure-kubernetes\n[3]: https://rancher.com/products/rke\n[4]: https://spark.apache.org/docs/2.4.5/running-on-kubernetes.html\n[5]: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator\n[6]: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/tree/master/charts/spark-operator-chart\n[7]: https://helm.sh/docs/intro/install\n[8]: https://github.com/Open-EO/openeo-geotrellis-kubernetes/blob/master/docker\n[9]: https://github.com/Open-EO/openeo-geotrellis-kubernetes/tree/master/kubernetes/charts/sparkapplication\n[12]: https://github.com/prometheus/jmx_exporter\n[13]: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/spark-docker/conf/prometheus.yaml\n[14]: https://kubernetes.io/docs/concepts/configuration/configmap/\n[15]: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/quick-start-guide.md#about-the-mutating-admission-webhook\n[16]: https://registry.terraform.io/providers/rancher/rke/latest/docs\n[17]: examples/job-tracker.yaml\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopen-eo%2Fopeneo-geotrellis-kubernetes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopen-eo%2Fopeneo-geotrellis-kubernetes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopen-eo%2Fopeneo-geotrellis-kubernetes/lists"}