{"id":27666085,"url":"https://github.com/GoogleCloudPlatform/spark-on-k8s-operator","last_synced_at":"2025-04-24T13:01:19.422Z","repository":{"id":38381085,"uuid":"116165188","full_name":"kubeflow/spark-operator","owner":"kubeflow","description":"Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes. ","archived":false,"fork":false,"pushed_at":"2025-04-21T17:33:43.000Z","size":26636,"stargazers_count":2903,"open_issues_count":71,"forks_count":1405,"subscribers_count":71,"default_branch":"master","last_synced_at":"2025-04-24T08:23:08.038Z","etag":null,"topics":["apache-spark","google-cloud-dataproc","kubernetes","kubernetes-controller","kubernetes-crd","kubernetes-operator","spark"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kubeflow.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2018-01-03T17:43:16.000Z","updated_at":"2025-04-21T08:32:31.000Z","dependencies_parsed_at":"2023-02-10T01:00:53.537Z","dependency_job_id":"d3ac773f-f453-4842-8675-9ad1c84f4df1","html_url":"https://github.com/kubeflow/spark-operator","commit_stats":{"total_commits":876,"total_committers":220,"mean_commits":3.981818181818182,"dds":0.7671232876712328,"last_synced_commit":"1509b341d4e6210f58518339e6b675c9cae640b5"},"previous_names":["kubeflow/spark-operator","googlecloudplatform/spark-on-k8s-operator"],"tags_count":100,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kubeflow%2Fspark-operator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kubeflow%2Fspark-operator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kubeflow%2Fspark-operator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kubeflow%2Fspark-operator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kubeflow","download_url":"https://codeload.github.com/kubeflow/spark-operator/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250632593,"owners_count":21462367,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-spark","google-cloud-dataproc","kubernetes","kubernetes-controller","kubernetes-crd","kubernetes-operator","spark"],"created_at":"2025-04-24T13:00:47.558Z","updated_at":"2025-04-24T13:01:19.326Z","avatar_url":"https://github.com/kubeflow.png","language":"Go","funding_links":[],"categories":["Kubernetes Operators","Repository is obsolete","Go"],"sub_categories":["Awesome Operators in the Wild"],"readme":"# Kubeflow Spark Operator\n\n[![Integration Test](https://github.com/kubeflow/spark-operator/actions/workflows/integration.yaml/badge.svg)](https://github.com/kubeflow/spark-operator/actions/workflows/integration.yaml)\n[![Go Report Card](https://goreportcard.com/badge/github.com/kubeflow/spark-operator)](https://goreportcard.com/report/github.com/kubeflow/spark-operator)\n[![GitHub release](https://img.shields.io/github/v/release/kubeflow/spark-operator)](https://github.com/kubeflow/spark-operator/releases)\n\n## What is Spark Operator?\n\nThe Kubernetes Operator for Apache Spark aims to make specifying and running [Spark](https://github.com/apache/spark) applications as easy and idiomatic as running other workloads on Kubernetes. It uses\n[Kubernetes custom resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) for specifying, running, and surfacing status of Spark applications.\n\n## Quick Start\n\nFor a more detailed guide, please refer to the [Getting Started guide](https://www.kubeflow.org/docs/components/spark-operator/getting-started/).\n\n```bash\n# Add the Helm repository\nhelm repo add spark-operator https://kubeflow.github.io/spark-operator\nhelm repo update\n\n# Install the operator into the spark-operator namespace and wait for deployments to be ready\nhelm install spark-operator spark-operator/spark-operator \\\n    --namespace spark-operator --create-namespace --wait\n\n# Create an example application in the default namespace\nkubectl apply -f https://raw.githubusercontent.com/kubeflow/spark-operator/refs/heads/master/examples/spark-pi.yaml\n\n# Get the status of the application\nkubectl get sparkapp spark-pi\n```\n\n## Overview\n\nFor a complete reference of the custom resource definitions, please refer to the [API Definition](docs/api-docs.md). For details on its design, please refer to the [Architecture](https://www.kubeflow.org/docs/components/spark-operator/overview/#architecture). It requires Spark 2.3 and above that supports Kubernetes as a native scheduler backend.\n\nThe Kubernetes Operator for Apache Spark currently supports the following list of features:\n\n* Supports Spark 2.3 and up.\n* Enables declarative application specification and management of applications through custom resources.\n* Automatically runs `spark-submit` on behalf of users for each `SparkApplication` eligible for submission.\n* Provides native [cron](https://en.wikipedia.org/wiki/Cron) support for running scheduled applications.\n* Supports customization of Spark pods beyond what Spark natively is able to do through the mutating admission webhook, e.g., mounting ConfigMaps and volumes, and setting pod affinity/anti-affinity.\n* Supports automatic application re-submission for updated `SparkApplication` objects with updated specification.\n* Supports automatic application restart with a configurable restart policy.\n* Supports automatic retries of failed submissions with optional linear back-off.\n* Supports collecting and exporting application-level metrics and driver/executor metrics to Prometheus.\n\n## Project Status\n\n**Project status:** *beta*\n\n**Current API version:** *`v1beta2`*\n\n**If you are currently using the `v1beta1` version of the APIs in your manifests, please update them to use the `v1beta2` version by changing `apiVersion: \"sparkoperator.k8s.io/\u003cversion\u003e\"` to `apiVersion: \"sparkoperator.k8s.io/v1beta2\"`. You will also need to delete the `previous` version of the CustomResourceDefinitions named `sparkapplications.sparkoperator.k8s.io` and `scheduledsparkapplications.sparkoperator.k8s.io`, and replace them with the `v1beta2` version either by installing the latest version of the operator or by running `kubectl create -f config/crd/bases`.**\n\n## Prerequisites\n\n* Version \u003e= 1.13 of Kubernetes to use the [`subresource` support for CustomResourceDefinitions](https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/#subresources), which became beta in 1.13 and is enabled by default in 1.13 and higher.\n\n* Version \u003e= 1.16 of Kubernetes to use the `MutatingWebhook` and `ValidatingWebhook` of `apiVersion: admissionregistration.k8s.io/v1`.\n\n## Getting Started\n\nFor getting started with Spark operator, please refer to [Getting Started](https://www.kubeflow.org/docs/components/spark-operator/getting-started/).\n\n## User Guide\n\nFor detailed user guide and API documentation, please refer to [User Guide](https://www.kubeflow.org/docs/components/spark-operator/user-guide/) and [API Specification](docs/api-docs.md).\n\nIf you are running Spark operator on Google Kubernetes Engine (GKE) and want to use Google Cloud Storage (GCS) and/or BigQuery for reading/writing data, also refer to the [GCP guide](https://www.kubeflow.org/docs/components/spark-operator/user-guide/gcp/).\n\n## Version Matrix\n\nThe following table lists the most recent few versions of the operator.\n\n| Operator Version      | API Version | Kubernetes Version | Base Spark Version |\n|-----------------------|-------------|--------------------|--------------------|\n| `v2.0.x`              | `v1beta2`   | 1.16+              | `3.5.2`            |\n| `v1beta2-1.6.x-3.5.0` | `v1beta2`   | 1.16+              | `3.5.0`            |\n| `v1beta2-1.5.x-3.5.0` | `v1beta2`   | 1.16+              | `3.5.0`            |\n| `v1beta2-1.4.x-3.5.0` | `v1beta2`   | 1.16+              | `3.5.0`            |\n| `v1beta2-1.3.x-3.1.1` | `v1beta2`   | 1.16+              | `3.1.1`            |\n| `v1beta2-1.2.3-3.1.1` | `v1beta2`   | 1.13+              | `3.1.1`            |\n| `v1beta2-1.2.2-3.0.0` | `v1beta2`   | 1.13+              | `3.0.0`            |\n| `v1beta2-1.2.1-3.0.0` | `v1beta2`   | 1.13+              | `3.0.0`            |\n| `v1beta2-1.2.0-3.0.0` | `v1beta2`   | 1.13+              | `3.0.0`            |\n| `v1beta2-1.1.x-2.4.5` | `v1beta2`   | 1.13+              | `2.4.5`            |\n| `v1beta2-1.0.x-2.4.4` | `v1beta2`   | 1.13+              | `2.4.4`            |\n\n## Developer Guide\n\nFor developing with Spark Operator, please refer to [Developer Guide](https://www.kubeflow.org/docs/components/spark-operator/developer-guide/).\n\n## Contributor Guide\n\nFor contributing to Spark Operator, please refer to [Contributor Guide](CONTRIBUTING.md).\n\n## Community\n\n* Join the [CNCF Slack Channel](https://www.kubeflow.org/docs/about/community/#kubeflow-slack-channels) and then join `#kubeflow-spark-operator` Channel.\n* Check out our blog post [Announcing the Kubeflow Spark Operator: Building a Stronger Spark on Kubernetes Community](https://blog.kubeflow.org/operators/2024/04/15/kubeflow-spark-operator.html).\n* Join our monthly community meeting [Kubeflow Spark Operator Meeting Notes](https://bit.ly/3VGzP4n).\n\n## Adopters\n\nCheck out [adopters of Spark Operator](ADOPTERS.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FGoogleCloudPlatform%2Fspark-on-k8s-operator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FGoogleCloudPlatform%2Fspark-on-k8s-operator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FGoogleCloudPlatform%2Fspark-on-k8s-operator/lists"}