{"id":19558493,"url":"https://github.com/mach-kernel/databricks-kube-operator","last_synced_at":"2025-04-26T23:31:59.261Z","repository":{"id":62960870,"uuid":"559639650","full_name":"mach-kernel/databricks-kube-operator","owner":"mach-kernel","description":"A Kubernetes operator to enable GitOps style deploys for Databricks resources","archived":false,"fork":false,"pushed_at":"2024-09-04T19:39:27.000Z","size":1608,"stargazers_count":17,"open_issues_count":3,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-04T18:21:59.470Z","etag":null,"topics":["ci","cicd","databricks","gitops","helm","kubernetes","operators","rust","spark"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mach-kernel.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-10-30T18:25:49.000Z","updated_at":"2024-12-18T19:18:34.000Z","dependencies_parsed_at":"2024-01-29T15:43:39.041Z","dependency_job_id":"336074f7-c815-47ed-95e1-9e33af87d359","html_url":"https://github.com/mach-kernel/databricks-kube-operator","commit_stats":null,"previous_names":[],"tags_count":47,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mach-kernel%2Fdatabricks-kube-operator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mach-kernel%2Fdatabricks-kube-operator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mach-kernel%2Fdatabricks-kube-operator/releases","manifest
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mach-kernel%2Fdatabricks-kube-operator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mach-kernel","download_url":"https://codeload.github.com/mach-kernel/databricks-kube-operator/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251068039,"owners_count":21531475,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ci","cicd","databricks","gitops","helm","kubernetes","operators","rust","spark"],"created_at":"2024-11-11T04:47:17.921Z","updated_at":"2025-04-26T23:31:58.812Z","avatar_url":"https://github.com/mach-kernel.png","language":"Rust","readme":"---\ndescription: A Kubernetes operator for Databricks\ncoverY: 0\n---\n\n# 🦀 databricks-kube-operator\n\n[![Rust](https://github.com/mach-kernel/databricks-kube-operator/actions/workflows/rust.yml/badge.svg?branch=master)](https://github.com/mach-kernel/databricks-kube-operator/actions/workflows/rust.yml)\n[![FOSSA Status](https://app.fossa.com/api/projects/custom%2B34302%2Fgithub.com%2Fmach-kernel%2Fdatabricks-kube-operator.svg?type=shield)](https://app.fossa.com/projects/custom%2B34302%2Fgithub.com%2Fmach-kernel%2Fdatabricks-kube-operator?ref=badge_shield)\n\nA [kube-rs](https://kube.rs/) operator to enable GitOps style management of Databricks resources. 
It supports the following APIs:\n\n| API                 | CRD                                     |\n| ------------------- | --------------------------------------- |\n| Jobs 2.1            | DatabricksJob                           |\n| Git Credentials 2.0 | GitCredential                           |\n| Repos 2.0           | Repo                                    |\n| Secrets 2.0         | DatabricksSecretScope, DatabricksSecret |\n\nExperimental, heading towards stable. See the GitHub project board for the roadmap. Contributions and feedback are welcome!\n\n[Read the docs](https://databricks-kube-operator.gitbook.io/doc)\n\n## Quick Start\n\nLooking for a more in-depth example? Read the [tutorial](tutorial.md).\n\n### Installation\n\nAdd the Helm repository and install the chart:\n\n```bash\nhelm repo add mach https://mach-kernel.github.io/databricks-kube-operator\nhelm install databricks-kube-operator mach/databricks-kube-operator\n```\n\nCreate a ConfigMap in the same namespace as the operator. To override the ConfigMap name, pass `--set configMapName=my-custom-name` when installing the chart:\n\n```bash\ncat \u003c\u003cEOF | kubectl apply -f -\napiVersion: v1\nkind: ConfigMap\nmetadata:\n  name: databricks-kube-operator\ndata:\n  api_secret_name: databricks-api-secret\nEOF\n```\n\nCreate a secret with your API URL and credentials:\n\n```bash\ncat \u003c\u003cEOF | kubectl apply -f -\napiVersion: v1\ndata:\n  access_token: $(echo -n 'shhhh' | base64)\n  databricks_url: $(echo -n 'https://my-tenant.cloud.databricks.com/api' | base64)\nkind: Secret\nmetadata:\n  name: databricks-api-secret\ntype: Opaque\nEOF\n```\n\n### Usage\n\nSee the examples directory for samples of Databricks CRDs. 
Resources that are created via Kubernetes are owned by the operator: your checked-in manifests are the source of truth.\n\n```yaml\napiVersion: com.dstancu.databricks/v1\nkind: DatabricksJob\nmetadata:\n  name: my-word-count\n  namespace: default\nspec:\n  job:\n    settings:\n      email_notifications:\n        no_alert_for_skipped_runs: false\n      format: MULTI_TASK\n      job_clusters:\n      - job_cluster_key: word-count-cluster\n        new_cluster:\n          ...\n      max_concurrent_runs: 1\n      name: my-word-count\n      git_source:\n        git_branch: misc-and-docs\n        git_provider: gitHub\n        git_url: https://github.com/mach-kernel/databricks-kube-operator\n      tasks:\n      - email_notifications: {}\n        job_cluster_key: word-count-cluster\n        notebook_task:\n          notebook_path: examples/job.py\n          source: GIT\n        task_key: my-word-count\n        timeout_seconds: 0\n      timeout_seconds: 0\n```\n\nChanges made by users in the Databricks webapp will be overwritten by the operator if drift is detected:\n\n```\n[2024-01-11T14:20:40Z INFO  databricks_kube::traits::remote_api_resource] Resource DatabricksJob my-word-count drifted!\n    Diff (remote, kube):\n    json atoms at path \".settings.tasks[0].notebook_task.notebook_path\" are not equal:\n        lhs:\n            \"examples/job_oops_is_this_right.py\"\n        rhs:\n            \"examples/job.py\"\n[2024-01-11T14:20:40Z INFO  databricks_kube::traits::remote_api_resource] Resource DatabricksJob my-word-count reconciling drift...\n```\n\nList jobs (limited to those the operator's access token is allowed to view):\n\n```bash\n$ kubectl get databricksjobs\nNAME                                 STATUS\ncontoso-ingest-qa                      RUNNING\ncontoso-ingest-staging                 INTERNAL_ERROR\ncontoso-stats-qa                       TERMINATED\ncontoso-stats-staging                  NO_RUNS\n\n$ kubectl describe databricksjob contoso-ingest-qa\n...\n```\n\nA 
job's status key surfaces API information about the latest [run](https://docs.databricks.com/dev-tools/api/latest/jobs.html#operation/JobsRunsList). The status is polled every 60s:\n\n```bash\n$ kubectl get databricksjob contoso-ingest-staging -ojson | jq .status\n{\n  \"latest_run_state\": {\n    \"life_cycle_state\": \"INTERNAL_ERROR\",\n    \"result_state\": \"FAILED\",\n    \"state_message\": \"Task contoso-ingest-staging failed. This caused all downstream tasks to get skipped.\",\n    \"user_cancelled_or_timedout\": false\n  }\n}\n```\n\n## Developers\n\nBegin by creating the configmap as per the Helm instructions.\n\nGenerate and install the CRDs by running the `crd_gen` bin target:\n\n```bash\ncargo run --bin crd_gen | kubectl apply -f -\n```\n\nThe quickest way to test the operator is with a working [minikube](https://minikube.sigs.k8s.io/docs/start/) cluster:\n\n```bash\nminikube start\nminikube tunnel \u0026\n```\n\n```bash\nexport RUST_LOG=databricks_kube\ncargo run\n[2022-11-02T18:56:25Z INFO  databricks_kube] boot! 
(build: df7e26b-modified)\n[2022-11-02T18:56:25Z INFO  databricks_kube::context] Waiting for CRD: databricksjobs.com.dstancu.databricks\n[2022-11-02T18:56:25Z INFO  databricks_kube::context] Waiting for CRD: gitcredentials.com.dstancu.databricks\n[2022-11-02T18:56:25Z INFO  databricks_kube::context] Waiting for settings in config map: databricks-kube-operator\n[2022-11-02T18:56:25Z INFO  databricks_kube::context] Found config map\n[2022-11-02T18:56:25Z INFO  databricks_kube::traits::synced_api_resource] Looking for uningested GitCredential(s)\n[2022-11-02T18:56:25Z INFO  databricks_kube::traits::synced_api_resource] Looking for uningested DatabricksJob(s)\n```\n\n### Generating API Clients\n\nThe client is generated by `openapi-generator` and then lightly postprocessed so that the models derive [`JsonSchema`](https://github.com/GREsau/schemars#basic-usage) and a few generator bugs are fixed.\n\n\u003cdetails\u003e\n  \u003csummary\u003e TODO: Manual client 'fixes' \u003c/summary\u003e\n\n  ```bash\n  # Hey!! 
This uses GNU sed\n  # brew install gnu-sed\n\n  # Jobs API\n  openapi-generator generate -g rust -i openapi/jobs-2.1-aws.yaml -c openapi/config-jobs.yaml -o dbr_jobs\n\n  # Derive JsonSchema for all models and add schemars as dep\n  gsed -i -e 's/derive(Clone/derive(JsonSchema, Clone/' dbr_jobs/src/models/*\n  gsed -i -e 's/\\/\\*/use schemars::JsonSchema;\\n\\/\\*/' dbr_jobs/src/models/*\n  gsed -r -i -e 's/(\\[dependencies\\])/\\1\\nschemars = \"0.8.11\"/' dbr_jobs/Cargo.toml\n\n  # Missing import?\n  gsed -r -i -e 's/(use reqwest;)/\\1\\nuse crate::models::ViewsToExport;/' dbr_jobs/src/apis/default_api.rs\n\n  # Git Credentials API\n  openapi-generator generate -g rust -i openapi/gitcredentials-2.0-aws.yaml -c openapi/config-git.yaml -o dbr_git_creds\n\n  # Derive JsonSchema for all models and add schemars as dep\n  gsed -i -e 's/derive(Clone/derive(JsonSchema, Clone/' dbr_git_creds/src/models/*\n  gsed -i -e 's/\\/\\*/use schemars::JsonSchema;\\n\\/\\*/' dbr_git_creds/src/models/*\n  gsed -r -i -e 's/(\\[dependencies\\])/\\1\\nschemars = \"0.8.11\"/' dbr_git_creds/Cargo.toml\n\n  # Repos API\n  openapi-generator generate -g rust -i openapi/repos-2.0-aws.yaml -c openapi/config-repos.yaml -o dbr_repo\n\n  # Derive JsonSchema for all models and add schemars as dep\n  gsed -i -e 's/derive(Clone/derive(JsonSchema, Clone/' dbr_repo/src/models/*\n  gsed -i -e 's/\\/\\*/use schemars::JsonSchema;\\n\\/\\*/' dbr_repo/src/models/*\n  gsed -r -i -e 's/(\\[dependencies\\])/\\1\\nschemars = \"0.8.11\"/' dbr_repo/Cargo.toml\n\n  # Secrets API\n  openapi-generator generate -g rust -i openapi/secrets-aws.yaml -c openapi/config-secrets.yaml -o dbr_secrets\n\n  # Derive JsonSchema for all models and add schemars as dep\n  gsed -i -e 's/derive(Clone/derive(JsonSchema, Clone/' dbr_secrets/src/models/*\n  gsed -i -e 's/\\/\\*/use schemars::JsonSchema;\\n\\/\\*/' dbr_secrets/src/models/*\n  gsed -r -i -e 's/(\\[dependencies\\])/\\1\\nschemars = \"0.8.11\"/' dbr_secrets/Cargo.toml\n  ```\n\u003c/details\u003e\n\n\n### Expand CRD macros\n\nDeriving 
`CustomResource` uses macros to generate another struct. For this example, the output struct name would be `DatabricksJob`:\n\n```rust\n#[derive(Clone, CustomResource, Debug, Default, Deserialize, PartialEq, Serialize, JsonSchema)]\n#[kube(\n    group = \"com.dstancu.databricks\",\n    version = \"v1\",\n    kind = \"DatabricksJob\",\n    derive = \"Default\",\n    namespaced\n)]\npub struct DatabricksJobSpec {\n    pub job: Job,\n}\n```\n\n`rust-analyzer` shows squiggles when you `use crds::databricks_job::DatabricksJob`, but you may want to inspect the generated code. To see what the macro expands to, use [cargo-expand](https://github.com/dtolnay/cargo-expand):\n\n```bash\nrustup default nightly\ncargo expand --bin databricks_kube\n```\n\n### Adding a new CRD\n\nWant to add support for a new API? Provided it has an OpenAPI definition, these are the steps. Look for existing examples in the codebase:\n\n* Download the API definition into `openapi/` and make a [Rust generator configuration](https://openapi-generator.tech/docs/generators/rust/) (feel free to copy an existing one and change the name)\n* Generate the SDK, add it to the Cargo workspace and dependencies for `databricks-kube/`\n* Implement `RestConfig\u003cTSDKConfig\u003e` for your new client\n* Define the new CRD Spec type ([follow the kube-rs tutorial](https://kube.rs/getting-started/))\n* `impl RemoteAPIResource\u003cTAPIResource\u003e for MyNewCRD`\n* `impl StatusAPIResource\u003cTStatusType\u003e for MyNewCRD` and [specify `TStatusType` in your CRD](https://github.com/kube-rs/kube/blob/main/examples/crd_derive.rs#L20)\n* Add the new resource to the context's ensure-CRDs condition\n* Add the new resource to `crdgen.rs`\n\n### Running tests\n\nTests must be run with a single thread since we use a stateful singleton to 'mock' the state of a remote API. 
Eventually it would be nice to have integration tests targeting Databricks.\n\n```bash\n$ cargo test -- --test-threads=1\n```\n\n## License\n\n[![FOSSA Status](https://app.fossa.com/api/projects/custom%2B34302%2Fgithub.com%2Fmach-kernel%2Fdatabricks-kube-operator.svg?type=large)](https://app.fossa.com/projects/custom%2B34302%2Fgithub.com%2Fmach-kernel%2Fdatabricks-kube-operator?ref=badge_large)","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmach-kernel%2Fdatabricks-kube-operator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmach-kernel%2Fdatabricks-kube-operator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmach-kernel%2Fdatabricks-kube-operator/lists"}