{"id":19643795,"url":"https://github.com/logicalclocks/databricks_bundles","last_synced_at":"2026-06-12T19:33:11.346Z","repository":{"id":210051568,"uuid":"725599363","full_name":"logicalclocks/databricks_bundles","owner":"logicalclocks","description":"This project showcases how the Hopsworks Feature Store can be integrated with Databricks bundles","archived":false,"fork":false,"pushed_at":"2023-11-30T13:56:17.000Z","size":343,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-02-26T23:37:55.035Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/logicalclocks.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-11-30T13:39:05.000Z","updated_at":"2023-11-30T14:03:01.000Z","dependencies_parsed_at":"2023-11-30T14:49:22.868Z","dependency_job_id":null,"html_url":"https://github.com/logicalclocks/databricks_bundles","commit_stats":null,"previous_names":["logicalclocks/databricks_bundles"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/logicalclocks/databricks_bundles","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/logicalclocks%2Fdatabricks_bundles","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/logicalclocks%2Fdatabricks_bundles/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/logicalclocks%2Fdatabricks_bundles/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/logicalclocks%2Fdatabricks_bundles/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/logicalclocks","download_url":"https://codeload.github.com/logicalclocks/databricks_bundles/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/logicalclocks%2Fdatabricks_bundles/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34260309,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-12T02:00:06.859Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-11T14:24:03.887Z","updated_at":"2026-06-12T19:33:11.327Z","avatar_url":"https://github.com/logicalclocks.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# databricks_bundles\n\nThe 'databricks_bundles' project was generated from the `default-python` Databricks bundle template.\n\n## Prerequisites\n\nThis guide assumes that the required resources (JARs, client certificates and an init script) have already been uploaded to DBFS and are accessible from the workspace that runs your code.\nTo upload these files, follow the cluster configuration step in the Hopsworks documentation: https://docs.hopsworks.ai/3.5/user_guides/integrations/databricks/configuration/#configure-a-cluster.\n\n## Hopsworks integration steps\n\n1. 
Configure the `job_clusters` in your `resources/*.yml` to include `spark_conf`, `init_scripts` and `custom_tags`, as shown here: https://github.com/logicalclocks/databricks_bundles/blob/6713af30c0ec916e99bb76d273ec695ddcf226ed/resources/databricks_bundles_job.yml#L28\n\n   The values to set can be copied from the JSON configuration of a cluster that has already been configured by Hopsworks:\n\n   ![JSON view of a cluster configured by Hopsworks](images/json_view.png)\n\n2. Configure the `tasks` in your `resources/*.yml` to include `libraries`, as shown here: https://github.com/logicalclocks/databricks_bundles/blob/6713af30c0ec916e99bb76d273ec695ddcf226ed/resources/databricks_bundles_job.yml#L20\n\n3. To connect to the Hopsworks Feature Store, import the `hsfs` library and supply the connection parameters in the same way as the first cell of this notebook: https://github.com/logicalclocks/databricks_bundles/blob/main/src/1_feature_pipeline.ipynb\n\n4. Validate the configuration:\n   ```\n   $ databricks bundle validate\n   ```\n\n5. To deploy a development copy of this project, run:\n   ```\n   $ databricks bundle deploy --target dev\n   ```\n   (Note that \"dev\" is the default target, so the `--target` parameter\n   is optional here.)\n\n   This deploys everything that is defined for this project.\n   For example, the default template would deploy a job called\n   `[dev yourname] databricks_bundles_job` to your workspace.\n   You can find that job by opening your workspace and clicking on **Workflows**.\n\n6. Similarly, to deploy a production copy, run:\n   ```\n   $ databricks bundle deploy --target prod\n   ```\n\n7. 
To run the job, use the `run` command:\n   ```\n   $ databricks bundle run\n   ```\n   and then select `databricks_bundles_job` when prompted. You can also pass the job's resource key directly, for example `databricks bundle run databricks_bundles_job`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flogicalclocks%2Fdatabricks_bundles","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flogicalclocks%2Fdatabricks_bundles","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flogicalclocks%2Fdatabricks_bundles/lists"}
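Step 1 of the README above says the `spark_conf`, `init_scripts` and `custom_tags` values can be taken from the JSON configuration of a cluster that Hopsworks has already configured. The following is a minimal sketch of that extraction. It assumes you have copied that cluster JSON as text. The function name `extract_job_cluster`, the default key `job_cluster`, the optional `base_cluster` argument and all sample values are hypothetical. The sketch only builds the `job_clusters` entry as a Python dict, and you still transcribe it into `resources/*.yml` yourself.

```python
import json
from typing import Any, Dict, Optional

# The three settings the README says must be carried over from the
# Hopsworks-configured cluster into the bundle's job cluster.
HOPSWORKS_KEYS = ("spark_conf", "init_scripts", "custom_tags")


def extract_job_cluster(
    cluster_json: str,
    job_cluster_key: str = "job_cluster",  # hypothetical default key
    base_cluster: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a hypothetical `job_clusters` entry from a cluster's JSON config.

    Only spark_conf, init_scripts and custom_tags are copied. Other settings
    (for example spark_version or node_type_id) come from `base_cluster`,
    i.e. from your own bundle configuration.
    """
    cluster = json.loads(cluster_json)
    missing = [key for key in HOPSWORKS_KEYS if key not in cluster]
    if missing:
        # Fail loudly rather than silently deploying an incomplete cluster.
        raise KeyError(f"cluster JSON is missing: {', '.join(missing)}")
    new_cluster = dict(base_cluster or {})
    new_cluster.update({key: cluster[key] for key in HOPSWORKS_KEYS})
    return {"job_cluster_key": job_cluster_key, "new_cluster": new_cluster}


if __name__ == "__main__":
    # Hypothetical, abbreviated cluster JSON; real values come from your workspace.
    sample = json.dumps({
        "cluster_name": "hopsworks-cluster",
        "spark_conf": {"spark.example.conf": "value"},
        "init_scripts": [{"dbfs": {"destination": "dbfs:/path/to/init-script.sh"}}],
        "custom_tags": {"example_tag": "value"},
    })
    print(json.dumps(extract_job_cluster(sample), indent=2))
```

Raising `KeyError` on a missing key is a deliberate choice in this sketch. It makes an incomplete copy visible before deployment, instead of surfacing later as a cluster that cannot reach Hopsworks.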