{"id":22017836,"url":"https://github.com/shoprunner/stork","last_synced_at":"2025-05-07T03:10:45.983Z","repository":{"id":48845976,"uuid":"146018905","full_name":"ShopRunner/stork","owner":"ShopRunner","description":"Make your libraries magically appear in Databricks.","archived":false,"fork":false,"pushed_at":"2023-07-25T17:04:17.000Z","size":123,"stargazers_count":47,"open_issues_count":11,"forks_count":19,"subscribers_count":50,"default_branch":"main","last_synced_at":"2025-04-19T00:28:19.075Z","etag":null,"topics":["datascience"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ShopRunner.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE-OF-CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null}},"created_at":"2018-08-24T17:07:03.000Z","updated_at":"2024-10-03T20:47:08.000Z","dependencies_parsed_at":"2023-09-27T09:12:46.559Z","dependency_job_id":null,"html_url":"https://github.com/ShopRunner/stork","commit_stats":{"total_commits":101,"total_committers":6,"mean_commits":"16.833333333333332","dds":0.2376237623762376,"last_synced_commit":"df49a593f0272fa1bb882585335ceb1fa363d15b"},"previous_names":["shoprunner/apparate"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShopRunner%2Fstork","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShopRunner%2Fstork/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShopRunner%2Fstork/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShopRunner%2F
stork/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ShopRunner","download_url":"https://codeload.github.com/ShopRunner/stork/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252804219,"owners_count":21806771,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["datascience"],"created_at":"2024-11-30T05:08:19.692Z","updated_at":"2025-05-07T03:10:45.966Z","avatar_url":"https://github.com/ShopRunner.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Stork\nCommand line helpers for Databricks!\n\n[![PyPI version](https://badge.fury.io/py/stork.svg)](https://badge.fury.io/py/stork)\n[![Python package](https://github.com/ShopRunner/stork/workflows/Python%20package/badge.svg)](https://github.com/ShopRunner/stork/actions/workflows/prod.yaml)\n[![Documentation Status](https://readthedocs.org/projects/stork-library/badge/?version=latest)](https://stork-library.readthedocs.io/en/latest/?badge=latest)\n\n## Maintenance Note\n⚠️  [2021/07/08] After recent updates to the Databricks platform, it is now possible to install jars and wheels from internal repositories (such as an Artifactory instance). We recommend this approach moving forward, since it allows more standard version management and wheels have several advantages over eggs for Python libraries. 
Stork currently still works, but its library management relies on a deprecated API and may therefore break at some point in the future; if it does, we will likely not attempt to fix it.\n\n\n## Why we built this\n\nWhen our team started setting up CI/CD for the various packages we maintain, we encountered some difficulties integrating Jenkins with Databricks.\n\nWe write a lot of Python + PySpark packages in our data science work, and we often deploy these as batch jobs run on a schedule using Databricks. However, each time we merged in a new change to one of these libraries we would have to manually create an egg, upload it using the Databricks GUI, go find all the jobs that used the library, and update each one to point to the new egg. As our team and set of libraries and jobs grew, this became unsustainable (not to mention a big break from the CI/CD philosophy...).\n\nAs we set out to automate this using Databricks' library API, we realized that this task required using two versions of the API and many dependent API calls. Instead of trying to recreate that logic in each Jenkinsfile, we wrote stork. Now you can enjoy the magic as well!\n\nStork now works for both `.egg` and `.jar` files to support Python + PySpark and Scala + Spark libraries.\nTo take advantage of stork's ability to update jobs, make sure you follow one of these naming conventions:\n```\nnew_library-1.0.0-py3.6.egg\nnew_library-1.0.0-SNAPSHOT-py3.6.egg\nnew_library-1.0.0-SNAPSHOT-my-branch-py3.6.egg\nnew_library-1.0.0.egg\nnew_library-1.0.0-SNAPSHOT.egg\nnew_library-1.0.0-SNAPSHOT-my-branch.egg\nnew_library-1.0.0.jar\nnew_library-1.0.0-SNAPSHOT.jar\nnew_library-1.0.0-SNAPSHOT-my-branch.jar\n```\nHere, the first number in the version (in this case `1`) is the major version, which signals breaking changes.\n\n## What it does\n\nStork is a set of command line helpers for Databricks. Some commands are for managing libraries in Databricks in an automated fashion. 
This allows you to move away from the point-and-click interface for your development work and for deploying production-level libraries for use in scheduled Databricks jobs. Another command allows you to create an interactive cluster that replicates the settings used on a job cluster.\n\nFor a more detailed API and tutorials, check out the [docs](https://stork-library.readthedocs.io/en/latest/index.html).\n\n## Installation\n\nNote: stork requires Python 3 and currently only works on Databricks accounts that run on AWS (not Azure).\n\nStork is hosted on PyPI, so to get the latest version simply install via pip:\n```\npip install stork\n```\n\nYou can also install from source by cloning the git repository https://github.com/ShopRunner/stork.git and installing via easy_install:\n```\ngit clone https://github.com/ShopRunner/stork.git\ncd stork\neasy_install .\n```\n\n## Setup\n\n### Configuration\n\nStork uses a `.storkcfg` file to store information about your Databricks account and setup. To create this file, run:\n```\nstork configure\n```\n\nYou will be asked for your Databricks host name (the URL you use to access the account - something like `https://my-organization.cloud.databricks.com`), an access token, and your production folder. The production folder should be a folder your team creates to keep production-ready libraries. By isolating production-ready libraries in their own folder, you ensure that stork will never update a job to use a library still in development/testing.\n\n### Databricks API token\n\nAPI tokens can be generated in Databricks under Account Settings -\u003e Access Tokens. To upload an egg to any folder in Databricks, you can use any token. 
To update jobs, you will need a token with admin permissions, which can be created in the same manner by an admin on the account.\n\n## Usage notes\n\nWhile libraries can be uploaded to folders other than your specified production folder, no libraries outside of this folder will ever be deleted and no jobs using libraries outside of this folder will be updated.\n\nIf you try to upload a library to Databricks that already exists there with the same version, a warning will be printed instructing you to update the version if a change has been made. Without a version change, the new library will not be uploaded.\n\n## Contributing\nSee a way for stork to improve? We welcome contributions in the form of issues or pull requests!\n\nPlease check out the [contributing](https://stork-library.readthedocs.io/en/latest/contrib.html) page for more information.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshoprunner%2Fstork","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshoprunner%2Fstork","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshoprunner%2Fstork/lists"}