{"id":28532136,"url":"https://github.com/databrickslabs/databricks-sync","last_synced_at":"2025-08-02T07:11:04.749Z","repository":{"id":38356266,"uuid":"290599126","full_name":"databrickslabs/databricks-sync","owner":"databrickslabs","description":"An experimental tool to synchronize source Databricks deployment with a target Databricks deployment.","archived":false,"fork":false,"pushed_at":"2024-01-21T19:34:04.000Z","size":4171,"stargazers_count":47,"open_issues_count":23,"forks_count":14,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-07-07T14:42:27.431Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/databrickslabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-08-26T20:35:29.000Z","updated_at":"2025-03-25T19:42:37.000Z","dependencies_parsed_at":"2023-02-12T23:30:37.721Z","dependency_job_id":null,"html_url":"https://github.com/databrickslabs/databricks-sync","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/databrickslabs/databricks-sync","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databrickslabs%2Fdatabricks-sync","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databrickslabs%2Fdatabricks-sync/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databrickslabs%2Fdatabricks-sync/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databrickslabs%2Fdatabricks-sync/manifests","owner_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/owners/databrickslabs","download_url":"https://codeload.github.com/databrickslabs/databricks-sync/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databrickslabs%2Fdatabricks-sync/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268348168,"owners_count":24236294,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-02T02:00:12.353Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-09T15:31:06.191Z","updated_at":"2025-08-02T07:11:04.735Z","avatar_url":"https://github.com/databrickslabs.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Databricks Sync (dbSync)\n\n\u003e **NOTE:** For a more extensive and maintained cross-workload migration solution, please use the [Databricks Terraform Exporter](https://registry.terraform.io/providers/databricks/databricks/latest/docs/guides/experimental-exporter), which creates Infrastructure-as-a-Code replicas for the entire manually-configured Databricks Workspaces.\n\n![Reference Architecture for Databricks-Sync](docs/solution-arch.png?raw=true)\n\n## Introduction\n\nDatabricks Sync is an object synchronization tool to backup, restore, and sync Databricks workspaces.\n\nThis package uses credentials from the [Databricks 
CLI](https://docs.databricks.com/user-guide/dev-tools/databricks-cli.html)\n\n## Table of Contents\n\n1. [Introduction](https://github.com/databrickslabs/databricks-sync#Introduction)\n2. [Documentation](https://github.com/databrickslabs/databricks-sync/blob/master/docs)\n   * [Setup](https://github.com/databrickslabs/databricks-sync/blob/master/docs/setup.md)\n   * [Usage](https://github.com/databrickslabs/databricks-sync/blob/master/docs/usage.md)\n   * [FAQ](https://github.com/databrickslabs/databricks-sync/blob/master/docs/faq.md)\n   * [Contributing](https://github.com/databrickslabs/databricks-sync/blob/master/docs/contributing.md)\n3. [Quickstart](https://github.com/databrickslabs/databricks-sync#Quickstart)\n   * [Next Steps](https://github.com/databrickslabs/databricks-sync#next-steps)\n   * [Common Commands](https://github.com/databrickslabs/databricks-sync#common-commands)\n   * [Backend Instructions](https://github.com/databrickslabs/databricks-sync#backend-instructions-storing-terraform-state-in-azure-blob-or-aws-s3)\n   * [Docker Instructions](https://github.com/databrickslabs/databricks-sync#docker-instructions)\n   * [Aliasing](https://github.com/databrickslabs/databricks-sync#aliasing)\n   * [Support Matrix for Import and Export Operations](https://github.com/databrickslabs/databricks-sync#support-matrix-for-import-and-export-operations)\n4. [Project Support](https://github.com/databrickslabs/databricks-sync#project-support)\n5. [Building the Project](https://github.com/databrickslabs/databricks-sync#building-the-project)\n6. [Deploying / Installing the Project](https://github.com/databrickslabs/databricks-sync#deploying--installing-the-project)\n7. [Releasing the Project](https://github.com/databrickslabs/databricks-sync#releasing-the-project)\n8. 
[Using the Project](https://github.com/databrickslabs/databricks-sync#using-the-project)\n\n## Documentation\n\nSee the [Databricks Sync Documentation](https://github.com/databrickslabs/databricks-sync/blob/master/docs) for information.\n\n[Instructions to install Databricks Sync](https://github.com/databrickslabs/databricks-sync/blob/master/docs/setup.md) can be found here.\n\n## Quickstart\n\n### Next steps:\n* Configure [YAML file](https://github.com/databrickslabs/databricks-sync/blob/master/tests/integration_test.yaml).\n* Export object permissions and import them to the target with the object\n* Add examples for different scenarios:\n  * Backup and Restore\n  * Disaster Recovery Sync\n  * Batch modification (will require Terraform Object Import support)\n\n\n### Common commands\n\n```bash\n$ databricks-sync init my-export-config\n\n$ databricks-sync  export \\\n    --profile \u003cdb cli profile\u003e \\\n    --git-ssh-url git@github.com:..../.....git \\\n    -c ....test.yaml\n\noptional flags:\n    -v DEBUG\n    --dry-run\n    --dask\n    --branch # support new main name convention\n\n$ GIT_PYTHON_TRACE=full databricks-sync import \\\n    -g git@github.com:.../....git \\\n    --profile dr_tagert \\\n    --databricks-object-type cluster_policy \\\n    --artifact-dir ..../dir \\\n    --plan \\\n    --skip-refresh \\\n    --revision ....\n```\n\nControl the databricks provider version by using:\n\n```\nexport DATABRICKS_TERRAFORM_PROVIDER_VERSION=\"\u003cversion here\u003e\"\n```\n\n### Backend Instructions (Storing terraform state in azure blob or aws s3)\n\nWhen importing you are able to store and manage your state using blob or s3. You can do this by using the `--backend-file`.\nThis `--backend-file` will take a file path to the back end file. You can name the file `backend.tf`. This backend file will use\nazure blob or aws s3 to manage the state file. 
To authenticate to either you will use environment variables.\n\nPlease use `ARM_SAS_TOKEN` or `ARM_ACCESS_KEY` for the sas token and account access key respectively for azure blob.\nPlease use `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` for the key and secret for the s3 bucket. Please see the following links\nregarding policies and permissions. If you want to make the region dynamic you can use `AWS_DEFAULT_REGION`.\n\n1. Storing state in aws s3: https://www.terraform.io/docs/backends/types/s3.html\n2. Storing state in azure blob (only azure blob is supported as it supports locking): https://www.terraform.io/docs/backends/types/azurerm.html\n\n### Docker instructions\n\nThis set of instructions uses docker to build and run the CLI. It avoids the need to have golang,\nTerraform, and the databricks-terraform-provider installed to get this to run. If you do want to work on this tool please\ninstall the previously listed tools.\n\nTo build the image for this tool please run the following command:\n\n```bash\n$ docker build -t databricks-sync:latest .\n```\n\n\n#### Aliasing\n\nHow our alias command works:\n\nThis script creates 3 volume mounts with docker, of which two are read only.\n1. We mount `$PWD` (the present working directory) to `/usr/src/databricks-sync` as that is the working directory.\nThis allows you to manipulate files in the local working directory on your host machine.\n2. The second mount maps `~/.databrickscfg` to `/root/.databrickscfg`, as `/root` is the home directory of the container.\nThis mount is read only.\n3. The third mount maps the `~/.ssh` folder to the `/root/.ssh` folder. This is so the script can fetch your\nprivate keys in a read only fashion for accessing the git repository. 
This is also a read only mount.\n\n```bash\nalias dbt='docker run -it --rm --name docker-databricks-sync --env-file \u003c(env | grep -e \"[ARM|TF_VAR]\") -v \"$PWD\":/usr/src/databricks-sync -v ~/.databrickscfg:/root/.databrickscfg:ro -v ~/.ssh:/root/.ssh:ro -w /usr/src/databricks-sync databricks-sync'\n```\n\n### Support Matrix for Import and Export Operations:\n\n| Component              | Export to HCL | Import to Workspace |Comments     |\n|------------------------|---------------|---------------------|-------------|\n|                        | **User Objects** |\n| cluster policy         | ✅           |  ✅              | |\n| cluster                |  ✅            | ✅               | |\n| dbfs file              |  ✅           |  ✅              | |\n| instance pool          |  ✅           |  ✅              | |\n| instance profile       |  ✅           |  ✅              | |\n| job                    |  ✅           |  ✅               | |\n| multi task job         |  ⬜           |  ⬜               | |\n| repos                  |  ⬜           |  ⬜               | |\n| notebook               |  ✅           |  ✅              | |\n|                        | **Administrator Setup** |\n| aws s3 mount           | ⬜️            | ⬜️               | |\n| azure adls gen1 mount  | ⬜️            | ⬜️               | |\n| azure adls gen2 mount  | ⬜️            | ⬜️               | |\n| azure blob mount       | ⬜️            | ⬜️               | |\n| secret                 |  ✅           |  ✅               | |\n| secret acl             |  ✅           |  ✅              | |\n| secret scope           |  ✅           |  ✅              | |\n| metastore tables       | ⬜️            | ⬜️               | |\n| metastore table ACLs   | ⬜️            | ⬜️               | |\n|                        | **Users Management** |\n| group                  |  ✅            |  ✅               | |\n| group instance profile |  ✅            |  ✅               | |\n| group member           |  ✅          
  |  ✅               | |\n| scim user              |  ✅            |  ✅               | |\n\n## Project Support\nPlease note that all projects in the /databrickslabs github account are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements (SLAs).  They are provided AS-IS and we do not make any guarantees of any kind.  Please do not submit a support ticket relating to any issues arising from the use of these projects.\n\nAny issues discovered through the use of this project should be filed as GitHub Issues on the Repo.  They will be reviewed as time permits, but there are no formal SLAs for support.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabrickslabs%2Fdatabricks-sync","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatabrickslabs%2Fdatabricks-sync","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabrickslabs%2Fdatabricks-sync/lists"}