https://github.com/dan1elt0m/unitycatalog-migrate

Migrate Databricks Unity Catalog to OSS Unity Catalog

[![test](https://github.com/dan1elt0m/unitycatalog-migrate/actions/workflows/test.yml/badge.svg)](https://github.com/dan1elt0m/unitycatalog-migrate/actions/workflows/test.yml)
![Python Version from PEP 621 TOML](https://img.shields.io/python/required-version-toml?tomlFilePath=https%3A%2F%2Fraw.githubusercontent.com%2Fdan1elt0m%2Funitycatalog-migrate%2Fmain%2Fpyproject.toml)


# 🚀 UCM - OSS Unity Catalog Migration Assistant

UCM is a CLI for migrating catalogs, schemas, and tables from Databricks Unity Catalog to an OSS Unity Catalog server.

## Requirements

- Python 3.9 or higher
- A Databricks configuration file (by default `~/.databrickscfg`) with a profile for the source workspace: https://docs.databricks.com/en/dev-tools/auth/config-profiles.html
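Every `ucm` command takes a `--profile` from this file. A quick pre-flight check can catch a misspelled profile name before a migration starts. The sketch below uses a hypothetical helper, `has_profile`, which is not part of ucm.

- It looks in `$DATABRICKS_CONFIG_FILE` if that variable is set. This is the variable the Databricks CLI and SDK use to override the default location.
- Otherwise it falls back to `~/.databrickscfg`.
- It assumes the profile header (for example `[DATABRICKS_TEST]`) sits on a line of its own with no trailing whitespace.

```shell
# Hypothetical pre-flight check: succeed if the named profile exists in the
# Databricks configuration file.
# $1 = profile name; $2 = optional path to the configuration file.
has_profile() {
  cfg=${2:-${DATABRICKS_CONFIG_FILE:-$HOME/.databrickscfg}}
  # A profile is an INI section header such as [DATABRICKS_TEST] on its own line;
  # -x matches the whole line, -F treats the brackets literally.
  [ -f "$cfg" ] && grep -qxF "[$1]" "$cfg"
}

# Example:
# has_profile DATABRICKS_TEST || echo "Profile DATABRICKS_TEST not found" >&2
```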

## Installation

Install the Unity Catalog Migrator with pipx:

```shell
pipx install unitycatalog-migrate
```

## Usage

### Migrate Catalogs
```shell
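ucm migrate-catalog NAMES... --profile PROFILE
```
where NAMES are one or more catalog names and PROFILE is a profile from your Databricks configuration file.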

### Migrate Schemas
```shell
ucm migrate-schema FULL_NAMES... --profile PROFILE
```
where each FULL_NAME has the format `catalog.schema`.

### Migrate Tables
```shell
ucm migrate-table FULL_NAMES... --profile PROFILE
```
where each FULL_NAME has the format `catalog.schema.table`.
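The three commands build on each other: the example further down migrates the catalog and schema before their tables. If you start from a list of table full names, you can derive the parent names from the `catalog.schema.table` format.

Below is a minimal sketch with two hypothetical shell helpers, `list_catalogs` and `list_schemas`. Neither is part of ucm. The sketch assumes no catalog, schema, or table name contains a dot.

```shell
# Hypothetical helpers: derive the parent objects of a set of table full names
# (catalog.schema.table) so catalogs and schemas can be migrated before tables.
# Assumes names contain no dots other than the two separators.
list_catalogs() {
  printf '%s\n' "$@" | cut -d. -f1 | sort -u
}

list_schemas() {
  printf '%s\n' "$@" | cut -d. -f1,2 | sort -u
}

# Example (requires ucm and a configured Databricks profile):
# tables="catalog1.schema1.table1 catalog1.schema2.table9"
# ucm migrate-catalog $(list_catalogs $tables) --profile DATABRICKS_TEST
# ucm migrate-schema $(list_schemas $tables) --profile DATABRICKS_TEST
# ucm migrate-table $tables --profile DATABRICKS_TEST
```

ucm skips catalogs and schemas that already exist, so re-running the parent steps for objects migrated earlier should be harmless.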

## Configuration

The Unity Catalog Migrator reads the following environment variables:
- `UC_HOST_URL`: The URL of the target Unity Catalog server. Default: `http://localhost:8080/api/2.1/unity-catalog`.
- `UC_TOKEN`: The token used to authenticate with the Unity Catalog server. Not set by default. You can also pass the token as an argument to the command.
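Here is a minimal sketch of configuring the target server in a shell session before running `ucm`:

- The host fallback is the documented default.
- The token value is a placeholder. Its line is commented out because a local server may not require authentication.

```shell
# Target OSS Unity Catalog server: keep an existing value, else use the default.
export UC_HOST_URL="${UC_HOST_URL:-http://localhost:8080/api/2.1/unity-catalog}"

# Placeholder: uncomment and set if your server requires a token.
# export UC_TOKEN="<your-uc-token>"

echo "ucm will use UC host: ${UC_HOST_URL}"
```

To check that the server is reachable, run `curl "$UC_HOST_URL/catalogs"`. It should return the server's catalogs as JSON, assuming the standard OSS Unity Catalog REST API lives under this base URL.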

## Example
```shell
# First migrate the catalog and the schema
ucm migrate-catalog catalog1 --profile DATABRICKS_TEST
ucm migrate-schema catalog1.schema1 --profile DATABRICKS_TEST

# Use the Databricks CLI to list the full names of all tables in the schema
table_names=$(databricks tables list catalog1 schema1 --profile DATABRICKS_TEST | awk 'NR>1 {print $1}' | paste -sd ' ' -)
# Migrate the tables to Unity Catalog
echo $table_names | xargs ucm migrate-table --profile DATABRICKS_TEST

# Example output:
[13:45:46] [ Success ] catalog1.schema1.table1
           [ Success ] catalog1.schema1.table2
           [ Success ] catalog1.schema1.table3
           [ Success ] catalog1.schema1.table4
           [ Success ] catalog1.schema1.table5
           [ Skipped ] catalog1.schema1.table6
           [ Success ] catalog1.schema1.table7
           [ Failed ] catalog1.schema1.table8
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Result                ┃ Count ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ Successfully migrated │ 6     │
│ Skipped               │ 1     │
│ Errors                │ 1     │
└───────────────────────┴───────┘

┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ Skipped                 ┃ Reason               ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ catalog1.schema1.table6 │ Table already Exists │
└─────────────────────────┴──────────────────────┘

┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Error                   ┃ Reason                     ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ catalog1.schema1.table8 │ Variant Type not supported │
└─────────────────────────┴────────────────────────────┘
```
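One caveat with the `xargs` pipeline above: GNU `xargs` runs its command once even when its input is blank. For a schema without tables, `ucm migrate-table` may therefore be called with no table names.

Below is a minimal sketch that guards against this using two hypothetical helpers, neither part of ucm:

- `table_full_names` reuses the `awk` filter from the example. It likewise assumes the table's full name is the first column of the `databricks tables list` output, after a header row.
- `migrate_tables` passes the names straight to `ucm` as arguments, so `xargs` is not needed.

```shell
# Hypothetical helper: read `databricks tables list` text output on stdin, skip
# the header row and blank lines, and print the first column (the full name).
table_full_names() {
  awk 'NR>1 && NF {print $1}'
}

# Hypothetical helper: migrate the given tables, or do nothing if there are none.
# $1 = Databricks profile; remaining arguments = table full names.
migrate_tables() {
  profile=$1
  shift
  if [ "$#" -eq 0 ]; then
    echo "No tables to migrate" >&2
    return 0
  fi
  ucm migrate-table "$@" --profile "$profile"
}

# Example (requires the Databricks CLI and ucm). $names is left unquoted on
# purpose so that each table name becomes a separate argument.
# names=$(databricks tables list catalog1 schema1 --profile DATABRICKS_TEST | table_full_names)
# migrate_tables DATABRICKS_TEST $names
```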

### Remarks
- UCM skips catalogs, schemas, and tables that already exist on the target server. A force option can be added if there is demand for it.

### Not supported
- System tables
- Variant data type
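If you build the list of names to migrate programmatically, you can filter out system tables before calling `ucm`. Databricks exposes system tables in the `system` catalog (for example `system.access.audit`). A minimal sketch therefore only needs to drop names that start with `system.`.

The filter below, `drop_system_tables`, is a hypothetical helper and not part of ucm.

```shell
# Hypothetical filter: read full names (one per line) on stdin and drop those in
# the `system` catalog, where Databricks exposes its system tables.
drop_system_tables() {
  # grep exits with 1 when every line is filtered out; treat only that status as
  # success, so real grep errors (status 2) still fail.
  grep -v '^system\.' || [ $? -eq 1 ]
}

# Example: printf '%s\n' $table_names | drop_system_tables
```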

### Wishlist
- A command to migrate functions
- A command to migrate models
- Updating existing tables (sync)

## Contributing
- Contributions are welcome: fork the repository and open a pull request, and I'll review it as soon as possible.
- Star the repo if you like it.