{"id":13678328,"url":"https://github.com/databrickslabs/dlt-meta","last_synced_at":"2025-04-29T13:30:34.523Z","repository":{"id":155789983,"uuid":"596753004","full_name":"databrickslabs/dlt-meta","owner":"databrickslabs","description":"Metadata driven Databricks Delta Live Tables framework for bronze/silver pipelines","archived":false,"fork":false,"pushed_at":"2025-04-21T22:18:41.000Z","size":22796,"stargazers_count":185,"open_issues_count":21,"forks_count":84,"subscribers_count":25,"default_branch":"main","last_synced_at":"2025-04-21T23:26:48.968Z","etag":null,"topics":["databricks","dlt","meta-programming","python"],"latest_commit_sha":null,"homepage":"https://databrickslabs.github.io/dlt-meta/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/databrickslabs.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"authors.txt","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-02-02T21:24:43.000Z","updated_at":"2025-04-14T02:07:22.000Z","dependencies_parsed_at":"2024-05-08T23:23:53.648Z","dependency_job_id":"e19ecdd0-c1e0-400c-9e42-49099954d729","html_url":"https://github.com/databrickslabs/dlt-meta","commit_stats":null,"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databrickslabs%2Fdlt-meta","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databrickslabs%2Fdlt-meta/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databrickslabs%2Fdlt-meta/releases","manifests_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/databrickslabs%2Fdlt-meta/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/databrickslabs","download_url":"https://codeload.github.com/databrickslabs/dlt-meta/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251509357,"owners_count":21600621,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["databricks","dlt","meta-programming","python"],"created_at":"2024-08-02T13:00:52.435Z","updated_at":"2025-04-29T13:30:34.514Z","avatar_url":"https://github.com/databrickslabs.png","language":"Python","readme":"# DLT-META\n\n\u003c!-- Top bar will be removed from PyPi packaged versions --\u003e\n\u003c!-- Dont remove: exclude package --\u003e\n\n[Documentation](https://databrickslabs.github.io/dlt-meta/) |\n[Release Notes](CHANGELOG.md) |\n[Examples](https://github.com/databrickslabs/dlt-meta/tree/main/examples)\n\n\u003c!-- Dont remove: end exclude package --\u003e\n\n---\n\n\u003cp align=\"left\"\u003e\n    \u003ca href=\"https://databrickslabs.github.io/dlt-meta/\"\u003e\n        \u003cimg src=\"https://img.shields.io/badge/DOCS-PASSING-green?style=for-the-badge\" alt=\"Documentation Status\"/\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://pypi.org/project/dlt-meta/\"\u003e\n        \u003cimg src=\"https://img.shields.io/badge/PYPI-v%200.0.9-green?style=for-the-badge\" alt=\"Latest Python Release\"/\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/databrickslabs/dlt-meta/actions/workflows/onpush.yml\"\u003e\n        \u003cimg 
src=\"https://img.shields.io/github/workflow/status/databrickslabs/dlt-meta/build/main?style=for-the-badge\"\n             alt=\"GitHub Workflow Status (branch)\"/\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://codecov.io/gh/databrickslabs/dlt-meta\"\u003e\n        \u003cimg src=\"https://img.shields.io/codecov/c/github/databrickslabs/dlt-meta?style=for-the-badge\u0026amp;token=2CxLj3YBam\"\n             alt=\"codecov\"/\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://pypistats.org/packages/dl-meta\"\u003e\n        \u003cimg src=\"https://img.shields.io/pypi/dm/dlt-meta?style=for-the-badge\" alt=\"downloads\"/\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/PyCQA/flake8\"\u003e\n        \u003cimg src=\"https://img.shields.io/badge/FLAKE8-FLAKE8-lightgrey?style=for-the-badge\"\n             alt=\"We use flake8 for formatting\"/\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\n[![lines of code](https://tokei.rs/b1/github/databrickslabs/dlt-meta)](\u003c[https://codecov.io/github/databrickslabs/dlt-meta](https://github.com/databrickslabs/dlt-meta)\u003e)\n\n---\n\n# Project Overview\n`DLT-META` is a metadata-driven framework designed to work with [Delta Live Tables](https://www.databricks.com/product/delta-live-tables). This framework enables the automation of bronze and silver data pipelines by leveraging metadata recorded in an onboarding JSON file. This file, known as the Dataflowspec, serves as the data flow specification, detailing the source and target metadata required for the pipelines.\n\nIn practice, a single generic DLT pipeline reads the Dataflowspec and uses it to orchestrate and run the necessary data processing workloads. 
This approach streamlines the development and management of data pipelines, allowing for a more efficient and scalable data processing workflow.\n\n### Components:\n\n#### Metadata Interface\n\n- Capture input/output metadata in [onboarding file](https://github.com/databrickslabs/dlt-meta/blob/main/examples/onboarding.template)\n- Capture [Data Quality Rules](https://github.com/databrickslabs/dlt-meta/tree/main/examples/dqe/customers/bronze_data_quality_expectations.json)\n- Capture processing logic as SQL in [Silver transformation file](https://github.com/databrickslabs/dlt-meta/blob/main/examples/silver_transformations.json)\n\n#### Generic DLT pipeline\n\n- Apply appropriate readers based on input metadata\n- Apply data quality rules with DLT expectations\n- Apply CDC changes (apply_changes) if specified in metadata\n- Build DLT graph based on input/output metadata\n- Launch DLT pipeline\n\n## High-Level Process Flow:\n\n![DLT-META High-Level Process Flow](./docs/static/images/solutions_overview.png)\n\n## Steps\n\n![DLT-META Stages](./docs/static/images/dlt-meta_stages.png)\n\n## DLT-META DLT Features support\n| Features  | DLT-META Support |\n| ------------- | ------------- |\n| Input data sources  | Autoloader, Delta, Eventhub, Kafka, snapshot  |\n| Medallion architecture layers | Bronze, Silver  |\n| Custom transformations | Bronze, Silver layer accepts custom functions|\n| Data Quality Expectations Support | Bronze, Silver layer |\n| Quarantine table support | Bronze layer |\n| [apply_changes](https://docs.databricks.com/en/delta-live-tables/python-ref.html#cdc) API support | Bronze, Silver layer | \n| [apply_changes_from_snapshot](https://docs.databricks.com/en/delta-live-tables/python-ref.html#change-data-capture-from-database-snapshots-with-python-in-delta-live-tables) API support | Bronze layer|\n| [append_flow](https://docs.databricks.com/en/delta-live-tables/flows.html#use-append-flow-to-write-to-a-streaming-table-from-multiple-source-streams) API support | 
Bronze layer|\n| Liquid cluster support | Bronze, Bronze Quarantine, Silver tables|\n| [DLT-META CLI](https://databrickslabs.github.io/dlt-meta/getting_started/dltmeta_cli/) |  ```databricks labs dlt-meta onboard```, ```databricks labs dlt-meta deploy``` |\n| Bronze and Silver pipeline chaining | Deploy dlt-meta pipeline with ```layer=bronze_silver``` option using Direct publishing mode |\n\n## Getting Started\n\nRefer to the [Getting Started](https://databrickslabs.github.io/dlt-meta/getting_started)\n\n### Databricks Labs DLT-META CLI lets you run onboard and deploy in interactive python terminal\n\n#### pre-requisites:\n\n- Python 3.8.0 +\n\n- Databricks CLI v0.213 or later. See [instructions](https://docs.databricks.com/en/dev-tools/cli/tutorial.html)\n\n- Install Databricks CLI on macOS:\n- ![macos_install_databricks](docs/static/images/macos_1_databrickslabsmac_installdatabricks.gif)\n\n- Install Databricks CLI on Windows:\n- ![windows_install_databricks.png](docs/static/images/windows_install_databricks.png)\n\nOnce you install Databricks CLI, authenticate your current machine to a Databricks Workspace:\n\n```commandline\ndatabricks auth login --host WORKSPACE_HOST\n```\n\n    To enable debug logs, simply add `--debug` flag to any command.\n\n### Installing dlt-meta:\n\n- Install dlt-meta via Databricks CLI:\n\n```commandline\n    databricks labs install dlt-meta\n```\n\n### Onboard using dlt-meta CLI:\n\nIf you want to run existing demo files please follow these steps before running onboard command:\n\n```commandline\n    git clone https://github.com/databrickslabs/dlt-meta.git\n```\n\n```commandline\n    cd dlt-meta\n```\n\n```commandline\n    python -m venv .venv\n```\n\n```commandline\n    source .venv/bin/activate\n```\n\n```commandline\n    pip install databricks-sdk\n```\n\n```commandline\n    dlt_meta_home=$(pwd)\n```\n\n```commandline\n    export PYTHONPATH=$dlt_meta_home\n```\n```commandline\n    databricks labs dlt-meta 
onboard\n```\n![onboardingDLTMeta.gif](docs/static/images/onboardingDLTMeta.gif)\n\n\nThe commands above prompt you for onboarding details. If you have cloned the dlt-meta git repo, accept the defaults, which launch the config from the demo folder.\n![onboardingDLTMeta_2.gif](docs/static/images/onboardingDLTMeta_2.gif)\n\n\n- Go to your Databricks workspace and locate the onboarding job under: Workflow-\u003eJobs runs\n\n### Deploy using dlt-meta CLI:\n\n- Once the onboarding job has finished, deploy `bronze` and `silver` DLT using the command below\n- ```commandline\n     databricks labs dlt-meta deploy\n  ```\n- - The command above prompts you for DLT details. Provide the details for the schema you supplied in the steps above\n- - Bronze DLT\n\n![deployingDLTMeta_bronze.gif](docs/static/images/deployingDLTMeta_bronze.gif)\n\n\n- Silver DLT\n- - ```commandline\n       databricks labs dlt-meta deploy\n    ```\n- - The command above prompts you for DLT details. Provide the details for the schema you supplied in the steps above\n\n![deployingDLTMeta_silver.gif](docs/static/images/deployingDLTMeta_silver.gif)\n\n\n## More questions\n\nRefer to the [FAQ](https://databrickslabs.github.io/dlt-meta/faq)\nand DLT-META [documentation](https://databrickslabs.github.io/dlt-meta/)\n\n# Project Support\n\nPlease note that all projects released under [`Databricks Labs`](https://www.databricks.com/learn/labs)\nare provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements\n(SLAs). They are provided AS-IS and we do not make any guarantees of any kind. Please do not submit a support ticket\nrelating to any issues arising from the use of these projects.\n\nAny issues discovered through the use of this project should be filed as issues on the GitHub repo.  
\nThey will be reviewed as time permits, but there are no formal SLAs for support.\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabrickslabs%2Fdlt-meta","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatabrickslabs%2Fdlt-meta","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabrickslabs%2Fdlt-meta/lists"}