<h1>Delta Live Tables Example Notebooks</h1>

<p align="center">
  <img src="https://databricks.com/wp-content/uploads/2021/10/logo-color-delta-lake-1.svg" width="200"/><br>
  <strong>Delta Live Tables</strong> is a new framework that lets customers declaratively define, deploy, test, and upgrade data pipelines while eliminating the operational burden of managing those pipelines.
</p>
<p align="center">
  This repo contains Delta Live Tables examples designed to get customers started with
  building, deploying, and running pipelines.
</p>

# Getting Started

* Connect your Databricks workspace to [this repo](https://github.com/databricks/delta-live-tables-notebooks) using the <img src="https://databricks.com/wp-content/uploads/2021/05/repos.png" width="140" style=" vertical-align:middle"/> feature.

* Choose one of the examples and create your pipeline!

# Examples
## Wikipedia
The Wikipedia clickstream sample is a great way to jump-start using Delta Live Tables (DLT). It is a simple bifurcating pipeline: it creates a table from the raw JSON data, cleanses that data, and then branches into two downstream tables.

<img src="images/wikipedia-00-pipeline.png" width="500"/>

This sample is available in both [SQL](https://github.com/databricks/delta-live-tables-notebooks/blob/main/sql/Wikipedia.sql) and [Python](https://github.com/databricks/delta-live-tables-notebooks/blob/main/python/Wikipedia.py).

### Running your pipeline

**1. Create your pipeline using the following parameters**

  * From your Databricks workspace, click **Jobs**, then **Delta Live Tables**, and then click **Create Pipeline**.
  * Fill in the **Pipeline Name**, e.g. `Wikipedia`.
  * For **Notebook Libraries**, fill in the path of the notebook, such as `/Repos/michael@databricks.com/delta-live-tables-notebooks/sql/Wikipedia`.

    <img src="https://databricks.com/wp-content/uploads/2022/04/DLT-Pipeline-UI-1.png" width="400"/>
  * To publish your tables, add the `target` parameter to specify the database in which to persist them, e.g. `wiki_demo`.

**2. Edit your pipeline JSON**

  * Once you have set up your pipeline, click **Edit Settings** near the top. The JSON will look similar to the following:

    <img src="https://databricks.com/wp-content/uploads/2022/04/DLT-Pipeline-JSON-1.png" width="400"/>

**3. Click Start**

  * To view the progress of your pipeline, refer to the progress flow near the bottom of the pipeline details UI, as shown in the following image.

    <img src="https://raw.githubusercontent.com/databricks/tech-talks/master/images/dlt-wikipedia_wiki-spark-progress.png" width="600"/>

**4. Reviewing the results**

  * Once your pipeline has finished processing, you can review the data by opening a new Databricks notebook and running the following SQL statement:

    ```
    %sql
    -- Review the top referrers to Wikipedia's Apache Spark articles
    SELECT * FROM wiki_demo.top_spark_referers
    ```

  * Unsurprisingly, the top referrer is "Google". You can see this graphically by converting the table into an area chart.

    <img src="https://raw.githubusercontent.com/databricks/tech-talks/master/images/dlt-wikipedia_wiki-spark-area-chart.png" width="700"/>
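The following is a minimal plain-Python sketch of the bifurcating shape of the Wikipedia pipeline, flowing from raw to clean and then into two branches. It does not use the DLT API and is not the code in the SQL or Python notebooks. Several details are assumptions made for illustration: the input field names (`curr_title`, `prev_title`, `n`), the sample rows, and the second output (`top_pages`). The `top_spark_referers` function mirrors the table queried in step 4.

```python
import json
from collections import Counter

# Hypothetical sample rows in the assumed clickstream JSON shape
# (curr_title, prev_title, n, type); the real dataset may differ.
RAW_JSON_LINES = [
    '{"curr_title": "Apache_Spark", "prev_title": "other-google", "n": "120", "type": "external"}',
    '{"curr_title": "Apache_Spark", "prev_title": "Apache_Hadoop", "n": "45", "type": "link"}',
    '{"curr_title": "Apache_Hadoop", "prev_title": "other-google", "n": "80", "type": "external"}',
    '{"curr_title": null, "prev_title": "other-empty", "n": "3", "type": "external"}',
]


def read_raw(lines):
    """Stage 1: load each JSON line unchanged (the 'raw' table)."""
    return [json.loads(line) for line in lines]


def clean(raw_rows):
    """Stage 2: rename columns, cast counts to int, and drop invalid rows."""
    cleaned = []
    for row in raw_rows:
        title = row.get("curr_title")
        count = int(row.get("n") or 0)
        if title is None or count <= 0:
            continue  # stand-in for a data-quality expectation
        cleaned.append({
            "current_page_title": title,
            "previous_page_title": row.get("prev_title"),
            "click_count": count,
        })
    return cleaned


def top_spark_referers(clean_rows, limit=10):
    """Branch A: pages referring to the Apache_Spark article, most clicks first."""
    spark_rows = [r for r in clean_rows if r["current_page_title"] == "Apache_Spark"]
    spark_rows.sort(key=lambda r: r["click_count"], reverse=True)
    return [(r["previous_page_title"], r["click_count"]) for r in spark_rows[:limit]]


def top_pages(clean_rows, limit=10):
    """Branch B (hypothetical): total clicks per destination page."""
    totals = Counter()
    for r in clean_rows:
        totals[r["current_page_title"]] += r["click_count"]
    return totals.most_common(limit)


if __name__ == "__main__":
    cleaned = clean(read_raw(RAW_JSON_LINES))
    print(top_spark_referers(cleaned))
    print(top_pages(cleaned))
```

Both branches read from the same cleansed rows, which is what makes the pipeline bifurcating. In a real DLT notebook, each stage is declared as a table, and DLT infers the dependency graph between them.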
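The values entered in steps 1 and 2 of *Running your pipeline* end up in the pipeline's JSON settings. Below is a minimal sketch that assembles only those values. It assumes the `name`, `libraries` (with a `notebook` `path`) and `target` keys. The settings JSON that Databricks generates contains further keys, such as cluster configuration, which are omitted here. The Repos path is a placeholder.

```python
import json


def build_pipeline_settings(name: str, notebook_path: str, target: str) -> dict:
    """Assemble a minimal subset of DLT pipeline settings (assumed keys only)."""
    return {
        "name": name,
        # Each library entry points at one notebook that defines pipeline tables.
        "libraries": [{"notebook": {"path": notebook_path}}],
        # Database in which the pipeline publishes its tables.
        "target": target,
    }


settings = build_pipeline_settings(
    name="Wikipedia",
    # Placeholder Repos path: replace <your-user> with your own workspace user.
    notebook_path="/Repos/<your-user>/delta-live-tables-notebooks/sql/Wikipedia",
    target="wiki_demo",
)

print(json.dumps(settings, indent=2))
```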