{"id":17807504,"url":"https://github.com/zero323/dlt","last_synced_at":"2025-03-17T13:31:30.725Z","repository":{"id":63635824,"uuid":"428039032","full_name":"zero323/dlt","owner":"zero323","description":"Mirror of https://gitlab.com/zero323/dlt","archived":false,"fork":false,"pushed_at":"2022-11-25T11:10:15.000Z","size":926,"stargazers_count":7,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-11T00:55:36.691Z","etag":null,"topics":["apache-spark","delta","delta-io","delta-lake","r","rstats","spark","sparkr"],"latest_commit_sha":null,"homepage":"https://dlt.zero323.net/","language":"R","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zero323.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"github":null,"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":["https://archive.org/donate","https://supporters.eff.org/donate","https://www.msf.org/donate"]}},"created_at":"2021-11-14T20:54:35.000Z","updated_at":"2024-10-06T15:40:36.000Z","dependencies_parsed_at":"2023-01-22T12:15:55.677Z","dependency_job_id":null,"html_url":"https://github.com/zero323/dlt","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zero323%2Fdlt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zero323%2Fdlt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zero323%2Fdlt/releases","manifests_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/zero323%2Fdlt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zero323","download_url":"https://codeload.github.com/zero323/dlt/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243864809,"owners_count":20360360,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-spark","delta","delta-io","delta-lake","r","rstats","spark","sparkr"],"created_at":"2024-10-27T14:22:15.182Z","updated_at":"2025-03-17T13:31:30.189Z","avatar_url":"https://github.com/zero323.png","language":"R","funding_links":["https://archive.org/donate","https://supporters.eff.org/donate","https://www.msf.org/donate"],"categories":[],"sub_categories":[],"readme":"\u003cimg class=\"mt-4\" alt=\"dlt logo\" src=\"man/figures/dlt.png\" width=15% align=\"right\" /\u003e\n\n# dlt ‒ Delta Lake interface for SparkR\n\n\n## Contents\n\n- [Installation](#installation)\n- [Usage](#usage)\n  - [Batch reads and writes](#batch-reads-and-writes)\n  - [Loading DeltaTable objects](#loading-deltatable-objects)\n  - [Streaming reads and writes](#streaming-reads-and-writes)\n  - [Updates](#updates)\n  - [Deletes](#deletes)\n  - [Merging SparkDataFrame with DeltaTable](#merging-sparkdataframe-with-deltatable)\n  - [Time travel](#time-travel)\n  - [Querying Delta log](#querying-delta-log)\n  - [Building DeltaTables](#building-deltatables)\n  - [Maintenance and conversions](#maintenance-and-conversions)\n- [Notes](#notes)\n- [Acknowledgments](#acknowledgments)\n- [Disclaimer](#disclaimer)\n\n\n## 
Installation\n\nThis package can be installed from the [main git repository](https://gitlab.com/zero323/dlt)\n\n```r\nremotes::install_gitlab(\"zero323/dlt\")\n```\n\nor its [GitHub mirror](https://github.com/zero323/dlt)\n\n```r\nremotes::install_github(\"zero323/dlt\")\n```\n\nand requires the following R packages:\n\n- `SparkR (\u003e= 3.3.0)`\n- `magrittr`\n\n\nAdditionally, you'll have to ensure that a compatible Delta Lake jar is available,\nfor example by adding `delta-core` to `spark.jars.packages`:\n\n```\nspark.jars.packages \t\tio.delta:delta-core_2.12:2.1.0\n```\n\n## Usage\n\nThis package provides:\n\n- Readers and writers for Delta format.\n- DeltaTable merge API.\n- Delta table builder API. \n\n### Batch reads and writes\n\n`dlt_read` and `dlt_write` can be used to read and write data in Delta format.\n\n```r\nlibrary(dlt)\n\ntarget %\u003e% \n  printSchema()\n\n# root\n#  |-- id: integer (nullable = true)\n#  |-- key: string (nullable = true)\n#  |-- val: integer (nullable = true)\n#  |-- ind: integer (nullable = true)\n#  |-- category: string (nullable = true)\n#  |-- lat: double (nullable = true)\n#  |-- long: double (nullable = true)\n\ntarget %\u003e% \n  dlt_write(\"/tmp/target\")\n\ndlt_read(\"/tmp/target\") %\u003e% \n  showDF(5)\n\n# +---+---+---+---+--------+-------------------+-------------------+\n# | id|key|val|ind|category|                lat|               long|\n# +---+---+---+---+--------+-------------------+-------------------+\n# |  1|  a|  4| -1|     KBQ| -56.28354165237397|-108.74080670066178|\n# |  2|  a| 10|  1|     ROB| 50.546925463713706|-104.60825988091528|\n# |  3|  a|  7| -1|     SLX|-13.985343240201473|-114.89280310459435|\n# |  4|  a|  5|  1|     ACP| -47.15050248429179|-168.96175763569772|\n# |  5|  b|  3| -1|     EEK|-49.020595396868885|-105.57821027934551|\n# +---+---+---+---+--------+-------------------+-------------------+\n# only showing top 5 rows\n```\n\nThese also come with aliases following SparkR conventions - 
`read.delta` and `write.delta`.\n\n```r\nsource %\u003e%\n  printSchema()\n\n# root\n#  |-- id: integer (nullable = true)\n#  |-- key: string (nullable = true)\n#  |-- val: integer (nullable = true)\n#  |-- ind: integer (nullable = true)\n#  |-- category: string (nullable = true)\n#  |-- lat: double (nullable = true)\n#  |-- long: double (nullable = true)\n\nsource %\u003e%\n  write.delta(\"/tmp/source\")\n\nread.delta(\"/tmp/source\") %\u003e% \n  showDF(5)\n\n# +---+---+---+---+--------+------------------+-------------------+\n# | id|key|val|ind|category|               lat|               long|\n# +---+---+---+---+--------+------------------+-------------------+\n# |  1|  a|  1|  1|     NTD| 72.72564971353859|  5.116242365911603|\n# |  3|  b|  5|  1|     RSL|-65.03216980956495| -39.52675184234977|\n# |  5|  b|  1| -1|     SYG| 88.00051120575517| 146.06572712771595|\n# | 14|  c|  9| -1|     MYZ| 80.40028186049312|-19.090933883562684|\n# | 16|  d| 10| -1|     DMO|-75.16123954206705| 120.96153359860182|\n# +---+---+---+---+--------+------------------+-------------------+\n```\n\n### Loading DeltaTable objects\n\n`DeltaTable` objects can be created for a file system path:\n\n\n```r\ndlt_for_path(\"/tmp/target/\") %\u003e%\n  dlt_to_df()\n\n# SparkDataFrame[id:int, key:string, val:int, ind:int, category:string, lat:double, long:double]\n```\n\nor for the table name:\n\n```r\nsource %\u003e% \n  saveAsTable(\"source\", source=\"delta\")\n\ndlt_for_name(\"source\")  %\u003e%\n  dlt_to_df()\n\n# SparkDataFrame[id:int, key:string, val:int, ind:int, category:string, lat:double, long:double]\n```\n\n### Streaming reads and writes\n\n`dlt_read_stream` and `dlt_write_stream` can be used for streaming reads and writes, respectively.\n\n```r\nquery \u003c- dlt_read_stream(\"/tmp/target\") %\u003e%\n  dlt_write_stream(\n    path = \"/tmp/target-stream\", queryName = \"test\", trigger.once = TRUE,\n    checkpointLocation = \"/tmp/target-stream/_checkpoints/test\"\n  
)\n\nawaitTermination(query, 10000)\n# [1] TRUE\n```\n\n### Getting table details\n\n`dlt_detail` can be used to retrieve detailed information about the format, location and \nother important properties of the table.\n\n```r\ndlt_for_path(\"/tmp/target/\") %\u003e%\n  dlt_detail() %\u003e%\n  drop(c(\"id\", \"location\", \"createdAt\", \"lastModified\")) %\u003e%\n  showDF()\n\n# +------+----+-----------+----------------+--------+-----------+----------+----------------+----------------+\n# |format|name|description|partitionColumns|numFiles|sizeInBytes|properties|minReaderVersion|minWriterVersion|\n# +------+----+-----------+----------------+--------+-----------+----------+----------------+----------------+\n# | delta|null|       null|              []|       1|       2290|        {}|               1|               2|\n# +------+----+-----------+----------------+--------+-----------+----------+----------------+----------------+\n```\n\n\n### Updates\n\nUpdates without\n\n```r\ndlt_for_path(\"/tmp/target\") %\u003e%\n  dlt_update(list(ind = \"-ind\")) %\u003e%\n  dlt_show(5)\n\n# +---+---+---+---+--------+-------------------+-------------------+\n# | id|key|val|ind|category|                lat|               long|\n# +---+---+---+---+--------+-------------------+-------------------+\n# |  1|  a|  4|  1|     KBQ| -56.28354165237397|-108.74080670066178|\n# |  2|  a| 10| -1|     ROB| 50.546925463713706|-104.60825988091528|\n# |  3|  a|  7|  1|     SLX|-13.985343240201473|-114.89280310459435|\n# |  4|  a|  5| -1|     ACP| -47.15050248429179|-168.96175763569772|\n# |  5|  b|  3|  1|     EEK|-49.020595396868885|-105.57821027934551|\n# +---+---+---+---+--------+-------------------+-------------------+\n# only showing top 5 rows\n```\n\nand with\n\n```r\ndlt_for_path(\"/tmp/target\") %\u003e%\n  dlt_update(list(\n    lat = lit(39.08067389467202),\n    long = lit(-89.73335678516888)\n  ), column(\"id\") %in% c(2, 4)) %\u003e%\n  dlt_show(5)\n\n# 
+---+---+---+---+--------+-------------------+-------------------+\n# | id|key|val|ind|category|                lat|               long|\n# +---+---+---+---+--------+-------------------+-------------------+\n# |  1|  a|  4|  1|     KBQ| -56.28354165237397|-108.74080670066178|\n# |  2|  a| 10| -1|     ROB|  39.08067389467202| -89.73335678516888|\n# |  3|  a|  7|  1|     SLX|-13.985343240201473|-114.89280310459435|\n# |  4|  a|  5| -1|     ACP|  39.08067389467202| -89.73335678516888|\n# |  5|  b|  3|  1|     EEK|-49.020595396868885|-105.57821027934551|\n# +---+---+---+---+--------+-------------------+-------------------+\n# only showing top 5 rows\n```\n\ncondition are supported.\n\n### Deletes\n\nDeletes with\n\n```r\ndlt_for_path(\"/tmp/target\") %\u003e%\n  dlt_delete(column(\"category\") %in% c(\"ROB\", \"ACP\")) %\u003e%\n  dlt_show(5)\n\n# +---+---+---+---+--------+-------------------+-------------------+\n# | id|key|val|ind|category|                lat|               long|\n# +---+---+---+---+--------+-------------------+-------------------+\n# |  1|  a|  4|  1|     KBQ| -56.28354165237397|-108.74080670066178|\n# |  3|  a|  7|  1|     SLX|-13.985343240201473|-114.89280310459435|\n# |  5|  b|  3|  1|     EEK|-49.020595396868885|-105.57821027934551|\n# |  6|  b|  2| -1|     SMT|  80.79935231711715| -46.80488987825811|\n# |  7|  b|  9|  1|     LBC|  51.65884342510253|  97.16074059717357|\n# +---+---+---+---+--------+-------------------+-------------------+\n# only showing top 5 rows\n```\n\nand without\n\n```r\ndlt_for_path(\"/tmp/target\") %\u003e%\n  dlt_delete() %\u003e%\n  dlt_to_df() %\u003e%\n  count()\n\n# [1] 0\n```\n\ncondition are supported.\n\n### Merging SparkDataFrame with DeltaTable\n\n`dlt` supports a complete set of `DeltaMergeBuilder` methods (`dlt_when_matched_delete`, `dlt_when_matched_update`, `dlt_when_matched_update_all`, `dlt_when_not_matched_insert`, `dlt_when_not_matched_insert_all`).\n\n```r\ntarget %\u003e%\n  dlt_write(\"/tmp/target\", 
mode=\"overwrite\")\n\ndlt_for_path(\"/tmp/target\") %\u003e%\n  dlt_alias(\"target\") %\u003e%\n  dlt_merge(alias(source, \"source\"), expr(\"source.id = target.id\")) %\u003e%\n  dlt_when_matched_update_all() %\u003e%\n  dlt_when_not_matched_insert_all() %\u003e%\n  dlt_execute() %\u003e%\n  dlt_to_df() %\u003e%\n  arrange(desc(column(\"id\"))) %\u003e%\n  showDF(10)\n\n# +---+---+---+---+--------+------------------+-------------------+\n# | id|key|val|ind|category|               lat|               long|\n# +---+---+---+---+--------+------------------+-------------------+\n# | 16|  d| 10| -1|     DMO|-75.16123954206705| 120.96153359860182|\n# | 14|  c|  9| -1|     MYZ| 80.40028186049312|-19.090933883562684|\n# | 12|  c| 10|  1|     TBL| 6.229456798173487|  55.28501939959824|\n# | 11|  c|  5| -1|     ZSH| 89.73377700895071|  61.67111137881875|\n# | 10|  c|  1|  1|     GKP| 58.43853528611362| 100.64806896261871|\n# |  9|  c| 10| -1|     LCN| 76.90145746339113| -138.4841349441558|\n# |  8|  b|  8|  1|     BOB| 47.12074530310929| -91.43876885063946|\n# |  7|  b|  9| -1|     LBC| 51.65884342510253|  97.16074059717357|\n# |  6|  b|  2|  1|     SMT| 80.79935231711715| -46.80488987825811|\n# |  5|  b|  1| -1|     SYG| 88.00051120575517| 146.06572712771595|\n# +---+---+---+---+--------+------------------+-------------------+\n# only showing top 10 rows\n```\n\n### Querying Delta log\n\n```r\ndlt_for_path(\"/tmp/target-stream/\") %\u003e%\n  dlt_history() %\u003e%\n  printSchema()\n\n# root\n#  |-- version: long (nullable = true)\n#  |-- timestamp: timestamp (nullable = true)\n#  |-- userId: string (nullable = true)\n#  |-- userName: string (nullable = true)\n#  |-- operation: string (nullable = true)\n#  |-- operationParameters: map (nullable = true)\n#  |    |-- key: string\n#  |    |-- value: string (valueContainsNull = true)\n#  |-- job: struct (nullable = true)\n#  |    |-- jobId: string (nullable = true)\n#  |    |-- jobName: string (nullable = true)\n#  |    |-- 
runId: string (nullable = true)\n#  |    |-- jobOwnerId: string (nullable = true)\n#  |    |-- triggerType: string (nullable = true)\n#  |-- notebook: struct (nullable = true)\n#  |    |-- notebookId: string (nullable = true)\n#  |-- clusterId: string (nullable = true)\n#  |-- readVersion: long (nullable = true)\n#  |-- isolationLevel: string (nullable = true)\n#  |-- isBlindAppend: boolean (nullable = true)\n#  |-- operationMetrics: map (nullable = true)\n#  |    |-- key: string\n#  |    |-- value: string (valueContainsNull = true)\n#  |-- userMetadata: string (nullable = true)\n\ndlt_for_path(\"/tmp/target-stream/\") %\u003e%\n  dlt_history() %\u003e%\n  select(\"version\", \"operation\", \"operationParameters\") %\u003e%\n  showDF(truncate = FALSE)\n  \n# +-------+----------------+-------------------------------------------------------------------------------------+\n# |version|operation       |operationParameters                                                                  |\n# +-------+----------------+-------------------------------------------------------------------------------------+\n# |0      |STREAMING UPDATE|{outputMode -\u003e Append, queryId -\u003e bb30dba2-3327-44f1-b6b3-b4bb99fdf57b, epochId -\u003e 0}|\n# +-------+----------------+-------------------------------------------------------------------------------------+\n\ndlt_for_path(\"/tmp/target-stream/\") %\u003e%\n  dlt_delete(\"id IN (1, 3, 5)\")\n\ndlt_for_path(\"/tmp/target-stream/\") %\u003e%\n  dlt_history() %\u003e%\n  select(\"version\", \"operation\", \"operationParameters\") %\u003e%\n  showDF(truncate = FALSE)\n\n# +-------+----------------+-------------------------------------------------------------------------------------+\n# |version|operation       |operationParameters                                                                  |\n# +-------+----------------+-------------------------------------------------------------------------------------+\n# |1      |DELETE          
|{predicate -\u003e [\"(id IN (1, 3, 5))\"]}                                                 |\n# |0      |STREAMING UPDATE|{outputMode -\u003e Append, queryId -\u003e bb30dba2-3327-44f1-b6b3-b4bb99fdf57b, epochId -\u003e 0}|\n# +-------+----------------+-------------------------------------------------------------------------------------+\n```\n\n### Time travel\n\nTime travel is possible by setting `versionAsOf`\n\n```r\ndlt_read(\"/tmp/target-stream/\") %\u003e%\n  count()\n\n# [1] 9\n\ndlt_read(\"/tmp/target-stream/\", versionAsOf=0) %\u003e%\n  count()\n\n# [1] 12\n\n```\n\nor `timestampAsOf` `options`.\n\n```r\ntimestamps \u003c- dlt_for_path(\"/tmp/target-stream/\") %\u003e%\n  dlt_history() %\u003e%\n  where(column(\"version\") %in% c(0, 1)) %\u003e%\n  arrange(\"version\") %\u003e%\n  select(alias(date_format(column(\"timestamp\") + expr(\"INTERVAL 1 second\"), \"yyyy-MM-dd HH:mm:ss\"), \"timestamp\")) %\u003e%\n  collect()\n  \n  \ndlt_read(\"/tmp/target-stream/\", timestampAsOf = timestamps$timestamp[1]) %\u003e%\n  count()\n\n# [1] 12\n```\n\nIt is also possible to restore data to a specific version\n\n\n```r\ndlt_for_path(\"/tmp/target-stream/\") %\u003e%\n  dlt_restore_to_version(0) %\u003e%\n  showDF()\n  \n# +------------------------+--------------------------+-----------------+------------------+------------------+-------------------+\n# |table_size_after_restore|num_of_files_after_restore|num_removed_files|num_restored_files|removed_files_size|restored_files_size|\n# +------------------------+--------------------------+-----------------+------------------+------------------+-------------------+\n# |                    2295|                         1|                1|                 1|              2175|               2295|\n# +------------------------+--------------------------+-----------------+------------------+------------------+-------------------+\n\ndlt_read(\"/tmp/target-stream/\") %\u003e%\n  count()\n  \n# [1] 
12\n\ndlt_for_path(\"/tmp/target-stream/\") %\u003e%\n  dlt_restore_to_version(1) %\u003e%\n  showDF()\n  \n# +------------------------+--------------------------+-----------------+------------------+------------------+-------------------+\n# |table_size_after_restore|num_of_files_after_restore|num_removed_files|num_restored_files|removed_files_size|restored_files_size|\n# +------------------------+--------------------------+-----------------+------------------+------------------+-------------------+\n# |                    2175|                         1|                1|                 1|              2295|               2175|\n# +------------------------+--------------------------+-----------------+------------------+------------------+-------------------+  \n  \ndlt_read(\"/tmp/target-stream/\") %\u003e%\n  count()\n  \n# [1] 9\n```\n\nor timestamp\n\n```r\ndlt_for_path(\"/tmp/target-stream/\") %\u003e%\n  dlt_restore_to_timestamp(timestamp = timestamps$timestamp[1]) %\u003e%\n  showDF()\n\n# +------------------------+--------------------------+-----------------+------------------+------------------+-------------------+\n# |table_size_after_restore|num_of_files_after_restore|num_removed_files|num_restored_files|removed_files_size|restored_files_size|\n# +------------------------+--------------------------+-----------------+------------------+------------------+-------------------+\n# |                    2295|                         1|                1|                 1|              2175|               2295|\n# +------------------------+--------------------------+-----------------+------------------+------------------+-------------------+  \n  \ndlt_read(\"/tmp/target-stream/\") %\u003e%\n  count()\n  \n# [1] 12  \n```\n\n\n### Building DeltaTables\n\nNew tables can be created (`dlt_create`), created if not exists (`dlt_create_if_not_exists`), replaced (`dlt_replace`) and created or replaced (`dlt_create_or_replace`). 
All `DeltaTableBuilder` methods are supported.\n\n```r\ndlt_create() %\u003e%\n  dlt_location(\"/tmp/key-val\") %\u003e%\n  dlt_add_column(\"id\", \"integer\", nullable = FALSE) %\u003e%\n  dlt_add_columns(structType(\"key string, value double\")) %\u003e%\n  dlt_partitioned_by(\"key\") %\u003e%\n  dlt_comment(\"Key-value table\") %\u003e%\n  dlt_property(\"creation-time\", as.character(Sys.time())) %\u003e%\n  dlt_execute() %\u003e%\n  dlt_to_df() %\u003e%\n  printSchema()\n\n# root\n#  |-- id: integer (nullable = false)\n#  |-- key: string (nullable = true)\n#  |-- value: double (nullable = true)\n```\n\n### Maintenance and conversions\n\nYou can use `dlt` to convert Parquet directories to `DeltaTable` \n\n```r\ntarget %\u003e%\n  write.parquet(\"/tmp/target-parquet\")\n\ndlt_is_delta_table(\"/tmp/target-parquet/\")\n# [1] FALSE\n\ntbl \u003c- dlt_convert_to_delta(\"parquet.`/tmp/target-parquet/`\")\n\ndlt_is_delta_table(\"/tmp/target-parquet/\")\n# [1] TRUE\n\ntbl %\u003e%\n  dlt_show(5)\n\n# +---+---+---+---+--------+-------------------+-------------------+\n# | id|key|val|ind|category|                lat|               long|\n# +---+---+---+---+--------+-------------------+-------------------+\n# |  1|  a|  4| -1|     KBQ| -56.28354165237397|-108.74080670066178|\n# |  2|  a| 10|  1|     ROB| 50.546925463713706|-104.60825988091528|\n# |  3|  a|  7| -1|     SLX|-13.985343240201473|-114.89280310459435|\n# |  4|  a|  5|  1|     ACP| -47.15050248429179|-168.96175763569772|\n# |  5|  b|  3| -1|     EEK|-49.020595396868885|-105.57821027934551|\n# +---+---+---+---+--------+-------------------+-------------------+\n# only showing top 5 rows\n```\n\nvacuum `DeltaTables`:\n\n```r\ndlt_for_path(\"/tmp/target\") %\u003e%\n  dlt_vacuum()\n\n# Deleted 0 files and directories in a total of 1 directories.\n```\n\ngenerate manifests:\n\n```r\ndlt_for_path(\"/tmp/target\") %\u003e%\n  dlt_generate_manifest(\"symlink_format_manifest\")\n```\n\nand upgrade Delta 
protocols\n\n```r\nkey_val_log_path \u003c- \"/tmp/key-val/_delta_log/*json\"\n\nread.json(key_val_log_path) %\u003e%\n  select(\"metaData.id\", \"commitInfo.operation\", \"protocol\") %\u003e%\n  showDF()\n\n# +--------------------+------------+--------+\n# |                  id|   operation|protocol|\n# +--------------------+------------+--------+\n# |                null|CREATE TABLE|    null|\n# |                null|        null|  {1, 2}|\n# |72d4784b-e656-44a...|        null|    null|\n# +--------------------+------------+--------+\n\ndlt_for_path(\"/tmp/key-val\") %\u003e% \n  dlt_upgrade_table_protocol(1, 3)\n\nread.json(key_val_log_path) %\u003e%\n  select(\"metaData.id\", \"commitInfo.operation\", \"protocol\") %\u003e%\n  showDF()\n\n# +--------------------+----------------+--------+\n# |                  id|       operation|protocol|\n# +--------------------+----------------+--------+\n# |                null|    CREATE TABLE|    null|\n# |                null|            null|  {1, 2}|\n# |72d4784b-e656-44a...|            null|    null|\n# |                null|UPGRADE PROTOCOL|    null|\n# |                null|            null|  {1, 3}|\n# +--------------------+----------------+--------+\n```\n\nThis package also provides an interface for layout optimizations. 
One can \n[compact](https://docs.delta.io/latest/optimizations-oss.html#compaction-bin-packing) with \n\n```r\ntarget %\u003e%\n  repartition(10) %\u003e%\n  dlt_write(\"/tmp/target-optimize-compact\", mode = \"overwrite\", partitionBy = \"key\")\n\ndlt_for_path(\"/tmp/target-optimize-compact\") %\u003e%\n  dlt_optimize() %\u003e% \n  dlt_where(\"key = 'a'\") %\u003e%\n  dlt_execute_compaction() %\u003e%\n  select(\"metrics\") %\u003e%\n  showDF(truncate = FALSE)\n\n# +---------------------------------------------------------------------------------------------------------------------+\n# |metrics                                                                                                              |\n# +---------------------------------------------------------------------------------------------------------------------+\n# |{1, 4, {1750, 1750, 1750.0, 1, 1750}, {1633, 1633, 1633.0, 4, 6532}, 1, null, 1, 4, 0, false, 0, 0, 1669291200256, 0}|\n# +---------------------------------------------------------------------------------------------------------------------+\n```\n\nor without partition filter.\n\n```r\n\ndlt_for_path(\"/tmp/target-optimize-compact\") %\u003e%\n  dlt_optimize() %\u003e%\n  dlt_execute_compaction() %\u003e%\n  select(\"metrics\") %\u003e%\n  showDF(truncate = FALSE)\n\n# +-----------------------------------------------------------------------------------------------------------------------+\n# |metrics                                                                                                                |\n# +-----------------------------------------------------------------------------------------------------------------------+\n# |{2, 8, {1753, 1780, 1766.5, 2, 3533}, {1632, 1633, 1632.75, 8, 13062}, 2, null, 2, 9, 1, false, 0, 0, 1669291227707, 0}|\n# +-----------------------------------------------------------------------------------------------------------------------+\n```\n\nSimilarly, one can 
[Z-Order](https://docs.delta.io/latest/optimizations-oss.html#z-ordering-multi-dimensional-clustering) \nthe files with\n\n```r\ntarget %\u003e%\n  repartition(10) %\u003e%        \n  dlt_write(\"/tmp/target-optimize-zorderby\", mode=\"overwrite\", partitionBy=\"key\")\n\ndlt_for_path(\"/tmp/target-optimize-compact\") %\u003e%\n  dlt_optimize() %\u003e% \n  dlt_where(\"key = 'a'\") %\u003e%\n  dlt_execute_z_order_by(\"lat\", \"long\") %\u003e%\n  select(\"metrics\") %\u003e%\n  showDF(truncate = FALSE)\n\n# +----------------------------------------------------------------------------------------------------------------------------------------------------------------+\n# |metrics                                                                                                                                                         |\n# +----------------------------------------------------------------------------------------------------------------------------------------------------------------+\n# |{1, 1, {1750, 1750, 1750.0, 1, 1750}, {1750, 1750, 1750.0, 1, 1750}, 1, {all, {0, 0}, {1, 1750}, 0, {1, 1750}, 1, null}, 1, 1, 0, false, 0, 0, 1669291309308, 0}|\n# +----------------------------------------------------------------------------------------------------------------------------------------------------------------+\n```\n\nor without partition filter\n\n```r\ndlt_for_path(\"/tmp/target-optimize-compact\") %\u003e%\n  dlt_optimize() %\u003e%\n  dlt_execute_z_order_by(\"lat\", \"long\") %\u003e%\n  select(\"metrics\") %\u003e%\n  showDF(truncate = FALSE)\n\n# +----------------------------------------------------------------------------------------------------------------------------------------------------------------+\n# |metrics                                                                                                                                                         |\n# 
+----------------------------------------------------------------------------------------------------------------------------------------------------------------+\n# |{3, 3, {1750, 1780, 1761.0, 3, 5283}, {1750, 1780, 1761.0, 3, 5283}, 3, {all, {0, 0}, {3, 5283}, 0, {3, 5283}, 3, null}, 3, 3, 0, false, 0, 0, 1669291365663, 0}|\n# +----------------------------------------------------------------------------------------------------------------------------------------------------------------+\n```\n\n## Notes\n\nExamples use `source` and `target` datasets as described in `tests/testthat/data/README.md`.\n\n## Acknowledgments \n\nLogo based on [Yellow wasp, m, left, Kruger National Park, South Africa](https://flickr.com/photos/54563451@N08/45531028154) \nby [USGS Bee Inventory and Monitoring Lab](https://www.flickr.com/photos/usgsbiml/).\n\n## Disclaimer\n\nDelta is a trademark of the LF Projects LLC. This project is not owned, endorsed or sponsored by the LF Projects LLC.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzero323%2Fdlt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzero323%2Fdlt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzero323%2Fdlt/lists"}