{"id":19538246,"url":"https://github.com/dask-contrib/dask-deltatable","last_synced_at":"2025-10-11T05:08:46.316Z","repository":{"id":46720230,"uuid":"405898839","full_name":"dask-contrib/dask-deltatable","owner":"dask-contrib","description":"A Delta Lake reader for Dask","archived":false,"fork":false,"pushed_at":"2025-07-29T09:51:29.000Z","size":282,"stargazers_count":53,"open_issues_count":18,"forks_count":17,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-10-11T05:08:43.504Z","etag":null,"topics":["dask","dask-dataframes","delta-lake","parquet","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dask-contrib.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2021-09-13T08:50:05.000Z","updated_at":"2025-07-29T09:50:48.000Z","dependencies_parsed_at":"2024-01-03T12:24:03.480Z","dependency_job_id":"4f2b2055-9ca2-4597-8bbd-316426baa330","html_url":"https://github.com/dask-contrib/dask-deltatable","commit_stats":{"total_commits":74,"total_committers":12,"mean_commits":6.166666666666667,"dds":0.5675675675675675,"last_synced_commit":"3ff657b07c770f3eb60c120b10d3e1b6b8617f51"},"previous_names":["dask-contrib/dask-deltatable"],"tags_count":11,"template":false,"template_full_name":null,"purl":"pkg:github/dask-contrib/dask-deltatable","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dask-contrib%2Fdask-deltatable","tags_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dask-contrib%2Fdask-deltatable/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dask-contrib%2Fdask-deltatable/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dask-contrib%2Fdask-deltatable/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dask-contrib","download_url":"https://codeload.github.com/dask-contrib/dask-deltatable/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dask-contrib%2Fdask-deltatable/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279006348,"owners_count":26084084,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-11T02:00:06.511Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dask","dask-dataframes","delta-lake","parquet","python"],"created_at":"2024-11-11T02:33:08.710Z","updated_at":"2025-10-11T05:08:46.310Z","avatar_url":"https://github.com/dask-contrib.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Dask-DeltaTable\n\nReading and writing to Delta Lake using Dask engine.\n\n### Installation\n\n`dask-deltatable` is available on PyPI:\n\n```\npip install dask-deltatable\n```\n\nAnd conda-forge:\n\n```\nconda install -c conda-forge 
dask-deltatable\n```\n\n### Features:\n\n1. Read the Parquet files from Delta Lake and parallelize with Dask\n2. Write Dask dataframes to Delta Lake (limited support)\n3. Supports multiple filesystems (s3, azurefs, gcsfs)\n4. Subset of Delta Lake features:\n   - Time Travel\n   - Schema evolution\n   - Parquet filters\n     - row filter\n     - partition filter\n\n### Not supported\n\n1. Writing to Delta Lake is still in development.\n2. `optimize` API to run a bin-packing operation on a Delta Table.\n\n### Reading from Delta Lake\n\n```python\nimport dask_deltatable as ddt\n\n# read delta table\ndf = ddt.read_deltalake(\"delta_path\")\n\n# with specific version\ndf = ddt.read_deltalake(\"delta_path\", version=3)\n\n# with specific datetime\ndf = ddt.read_deltalake(\"delta_path\", datetime=\"2018-12-19T16:39:57-08:00\")\n```\n\n`df` is a Dask DataFrame that you can work with in the same way you normally would. See\n[the Dask DataFrame documentation](https://docs.dask.org/en/stable/dataframe.html) for\navailable operations.\n\n### Accessing remote file systems\n\nTo read from S3, Azure, gcsfs, and other remote filesystems,\nensure the credentials are properly configured in environment variables\nor config files. For AWS, you may need `~/.aws/credentials`; for gcsfs,\n`GOOGLE_APPLICATION_CREDENTIALS`. 
Refer to your cloud provider's documentation\nto configure these.\n\n```python\nddt.read_deltalake(\"s3://bucket_name/delta_path\", version=3)\n```\n\n### Accessing AWS Glue catalog\n\n`dask-deltatable` can connect to the AWS Glue catalog to read a Delta table.\nThe method will look for `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`\nenvironment variables, and if those are not available, fall back to\n`~/.aws/credentials`.\n\nExample:\n\n```python\nddt.read_deltalake(catalog=\"glue\", database_name=\"science\", table_name=\"physics\")\n```\n\n### Accessing Unity catalog\n\n`dask-deltatable` can connect to Unity Catalog to read a Delta table.\nThe method will look for `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment\nvariables, or for keyword arguments with the same names in lowercase.\n\nExample:\n\n```python\nddt.read_unity_catalog(\n    catalog_name=\"projects\",\n    schema_name=\"science\",\n    table_name=\"physics\"\n)\n```\n\n### Writing to Delta Lake\n\nTo write a Dask dataframe to Delta Lake, use the `to_deltalake` function.\n\n```python\nimport dask.dataframe as dd\nimport dask_deltatable as ddt\n\ndf = dd.read_csv(\"s3://bucket_name/data.csv\")\n# do some processing on the dataframe...\nddt.to_deltalake(\"s3://bucket_name/delta_path\", df)\n```\n\nWriting to Delta Lake is still in development, so be aware that some features\nmay not work.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdask-contrib%2Fdask-deltatable","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdask-contrib%2Fdask-deltatable","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdask-contrib%2Fdask-deltatable/lists"}