https://github.com/danielbeach/lakescum
A Python package to help Databricks Unity Catalog users to read and query Delta Lake tables with Polars, DuckDb, or PyArrow.
- Host: GitHub
- URL: https://github.com/danielbeach/lakescum
- Owner: danielbeach
- License: apache-2.0
- Created: 2024-03-25T01:22:50.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-03-25T13:57:08.000Z (8 months ago)
- Last Synced: 2024-10-02T19:24:29.428Z (about 1 month ago)
- Language: Python
- Homepage:
- Size: 346 KB
- Stars: 22
- Watchers: 2
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
## LakeScum
A Python package that helps Databricks Unity Catalog users read and query
Delta Lake tables with `Polars`, `DuckDB`, or `PyArrow`.

Unity Catalog does not play nicely out of the box with many of
these tools when you use their built-in features, such as `polars.read_delta()`.
`LakeScum` takes that difficulty away.
### Installation
`LakeScum` can be installed for Python with a simple `pip` command.

`pip install lakescum`
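
After installing, a quick check like the following can confirm which of the relevant packages are importable in the current environment. This is a minimal sketch: it assumes the import name `lakescum` matches the `pip` package name, which the README does not state, and `is_installed` is a hypothetical helper, not part of `LakeScum`.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return find_spec(module_name) is not None


# "lakescum" as an import name is an assumption; the other three are the
# usual import names for Polars, DuckDB, and PyArrow.
for name in ("lakescum", "polars", "duckdb", "pyarrow"):
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```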
### Usage
There are currently methods to read and query a Unity Catalog `Delta Lake` table with:

- `Polars`
- `DuckDB`
- `PyArrow`

### Polars
You can query a Unity Catalog `Delta Lake` table and get the result back as a
`Polars` DataFrame with the following method.

`unity_catalog_delta_to_polars()`

It takes two required parameters and one optional parameter.
```
spark: SparkSession - Spark session
table_name: str - Unity Catalog table name
sql_filter: str - Optional SQL WHERE clause filter
```

Example ...
```
# Assumes `spark` is an active SparkSession and the function has been imported from LakeScum.
polars_df = unity_catalog_delta_to_polars(
    spark,
    "production.default.fact_orders",
    sql_filter="year = 2024 and month = 3 and day = 10",
)
print(polars_df.head(10))

# Example output:
# order_id | product_id | order_date   | quantity
# 1        | 4567       | '2024-03-10' | 5
```

### DuckDB
This method registers a Unity Catalog Delta table as a `DuckDB` table so you
can query it with `DuckDB`.

`unity_catalog_delta_register_to_duckdb()`

It takes three required parameters and one optional parameter.
```
spark: SparkSession - Spark session
unity_table_name: str - Unity Catalog table name
duck_table_name: str - Desired DuckDB table name
sql_filter: str - Optional SQL WHERE clause filter
```

Example ...
```
import duckdb

# Assumes `spark` is an active SparkSession and the function has been imported from LakeScum.
unity_catalog_delta_register_to_duckdb(
    spark,
    "production.default.fact_orders",
    "test",
    sql_filter="year = 2024 and month = 3 and day = 19",
)
results = duckdb.sql("SELECT * FROM test")
print(results)

# Example output:
# order_id | product_id | order_date   | quantity
# 1        | 4567       | '2024-03-10' | 5
```

### PyArrow
This method returns a `PyArrow` Table from a Unity Catalog Delta table.

`unity_catalog_delta_to_pyarrow()`

It takes two required parameters and one optional parameter.

Example ...
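```
# Assumes `spark` is an active SparkSession and the function has been imported from LakeScum.
# The result is named `arrow_table` so it does not shadow the common `import pyarrow as pa` alias.
arrow_table = unity_catalog_delta_to_pyarrow(
    spark,
    "production.default.fact_orders",
    sql_filter="year = 2024 and month = 3 and day = 19",
)
print(arrow_table)

# Example output:
# order_id | product_id | order_date   | quantity
# 1        | 4567       | '2024-03-10' | 5
```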
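
All three methods accept the same optional `sql_filter` string. The sketch below shows one way to build that string from a date, and how a filter of this kind would combine with a table name into a full query. Both `partition_filter` and `build_query` are hypothetical helpers written for illustration. The `year`, `month`, and `day` column names come from the examples above, and the README does not say whether `LakeScum` assembles its query this way internally.

```python
from datetime import date
from typing import Optional


def partition_filter(day: date) -> str:
    """Build a WHERE-clause body for a table with year/month/day columns.

    The column names follow the README examples; adjust them for your table.
    """
    return f"year = {day.year} and month = {day.month} and day = {day.day}"


def build_query(table_name: str, sql_filter: Optional[str] = None) -> str:
    """Hypothetical helper: combine a table name and an optional filter."""
    query = f"SELECT * FROM {table_name}"
    if sql_filter:
        query += f" WHERE {sql_filter}"
    return query


sql_filter = partition_filter(date(2024, 3, 10))
print(build_query("production.default.fact_orders", sql_filter))
# prints: SELECT * FROM production.default.fact_orders WHERE year = 2024 and month = 3 and day = 10
```

The resulting `sql_filter` value can be passed as the `sql_filter` argument of any of the three methods. Because the filter is plain SQL text, build it only from trusted values.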